New project: PronounceMe

I'm a big believer in the power of API Mesh and automation at scale. In December 2018, I got an idea for a project with potential for passive income, and I started working on it in early January 2019.

The Problem

For me, English is a second language. Although I have been reading in English since I was 14 and speaking it since I was 20, there is still a long way to go. As an Easterner, I find the pronunciation incredibly challenging, especially while living in Great Britain. There are 1 billion people for whom English is not the first language.


I found myself googling something like "how to pronounce <..>" to get an instant result, for example:

As can be seen, there are a lot of short videos with pronunciation examples.

Just a video showing the word, with sound. As simple as that!

Fortunately for me, there are currently roughly 15 accounts on YouTube publishing this kind of pronunciation video. Half of them clearly use open-source TTS engines such as Festival, known for their very rusty, mechanical sound. A few publishers, the best ones, are native speakers recording themselves. That must be very difficult to do at scale! And only 5 of them use modern TTS voices such as Google WaveNet, Azure, etc.

Proposed solution

A generator of high-quality videos with decent pronunciation, using best-in-class voice engines.

Why is it different from others?

Short answer: quality and relevance.

The videos I generate use an appropriate background picture to grab more attention, offer a wider variety of accents, and are categorized into playlists. Notably, the terms (words) are taken from actual search engine queries, which increases the chances of a video being seen.
So far, it's a theory that has yet to be proven.

Show me something!

This is how it started - the first iteration.

I uploaded ~1200 videos covering top search terms and the names of British towns and cities to validate the hypothesis. Surprisingly, the results were way better than I expected.

The current quality (as of Feb 2019)

The generator system is now mostly autonomous. It uses some image recognition and image discovery, along with ffmpeg and the output of paid TTS engines.
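To give a feel for the ffmpeg step, here is a minimal sketch of how a still background picture and a TTS audio clip could be combined into one video. This is not the actual generator code: the function name `build_ffmpeg_command` and the file names are made up for illustration, and the encoder choices (libx264, AAC) are my assumptions.

```python
import shlex
from pathlib import Path


def build_ffmpeg_command(image: Path, audio: Path, output: Path) -> list[str]:
    """Build an ffmpeg argument list that turns one still image plus one
    audio clip into a video that lasts as long as the audio.

    Hypothetical helper for illustration only.
    """
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-loop", "1",            # repeat the single image as video frames
        "-i", str(image),        # input 0: background picture
        "-i", str(audio),        # input 1: TTS audio
        "-c:v", "libx264",       # assumed video encoder
        "-tune", "stillimage",   # x264 tuning for static pictures
        "-pix_fmt", "yuv420p",   # widely compatible pixel format
        "-c:a", "aac",           # assumed audio encoder
        "-shortest",             # stop when the shorter input (the audio) ends
        str(output),
    ]


if __name__ == "__main__":
    # Placeholder file names; print the command instead of running it.
    cmd = build_ffmpeg_command(Path("bg.jpg"), Path("word.mp3"), Path("word.mp4"))
    print(shlex.join(cmd))
```

The sketch only builds and prints the command. To actually render a video, the list could be passed to `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed.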

So where is the money?

From advertising on YouTube. One might think it's not feasible, since there is hardly any traffic for such narrow search terms. That's exactly true for a single video, but at scale it adds up. My math shows that it is possible to make some money for juice with the proper level of automation. And that's exactly what programming is all about!

I spent a few nights with spreadsheets doing the math for various scenarios based on industry figures. I might disclose my estimates later.
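The shape of that math is simple, and a rough sketch looks like the one below. Every number in it is a made-up placeholder, not my real estimate and not an industry figure. The function name `estimate_monthly_revenue` and the "revenue per 1000 views" (RPM) model are assumptions for illustration.

```python
def estimate_monthly_revenue(videos: int, views_per_video: float, rpm_usd: float) -> float:
    """Estimate monthly ad revenue in USD.

    videos          -- number of published videos
    views_per_video -- average monthly views per video (assumed)
    rpm_usd         -- assumed revenue per 1000 views, in USD
    """
    total_views = videos * views_per_video
    return total_views / 1000 * rpm_usd


if __name__ == "__main__":
    # Hypothetical scenarios: (videos, monthly views per video, RPM in USD).
    scenarios = [
        (1_200, 10, 1.0),
        (10_000, 10, 1.0),
        (100_000, 10, 1.0),
    ]
    for videos, views, rpm in scenarios:
        revenue = estimate_monthly_revenue(videos, views, rpm)
        print(f"{videos:>7} videos -> ${revenue:,.2f} per month")
```

The point is that each video earns next to nothing on its own, so the total only becomes interesting when the number of videos grows, which is where automation comes in.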

So what is next?

I'll post an update on this side project every month. Make sure you've subscribed to the blog.

Check other posts about this project: #pronounceme