Ever wonder how voice assistants are able to understand you? What goes on under the hood of such devices to quickly analyze what you said? If you are planning to build your own voice recognition software, prepare to accumulate a data set of hundreds of thousands of voice samples just to get the software to work roughly 90% of the time.
Giving a device spoken instructions, and having the device turn that speech into something it can act on, is the job of Natural Language Processing. In other words, when you say “Alexa, shuffle songs by Queen,” Alexa processes four different pieces of information.
First, the trigger word “Alexa” wakes the device up and prompts it to listen to your instructions. Second, “shuffle” tells Alexa that you want to launch the shuffle command. Next, “songs” identifies the kind of item you want to shuffle. Lastly, “Queen” names the specific item you want, in this case the band (Alexa developers call a fill-in value like this a slot value). That is a lot of information to process and execute in a matter of seconds.
Alexa is essentially translating everything said to her into a language she can understand and process.
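To make the four pieces concrete, here is a minimal sketch in Python that splits the example sentence into a wake word, a command, an item type, and a specific item. It is a toy illustration only: the vocabulary lists, the `ParsedCommand` type, and the `parse` function are hypothetical names made up for this example, and a real assistant relies on trained speech and language models rather than a hand-written pattern.

```python
import re
from typing import NamedTuple, Optional


class ParsedCommand(NamedTuple):
    wake_word: str   # e.g. "alexa"
    command: str     # e.g. "shuffle"
    item_type: str   # e.g. "songs"
    item: str        # e.g. "Queen"


# Hypothetical, hard-coded vocabulary used only for this illustration.
WAKE_WORD = "alexa"
COMMANDS = ("shuffle", "play")
ITEM_TYPES = ("songs", "albums")

# Wake word, then a command, then an item type, then "by <item>".
PATTERN = re.compile(
    rf"^(?P<wake>{WAKE_WORD})\W+(?P<command>{'|'.join(COMMANDS)})\s+"
    rf"(?P<item_type>{'|'.join(ITEM_TYPES)})\s+by\s+(?P<item>.+)$",
    re.IGNORECASE,
)


def parse(text: str) -> Optional[ParsedCommand]:
    """Return the four parts of the request, or None if it does not fit."""
    match = PATTERN.match(text.strip())
    if match is None:
        # No wake word or an unknown command: the device keeps ignoring us.
        return None
    return ParsedCommand(
        wake_word=match["wake"].lower(),
        command=match["command"].lower(),
        item_type=match["item_type"].lower(),
        item=match["item"].strip(),
    )
```

Calling `parse("Alexa, shuffle songs by Queen")` yields the wake word "alexa", the command "shuffle", the item type "songs", and the item "Queen", while the same sentence without the wake word returns `None`.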
For Alexa to properly understand the meaning of our statement, there likely need to be multiple utterances, or phrasings, that all map to the same request. For example, to listen to just Queen, you would likely need utterances like “Queen”, “The band Queen”, “Freddie Mercury”, etc.
You also have to make sure that there isn’t overlap with other items in the music app being used (like Spotify) that will contradict what you mean. You don’t want to hear the song “God Save the Queen” by the Sex Pistols, for example, so the utterances used have to be specific enough to select the right item, but broad enough to cover the various ways someone could ask to listen to the band Queen. Creating a user-friendly and accurate voice assistant is a difficult task, but as natural language processing continues to improve, so will Alexa.
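The balancing act between broad and specific utterances can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how Alexa or any music app actually stores its catalog: the `CATALOG` entries, the item labels, and the `build_index` and `resolve` helpers are all hypothetical. Each item lists the phrasings that should select it, and building the lookup fails loudly if two different items claim the same phrasing.

```python
from typing import Dict, List, Optional

# Hypothetical catalog: each item lists the utterances that should select it.
CATALOG: Dict[str, List[str]] = {
    "artist:Queen": ["queen", "the band queen", "freddie mercury"],
    "song:God Save the Queen (Sex Pistols)": ["god save the queen"],
}


def normalize(phrase: str) -> str:
    """Lowercase the phrase and collapse repeated whitespace."""
    return " ".join(phrase.lower().split())


def build_index(catalog: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every utterance to its item, rejecting overlapping utterances."""
    index: Dict[str, str] = {}
    for item, utterances in catalog.items():
        for utterance in utterances:
            key = normalize(utterance)
            if key in index and index[key] != item:
                # Two items share a phrasing, so the request would be ambiguous.
                raise ValueError(f"utterance {key!r} is ambiguous")
            index[key] = item
    return index


INDEX = build_index(CATALOG)


def resolve(phrase: str) -> Optional[str]:
    """Return the item an utterance selects, or None if nothing matches."""
    return INDEX.get(normalize(phrase))
```

With this catalog, “Freddie Mercury” and “The band Queen” both resolve to the band, while “God Save the Queen” resolves to the Sex Pistols song because the whole phrase is matched rather than the single word “Queen”. Exact matching keeps the sketch short; it would miss any phrasing not listed, which is exactly why a real assistant needs many utterances and a more forgiving matcher.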
Interested in learning more? Follow us on LinkedIn and join the discussion.