Adding voice recognition to a project can open up many possibilities and be beneficial across many different applications. Why should you consider voice recognition, and what will you need to think about before adding it?
Before we discuss why voice recognition can be helpful, let's first establish exactly what it is. Voice recognition and speech recognition are often confused. Speech recognition is when a computer converts spoken words into text, whereas voice recognition is when a computer identifies a specific user's voice. Speech recognition powers speech-to-text applications, where a user says something and the computer performs a specific task; voice recognition is useful in security environments. This brings us to our first potential application: security!
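The distinction can be sketched in a few lines of toy Python. Here each recording is represented as a dict of already-extracted properties (a transcript and a mean pitch); real systems work on raw waveforms with trained models, so this only illustrates what each task returns, not how it computes it:

```python
# Toy contrast between the two tasks. The "recording" format and the
# pitch values are invented for illustration.

def recognize_speech(recording):
    """Speech recognition: return WHAT was said, ignoring who said it."""
    return recording["transcript"]

def recognize_voice(recording, known_speakers):
    """Voice recognition: return WHO spoke, ignoring what was said,
    by matching the recording's pitch to enrolled speakers."""
    pitch = recording["mean_pitch_hz"]
    # Pick the enrolled speaker whose typical pitch is closest.
    return min(known_speakers, key=lambda name: abs(known_speakers[name] - pitch))

alice_says_hello = {"transcript": "hello", "mean_pitch_hz": 210.0}
speakers = {"alice": 205.0, "bob": 120.0}

print(recognize_speech(alice_says_hello))           # what was said: "hello"
print(recognize_voice(alice_says_hello, speakers))  # who said it: "alice"
```

The same recording feeds both functions; only the question being asked of it changes.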
Older security systems are often based on passwords and/or unique identifiers which, even when difficult to crack, don't prove that the user is authentic. If a password is tied to a specific voice, then the system will only accept a valid password when it is spoken by an authorized individual. The use of biological identification in this manner is known as biometrics. Example projects where this type of security could be useful include a smart door that only lets family members in, a vault that holds precious goods, or even a cookie jar that grants access to parents but not children.
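A minimal sketch of such a two-factor check might look like the following. The "voiceprints" here are made-up feature vectors and the similarity threshold is arbitrary; a real system would derive embeddings from audio with a trained model:

```python
import math

# Hypothetical voice-locked password: both the right words AND a
# matching voiceprint are required. All values below are invented.

ENROLLED = {"parent": [0.9, 0.1, 0.4]}  # voiceprint on file
PASSWORD = "open sesame"

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def unlock(spoken_text, voiceprint, threshold=0.95):
    """Accept only a valid password spoken by an enrolled voice."""
    if spoken_text != PASSWORD:
        return False
    return any(cosine_similarity(voiceprint, ref) >= threshold
               for ref in ENROLLED.values())

# Right words, right voice -> unlocked
print(unlock("open sesame", [0.88, 0.12, 0.41]))  # True
# Right words, wrong voice (e.g. a child) -> stays locked
print(unlock("open sesame", [0.1, 0.9, 0.2]))     # False
```

For the cookie-jar example, the second call is exactly the case we want: the child knows the password but does not have the parent's voice.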
Voice recognition can also play a large role in personalization, whereby a system recognizes a specific user and customizes itself to provide a unique experience. Imagine a project that sorts differently colored sweets and then dispenses specific combinations to individuals; voice recognition could provide that customization. Other examples of a customizable user experience include DIY vending machines for coffee, smart media players that choose a specific genre, or a DIY Jarvis-like system that adjusts the settings in each room of a house to best suit the occupier.
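Once the speaker has been identified (by whatever voice-recognition step comes first), personalization reduces to a profile lookup. A small sketch of the sweet-dispenser idea, with invented names and preferences:

```python
# Hypothetical per-user profiles keyed by recognized speaker.

PROFILES = {
    "dad":   {"sweets": ["red", "red", "green"], "coffee": "espresso"},
    "child": {"sweets": ["blue", "yellow"],      "coffee": None},
}

def dispense_for(speaker):
    """Look up the identified speaker and act on their preferences."""
    profile = PROFILES.get(speaker)
    if profile is None:
        return "Unknown voice - using default mix"
    return f"Dispensing {', '.join(profile['sweets'])} for {speaker}"

print(dispense_for("dad"))       # Dispensing red, red, green for dad
print(dispense_for("stranger"))  # Unknown voice - using default mix
```

The same pattern carries over to the coffee machine or media player examples: swap the profile contents, keep the lookup.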
While the idea of adding voice biometrics to a project sounds incredibly useful, it won't be a reality for hobbyists and amateur designers for a few years yet. To understand why, let's look at how speech and voice recognition are currently implemented. Speech recognition involves recording a piece of spoken audio, which is then passed through an AI network to produce text. Simple AI systems exist on local machines, but most run in large data centers in the form of cloud computing. This means that when you ask your phone a question, it sends a small audio file to a data center over the internet, the file gets processed, and the results are returned.
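The device's side of that round trip is mostly packaging. A sketch of what such an upload payload might look like; the field names and encoding choices are assumptions for illustration, since each cloud provider defines its own API:

```python
import base64
import json

# Hypothetical request payload a device might POST to a cloud
# speech API. Real services define their own schemas.

def build_request(audio_bytes, sample_rate=16000):
    """Package a short recording as a JSON payload for upload."""
    return json.dumps({
        "sample_rate_hz": sample_rate,
        "encoding": "LINEAR16",
        # Raw audio is base64-encoded so it survives JSON transport.
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
    })

payload = build_request(b"\x00\x01\x02\x03")
print(payload)
```

The device would then send this payload over HTTPS and parse the transcript out of whatever JSON response comes back; the heavy lifting all happens in the data center.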
The reason for cloud computing is that a single system can learn from many millions of people and improve far more than if it were running locally on one device. Millions of different people can teach such a system because their common denominator is what they are saying, not how they are saying it. Voice recognition would require a near-identical process, in which an audio sample is passed into an AI network and the specific user is determined; but here it does not matter what users say, only how they say it.
A cloud-based AI may offer no benefit to voice recognition, since different users' submissions have nothing in common. Therefore, voice recognition is more likely to run locally on a device instead of in the cloud, which means each device must be trained on its own users.
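That local training can be as simple as an enrollment step. A minimal sketch, where "training" means averaging a few feature vectors per user into a voiceprint and identification picks the nearest one; real systems use learned embeddings rather than raw averages, and these vectors are invented:

```python
# Nearest-voiceprint identification after a local enrollment step.

def enroll(samples):
    """Average several feature vectors into one voiceprint."""
    n = len(samples)
    return [sum(v[i] for v in samples) / n for i in range(len(samples[0]))]

def identify(voiceprints, sample):
    """Return the enrolled user whose voiceprint is closest (L2 distance)."""
    def dist(user):
        return sum((a - b) ** 2 for a, b in zip(voiceprints[user], sample)) ** 0.5
    return min(voiceprints, key=dist)

# Enrollment: each household member records a few samples locally.
voiceprints = {
    "alice": enroll([[0.9, 0.1], [0.8, 0.2]]),
    "bob":   enroll([[0.1, 0.9], [0.2, 0.8]]),
}

print(identify(voiceprints, [0.82, 0.18]))  # alice
```

Nothing here leaves the device: the voiceprints live in local storage, which is exactly the property that makes the approach attractive for biometrics.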
When voice recognition does become available to hobbyists and designers, hardware will also need to be carefully considered. AI systems that run locally often require more resources than your average microcontroller can provide, including the memory and processing power to handle a stream of audio data. While some SoCs can handle this - such as Qualcomm's Snapdragon series - devices such as the Raspberry Pi may be more appropriate.
Having said that, the future of microcontrollers is unclear and when voice recognition becomes more commonplace, manufacturers may look towards producing hardware/peripheral solutions with biometrics. This could see simple microcontrollers with direct audio inputs that can process data in real-time and then return a unique checksum or ID based on a voice pattern.
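The "return a unique ID" idea can be sketched in host-side code. Here a voice feature vector is quantized coarsely, so that small recording-to-recording variations collapse into the same bins, and the bins are hashed into a short, stable identifier. The feature values and bin size are invented, and real voiceprint matching is far more tolerant than simple binning:

```python
import hashlib

# Hypothetical sketch: map a voice feature vector to a short stable ID,
# as a biometric peripheral might report to its host microcontroller.

def voice_id(features, bin_size=0.1):
    """Quantize features into bins, then hash the bins into a short ID."""
    bins = tuple(round(f / bin_size) for f in features)
    digest = hashlib.sha256(repr(bins).encode()).hexdigest()
    return digest[:8]  # short ID the host can compare against a stored list

# Two slightly different recordings of the same voice -> same ID
print(voice_id([0.91, 0.12, 0.43]) == voice_id([0.93, 0.11, 0.44]))  # True
# A different voice -> different ID
print(voice_id([0.91, 0.12, 0.43]) == voice_id([0.2, 0.8, 0.1]))     # False
```

One caveat worth noting: features that land near a bin boundary can flip between IDs, which is why real systems compare distances against a threshold rather than hashing, but the host-side protocol (receive a short token, compare, act) would look much the same.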
If voice recognition does move to the cloud, then designers will need to think about integrating Wi-Fi or an Ethernet connection into their projects. The next five years may see Ethernet IoT devices all but disappear, since wires are awkward and inconvenient; more likely than not, Wi-Fi will become the dominant form of internet communication. Wi-Fi brings its own problems, including power consumption, network traffic, security, and electromagnetic compatibility issues, but is overall preferable to Ethernet dependency.
Since voice recognition and speech recognition often get confused, understanding the difference is imperative to creating a successful project. Voice recognition has very real potential for adding layers of security to systems, along with hands-free personalization.