How edge computing makes voice assistants faster and more powerful

Voice is becoming a pervasive way to manage and interact with everyday tech devices, spreading from initial adoption in phones and smart speakers to smartwatches, cars, laptops, home appliances and much more.

Cloud platforms take most of the praise for enabling voice assistant services such as Amazon Alexa, Google Assistant or Microsoft Cortana – while the growing role that edge computing plays in enabling voice interfaces often goes uncredited. A substantial amount of processing and analysis occurs on the devices themselves to allow users to interface with them by simply talking.

Keyword detection

Voice-enabled devices are not constantly recording audio and sending it to the cloud to determine if someone is giving them an instruction. That would not only be a privacy concern, but also a waste of energy, computing and network resources. Having to send all words to the cloud and back would also introduce latency and slow the responsiveness of the system. Today's voice interfaces typically use keyword or "wake-word" detection, dedicating a small portion of edge computing resources (i.e. computing done on the device itself or "at the edge") to process microphone signals while the rest of the system remains idle. This is a power-efficient approach, particularly important to help extend usage time in portable, battery-operated devices such as smartphones and wearables.

When the always-on processing core handling keyword detection, usually a digital signal processor (DSP), finds a match with the expected word (e.g. “Alexa”), it wakes up the rest of the system to support functions requiring more computing power such as audio capture, compression and transmission, language processing and voice tracking.
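The two-stage flow described above can be sketched in code. This is a simplified illustration, not how a real DSP works: actual wake-word detection runs a small acoustic model on raw audio frames, whereas this sketch stands in for that with simple token matching. All class and method names here are hypothetical.

```python
# Toy sketch of a two-stage wake-word architecture (hypothetical names).
# Stage 1: an always-on, low-power path scans incoming frames for the
# wake word. Stage 2: on a match, the power-hungry subsystems (audio
# capture, compression, network, language processing) are woken up.

WAKE_WORD = "alexa"

class VoiceDevice:
    def __init__(self):
        # Heavy subsystems start powered down to save battery.
        self.main_system_awake = False
        self.handled_requests = []

    def on_audio_frame(self, frame_text: str) -> None:
        """Always-on path: called for every audio frame.

        In a real device this would be an acoustic model on a DSP
        scoring raw samples; here a token match stands in for it.
        """
        if not self.main_system_awake:
            if WAKE_WORD in frame_text.lower().split():
                self.wake_main_system()
        else:
            self.handle_request(frame_text)

    def wake_main_system(self) -> None:
        # Power up capture, compression, transmission, etc.
        self.main_system_awake = True

    def handle_request(self, frame_text: str) -> None:
        # Full pipeline: in a real assistant, audio would be streamed
        # to the cloud for speech recognition from this point on.
        self.handled_requests.append(frame_text)

device = VoiceDevice()
device.on_audio_frame("what time is it")  # no wake word: stays asleep
device.on_audio_frame("hey alexa")        # wake word: system powers up
device.on_audio_frame("what time is it")  # now processed as a request
```

The key design point is that only the cheap first stage runs continuously; everything expensive stays off until the match fires.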
