Dev Depot: Annyang, Adding Voice Control To Sites

Stephen Yagielowicz

A small JavaScript library that lets visitors control your website using voice commands, annyang supports multiple languages, has no dependencies, weighs less than 1KB and is free to use.

Developed by Tal Ater as an alternative user interface, the script is surprisingly easy to use. Ater says that annyang works with all web browsers: it progressively enhances browsers that support SpeechRecognition, while leaving users with older browsers unaffected.


This requirement for SpeechRecognition support makes it a good choice for desktop Chrome installations, but eliminates (for now) access via many common platforms, such as Safari on the iPad. Given this limitation, annyang might be best at enhancing a site’s user experience, rather than serving as its foundation. Think of the frosting, not the cake.

This issue set aside (and the annyang script called conditionally), the results can add an accessible bit of technological “wow factor” to your site. That can be extremely useful for adult website visitors, who may enjoy and value the “hands free” control that speech recognition and voice commands offer. The technology isn’t just for navigating with your car’s GPS or your phone anymore; it’s now in your bedroom as well.
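Calling the script conditionally could look something like the following minimal sketch. The helper names supportsSpeechRecognition and loadVoiceControl are hypothetical, and the script path is an assumption; the only detail taken from the browser itself is that Chrome exposes the recognition interface under the prefixed name webkitSpeechRecognition.

```javascript
// Hypothetical helper: true when the given global object exposes a
// SpeechRecognition constructor (Chrome uses the webkit prefix).
function supportsSpeechRecognition(globalObj) {
  return Boolean(
    globalObj &&
      (globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition)
  );
}

// Hypothetical helper: injects the script tag only for capable browsers.
// Returns the new script element, or null when loading was skipped.
function loadVoiceControl(globalObj, doc, src) {
  if (!supportsSpeechRecognition(globalObj)) {
    return null; // older browsers are left untouched
  }
  const script = doc.createElement('script');
  script.src = src;
  script.async = true;
  doc.head.appendChild(script);
  return script;
}

// Only runs in a browser; the file path here is an assumption.
if (typeof window !== 'undefined') {
  loadVoiceControl(window, document, 'annyang.min.js');
}
```

Visitors on unsupported browsers never download the library, so the voice layer stays a pure enhancement.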

Developers can learn more about the Web Speech API Specification on the W3C website.

According to the standards body, the specification defines a JavaScript API that enables web developers to incorporate speech recognition and synthesis into their web pages. Developers can use scripting to generate text-to-speech output and to use speech recognition as an input for forms, continuous dictation, control and more, while web pages control activation and timing and handle results and alternatives.
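For readers curious about what sits underneath a library like annyang, here is a rough sketch of using the recognition half of the API directly. The function name createRecognizer and its callback parameter are hypothetical; the properties set on the recognizer (continuous, interimResults, maxAlternatives, onresult) come from the Web Speech API itself.

```javascript
// Hypothetical helper: builds a recognizer that reports every alternative
// transcript for each result, or returns null when the API is unavailable.
function createRecognizer(globalObj, onTranscript) {
  const Recognition =
    globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition;
  if (!Recognition) {
    return null;
  }
  const recognizer = new Recognition();
  recognizer.continuous = true;      // keep listening after each phrase
  recognizer.interimResults = false; // report final results only
  recognizer.maxAlternatives = 3;    // ask for up to three guesses
  recognizer.onresult = (event) => {
    const result = event.results[event.resultIndex];
    const alternatives = [];
    for (let i = 0; i < result.length; i++) {
      alternatives.push(result[i].transcript);
    }
    onTranscript(alternatives);
  };
  return recognizer;
}

// Browser-only usage: the page decides when listening starts.
if (typeof window !== 'undefined') {
  const recognizer = createRecognizer(window, (texts) => console.log(texts));
  if (recognizer) {
    recognizer.start();
  }
}
```

The page, not the browser, chooses when start() is called, which is the control over activation and timing that the specification describes.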

In its simplest application, annyang lets coders specify an expected input string, such as “Show me Bree’s boobs.” When it hears this phrase, the script executes a specified function, such as one that displays a photo gallery of Bree’s boobs.
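A fixed-phrase command of this kind might be wired up roughly as follows. This is a sketch rather than a definitive setup: the phrase, the showGallery function and the gallery element id are all hypothetical stand-ins, while addCommands and start are annyang's own methods.

```javascript
// Hypothetical callback: reveals a gallery element if the page has one.
function showGallery() {
  if (typeof document !== 'undefined') {
    const gallery = document.getElementById('gallery'); // assumed element id
    if (gallery) {
      gallery.hidden = false;
    }
  }
  return 'gallery shown';
}

// The key is the phrase to listen for; the value is the function to run.
const simpleCommands = {
  'show me the gallery': showGallery
};

// annyang is only defined (and non-null) when the library has loaded
// in a browser that supports speech recognition.
if (typeof annyang !== 'undefined' && annyang) {
  annyang.addCommands(simpleCommands);
  annyang.start();
}
```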

The annyang system is also capable of understanding more complicated commands, such as those with named variables, splats, and optional words. Named variables are used for one-word arguments inside commands, while splats capture multiword text at the end of a command.

Wrapping words or phrases in parentheses marks that part of the command as optional, so the command matches whether or not the visitor says them.
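To make the three pattern types concrete, the sketch below turns a command pattern into a regular expression and matches a spoken phrase against it. It is an illustration of the idea only, not annyang's actual source: the function names commandToRegExp and matchCommand are hypothetical here, and the sketch assumes named variables are written as :name, splats as *name and optional parts in parentheses.

```javascript
// Illustrative only: converts a command pattern into a RegExp in one pass.
function commandToRegExp(pattern) {
  const escapeRe = (s) => s.replace(/[\-{}\[\]+?.,\\^$|#()*]/g, '\\$&');
  // Alternatives: (optional words) | :named | *splat | regex metacharacter
  const token = /\s*\(([^)]*)\)\s*|:(\w+)|\*(\w+)|[\-{}\[\]+?.,\\^$|#]/g;
  const source = pattern.replace(token, (match, optional, named, splat) => {
    if (optional !== undefined) {
      return '\\s*(?:' + escapeRe(optional) + ')?\\s*'; // may be left unsaid
    }
    if (named !== undefined) {
      return '(\\S+)'; // exactly one word
    }
    if (splat !== undefined) {
      return '(.*?)'; // any remaining text
    }
    return '\\' + match; // escape a literal metacharacter
  });
  return new RegExp('^' + source + '$', 'i');
}

// Returns the captured arguments, or null when the phrase does not match.
function matchCommand(pattern, spoken) {
  const result = commandToRegExp(pattern).exec(spoken);
  return result ? result.slice(1) : null;
}

console.log(matchCommand('calculate :month stats', 'calculate October stats'));
console.log(matchCommand('show me *tag', 'Show me Batman and Robin'));
console.log(matchCommand('say hello (to my little) friend', 'say hello friend'));
```

A named variable yields a single captured word, a splat yields everything that follows it, and an optional phrase captures nothing but lets both the long and short wording match.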

When a command contains a splat (*), annyang captures everything spoken after it and passes that text to the function. For instance, with a command that pairs “Show me” and a splat with a showFlickr function, saying “Show me Batman and Robin” is the same as calling showFlickr('Batman and Robin').
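A splat command along these lines could be registered as sketched here. The showFlickr name comes from the example above, but its body is a hypothetical placeholder, as is the *tag label chosen for the splat; addCommands and start are annyang's own methods.

```javascript
// Placeholder body: a real page would query a photo service and render
// the results. Here the function just reports what it received.
function showFlickr(tag) {
  console.log('Searching photos tagged: ' + tag);
  return tag;
}

// Everything spoken after "show me" is captured by the splat and
// handed to showFlickr as its first argument.
const flickrCommands = {
  'show me *tag': showFlickr
};

// Register the command only when annyang loaded successfully.
if (typeof annyang !== 'undefined' && annyang) {
  annyang.addCommands(flickrCommands);
  annyang.start();
}
```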

According to HTML5 Rocks, one important implementation note for any application using SpeechRecognition is that the first time speech recognition is used, Chrome needs to ask the user for permission to access the microphone. The site also notes that pages hosted on secure HTTPS servers will not need to ask for the visitor’s permission repeatedly, whereas pages hosted on HTTP servers will.

“Grab the latest version of annyang.min.js, drop it in your HTML, and start adding commands,” Ater states, highlighting the ease with which this solution can be deployed. When it’s that easy, why not give it a try? Sure, only a percentage of visitors will benefit, but those who do may be wowed.