Jacer Omri's Blog: Hacking Google Voice API in Linux

You have probably seen the voice-aware input fields that came with the new Google Chrome release about a month ago. Yeah, it's a cool way to input text without typing for long seconds, with the opportunity to get search results for "laughable clothes" when you say "fashionable clothes". Seriously, I cannot see how this is useful, especially when it comes to desktop PCs.

But there's a good guy on the internet who happily made good use of it. He made a shell script that listens to your voice and uses the Google Voice API to decode it and convert it to text. I will explain this hack so you can all make good use of it.

First we need a URL for the API, so we define the API variable:

API="http://www.google.com/speech-api/v1/recognize?lang=en"

Note the lang parameter at the end of the URL. Our script would be more flexible if it could handle multiple languages, so let's put the language in a variable, or better, get it passed as an argument :)

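# SPEECH_LANG is used instead of LANG so the script does not
# overwrite the shell's own locale variable.
if [ -z "$1" ]
  then
    echo "No language supplied, using en"
    SPEECH_LANG="en"
  else
    echo "using $1 as language"
    SPEECH_LANG="$1"
fi
API="http://www.google.com/speech-api/v1/recognize?lang=$SPEECH_LANG"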
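
The same default can be written more compactly with shell parameter expansion. This is only a minimal sketch: build_api_url is a hypothetical helper name that does not appear in the original script.

```shell
#!/bin/sh
# build_api_url is a hypothetical helper, not part of the original script.
# It prints the recognition URL for the given language code,
# falling back to "en" when the argument is missing or empty.
build_api_url() {
    speech_lang="${1:-en}"
    printf 'http://www.google.com/speech-api/v1/recognize?lang=%s\n' "$speech_lang"
}

# Use the script's first argument, if any, as the language code.
API=$(build_api_url "$1")
echo "API url: $API"
```

Calling build_api_url fr would print the same URL ending in lang=fr.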

Now we need to send a sound file containing our voice to this URL. But it's not that simple of course, we need:

arecord to record our voice over the mic

flac to convert the file format

wget to interact with the API

Make sure these three packages are installed; if not, you can use your package manager, such as apt-get, to install them. We convert the file to FLAC because the API itself requires that format. Now let's mix things together!
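
Before mixing them, a quick check that all three tools are on the PATH can save a confusing failure later. The sketch below is one possible way to do it; missing_tools is a hypothetical helper name, not something from the original script.

```shell
#!/bin/sh
# missing_tools is a hypothetical helper: it prints the name of every
# command passed to it that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

MISSING=$(missing_tools arecord flac wget)
if [ -n "$MISSING" ]; then
    # $MISSING is left unquoted on purpose so the names print on one line.
    echo "Missing tools:" $MISSING
    echo "Install them with your package manager before recording."
fi
```

If nothing is printed, all three commands were found.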

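# Record 3 seconds from the mic at 16 kHz and encode the result as out.flac
arecord -f cd -t wav -d 3 -r 16000 | flac - -f --best --sample-rate 16000 -o out.flac
# POST the FLAC file to the API and keep the JSON response
JSON=$(wget -O - -o /dev/null --post-file out.flac --header="Content-Type: audio/x-flac; rate=16000" "$API")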

As you can see, we're doing well so far. The script receives the response in JSON format, so we need to parse it using sed and awk. I already wrote an article about sed here; you may want to check it out. This may look freaky, but it does the job:

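# Strip the braces, split the fields on commas (one per line), take the
# third colon-separated value of the third line, then drop the quotes.
UTTERANCE=$(echo "$JSON" \
  | sed -e 's/[{}]//g' \
  | awk '{ n = split($0, a, ","); for (i = 1; i <= n; i++) print a[i]; exit }' \
  | awk -F: 'NR==3 { print $3; exit }' \
  | sed -e 's/"//g')
echo "utterance: $UTTERANCE"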
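
To see why the third line and the third field are the right ones, it helps to run the pipeline above on a canned response. The SAMPLE string below is an assumed response shape made up for illustration, and parse_utterance is a hypothetical wrapper name; neither comes from the original script.

```shell
#!/bin/sh
# SAMPLE is an assumed response shape, invented for this walk-through.
SAMPLE='{"status":0,"id":"abc123","hypotheses":[{"utterance":"list directory","confidence":0.9}]}'

# parse_utterance is a hypothetical wrapper around the sed/awk pipeline:
# it reads one line of JSON on stdin and prints the recognised text.
parse_utterance() {
    sed -e 's/[{}]//g' \
      | awk '{ n = split($0, a, ","); for (i = 1; i <= n; i++) print a[i]; exit }' \
      | awk -F: 'NR==3 { print $3; exit }' \
      | sed -e 's/"//g'
}

# After the comma split, line 3 is: "hypotheses":["utterance":"list directory"
# so the third colon-separated field is the quoted utterance.
UTTERANCE=$(printf '%s\n' "$SAMPLE" | parse_utterance)
echo "utterance: $UTTERANCE"
```

Because the pipeline splits on commas and colons, recognised text that itself contains either character would be cut short.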

Now our script echoes the text! That seems pretty geeky, but how can this be useful? Controlling our PC, maybe? Why not! To do that, we define strings that the script compares the final text against; if the text matches one of them, the script executes the corresponding command.

CMD_LIST_DIRECTORY="list directory"
CMD_WHOAMI="who am i"
# grep -i ignores case, -q prints nothing, -x only accepts a whole-line match
if echo "$UTTERANCE" | grep -iqx "$CMD_LIST_DIRECTORY"; then
     ls .
elif echo "$UTTERANCE" | grep -iqx "$CMD_WHOAMI"; then
     whoami
fi

We can define any number of commands this way. I will be working on using arrays for this (maybe one of you can do it for us :) ). You can find a complete script here if you are too lazy to save a new file :p
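
One way to avoid a growing if/elif chain is a small lookup table. The sketch below uses a plain "phrase|command" text table rather than bash arrays, so it should run in any POSIX shell; COMMAND_TABLE and lookup_command are hypothetical names, not part of the original script.

```shell
#!/bin/sh
# COMMAND_TABLE is a hypothetical lookup table:
# one "spoken phrase|shell command" pair per line, phrases in lower case.
COMMAND_TABLE='list directory|ls .
who am i|whoami'

# lookup_command prints the shell command mapped to the phrase in $1
# (compared case-insensitively) and returns 1 when nothing matches.
lookup_command() {
    wanted=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    while IFS='|' read -r phrase action; do
        if [ "$phrase" = "$wanted" ]; then
            printf '%s\n' "$action"
            return 0
        fi
    done <<EOF
$COMMAND_TABLE
EOF
    return 1
}

# Example run with a fixed utterance instead of a real recording.
UTTERANCE="Who am I"
if ACTION=$(lookup_command "$UTTERANCE"); then
    # eval only ever sees text from COMMAND_TABLE, never the spoken words.
    eval "$ACTION"
else
    echo "no command matches: $UTTERANCE"
fi
```

Adding a new voice command then only takes one more line in the table.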

Guess what, we just made good use of the Google Voice API! I will leave you to test it, improve it and, why not, share it. Your comments are welcome.

6 comments:

Using Google Translate in PHP | Jacer Omri's Blog, July 19, 2013 at 10:03 AM
[…] Hacking Google Voice API, We are about to hack Google Translate this time! We are going to write a full featured yet basic […]
Xarlos, October 22, 2013 at 8:29 PM
Nice little tutorial. If possibly, how would you extend this so that the voice is being read at all times and can stream the output? Having to type the command is perhaps circumventable?
Xarlos.
Jacer Omri, October 22, 2013 at 8:54 PM
maybe attaching the script to a hotkey?
krishnaanaril, December 20, 2013 at 9:37 AM
Cool tutorial. Gotta try it today itself...
Kaydarla, December 21, 2013 at 2:09 PM
For parsing the JSON response from the API, 'jq' command line parser could be used - http://stedolan.github.io/jq/. Cheers.
femalefaust, January 16, 2014 at 1:25 AM
you effin' rock.


Tuesday, July 16, 2013
