But there's a good guy on the internet who happily made good use of it. He made a shell script that listens to your voice and uses the Google Voice API to decode it and convert it to text. I will explain his hack so you can all make good use of it too.
The first thing we need is a URL for the API, so we define the API variable:
API="http://www.google.com/speech-api/v1/recognize?lang=en"
Note the lang parameter at the end of the URL. Our script would be more flexible if it could handle multiple languages, so let's put the language in a variable, or better, accept it as an argument :)
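# LANG_CODE is used instead of LANG so the shell's own locale variable LANG is left alone
if [ -z "$1" ]
then
echo "No language supplied, using en"
LANG_CODE="en"
else
echo "using $1 as language"
LANG_CODE="$1"
fi
API="http://www.google.com/speech-api/v1/recognize?lang=$LANG_CODE"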
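If you want the script to complain about a typo instead of sending it to the API, here is a minimal sketch. The helper name build_api_url is made up for this example, and the accepted shapes (two lowercase letters, optionally followed by a region such as en-US) are my assumption, not something the API documents here.

```shell
# build_api_url: hypothetical helper, not part of the original script.
# Defaults to "en" when no argument is given and rejects anything that does
# not look like "xx" or "xx-YY" (an assumed format, adjust to taste).
build_api_url() {
  lang_code="${1:-en}"
  case "$lang_code" in
    [a-z][a-z]|[a-z][a-z]-[A-Z][A-Z]) ;;
    *) echo "unexpected language code: $lang_code" >&2; return 1 ;;
  esac
  printf 'http://www.google.com/speech-api/v1/recognize?lang=%s\n' "$lang_code"
}

# Possible use at the top of the script:
#   API=$(build_api_url "$1") || exit 1
```

The `${1:-en}` expansion does the same job as the if/else above: it falls back to en when no argument is passed.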
Now we need to send a sound file containing our voice to this URL. It's not quite that simple, of course. We need:
- arecord to record our voice over the mic
- flac to convert the file format
- wget to interact with the api
Make sure these 3 packages are installed; if not, you can always install them with your package manager, such as apt-get. We convert the file to FLAC because the API itself requires that format. Now let's mix things together!
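To have the script check for these tools itself, something like the following could go near the top. It is only a sketch: require_tools is a name I made up, and it relies on nothing more than the standard `command -v` lookup.

```shell
# require_tools: hypothetical helper that reports every named tool
# that cannot be found on PATH and returns non-zero if any is missing.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Possible use before recording anything:
#   require_tools arecord flac wget || exit 1
```

It reports all missing tools in one go rather than stopping at the first, so you only need one trip to the package manager.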
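# Record 3 seconds from the mic at 16 kHz and encode the result to out.flac
arecord -f cd -t wav -d 3 -r 16000 | flac - -f --best --sample-rate 16000 -o out.flac
# POST the FLAC file to the API and keep the response body in JSON
JSON=$(wget -O - -o /dev/null --post-file out.flac --header="Content-Type: audio/x-flac; rate=16000" "$API")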
As you can see, so far so good: the script receives the response in JSON format, so we need to parse it using sed and awk. I already wrote an article about sed here; you may want to check it out. This may look freaky, but it does the job:
UTTERANCE=`echo $JSON\
|sed -e 's/[{}]/''/g'\
|awk -v k="text" '{n=split($0,a,","); for (i=1; i<=n; i++) print a[i]; exit }'\
|awk -F: 'NR==3 { print $3; exit }'\
|sed -e 's/["]/''/g'`
echo "utterance: $UTTERANCE"
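To see what each stage of that pipeline does, here it is split into steps and run on a sample response. The sample JSON is an assumption: I wrote it by hand to match the shape this parsing expects, it is not captured from a real call.

```shell
# Hand-written sample response (ASSUMED shape, not real API output).
JSON='{"status":0,"id":"example-id","hypotheses":[{"utterance":"list directory","confidence":0.9}]}'

# Step 1: remove the curly braces.
STEP1=$(printf '%s\n' "$JSON" | sed -e 's/[{}]//g')

# Step 2: split on commas and print each piece on its own line.
STEP2=$(printf '%s\n' "$STEP1" | awk '{n=split($0,a,","); for (i=1; i<=n; i++) print a[i]; exit }')

# Step 3: line 3 is  "hypotheses":["utterance":"list directory"
# so its third colon-separated field is the quoted text.
STEP3=$(printf '%s\n' "$STEP2" | awk -F: 'NR==3 { print $3; exit }')

# Step 4: strip the double quotes.
UTTERANCE=$(printf '%s\n' "$STEP3" | sed -e 's/"//g')

echo "utterance: $UTTERANCE"

# If jq happens to be installed, the same field could be read with:
#   printf '%s\n' "$JSON" | jq -r '.hypotheses[0].utterance'
```

Because the split is on commas and colons, recognized text that itself contains a comma or a colon would be cut short, so treat this parsing as good enough for short commands only.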
Now our script echoes the text! That seems pretty geeky, but how can it be useful? Controlling our PC, maybe? Why not! To do that, we define the strings the script compares the recognized text against; if the text matches one of them, the script executes the corresponding command.
CMD_LIST_DIRECTORY="list directory"
CMD_WHOAMI="who am i"
if [ `echo "$UTTERANCE" | grep -ic "^$CMD_LIST_DIRECTORY$"` -gt 0 ]; then
ls .
elif [ `echo "$UTTERANCE" | grep -ic "^$CMD_WHOAMI$"` -gt 0 ]; then
whoami
fi
We can define any number of commands this way. I will be working on using arrays for this (maybe one of you can do it for us :) ). You can find a complete script here if you are too lazy to save a new file :p
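Here is one possible take on the array idea, as a sketch rather than the finished thing. It needs bash (plain sh has no arrays), and the names PHRASES, ACTIONS and run_command are made up for this example.

```shell
#!/usr/bin/env bash
# Two parallel arrays: PHRASES[i] is what we expect to hear,
# ACTIONS[i] is the command to run for it. (Hypothetical names.)
PHRASES=("list directory" "who am i")
ACTIONS=("ls ." "whoami")

# run_command: lowercase the recognized text, look for an exact match
# in PHRASES and run the matching entry from ACTIONS.
run_command() {
  local heard i
  heard=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  for i in "${!PHRASES[@]}"; do
    if [ "$heard" = "${PHRASES[$i]}" ]; then
      # Left unquoted on purpose so "ls ." splits into command and argument.
      ${ACTIONS[$i]}
      return
    fi
  done
  echo "no command matches: $1" >&2
  return 1
}

# Possible use at the end of the script:
#   run_command "$UTTERANCE"
```

Adding a new voice command is then one extra entry in each array instead of another elif branch. Since the action is run through plain word splitting, it only suits simple commands without quoting or pipes.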
Guess what, we just made good use of the Google Voice API! I will leave you to test it, improve it and, why not, share it. Your comments are welcome.
Nice little tutorial. If possible, how would you extend this so that the voice is being read at all times and can stream the output? Having to type the command is perhaps circumventable?
Xarlos.
maybe attaching the script to a hotkey?
Cool tutorial. Gotta try it today itself...
For parsing the JSON response from the API, 'jq' command line parser could be used - http://stedolan.github.io/jq/. Cheers.
you effin' rock.