During my undergraduate Human-Computer Interaction course, I took on the challenge of developing an accessibility application for Google Glass with two graduate students. The application closed-captioned live conversations, with the longer-term goal of offering real-time translation for people acclimating to foreign countries. It could also help those who are hard of hearing as an alternative to hearing aids.
On top of gaining experience with Google Glass, this project was informative in other ways as well. As my first foray into speech processing, it disproved some core assumptions I had about natural language understanding (NLU). I was able to take the results of the project and dig deeper to figure out why certain parts of the system fell short of the expectations I had when designing our real-time recognition algorithm.
Below is a video demonstration taken near the end of the project. The system works in some capacity, but the response time is unsatisfactory and the transcription is often inaccurate. With additional time on this project, I would have revised the algorithm so the speech recognition component could establish the domain the user is speaking in and draw on other linguistic hints to improve accuracy. I would also have rearchitected the data flow for faster responses.
(Note: this video was recorded before I discovered the nuances of how speech APIs work. Though I trash the Google Web Speech API a little bit for its slow, inaccurate performance, further optimizations did improve results, and I gained a fuller understanding of how speech systems work in the following months while figuring out what went wrong.)