The source includes some real doozies, including “Repulsive Tory” for “Repository.”
My point here is not at all to make fun of Google's speech recognition capabilities. I've long been a staunch defender of current ASR technology in general, and Google's implementation of it in particular. And in fact, the overall quality of the Kagan transcripts is very good — there are stretches where nearly all the words are correct.
Still, errors of the kind illustrated above indicate some of the… shall we say, areas for potential improvement.
There are some cases where the transcript is a plausible rendering of the pronunciation, but is not very plausible as English-language content, e.g. "the searched ford general …" in place of "the search for general …", or "like adams molecule selves and tissues" for "like atoms molecules cells and tissues". I'm surprised that the recognizer's n-gram language model, which is contemporary ASR's approximation to what makes sense, made these choices. And there are a few things that are just bizarre, where I can only imagine that some obscure bug has short-circuited Google's language model entirely: "became evermore phadke" for "became ever more vague", or "tutsi you can write" in place of "to each new generation".
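The remark about the n-gram language model can be made concrete with a minimal sketch: a bigram model with add-one smoothing, trained on a handful of invented sentences. The training text, function names, and smoothing choice here are illustrative assumptions. Google's actual language model is vastly larger, and nothing below is meant to reproduce it. The sketch only shows why even a crude model should score "the search for general" well above "the searched ford general".

```python
import math
from collections import Counter

# Hypothetical toy training sentences (an assumption for illustration only);
# a production recognizer's language model is trained on vastly more text.
TRAINING_SENTENCES = [
    "the search for general principles continues",
    "the search for meaning is old",
    "we search for general rules",
    "the ford motor company",
]


def train_bigram_counts(sentences):
    """Count each word as a left context, and count each adjacent word pair."""
    context_counts = Counter()
    pair_counts = Counter()
    vocab = set()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(words[1:])              # every word that can be predicted
        context_counts.update(words[:-1])    # every word that serves as a context
        pair_counts.update(zip(words[:-1], words[1:]))
    return context_counts, pair_counts, vocab


def log_prob(phrase, context_counts, pair_counts, vocab):
    """Sum of add-one-smoothed log P(w_i | w_{i-1}) over the phrase's bigrams."""
    words = phrase.split()
    v = len(vocab) + 1  # reserve one extra slot for unseen words like "searched"
    total = 0.0
    for prev, word in zip(words[:-1], words[1:]):
        numerator = pair_counts[(prev, word)] + 1
        denominator = context_counts[prev] + v
        total += math.log(numerator / denominator)
    return total


counts = train_bigram_counts(TRAINING_SENTENCES)
for phrase in ["the search for general", "the searched ford general"]:
    print(f"{phrase!r}: {log_prob(phrase, *counts):.2f}")
# 'the search for general': -5.25
# 'the searched ford general': -8.55
```

Add-one smoothing is the simplest way to give unseen pairs such as ("searched", "ford") a small nonzero probability instead of ruling them out. In a real decoder the language-model score is combined with the acoustic score, so a hypothesis that matches the audio closely can still win despite a poor language-model score.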