Thursday, May 5, 2011

PBS plays Google’s word game, transcribing thousands of hours of video into crawler-friendly text » Nieman Journalism Lab » Pushing to the Future of Journalism

Case study in converting from one meta-model (spoken narrative) to another (written narrative)…

Blogs and newspaper sites enjoy a built-in advantage when it comes to search-engine optimization. They deal in words. But a whole universe of audio and video content is practically invisible to Google.

Say I want to do research on Osama bin Laden. A web search would return news articles about his assassination, a flurry of tweets, the Wikipedia page, Michael Scheuer’s biography, and an old Frontline documentary, “Hunting Bin Laden.” I might then take my search to Lexis Nexis and academic journals. But I would never find, for example, Frontline’s recent reporting on the Egyptian revolution, where bin Laden makes an appearance, or any number of other video stories in which the name is mentioned.

While video and audio transcripts are rich for Google mining, they’re also time-consuming and expensive. PBS is out to fix that by building a better search engine. The network has transcribed and tagged, automatically, more than 2,000 hours of video using software called MediaCloud.

