Wednesday, May 7, 2014

Data Accessibility in Linguistics Research

Praiseworthy data sharing in linguistics research.

I'll spare you the details, though I intend to try out some of the ideas myself later. What I want to underline here is something that all six papers in the session had in common.

Every one of them reported results on published databases. Two papers used NIST SRE 2008 data. Three papers used the NIST RT05, RT07, RT08, and/or RT09 datasets. One paper used the AMI corpus. And one used the REPERE collection.

None of the presentations used proprietary or unpublished data. This reflects the fact that in most speech processing fields, it has become normal to report the performance of new algorithms on data that others can also obtain, so that comparisons between methods are quantitatively meaningful.

In this sense, the practice is also a matter of accessibility. When you want to evaluate or extend someone's ideas, it's critical to be able to replicate their work, and that requires access to the datasets they analyzed.

Language Log: Accessibility and diarization
