Saturday, February 19, 2011

Abstract: Reproducible Software versus Reproducible Research (2011 AAAS Annual Meeting (17-21 February 2011))

The development of the stored-program concept is a milestone of computer science that blurred (but certainly did not obliterate) the distinction between code and data. The ability to treat code as data is now so deeply ingrained that few computer practitioners ever think about the alternative, which seems like a quaint historical curiosity.

Here’s a contemporary problem that provides a chance to think about the distinction afresh: Computational scientists should publish their code as well as their data.  Data dissemination—a pillar of scientific transparency—is inadequate without concomitant dissemination of the code that generates the computed/simulated results.

In sharp contrast, the incentives in computational research are strongly biased towards rapid publication of papers without any realistic requirement of validation, and they lead to a completely different outcome. Publications in computationally based research (in any discipline) often have no realistic hope of being reproduced: the code behind them is not available or, if it is, it rarely has any automated validation, history tracking, bug database, etc.
