Massive data, the digitalization of science, and reproducibility of results
published: March 9, 2012, recorded: July 2010
released under terms of: Creative Commons Attribution Non-Commercial Share Alike (CC-BY-NC-SA)
As the scientific enterprise becomes increasingly computational and data-driven, the nature of the information communicated must change. Without inclusion of the code and data underlying published computational results, we are engendering a credibility crisis in science. Controversies such as ClimateGate, the microarray-based drug-sensitivity clinical trials under investigation at Duke University, and retractions from prominent journals due to unverified code all suggest the need for greater transparency in computational science. In this talk I argue for restoring two elements of the scientific method: (1) error control as central to scientific communication, and (2) complete communication of the underlying methodology that produced the results, i.e., reproducibility. I outline barriers to these goals based on recent survey work (Stodden 2010), and suggest solutions such as the “Reproducible Research Standard” (Stodden 2009), a set of open licensing options designed to create an intellectual property framework for scientists consonant with longstanding scientific norms.
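The talk concerns norms and licensing rather than any particular toolchain, but the kind of “complete communication of methodology” it calls for can be sketched in a few lines of Python. The script below is a hypothetical, minimal illustration, not anything prescribed in the lecture: the toy analysis, function names, and record fields are all assumptions. It fixes the random seed, records the software environment, and emits a checksummed provenance record alongside the result, so that a reader given the same code and data could rerun and verify the computation.

    """Minimal sketch of publishing a verifiable computational result.

    Everything here (the toy analysis, field names, seed) is illustrative,
    standing in for whatever real code and data accompany a paper.
    """
    import hashlib
    import json
    import platform
    import random
    import sys

    SEED = 42  # fixed seed so the "result" is repeatable run-to-run

    def run_analysis(seed: int) -> float:
        """Toy computation standing in for a real published analysis."""
        rng = random.Random(seed)
        samples = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
        return sum(samples) / len(samples)

    def environment_record() -> dict:
        """Capture enough context for a reader to rerun the code."""
        return {
            "python_version": sys.version,
            "platform": platform.platform(),
            "seed": SEED,
        }

    if __name__ == "__main__":
        record = environment_record()
        record["result"] = run_analysis(SEED)
        # Hash the provenance record so a published figure or number can be
        # checked against the exact run that produced it.
        blob = json.dumps(record, sort_keys=True).encode()
        record["sha256"] = hashlib.sha256(blob).hexdigest()
        print(json.dumps(record, indent=2))

Publishing such a record next to the code and data is one concrete way to meet the transparency standard the talk argues for; the licensing framework then governs how others may reuse those artifacts.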