The Rotherhithe Picture Research Library

Cambridge computer scientists have established a new gold standard for open research, in order to make scientific results more robust and reliable.

A group of Cambridge computer scientists have set a new gold standard for openness and reproducibility in research by sharing the more than 200GB of data and 20,000 lines of code behind their latest results, an unprecedented degree of openness in a peer-reviewed publication. The researchers hope that this new gold standard will be adopted by other fields, increasing the reliability of research results, especially for publicly funded work.

The researchers are presenting their results at a talk today (4 May) at the 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI) in Oakland, California.

In recent years there’s been a great deal of discussion about so-called ‘open access’ publications: the idea that research publications, particularly those funded by public money, should be made publicly available.

Computer science has embraced open access more than many disciplines, with some publishers sub-licensing publications and allowing authors to publish them in open archives. However, as more and more corporations publish their research in academic journals, and as academics find themselves in a ‘publish or perish’ culture, the reliability of research results has come into question.

“Open access isn’t as open as you think, especially when there are corporate interests involved,” said Matthew Grosvenor, a PhD student from the University’s Computer Laboratory, and the paper’s lead author. “Due to commercial sensitivities, corporations are reluctant to make their code and data sets available when they publish in peer-reviewed journals. But without the code or data sets, the results are irrelevant: we can’t know whether an experiment is the same if we try to recreate it.”

Beyond computer science, a number of high-profile incidents of errors, fraud or misconduct have called quality standards in research into question. This has thrown the issue of reproducibility, the principle that a result can be reliably repeated given the same conditions, into the spotlight.

“If a result cannot be reliably repeated, then how can we trust it?” said Grosvenor. “If you try to reproduce other people’s work from the paper alone, you often end up with different numbers. Unless you have access to everything, it’s useless to call a piece of research open source. It’s either open source or it’s not: you can’t open source just a little bit.”

With their most recent publication, Grosvenor and his colleagues have gone several steps beyond typical open access standards, setting a new gold standard for open and reproducible research. All of the experimental figures and tables in their award-winning paper, which describes a new method of making data centres more efficient, are clickable.

By clicking on any of the figures or tables in the paper, readers are taken to a website where the researchers have produced technically detailed descriptions of the methods for every one of their experiments. These descriptions include the original data sets and tools that were used to produce the figures as well as free and open source access to all of the source code that they wrote and modified.

In the past this might not have been possible, but thanks to cheap cloud storage, the researchers have put roughly 200GB of data and 20,000 lines of code onto the internet and made it freely available to all under a permissive open-source license.

“It now should be possible for anyone with a collection of computers to follow our instructions and produce our exact graphs,” said Grosvenor. “We think that this is the way forward for all scientific publications and so we’ve put our money where our mouth is and done it.”

The text in this work is licensed under a . For image use please see separate credits above.