Peer Review Week 2023: Views on the future of code peer review

The significant growth in genomic data, and in the bioinformatics tools used to analyze them, has inevitably led to discussions around the reproducibility and robustness of software.

For computational research, the code associated with a manuscript is arguably as important as the manuscript describing it; without the code, the hypotheses and findings of such a study cannot be fully assessed and the usability of the code or tool cannot be fully tested.

Code Ocean is one platform that can be used to peer review code. It provides “compute capsules” to which researchers upload their code and which reviewers can access anonymously to test it. A small cluster of Nature journals have now integrated Code Ocean into their submission system.

However, is this welcomed in the computational science community? How do researchers feel about the peer review of code in general? After all, we don’t ask reviewers to try to reproduce experimental results in the laboratory during manuscript assessment, so perhaps we shouldn’t ask them to test code either.

To find out, I reached out to a number of researchers who serve on the editorial board of the journal BMC Bioinformatics, inviting them to share their views in response to a few questions.

What are your thoughts on mandating the sharing of previously unreported code described in a manuscript submitted to a computational journal such as BMC Bioinformatics?

Perhaps unsurprisingly, there was a consensus that this was a good idea and that all authors should be required to deposit their code on GitHub or a similar platform.

Dimitris Polychronopoulos from Ochre Bio stated that, in his view, “publishing previously unreported code in code repositories such as Github, Bitbucket or others is a prerequisite for any work that involves computational analyses.”

He then went on to explain that this was for two reasons – to “ensure code reproducibility and code robustness for people running on different computer architectures” and “disseminate the work and potentially integrate findings generated from a pipeline to another under development, promoting this way scientific progress.”

Miha Moškon from the University of Ljubljana went one step further, stressing the importance of documentation: “… a code published without a proper documentation is sometimes as useless as no code at all. The requirement to share the code should thus be complemented with the requirement to also document it appropriately.”

Timothy Shaw from the H. Lee Moffitt Cancer Center and Research Institute pointed out that it “… should be a priority of requiring authors to submit ready-to-run code, which should include examples of inputs and outputs.” This is where platforms such as Code Ocean come in, although, interestingly, all but one of my respondents had not heard of it.

But were my respondents as keen on actually testing code as part of the peer review process? Did they see any benefit to it? I asked them.

What might the advantages of peer review of code be?

All respondents agreed that peer review would lead to improvements in the code prior to publication.

Bashar Ibrahim from the Gulf University for Science and Technology and University of Jena said that “Peer review of code is a valuable practice in software development that enhances code quality, knowledge sharing, and collaboration. It helps identify and address issues early, leading to more efficient development and better overall tools.”

Shaw stated that he “personally benefitted from having users test our software.”

Johann Rohwer from Stellenbosch University was of a similar opinion but also cautioned that it may not be possible to inspect everything in detail if the code base is particularly large.

This statement lent itself nicely to my next question.

Can you think of any disadvantages of including the assessment of code in the peer review process?

As with the previous question, a common theme emerged: concerns about finding peer reviewers who can test code, and about the additional time this may require, which could lead to delays in publication.

Moškon, for example, stated that “finding reviewers with appropriate expertise who are willing to provide timely reviews with high standard is already a challenging task, which editors are facing every day. And finding a reviewer who is competent in assessing, e.g., the biological significance of the results as well as the quality and reproducibility of the published code will sometimes be an impossible task.”

Anil Kesarwani from GSK R&D Pharma pointed out that “reviewers need expertise in both the specific research domain and coding, which may not always be readily available” and that “code review can be subjective as different reviewers may have varying opinions on code style and quality.”

Boyang Ji from the Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, the only respondent with any experience of using Code Ocean to peer review code, said “usually I just have a look the core part of the algorithm part. For machine learning projects, I usually have a check to make sure the pipeline is as authors described in the manuscript. Usually, I cannot review the code line by line.”

So perhaps automated AI tools such as ChatGPT could help address the additional resources needed to peer review code. I asked my respondents what they thought about this.

What are your views on using a ChatGPT tool to assess code automatically?

Here, Ibrahim voiced the opinion shared by almost everyone:

“AI-based tools can be highly beneficial for improving code quality, and consistency. However, it is a valuable tool within the broader code review but not a complete replacement for human expertise. The best practice is to combine automated code assessments with regular human code reviews for code quality assurance.”

Jia Meng from Xi’an Jiaotong-Liverpool University was more cautious, stating that they were not convinced that ChatGPT is currently able to peer review code well, although this might change in the future.

Polychronopoulos was particularly enthusiastic, stating “I am a big fan. In my mind, assessing code can be largely broken down into two steps; first you check out the whole pipeline to familiarize yourself and get a general idea of what’s going on and then you delve into the specifics. I see ChatGPT being particularly useful at the first step.”

How do you see the future of peer review of code?

In essence, most respondents do feel that AI will have a role to play.

Ibrahim commented that “Ultimately, the future of peer review of code review will be shaped by advances in AI, collaboration, and on improving code quality. Embracing these trends can help development teams deliver higher-quality software more efficiently.”

Moškon was a little more circumspect: “… the future of peer review of code will rely on the automatic assessment tools. To what degree can we trust these tools is another question. Are we willing to transfer the peer review process to artificial intelligence?”

And Sri Krishna Subramanian from the Institute of Microbial Technology concluded the discussion: “in the foreseeable future, I anticipate widespread adoption of code peer review as a standard practice across reputable journals in all major fields. The task is to find the sweet spot by balancing the integration of AI-driven tools for automated code evaluation, fostering community involvement for comprehensive code testing, and establishing standardized protocols. This approach should be complemented by the expertise of human reviewers to assess both the code and its accompanying manuscript, aligning with the principles of open science and adhering to ethical standards governing code and its associated data.”

In conclusion

So, in summary, what do the researchers I reached out to think about the future of peer review of code? In essence, they see it as laudable and as something that should be considered essential when assessing papers describing computational research. It should lead to higher-quality, more robust researcher-developed software and algorithms. There are some challenges: suitable reviewers can be difficult to find, and authors may have to wait longer for a decision because of the time it takes to assess the code. The advent of ChatGPT may, in part, be a solution to these, but it is important that human reviewers remain involved and that guidelines are put in place to ensure that the peer review is suitably robust.

Acknowledgements 

Thank you to all the following BMC Bioinformatics Editorial Board Members for taking the time to share their thoughts on the future of code peer review.

  • Dimitris Polychronopoulos, Ochre Bio (UK)
  • Miha Moškon, University of Ljubljana (Slovenia)
  • Jia Meng, Xi’an Jiaotong-Liverpool University (China)
  • Johann Rohwer, Stellenbosch University (South Africa)
  • Timothy Shaw, H. Lee Moffitt Cancer Center and Research Institute (USA)
  • Bashar Ibrahim, Gulf University for Science and Technology (Kuwait) and University of Jena (Germany)
  • Anil Kesarwani, GSK R&D Pharma (USA)
  • Sri Krishna Subramanian, Institute of Microbial Technology (India)
