The guidance of wise men: Reproducibility, reporting standards, and the views of our Editorial Board

Following the launch today of BioMed Central's new Minimum Standards of Reporting Checklist, BMC Biology give us an insight into how it was developed and their board members' thoughts on its implementation.

“Rules are for the obedience of fools and the guidance of wise men”
-Douglas Bader (attributed)

From today, authors submitting research manuscripts to four BioMed Central journals – BMC Biology, BMC Neuroscience, Genome Biology, and GigaScience – will be confronted with a checklist and asked whether their paper has met the standards of reporting it lays out.

This development is our response to the increasing awareness of the need for more fastidious attention to transparency and reporting standards in the wake of concerns about the irreproducibility of alarmingly many preclinical studies. You can read more about why we implemented it in the launch Editorial.

Checklists already in operation with other publishers have not been met with unalloyed approval, so before launching our own – at present in pilot phase – we asked our Editorial Board what they thought.

Their views, from the broadly supportive to the frankly skeptical, strongly influenced the implementation of our checklist pilot: here are some of them, selected and collated by Penelope Austin, BMC Biology Deputy Editor, who has spearheaded the pilot for the journal. They are attributed where we have the permission of the authors, anonymous where not.

Sanitizing sunlight and SEMs

Endorsement by our Editorial Board reflected what we suspect is a general exasperation with carelessness, perhaps combined with ignorance, that impedes the proper assessment of research.

One Editorial Board member, for example, applauded “an excellent initiative that will have the double merit of encouraging something a bit better than worst practice, and also impeding hasty and careless publication.”

…while another was “delighted to read your guidance to authors on reporting numbers of experimental points etc. These are things I am often asking for as a reviewer because authors often just don’t think to include the information (or don’t have it).”

More specific issues, helpful in focusing our checklist, were raised by several people. For example, on the meaninglessness of replicates if not properly specified:

“The distinction between technical and biological replicates is important. It is common, particularly in the area of molecular biology, to see presentations of values plus or minus what can be impressively small standard errors, and to realise, eventually, that the standard errors represent measurement error in an experiment with no biological replication at all.” (John Brookfield)
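The distinction can be made concrete with a small sketch. The numbers below are invented purely for illustration (three hypothetical animals, each assayed four times), and the helper name `sem` is our own; the point is only that a standard error computed from repeat measurements of one sample says nothing about variation between animals.

```python
import statistics
from math import sqrt

def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / sqrt(len(values))

# Hypothetical data: 3 animals (biological replicates),
# each measured 4 times (technical replicates).
measurements = {
    "animal_1": [10.1, 10.2, 9.9, 10.0],
    "animal_2": [14.0, 13.9, 14.1, 14.2],
    "animal_3": [7.0, 7.1, 6.9, 7.2],
}

# SEM from the technical replicates of one animal: measurement error only.
technical_sem = sem(measurements["animal_1"])

# SEM across animals, using one mean per animal: biological variation.
animal_means = [statistics.mean(v) for v in measurements.values()]
biological_sem = sem(animal_means)

print(f"technical SEM (one animal):  {technical_sem:.3f}")
print(f"biological SEM (n=3 animals): {biological_sem:.3f}")
```

With these made-up values the technical SEM is more than an order of magnitude smaller than the biological one, which is exactly the "impressively small" error bar the quotation warns about.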

And on the iniquities of histograms, which may conceal more than they reveal:

“I agree with the presentation of data points, as opposed to histograms. It is MUCH easier to look at the distribution of points to determine whether the comparison methods are valid.”

“All data statistics (mean, standard deviation, SEM [standard error of the mean]) can hide interesting patterns in data and there is simply no reason (with the widespread availability of fast computers and good software) not to plot all the data.

“Even box plots, with all their advantages, are not a substitute for seeing the data…the neuroscience literature includes many of the worst examples of hiding the data behind SEMs and I think a huge amount of sanitizing sunlight will be brought to this work if authors plot their data in a more informative way than they do now.”

These remarks, by the way, inspired a new series, to be launched shortly by BMC Biology and entitled ‘What is wrong with this picture’, illustrating just how misleading many presentations can be. You can make sure you don’t miss it by signing up for our monthly contents alerts.
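As a rough illustration of the point about summary statistics, the sketch below uses two invented samples that share the same mean, standard deviation and SEM but have quite different shapes. The data and the helper names (`summary`, `dot_strip`) are hypothetical, and the text "plot" is only a stand-in for a proper dot plot.

```python
import math
import statistics

# Two invented samples: one clustered at two values, one spread around the centre.
bimodal = [2, 2, 2, 8, 8, 8]
unimodal = [0, 4, 4, 6, 6, 10]

def summary(values):
    """Return (mean, sample SD, SEM) for a list of numbers."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return mean, sd, sd / math.sqrt(len(values))

def dot_strip(values, low=0, high=10):
    """Crude text dot plot: one '*' per data point at each integer value."""
    return " ".join(
        f"{x}:{'*' * values.count(x)}"
        for x in range(low, high + 1)
        if values.count(x)
    )

for name, sample in (("bimodal", bimodal), ("unimodal", unimodal)):
    mean, sd, sem = summary(sample)
    print(f"{name:9s} mean={mean} sd={sd:.3f} sem={sem:.3f}  {dot_strip(sample)}")
```

A bar with an error whisker would look identical for both samples; only showing the individual points reveals that one of them falls into two separate clusters.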

On the other hand…

We shouldn’t delude ourselves, however, that transparent reporting guarantees reproducibility:

“We mustn’t assume that including information that gives other researchers the tools to try to reproduce the authors’ data means that ‘we only publish results that are reproducible’, which would falsely suggest that the results had, in fact, been reproduced by others prior to our publishing them. Of course, statistical testing is all about inferring that results are likely to be reproducible, but that is a slightly different thing.” (John Brookfield again)

And of course in some cases reproducibility, arguably, may not even be paramount:

“Personally, I feel that this issue of reproducibility is something that has been blown out of all proportion with respect to experimental studies (as opposed to population studies, where errors of statistical analysis are a real issue).

“Yes, we should have sufficient information to decide if a study is likely to be correct, but I am less concerned with whether someone in my laboratory can repeat something published by another laboratory.

“We do the experiments to determine if the conclusions from a body of work are useful to our understanding of a process. And as fields move forward, we learn what studies in other areas hold merit.”

And on statistics…and lies and statistics…

Many argued, in one way or another, that assessing the validity of the statistics was ultimately the domain of the reviewer, not the author. One such was Frank Uhlmann:

“We will have designed and conducted our experiments according to how we thought we can reach conclusive results. We will have done this long before we submit our paper. Then we realise that the journal asks us to answer a long list of statistical compliance questions. Will this change our study? The answer is no.

“We will have to argue in the questionnaire that our statistics are sound. The real test comes during the review process, when our peers look at our approach and discuss with us whether or not we were right in our assumptions.”

…a view endorsed by Arthur Lander, who is by no means mathematically unsophisticated (he is a Guest Editor for our mathematical modelling series, Beyond Mendel):

“The appropriateness of particular statistical tests needs to be assessed by reviewers. An author can use the right statistical test even without knowing exactly why it’s the right test.

“As long as a careful reviewer is satisfied, it should be fine. And authors using the wrong statistical tests are highly unlikely to know that they are doing so, and therefore likely to provide the requested justifications even when they are wrong.”

Power calculations were a particular focus of cautionary remarks, partly as a matter of principle – their relevance depends on what kind of conclusions are being drawn from the experiments, and they are based on guesses about effect sizes that can’t be known in advance – and partly for practical reasons that also apply to the vexed issue of whether data follow a normal distribution – “…problematic for animal experiments in which large numbers are simply not achievable (affordable or permitted by the panels that review animal use)”.

…a caveat most strongly worded by one of our immunologists, who warned that if we insisted on power calculations and proof of normality we should risk losing all immunology papers.
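To see why the guessed effect size matters so much, here is a minimal sketch of a textbook approximation for the sample size of a two-group comparison of means (two-sided test, equal group sizes, normal approximation). It is not part of the checklist; the function name and the default values for `alpha` and `power` are our own assumptions, chosen only as conventional examples.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    effect_size is the standardized difference (difference in means / SD),
    which has to be guessed before the experiment is done.
    Normal approximation: n = 2 * ((z_(1-alpha/2) + z_power) / effect_size)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# The required n changes steeply with the guessed effect size.
for d in (1.0, 0.5, 0.25):
    print(f"guessed effect size {d}: about {sample_size_per_group(d)} per group")
```

Because the effect size enters as a square, halving the guess roughly quadruples the number of animals required, which is where the practical objection about affordable or permitted numbers begins to bite.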

And we cannot resist quoting the waspish perspective of another immunological Board member:

“I am against an overdidactic approach to statistics….The statistical illiteracy admitted by [another Editorial Board member] and shared by myself is in some respects an advantage, because we are unable to call effortlessly on recondite tests that finally squeeze out a significant effect.”

Resources and repositories

Nobody argued against proper identification of resources, with much strongly worded endorsement of Research Resource Identifiers (RRIDs) as a real contribution to reproducibility that spares the authors the need to “give all the gory details about each resource.” (Arthur Lander again).

Antibodies (no surprise here) came in for particularly emphatic expressions of concern:

“… failing to specify the exact clone used for the indicated molecular target, will without question lead to failures in reproducing work, because antibodies directed to different regions of a protein or with different affinities often give different results in what are otherwise ostensibly the same experiments.”

Perhaps a little more surprising were reservations expressed, for practical reasons, about the demand to make all primary data available, with our immunologists to the forefront of informed dissent again:

“This aspect of the requirements needs to be thought out more fully. For example, immunologists do large amounts of flow cytometry. There is no place for depositing the primary list mode files that would allow another investigator to conduct a re-analysis of the primary data (pre-gating), and I doubt the journal would host such voluminous data. Likewise for very large image files that are reduced to quantitative analyses in a manuscript, and so on.”

Sequence data, too, are apparently threatening to overwhelm national repositories.

Which said, however, we have excellent in-house help to offer to authors in finding an appropriate place to deposit datasets where difficulties arise.

Principle and practice – it’s all very well…

One of our most important objectives in consulting our Board was to avoid imposing unworkable reporting standards on our submitting authors, so in piloting requirements we have paid due attention to these impassioned sentiments:

“[Journal X] has an endless compliance sheet for statistical standards that the authors spend a lot of time filling out and that the reviewers ignore. Pointless waste of time for all concerned.”

“I echo [the Editorial Board member’s] sentiments about the awful [Journal X] stat sheet (which needs to be re-filed with every revision, no matter how minor). If you ever feel overwhelmed by an excess of good submissions, you could implement their method and decrease your workload significantly.”

So our pilot has been devised to ensure responses as far as possible from authors and reviewers to the checklist, but avoid putting undue burdens on either.

What, you may ask, is undue?

We do not, for example, ask authors to give page and line number for every piece of information requested – only to confirm that they have met the requirements applicable to their research, and if they are aware of omissions, to give reasons so that reviewers and editors can take these into account.

We recognize that the checklist will not be a perfect fit for each individual study and think, with Bob Horvitz, that the appropriate balance in requirements and implementation can be resolved only by careful monitoring of the system we have set up:

“Whatever you do should be regarded as a work-in-progress: don’t engrave anything in stone and perhaps have a mechanism for feedback from authors and reviewers.”

Hence the decision to pilot rather than go full steam ahead.

The aim is rigor but not rigidity, and we shall be keeping an eye on ourselves.
