Help put the open in Open Data and Open Bibliography


Legal restrictions and uncertainties surrounding scientific data are a barrier to efficient data sharing and reuse, and ultimately the pace of research. Copyright in particular is problematic for data. It is often unclear if data are protected by copyright and the law differs greatly internationally.

To try and more universally clarify the legal status of data published in our open access journals – maximizing the potential for secondary data uses such as text mining – we have been working towards a solution: public domain dedication of data under the Creative Commons CC0 waiver.

In a detailed editorial published in BMC Research Notes we set out the case and process for evolving the copyright and licensing structure in open access journals to make published data truly open, according to the Panton Principles. But we need the community’s help putting these principles into practice.

The drafting of this article was one of the main outcomes of our June 2011 Publishing Open Data Working Group meeting – a meeting involving publishers, editors, funders, librarians and scientists (authors) themselves. In the article we propose a revised copyright license for content published in open access journals, which applies a Creative Commons attribution license to the main body of articles (papers) and a CC0 waiver to data included in additional files (supplementary material), reference lists (bibliographic data), and tabular data.

Practical examples of secondary uses of data in journal articles made possible by CC0 and definitions of data are discussed. Also, new legal wording (license statement) for all published articles is proposed and we believe this model could be adopted by many publishers. However, we want to make this change – which we believe will have numerous benefits for science – with the consensus and support of the scientific community. We therefore now seek public views on the proposals.

Questions we seek the scientific community’s input on:
– How appropriate is public domain dedication for data you (already) publish in journals?
– How do you define data – what data file types do you commonly publish as additional files (supplementary material)?
– How might removing legal restrictions on data sharing benefit (or harm) your research?

We look forward to receiving responses to these proposals within the next two months. Assuming we have support from our authors and editors for making published data maximally open we will begin the next phase of implementing Open Data in our journals, the process for which is also outlined in the article.



Gerard Ridgway

Why remove the attribution requirement?

If I wanted to report findings from text-mining or other analysis of a bunch of papers, I would be quite happy to cite them all. Traditional printed journals would not want a bibliography running to dozens of pages, but surely modern online-only journals remove this barrier?

On the other hand, if I saw one of my tables reproduced verbatim in another paper that didn’t cite mine, I would be a little upset.

Open data is great, but so is attribution, and I don’t see why they should be considered incompatible.


For UK government-funded research, the RCUK (group of funding research councils) has recently mandated that papers submitted for publication from 1st April 2013:

* must be published in journals which are compliant with Research Council policy on Open Access; and
* must include details of the funding that supported the research, and a statement on how the underlying research materials such as data, samples or models can be accessed.

This is a much more nuanced position than saying that “all data must be in the public domain”, as it allows, for example, data from human participants that cannot be adequately anonymised to be released to identified researchers in line with the data subjects’ informed consent.

It also suggests – as does the available evidence – that the major problem with data sharing is not copyright, but the readiness and capability to make data available to an adequate standard and in a timely manner – see e.g.:





Public Availability of Published Research Data in High-Impact Journals.
PLoS ONE 6(9):

Heather Morrison

We need mechanisms to help researchers get appropriate credit for releasing datasets (e.g. making it count towards tenure and promotion). Obviously, this is not something a license could take care of – rather, what is needed is engaging a variety of scholarly communities to come up with solutions, such as faculty associations, university senates, scholarly societies, research funders and university administrators. Qualitative research could be helpful here (e.g. interviews with tenure and promotion committees). 

As for attribution and the license, one suggestion is that outside of the license per se, there are academic norms that may favor attribution even where formal attribution is not required by the license. For example, reproducing another scholar’s work without attribution is plagiarism, and releasing your work without citing what it is built upon is poor academic practice.

The feasibility of attribution will depend on the matter that is drawn from. A work that is largely built on one, or a few, datasets, can easily manage full attribution, but with broad-scale data mining, this may not be either possible or desirable. However, it should be possible to replicate the work for scholarly verification. This suggests a kind of attribution (different from citing) that may be important for data.

Thanks for bringing this up for consultation.

Heather Morrison

Another thought is that decisions about how their work should be shared really should be made by scholars themselves – perhaps with some help from publishers, librarians, etc. This is their work, after all. This could be a role for scholarly societies. This kind of consultation takes time; but this is a very major change for scholars, and we should take the time to do this. This online consultation is a good start, but only a few people are likely to get involved – people like me for whom this is my scholarly area. The researchers who gather the datasets are almost all way too busy to even notice this.

Usman Iqbal

Thanks for providing the opportunity for everyone to share his or her ideas. As a PhD student, I would like to say that the present overview of highly cited journals highlights three main features of the status of data availability practices in the high-impact scientific literature.

There are heterogeneous instructions to investigators publishing in high-impact journals: some journals require public data availability as a condition for publication, others encourage data sharing but have no binding instructions, and a few have no specific instructions at all. Some papers were not subject to any data availability policy, either because they were published in journals without such policies or because the policies in place did not cover the primary data upon which the research was based. Even when research was published in journals with specific instructions regarding data availability, most publications did not adhere to those instructions.

So we have to focus on opportunities for improvement. Journals should more routinely adopt policies for data sharing, expanding the types of data that are subject to public sharing policies, with the ultimate target of covering all types of data. It is essential to develop mechanisms for journals to ensure that existing data availability policies are consistently followed by researchers and that published research findings are easily reproducible.

Iain Hrynaszkiewicz

Thanks for your comments and emails so far. Please keep them coming. Rather than respond to all comments and emails individually we will be collating them and identifying common themes before providing a summary and responses. However, if you have a specific question about the consultation or would like a more rapid, full reply to your comments please email me at or indicate this in your blog comment(s).

Best regards,

Iain Hrynaszkiewicz
Publisher (Open Science), BioMed Central

Applied Scientist

It seems to me that all the benefits of Open Data publishing accrue to those wishing to utilise the published data; I see no clear benefit to the author. My research is of an applied nature and it concerns me that the Creative Commons CC0 waiver specifically invites others, including commercial organisations, to exploit published works however they choose. Am I correct in my understanding that, for instance, a pharmaceutical company could use any of my published data as the basis for a new drug license application without my consent? Having to sign such a waiver would definitely inhibit me from publishing in a BMC group journal.

Mohammad Abdollahi

I agree with the idea of “Help put the open data and open bibliography”. I believe the growth of the Internet has opened a new era in the field of copyright law. As a matter of fact, full texts are sometimes available on the Internet and can be found in various places, but it is unclear who released the file. Therefore, adding more protection through such copyright laws is just a waste of time. So the idea raised by BMC is fundamentally right. But one thing is important: citation of the original work should not be forgotten at all, should be adhered to by authors, and should be emphasized in all related laws.


This initiative is very welcome for my field, the rare diseases field, where data sharing is mandatory to make quick progress for the benefit of the patients.
