It just got easier to find open access research for textmining/redistribution

The challenges researchers face in obtaining access to published research articles for textmining purposes have been in the news again this month. BioMed Central makes its full open access corpus available online, but in general obtaining permission from publishers can be a major headache.

PubMed Central, the NIH’s archive of almost 2 million freely accessible articles, is a promising resource for textminers. Unfortunately, due to licensing restrictions, only a fraction of those articles are available for redistribution and mining. Nevertheless, the PubMed Central Open Access Subset already contains almost 400,000 articles and is growing rapidly.

A technical tweak at the NCBI has just made the open access subset even more useful. It has long been possible to restrict a PubMed search to include only articles in PubMed Central, by including pubmed pmc[sb] in the search. And PubMed Central’s full article search has provided the ability to limit results to the open access subset only, by adding open access[filter].

Until recently, though, it hasn’t been possible to filter a PubMed search to find open access, redistributable articles. The good news is that NCBI just fixed this omission: you can now restrict a PubMed search to just the articles that are licensed to allow redistribution as part of the open access subset, by adding the following restriction to any PubMed query:

pubmed pmc open access[filter]

This means that the full power of the PubMed search interface (including LinkOut filters, Subsets and MEDLINE annotations) is available to text miners and others wishing to track down open access articles available for reuse.
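For anyone building queries in a script, here is a minimal Python sketch of adding the restriction to an arbitrary query. The function name restrict_to_open_access is made up for illustration, and the sketch assumes PubMed still accepts the filter string exactly as written in this post. It only manipulates strings and does not contact NCBI.

```python
# The restriction described in this post, copied verbatim.
OPEN_ACCESS_FILTER = "pubmed pmc open access[filter]"


def restrict_to_open_access(query: str) -> str:
    """Return `query` limited to the PubMed Central open access subset.

    Hypothetical helper: it only builds the query string.
    """
    query = query.strip()
    if not query:
        # Nothing to combine with, so the filter is the whole query.
        return OPEN_ACCESS_FILTER
    # Parenthesise the original query so that any OR terms inside it
    # are grouped before the AND with the filter is applied.
    return f"({query}) AND {OPEN_ACCESS_FILTER}"


print(restrict_to_open_access("asthma OR copd"))
# prints: (asthma OR copd) AND pubmed pmc open access[filter]
```

Wrapping the original query in parentheses is a defensive choice: it keeps the filter applying to the whole query rather than only to its last term.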

For example, the following query:

Randomized Controlled Trial[ptyp] AND pubmed pmc open access[filter] NOT (loprovbmc[FILTER] OR loprovplos[FILTER])

finds articles classified by MEDLINE as RCTs that are available for redistribution/reuse, from publishers other than BioMed Central and PLoS.
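The same query can in principle be run programmatically. The sketch below assumes you would use NCBI's E-utilities esearch endpoint, which this post does not itself mention, and that the filter behaves there as it does in the web interface. The function name build_esearch_url and the retmax default are illustrative choices. The code only constructs the request URL; fetching it is left to the reader.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (an assumption: the post only
# describes the PubMed web interface).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The example query from this post, split across lines for readability.
EXAMPLE_QUERY = (
    "Randomized Controlled Trial[ptyp] "
    "AND pubmed pmc open access[filter] "
    "NOT (loprovbmc[FILTER] OR loprovplos[FILTER])"
)


def build_esearch_url(term: str, retmax: int = 20) -> str:
    """Build an esearch URL that would return up to `retmax` PubMed IDs."""
    params = {
        "db": "pubmed",     # search PubMed rather than PubMed Central
        "term": term,       # the query, URL-encoded by urlencode
        "retmax": retmax,   # maximum number of IDs to return
        "retmode": "json",  # ask for JSON instead of the default XML
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(build_esearch_url(EXAMPLE_QUERY))
```

Keeping URL construction separate from the network call makes the query easy to inspect before any request is sent to NCBI.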

Further details on copyright and licensing for the PubMed Central Open Access Subset are available on the PubMed Central website. Note that PubMed Central does not allow robotic web downloading of articles, but instead provides bulk download options via OAI/FTP, which can be used to retrieve articles found by querying PubMed using the new filter.
