From taxonomy to cybertaxonomy

Earlier this year, the International Commission on Zoological Nomenclature (ICZN) voted to amend “the Code” that governs how new animal species are named, so that new species names can be published electronically, not just on paper. The change was a long time coming, having first been proposed in 2008 (see our previous blog post for more background), but it was great news for online-only open access journals, which offer the widest possible availability to readers.

But taxonomy is more than just nomenclature, of course, and the philosophy of open access extends to open data too. BioMed Central actively encourages authors to supply the underlying data in a reusable form alongside their research articles, with Dryad being the repository of choice for biodiversity data. (Those interested in open data might like to look at BMC Research Notes’ article collection and collaboration with BioSharing on data sharing, or contribute to BioMed Central’s consultation on open data before it closes in a couple of days’ time. But back to the topic at hand.)

There is a huge literature of taxonomic papers, and by necessity each one includes a slightly different set of information: nomenclature, observational data, DNA sequences and so forth. The challenge is to liberate these data from the traditional publication format and make them easily reusable by other researchers. It was with this aim in mind that Jeremy Miller and colleagues met last year at the Encyclopedia of Life’s Biodiversity Synthesis Center to discuss practical tools and strategies for transferring data from the taxonomic literature to cybertaxonomic repositories. The outcome of their discussions is reported in BMC Biology, and their conclusion is that the way forward lies in using XML markup to semantically tag the data elements within taxonomic papers.

This isn’t the first time that XML markup and semantic tagging have been suggested; indeed, GoldenGATE, the taxonomic XML markup editor the authors recommend, was first described back in 2009 by Agosti and Egloff in BMC Research Notes. What is new, however, is a clear determination to make things happen: rather than once again extolling the benefits of semantic markup, the authors focus on the practical aspects of marking up the literature, both prospectively and retrospectively. As they point out, expecting every individual taxonomist to keep up with the technology seems unreasonable, whereas an approach centred on markup by publishers, which involves far fewer actors, is much more realistic. Of course, the most daunting task is not integrating taxonomic markup into existing workflows but dealing with the huge number of legacy taxonomic publications, and here too there is cause for optimism. Initial calculations suggest that fully 50% of the literature is covered by only 107 journals, a number that makes a concerted effort at retrospective markup seem, if not easy, then at least achievable. Miller and colleagues’ contribution will make a material difference to this effort and to the continuing debate on the future of taxonomy.