First methylated nematode genome and other new datasets in GigaDB


The worm that turned (epigenetics)
GigaDB, GigaScience’s associated database, has just added a number of new datasets, many of them covering data types we have not hosted before. Today sees the publication of new research in our sister BMC journal Genome Biology that shakes up the epigenetics field by overturning the assumption that DNA methylation is absent in nematodes. Because the finding is novel and potentially controversial, the supporting data have been deposited in GigaDB to help others follow up on and reproduce the results. They include our first bisulfite Sanger sequencing and mass spectrometry data. Neither data type has a well-established domain-specific repository, which shows that GigaDB can support areas of research traditionally underserved by data-sharing infrastructure. As with our previous methylome data from the mouse, we have also presented these data in the interoperable ISA-Tab format to maximize their reusability (for an overview of ISA-Tab, see the Nature Genetics paper we contributed to).

DNA methylation is an epigenetic modification whose function and distribution are currently an intense focus of research. C. elegans, science’s most studied nematode, is a bona fide methylation-free zone, a curious exception to the widespread presence of DNA methylation across the tree of life. By extension, conventional thinking has held that DNA methylation is missing from all nematodes. Until now: researchers at our parent organization, the BGI, studying the parasite Trichinella spiralis describe today, for the first time, the presence of both DNA methylation and a DNA methyltransferase in a nematode.

Dr Fei Gao, lead author of the study, explained: “We observed changes in DNA methylation during the transitions between T. spiralis’s three life cycle stages. Interestingly, we also found evidence to suggest that DNA methylation might be controlling parasitism-related genes. Our surprising discovery therefore may open a new avenue for developing therapeutics against T. spiralis infection, through targeting DNA methylation processes.” Co-author Prof Mingyuan Liu (Jilin University) continued: “T. spiralis is one of the most widespread meat-borne parasites. It infects a broad range of animals and in humans causes trichinosis, a serious disease, so there are important implications to anything we can learn about its biology.”

More Data
After the Sorghum genome, this is the second GigaDB-hosted dataset to be published in Genome Biology, and the last month has been particularly successful for our datasets appearing in high-profile journals. At the start of the year we had an RNA-editing dataset cited in Nature Biotechnology, and this month, for the first time, our data have made it into Nature itself, with two studies in the same issue. First, the Pacific oyster genome paper published in Nature cites its supporting data in GigaDB. Second, in the same week, the first metagenomics data in GigaDB were cited in a high-profile study of the microbiomes of type 2 diabetes patients. With additional genomes released in recent weeks, including the Puerto Rican parrot (associated with a Data Note article published in GigaScience last month) and Darwin’s finch (not yet published in any journal), there are now about 40 datasets to explore with the added functionality of our redesigned database, launching in the next few weeks. For a sneak preview of what it will look like, check out Tam’s slides from his talk at the Genome Informatics meeting in Cambridge last month, and watch this space for further announcements.

More Utility
These examples of published and cited datasets are particularly timely now that the Thomson Reuters Data Citation Index has launched (see here), finally providing a mechanism for properly tracking the citation and reuse of data. The ORCID author identifier system, which launched yesterday, now lets authors take credit for datasets by listing their DOIs among their research outputs. Together, these should finally make the ultimate aim of data citation possible: valuing and treating data generated in the course of research in the same way as papers. There has never been a better time to publish your data, so please contact us if you have any questions or are interested in publishing Data Notes or full articles that integrate and make use of your large-scale datasets.

Further Reading
1. Fei Gao, Xiaolei Liu, Xiuping Wu, Xuelin Wang, Desheng Gong, Hanlin Lu, Yudong Xia, Yanxia Song, Junwen Wang, Jing Du, Siyang Liu, Xu Han, Yizhi Tang, Huanming Yang, Qi Jin, Xiuqing Zhang and Mingyuan Liu (2012) Differential DNA methylation in discrete developmental stages of the parasitic nematode Trichinella spiralis. Genome Biology, 13:R100 doi:10.1186/gb-2012-13-10-r100 (17 October 2012)

2. Gao, F; Wang, J; Ji, G (2012) Bisulfite-PCR combined with cloning Sanger sequencing data for validating DNA methylation level in Trichinella spiralis. GigaScience. doi:10.5524/100043

UPDATE 18/10/12: See also the great Genome Biology blog posting and accompanying commentary on the nematode study.