<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>GigaScience blog</title>
	<atom:link href="http://blogs.biomedcentral.com/gigablog/feed/" rel="self" type="application/rss+xml" />
	<link>http://blogs.biomedcentral.com/gigablog</link>
	<description>Just another Biomed Central Blogs site</description>
	<lastBuildDate>Thu, 09 May 2013 14:40:48 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>The difficulties sharing neuroscience data: can data publishing help?</title>
		<link>http://blogs.biomedcentral.com/gigablog/2013/05/09/the-difficulties-sharing-neuroscience-data-can-data-publishing-help/</link>
		<comments>http://blogs.biomedcentral.com/gigablog/2013/05/09/the-difficulties-sharing-neuroscience-data-can-data-publishing-help/#comments</comments>
		<pubDate>Thu, 09 May 2013 14:25:04 +0000</pubDate>
		<dc:creator>Scott Edmunds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[data citation]]></category>
		<category><![CDATA[data publication]]></category>
		<category><![CDATA[DOI]]></category>
		<category><![CDATA[fMRI]]></category>
		<category><![CDATA[gigaDB]]></category>
		<category><![CDATA[imaging]]></category>
		<category><![CDATA[neuroscience]]></category>

		<guid isPermaLink="false">http://blogs.biomedcentral.com/gigablog/?p=574</guid>
		<description><![CDATA[<p>Last week we published our <a href="http://dx.doi.org/10.1186/2047-217X-2-6" title="fMRI datanote" target="_blank">first neuroscience data note</a> containing <a href="http://dx.doi.org/10.5524/100051" title="fMRI data" target="_blank">10GB of fMRI data</a>, hosted in our <a href="http://gigadb.org/" title="GigaDB homepage" target="_blank">GigaDB</a> database and linked to the paper via a DOI. While we have published a number of genomics datasets and data notes (see the Puerto Rican Parrot genome <a href="http://www.gigasciencejournal.com/content/1/1/14" title="Puerto Rican Parrot genome paper" target="_blank">data note</a> and its associated <a href="http://dx.doi.org/10.5524/100039" title="Puerto Rican Parrot genome data" target="_blank">data DOI</a>), this is a nice example of us providing a home for “orphan data”, the long tail of data types without community-agreed, curated repositories. Sharing data enables re-use and the creation of new work, which is why we ...</p><p class="clearfix"><a class="btn alignright continue-reading" href="http://blogs.biomedcentral.com/gigablog/2013/05/09/the-difficulties-sharing-neuroscience-data-can-data-publishing-help/">Read more</a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://blogs.biomedcentral.com/gigablog/files/2013/05/brains_charlie_llewellin.jpg"><img src="http://blogs.biomedcentral.com/gigablog/files/2013/05/brains_charlie_llewellin-226x300.jpg" alt="" width="226" height="300" class="alignleft size-medium wp-image-589" /></a>Last week we published our <a href="http://dx.doi.org/10.1186/2047-217X-2-6" title="fMRI datanote" target="_blank">first neuroscience data note</a> containing <a href="http://dx.doi.org/10.5524/100051" title="fMRI data" target="_blank">10GB of fMRI data</a>, hosted in our <a href="http://gigadb.org/" title="GigaDB homepage" target="_blank">GigaDB</a> database and linked to the paper via a DOI. While we have published a number of genomics datasets and data notes (see the Puerto Rican Parrot genome <a href="http://www.gigasciencejournal.com/content/1/1/14" title="Puerto Rican Parrot genome paper" target="_blank">data note</a> and its associated <a href="http://dx.doi.org/10.5524/100039" title="Puerto Rican Parrot genome data" target="_blank">data DOI</a>), this is a nice example of us providing a home for “orphan data”, the long tail of data types without community-agreed, curated repositories. Sharing data enables re-use and the creation of new work, which is why we built the infrastructure that makes up <em>GigaScience</em> and its integrated data hosting environment. 
This is all the more timely given a number of recent high-profile retractions and fraud cases in areas such as <a href="http://www.nature.com/news/psychology-must-learn-a-lesson-from-fraud-case-1.9513" title="Psychology fraud Nature" target="_blank">psychology</a> and <a href="http://retractionwatch.wordpress.com/2012/09/05/former-harvard-psychology-prof-marc-hauser-committed-misconduct-in-four-nih-grants-ori/" title="Marc Hauser retraction watch" target="_blank">animal cognition</a>, which show a growing need for greater transparency and access to supporting data to close the reproducibility gap.</p>
<p><strong>The challenges of sharing neuroscience data</strong><br />
Genomics has been fertile ground for data sharing because it has a relatively limited number of platforms and resulting data types, and the field is unified by its focus on a particular technology. This makes it very different from a field such as neuroscience, which is split into separate communities, each studying the brain at different levels and with different tools; this has made it much harder to motivate researchers to share data with those outside their own community. Compared to the (relatively) smooth rise of genomics data sharing, led by the Human Genome Project and the community guidelines formulated at the <a href="http://www.sciencemag.org/content/291/5507/1192.full" title="Bermuda Rules" target="_blank">Bermuda</a> and <a href="http://www.genome.gov/10506537" title="NHGRI Fort Lauderdale rules" target="_blank">Fort Lauderdale</a> meetings, efforts to spur similar sharing in neuroscience have been more challenging. Organizations such as the <a href="http://www.incf.org/" title="INCF website" target="_blank">International Neuroinformatics Coordinating Facility</a> (INCF) have been set up to coordinate and encourage neuroimaging data sharing, and have been producing tools and infrastructure to enable it. </p>
<p>One of the many tools <a href="http://software.incf.org/software/the-fmri-data-center" title="fMRI data center INCF page" target="_blank">promoted by the INCF</a> has been the <a href="http://www.nitrc.org/projects/fmridatacenter/" title="fMRI data center" target="_blank">fMRI data center</a> (fMRIDC), a large-scale effort to gather, curate, and openly share fMRI data used in peer-reviewed studies. A pioneer of data sharing in neuroscience, the platform received support from journals such as the <a href="http://www.mitpressjournals.org/loi/jocn" title="JoCN homepage" target="_blank">Journal of Cognitive Neuroscience</a>, but initially <a href="http://www.nature.com/neuro/journal/v3/n9/full/nn0900_845.html" title="Nature Neuroscience editorial" target="_blank">met with skepticism</a> from many in the community due to concerns over scooping and practicalities such as the time and effort required to deposit usefully curated data. Over the following years the resource helped <a href="http://www.sciencedirect.com/science/article/pii/S1053811912011068" title="fMRIDC article" target="_blank">prove its utility</a>, collecting over 100 complete studies and enabling many publications that drew new results or conclusions from reusing these data. Unfortunately, in the current difficult financial climate it did not receive long-term funding, and it is currently not accepting new submissions. </p>
<p><strong>Promising moves on the horizon</strong><br />
Despite this setback, the outlook in this area is now more optimistic: the community appears more receptive to sharing data, and work is underway to overcome many of the roadblocks that have been holding things back (see <a href="http://dx.doi.org/10.1186/2047-217X-1-9" title="GIgaScience Neuroimaging commentary" target="_blank">our commentary</a> on this subject). There are a number of promising moves from brain atlas projects (e.g. <a href="http://www.brain-map.org/" title="Allen Brain Atlas" target="_blank">the Allen Brain Atlas</a>) and connectome projects (e.g. the <a href="http://humanconnectome.org/data/" title="Human Connectome Data" target="_blank">Human Connectome Project</a> and <a href="http://www.cmrr.umn.edu/multiband/" title="CMRR data" target="_blank">CMRR</a>) to provide huge public data resources. In the functional imaging space, the <a href="http://openfmri.org/" title="openfMRI" target="_blank">openfMRI project</a> also provides a home for raw fMRI data, and has developed a nice platform and <a href="https://openfmri.org/content/data-organization" title="openfMRI data organization page" target="_blank">data organisation standards</a>.</p>
<p>The <a href="http://adni.loni.ucla.edu/" title="ADNI homepage" target="_blank">Alzheimer’s Disease Neuroimaging Initiative</a> (ADNI) was launched in 2003 to speed up drug development by validating imaging and biomarker data for Alzheimer’s disease clinical treatment trials, and set a new standard in neuroscience for sharing data without embargo. Although the data are de-identified, the consortium only grants access to approved members of the “scientific community”, and its rules on data reuse and credit are stricter than the genomics community is used to. In contrast, we release all of our data under a <a href="http://creativecommons.org/choose/zero/" title="CC0 homepage" target="_blank">CC0 waiver</a> to maximise its potential re-use, while giving all our hosted datasets a citable <a href="http://datacite.org/" title="DataCite homepage" target="_blank">DataCite</a> DOI to enable attribution and credit to the producers. The broader imaging community has shown that mechanisms for sharing data exist and that there is a large potential user base for them, with the <a href="http://www.openmicroscopy.org" title="OME homepage" target="_blank">Open Microscopy Environment</a> (OME) community producing a suite of open visualization and conversion tools and data standards. OME’s open imaging platform <a href="http://www.openmicroscopy.org/site/products/omero" title="OMERO website" target="_blank">OMERO</a> (developed in part by our editorial board member Jason Swedlow) has a growing user base of labs organizing and presenting their data, and has worked with journals and societies to produce products such as the <a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2958474/" title="JCB image viewer paper" target="_blank">Journal of Cell Biology image viewer</a> and the American Society for Cell Biology’s “<a href="http://cellimages.ascb.org/" title="Cell Image Library" target="_blank">The Cell: an Image Library</a>”. 
In the longer term the future is looking bright: the European <a href="http://www.eurobioimaging.eu/" title="Euro Bioimaging" target="_blank">Euro-Bioimaging initiative</a> aims to build large-scale public biological and biomedical imaging infrastructure by 2017.</p>
<p><strong>Data Publishing to the rescue?</strong><br />
Complementing these forthcoming subject-specific resources, the rise of unstructured repositories such as <a href="http://datadryad.org/" title="Dryad homepage" target="_blank">Dryad</a> and <a href="http://figshare.com/" title="Figshare" target="_blank">Figshare</a> gives additional options and makes data sharing even easier. Structuring the data and storing the metadata needed to use it are essential for reuse, and beyond providing the storage infrastructure, the time and effort data producers spend on this work needs to be credited. Data publishing provides an incentive for data producers to make this effort, crediting the early release of data with a citable publication while still allowing the downstream analyses to be published later.</p>
<p><a href="http://blogs.biomedcentral.com/gigablog/files/2013/05/group_3d_ffl_complex1.png"><img src="http://blogs.biomedcentral.com/gigablog/files/2013/05/group_3d_ffl_complex1-150x150.png" alt="" width="150" height="150" class="alignright size-thumbnail wp-image-586" /></a>Our <a href="http://dx.doi.org/10.1186/2047-217X-2-6" title="Data note link" target="_blank">recently published data note</a> by Chris Gorgolewski and colleagues is a great example of this, providing test-retest functional MRI <a href="http://dx.doi.org/10.5524/100051" title="Data for download" target="_blank">data</a> for motor, language and spatial attention functions from 10 de-identified patients. This allows validation of fMRI tasks used in pre-surgical planning for tumor resection, and provides a valuable resource for the development of new methods and algorithms. In addition to the <em>GigaScience</em> data note, the authors have already published an <a href="http://dx.doi.org/10.1016/j.neuroimage.2012.10.085" title="fMRI analysis paper" target="_blank">analysis</a> using these data, demonstrating that this model works. Overcoming some of the previous concerns about depositing this type of data, all patient details have been de-identified, and the data are organised following the <a href="http://openfmri.org/" title="openfMRI homepage" target="_blank">openfMRI</a> structure. These authors are keen proponents of data sharing and have <a href="http://multiplecomparisons.blogspot.hk/2013/02/making-data-sharing-count.html" title="Making data sharing count blog" target="_blank">talked about the benefits</a> of this type of data publishing in the past, so it is great to see them use <em><a href="http://www.gigasciencejournal.com/" title="GigaScience homepage" target="_blank">GigaScience</a></em> as a vehicle for this.</p>
<p>Data publishing is currently very topical: on top of the many other recent schemes we have <a href="http://blogs.biomedcentral.com/gigablog/2012/01/23/data-citation-enters-the-year-of-the-dragon/" title="Data citation in the year of the dragon blog" target="_blank">written about</a> in the past, Nature Publishing Group has recently <a href="http://blogs.nature.com/scientificdata/2013/04/03/press-release-npg-to-launch-scientific-data-to-help-scientists-publish-and-reuse-research-data/" title="Scientific Data press release" target="_blank">announced</a> its own foray into the field with the upcoming launch of <em><a href="http://blogs.nature.com/scientificdata/" title="Scientific Data Homepage" target="_blank">Scientific Data</a></em>. Due to launch in spring 2014, its “data descriptors” are very similar to our data note articles, and also use the interoperable <a href="http://isatab.sourceforge.net/" title="ISA-tab homepage" target="_blank">ISA-TAB</a> metadata format that a number of our submitters have used. It is flattering that, two years after we started publishing data in this way, Nature will be following our lead; this encourages and validates our approach, and spurs us to take on more data. We would encourage data producers to take advantage of the fact that all article and data <a href="http://www.biomedcentral.com/about/apcfaq" title="APC FAQ" target="_blank">processing charges</a> are currently covered by <a href="http://www.genomics.cn/en/index" title="BGI homepage" target="_blank">BGI</a> until the end of the year, and to contact us if they are interested in publishing data notes with us, or to submit <a href="http://www.gigasciencejournal.com/manuscript" title="Submission system" target="_blank">here</a>. For more on our data notes, see <a href="http://www.gigasciencejournal.com/authors/instructions/datanote" title="Data note I4A" target="_blank">here</a>.</p>
<p>Image credits: Charlie Llewellin, <a href="http://www.flickr.com/photos/76913520@N00/6309457974" title="image link" target="_blank">Flickr</a>; Gorgolewski et al.</p>
<p><strong>Further Reading</strong><br />
<a href="http://dx.doi.org/10.1186/2047-217X-2-6" title="Data note link" target="_blank">1.</a> Gorgolewski KJ, Storkey A, Bastin ME, Whittle IR, Wardlaw JM, Pernet CR. A test-retest fMRI dataset for motor, language and spatial attention functions. <em>Gigascience</em>. 2013 <strong>2</strong>:6. <a href="http://dx.doi.org/10.1186/2047-217X-2-6" title="Data note link" target="_blank">http://dx.doi.org/10.1186/2047-217X-2-6</a></p>
<p><a href="http://dx.doi.org/10.5524/100051" title="fMRI data" target="_blank">2.</a> Gorgolewski KJ, Storkey A, Bastin ME, Whittle IR, Wardlaw JM, Pernet CR. A test-retest functional MRI dataset for motor, language and spatial attention functions. <em>GigaScience Database</em>. 2013. <a href="http://dx.doi.org/10.5524/100051" title="fMRI data" target="_blank">http://dx.doi.org/10.5524/100051</a></p>
<p><a href="http://dx.doi.org/10.1016/j.neuroimage.2012.10.085" title="fMRI analysis paper" target="_blank">3.</a> Gorgolewski KJ, Storkey AJ, Bastin ME, Whittle I, Pernet C. Single subject fMRI test-retest reliability metrics and confounding factors. <em>Neuroimage</em>. 2013 <strong>69</strong>:231-43. <a href="http://dx.doi.org/10.1016/j.neuroimage.2012.10.085" title="fMRI analysis paper" target="_blank">http://dx.doi.org/10.1016/j.neuroimage.2012.10.085</a></p>
<p><a href="http://dx.doi.org/10.1186/2047-217X-1-9" title="GIgaScience Neuroimaging commentary" target="_blank">4.</a> Breeze JL, Poline JB, Kennedy DN. Data sharing and publishing in the field of neuroimaging. <em>Gigascience</em>. 2012 <strong>1</strong>:9. <a href="http://dx.doi.org/10.1186/2047-217X-1-9" title="GIgaScience Neuroimaging commentary" target="_blank">http://dx.doi.org/10.1186/2047-217X-1-9</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.biomedcentral.com/gigablog/2013/05/09/the-difficulties-sharing-neuroscience-data-can-data-publishing-help/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	<enclosure type="image/png" length="127603" url="http://blogs.biomedcentral.com/gigablog/files/2013/05/group_3d_ffl_complex-150x150.png" />	</item>
		<item>
		<title>Giga-Galaxy moves into the metabolomics universe</title>
		<link>http://blogs.biomedcentral.com/gigablog/2013/05/03/giga-galaxy-moves-into-the-metabolomics-universe/</link>
		<comments>http://blogs.biomedcentral.com/gigablog/2013/05/03/giga-galaxy-moves-into-the-metabolomics-universe/#comments</comments>
		<pubDate>Fri, 03 May 2013 01:32:59 +0000</pubDate>
		<dc:creator>Peter Li</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[workflows galaxy metabolomics]]></category>

		<guid isPermaLink="false">http://blogs.biomedcentral.com/gigablog/?p=545</guid>
		<description><![CDATA[
<p>Sophisticated computational analyses must be performed on metabolomics data in order to measure the abundances of the metabolites. However, this typically requires expert knowledge in computer programming and biostatistics, restricting the usefulness of metabolomics to specialised laboratories. Thanks to funding from the UK’s <a title="NERC" href="http://www.nerc.ac.uk" target="_blank">Natural Environment Research Council</a>, this project will develop a software platform based on <a title="Galaxy project" href="http://galaxyproject.org" target="_blank">Galaxy</a> to make it much easier for non-specialist scientists to analyse their metabolomics datasets.</p>
<p>As the first metabolomics project in the recently announced <a href="http://www.genomics.cn/en/news/show_news?nid=99162" target="_blank">Joint BGI-University of Birmingham Environment &#38; Health Centre</a>, the funding will enable Rob Davidson, a post-doctoral researcher from <a href="http://www.biosciences-labs.bham.ac.uk/viant" target="_blank">Mark Viant’s research group</a> at the University’s School of Biosciences, to travel to Hong ...</p><p class="clearfix"><a class="btn alignright continue-reading" href="http://blogs.biomedcentral.com/gigablog/2013/05/03/giga-galaxy-moves-into-the-metabolomics-universe/">Read more</a>]]></description>
			<content:encoded><![CDATA[
<p>Sophisticated computational analyses must be performed on metabolomics data in order to measure the abundances of the metabolites. However, this typically requires expert knowledge in computer programming and biostatistics, restricting the usefulness of metabolomics to specialised laboratories. Thanks to funding from the UK’s <a title="NERC" href="http://www.nerc.ac.uk" target="_blank">Natural Environment Research Council</a>, this project will develop a software platform based on <a title="Galaxy project" href="http://galaxyproject.org" target="_blank">Galaxy</a> to make it much easier for non-specialist scientists to analyse their metabolomics datasets.</p>
<p>As the first metabolomics project in the recently announced <a href="http://www.genomics.cn/en/news/show_news?nid=99162" target="_blank">Joint BGI-University of Birmingham Environment &amp; Health Centre</a>, the funding will enable Rob Davidson, a post-doctoral researcher from <a href="http://www.biosciences-labs.bham.ac.uk/viant" target="_blank">Mark Viant’s research group</a> at the University’s School of Biosciences, to travel to Hong Kong and work with <em>GigaScience</em> in developing the popular Galaxy workflow system for use in metabolomics data analyses.</p>
<p><a href="http://blogs.biomedcentral.com/gigablog/files/2013/05/400px-Chicken_feet_The_Hague-200x3002.jpg"><img class="alignright size-thumbnail wp-image-568" src="http://blogs.biomedcentral.com/gigablog/files/2013/05/400px-Chicken_feet_The_Hague-200x3002-150x150.jpg" alt="" width="150" height="150" /></a>Rob arrived this week and will be working with us at BGI-Hong Kong for two months. Since Rob is a keen foodie, <em>GigaScience</em> is also looking forward to showing him the culinary delights of Hong Kong, such as chicken feet, dim sum stylee!</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.biomedcentral.com/gigablog/2013/05/03/giga-galaxy-moves-into-the-metabolomics-universe/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	<enclosure type="image/jpeg" length="50882" url="http://blogs.biomedcentral.com/gigablog/files/2013/05/400px-Chicken_feet_The_Hague-100x150.jpg" />	</item>
		<item>
		<title>Genetics and open access publishing meets Chilli Crab</title>
		<link>http://blogs.biomedcentral.com/gigablog/2013/04/17/genetics-and-open-access-publishing-meets-chilli-crab/</link>
		<comments>http://blogs.biomedcentral.com/gigablog/2013/04/17/genetics-and-open-access-publishing-meets-chilli-crab/#comments</comments>
		<pubDate>Wed, 17 Apr 2013 07:15:53 +0000</pubDate>
		<dc:creator>Scott Edmunds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[BGI]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[genetics]]></category>
		<category><![CDATA[genomics]]></category>
		<category><![CDATA[HGM-ICG]]></category>
		<category><![CDATA[Open Access]]></category>

		<guid isPermaLink="false">http://blogs.biomedcentral.com/gigablog/?p=525</guid>
		<description><![CDATA[<p>Singapore, a wonderful city of diverse cultures and culinary delights, is not just about the famous chilli crab or its signature cocktail, the Singapore Sling: from 13-19 April the vibrant country plays host to one of the largest meetings in genetics and genomics. The <a href="http://www.hgm2013-icg.org/" title="HGM-ICG meeting homepage" target="_blank">2013 Human Genome Meeting/International Congress of Genetics</a> (HGM/ICG) is the first of its kind in Asia, combining a genetics congress that has been held every five years since 1899 with the young upstart genomics experts from HUGO (the <a href="http://hugo-international.org/index.php" title="HUGO homepage" target="_blank">Human Genome Organisation</a>). The 2013 meeting brought together over 1000 scientists from around the world at the Marina Bay Sands Convention Centre (a favourite ...</p><p class="clearfix"><a class="btn alignright continue-reading" href="http://blogs.biomedcentral.com/gigablog/2013/04/17/genetics-and-open-access-publishing-meets-chilli-crab/">Read more</a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://blogs.biomedcentral.com/gigablog/files/2013/04/photo2.jpg"><img src="http://blogs.biomedcentral.com/gigablog/files/2013/04/photo2-300x225.jpg" alt="" width="300" height="225" class="alignleft size-medium wp-image-531" /></a>Singapore, a wonderful city of diverse cultures and culinary delights, is not just about the famous chilli crab or its signature cocktail, the Singapore Sling: from 13-19 April the vibrant country plays host to one of the largest meetings in genetics and genomics. The <a href="http://www.hgm2013-icg.org/" title="HGM-ICG meeting homepage" target="_blank">2013 Human Genome Meeting/International Congress of Genetics</a> (HGM/ICG) is the first of its kind in Asia, combining a genetics congress that has been held every five years since 1899 with the young upstart genomics experts from HUGO (the <a href="http://hugo-international.org/index.php" title="HUGO homepage" target="_blank">Human Genome Organisation</a>). The 2013 meeting brought together over 1000 scientists from around the world at the Marina Bay Sands Convention Centre (a favourite location of ours since <a href="http://blogs.biomedcentral.com/gigablog/2012/06/22/the-genomics-view-from-the-57th-floor/" title="GigaBlog write-up on Bio-IT world Asia" target="_blank">last year&#8217;s Bio-IT World Asia meeting</a>).</p>
<p><a href="http://blogs.biomedcentral.com/gigablog/files/2013/04/EwanBirneyHGM1.jpg"><img src="http://blogs.biomedcentral.com/gigablog/files/2013/04/EwanBirneyHGM1-150x150.jpg" alt="" width="150" height="150" class="alignright size-thumbnail wp-image-538" /></a><em>GigaScience</em>&#8217;s Executive Editor, Scott Edmunds, and Commissioning Editor, Nicole Nogoy, attended the meeting to learn the latest in the genetics and genomics world. Although it had a heavy human focus, the meeting covered a wide range of topics, including plant, animal and medical applications, as well as databases and privacy. The fantastic line-up spanned big science (workshops and talks from ENCODE and Ewan Birney, pictured) to monogenic disorders; many members of our editorial board were present, and our BGI colleagues also <a href="http://www.hgm2013-icg.org/special_feature.html?utm_term=More%20details#bgi" title="BGI workshop at HGM-ICG" target="_blank">hosted a workshop</a>. </p>
<p>A few highlights this week included a presentation by Patrick Tan (Genome Institute of Singapore) on a study of Aristolochia plants. These plants produce aristolochic acid (AA), which is found in many Chinese medicines and herbal supplements commonly used to treat ailments ranging from arthritis to menstrual cramps. AA is highly mutagenic and has been shown to cause mutations leading to kidney and urinary tract cancers. The growing burden of AA-related urinary tract cancers in China and Taiwan, as well as outbreaks in the USA and Europe, sparked Tan&#8217;s interest in studying its potential role in other cancer types, and it was nice to see him re-analyse data produced by the Asian Cancer Research Group (ACRG) and <a href="http://dx.doi.org/10.5524/100034" title="Cancer data" target="_blank">hosted in our GigaDB database</a>.</p>
<p>Dennis Lo from the Chinese University of Hong Kong presented his pioneering work in the field of non-invasive prenatal diagnosis of trisomies 21, 13 and 18. His latest work focuses on twin pregnancies, which present a new challenge because the twins may be monozygotic or dizygotic. His team applied a targeted sequencing approach and found differences in allele expression between monozygotic and dizygotic twins, which is useful for determining zygosity and for non-invasive testing for genetic disorders.</p>
<p>Michael Hayden of the Translational Medicine Lab at A*STAR gave an interesting presentation on how genetics is helping drug discovery. His group is particularly interested in &#8216;<a href="http://en.wikipedia.org/wiki/Black_swan_theory" title="Black Swan Events wikipedia" target="_blank">black swan</a>&#8217; events: events that are completely unexpected and have extreme impact. Hayden emphasised that we no longer have a sequencing problem, but a phenotype problem. He gave several nice examples of how target validation tightly linked to a phenotype has led to drug discovery and clinical translation and treatment. One &#8216;black swan&#8217; example came from his work on sclerosteosis, a condition of increased bone density that is the opposite phenotype to osteoporosis; their analyses led to the creation of a humanized monoclonal antibody that inhibits sclerostin as a treatment for osteoporosis.</p>
<p>Mike Snyder (Stanford) gave an update on his &#8220;Snyderome&#8221; personal genome work, iPOP (integrated personal omics profiling), and highlighted their genome phasing approach. Of particular interest to fans of crowdsourcing and community annotation approaches like us (see our <a href="http://www.gigasciencejournal.com/content/2/1/2" title="OpenAshDB paper" target="_blank">OpenAshDB</a> and “<a href="http://www.gigasciencejournal.com/content/1/1/14" title="Puerto Rican Parrot genome paper" target="_blank">people&#8217;s parrot</a>” papers), Snyder has made 50 TB of his own data, consisting of genomic (blood, saliva and maternal), transcriptome and protein array, as well as proteomic and metabolomic data, available from his website: <a href="http://snyderome.stanford.edu/" title="Snyderome-home" target="_blank">snyderome.stanford.edu</a>. Peer Bork also presented his <a href="http://microbes.eu/" title="my.microbes homepage" target="_blank">my.microbes</a> project to catalog, share and compare microbiomes between interested parties. His concept of people with specific microbiota enterotypes using social networks to discuss which brands of yoghurt are best for stomach upsets shows how important genomic advances are increasingly becoming part of our everyday lives.</p>
<p>Nicole Nogoy, Commissioning Editor, <em>GigaScience</em></p>
<p><strong>Promoting open access publishing in Singapore</strong><br />
Tomorrow morning, <em>GigaScience</em> will be participating in an Open Access workshop at A*STAR&#8217;s Fusionopolis, kindly organised by Kostas Repanas to promote open access publishing and data sharing in Singapore. In addition to <em>GigaScience</em>, there will be presentations from <a href="http://www.biomedcentral.com/" title="BMC homepage" target="_blank">BioMed Central</a>, <a href="http://www.springeropen.com/" title="SpringerOpen" target="_blank">SpringerOpen</a>, and <a href="http://www.wileyopenaccess.com/view/index.html" title="Wiley Open Access" target="_blank">Wiley Open Access</a>. If you are at A*STAR or in Singapore, please see the program below and drop by:</p>
<p>Date: Thursday 18th April<br />
Time: 9:00-12:00<br />
Venue: Infuse, Level 14, Connexis South Tower, Fusionopolis<br />
1 Fusionopolis Way</p>
<p><strong>A*STAR Open Access (OA) Workshop</strong></p>
<p>9:00-9:20         Coffee/Tea</p>
<p>9:20-9:30         Opening remarks by Sir David Lane, A*STAR Chief Scientist</p>
<p>9:30-10:00       OA, open-data, open-source and open-review: GigaScience, and how licensing can change the way we do research</p>
<p>Dr Scott Edmunds, Executive Editor, <em>GigaScience</em> journal<br />
Dr Nicole Nogoy, Commissioning Editor, <em>GigaScience</em> journal</p>
<p>10:00-10:30     OA and the future of publishing</p>
<p>Mr Robert Long, Vice President and Sales Director, Asia Pacific, Wiley<br />
Mr Anthony Lau, Vice President, International Development, Asia, Wiley</p>
<p>10:30-10:50     Tea break</p>
<p>10:50-11:10     OA publishing at BioMed Central: a pioneer’s perspective</p>
<p>Dr Nandita Quaderi, Publisher, BioMed Central<br />
Dr Rebecca Furlong, Editor, <em>Genome Medicine</em></p>
<p>11:10-11:30     OA repositories and Self-Archiving / SpringerOpen Journals</p>
<p>Mr Leo Cheung, Open Access Publishing Manager, BioMed Central<br />
Mr Bin Walters, SpringerOpen Journal Manager, Springer Asia</p>
<p>11:30-12:00     Open discussion/debate </p>
<p>Update 20/04/13: here are the slides from our talk:</p>
<div style="margin-bottom:5px"> <strong> <a href="http://www.slideshare.net/GigaScience/scott-edmunds-astar-open-access-workshop-how-licensing-can-change-the-way-we-do-research" title="Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research" target="_blank">Scott Edmunds A*STAR open access workshop: how licensing can change the way we do research</a> </strong> from <strong><a href="http://www.slideshare.net/GigaScience" target="_blank">GigaScience, BGI Hong Kong</a></strong> </div>
]]></content:encoded>
			<wfw:commentRss>http://blogs.biomedcentral.com/gigablog/2013/04/17/genetics-and-open-access-publishing-meets-chilli-crab/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	<enclosure type="image/jpeg" length="422027" url="http://blogs.biomedcentral.com/gigablog/files/2013/04/photo3-e1366182420516.jpg" />	</item>
		<item>
		<title>Call for papers for a special GCC2013 Galaxy series</title>
		<link>http://blogs.biomedcentral.com/gigablog/2013/04/05/call-for-papers-for-a-special-gcc2013-galaxy-series/</link>
		<comments>http://blogs.biomedcentral.com/gigablog/2013/04/05/call-for-papers-for-a-special-gcc2013-galaxy-series/#comments</comments>
		<pubDate>Fri, 05 Apr 2013 08:15:59 +0000</pubDate>
		<dc:creator>Scott Edmunds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[#usegalaxy]]></category>
		<category><![CDATA[APC]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[galaxy]]></category>
		<category><![CDATA[Giga-Galaxy]]></category>
		<category><![CDATA[papers]]></category>
		<category><![CDATA[workflows]]></category>

		<guid isPermaLink="false">http://blogs.biomedcentral.com/gigablog/?p=499</guid>
		<description><![CDATA[<p><br />
The <a href="http://wiki.galaxyproject.org/Events/GCC2013" title="GCC2013" target="_blank">2013 Galaxy Community Conference</a> (GCC2013) and <em><a href="http://www.gigasciencejournal.com/" title="GigaScience homepage" target="_blank">GigaScience</a></em> are <a href="http://wiki.galaxyproject.org/News/GigaScienceGalaxyCFP" title="GCC2013 CfP" target="_blank">today announcing a call for papers</a> for a special thematic series on studies utilizing large-scale datasets and workflows. <a href="http://usegalaxy.org/" title="Galaxy Homepage" target="_blank">Galaxy</a> is an open, web-based platform for data-intensive biomedical research that allows its growing community of users to reproduce and share analyses. <a href="http://www.gigasciencejournal.com/" title="GigaScience homepage" target="_blank"><em>GigaScience</em></a>, which aims to increase the reproducibility and transparency of research, has just launched its own <a href="http://galaxy.cbiit.cuhk.edu.hk/" title="Giga-Galaxy" target="_blank">Giga-Galaxy server</a>, enabling the hosting and implementation of Galaxy-based workflows and methods. For examples of how this works, see the <a href="http://www.slideshare.net/GigaScience/scott-edmunds-flashtalk-slides-from-beyond-the-pdf2" title="GigaSlides, #BtPDF2" target="_blank">slides</a> from our flashtalk at the ...</p><p class="clearfix"><a class="btn alignright continue-reading" href="http://blogs.biomedcentral.com/gigablog/2013/04/05/call-for-papers-for-a-special-gcc2013-galaxy-series/">Read more</a>]]></description>
			<content:encoded><![CDATA[<p><img alt="" src="http://wiki.galaxyproject.org/Images/Logos?action=AttachFile&amp;do=get&amp;target=GCC2013Logo400.png" class="aligncenter" width="400" height="267" /><br />
The <a href="http://wiki.galaxyproject.org/Events/GCC2013" title="GCC2013" target="_blank">2013 Galaxy Community Conference</a> (GCC2013) and <em><a href="http://www.gigasciencejournal.com/" title="GigaScience homepage" target="_blank">GigaScience</a></em> are <a href="http://wiki.galaxyproject.org/News/GigaScienceGalaxyCFP" title="GCC2013 CfP" target="_blank">today announcing a call for papers</a> for a special thematic series on studies utilizing large-scale datasets and workflows. <a href="http://usegalaxy.org/" title="Galaxy Homepage" target="_blank">Galaxy</a> is an open, web-based platform for data-intensive biomedical research that allows its growing community of users to reproduce and share analyses. <a href="http://www.gigasciencejournal.com/" title="GigaScience homepage" target="_blank"><em>GigaScience</em></a>, which aims to increase the reproducibility and transparency of research, has just launched its own <a href="http://galaxy.cbiit.cuhk.edu.hk/" title="Giga-Galaxy" target="_blank">Giga-Galaxy server</a>, enabling the hosting and implementation of Galaxy-based workflows and methods. For examples of how this works, see the <a href="http://www.slideshare.net/GigaScience/scott-edmunds-flashtalk-slides-from-beyond-the-pdf2" title="GigaSlides, #BtPDF2" target="_blank">slides</a> from our flashtalk at the recent <a href="http://www.force11.org/beyondthepdf2" title="#BtPDF2 homepage" target="_blank">Beyond the PDF2</a> meeting, and <a href="http://galaxy.cbiit.cuhk.edu.hk/galaxy/u/peter/p/soapdenovo2-tutorial-1" title="SOAPdenovo2 tutorial" target="_blank">examples of workflows</a> implemented from our recent <a href="http://www.gigasciencejournal.com/content/1/1/18" title="SOAPdenovo2 paper" target="_blank">SOAPdenovo2 paper</a>.  </p>
<p>All accepted oral presentations from the meeting will be eligible for consideration in the <em>GigaScience</em> series and, working with the <a href="http://wiki.galaxyproject.org/Events/GCC2013/Organizers#Scientific_Committee" title="GCC2013 scientific committee" target="_blank">GCC2013 scientific committee</a>, we will ensure that peer review is coordinated, thorough and timely. <a href="http://www.genomics.cn/en/index" title="BGI homepage" target="_blank">BGI</a> has been generously covering the open-access <a href="http://www.biomedcentral.com/about/apcfaq" title="APC FAQ" target="_blank">article-processing charges</a> for the journal’s launch, and this offer will be extended to all submissions from the 2013 conference, which is being held from 30th June to 2nd July in Oslo.</p>
<p><a href="http://blogs.biomedcentral.com/gigablog/files/2013/04/bgi-galaxy-transparent-2.png"><img src="http://blogs.biomedcentral.com/gigablog/files/2013/04/bgi-galaxy-transparent-2-300x257.png" alt="" width="150" height="128" class="alignright size-medium wp-image-502" /></a>Covering the themes of the conference, we will consider discussion and research highlighting best practice for <a href="http://wiki.galaxyproject.org/Admin/Get%20Galaxy" title="Get Galaxy page" target="_blank">local Galaxy installation</a>, management and use, as well as interesting tools, data sources, or novel uses of Galaxy. Addressing Galaxy’s goals of enabling more accessible, reproducible, and transparent genomic science, submissions can utilize our novel format, in which all of the workflows, tools and supporting data can be hosted and integrated into accepted papers using independently citable <a href="http://datacite.org/" title="DataCite homepage" target="_blank">DataCite</a> <a href="http://en.wikipedia.org/wiki/Digital_object_identifier" title="Digital Object Identifier" target="_blank">DOIs</a> from the journal’s <a href="http://galaxy.cbiit.cuhk.edu.hk/" title="Giga-Galaxy" target="_blank">Giga-Galaxy</a> server and <a href="http://gigadb.org/" title="GigaDB homepage" target="_blank">GigaDB</a> database. For more on <em>GigaScience</em>’s efforts to promote reproducibility see the <a href="http://blogs.biomedcentral.com/gigablog/2012/12/27/opening-peer-review-our-new-paper-on-soapdenovo2-shows-how-it-works/" title="GigaBlog on SOAPdenovo2 review &amp; transparency" target="_blank">recent blog post</a> on how this worked for some of our recent publications.</p>
<p>Please contact the <a href="mailto:gcc2013-sci@galaxyproject.org" title="gcc2013-sci@galaxyproject.org" target="_blank">conference organizers</a> or <a href="mailto:editorial@gigasciencejournal.com" title="editorial@gigasciencejournal.com" target="_blank"><em>GigaScience</em> editors</a> for further information, or <a href="http://www.gigasciencejournal.com/manuscript" title="Submit to GigaScience" target="_blank">submit a manuscript</a> or <a href="http://bit.ly/gcc2013abs" title="conference abstract submission" target="_blank">conference abstract</a>, mentioning that you would like to be considered for the series. The deadline for abstracts to be considered for oral presentation at the meeting is 12th April, but later submissions for exceptional poster presentations (deadline 3rd May) may also be considered. This series will remain open in a similar manner to our <a href="http://www.gigasciencejournal.com/series/GSC_and_beyond" title="GSC and Beyond series" target="_blank">Genomic Standards Consortium and beyond: best practice in genomics research</a> series, so related work utilizing Galaxy can continue to be added to the <a href="http://www.gigasciencejournal.com/series" title="GigaScience series pages" target="_blank">virtual issue</a>. Follow <a href="https://twitter.com/gigascience" title="@gigascience" target="_blank">@gigascience</a> and <a href="https://twitter.com/search?q=%23usegalaxy" title="#usegalaxy" target="_blank">#usegalaxy</a> on Twitter, and the <a href="http://wiki.galaxyproject.org/News" title="Galaxy news page" target="_blank">Galaxy news page</a> for further updates and news.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.biomedcentral.com/gigablog/2013/04/05/call-for-papers-for-a-special-gcc2013-galaxy-series/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	<enclosure type="image/png" length="37369" url="http://blogs.biomedcentral.com/gigablog/files/2013/04/bgi-galaxy-transparent-2-150x128.png" />	</item>
		<item>
		<title>Q&amp;A with Xin Zhou, author of our insect “squishome” paper.</title>
		<link>http://blogs.biomedcentral.com/gigablog/2013/04/02/qa-with-xin-zhou-author-of-our-insect-squishome-paper/</link>
		<comments>http://blogs.biomedcentral.com/gigablog/2013/04/02/qa-with-xin-zhou-author-of-our-insect-squishome-paper/#comments</comments>
		<pubDate>Tue, 02 Apr 2013 09:45:41 +0000</pubDate>
		<dc:creator>Scott Edmunds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[biodiversity]]></category>
		<category><![CDATA[genomics]]></category>
		<category><![CDATA[GigaScience]]></category>
		<category><![CDATA[Insect]]></category>
		<category><![CDATA[pipelines]]></category>

		<guid isPermaLink="false">http://blogs.biomedcentral.com/gigablog/?p=479</guid>
		<description><![CDATA[<p><a href="http://blogs.biomedcentral.com/gigablog/files/2013/04/396093_122813224529754_274194545_n.jpg"></a>Following from our <a href="http://blogs.biomedcentral.com/gigablog/2013/03/27/new-in-gigascience-the-squishome/" title="Squishome GigaBlog posting" target="_blank">previous blog posting</a>, here we profile and interview Dr Xin Zhou, lead author of our recent “squishome” insect goo metabarcoding <a href="http://dx.doi.org/10.1186/2047-217X-2-4" title="Insect goo paper" target="_blank">paper</a>. This NGS (next generation sequencing)-based work has already generated a lot of interest (see <a href="http://www.wired.co.uk/news/archive/2013-03/27/insect-dna" title="Wired squishome write-up" target="_blank">this write-up</a> in Wired and this <a href="http://dna-barcoding.blogspot.hk/2013/03/ultra-deep-sequencing.html?" title="DNA metabarcoding blog" target="_blank">blog posting</a> for nice examples), and here Dr Zhou gives more insight into the potential for the technique in studying biodiversity, as well as some of the quirky findings his team made validating the technique behind their laboratory in China. </p>
<p>Dr Xin Zhou is Director of the Environmental Genomics research group at <a href="http://www.genomics.cn/en/index" title="BGI ...</p><p class="clearfix"><a class="btn alignright continue-reading" href="http://blogs.biomedcentral.com/gigablog/2013/04/02/qa-with-xin-zhou-author-of-our-insect-squishome-paper/">Read more</a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://blogs.biomedcentral.com/gigablog/files/2013/04/396093_122813224529754_274194545_n.jpg"><img src="http://blogs.biomedcentral.com/gigablog/files/2013/04/396093_122813224529754_274194545_n-300x225.jpg" alt="" width="300" height="225" class="alignleft size-medium wp-image-484" /></a>Following on from our <a href="http://blogs.biomedcentral.com/gigablog/2013/03/27/new-in-gigascience-the-squishome/" title="Squishome GigaBlog posting" target="_blank">previous blog posting</a>, here we profile and interview Dr Xin Zhou, lead author of our recent “squishome” insect goo metabarcoding <a href="http://dx.doi.org/10.1186/2047-217X-2-4" title="Insect goo paper" target="_blank">paper</a>. This NGS (next-generation sequencing)-based work has already generated a lot of interest (see <a href="http://www.wired.co.uk/news/archive/2013-03/27/insect-dna" title="Wired squishome write-up" target="_blank">this write-up</a> in Wired and this <a href="http://dna-barcoding.blogspot.hk/2013/03/ultra-deep-sequencing.html?" title="DNA metabarcoding blog" target="_blank">blog posting</a> for nice examples), and here Dr Zhou gives more insight into the potential for the technique in studying biodiversity, as well as some of the quirky findings his team made validating the technique on samples collected behind their laboratory in China. </p>
<p>Dr Xin Zhou is Director of the Environmental Genomics research group at <a href="http://www.genomics.cn/en/index" title="BGI Shenzhen website" target="_blank">BGI</a>, and Director of the Bio-resource Bank of the <a href="http://www.nationalgenebank.org/en/index.html" title="China National Genebank homepage" target="_blank">China National GeneBank</a>. A biodiversity expert and entomologist by training, Dr Zhou carried out his postgraduate studies and postdoctoral training at Rutgers University and the University of Guelph, managing barcoding projects for the <a href="http://www.ibol.org/" title="iBOL database" target="_blank">International Barcode of Life</a>. From assembling and curating barcode reference libraries for a number of aquatic insect groups, his work moved to the development of sequencing-based analytical pipelines for bulk insect samples, first at Guelph and, since moving to China in October 2010, at BGI.</p>
<p><strong>How is this method of PCR-free metabarcoding an improvement on previous techniques? </strong></p>
<p>XZ: In PCR-based metabarcoding approaches, various primer sets are used to amplify target DNA fragments, which almost always introduces taxonomic bias: some organisms are more easily detected, while others are consistently missed or under-represented. This artificial bias poses a serious problem for all biodiversity studies where species composition is of concern. Our work is the first of its kind to show that the analysis of natural biodiversity samples doesn’t have to rely on PCR, thereby bypassing the primer issue. In addition, our work demonstrates that the PCR-free pipeline may potentially reveal species abundance in the mixed arthropod sample, providing yet another piece of crucial information to ecologists.</p>
<p><strong>What does this NGS-based technology bring to it over just manually barcoding the collected samples?</strong></p>
<p>XZ: Significantly reduced time and labor in sample processing as well as overall cost to analyze bulk samples. </p>
<p><strong>What can you actually do with the data, and what potential new applications does this technique enable? If you can say, what are you planning to do with this technique next?</strong></p>
<p>XZ: This paper is a proof of concept demonstrating that natural bulk samples can be analyzed using NGS without having to rely on PCR amplification. This is the first step towards empirical applications of the new methodology in ecological and biodiversity-related research. We demonstrate that this new pipeline CAN work, although a few technical aspects, such as mitochondrial enrichment and tissue preservation, need to be improved for its wide implementation. While trying to improve these technical details, we plan to increase the diversity scale of our arthropod samples by analyzing those collected in tropical regions, as well as arrays of insect samples collected from real-world ecological sampling designs.</p>
<p><a href="http://blogs.biomedcentral.com/gigablog/files/2013/04/IMG_0100_111.jpg"><img src="http://blogs.biomedcentral.com/gigablog/files/2013/04/IMG_0100_111-300x200.jpg" alt="" width="300" height="200" class="alignright size-medium wp-image-483" /></a><strong>Why did you collect the specimens you validated it with behind your lab at BGI [based in the large urban city of Shenzhen], and were you surprised by the number of specimens you detected?</strong></p>
<p>XZ: This was an advantage of working in a subtropical region, where biological samples are relatively easy to obtain. Although the sampling was not comprehensive in terms of intensity of traps and number of species, we were surprised to see what we managed to collect in the middle of a community township. The 2 sampling sites were very close to each other, yet only ~10% of the total species were shared between them. Also, the fact that only very few of our barcoded specimens received a sequence match from the <a href="http://www.boldsystems.org/" title="iBOL database">Barcode of Life Data Systems</a>, the world’s largest barcode reference database, suggests that much of China’s arthropod fauna still remains a mystery, at least from a molecular perspective. On top of all that, we thought it would be an interesting idea to present BGI’s headquarters in a scientific publication for the first time, with its GPS coordinates recorded in a meta-database.</p>
<p><strong>Does this example say anything useful about the biodiversity in Shenzhen and the area around the BGI HQ?</strong></p>
<p>XZ: As stated above, although the 2 arthropod bulk samples represent typical fauna of a secondary forest ecosystem in Southern China, the overlap between samples was minimal and much of the community was poorly understood both morphologically and molecularly. We believe there is an urgent need to improve our knowledge of China’s arthropod fauna, and we will start where we live: we plan to barcode and metabarcode insects and plants of the Shenzhen municipal area.</p>
<p><strong>In the paper, it was interesting that you found a novel COI [<a href="http://en.wikipedia.org/wiki/DNA_barcoding#Mitochondrial_DNA" title="DNA barcoding wikipedia page" target="_blank">cytochrome oxidase subunit I</a> – a common mitochondrial marker used in barcoding studies] from a <a href="http://en.wikipedia.org/wiki/Lepidoptera" title="Lepidoptera (butterflies and moths) species wikipedia" target="_blank">Lepidoptera</a> species not found in the reference library. Can you say a little more about this example, and is this potentially a new species?</strong></p>
<p>XZ: Based on the quality of the nucleotide sequences and overall coverage of the novel barcode, we tend to believe that this is a real taxon that was somehow not detected in our morphological and barcode examinations. However, given the protocols used in this work, it is not possible to identify the exact source of this novel COI sequence. We listed a few possible reasons, including gut content, small residual tissues in the bulk sample, extracellular DNA etc. This novel sequence doesn’t get a sequence match in any existing barcode databases, but this is not a big surprise as we know that Chinese insect species are not well sequenced. The ultra-deep sequencing capacity of the NGS platforms opens up a new perspective in which we are now capable of revealing the diversity of the even-smaller-things-that-run-the-world by detecting their molecules. This would not have been possible if we had to rely on the visual cues of these organisms. In some sense, the contribution of NGS technology to biodiversity research is equivalent to what microscopes did to microbiology.</p>
<p><strong>On that subject, what potential does this technique have to help discover new species, and how much can you actually tell about them using it?</strong></p>
<p>XZ: NGS technology creates an alternative way to analyze biodiversity patterns and their temporal and spatial variations by detecting molecular or genomic heterogeneity (<a href="http://xyala.cap.ed.ac.uk/research/barcoding/motu_defined.html" title="MOTUs defined" target="_blank">MOTUs</a>) in bulk environmental samples. However, to make sense of these molecular operational taxonomic units, one would have to compare this sequence information to well-curated sequence databases that are tied to conventional biological species concepts. A good example of such databases is the <a href="http://www.barcodinglife.com/" title="iBOL database" target="_blank">Barcode of Life Data Systems</a>, where millions of barcode sequences are linked to voucher specimens. My feeling is that the construction of sequence reference databases will remain critical in future molecular/genomic biodiversity research, as it is a crucial step in providing linkages to the classic school of organismal science. However, NGS cataloging of world biodiversity can be performed in parallel. As long as meta-data are maintained for the bulk samples, biodiversity can first be registered as MOTUs in a much accelerated fashion, and then be compared against the reference databases available at the time. Known and (potentially) new species can be gradually revealed during this procedure. While biodiversity registration can be significantly accelerated using NGS, understanding biodiversity, and especially interactions among species, will remain a long-term endeavor. </p>
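<p>To make the "register as MOTUs first, compare against references later" workflow described above more concrete, here is a minimal Python sketch. It assumes the barcode sequences have already been aligned to equal length, uses uncorrected p-distance with a greedy seed-based clustering and a hypothetical 2% default threshold, and matches each MOTU seed against a tiny hypothetical reference dictionary standing in for a database such as BOLD. All function names are illustrative; this is not the pipeline used in the paper.</p>

```python
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: fraction of differing sites in two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def cluster_motus(seqs: List[str], threshold: float = 0.02) -> List[List[str]]:
    """Greedy clustering: each sequence joins the first MOTU whose seed
    (first member) lies within the threshold, otherwise it seeds a new MOTU."""
    motus: List[List[str]] = []
    for s in seqs:
        for motu in motus:
            if p_distance(s, motu[0]) <= threshold:
                motu.append(s)
                break
        else:
            motus.append([s])
    return motus


def label_motus(motus: List[List[str]], reference: Dict[str, str],
                threshold: float = 0.02) -> List[str]:
    """Name each MOTU after its closest reference barcode, or flag it as
    unmatched (a candidate unknown or unsequenced taxon)."""
    labels = []
    for motu in motus:
        seed = motu[0]
        best_name, best_d = None, 1.0
        for name, ref in reference.items():
            d = p_distance(seed, ref)
            if d < best_d:
                best_name, best_d = name, d
        labels.append(best_name if best_d <= threshold else "unmatched MOTU")
    return labels


if __name__ == "__main__":
    # Toy 10 bp "barcodes"; a loose threshold is used because they are so short.
    reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC"]
    motus = cluster_motus(reads, threshold=0.15)
    print(motus)
    print(label_motus(motus, {"species A": "ACGTACGTAC"}, threshold=0.15))
```

<p>The greedy, seed-based design keeps the example short; note that the resulting MOTUs depend on the input order and on the threshold chosen.</p>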
<p><strong>What are the implications of this technique for the growth of taxonomic data in the databases? As it is more high-throughput, does it have the potential to massively increase the number of new entries?</strong></p>
<p>XZ: This <a href="http://en.wikipedia.org/wiki/PCR" title="PCR: polymerase chain reaction" target="_blank">PCR</a>-free approach can produce more accurate results in terms of species composition for bulk biological samples. </p>
<p>In terms of its impact on increasing data entries in the databases, I believe this will be the future trend in biodiversity genomics. With the emergence of new technologies and rapid reductions in cost, the research community will be able to analyze many more biological samples in much shorter periods of time. The outcome will be an improved understanding of biodiversity changes based on consistent and standardized analysis procedures and intensified sampling (in terms of numbers of sampling sites across space and time and specimen numbers).</p>
<p><strong>Is there anything else that you want to tell me about the technique?</strong></p>
<p>XZ: The new PCR-free pipeline we created in this paper has further potential for constructing reference genomes, such as mitochondrial and chloroplast genomes, in a much more economically efficient way. Based on our findings in the present work, many of the other mitochondrial genes of most of the insect species in the mixed sample can also be assembled with a decent <a href="http://en.wikipedia.org/wiki/N50_statistic" title="N50 statistic in sequencing" target="_blank">N50</a> value. For instance, the largest scaffold we managed to assemble from the insect soup came from a moth and represented almost the entire length of its mitochondrial genome. This means that with some tweaks to the current pipeline, we would be able to sequence and assemble small genomes for many different species in one shot. Having a comprehensive reference library of mitochondrial genomes can solve many of the difficult problems faced by the classic barcoding community, such as designing primers for the standard barcode region in difficult groups, e.g., <a href="http://en.wikipedia.org/wiki/Hymenoptera" title="Hymenoptera:  sawflies, wasps, bees and ants" target="_blank">Hymenoptera</a>. This potential also opens the door to expanding classic barcoding methods from the current single-molecule approach to genomic screening.</p>
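<p>Since N50 is used above as a yardstick for assembly quality, a short Python sketch of how it is usually computed may help: sort the contig or scaffold lengths from longest to shortest and report the length at which the running total first reaches at least half of all assembled bases. The function name and example lengths are illustrative only.</p>

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that contigs/scaffolds of length >= L
    together contain at least half of all assembled bases."""
    sorted_lengths = sorted(lengths, reverse=True)
    total = sum(sorted_lengths)
    if total == 0:
        raise ValueError("no assembled sequence")
    running = 0
    for length in sorted_lengths:
        running += length
        # Comparing running * 2 with total avoids rounding issues with total / 2.
        if running * 2 >= total:
            return length
    return sorted_lengths[-1]  # not reached when total > 0


if __name__ == "__main__":
    # Total is 54 bp; 10 + 9 + 8 = 27 reaches half, so N50 is 8.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

<p>A higher N50 means more of the assembly sits in long pieces, which is why a near-complete mitochondrial scaffold is a good sign.</p>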
<p><strong>Further reading:</strong><br />
<a href="http://dx.doi.org/10.1186/2047-217X-2-4" title="Insect goo paper" target="_blank">1.</a> Zhou X, et al.: Ultra-deep sequencing enables high-fidelity recovery of biodiversity for bulk arthropod samples without PCR amplification. <em>GigaScience</em> 2013, <strong>2</strong>:4. <a href="http://dx.doi.org/10.1186/2047-217X-2-4" title="Insect goo paper" target="_blank">http://dx.doi.org/10.1186/2047-217X-2-4</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.biomedcentral.com/gigablog/2013/04/02/qa-with-xin-zhou-author-of-our-insect-squishome-paper/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	<enclosure type="image/jpeg" length="1518682" url="http://blogs.biomedcentral.com/gigablog/files/2013/04/IMG_0100_11-150x100.jpg" />	</item>
		<item>
		<title>New in GigaScience: the Squishome</title>
		<link>http://blogs.biomedcentral.com/gigablog/2013/03/27/new-in-gigascience-the-squishome/</link>
		<comments>http://blogs.biomedcentral.com/gigablog/2013/03/27/new-in-gigascience-the-squishome/#comments</comments>
		<pubDate>Wed, 27 Mar 2013 05:33:57 +0000</pubDate>
		<dc:creator>Scott Edmunds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[biodiversity]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[DOI]]></category>
		<category><![CDATA[genomics]]></category>
		<category><![CDATA[insects]]></category>
		<category><![CDATA[metagenomics]]></category>
		<category><![CDATA[open data]]></category>
		<category><![CDATA[workflows]]></category>

		<guid isPermaLink="false">http://blogs.biomedcentral.com/gigablog/?p=453</guid>
		<description><![CDATA[<p><a href="http://blogs.biomedcentral.com/gigablog/files/2013/03/IMG_0110_1_Insects1.jpg"></a><strong>Insect goo aids biodiversity research</strong><br />
Apologies to Jonathan Eisen (see <a href="http://www.gigasciencejournal.com/content/1/1/6" title="Badomics in GigaScience" target="_blank">Badomics</a> in the journal), but today in <em>GigaScience</em> we <a href="http://dx.doi.org/10.1186/2047-217X-2-4" title="Squishomics paper" target="_blank">publish a new “squishomics” approach</a> for assessing and understanding biodiversity, using the slightly wacky-sounding method of combining a DNA soup made from crushed-up insects with the latest sequencing technology. This bulk-collected insect goo has the potential to rapidly and cheaply reveal the diversity and make-up of both known and unknown species collected in a particular time and place.</p>
<p>Creepy crawlies are important indicators of diversity, as arthropods make up <a href="http://membracid.wordpress.com/2013/03/22/planet-of-the-arthropods/" title="Bug Girl blog post" target="_blank">80% of described species</a>, with only an estimated <a href="http://ngm.nationalgeographic.com/2013/04/explore/seeking-new-species?" title="Nat Geo blog on discovery of new ...</p><p class="clearfix"><a class="btn alignright continue-reading" href="http://blogs.biomedcentral.com/gigablog/2013/03/27/new-in-gigascience-the-squishome/">Read more</a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://blogs.biomedcentral.com/gigablog/files/2013/03/IMG_0110_1_Insects1.jpg"><img src="http://blogs.biomedcentral.com/gigablog/files/2013/03/IMG_0110_1_Insects1-300x200.jpg" alt="" width="300" height="200" class="alignleft size-medium wp-image-461" /></a><strong>Insect goo aids biodiversity research</strong><br />
Apologies to Jonathan Eisen (see <a href="http://www.gigasciencejournal.com/content/1/1/6" title="Badomics in GigaScience" target="_blank">Badomics</a> in the journal), but today in <em>GigaScience</em> we <a href="http://dx.doi.org/10.1186/2047-217X-2-4" title="Squishomics paper" target="_blank">publish a new “squishomics” approach</a> for assessing and understanding biodiversity, using the slightly wacky-sounding method of combining a DNA soup made from crushed-up insects with the latest sequencing technology. This bulk-collected insect goo has the potential to rapidly and cheaply reveal the diversity and make-up of both known and unknown species collected in a particular time and place.</p>
<p>Creepy crawlies are important indicators of diversity, as arthropods make up <a href="http://membracid.wordpress.com/2013/03/22/planet-of-the-arthropods/" title="Bug Girl blog post" target="_blank">80% of described species</a>, with only an estimated <a href="http://ngm.nationalgeographic.com/2013/04/explore/seeking-new-species?" title="Nat Geo blog on discovery of new species" target="_blank">20% of insects characterized</a> to date. The new method devised by Xin Zhou and colleagues at <a href="http://www.genomics.cn/en/index" title="BGI homepage" target="_blank">BGI</a> is a more accurate and quantitative version of a new biodiversity analysis technique called metabarcoding. Initial validation of the analyses has already revealed how diverse and poorly characterized insect communities can be, even at two small sites within the researchers’ own backyard (literally). </p>
<p>Metabarcoding combines <a href="http://en.wikipedia.org/wiki/DNA_barcoding" title="DNA barcoding wikipedia page" target="_blank">DNA barcoding</a>, which utilizes a standard gene fragment for species identification, with next-generation sequencing technologies. Previous metabarcoding methods, however, have required a step that uses <a href="http://en.wikipedia.org/wiki/PCR" title="Polymerase Chain Reaction" target="_blank">PCR</a> to amplify the amount of DNA collected, and this step can introduce problematic errors into the analysis. The authors of this study have found a way to carry out metabarcoding without this step, giving it the potential to be more accurate. In addition to assessing species diversity, it also allows the researcher to determine the total quantity of mitochondrial DNA present for each species, making it possible to reveal the relative abundance and biomass of each species. By allowing more consistent and rapid sampling, this may simplify the study of changes in biodiversity over space and time and transform the way we study ecosystems. </p>
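<p>As a rough illustration of how per-species quantities of mitochondrial DNA could be turned into relative abundance, the Python sketch below divides the reads assigned to each species by the length of its reference sequence, then normalises these densities so they sum to one. The input names and numbers are hypothetical, and this length-normalised read share is a simplified stand-in, not the exact estimator used by Zhou and colleagues; relating it to biomass would need further calibration.</p>

```python
from typing import Dict


def relative_abundance(read_counts: Dict[str, int],
                       ref_lengths: Dict[str, int]) -> Dict[str, float]:
    """Length-normalised share of mitochondrial reads per species (hypothetical sketch).

    read_counts: reads mapped to each species' reference sequence
    ref_lengths: length in bp of each species' reference sequence
    """
    # Reads per base pair, so longer references do not inflate a species' share.
    density = {sp: read_counts[sp] / ref_lengths[sp] for sp in read_counts}
    total = sum(density.values())
    if total == 0:
        raise ValueError("no reads mapped")
    return {sp: d / total for sp, d in density.items()}


if __name__ == "__main__":
    counts = {"species A": 600, "species B": 100}      # hypothetical mapped reads
    lengths = {"species A": 1000, "species B": 1000}   # hypothetical reference lengths (bp)
    for species, share in relative_abundance(counts, lengths).items():
        print(f"{species}: {share:.3f}")
```

<p>Normalising by reference length is one simple design choice; without it, species with longer assembled references would appear more abundant than they are.</p>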
<p><a href="http://blogs.biomedcentral.com/gigablog/files/2013/03/IMG_2388_hillside.jpg"><img src="http://blogs.biomedcentral.com/gigablog/files/2013/03/IMG_2388_hillside-300x225.jpg" alt="" width="300" height="225" class="alignright size-medium wp-image-464" /></a><strong>What really lies at the bottom of your garden</strong><br />
Testing the technique on species collected on a hillside behind their laboratory, the authors were very surprised by what they managed to find in their own neighborhood. <a href="http://www.genomics.cn/en/index" title="BGI Shenzhen website" target="_blank">BGI</a>, the world’s largest genomics organization, is situated on the edge of Shenzhen, a city of 12 million people in the densely urbanized Pearl River delta. Setting up two traps close to each other not only revealed how much diversity there was, but also detected species not currently present in online databases. The findings demonstrated how little is known about insect diversity in China, and by enabling these types of systematic, high-throughput analyses, the method makes it possible to test whether this is also the case elsewhere in the world. Unfortunately, this hillside is currently being leveled for new building projects as the relentless urban and industrial development in China continues (see picture), so this surprisingly rich environment is not likely to remain as diverse for much longer. </p>
<p>Of the study, Dr. Zhou said: “The 2 sampling sites were very close to each other, yet there were only around 10% of the total species being shared between them. The fact that only very few of our barcoded specimens received a sequence match from the <a href="http://www.boldsystems.org/" title="BOLD database" target="_blank">Barcode of Life Data Systems</a>, the world’s largest barcode reference database, suggests that much of China’s arthropod fauna still remains as a mystery, at least from a molecular aspect.” Pointing to the technique's ability to detect and discover tiny organisms, stomach contents and partial samples without the usual visual cues, he adds: “In some sense, the contribution of NGS technology to biodiversity research is equivalent to what microscopes did to microbiology.”</p>
<p><strong>Open Science: the best way of proving insect data isn&#8217;t &#8220;buggy&#8221;</strong><br />
Following from our recent <a href="http://www.gigasciencejournal.com/content/1/1/18" title="SOAPdenovo2 paper" target="_blank">SOAPdenovo2 paper</a>, this is the second study in which we have integrated separate DOIs into the paper to make all of the supporting <a href="http://dx.doi.org/10.5524/100045" title="Data DOI" target="_blank">data</a> and <a href="http://dx.doi.org/10.5524/100046" title="software DOI" target="_blank">pipelines</a> available. Both are hosted in our <a href="http://gigadb.org/" title="GigaDB homepage" target="_blank">GigaDB</a> database, and the ability to cite them independently rewards the authors for making them available, while also boosting the transparency, reproducibility and utility of this work. As the pipeline is adapted from the open-source <a href="http://soapdenovo2.sourceforge.net/" title="SOAPdenovo2 sourceforge page" target="_blank">SOAPdenovo2</a> application, all of the code has to be released under a similar license, and we have hosted it in our <a href="https://github.com/gigascience/papers/tree/master/zhou2013" title="Giga Github page for the pipeline" target="_blank">GitHub repository</a>. To further boost its utility, the authors have worked hard to follow the best practices for metadata laid out by the <a href="http://gensc.org/" title="Genomic Standards Consortium homepage" target="_blank">Genomic Standards Consortium (GSC)</a>. Contextual information is essential for environmental studies such as this, and while there is currently no module for this new data type, the authors have built upon the GSC <a href="http://gensc.org/gc_wiki/index.php/MIGS/MIMS" title="MIMS checklist" target="_blank">MIMS</a> checklist and provided all of the information they thought relevant as a starting point for building new standards. 
With a transparent, open review process (see the pre-publication history <a href="http://www.gigasciencejournal.com/content/2/1/4/prepub" title="Open review squishome history" target="_blank">here</a>), and with work currently underway to implement the workflows in our <a href="http://galaxy.cbiit.cuhk.edu.hk/" title="Giga-Galaxy" target="_blank">Giga-Galaxy platform</a>, we hope that this paper is another good example of what we are attempting to do at <em>GigaScience</em> to make our papers more transparent and reproducible. As this is a really exciting study that could open up huge new areas of research, we hope this additional work pays dividends by allowing future users to recreate and adopt the technique much more quickly and easily.</p>
<p><strong>Further reading:</strong><br />
<a href="http://dx.doi.org/10.1186/2047-217X-2-4" title="Insect goo paper" target="_blank">1.</a> Zhou, X; et al., Ultra-deep sequencing enables high-fidelity recovery of biodiversity for bulk arthropod samples without PCR amplification. <em>GigaScience</em> 2013, <strong>2</strong>:4 http://dx.doi.org/10.1186/2047-217X-2-4</p>
<p><a href="http://dx.doi.org/10.5524/100045" title="Data DOI" target="_blank">2.</a> Zhou, X; Li, Y; Liu, S; Yang, Q; Su, X; Zhou, L; Tang, M; Fu, R; Li, J (2013): Raw data, assembly and annotation results for: “Ultra-deep sequencing enables high-fidelity recovery of biodiversity for bulk arthropod samples without PCR amplification”. GigaScience Database <a href="http://dx.doi.org/10.5524/100045" title="Data DOI" target="_blank">http://dx.doi.org/10.5524/100045</a></p>
<p><a href="http://dx.doi.org/10.5524/100046" title="software DOI" target="_blank">3.</a> Zhou, X; Li, Y; Liu, S; Yang, Q; Su, X; Zhou, L; Tang, M; Fu, R; Li, J; Huang, Q (2013): Software and supporting material for: “Ultra-deep sequencing enables high-fidelity recovery of biodiversity for bulk arthropod samples without PCR amplification”. GigaScience Database <a href="http://dx.doi.org/10.5524/100046" title="software DOI" target="_blank">http://dx.doi.org/10.5524/100046</a></p>
<p>UPDATE 2/4/13: check out the related Q&amp;A with the lead author in a <a href="http://blogs.biomedcentral.com/gigablog/2013/04/02/qa-with-xin-zhou-author-of-our-insect-squishome-paper/" title="Q&amp;A blog" target="_blank">follow-up posting</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.biomedcentral.com/gigablog/2013/03/27/new-in-gigascience-the-squishome/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	<enclosure type="image/jpeg" length="616542" url="http://blogs.biomedcentral.com/gigablog/files/2013/03/IMG_0110_1_Insects-150x100.jpg" />	</item>
		<item>
		<title>Tweenome on Film: Excellent Video on Crowdsourcing Killer Outbreaks</title>
		<link>http://blogs.biomedcentral.com/gigablog/2013/02/21/tweenome-on-film-excellent-video-on-crowdsourcing-killer-outbreaks/</link>
		<comments>http://blogs.biomedcentral.com/gigablog/2013/02/21/tweenome-on-film-excellent-video-on-crowdsourcing-killer-outbreaks/#comments</comments>
		<pubDate>Thu, 21 Feb 2013 10:07:19 +0000</pubDate>
		<dc:creator>Scott Edmunds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Ash Dieback]]></category>
		<category><![CDATA[crowdsourcing]]></category>
		<category><![CDATA[Disease]]></category>
		<category><![CDATA[E. coli]]></category>
		<category><![CDATA[open-science]]></category>

		<guid isPermaLink="false">http://blogs.biomedcentral.com/gigablog/?p=431</guid>
		<description><![CDATA[<p>The <a href="http://www.bbsrc.ac.uk/home/home.aspx" title="BBSRC homepage" target="_blank">BBSRC</a> has just released an excellent <a href="http://youtu.be/ttMnQIE-P-s" title="Crowdsourcing video" target="_blank">video</a> and <a href="http://www.bbsrc.ac.uk/news/food-security/2013/130218-f-crowdsourcing-killer-outbreaks.aspx" title="BBSRC crowdsourcing write-up" target="_blank">article</a> on crowdsourcing killer disease outbreaks that are very relevant to our <a href="http://www.gigasciencejournal.com/content/2/1/2" title="OpenAshDB paper" target="_blank">recent commentary</a> and <a href="http://blogs.biomedcentral.com/gigablog/2013/02/13/open-science-versus-ash-dieback-and-the-tweenome-revisited/" title="Tweenome revisited blog" target="_blank">blog postings</a> on the <a href="http://oadb.tsl.ac.uk/" title="OpenAshDB website" target="_blank">OpenAshDB</a> (Ash Dieback disease crowdsourcing) project. Featuring interviews with Nick Loman and Lisa Crossman (also an author on our <a href="http://www.gigasciencejournal.com/content/2/1/2" title="OpenAshDB paper" target="_blank">OpenAshDB paper</a>), key contributors to the 2011 <em>E. coli</em> O104:H4 outbreak genome crowdsourcing effort, they give a very good overview of how our initial release of public-domain genomic <a href="http://dx.doi.org/10.5524/100001" title="E. coli DOI" target="_blank">data</a> via Twitter helped kick-start a burst of crowd-sourced, ...</p><p class="clearfix"><a class="btn alignright continue-reading" href="http://blogs.biomedcentral.com/gigablog/2013/02/21/tweenome-on-film-excellent-video-on-crowdsourcing-killer-outbreaks/">Read more</a>]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://www.bbsrc.ac.uk/home/home.aspx" title="BBSRC homepage" target="_blank">BBSRC</a> has just released an excellent <a href="http://youtu.be/ttMnQIE-P-s" title="Crowdsourcing video" target="_blank">video</a> and <a href="http://www.bbsrc.ac.uk/news/food-security/2013/130218-f-crowdsourcing-killer-outbreaks.aspx" title="BBSRC crowdsourcing write-up" target="_blank">article</a> on crowdsourcing killer disease outbreaks that are very relevant to our <a href="http://www.gigasciencejournal.com/content/2/1/2" title="OpenAshDB paper" target="_blank">recent commentary</a> and <a href="http://blogs.biomedcentral.com/gigablog/2013/02/13/open-science-versus-ash-dieback-and-the-tweenome-revisited/" title="Tweenome revisited blog" target="_blank">blog postings</a> on the <a href="http://oadb.tsl.ac.uk/" title="OpenAshDB website" target="_blank">OpenAshDB</a> (Ash Dieback disease crowdsourcing) project. Featuring interviews with Nick Loman and Lisa Crossman (also an author on our <a href="http://www.gigasciencejournal.com/content/2/1/2" title="OpenAshDB paper" target="_blank">OpenAshDB paper</a>), key contributors to the 2011 <em>E. coli</em> O104:H4 outbreak genome crowdsourcing effort, they give a very good overview of how our initial release of public-domain genomic <a href="http://dx.doi.org/10.5524/100001" title="E. coli DOI" target="_blank">data</a> via Twitter helped kick-start a burst of crowd-sourced, curiosity-driven analyses around the world aimed at understanding and fighting the outbreak.  </p>
<p><iframe width="500" height="281" src="http://www.youtube.com/embed/ttMnQIE-P-s?feature=oembed" frameborder="0" allowfullscreen></iframe></p>
<p>We have written a lot about this Twitter-driven <a href="http://blogs.biomedcentral.com/gigablog/2011/08/03/notes-from-an-e-coli-tweenome-lessons-learned-from-our-first-data-doi/" title="Notes from a Tweenome blog" target="_blank">“tweenome” analysis</a> in the past, but this <a href="http://www.bbsrc.ac.uk/news/food-security/2013/130218-f-crowdsourcing-killer-outbreaks.aspx" title="BBSRC crowdsourcing write-up" target="_blank">video and article</a> try to draw lessons from the <em>E. coli</em> project and apply these experiences to the related effort to tackle the devastating spread of Ash Dieback disease in Europe. Since the publication of the <a href="http://www.gigasciencejournal.com/content/2/1/2" title="OpenAshDB paper" target="_blank">OpenAshDB paper</a> (currently our <a href="http://www.gigasciencejournal.com/mostviewed" title="Most viewed articles" target="_blank">most viewed</a> article this month), further analyses and data have continued to be added to the <a href="https://github.com/ash-dieback-crowdsource" title="AshDieback GitHub" target="_blank">GitHub-based repository</a>, including <a href="https://github.com/ash-dieback-crowdsource/data/tree/master/ash_dieback/chalara_fraxinea/geospatial_data" title="Geospatial data in GitHub" target="_blank">geospatial data</a> and new <a href="https://github.com/ash-dieback-crowdsource/data/tree/master/ash_dieback/chalara_fraxinea/Kenninghall_wood_KW1/annotations/Secretome_prediction" title="Secretome predictions folder in GitHub" target="_blank">secretome predictions</a>. The BBSRC is taking a keen interest in the potential of this approach to speed up research, with the <a href="http://www.bbsrc.ac.uk/funding/opportunities/2013/crowd-sourcing-biological-sciences.aspx" title="BBSRC crowdsourcing grant" target="_blank">recent announcement</a> of funding of up to £2M for proposals to develop and deploy crowdsourcing approaches to complex, large-scale scientific problems. 
Further interesting viewing on this subject includes Mark Pallen’s <a href="http://youtu.be/HyN2BZPItrg" title="Mark Pallen video" target="_blank">excellent talks</a> on open-source genomics, and the brilliant <a href="http://youtu.be/LmAugMSJ1-Y" title="Jennifer Gardy on 21st century public health" target="_blank">TEDx talk</a> by Jennifer Gardy on 21st century public health. This 2009 talk was extremely prescient, highlighting early efforts to use <a href="http://tree.bio.ed.ac.uk/groups/influenza/" title="H1N1 wiki" target="_blank">wiki-based resources</a> to tackle that year’s H1N1 influenza pandemic. </p>
<p><iframe width="500" height="375" src="http://www.youtube.com/embed/LmAugMSJ1-Y?feature=oembed" frameborder="0" allowfullscreen></iframe></p>
<p>As sequencing continues to become cheaper and faster, these early examples will hopefully inspire and encourage what our editorial board member Mike Schatz terms, in another recent <em>GigaScience</em> commentary, the rise of a <a href="http://www.gigasciencejournal.com/content/1/1/4" title="Rise of the Digital Immune System" target="_blank">“digital immune system”</a>: observing the microbial landscape, detecting potential threats, and neutralizing them before they spread beyond control. The ultimate aim of these projects is to shape the way we tackle future infectious disease outbreaks, speeding up the response to such an extent that we could potentially stop outbreaks before they even happen. These are laudable goals and a fantastic example of what can be enabled by open-science, and <a href="http://www.gigasciencejournal.com/" target="_blank"><em>GigaScience</em></a> will continue to do what we can to help promote and encourage such schemes in the future. </p>
<p><strong>Further Reading</strong><br />
<a href="http://www.gigasciencejournal.com/content/2/1/2" title="OpenAshDB paper" target="_blank">1.</a> MacLean, D; et al., Crowdsourcing genomic analyses of ash and ash dieback — power to the people. <em>GigaScience</em> 2013, <strong>2</strong>:2<br />
<a href="http://oadb.tsl.ac.uk/" title="OpenAshDB website" target="_blank">2.</a> OpenAshDB Website: <a href="http://oadb.tsl.ac.uk/" title="OpenAshDB website" target="_blank">http://oadb.tsl.ac.uk/</a><br />
<a href="http://www.gigasciencejournal.com/content/1/1/4" title="Rise of the Virtual Immune System" target="_blank">3.</a> Schatz, MC; Phillippy, AM. The rise of a digital immune system. <em>GigaScience</em> 2012, <strong>1</strong>:4 </p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.biomedcentral.com/gigablog/2013/02/21/tweenome-on-film-excellent-video-on-crowdsourcing-killer-outbreaks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	<enclosure type="image/png" length="675703" url="http://blogs.biomedcentral.com/gigablog/files/2013/02/Diverse_e_Coli-150x108.png" />	</item>
		<item>
		<title>Open Science versus Ash Dieback (and the Tweenome revisited)</title>
		<link>http://blogs.biomedcentral.com/gigablog/2013/02/13/open-science-versus-ash-dieback-and-the-tweenome-revisited/</link>
		<comments>http://blogs.biomedcentral.com/gigablog/2013/02/13/open-science-versus-ash-dieback-and-the-tweenome-revisited/#comments</comments>
		<pubDate>Wed, 13 Feb 2013 16:53:27 +0000</pubDate>
		<dc:creator>Scott Edmunds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[crowdsourcing]]></category>
		<category><![CDATA[genomics]]></category>
		<category><![CDATA[GitHub]]></category>
		<category><![CDATA[open data]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[open-science]]></category>
		<category><![CDATA[tweenome]]></category>

		<guid isPermaLink="false">http://blogs.biomedcentral.com/gigablog/?p=403</guid>
		<description><![CDATA[<p><a href="http://blogs.biomedcentral.com/gigablog/files/2013/02/DSCF75661.jpg"></a><strong>Bye Bye Bluebells</strong><br />
Bluebell woods, the dense carpets of violet–blue flowers found in ancient woodland, are a spectacular and famous springtime sight in Britain, but this picture-postcard scene is threatened as never before. <a href="http://en.wikipedia.org/wiki/Hymenoscyphus_pseudoalbidus" title="Hymenoscyphus pseudoalbidus/Chalara fraxinea wikipedia" target="_blank"><em>Chalara fraxinea</em> or ash dieback</a>, a devastating fungal disease of ash trees, has swept across northern Europe and has now reached Britain, a country <a href="http://www.channel4.com/news/q-and-a-ash-dieback-disease" title="Channel 4 FAQ on AshaDieback" target="_blank">particularly susceptible</a> to its potential onslaught, as its estimated 80 million ash trees make up 30 per cent of woodland across the country. Ash trees particularly encourage biodiversity, as their branches are ideally spaced for light to pass through and let the bluebells and other species on ...</p><p class="clearfix"><a class="btn alignright continue-reading" href="http://blogs.biomedcentral.com/gigablog/2013/02/13/open-science-versus-ash-dieback-and-the-tweenome-revisited/">Read more</a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://blogs.biomedcentral.com/gigablog/files/2013/02/DSCF75661.jpg"><img src="http://blogs.biomedcentral.com/gigablog/files/2013/02/DSCF75661-300x225.jpg" alt="" width="300" height="225" class="alignleft size-medium wp-image-406" /></a><strong>Bye Bye Bluebells</strong><br />
Bluebell woods, the dense carpets of violet–blue flowers found in ancient woodland, are a spectacular and famous springtime sight in Britain, but this picture-postcard scene is threatened as never before. <a href="http://en.wikipedia.org/wiki/Hymenoscyphus_pseudoalbidus" title="Hymenoscyphus pseudoalbidus/Chalara fraxinea wikipedia" target="_blank"><em>Chalara fraxinea</em> or ash dieback</a>, a devastating fungal disease of ash trees, has swept across northern Europe and has now reached Britain, a country <a href="http://www.channel4.com/news/q-and-a-ash-dieback-disease" title="Channel 4 FAQ on AshaDieback" target="_blank">particularly susceptible</a> to its potential onslaught, as its estimated 80 million ash trees make up 30 per cent of woodland across the country. Ash trees particularly encourage biodiversity, as their branches are ideally spaced for light to pass through and let the bluebells and other species on the forest floor grow. Spreading over mainland Europe since its discovery in Poland in 1992, the disease has been particularly virulent in Northern Europe, with <a href="http://www.guardian.co.uk/world/2012/oct/07/disease-killing-denmarks-ash-trees" title="Ashdieback in Denmark, and need to keep UK disease free: Guardian" target="_blank">90% of ash trees in Denmark affected</a>. 
It had been hoped that Britain could act as a bulwark against the disease, but in October last year, due to a <a href="http://www.guardian.co.uk/environment/2012/nov/05/ash-dieback-government-legal-action" title="Guardian: legal action against UK government for Ashdieback" target="_blank">slow government response</a> and a <a href="http://www.guardian.co.uk/environment/2012/dec/11/ash-dieback-plant-scientists-environment-committee" title="Guardian: shortage of plant scientists blamed" target="_blank">shortage of qualified plant pathologists</a>, everyone’s worst fears were realized when the <a href="http://www.guardian.co.uk/environment/2012/oct/30/ash-tree-crisis-dieback-disease?intcmp=239" title="Guardian Environment, October 2012: Ashdieback discovery in UK" target="_blank">fungus was found for the first time</a> growing in mature trees in Eastern England.</p>
<p><strong>Crowdsourcing and Open-Science to the Rescue!</strong><br />
With hopes of quarantine and <a href="http://www.bbc.co.uk/news/science-environment-20253767" title="BBC News: Ash dieback will not be eradicated" target="_blank">eradication</a> now gone, the latest government strategy is to slow the disease's spread, and to develop resistant trees and restructure the woodland with them. Keeping on top of an evolving, highly infectious, wind-borne pathogen is a particularly onerous task, especially given the <a href="http://www.guardian.co.uk/environment/2012/dec/11/ash-dieback-plant-scientists-environment-committee" title="Guardian: shortage of plant scientists blamed" target="_blank">lack of experts and scientists</a> on the ground. Crowdsourcing, which opens up the fight against the pathogen to the global wisdom of the crowd and harnesses both the rapid transfer of information on social networks and the &#8220;hive mind&#8221; of the web, is one potential way to address this. A team of developers and scientists <a href="http://www.guardian.co.uk/environment/2012/oct/29/ashtag-app-tree-disease-dieback" title="Guardian: AshTag app launched to prevent spread of disease" target="_blank">has already developed AshTag</a>, a smartphone app that the public can use to report suspected cases of infection. </p>
<p>Following on from this, this week in <em>GigaScience</em> we <a href="http://www.gigasciencejournal.com/content/2/1/2/abstract" title="Ashdieback GigaScience paper" target="_blank">publish a paper</a> from a community of scientists taking an &#8220;open-source genomics&#8221; approach to engage the global genomics community in this fight. To kick-start genomic analyses of the pathogen and host, Dan MacLean and colleagues from <a href="http://oadb.tsl.ac.uk/" title="OpenAshDB website" target="_blank">OpenAshDB</a> present a call to arms to the research community entitled &#8220;Crowdsourcing genomic analyses of ash and ash dieback &#8211; power to the people&#8221;. Taking the unusual step of releasing the initial genomics datasets as soon as they were produced, they have set up a website (<a href="http://oadb.tsl.ac.uk/" title="OpenAshDB website" target="_blank">oadb.tsl.ac.uk</a>) and a GitHub-based platform to share and analyse the data and results. While there have been attempts to crowdsource the response to human disease outbreaks before (see this <a href="http://www.youtube.com/watch?v=LmAugMSJ1-Y" title="Jennifer Gardy on 21st century public health" target="_blank">excellent TEDx talk</a> from Jennifer Gardy on H1N1), this is the first time it has been attempted for a plant disease of such importance. This open-source genomics approach also builds on, and learns lessons from, the deadly 2011 European <em>E. coli</em> O104:H4 outbreak, and the approach that our colleagues at the BGI and others followed to crowdsource the analysis of the pathogen&#8217;s genome via Twitter, blogs and GitHub. </p>
<p>Using a collaborative GitHub-based platform structured very similarly to the one the <a href="https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis/" title="E. coli github">Era7 team built</a> around the original <em>E. coli </em>data released by the BGI as the <a href="http://dx.doi.org/10.5524/100001" title="E. coli DOI" target="_blank">first data DOI</a> in our GigaDB database, the OpenAshDB project aims to take this open-science approach even further, with plans for collaborative authorship for contributors and to work with pre-print servers before publication of the final products. On top of the altruistic motives of scientific curiosity and wanting to protect biodiversity and the environment, one of the key incentives for taking part in a project such as this is obviously the traditional one of obtaining scientific credit. While everything has to be released quickly into the public domain to maximize its use, contributions can still be tracked through GitHub commit records and through traditional mechanisms such as citation. Working with <a href="http://datacite.org/" title="DataCite homepage" target="_blank">DataCite</a>, we issued our first DOI for the <em>E. coli</em> genome, and the <a href="http://altmetrics.org/manifesto/" title="Altmetrics homepage" target="_blank">altmetrics</a> community, using tools such as Impact Story, has shown it is also possible to track GitHub use through similar means (see <a href="http://impactstory.org/item/url/https://github.com/ehec-outbreak-crowdsourced/BGI-data-analysis" title="Impact Story - E. coli GitHub" target="_blank">this example</a> for the <em>E. coli</em> GitHub).  </p>
<p><strong>The Tweenome Revisited</strong><br />
Crowdsourcing and open-science are areas that we at <a href="http://www.gigasciencejournal.com/" title="GigaScience homepage" target="_blank"><em>GigaScience</em></a> are keen to promote and support, and this paper follows recent papers published on <a href="http://www.gigasciencejournal.com/content/1/1/13" title="Parrot Genome commentary" target="_blank">community sponsored/assembled Parrot genomes</a> (AKA <a href="http://www.bio-itworld.com/2012/09/28/peoples-parrot-first-community-sponsored-genome.html" title="Bio-IT world, the peoples parrot" target="_blank">the People&#8217;s Parrot</a>) and <a href="http://www.gigasciencejournal.com/content/1/1/15" title="Genome Blogging Paper" target="_blank">personal genomics analysis via blogs</a>. We have <a href="http://blogs.biomedcentral.com/gigablog/2011/08/03/notes-from-an-e-coli-tweenome-lessons-learned-from-our-first-data-doi/" title="Notes from a Tweenome blog" target="_blank">written previously</a> on how the release of CC0 (the most open public domain waiver) <em>E. coli</em> genome data via Twitter, by us and our collaborators at UMC Hamburg-Eppendorf, enabled others (with special mention of early work from Nick Loman and the Era7 team, who helped get the ball rolling) to kick-start a burst of crowd-sourced, curiosity-driven analyses from bioinformaticians around the world. Dubbed by some the first &#8220;Tweenome&#8221;, this project led to a <a href="http://www.nejm.org/doi/full/10.1056/nejmoa1107643#t=article" title="NEJM paper" target="_blank">high-profile paper</a> in the <em>New England Journal of Medicine</em>, and now, over 18 months on, it is a good time to look back at the downstream consequences and see whether any lessons can be learned for OpenAshDB and other projects with similar aims. </p>
<p>Judging by the more than 110 citations to the paper so far (<a href="http://scholar.google.com/scholar?cites=1437052908649410386&amp;as_sdt=2005&amp;sciodt=0,5&amp;hl=en" title="google scholar citations for E. coli paper" target="_blank">according to Google Scholar</a>), the study has provided insight into the pathogenicity, evolution and treatment of the pathogen, as well as assisting <a href="http://www.nature.com/nbt/journal/v30/n5/full/nbt.2198.html" title="Loman et al., NBT 2012" target="_blank">platform comparison studies</a>. The main aim of doing science in this accelerated way was to speed up <a href="http://www.genomics.cn/en/news/show_news?nid=98968" title="BGI release diagnostic primers" target="_blank">diagnosis</a> and treatment, and the <em>E. coli</em> data enabled the rapid development of <a href="http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0034498" title="PLOS One E. coli PCR based test paper" target="_blank">diagnostic tests</a> and <a href="http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0033637" title="PLOS One E. coli anti-microbial agent paper" target="_blank">anti-microbial agents</a>. These are useful examples, as better diagnostic tests and potential therapies are important downstream goals that the OpenAshDB project could help achieve.</p>
<p>Probably the project&#8217;s biggest legacy is as an example of open-science, data citation and the use of <a href="http://creativecommons.org/choose/zero/" title="CC0 homepage" target="_blank">CC0 data</a>. Releasing the data under a CC0 waiver allowed truly open-source analysis, and the <a href="http://www.hpa-bioinformatics.org.uk/lgp/genomes" title="HPA data" target="_blank">UK HPA</a> and <a href="http://bacpathgenomics.wordpress.com/2011/06/13/e-coli-data-released-under-creative-commons-0-license/" title="Loman CC0 blog" target="_blank">GitHub members</a> followed suit in releasing their work in the same way. Following this example, a team at Pacific Biosciences also released their related data in a similar manner, citing <a href="http://www.nature.com/news/open-data-project-aims-to-ease-the-way-for-genomic-research-1.10507" title="Data consent, Nature News" target="_blank">the example</a> of their fellow <em>E. coli</em> data producers so that they could release their data without wasting time on legal wrangling. The project has since been held up as a model for future UK and EU science policy, with the Royal Society in the UK citing the <em>E. coli</em> crowdsourcing as an example of &#8220;the power of intelligently open data&#8221;, and highlighting it on the cover of its influential <a href="http://royalsociety.org/policy/projects/science-public-enterprise/report/" title="Science as an Open Enterprise report" target="_blank">&#8220;Science as an Open Enterprise&#8221; report</a>.</p>
<p>We hope that the OpenAshDB project leaves a similar legacy and, as today is Ash Wednesday, we hope that many in the genomics community will join the effort to study and fight this devastating ecological threat. This will not only enable future generations to continue to appreciate the beauty of bluebell woods, but also provide an example of how science can be more collaborative, faster and more efficient in this new era of open-science and open data. As the authors put it in the title of their article: power to the people!</p>
<p><strong>Further Reading</strong><br />
<a href="http://www.gigasciencejournal.com/content/2/1/2/abstract" title="Ashdieback GigaScience paper" target="_blank">1.</a> MacLean, D; et al., Crowdsourcing genomic analyses of ash and ash dieback &#8212; power to the people. <em>GigaScience</em> 2013, <strong>2</strong>:2<br />
<a href="http://oadb.tsl.ac.uk/" title="OpenAshDB website" target="_blank">2.</a> OpenAshDB Website: <a href="http://oadb.tsl.ac.uk/" title="OpenAshDB website" target="_blank">http://oadb.tsl.ac.uk/</a><br />
<a href="http://blogs.biomedcentral.com/gigablog/2011/08/03/notes-from-an-e-coli-tweenome-lessons-learned-from-our-first-data-doi/" title="Notes from a Tweenome blog" target="_blank">3.</a> Notes from a Tweenome: <a href="http://blogs.biomedcentral.com/gigablog/2011/08/03/notes-from-an-e-coli-tweenome-lessons-learned-from-our-first-data-doi/" title="Notes from a Tweenome blog" target="_blank">http://blogs.biomedcentral.com/gigablog/2011/08/03/notes-from-an-e-coli-tweenome-lessons-learned-from-our-first-data-doi/</a><br />
<a href="http://www.nejm.org/doi/full/10.1056/NEJMoa1107643" title="NEJM paper" target="_blank">4.</a> Rohde, H; et al., Open-Source Genomic Analysis of Shiga-Toxin–Producing E. coli O104:H4. <em>N Engl J Med</em> 2011, <strong>365</strong>:718-724. </p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.biomedcentral.com/gigablog/2013/02/13/open-science-versus-ash-dieback-and-the-tweenome-revisited/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	<enclosure type="image/jpeg" length="811776" url="http://blogs.biomedcentral.com/gigablog/files/2013/02/DSCF7566-150x112.jpg" />	</item>
		<item>
		<title>Opening peer-review: our new paper on SOAPdenovo2 shows how it works</title>
		<link>http://blogs.biomedcentral.com/gigablog/2012/12/27/opening-peer-review-our-new-paper-on-soapdenovo2-shows-how-it-works/</link>
		<comments>http://blogs.biomedcentral.com/gigablog/2012/12/27/opening-peer-review-our-new-paper-on-soapdenovo2-shows-how-it-works/#comments</comments>
		<pubDate>Thu, 27 Dec 2012 17:09:48 +0000</pubDate>
		<dc:creator>Scott Edmunds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[data citation]]></category>
		<category><![CDATA[gigaDB]]></category>
		<category><![CDATA[GigaScience]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[open-science]]></category>
		<category><![CDATA[peer-review]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://blogs.biomedcentral.com/gigablog/?p=355</guid>
		<description><![CDATA[<p><a href="http://blogs.biomedcentral.com/gigablog/files/2012/12/ScientificReview.jpg"></a>With everyone in a reflective mood as the year comes to a close, one of the big scientific trends of 2012 has obviously been the high profile that open-access and more open methods of carrying out science have received. With the <a href="http://thecostofknowledge.com/" title="Boycott Elseview" target="_blank">Elsevier boycott</a>, the UK <a href="http://www.guardian.co.uk/science/2012/jun/19/open-access-academic-publishing-finch-report" title="Finch report in Guardian" target="_blank">Finch report</a>, and the launch of a number of innovative new schemes for publishing open-access research and data (including <a href="http://f1000research.com/" title="F1000 Research" target="_blank"><em>F1000 Research</em></a>, <a href="http://www.elifesciences.org/" title="eLife" target="_blank"><em>eLife</em></a>, <a href="https://peerj.com/" title="PeerJ" target="_blank"><em>PeerJ</em></a> and of course <em><a href="http://www.gigasciencejournal.com" title="GigaHomepage" target="_blank">GigaScience</a></em>), 2012 has been talked of as the year of an <a href="http://en.wikipedia.org/wiki/Academic_Spring" title="Academic Spring" target="_blank">“academic spring”</a> that has started to shake up the centuries-old, stuffy ...</p><p class="clearfix"><a class="btn alignright continue-reading" href="http://blogs.biomedcentral.com/gigablog/2012/12/27/opening-peer-review-our-new-paper-on-soapdenovo2-shows-how-it-works/">Read more</a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://blogs.biomedcentral.com/gigablog/files/2012/12/ScientificReview.jpg"><img src="http://blogs.biomedcentral.com/gigablog/files/2012/12/ScientificReview-300x168.jpg" alt="" width="300" height="168" class="alignleft size-medium wp-image-357" /></a>With everyone in a reflective mood as the year comes to a close, one of the big scientific trends of 2012 has clearly been the high profile that open access and more open ways of carrying out science have received. With the <a href="http://thecostofknowledge.com/" title="Boycott Elseview" target="_blank">Elsevier boycott</a>, the UK <a href="http://www.guardian.co.uk/science/2012/jun/19/open-access-academic-publishing-finch-report" title="Finch report in Guardian" target="_blank">Finch report</a>, and the launch of a number of innovative new schemes for publishing open-access research and data (including <a href="http://f1000research.com/" title="F1000 Research" target="_blank"><em>F1000 Research</em></a>, <a href="www.elifesciences.org/" title="eLife" target="_blank"><em>eLife</em></a>, <a href="https://peerj.com/" title="PeerJ" target="_blank"><em>PeerJ</em></a> and of course <em><a href="http://www.gigasciencejournal.com" title="GigaHomepage" target="_blank">GigaScience</a></em>), 2012 has been talked of as the year of an <a href="http://en.wikipedia.org/wiki/Academic_Spring" title="Academic Spring" target="_blank">“academic spring”</a> that has started to shake up the centuries-old, stuffy and closed system of scientific discourse. </p>
<p>Alongside changes in how scientists and readers demand to access and mine the literature and data, and new incentives and mechanisms to release and publish data (about which we have <a href="http://blogs.biomedcentral.com/gigablog/tag/data-citation/" title="GigaBlog on data citation" target="_blank">written extensively</a>), the process of peer review has also come under the spotlight, and there has been much discussion of the <a href="http://www.michaeleisen.org/blog/?p=694" title="Eisen blog" target="_blank">deficiencies</a> <a href="http://www.guardian.co.uk/science/lost-worlds/2012/dec/01/dinosaurs-fossils" title="Guardian blog on peer review" target="_blank">of this system</a>. Many newly launched journals have tried to make the system more transparent, using approaches such as post-publication peer review (e.g. <a href="http://f1000research.com/about/" title="F1000 Research about page" target="_blank">F1000 Research</a>), pre-print servers (e.g. the increasing acceptance of <a href="http://arxiv.org/" title="arXiv" target="_blank">arXiv</a> in biology), providing access to anonymized (e.g. <a href="http://www.nature.com/emboj/about/process.html#Transparent_Process" title="EMBO J peer review" target="_blank">EMBO journals</a>) or partial (e.g. <a href="http://www.elifesciences.org/the-journal/review-process/" title="eLife review process" target="_blank">eLife</a>) versions of the peer-review history, or encouraging reviewers to opt into open peer review (<a href="https://peerj.com/about/policies-and-procedures/#open-peer-review" title="PeerJ open review policies" target="_blank"><em>PeerJ</em></a>, and <a href="http://www.plosone.org/static/reviewerGuidelines;jsessionid=A48FAA073462674555394D0178417DB9#anonymity" target="_blank">experimented with</a> a little at <em>PLOS</em>). 
At <em>GigaScience</em> we have decided to take this process one step further and ask for <a href="http://www.gigasciencejournal.com/about/reviewers" title="GigaReviewers page" target="_blank">open peer review by default</a>. As our aim is to promote more open, reproducible and transparent science, we feel this promotes accountability and fairness and, importantly, gives credit to reviewers for their hard work. A <a href="http://www.gigasciencejournal.com/content/1/1/18/abstract" title="SOAPdenovo2 paper" target="_blank">new publication</a> in the journal today provides a particularly useful example of how this process works, so we are highlighting it here in <a href="http://blogs.biomedcentral.com/gigablog/" title="GigaBlog" target="_blank">GigaBlog</a> and would welcome feedback and comments on our approach.</p>
<p><strong>What is SOAPdenovo2?</strong><br />
Today we <a href="http://www.gigasciencejournal.com/content/1/1/18/abstract" title="SOAPdenovo2 paper" target="_blank">publish</a> an updated version of BGI’s popular SOAPdenovo software application (the <a href="http://genome.cshlp.org/content/20/2/265.short" title="SOAPdenovo1 paper" target="_blank">original version</a> has 460 citations according to <a href="http://scholar.google.com.hk/scholar?cites=11447276992969970821&amp;as_sdt=2005&amp;sciodt=0,5&amp;hl=en" title="google scholar citations" target="_blank">Google Scholar</a>), a state-of-the-art tool for <em>de novo</em> genome assembly. <em>De novo</em> assembly, piecing together genomes from sequencing data without the aid of a previously assembled reference, is an especially computationally intensive and technically challenging task. BGI and its SOAPdenovo tool have been particularly strong in this area, having used it to assemble the genomes of hundreds of new plant and animal species, as well as to find large numbers of <a href="http://www.nature.com/nbt/journal/v29/n8/full/nbt.1904.html" title="Nature Biotech SV paper" target="_blank">previously undetected structural changes</a> when applied to individual human genomes. <em>De novo</em> genome assembly is an important and competitive area in bioinformatics, and there have been a number of assembly competitions and genome assembler “bake-offs” to compare and benchmark the available applications and methods, the <a href="http://assemblathon.org/" title="Assemblathon" target="_blank">Assemblathon</a> and <a href="http://gage.cbcb.umd.edu/" title="GAGE homepage" target="_blank">GAGE</a> competitions and evaluations being notable examples. </p>
<p>New developments in version 2 of SOAPdenovo focus on more efficient algorithms and data structures to reduce memory requirements, better handling of errors and of low-coverage or heterozygous regions, and improved gap closing. To demonstrate these improvements, and that the application truly is state-of-the-art for <em>de novo</em> assembly of large vertebrate genomes, the authors reassembled BGI’s <a href="http://yh.genomics.org.cn/" title="YH homepage" target="_blank">YH Asian reference genome</a> with both the new and the original versions of SOAPdenovo: version 2 produced contigs three times larger, with nearly two thirds of the peak memory consumption of the original. In comparisons against other state-of-the-art assemblers such as ALLPATHS-LG, SOAPdenovo2 outperformed them on many metrics for the Assemblathon and GAGE benchmark datasets tested, demonstrating the potential power and utility of this new application for the bioinformatics community.</p>
<p><strong>Open peer-review, <em>GigaScience</em> style</strong><br />
Stating that SOAPdenovo2 can perform better than other state-of-the-art assembly tools is one thing, but proving it requires review and testing by independent peers, and the larger and more complicated an application is (a particular issue for us as a journal focusing on data-heavy research), the more challenging this can be. To ease this process, shed light on it and credit the reviewers, <em>GigaScience</em> uses a much more transparent, accountable and open peer-review process. Tailored to such data-heavy studies, our <a href="http://www.gigasciencejournal.com/about/reviewers" title="GigaReviewers page" target="_blank">criteria for publication</a> are based more on the relative amount of data created or used, and on transparency and availability, than on subjective and unpredictable measures such as supposed “impact”. For software and methods papers, what is presented obviously has to improve on what is currently available, but in areas such as genome assembly, where there is no “one-size-fits-all” solution, assessment has to be based on the new method being an improvement in at least one potential application. </p>
<p>During peer review we host all of the supporting information and data (totaling 78GB in this case), and our curators make all of it available to the reviewers from our FTP servers. In this case we worked with three groups of expert reviewers (8 independent experts in total), who thoroughly tested the software against the various tools and datasets provided to ensure that the claims made by the authors were correct. To aid the process, the authors provided not only all of the test data, scripts and tools supporting the paper, but also detailed pipelines with the configured packages, commands and utilities necessary to reproduce the different tests carried out in the paper.</p>
<p>While open peer review is used by a number of medical journals, making it the default is almost unprecedented in biology. We ask all of our reviewers to carry out open peer review, and in this case all 8 of them consented and signed their reports, which are now <a href="http://www.gigasciencejournal.com/content/1/1/18/prepub" title="SOAPdenovo prepub history" target="_blank">available to view</a> in the pre-publication history section associated with our published articles. To see how this looks, you can follow the history of the SOAPdenovo2 paper <a href="http://www.gigasciencejournal.com/content/1/1/18/prepub" title="SOAPdenovo prepub history" target="_blank">here</a>. </p>
<p>Reviewers who have concerns about this process have the option to opt out and anonymize their reports, but it is encouraging that, for all of the papers we have reviewed so far, none have asked to do this. A number of new journals are starting to encourage reviewers to sign their reports, but at <em>GigaScience</em> signing is the default. We also give reviewers the option of making confidential comments to the editors (particularly on ethical and policy issues), but so far the reports have generally been very constructive, and <a href="http://bjp.rcpsych.org/content/176/1/47.long" title="Open peer review RCT paper" target="_blank">previous</a> <a href="http://www.bmj.com/content/318/7175/23?view=long&amp;pmid=9872878" title="BMJ open peer review quality paper" target="_blank">studies</a> of open peer review have also found that the quality and courteousness of reviews increased, with little if any negative effect. Making the process more open and transparent reduces competing interests and biases, and lets reviewers take credit for the hard work they have put into reviewing, even declaring it on their CV if they wish, as we would like to release the reviews of accepted papers under a <a href="http://creativecommons.org/licenses/by/3.0/" title="CC-BY" target="_blank">CC-BY license</a>. Readers also benefit from this transparency: they do not have to take it on trust that published manuscripts were reviewed by qualified reviewers, and for educational purposes they can see good examples of how peer review operates.</p>
<p><strong>Promoting reproducibility, <em>GigaScience</em> style</strong><br />
Beyond boosting the transparency and reproducibility of peer review for data-heavy studies, <em>GigaScience</em> carries these goals over to the publication process, and this paper is an excellent example. As well as SOAPdenovo2 meeting our requirements of being open source and having its code in a public repository (<a href="http://soapdenovo2.sourceforge.net/" title="SOAPdenovo2 sourceforge page" target="_blank">SourceForge</a>), the authors supplied the pipelines, configured packages, commands and utilities needed to reproduce the different tests carried out in the paper. As 78GB of test data and 30MB of tools and scripts is much more than other journals are able to handle, we have made all of these available from our <a href="http://gigadb.org/" title="GigaDB" target="_blank">GigaDB database</a> under separate citable DOIs. Taking this a step further, as well as being downloadable by FTP and via our <a href="http://asperasoft.com/" title="Aspera" target="_blank">Aspera</a> license (which can give 10-100X faster access), the software and analyses are currently being integrated into our Galaxy-based workflow and data platform. </p>
<p>While we have previously published software articles and pipeline studies combining reference datasets and tools (see our <a href="http://www.gigasciencejournal.com/content/1/1/3" title="Mouse methylome paper" target="_blank">methylome pipeline paper</a> with 84GB of <a href="http://dx.doi.org/10.5524/100035" title="Mouse methylome GigaDB data" target="_blank">supporting information</a>), this is the first paper for which we have given separate DOIs to the <a href="http://dx.doi.org/10.5524/100044" title="software DOI" target="_blank">tools</a> and the <a href="http://dx.doi.org/10.5524/100038" title="V2 YH genome DOI" target="_blank">data</a>. The rationale is that each can be credited to a potentially different group of authors, the data and the methods/analyses may be used and cited independently of each other, and each can be tracked and credited to its authors via the DOIs listed in their <a href="http://about.orcid.org/" title="ORCID about page" target="_blank">ORCID</a> accounts. We feel it is important to credit methods as well as data production, and while <a href="http://datacite.org/" title="DataCite" target="_blank">DataCite</a> currently recognizes “Software” as a resource type, we are working with them and encouraging them to add “Workflow” to their list of handled objects. </p>
<p>Work is ongoing on the workflow and data platform side, and we are currently reviewing a number of other software articles (called <a href="http://www.gigasciencejournal.com/authors/instructions/technicalnote" title="Technical Note I4A" target="_blank">Technical Notes</a> in <em>GigaScience</em>). If you have similar studies that you would like reviewed in a more transparent and constructive manner, please contact us at editorial@gigasciencejournal or submit them through our <a href="http://www.gigasciencejournal.com/manuscript" title="Submit to GigaScience" target="_blank">online submission system</a>. As the process is still evolving and being fine-tuned, we would welcome any feedback via this blog, <a href="https://twitter.com/gigascience" title="@gigascience" target="_blank">Twitter</a> or email. We would like to thank the reviewers of <a href="http://www.gigasciencejournal.com/content/1/1/18/prepub" title="SOAPdenovo prepub history" target="_blank">this</a> and our other manuscripts so far for their hard work, as well as the authors for being so helpful in making their work and data available in such a reproducible manner. Many journals are tentatively starting to experiment with more open approaches, but based on our positive experiences so far we would encourage them and others to be bolder and go all the way.</p>
<p><strong>Further Reading</strong><br />
<a href="http://www.gigasciencejournal.com/content/1/1/18/abstract" title="SOAPdenovo2 paper" target="_blank">1.</a> Luo R et al., SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. <em>GigaScience</em> 2012, <strong>1</strong>:18.<br />
<a href="http://www.bmj.com/content/318/7175/23?view=long&amp;pmid=9872878" title="BMJ open peer review quality paper" target="_blank">2.</a> van Rooyen S et al., Effect of open peer review on quality of reviews and on reviewers&#8217; recommendations: a randomised trial. <em>BMJ</em> 1999, <strong>318</strong>:23-7.<br />
<a href="http://bjp.rcpsych.org/content/176/1/47.long" title="Open peer review RCT paper" target="_blank">3.</a> Walsh E et al., Open peer review: a randomised controlled trial. <em>Br J Psychiatry</em> 2000, <strong>176</strong>:47-51.<br />
<a href="http://dx.doi.org/10.5524/100038" title="V2 YH genome DOI" target="_blank">4.</a> Wang, J; et al., (2012): Updated genome assembly of YH: the first diploid genome sequence of a Han Chinese individual (version 2, 07/2012). GigaScience Database. <a href="http://dx.doi.org/10.5524/100038" title="V2 YH genome DOI" target="_blank">http://dx.doi.org/10.5524/100038</a><br />
<a href="http://dx.doi.org/10.5524/100044" title="software DOI" target="_blank">5.</a> Luo, R; et al., (2012): Software and supporting material for “SOAPdenovo2: An empirically improved memory-efficient short read de novo assembly”. GigaScience Database. <a href="http://dx.doi.org/10.5524/100044" title="software DOI" target="_blank">http://dx.doi.org/10.5524/100044</a></p>
<p>UPDATE 24th Jan 2013: we have <a href="http://www.gigasciencejournal.com/content/2/1/1" title="GigaScience peer-review editorial" target="_blank">produced an editorial</a> on our peer-review policies based on this blog and the feedback we received on it. Also check out the great work the Homolog_us blog has done <a href="http://www.homolog.us/blogs/category/soapdenovo/" title="Homolog_us blog SOAPdenovo2 tag" target="_blank">testing and studying SOAPdenovo2</a>, making the source code even more transparent via this wiki: <a href="http://homolog.us/wiki/index.php?title=SOAPdenovo2" title="SOAPdenovo2 wiki" target="_blank">http://homolog.us/wiki/index.php?title=SOAPdenovo2</a> </p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.biomedcentral.com/gigablog/2012/12/27/opening-peer-review-our-new-paper-on-soapdenovo2-shows-how-it-works/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	<enclosure type="image/jpeg" length="78720" url="http://blogs.biomedcentral.com/gigablog/files/2012/12/ScientificReview-150x84.jpg" />	</item>
		<item>
		<title>Promoting Data Citation in Nature (and Pushing Past Panda Problems)</title>
		<link>http://blogs.biomedcentral.com/gigablog/2012/12/21/promoting-datacitation-in-nature/</link>
		<comments>http://blogs.biomedcentral.com/gigablog/2012/12/21/promoting-datacitation-in-nature/#comments</comments>
		<pubDate>Fri, 21 Dec 2012 10:58:19 +0000</pubDate>
		<dc:creator>Scott Edmunds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[data citation]]></category>
		<category><![CDATA[data publication]]></category>
		<category><![CDATA[DataCite]]></category>
		<category><![CDATA[GigaScience]]></category>
		<category><![CDATA[Nature]]></category>
		<category><![CDATA[pandas]]></category>
		<category><![CDATA[publishing]]></category>

		<guid isPermaLink="false">http://blogs.biomedcentral.com/gigablog/?p=338</guid>
		<description><![CDATA[<p><a href="http://blogs.biomedcentral.com/gigablog/files/2012/12/Slide11.jpg"></a>Regular readers of this blog will be aware of our <a href="http://blogs.biomedcentral.com/gigablog/tag/data-citation/" title="Data Citation in GigaBlog" target="_blank">efforts to promote data citation</a> using <a href="http://en.wikipedia.org/wiki/Digital_object_identifier" title="Wikipedia DOI entry" target="_blank">digital object identifiers</a> (DOIs), and this week, alongside Rebecca Lawrence from <em><a href="http://f1000research.com/" title="F1000 Research" target="_blank">F1000 Research</a></em> and Kevin Ashley from the <a href="http://www.dcc.ac.uk/" title="DCC homepage" target="_blank">Digital Curation Centre</a>, our Editor-in-Chief Laurie Goodman has a <a href="http://www.nature.com/nature/journal/v492/n7429/full/492356d.html" title="Nature DataCition correspondence" target="_blank">correspondence in <em>Nature</em></a> strongly making this case. The motivation to cite datasets comes from a recognition that data generated in the course of research are just as valuable to the ongoing academic discourse as papers, and should therefore be treated in the same manner. The correspondence makes this case, and ...</p><p class="clearfix"><a class="btn alignright continue-reading" href="http://blogs.biomedcentral.com/gigablog/2012/12/21/promoting-datacitation-in-nature/">Read more</a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://blogs.biomedcentral.com/gigablog/files/2012/12/Slide11.jpg"><img src="http://blogs.biomedcentral.com/gigablog/files/2012/12/Slide11-300x225.jpg" alt="" width="300" height="225" class="alignleft size-medium wp-image-341" /></a>Regular readers of this blog will be aware of our <a href="http://blogs.biomedcentral.com/gigablog/tag/data-citation/" title="Data Citation in GigaBlog" target="_blank">efforts to promote data citation</a> using <a href="http://en.wikipedia.org/wiki/Digital_object_identifier" title="Wikipedia DOI entry" target="_blank">digital object identifiers</a> (DOIs), and this week, alongside Rebecca Lawrence from <em><a href="http://f1000research.com/" title="F1000 Research" target="_blank">F1000 Research</a></em> and Kevin Ashley from the <a href="http://www.dcc.ac.uk/" title="DCC homepage" target="_blank">Digital Curation Centre</a>, our Editor-in-Chief Laurie Goodman has a <a href="http://www.nature.com/nature/journal/v492/n7429/full/492356d.html" title="Nature DataCition correspondence" target="_blank">correspondence in <em>Nature</em></a> strongly making this case. The motivation to cite datasets comes from a recognition that data generated in the course of research are just as valuable to the ongoing academic discourse as papers, and should therefore be treated in the same manner. The correspondence makes this case, and is timely given the recent launch of the <a href="http://wokinfo.com/products_tools/multidisciplinary/dci/" title="TR data citation index" target="_blank">Thomson Reuters Data Citation Index</a>. While datasets can be linked to database accessions and other identifiers, DOIs are more stable and permanent than URLs, and have the crucial advantage over alternatives of already being familiar to researchers, publishers and libraries. 
With the new <a href="http://about.orcid.org/" title="ORCID about page" target="_blank">ORCID</a> system allowing DOIs to be imported and linked to an author's other research works, funders such as NSF allowing datasets to be <a href="http://researchremix.wordpress.com/page/7/" title="HP NSF biosketch blog" target="_blank">listed in biosketches</a>, and <a href="http://wokinfo.com/products_tools/multidisciplinary/dci/" title="TR data citation index" target="_blank">data citation indexes</a> now allowing datasets to be tracked and credited to data producers, there are finally tangible benefits and incentives for data producers in creating and citing data DOIs.</p>
<p>We have written about this subject before, but by making the point in a high-profile journal such as <em>Nature</em>, the authors hope to bring it to a much wider audience and to lobby publishers such as NPG directly. This is tempered slightly by having to make these arguments in a closed-access forum, which means we are not able to reproduce the content here, although we are currently checking whether we can post the pre-print version. The published letter <a href="http://www.nature.com/nature/journal/v492/n7429/full/492356d.html" title="Nature DataCitation correspondence" target="_blank">“Data-set visibility: Cite links to data in reference lists”</a> is very short, but within the word limit the authors ask publishers, funders, researchers and institutions to make prominent linking of datasets to their associated research articles standard practice, and point to the <a href="http://www.dcc.ac.uk/resources/how-guides/cite-datasets" title="DCC data citation guidelines" target="_blank">DCC best practice guidelines</a> for more detailed instructions on how to do this. They further argue that this increased visibility and accessibility of datasets would also benefit peer reviewers and readers by “raising standards of data analysis, promoting more detailed review, encouraging data curation and boosting reproducibility and data reuse”.</p>
<p><strong>A brief history of data citation</strong><br />
Working with <a href="http://datacite.org/" title="DataCite" target="_blank">DataCite</a> and the British Library, our <a href="http://gigadb.org/" title="GigaDB" target="_blank">GigaDB</a> database issued its first DOI (for <a href="http://blogs.biomedcentral.com/gigablog/2011/08/03/notes-from-an-e-coli-tweenome-lessons-learned-from-our-first-data-doi/" title="Tweenome blog" target="_blank">the genome of the deadly outbreak strain of <em>E. coli</em></a>) in June 2011, and since then we and a growing number of data publishers (including <em><a href="http://f1000research.com/" title="F1000 Research" target="_blank">F1000 Research</a></em>, our co-authors on this correspondence) have been working closely with a number of publishers to allow the citation of datasets. The <em>Nature</em> correspondence makes the point that very few journals currently do this, but after our initially <a href="http://blogs.biomedcentral.com/gigablog/2011/10/21/gigadata-news-macaque-dois-published-in-nature-biotechnology/" title="Macaque posting" target="_blank">unsuccessful attempts</a> at getting DOIs included in the <em>NEJM</em> <em>E. coli</em> paper and in a <em>Nature Biotechnology</em> paper, our experiences of working with publishers have generally been more positive. Of the journals <a href="http://f1000research.com/about/?utm_source=jrnlbtn" title="F1000 Research poll" target="_blank">polled by F1000</a>, only Cell Press and <em>Annals of Oncology</em> have said they would have an issue with the pre-publication release of data in this manner. 
After we worked closely with the editors of <em>Genome Biology</em> to <a href="http://blogs.biomedcentral.com/gigablog/2012/05/11/adventures-in-data-citation-sorghum-as-a-standard-for-data-release/" title="Sorghum DOI" target="_blank">include the DOI of the Sorghum genome</a> in the references of a paper in November 2011, BioMed Central has used this example in its instructions for authors on how to cite datasets, and we published a <a href="http://www.biomedcentral.com/1756-0500/5/223" title="BMC Res Notes commentary" target="_blank">commentary</a> in the <em>BMC Research Notes</em> <a href="http://www.biomedcentral.com/bmcresnotes/series/datasharing" title="BMC Res Notes series" target="_blank">Data Sharing and Standardization series</a> highlighting this best practice. Since the sorghum example, PLoS, Springer, Science and a number of other publishers have started properly integrating DOIs from Dryad, Figshare and other data publication platforms into their references. </p>
<p><strong>Two Steps Forward, One Step Back</strong><br />
Following some initial difficulties getting the DOIs for the <a href="http://dx.doi.org/10.5524/100002" title="Macaque DOI" target="_blank">macaque</a> genomes into a <em>Nature Biotechnology</em> publication, in February 2012 the editors agreed to allow the <a href="http://blogs.biomedcentral.com/gigablog/2012/02/17/what-links-rna-editing-data-citation-and-ancient-chinese-emperors/" title="NBT DOI blog" target="_blank">first DOIs to be cited in the journal</a>. In October this year, two papers in the same issue of <em>Nature</em> itself included GigaDB datasets in their references, and this letter is a further welcome sign of support from <em>Nature</em> and NPG. It is clear, though, that there is still much work to be done, and the very week this letter was published brought a demonstration of the challenges still to be overcome. Having the correct editorial policies is one thing, but these need to be made clear in the instructions that authors, editors and production departments follow. In a paper just published in <em>Nature Genetics</em> on <a href="http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.2494.html" title="Nature Genetics Panda paper" target="_blank">panda population genomics</a>, the journal moved the DOIs for the <a href="http://dx.doi.org/10.5524/100004" title="Panda DOI" target="_blank">panda</a> and <a href="http://dx.doi.org/10.5524/100008" title="Polar Bear DOI" target="_blank">polar bear</a> genomes from the reference list to the article's URLs section. Relegating the datasets in this way not only prevents the data producers from receiving due credit for making their data publicly available; without the data in the references, they also lose the ability to count and track the citations in the data citation index. </p>
<p>Because the DOIs in the article were replaced with the URLs they redirect to (e.g. <a href="http://dx.doi.org/10.5524/100004" title="Panda DOI" target="_blank">http://dx.doi.org/10.5524/100004</a> was changed to the current GigaDB URL <a href="http://gigadb.org/giant-panda/" title="Panda URL" target="_blank">http://gigadb.org/giant-panda/</a>), the stability and persistence that DOIs bring is lost, and there is a much greater risk that these URLs will change in the future and no longer resolve. This is one of the reasons we have bought DOIs from the British Library for our datasets: as we are currently migrating <a href="http://gigadb.org/" title="GigaDB" target="_blank">GigaDB</a> to a new location and platform, DOIs give us the flexibility to move domains and locations without downstream users losing access and without citations having to change. Unlike the publicly accessible data <a href="http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.2494.html#/accessions" title="Accessions section" target="_blank">Accessions section</a> of the journal, and the reference section that is mined by citation indexes, the URLs section in <em>Nature Genetics</em> is hidden behind a paywall, so it cannot be mined or accessed in any way, entirely losing the advantages of linking literature and data. Appearing a few days after this example occurred, this <a href="http://www.nature.com/nature/journal/v492/n7429/full/492356d.html" title="Nature DataCitation correspondence" target="_blank">letter in <em>Nature</em></a> will hopefully raise awareness, help prevent such incidents in the future, and speed up the acceptance and treatment of data as first-class records of research.</p>
<p><strong>Further Reading</strong><br />
<a href="http://www.nature.com/nature/journal/v492/n7429/full/492356d.html" title="Nature DataCitation correspondence" target="_blank">1.</a>	Goodman, L., Lawrence, R. &amp; Ashley, K. Data-set visibility: Cite links to data in reference lists. Nature 492, 356 (2012).<br />
<a href="http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.2494.html" title="Nature Genetics Panda paper" target="_blank">2.</a>	Zhao, S. et al. Whole-genome sequencing of giant pandas provides insights into demographic history and local adaptation. Nature Genetics (2012). doi:10.1038/ng.2494<br />
<a href="http://www.biomedcentral.com/1756-0500/5/223" title="BMC Res Notes commentary" target="_blank">3.</a>	Edmunds, S. et al. Adventures in data citation: sorghum genome data exemplifies the new gold standard. BMC Research Notes 5, 223 (2012). </p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.biomedcentral.com/gigablog/2012/12/21/promoting-datacitation-in-nature/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	<enclosure type="image/jpeg" length="208110" url="http://blogs.biomedcentral.com/gigablog/files/2012/12/Slide11-150x112.jpg" />	</item>
	</channel>
</rss>
