Publish Data: Fight World Hunger


3000 Rice Genome Sequences Made Publicly Available on World Hunger Day
Yesterday marked the publication in GigaScience of the first data from the 3,000 Rice Genomes Project, a collaboration between the Chinese Academy of Agricultural Sciences (CAAS), the International Rice Research Institute (IRRI), and BGI, as well as a commentary from the Directors of these institutes outlining the goals of this ambitious project. This is our biggest Data Note to date: the release of this enormous dataset in our GigaDB repository quadruples the current amount of publicly available rice sequence data, and the project ranks amongst the largest genome projects, in terms of samples, that have been published to date (see Nick Loman’s “biggest sequencing projects uber list”, where the project would currently rank as the largest non-microbial genome project). This demonstrates the profound effect that whole-genome sequencing is having on the study of biology, and the size and scope that the field of genomics has now reached. Following closely on from the large worm imaging dataset we also published as a Data Note this month, this paper further shows the utility of our (“big”) data publishing approach in aiding and incentivizing the rapid release of data in such important areas of research as this.

The publication coincided with World Hunger Day, highlighting one of the primary goals of this project: to develop resources that will aid in improving global food security, especially in the poorest areas of the world. With more than one-eighth of the world’s population living in extreme hunger and poverty, and the world population estimated to reach 9.6 billion by 2050, there is a huge need to create new resources to improve crop yield, reduce the impact on the environment, and develop food crops that are high-yielding and nutritious and can grow successfully in environments stressed by drought, pests, diseases, or soil degradation. While rice research has greatly advanced since the completion of the first high-quality rice genome sequence in 2005, there has been limited change in breeding practices that are important for producing improved and better adapted rice strains.

The 3,000 Rice Genomes Project provides a major step forward in addressing these challenges by creating and releasing an extensive amount of genetic information that can ultimately be applied to intelligent breeding practices. These practices take advantage of the natural variation between different plant strains, together with information on the genetic mechanisms that underlie their traits, to select strains for breeding that are more likely to produce hybrids well suited to growing in different environments. This work is only the first stage of an ongoing project, and has been funded by the Bill and Melinda Gates Foundation and the Chinese Ministry of Science and Technology as part of their goals to kickstart a second green revolution and develop badly needed new strains of rice tailored to different environments (see the recent Economist article highlighting these efforts). The Gates Foundation has taken a particular interest in improving agricultural development through technological innovations, and Bill Gates visited our BGI Hong Kong offices last year to see how this and other BGI collaborations were going.

Dr. Zhikang Li (the Project Director at CAAS), stated that the 3000 Rice Genomes Project is part of an ongoing effort to provide resources specifically for poverty-stricken farmers in Africa and Asia, aiming to reach at least 20 million rice farmers in 16 target countries. “Rice is the staple food for most Asian people, and has increasing consumption in Africa,” said Dr. Li. “With decreasing resources (water and land), food security is —and will be— the most challenging issue in these countries, both currently and in the future. As a scientist in rice genetics, breeding and genomics, it would be a dream to help to solve this problem.”

Dr. Jun Wang (BGI Director and member of our Editorial board), added to this, saying that, “the population boom and worsening climate crisis have presented big challenges on global food shortage and safety. BGI is dedicated to applying genomics technologies to make a fast, controllable and highly efficient molecular breeding model possible. This opens a new way to carry out agricultural breeding. With the joined forces with CAAS, IRRI and Gates Foundation, we have made a step forward in big-data-based crop research and digitalized breeding. We believe every step will get us closer to the ultimate goal of improving the wellbeing of human race.”

According to the corresponding author of our accompanying commentary, IRRI director general Dr. Robert Zeigler, “access to 3,000 genomes of rice sequence data will tremendously accelerate the ability of breeding programs to overcome key hurdles mankind faces in the near future.” This collaborative project, added Zeigler, “will add an immense amount of knowledge to rice genetics, and enable detailed analysis by the global research community to ultimately benefit the poorest farmers who grow rice under the most difficult conditions.”

Genomics-assisted breeding to the rescue
To reach their goals, the three-institute collaboration has not only released 13.4 terabytes of data into the public domain, but has also collected seeds from each strain (available in the International Rice Genebank Collection housed at IRRI). Having banked seeds is essential to make full use of these now genetically defined strains to develop and sustain the most appropriate hybrid strains for different environments. There remains, however, one additional component needed to achieve this goal: information that allows researchers and breeders to directly link the genetic information (genotype) to the physical traits (phenotype) of these different strains. This requires careful assessment and curation of each rice strain for agriculturally important traits, which can then be linked to genetic markers in the now available genome sequences.

Current breeding practices, which have essentially remained the same since the development of agriculture, typically use apparent physical traits to guide strain selection for crossbreeding, in the hope that the offspring will combine and improve on the desired traits, such as resistance to drought, pests, and disease, increased crop productivity, and improved nutritional value. However, the underlying genetic makeup can often confound breeder expectations, because unknown genetic interactions can block, modify, or alter the development of the selected physical characteristics when two strains are bred. Thus, trial and error and multiple successive breeding stages are often required.

Having full knowledge of the genetic makeup of a plant allows researchers to identify genetic markers related to specific physical traits, and to better understand how different genetic interactions affect plant phenotypes. This information allows a breeder to make more intelligent choices in strain selection, resulting in more accurate and rapid development of rice strains that are better suited to different agricultural environments in poor and environmentally stressed economies.

This genomics-assisted breeding approach (which differs from GMO) is a process that requires a great deal of care and manpower. Thus, releasing these data and making the genetic information freely available to plant breeders and scientists across the world will greatly aid in defining genotype/phenotype relationships, as well as serve as an extensive resource for improving our understanding of plant biology.

In addition to the terabytes of supporting data hosted in GigaDB, sequence reads for this project have also been submitted to the SRA repository under accession PRJEB6180.

Further Reading
1. The 3,000 Rice Genomes Project. The 3,000 Rice Genomes Project. GigaScience 2014, 3:7 http://dx.doi.org/10.1186/2047-217X-3-7

2. Li, J-Y, Wang, J. and Zeigler, R. The 3000 Rice Genomes Project: new opportunities and challenges for future rice research. GigaScience 2014, 3:8 http://dx.doi.org/10.1186/2047-217X-3-8

3. The 3000 Rice Genomes Project (2014): The Rice 3000 Genomes Project Data. GigaScience Database. http://dx.doi.org/10.5524/200001