REAPR: a new genome assembly evaluation tool to rule them all

In January we published CGAL, a new metric for evaluating the quality of genome assemblies. The article grew out of a fairly recent recognition in the field that the traditionally defined N50 metric is not sufficient on its own and that new approaches are needed. It should come as no surprise, then, that CGAL was not the only article to tackle this issue in recent months.

Some of the proposed solutions, including CGAL, are based on assembly likelihoods; others use modified N50 metrics. But they all fall short in one respect: ease of use. And this should be a priority. Gargantuan genome sequencing projects, aiming to complete not just one genome but as many as thousands of them, are proliferating like rabbits. In that setting, the ability to simply sit down with a tool that is easy to apply and gives useful output has never been more important.


And this is exactly what REAPR, a new software tool published this week in Genome Biology, sets out to achieve. Its authors, Dr Otto and colleagues at the Sanger Institute, provide a tool that can identify errors in genome assemblies without the need for a reference sequence.

Although it was primarily validated on bacterial, Plasmodium and C. elegans sequences, REAPR readily scales up to mammalian genomes, including the human genome. The authors also demonstrate how REAPR can be used to monitor progress in ongoing sequencing projects. Combine all that with implementations for Windows and Mac operating systems, as well as clear and thorough documentation, and you end up with a tool that simply cuts to the chase.

View the latest posts on the On Biology homepage