Error filtering, pair assembly and error correction for next-generation sequencing reads

RC Edgar, H Flyvbjerg - Bioinformatics, 2015 - academic.oup.com

Abstract

Motivation: Next-generation sequencing produces vast amounts of data with errors that are difficult to distinguish from true biological variation when coverage is low.

Results: We demonstrate large reductions in error frequencies, especially for high-error-rate reads, by three independent means: (i) filtering reads according to their expected number of errors, (ii) assembling overlapping read pairs and (iii) for amplicon reads, by exploiting unique sequence abundances to perform error correction. We also show that most published paired read assemblers calculate incorrect posterior quality scores.
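As an illustration of means (i), the expected number of errors in a read can be computed from its Phred quality scores: a score Q implies an error probability of 10^(-Q/10), and summing these probabilities over the read gives the expected error count. The sketch below is a minimal Python version, not the USEARCH implementation. The function names, the Phred+33 ASCII offset, and the threshold of 1.0 are assumptions chosen for the example.

```python
def expected_errors(quality: str, offset: int = 33) -> float:
    """Return the expected number of errors for one read.

    `quality` is the FASTQ quality string; `offset` is the ASCII offset
    (33 is assumed here, i.e. Phred+33 encoding).
    """
    # Each character encodes Q; the error probability is 10^(-Q/10).
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def passes_filter(quality: str, max_ee: float = 1.0, offset: int = 33) -> bool:
    """Keep a read only if its expected error count is at most `max_ee`.

    The default threshold of 1.0 is an illustrative assumption.
    """
    return expected_errors(quality, offset) <= max_ee


if __name__ == "__main__":
    high_quality = "IIII"      # four bases at Q40
    low_quality = "+" * 12     # twelve bases at Q10
    print(passes_filter(high_quality), passes_filter(low_quality))
```

Because the sum weights every base by its own error probability, a read with a few very poor bases can be rejected even when its average quality score looks acceptable.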

Availability and implementation: These methods are implemented in the USEARCH package. Binaries are freely available at http://drive5.com/usearch.

Contact: robert@drive5.com

Supplementary information: Supplementary data are available at Bioinformatics online.

Oxford University Press

Cited by 1126
