Compression of next-generation sequencing quality scores using memetic algorithm

BMC Bioinformatics 15 Suppl 15: S10

The exponential growth of DNA data from next-generation sequencing (NGS) poses great challenges for data storage and transmission. Although many compression algorithms have been proposed for the DNA reads in NGS data, few are designed specifically to handle the quality scores. In this paper we present MMQSC, an NGS quality score compressor based on a memetic algorithm (MA). The algorithm extracts the raw quality score sequences from FASTQ-formatted files and designs a compression codebook using MA-based multimodal optimization. The input data are then compressed by substitution. Experimental results on five representative NGS data sets show that MMQSC obtains a higher compression ratio than other state-of-the-art methods. Notably, MMQSC is a lossless, reference-free compression algorithm, yet it obtains an average compression ratio of 22.82% on the experimental data sets. MMQSC thus compresses NGS quality score data effectively and can be used to improve the overall compression ratio on FASTQ-formatted files.
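The pipeline described in the abstract (extract the quality lines, build a codebook, substitute) can be illustrated with a minimal sketch. This is not the MMQSC implementation: the memetic multimodal optimization is replaced here by a plain frequency count, and every function name, the fixed codeword length `k`, and the one-byte code layout are assumptions made for illustration only. It also assumes four-line FASTQ records and quality characters in the printable ASCII range 33 to 126, so that byte values 128 and above are free to act as codes.

```python
from collections import Counter


def extract_quality_lines(fastq_text):
    """Return the quality string (4th line) of each 4-line FASTQ record."""
    lines = fastq_text.strip().splitlines()
    return [lines[i + 3].strip() for i in range(0, len(lines) - 3, 4)]


def build_codebook(quals, k=4, max_entries=64):
    """Pick the most frequent length-k substrings.

    Assumption: a simple frequency count stands in for the paper's
    memetic-algorithm search, which is not reproduced here.
    """
    if max_entries > 128:
        raise ValueError("only 128 one-byte codes (128..255) are available")
    counts = Counter()
    for q in quals:
        for i in range(len(q) - k + 1):
            counts[q[i:i + k]] += 1
    return [s for s, _ in counts.most_common(max_entries)]


def encode(q, codebook):
    """Greedy left-to-right substitution.

    A codebook hit becomes one byte >= 128; anything else is copied
    as its literal ASCII byte (< 128), so decoding is unambiguous.
    """
    index = {s: i for i, s in enumerate(codebook)}
    k = len(codebook[0]) if codebook else 0
    out = bytearray()
    i = 0
    while i < len(q):
        chunk = q[i:i + k]
        if k and chunk in index:
            out.append(128 + index[chunk])
            i += k
        else:
            out.append(ord(q[i]))
            i += 1
    return bytes(out)


def decode(data, codebook):
    """Invert encode(): expand code bytes, pass literal bytes through."""
    return "".join(codebook[b - 128] if b >= 128 else chr(b) for b in data)


# Hypothetical two-read FASTQ snippet used only to exercise the sketch.
sample = "@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTACGT\n+\nIIIIHHHH\n"
quals = extract_quality_lines(sample)
book = build_codebook(quals)
packed = [encode(q, book) for q in quals]
assert [decode(p, book) for p in packed] == quals  # lossless round trip
ratio = sum(len(p) for p in packed) / sum(len(q) for q in quals)
print(f"compressed/original size (codebook cost ignored): {ratio:.2%}")
```

The ratio printed here is compressed size over original size and ignores the cost of storing the codebook; whether that matches the definition behind the 22.82% figure is not stated in the abstract, so the two numbers should not be compared directly.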

Accession: 057482442

PMID: 25474747

DOI: 10.1186/1471-2105-15-S15-S10
