A quality control algorithm for DNA sequencing projects

Nucleic Acids Research 21(16): 3829-3838

Heterologous DNA sequences arising from rearrangements with host-cell genomes, from genomic fragments in hybrid cells, or from impure tissue sources can threaten the purity of libraries derived from RNA or DNA. Hybridization methods can only detect contaminants from known or suspected heterologous sources, and whole-library screening is technically very difficult. Detection of contaminating heterologous clones by sequence alignment is only possible when related sequences are already present in a known database. We have developed a statistical test to identify heterologous sequences that is based on differences in the hexamer composition of DNA from different organisms. This test does not require that sequences similar to potential heterologous contaminants be present in the database, and it can in principle detect contamination by previously unknown organisms. We have applied this test to the major public expressed sequence tag (EST) data sets to evaluate its utility as a quality control measure and a peer evaluation tool. There is detectable heterogeneity in most human and C. elegans EST data sets, but it does not appear to be associated with cross-species contamination. However, there is direct evidence of both yeast and bacterial sequence contamination in some public database sequences annotated as human. Results obtained with the hexamer test have been confirmed by similarity searches using sequences from the relevant data sets.
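The abstract does not give the exact test statistic, so the following is only a hypothetical sketch of the general idea of hexamer-composition scoring, not the authors' method. It assumes two reference sets of sequences (for example, a presumed host organism and a suspected contaminant), estimates smoothed frequencies for all 4^6 = 4096 hexamers from each, and scores a query sequence by a log-likelihood ratio summed over its overlapping hexamers. The function names, the add-one pseudocount, and the treatment of hexamers as independent are all illustrative assumptions.

```python
import math
from collections import Counter
from itertools import product

K = 6            # word length: hexamers
ALPHABET = "ACGT"


def hexamer_counts(seq, k=K):
    """Count overlapping k-mers, skipping any window that contains a non-ACGT character."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(c in ALPHABET for c in word):
            counts[word] += 1
    return counts


def hexamer_log_freqs(training_seqs, k=K, pseudocount=1.0):
    """Estimate log frequencies for all 4**k words, with additive (pseudocount) smoothing.

    Assumption: hexamers are treated as independent draws; this is an illustration,
    not the statistic used in the paper.
    """
    total = Counter()
    for s in training_seqs:
        total.update(hexamer_counts(s, k))
    denom = sum(total.values()) + pseudocount * 4 ** k
    return {
        "".join(w): math.log((total["".join(w)] + pseudocount) / denom)
        for w in product(ALPHABET, repeat=k)
    }


def log_likelihood_ratio(seq, model_a, model_b, k=K):
    """Sum over the query's hexamers of log P_a(w) - log P_b(w).

    A positive score means the query's hexamer composition is closer to model A.
    """
    counts = hexamer_counts(seq, k)
    return sum(c * (model_a[w] - model_b[w]) for w, c in counts.items())
```

In practice one would train each model on a large body of sequence from the relevant organism and calibrate a score threshold on sequences of known origin; the toy GC-rich and AT-rich training sets used in the check below only demonstrate that the score's sign follows composition.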

PMID: 8367301

DOI: 10.1093/nar/21.16.3829
