A quality control algorithm for DNA sequencing projects


Nucleic Acids Research 21(16): 3829-3838
Heterologous DNA sequences from rearrangements with the genomes of host cells, genomic fragments from hybrid cells, or impure tissue sources can threaten the purity of libraries that are derived from RNA or DNA. Hybridization methods can only detect contaminants from known or suspected heterologous sources, and whole library screening is technically very difficult. Detection of contaminating heterologous clones by sequence alignment is only possible when related sequences are present in a known database. We have developed a statistical test to identify heterologous sequences that is based on the differences in hexamer composition of DNA from different organisms. This test does not require that sequences similar to potential heterologous contaminants are present in the database, and can in principle detect contamination by previously unknown organisms. We have applied this test to the major public expressed sequence tag (EST) data sets to evaluate its utility as a quality control measure and a peer evaluation tool. There is detectable heterogeneity in most human and C. elegans EST data sets, but it is apparently not associated with cross-species contamination. However, there is direct evidence for both yeast and bacterial sequence contamination in some public database sequences annotated as human. Results obtained with the hexamer test have been confirmed with similarity searches using sequences from the relevant data sets.
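
The abstract does not give the exact statistic used, so the following is only a minimal sketch of the general idea of scoring a sequence by its hexamer composition. It assumes a simple log-odds score between two smoothed hexamer frequency tables; the function names (hexamer_counts, hexamer_frequencies, log_odds), the pseudocount, and the toy reference sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log


def hexamer_counts(seq, k=6):
    """Count overlapping k-mers (hexamers by default), skipping any with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def hexamer_frequencies(seqs, k=6, pseudocount=1.0):
    """Return a function mapping a k-mer to its smoothed frequency in the reference set.

    The additive pseudocount is an assumption of this sketch; it keeps unseen
    hexamers from producing zero probabilities.
    """
    counts = Counter()
    for s in seqs:
        counts.update(hexamer_counts(s, k))
    total = sum(counts.values()) + pseudocount * 4 ** k
    return lambda kmer: (counts[kmer] + pseudocount) / total


def log_odds(seq, freq_expected, freq_other, k=6):
    """Sum log(P_other / P_expected) over the hexamers of seq.

    Positive scores mean the hexamer composition looks more like the 'other'
    reference than the expected organism.
    """
    score = 0.0
    for kmer, n in hexamer_counts(seq, k).items():
        score += n * log(freq_other(kmer) / freq_expected(kmer))
    return score


# Toy reference sets (illustrative only, not real genomic data).
expected_freq = hexamer_frequencies(["ACGTACGTACGTACGTACGT" * 5])
other_freq = hexamer_frequencies(["GGGCCCGGGCCCGGGCCC" * 5])

score_like_expected = log_odds("ACGTACGTACGT", expected_freq, other_freq)
score_like_other = log_odds("GGGCCCGGGCCC", expected_freq, other_freq)
```

In this sketch a clone would be flagged for follow-up (for example by a similarity search) when its score is positive by some margin. A real application would estimate the frequency tables from large sequence collections for each organism and calibrate a threshold on sequences of known origin.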


Accession: 002287578

PMID: 8367301

DOI: 10.1093/nar/21.16.3829


