Comprehensive and relaxed search for oligonucleotide signatures in hierarchically clustered sequence datasets

Bioinformatics 27(11): 1546-1554

PCR, hybridization, DNA sequencing and other important methods in molecular diagnostics rely on both sequence-specific and sequence group-specific oligonucleotide primers and probes. Their design depends on the identification of oligonucleotide signatures in whole genome or marker gene sequences. Although genome and gene databases are generally available and regularly updated, collections of valuable signatures are rare. Even for single requests, the search for signatures becomes computationally expensive when working with large collections of target (and non-target) sequences. Moreover, with growing dataset sizes, the chance of finding exact group-matching signatures decreases, necessitating the application of relaxed search methods. The resultant substantial increase in complexity is exacerbated by the dearth of algorithms able to solve these problems efficiently. We have developed CaSSiS, a fast and scalable method for computing comprehensive collections of sequence- and sequence group-specific oligonucleotide signatures from large sets of hierarchically clustered nucleic acid sequence data. Based on the ARB Positional Tree (PT-)Server and a newly developed BGRT data structure, CaSSiS not only determines sequence-specific signatures and perfect group-covering signatures for every node within the cluster (i.e. target groups), but also signatures with maximal group coverage (sensitivity) within a user-defined range of non-target hits (specificity) for groups lacking a perfect common signature. An upper limit of tolerated mismatches within the target group, as well as the minimum number of mismatches with non-target sequences, can be predefined. Test runs with one of the largest phylogenetic gene sequence datasets available indicate good runtime and memory performance, and in silico spot tests have shown the usefulness of the resulting signature sequences as blueprints for group-specific oligonucleotide probes. 
Software and Supplementary Material are available at http://cassis.in.tum.de/.
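The relaxed search described above (maximal coverage of a target group within a tolerated number of non-target hits) can be illustrated with a small brute-force sketch. This is not the CaSSiS implementation: it does not use the ARB PT-Server or the BGRT structure, it matches k-mers exactly rather than with mismatches, and it will not scale to large datasets. The function names, the toy sequences and the `max_outgroup_hits` parameter are assumptions made purely for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_signatures(sequences, group, k, max_outgroup_hits=0):
    """Naive search for k-mers with maximal coverage of `group`.

    sequences: dict mapping sequence id -> nucleotide string
    group: ids of the target sequences (e.g. one node of the cluster tree)
    max_outgroup_hits: tolerated number of non-target sequences matched

    Returns (coverage, sorted list of k-mers reaching that coverage).
    """
    # Index: k-mer -> ids of all sequences that contain it exactly.
    hits = defaultdict(set)
    for seq_id, seq in sequences.items():
        for kmer in kmers(seq, k):
            hits[kmer].add(seq_id)

    group = set(group)
    best_cov, best = 0, []
    for kmer, ids in hits.items():
        # Specificity filter: skip k-mers with too many non-target hits.
        if len(ids - group) > max_outgroup_hits:
            continue
        # Sensitivity: number of target sequences the k-mer matches.
        cov = len(ids & group)
        if cov > best_cov:
            best_cov, best = cov, [kmer]
        elif cov == best_cov and cov > 0:
            best.append(kmer)
    return best_cov, sorted(best)


# Hypothetical toy data: four short sequences and three nested target groups.
sequences = {
    "a": "ACGTACGG",
    "b": "ACGTTCGG",
    "c": "TTGTACGA",
    "d": "GGCATTCA",
}
groups = {"a": {"a"}, "ab": {"a", "b"}, "abc": {"a", "b", "c"}}

for name, members in groups.items():
    print(name, best_signatures(sequences, members, k=4))
```

In this toy data the group `ab` has a perfect signature (a 4-mer found in both members and nowhere else), whereas `abc` has no 4-mer shared by all three members, so the search falls back to the signatures with the highest partial coverage. That fallback is the situation the relaxed search is meant to handle.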

Accession: 052269959

PMID: 21471017

DOI: 10.1093/bioinformatics/btr161
