Total ancestry measure: quantifying the similarity in tree-like classification, with genomic applications

Bioinformatics 23(16): 2163-2173

Many classifications of protein function, such as Gene Ontology (GO), are organized as directed acyclic graphs (DAGs). In these classifications the proteins are terminal leaf nodes, and the categories 'above' them are functional annotations at various levels of specialization. Computing a numerical measure of relatedness between two arbitrary proteins is an important proteomics problem. Analogous problems arise in other contexts of large-scale information organization, e.g. the Wikipedia online encyclopedia and the Yahoo and DMOZ web page classification schemes. Here we develop a simple probabilistic approach for computing this relatedness quantity, which we call the total ancestry method. Our measure is based on counting the number of leaf nodes that share exactly the same set of 'higher up' category nodes, relative to the total number of classified pairs (i.e. the chance of the same total ancestry). We show that this measure is associated with a power-law distribution, which allows quick assessment of the statistical significance of shared functional annotations. We formally compare it with other quantitative functional similarity measures (such as shortest path within a DAG, lowest common ancestor shared and Azuaje's information-theoretic similarity) and provide concrete metrics to assess the differences. Finally, we provide a practical implementation of our total ancestry measure for GO and the MIPS functional catalog and give two applications of it in specific functional genomics contexts. The implementations and results are available through our supplementary website at: http://gersteinlab.org/proj/funcsim. Supplementary data are available at Bioinformatics online.
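The counting idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes one particular reading of the measure, in which the "total ancestry" of a pair is the intersection of the two leaves' ancestor sets, and the score of a pair is the fraction of all leaf pairs whose shared ancestor set is exactly the same. The toy hierarchy and the names `PARENTS`, `LEAVES`, `ancestors` and `total_ancestry_scores` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy classification: each node maps to its list of parents.
# p1..p5 stand in for proteins (leaf nodes); the rest are categories.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "p1": ["C"],
    "p2": ["C"],
    "p3": ["A"],
    "p4": ["B"],
    "p5": ["B"],
}
LEAVES = ["p1", "p2", "p3", "p4", "p5"]


def ancestors(node, parents):
    """Return every category reachable upward from node (multiple parents allowed)."""
    seen = set()
    stack = list(parents[node])
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(parents[cur])
    return frozenset(seen)


def total_ancestry_scores(leaves, parents):
    """Assumed reading: score(pair) = (# pairs with the identical shared
    ancestor set) / (total # of leaf pairs)."""
    anc = {leaf: ancestors(leaf, parents) for leaf in leaves}
    shared = {
        frozenset((a, b)): anc[a] & anc[b]
        for a, b in combinations(leaves, 2)
    }
    counts = Counter(shared.values())
    n_pairs = len(shared)
    return {pair: counts[s] / n_pairs for pair, s in shared.items()}


scores = total_ancestry_scores(LEAVES, PARENTS)
for pair, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(sorted(pair), round(score, 2))
```

Under this reading, a pair whose shared ancestry is rare among all pairs (here p1 and p2, which share the specific category C) receives a small value, while pairs that share only the root receive a large one.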

Accession: 018187389

PMID: 17540677

DOI: 10.1093/bioinformatics/btm291
