
Evaluation of human-readable annotation in biomolecular sequence databases with biological rule libraries

Bioinformatics 15(7-8): 528-535

Motivation: Computer-based selection of entries from sequence databases by a shared functional description, e.g. a common cellular localization or a contribution to the same phenotypic function, is a difficult task. Automatic semantic analysis of annotations is hampered not only by incomplete functional assignments. A major problem is that annotations are written in a rich, non-formalized language and are meant to be read by a human expert, who can extract considerably more information from the text than is immediately apparent by drawing on extensive biological background knowledge and logical reasoning.

Approach: A technique for automated annotation evaluation has been developed that combines lexical analysis with the use of biological rule libraries. The proposed algorithm generates new functional descriptors from the annotation of a given entry, using the semantic units of the annotation as premises for implications executed in accordance with the rule library.

Results: The prototype of a software system, the MetaA(nnotator) program, is described, and the results of its application to sequence attribute assignment and sequence selection problems, such as cellular localization and sequence domain annotation of SWISS-PROT entries, are presented. The current software version assigns useful subcellular localization qualifiers to approximately 88% of all SWISS-PROT entries. As shown by demonstrative examples, the combination of sequence and annotation analysis is a powerful approach for detecting mutual annotation/sequence inconsistencies.

Availability: The software is available on request from Frank.Eisenhaber@embl-heidelberg.de. Results for the cellular localization assignment can be viewed at the URL http://www.bork.embl-heidelberg.de/CELLLOC/CELLLOC.html.

Contact: Frank.Eisenhaber@EMBL-Heidelberg.DE.
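The approach described above, in which semantic units extracted from an annotation serve as premises for rules that yield new descriptors, can be illustrated with a minimal forward-chaining sketch. This is not the MetaA(nnotator) implementation, which the abstract does not detail. The rule contents, the tokenizer, and the names `RULES`, `semantic_units` and `derive_descriptors` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical rule library (an assumption, not taken from the paper):
# each rule is (set of premise terms, derived descriptor).
RULES = [
    (frozenset({"histone"}), "nuclear"),
    (frozenset({"dna-binding"}), "nuclear"),
    (frozenset({"signal"}), "secretory pathway"),
    (frozenset({"secretory pathway", "transmembrane"}), "membrane"),
]


def semantic_units(annotation: str) -> set[str]:
    """Crude lexical analysis: lowercase the text and collect word-like tokens."""
    return set(re.findall(r"[a-z0-9][a-z0-9\-]*", annotation.lower()))


def derive_descriptors(annotation: str, rules=RULES) -> set[str]:
    """Apply the rules repeatedly until no new descriptor can be derived."""
    facts = semantic_units(annotation)
    derived = set()
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all of its premises are known facts.
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                derived.add(conclusion)
                changed = True
    return derived


if __name__ == "__main__":
    print(derive_descriptors("Histone H2A; DNA-binding"))
    print(derive_descriptors("Signal peptide; transmembrane region"))
```

Because derived descriptors are added back to the fact set, one rule's conclusion can act as a premise for another, which is how a chain of implications can surface information that is not stated literally in the annotation text.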

Accession: 010616236

PMID: 10487860

DOI: 10.1093/bioinformatics/15.7.528
