Tidying up International Nucleotide Sequence Databases: ecological, geographical and sequence quality annotation of ITS sequences of mycorrhizal fungi

PLoS ONE 6(9): e24940

Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 were considered to be of low read quality. Information on country of collection, geographical coordinates, interacting taxon and isolation source was added to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In the European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi.
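To put the reported figures in proportion, the short Python sketch below converts the counts in the abstract into shares of the 35,632 sequences and turns the coverage percentages back into approximate sequence counts. The helper names (`percent`, `approx_count`) are illustrative and not part of UNITE or PlutoF, and the back-calculated counts are only estimates, since the abstract reports percentages rounded to one decimal place.

```python
# Figures as reported in the abstract.
TOTAL = 35_632        # sequences from mycorrhizal fungi or mycorrhizal roots
CHIMERIC = 677        # sequences considered chimeric
LOW_QUALITY = 2_174   # sequences considered to be of low read quality

# Metadata coverage after annotation, in percent of TOTAL.
COVERAGE = {
    "country of collection": 78.0,
    "geographical coordinates": 33.0,
    "interacting taxon": 41.7,
    "isolation source": 96.4,
}


def percent(part, whole):
    """Share of `part` in `whole`, in percent, rounded to one decimal."""
    return round(100 * part / whole, 1)


def approx_count(pct, whole):
    """Approximate number of items corresponding to a rounded percentage."""
    return round(whole * pct / 100)


if __name__ == "__main__":
    print(f"chimeric: {percent(CHIMERIC, TOTAL)}% of sequences")
    print(f"low read quality: {percent(LOW_QUALITY, TOTAL)}% of sequences")
    for field, pct in COVERAGE.items():
        # Estimates only: the source gives percentages, not raw counts.
        print(f"{field}: about {approx_count(pct, TOTAL)} sequences")
```

The two quality categories are reported separately in the abstract, so the sketch does not add them together; whether any sequence falls into both is not stated.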

PMID: 21949797

DOI: 10.1371/journal.pone.0024940