Fast assignment of protein structures to sequences using the intermediate sequence library PDB-ISL

Bioinformatics 16(2): 117-124

Motivation: For large-scale structural assignment to sequences, as in computational structural genomics, a fast yet sensitive sequence search procedure is essential. A new approach using intermediate sequences was tested as a shortcut to iterative multiple sequence search methods such as PSI-BLAST.

Results: A library containing potential intermediate sequences for proteins of known structure (PDB-ISL) was constructed. The sequences in the library were collected from a large sequence database with the program PSI-BLAST, using the sequences of the domains of proteins of known structure as queries. Sequences of proteins of unknown structure can be matched to distantly related proteins of known structure by using pairwise sequence comparison methods to find homologues in PDB-ISL. Searches of PDB-ISL were calibrated, and the number of correct matches found at a given error rate was the same as that found by PSI-BLAST. The advantage of this library is that it uses pairwise sequence comparison methods, such as FASTA or BLAST2, and can therefore be searched easily and, in many cases, much more quickly than with an iterative multiple sequence comparison method. The procedure is roughly 20 times faster than PSI-BLAST for small genomes and several hundred times faster for large genomes.

Availability: Sequences can be submitted to the PDB-ISL servers and can be downloaded.
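The lookup described in the abstract can be illustrated with a minimal sketch: each library entry records which domain of known structure collected it, so a single pairwise hit against the library yields a structural assignment. Everything named here is hypothetical and not taken from the paper: the `IntermediateSequence` record, the `assign_structure` function, the domain identifiers, the 0.6 threshold, and in particular `pairwise_score`, which uses Python's `difflib` as a toy stand-in for a real pairwise comparison program such as FASTA or BLAST2.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class IntermediateSequence:
    """One library entry (hypothetical layout, not the PDB-ISL file format)."""
    seq_id: str         # made-up identifier of the intermediate sequence
    sequence: str       # amino-acid sequence
    source_domain: str  # known-structure domain whose search collected this entry


def pairwise_score(a: str, b: str) -> float:
    """Toy similarity in [0, 1]. A stand-in for FASTA/BLAST2, not an alignment score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def assign_structure(
    query: str,
    library: Iterable[IntermediateSequence],
    threshold: float = 0.6,  # assumed cutoff; the paper calibrates real error rates
) -> Optional[Tuple[str, str, float]]:
    """Return (source_domain, seq_id, score) for the best library hit, or None."""
    best: Optional[IntermediateSequence] = None
    best_score = 0.0
    for entry in library:
        score = pairwise_score(query, entry.sequence)
        # Keep only hits at or above the cutoff, preferring the highest score.
        if score >= threshold and (best is None or score > best_score):
            best, best_score = entry, score
    if best is None:
        return None
    # The structure is assigned via the domain that collected the matched entry.
    return best.source_domain, best.seq_id, best_score
```

The design point this is meant to show is that the expensive iterative search happens once, when the library is built; each query afterwards needs only one pass of pairwise comparisons, which is where the reported speed-up over running PSI-BLAST per query would come from.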

Accession: 010656582

PMID: 10842732

DOI: 10.1093/bioinformatics/16.2.117
