Large-scale annotation of small-molecule libraries using public databases

Journal of Chemical Information and Modeling 47(4): 1386-1394

While many large, publicly accessible databases provide excellent annotation for biological macromolecules, the same is not true for small chemical compounds. Commercial data sources also lack an interface for annotating large numbers of compounds and tend to be too costly to be widely available to biomedical researchers. As a result, annotation information is currently used to select lead compounds from modern high-throughput screening (HTS) campaigns only on a very limited scale. The recent rapid expansion of the NIH PubChem database provides an opportunity to link existing biological databases with compound catalogs and to supply relevant information that could improve what is learned from large-scale screening efforts. Using the 2.5 million compound collection at the Genomics Institute of the Novartis Research Foundation (GNF) as a model, we determined that approximately 4% of the library contained compounds with potential annotation in such databases as PubChem and the World Drug Index (WDI) as well as related databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) and ChemIDplus. Furthermore, exact structure match analysis showed that 32% of GNF compounds can be linked to third-party databases via PubChem. We also showed that annotations such as MeSH (Medical Subject Headings) terms can be applied to in-house HTS databases to identify signature biological inhibition profiles of interest and to expedite the assay validation process. Automated batch annotation of thousands of screening hits is becoming feasible and has the potential to play an essential role in the hit-to-lead decision-making process.
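The exact structure match linkage described in the abstract can be pictured as a join on a canonical structure key. The following is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes each compound has already been reduced to a canonical key (for example an InChIKey computed elsewhere), and the function names, compound IDs, keys, and annotation terms are all hypothetical placeholders rather than real data.

```python
from collections import defaultdict


def annotate_by_exact_match(library, annotations):
    """Link in-house compounds to external annotations by exact key match.

    library:     dict mapping compound ID -> canonical structure key
    annotations: iterable of (structure key, source database, term) tuples
    Returns a dict mapping compound ID -> list of (source, term) pairs,
    containing only compounds with at least one match.
    """
    # Index external annotations by structure key for constant-time lookup.
    index = defaultdict(list)
    for key, source, term in annotations:
        index[key].append((source, term))

    # Keep only compounds whose key appears in the external index.
    return {
        cid: list(index[key])
        for cid, key in library.items()
        if key in index
    }


def coverage(library, annotated):
    """Fraction of the library that received at least one annotation."""
    return len(annotated) / len(library) if library else 0.0


# Hypothetical toy data: keys and terms are placeholders, not real records.
library = {"CPD-1": "KEY-A", "CPD-2": "KEY-B", "CPD-3": "KEY-C", "CPD-4": "KEY-A"}
annotations = [
    ("KEY-A", "PubChem", "term-1"),
    ("KEY-A", "WDI", "term-2"),
    ("KEY-D", "KEGG", "term-3"),  # no compound in the library has this key
]

annotated = annotate_by_exact_match(library, annotations)
fraction = coverage(library, annotated)
```

Because the join is a dictionary lookup per compound, the same approach scales linearly with library size, which is what makes batch annotation of large hit lists practical. Matching on a canonical key only finds identical structures; it does not capture salts, tautomers, or near neighbors unless the key generation step normalizes them.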

Accession: 054078419

PMID: 17608408

DOI: 10.1021/ci700092v
