Non-Gaussian distributions affect identification of expression patterns, functional annotation, and prospective classification in human cancer genomes

PLoS ONE 7(10): e46935

Gene expression data are often assumed to be normally distributed, but this assumption has not been tested rigorously. We investigate the distribution of expression data in human cancer genomes and study the implications of deviations from the normal distribution for translational molecular oncology research.

We conducted a central moments analysis of five cancer genomes and performed empirical distribution fitting to examine the true distribution of expression data on both the complete-experiment and the individual-gene levels. We used a variety of parametric and nonparametric methods to test the effects of deviations from normality on gene calling, functional annotation, and prospective molecular classification using a sixth cancer genome.

Central moments analyses reveal statistically significant deviations from normality in all of the analyzed cancer genomes. We observe as much as 37% variability in gene calling, 39% variability in functional annotation, and 30% variability in prospective molecular tumor subclassification associated with this effect.

Cancer gene expression profiles are not normally distributed, either on the complete-experiment or on the individual-gene level. Instead, they exhibit complex, heavy-tailed distributions characterized by statistically significant skewness and kurtosis. The non-Gaussian distribution of these data affects identification of differentially expressed genes, functional annotation, and prospective molecular classification. These effects may be reduced in some circumstances, although not completely eliminated, by using nonparametric analytics. This analysis highlights two unreliable assumptions of translational cancer gene expression analysis: that "small" departures from normality in the expression data distributions are analytically insignificant and that "robust" gene-calling algorithms can fully compensate for these effects.
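As a rough illustration of what a central moments analysis involves, the sketch below computes sample skewness and excess kurtosis for a vector of expression values and divides each by its large-sample standard error under normality (approximately sqrt(6/n) and sqrt(24/n)). This is not the authors' pipeline: the function names, the simulated data, and the choice of a log-normal distribution as a stand-in for heavy-tailed expression values are all assumptions made for the example.

```python
import math
import random


def central_moment(values, k):
    """k-th central moment, population form (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n


def skewness(values):
    """Third standardized moment; 0 for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3; 0 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


def normality_z_scores(values):
    """Approximate z-scores for skewness and excess kurtosis.

    Uses the large-sample standard errors sqrt(6/n) and sqrt(24/n),
    which hold only approximately and only under the normal null.
    """
    n = len(values)
    z_skew = skewness(values) / math.sqrt(6.0 / n)
    z_kurt = excess_kurtosis(values) / math.sqrt(24.0 / n)
    return z_skew, z_kurt


# Simulated data (hypothetical, for illustration only).
rng = random.Random(0)
gaussian_like = [rng.gauss(8.0, 1.0) for _ in range(5000)]
heavy_tailed = [rng.lognormvariate(2.0, 0.75) for _ in range(5000)]

for label, sample in (("gaussian", gaussian_like), ("heavy-tailed", heavy_tailed)):
    z_skew, z_kurt = normality_z_scores(sample)
    print(f"{label}: skew={skewness(sample):.3f} "
          f"excess_kurtosis={excess_kurtosis(sample):.3f} "
          f"z_skew={z_skew:.1f} z_kurt={z_kurt:.1f}")
```

Large absolute z-scores suggest a departure from normality. In practice the same statistics could be computed once over the whole experiment and once per gene, mirroring the two levels of analysis the abstract describes.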

Accession: 054639273

PMID: 23118863

DOI: 10.1371/journal.pone.0046935
