
iRSpot-SF: Prediction of recombination hotspots by incorporating sequence based features into Chou's Pseudo components






Genomics 111(4): 966-972



Recombination hotspots are unevenly distributed across a genome. Hotspots are regions of a genome that show higher rates of meiotic recombination. Computational methods for recombination hotspot prediction often use sophisticated features derived from physico-chemical or structure-based properties of nucleotides. In this paper, we propose iRSpot-SF, which uses sequence-based features that are computationally cheap to generate. Four feature groups are used in our method: k-mer composition, gapped k-mer composition, TF-IDF of k-mers and reverse complement k-mer composition. We used recursive feature elimination to select the top 17 features for hotspot prediction. Our analysis shows the superiority of the gapped k-mer composition and reverse complement k-mer composition features over the others. We used an SVM with an RBF kernel as the classification algorithm and tested our method on standard benchmark datasets. Compared to other methods, iRSpot-SF produces significantly better results in terms of accuracy, Matthews correlation coefficient and sensitivity, which are 84.58%, 0.6941 and 84.57%, respectively. We have made our method readily available as a Python-based tool, and the datasets and source code are available at: https://github.com/abdlmaruf/iRSpot-SF. A web application based on iRSpot-SF has been developed and is freely available at: http://irspot.pythonanywhere.com/server.html.
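
To make the feature groups named in the abstract more concrete, the following is a minimal Python sketch of three of them: k-mer composition, gapped k-mer composition and reverse complement k-mer composition. It is an illustration only and not the authors' implementation. The function names, the choice to gap only nucleotide pairs, the normalization by the number of windows, and the merging of each word with its reverse complement are all assumptions, since the abstract does not give the exact definitions. The TF-IDF features, the recursive feature elimination step and the SVM are left out.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_composition(seq, k):
    """Frequency of every length-k word over ACGT (hypothetical helper).

    Windows containing other symbols (e.g. N) are counted in the
    denominator but have no entry in the result.
    """
    windows = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(windows))
    total = max(windows, 1)  # avoid division by zero on short input
    words = ("".join(p) for p in product(ALPHABET, repeat=k))
    return {w: counts[w] / total for w in words}


def gapped_kmer_composition(seq, gap):
    """Frequency of nucleotide pairs separated by `gap` skipped positions.

    Assumption: only pairs (X, Y) are considered, matching the pattern
    X, `gap` arbitrary nucleotides, Y.
    """
    span = gap + 2
    windows = len(seq) - span + 1
    counts = Counter(seq[i] + seq[i + span - 1] for i in range(windows))
    total = max(windows, 1)
    return {a + b: counts[a + b] / total for a in ALPHABET for b in ALPHABET}


def reverse_complement(seq):
    """Reverse complement of a DNA string over ACGT."""
    return seq.translate(COMPLEMENT)[::-1]


def rc_kmer_composition(seq, k):
    """k-mer frequencies with each word merged with its reverse complement.

    The lexicographically smaller of the two words is used as the key.
    """
    merged = {}
    for word, freq in kmer_composition(seq, k).items():
        key = min(word, reverse_complement(word))
        merged[key] = merged.get(key, 0.0) + freq
    return merged
```

For example, `kmer_composition("ACGTAC", 2)` returns a dictionary with 16 entries, and `rc_kmer_composition("ACGTAC", 2)` collapses them to 10 because each dinucleotide shares a key with its reverse complement. Vectors like these could then be passed to a feature selector and classifier of the kind the abstract describes.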


Accession: 065261085


PMID: 29935224

DOI: 10.1016/j.ygeno.2018.06.003

