Oversampling the Minority Class in the Feature Space

IEEE Transactions on Neural Networks and Learning Systems 27(9): 1947-1961

The imbalanced nature of some real-world data is one of the current challenges for machine learning researchers. One common approach oversamples the minority class through convex combinations of its patterns. We explore the general idea of synthetic oversampling in the feature space induced by a kernel function (as opposed to the input space). If the kernel function matches the underlying problem, the classes will be linearly separable and synthetically generated patterns will lie in the minority class region. Since the feature space is not directly accessible, we use the empirical feature space (EFS), a Euclidean space isomorphic to the feature space, for oversampling purposes. The proposed method is framed in the context of support vector machines, where imbalanced data sets can pose a serious hindrance. The idea is investigated in three scenarios: 1) oversampling in the full and reduced-rank EFSs; 2) a kernel learning technique maximizing the data class separation to study the influence of the feature space structure (implicitly defined by the kernel function); and 3) a unified framework for preferential oversampling that spans some of the previous approaches in the literature. We support our investigation with extensive experiments over 50 imbalanced data sets.
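
The general idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the RBF kernel, the uniform random pairing of minority patterns, the tolerance, and all function names (`rbf_kernel`, `empirical_feature_space`, `oversample_minority`) are assumptions made for illustration. The sketch builds EFS coordinates from the eigendecomposition of the kernel matrix, so that dot products between rows reproduce the kernel values, and then generates synthetic minority patterns as convex combinations in that space.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix; the kernel choice is an assumption.
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def empirical_feature_space(K, rank=None, tol=1e-10):
    # Eigendecompose the symmetric kernel matrix K = P diag(lam) P^T.
    lam, P = np.linalg.eigh(K)
    order = np.argsort(lam)[::-1]            # largest eigenvalues first
    lam, P = lam[order], P[:, order]
    keep = lam > tol * lam[0]                # drop numerically null directions
    lam, P = lam[keep], P[:, keep]
    if rank is not None:                     # optional reduced-rank EFS
        lam, P = lam[:rank], P[:, :rank]
    # Rows are the training patterns in the EFS; Z @ Z.T approximates K.
    return P * np.sqrt(lam)

def oversample_minority(Z, y, minority_label, n_new, rng):
    # Convex combinations z_i + delta * (z_j - z_i), delta in [0, 1),
    # of randomly paired minority patterns (pairing rule is an assumption).
    Z_min = Z[y == minority_label]
    i = rng.integers(0, len(Z_min), size=n_new)
    j = rng.integers(0, len(Z_min), size=n_new)
    delta = rng.random((n_new, 1))
    return Z_min[i] + delta * (Z_min[j] - Z_min[i])

# Toy imbalanced data set: 25 majority and 5 minority patterns.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = np.array([0] * 25 + [1] * 5)

K = rbf_kernel(X, X, gamma=0.5)
Z = empirical_feature_space(K)
Z_new = oversample_minority(Z, y, minority_label=1, n_new=20, rng=rng)

# Augmented training set in the EFS, ready for a linear classifier.
Z_aug = np.vstack([Z, Z_new])
y_aug = np.concatenate([y, np.ones(len(Z_new), dtype=int)])
```

Because the EFS preserves the kernel's dot products, a linear classifier trained on `Z_aug` plays the role of a kernel classifier trained on the original patterns. Passing a `rank` argument gives a reduced-rank variant of the same construction.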

Accession: 058486932

PMID: 26316222

DOI: 10.1109/TNNLS.2015.2461436
