Geometric SMOTE for regression

Artificial Intelligence

Machine Learning

Imbalanced Data

Publication

Authors

Affiliation

Luis Camacho

NOVA IMS

Georgios Douzas

NOVA IMS

Fernando Bacao

NOVA IMS

Abstract

Learning from imbalanced data sets is known to be a challenging task. Many methods have been proposed to address this challenge in classification problems, but few exist for regression. In regression, imbalanced learning is concerned with accurately predicting target values that lie in a subset of the continuous target variable's range and occur only rarely in the data set. In this article, we extend G-SMOTE, an algorithm originally designed for classification, to regression tasks. G-SMOTE is a pre-processing (oversampling) algorithm that differs from SMOTE in that it generates synthetic instances within a geometric region around each selected instance, rather than on the line segment joining two selected instances. We compared the performance of G-SMOTE for regression against other methods, and the empirical results show that our proposal outperformed them.
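To make the contrast between a line segment and a geometric region concrete, the following is a minimal NumPy sketch, not the authors' implementation. `smote_sample` draws a point on the segment between two seeds, as SMOTE does. `gsmote_sample` follows the hypersphere, truncation and deformation steps of classification G-SMOTE as we understand them. All function and parameter names (`smote_sample`, `gsmote_sample`, `interpolate_target`, `truncation_factor`, `deformation_factor`) are hypothetical. The rule for assigning a target value to a synthetic regression instance (an inverse-distance weighted average of the two seeds' targets, similar to SMOTER) is an assumption and is not stated in the abstract.

```python
import numpy as np


def smote_sample(x, x_neighbor, rng):
    """SMOTE-style step: a random point on the segment from x to x_neighbor."""
    gap = rng.uniform()
    return x + gap * (x_neighbor - x)


def gsmote_sample(x_center, x_surface, truncation_factor, deformation_factor, rng):
    """Sketch of a G-SMOTE step (hypothetical names): sample inside a hypersphere
    centred on x_center whose surface passes through x_surface, then truncate
    and deform that region."""
    radius = np.linalg.norm(x_surface - x_center)
    if radius == 0.0:
        return x_center.copy()
    p = x_center.size
    # Uniform point inside the unit hyperball: random direction, length u**(1/p).
    direction = rng.normal(size=p)
    direction /= np.linalg.norm(direction)
    point = rng.uniform() ** (1.0 / p) * direction
    # Unit vector pointing from the center towards the surface point.
    e_parallel = (x_surface - x_center) / radius
    proj = point @ e_parallel
    # Truncation: points falling in the cut-off part of the sphere are reflected
    # across the hyperplane through the center orthogonal to e_parallel.
    if (truncation_factor > 0 and proj < truncation_factor - 1) or (
        truncation_factor < 0 and proj > truncation_factor + 1
    ):
        point = point - 2 * proj * e_parallel
        proj = -proj
    # Deformation: shrink the component orthogonal to e_parallel, turning the
    # sphere into an ellipsoid (a segment when deformation_factor == 1).
    parallel = proj * e_parallel
    perpendicular = point - parallel
    point = parallel + (1.0 - deformation_factor) * perpendicular
    # Scale by the radius and translate back to the feature space.
    return x_center + radius * point


def interpolate_target(x_new, x_center, y_center, x_surface, y_surface):
    """ASSUMED target rule: inverse-distance weighted average of the seed targets,
    so the synthetic target is closer to the target of the nearer seed."""
    d_center = np.linalg.norm(x_new - x_center)
    d_surface = np.linalg.norm(x_new - x_surface)
    if d_center + d_surface == 0.0:
        return float(y_center)
    return float((d_surface * y_center + d_center * y_surface) / (d_center + d_surface))


# Usage: one synthetic instance from two seeds with rare (e.g. extreme) targets.
rng = np.random.default_rng(0)
x_c, x_s = np.array([0.0, 0.0]), np.array([1.0, 1.0])
x_new = gsmote_sample(x_c, x_s, truncation_factor=0.5, deformation_factor=0.3, rng=rng)
y_new = interpolate_target(x_new, x_c, 10.0, x_s, 20.0)
```

Setting both factors to 1 collapses the sampling region to the segment between `x_center` and `x_surface`, so the geometric region can be read as a generalization of SMOTE's line segment; smaller factors widen it into a truncated, possibly flattened, hypersphere.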