imbalanced-learn-extra
Introduction
imbalanced-learn-extra is a Python package that extends imbalanced-learn. It implements algorithms that are not included in
imbalanced-learn due to their novelty or lower citation count. The current version includes the following:
- A general interface for clustering-based oversampling algorithms.
- The Geometric SMOTE algorithm, a geometrically enhanced drop-in replacement for SMOTE that handles numerical as well as categorical features.
Installation
For user installation, imbalanced-learn-extra is available on PyPI, and you can install it via pip:
pip install imbalanced-learn-extra
Development installation requires cloning the repository and then using PDM to install the project as well as the main and development dependencies:
git clone https://github.com/georgedouzas/imbalanced-learn-extra.git
cd imbalanced-learn-extra
pdm install
The SOM clusterer requires optional dependencies:
pip install imbalanced-learn-extra[som]
Usage
All the classes included in imbalanced-learn-extra follow the imbalanced-learn API and build on the functionality of its base
oversampler. Following the scikit-learn convention, the data are represented as follows:
- Input data X: 2D array-like or sparse matrices.
- Targets y: 1D array-like.
The oversamplers implement a fit method to learn from X and y:
oversampler.fit(X, y)
They also implement a fit_resample method to resample X and y:
X_resampled, y_resampled = oversampler.fit_resample(X, y)
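To make the fit / fit_resample convention concrete, here is a minimal sketch that uses only the Python standard library. ToyRandomOverSampler is a hypothetical stand-in written for this example and is not part of imbalanced-learn-extra or imbalanced-learn. It only duplicates minority rows at random, whereas the package's oversamplers generate new synthetic samples. The data layout and the two method calls are the parts that carry over.

```python
import random
from collections import Counter


class ToyRandomOverSampler:
    """Hypothetical stand-in that mimics the fit / fit_resample convention."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, X, y):
        # Learn how many samples each class needs to match the majority class.
        counts = Counter(y)
        n_majority = max(counts.values())
        self.sampling_strategy_ = {
            label: n_majority - n for label, n in counts.items() if n < n_majority
        }
        return self

    def fit_resample(self, X, y):
        # Fit first, then return the original data plus the extra samples.
        self.fit(X, y)
        rng = random.Random(self.random_state)
        X_resampled, y_resampled = list(X), list(y)
        for label, n_new in self.sampling_strategy_.items():
            # Duplicate randomly chosen rows of the under-represented class.
            rows = [row for row, target in zip(X, y) if target == label]
            for _ in range(n_new):
                X_resampled.append(list(rng.choice(rows)))
                y_resampled.append(label)
        return X_resampled, y_resampled


# X: 2D array-like (rows are samples), y: 1D array-like of class labels.
X = [[0.0, 0.1], [0.2, 0.3], [0.4, 0.5], [0.6, 0.7], [5.0, 5.1]]
y = [0, 0, 0, 0, 1]

oversampler = ToyRandomOverSampler(random_state=0)
X_resampled, y_resampled = oversampler.fit_resample(X, y)
print(Counter(y_resampled))
```

After resampling, both classes contain four samples, and the original rows come first in X_resampled. With the real package, the toy class would be replaced by one of its oversamplers, while X, y and the fit_resample call stay the same.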
Citing imbalanced-learn-extra
Publications using clustering-based oversampling:
- G. Douzas, F. Bacao, "Self-Organizing Map Oversampling (SOMO) for imbalanced data set learning", Expert Systems with Applications, vol. 82, pp. 40-52, 2017.
- G. Douzas, F. Bacao, F. Last, "Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE", Information Sciences, vol. 465, pp. 1-20, 2018.
- G. Douzas, F. Bacao, F. Last, "G-SOMO: An oversampling approach based on self-organized maps and geometric SMOTE", Expert Systems with Applications, vol. 183, 115230, 2021.
Publications using Geometric-SMOTE:
- Douzas, G., Bacao, F. (2019). Geometric SMOTE: a geometrically enhanced drop-in replacement for SMOTE. Information Sciences, 501, 118-135. https://doi.org/10.1016/j.ins.2019.06.007
- Fonseca, J., Douzas, G., Bacao, F. (2021). Increasing the Effectiveness of Active Learning: Introducing Artificial Data Generation in Active Learning for Land Use/Land Cover Classification. Remote Sensing, 13(13), 2619. https://doi.org/10.3390/rs13132619
- Douzas, G., Bacao, F., Fonseca, J., Khudinyan, M. (2019). Imbalanced Learning in Land Cover Classification: Improving Minority Classes’ Prediction Accuracy Using the Geometric SMOTE Algorithm. Remote Sensing, 11(24), 3040. https://doi.org/10.3390/rs11243040