Geometric SMOTE
Class to perform over-sampling using Geometric SMOTE.
GeometricSMOTE(sampling_strategy='auto', k_neighbors=5, truncation_factor=1.0, deformation_factor=0.0, selection_strategy='combined', categorical_features=None, random_state=None, n_jobs=1)
Bases: BaseOverSampler
Class to perform over-sampling using Geometric SMOTE.
This algorithm is an implementation of Geometric SMOTE, a geometrically enhanced drop-in replacement for SMOTE. Read more in the [user_guide].
Parameters:
Name | Type | Description | Default |
---|---|---|---|
categorical_features | ArrayLike \| None | Specifies which features are categorical. Can either be: | None |
sampling_strategy | dict[int, int] \| str \| float \| Callable | Sampling information to resample the data set. | 'auto' |
random_state | RandomState \| int \| None | Control the randomization of the algorithm. | None |
truncation_factor | float | The type of truncation. The values should be in the [-1.0, 1.0] range. | 1.0 |
deformation_factor | float | The type of geometry. The values should be in the [0.0, 1.0] range. | 0.0 |
selection_strategy | str | The type of Geometric SMOTE algorithm with the following options: | 'combined' |
k_neighbors | NearestNeighbors \| int | If | 5 |
n_jobs | int \| None | The number of threads to open if possible. | 1 |
Attributes:
Name | Type | Description |
---|---|---|
n_features_in_ | int | Number of features in the input dataset. |
nns_pos_ | estimator object | Validated k-nearest neighbours created from the |
nn_neg_ | estimator object | Validated k-nearest neighbours created from the |
random_state_ | RandomState | An instance of |
sampling_strategy_ | dict[int, int] | Actual sampling strategy. |
Examples:
>>> import numpy as np
>>> from collections import Counter
>>> from sklearn.datasets import make_classification
>>> from imblearn_extra.gsmote import GeometricSMOTE
>>> np.set_printoptions(legacy='1.25')
>>> X, y = make_classification(n_classes=2, class_sep=2,
... weights=[0.1, 0.9], n_informative=3, n_redundant=1, flip_y=0,
... n_features=20, n_clusters_per_class=1, n_samples=1000, random_state=10)
>>> print('Original dataset shape %s' % Counter(y))
Original dataset shape Counter({1: 900, 0: 100})
>>> gsmote = GeometricSMOTE(random_state=1)
>>> X_resampled, y_resampled = gsmote.fit_resample(X, y)
>>> print('Resampled dataset shape %s' % Counter(y_resampled))
Resampled dataset shape Counter({0: 900, 1: 900})
Source code in src/imblearn_extra/gsmote/geometric_smote.py
make_geometric_sample(center, surface_point, truncation_factor, deformation_factor, random_state)
A support function that returns an artificial point.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
center | NDArray | The center point. | required |
surface_point | NDArray | The point on the surface of the hypersphere. | required |
truncation_factor | float | The truncation factor of the algorithm. | required |
deformation_factor | float | The deformation factor of the algorithm. | required |
random_state | RandomState | The random state of the process. | required |
Returns:
Name | Type | Description |
---|---|---|
geometric_sample | NDArray | The generated geometric sample. |
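To make the roles of the center, the surface point and the two factors more concrete, here is a minimal pure-Python sketch of how such a point could be generated. It follows the steps usually described for Geometric SMOTE (draw a point in a unit hypersphere, truncate, deform, translate) and is written independently of the library source. The name `geometric_sample_sketch` and the use of `random.Random` instead of a NumPy `RandomState` are assumptions made for illustration only.

```python
import math
import random


def geometric_sample_sketch(center, surface_point, truncation_factor,
                            deformation_factor, rng):
    """Hypothetical sketch of geometric sample generation (not the library code)."""
    dim = len(center)
    diff = [s - c for s, c in zip(surface_point, center)]
    radius = math.sqrt(sum(d * d for d in diff))
    if radius == 0.0:
        # Degenerate hypersphere: the only possible sample is the center.
        return list(center)

    # 1. Draw a point uniformly inside the unit hypersphere.
    normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in normal))
    scale = rng.random() ** (1.0 / dim) / norm
    point = [v * scale for v in normal]

    # Unit vector pointing from the center towards the surface point.
    unit = [d / radius for d in diff]
    proj = sum(p * u for p, u in zip(point, unit))

    # 2. Truncation: reflect points that fall in the excluded part of the sphere.
    too_far_back = truncation_factor > 0 and proj < truncation_factor - 1
    too_far_front = truncation_factor < 0 and proj > truncation_factor + 1
    if too_far_back or too_far_front:
        point = [p - 2.0 * proj * u for p, u in zip(point, unit)]
        proj = -proj

    # 3. Deformation: shrink the component perpendicular to `unit`.
    parallel = [proj * u for u in unit]
    point = [par + (1.0 - deformation_factor) * (p - par)
             for p, par in zip(point, parallel)]

    # 4. Translation: rescale by the radius and move to the center.
    return [c + radius * p for c, p in zip(center, point)]


rng = random.Random(0)
sample = geometric_sample_sketch([0.0, 0.0], [2.0, 0.0], 1.0, 1.0, rng)
print(sample)
```

Under these assumptions, `truncation_factor=1.0` with `deformation_factor=1.0` places every sample on the segment between the center and the surface point, which resembles SMOTE-style linear interpolation, while `0.0` for both leaves the sample anywhere inside the full hypersphere.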
Source code in src/imblearn_extra/gsmote/geometric_smote.py
populate_categorical_features(X_new, neighbors, categories_size, random_state)
A support function that populates categorical features.
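The description above does not say how the categorical values are chosen. One plausible reading, borrowed from SMOTE-NC and stated here as an assumption rather than as the library's documented behaviour, is that each categorical feature of a new sample takes the most frequent category among its neighbours, with ties broken at random. The helper `majority_category_sketch` below is hypothetical and works on a plain list of category labels rather than on the `X_new`, `neighbors` and `categories_size` arguments of the real function.

```python
import random
from collections import Counter


def majority_category_sketch(neighbor_categories, rng):
    """Hypothetical helper: most frequent category, ties broken at random."""
    counts = Counter(neighbor_categories)
    top = max(counts.values())
    # Sort the tied candidates so the result depends only on the seed.
    candidates = sorted(c for c, n in counts.items() if n == top)
    return rng.choice(candidates)


rng = random.Random(0)
print(majority_category_sketch(["red", "blue", "red"], rng))
```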
Source code in src/imblearn_extra/gsmote/geometric_smote.py