Distribution
Distributor classes for clustering-based oversampling.
DensityDistributor(filtering_threshold='auto', distances_exponent='auto', sparsity_based=True, distribution_ratio=1.0)
Bases: BaseDistributor
Class to perform density-based distribution.
Samples are distributed based on the density of clusters.
Read more in the [user_guide].
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filtering_threshold | float \| str | The threshold of a filtered cluster. It can be any non-negative number or 'auto'. | 'auto' |
distances_exponent | float \| str | The exponent of the mean distance in the density calculation. It can be any non-negative number or 'auto'. | 'auto' |
sparsity_based | bool | Whether sparse clusters receive more generated samples. | True |
distribution_ratio | float | The ratio of intra-cluster to inter-cluster generated samples. It is a number in the | 1.0 |
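To make the roles of the parameters more concrete, here is a minimal pure-Python sketch of a density-based weighting. It is not the library's implementation: the function names, the filtering criterion (majority-to-minority count ratio compared with the threshold) and the density formula (minority count divided by the mean pairwise distance raised to the exponent) are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import dist


def mean_pairwise_distance(points):
    """Mean Euclidean distance over all pairs of points."""
    pairs = list(combinations(points, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def sketch_distribution(
    clusters,
    filtering_threshold=1.0,
    distances_exponent=2.0,
    sparsity_based=True,
):
    """Toy density-based weights (hypothetical helper, not the library's code).

    `clusters` maps a cluster label to (minority_points, n_majority).
    """
    weights = {}
    for label, (minority, n_majority) in clusters.items():
        # Assumed filter: skip clusters dominated by the majority class.
        if len(minority) < 2 or n_majority / len(minority) > filtering_threshold:
            continue
        # Assumed density: minority count over mean distance ** exponent.
        mean_distance = mean_pairwise_distance(minority)
        density = len(minority) / mean_distance**distances_exponent
        # Sparse clusters get more weight when sparsity_based is True.
        weights[label] = 1.0 / density if sparsity_based else density
    total = sum(weights.values())
    return {label: weight / total for label, weight in weights.items()}


clusters = {
    0: ([(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)], 1),  # tight minority cluster
    1: ([(0.0, 0.0), (0.0, 4.0), (4.0, 0.0)], 2),  # spread-out minority cluster
    2: ([(5.0, 5.0), (5.0, 6.0)], 10),  # dominated by the majority class
}
sparse_first = sketch_distribution(clusters)
dense_first = sketch_distribution(clusters, sparsity_based=False)
```

In this toy setup cluster 2 is filtered out, the spread-out cluster 1 gets the larger weight when `sparsity_based=True`, and the tight cluster 0 gets the larger weight when it is `False`.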
Attributes:
Name | Type | Description |
---|---|---|
clusters_density_ | Density | Each dict key is a multi-label tuple of shape |
distances_exponent_ | float | Actual exponent of the mean distance used in the calculations. |
distribution_ratio_ | float | A copy of the parameter in the constructor. |
filtered_clusters_ | List[MultiLabel] | Each element is a tuple of |
filtering_threshold_ | float | Actual filtering threshold used in the calculations. |
inter_distribution_ | InterDistribution | Each dict key is a multi-label tuple of shape |
intra_distribution_ | IntraDistribution | Each dict key is a multi-label tuple of shape |
labels_ | Labels | Labels of each sample. |
neighbors_ | Neighbors | An array that contains all neighboring pairs. Each row is a unique neighboring pair. |
majority_class_label_ | int | The majority class label. |
n_samples_ | int | The number of samples. |
sparsity_based_ | bool | A copy of the parameter in the constructor. |
unique_class_labels_ | Labels | An array of unique class labels. |
unique_cluster_labels_ | Labels | An array of unique cluster labels. |
Examples:
>>> import numpy as np
>>> from imblearn_extra.clover.distribution import DensityDistributor
>>> from sklearn.datasets import load_iris
>>> from sklearn.cluster import KMeans
>>> from imblearn.datasets import make_imbalance
>>> np.set_printoptions(legacy='1.25')
>>> X, y = make_imbalance(
...     *load_iris(return_X_y=True),
...     sampling_strategy={0:50, 1:40, 2:30},
...     random_state=0
... )
>>> labels = KMeans(random_state=0, n_init='auto').fit_predict(X, y)
>>> density_distributor = DensityDistributor().fit(X, y, labels)
>>> density_distributor.filtered_clusters_
[(6, 1), (0, 1), (3, 1), (7, 1), (5, 2), (2, 2), (3, 2), (6, 2), (0, 2)]
>>> density_distributor.intra_distribution_
{(6, 1): 0.50604609281055... (0, 1): 0.143311766542168...}
>>> density_distributor.inter_distribution_
{}
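The keys printed above look like `(cluster label, class label)` pairs. Under that reading, a small helper can sum the proportions of such a dictionary per class. This is only a sketch: `totals_per_class` is a hypothetical helper, and the values below are made up for illustration rather than taken from the fitted distributor.

```python
from collections import defaultdict


def totals_per_class(distribution):
    """Sum a {(cluster_label, class_label): proportion} dict per class label."""
    totals = defaultdict(float)
    for (_cluster_label, class_label), proportion in distribution.items():
        totals[class_label] += proportion
    return dict(totals)


# Made-up proportions; they are NOT the output of the example above.
toy_intra_distribution = {
    (6, 1): 0.5,
    (0, 1): 0.25,
    (3, 1): 0.25,
    (5, 2): 0.75,
    (2, 2): 0.25,
}
per_class = totals_per_class(toy_intra_distribution)
```

The same helper could be applied to `density_distributor.intra_distribution_` to check how the proportions are split across the minority classes.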
Source code in src/imblearn_extra/clover/distribution/_density.py