SMOTE
- class synthyverse.generators.smote_generator.SMOTEGenerator(target_column, k_neighbors=5, n_jobs=-1, random_state=0, **kwargs)
Bases: TabularBaseGenerator
Synthetic Minority Over-sampling Technique (SMOTE) for tabular data.
Creates synthetic samples via interpolation in feature space using SMOTE.
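The interpolation step can be illustrated with a minimal NumPy sketch. This is not the synthyverse implementation: the function name `smote_interpolate` and its signature are hypothetical, it uses brute-force distances rather than a parallel nearest-neighbor search, and it assumes all features are numeric and that at least two minority rows are available.

```python
import numpy as np


def smote_interpolate(X_minority, n_samples, k_neighbors=5, random_state=0):
    """Illustrative SMOTE-style interpolation (hypothetical helper)."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X_minority, dtype=float)
    n = X.shape[0]
    k = min(k_neighbors, n - 1)  # cannot use more neighbors than other rows

    # Brute-force pairwise squared Euclidean distances.
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.einsum("ijk,ijk->ij", diffs, diffs)
    np.fill_diagonal(dists, np.inf)  # a row is never its own neighbor

    # Indices of the k nearest neighbors of every row.
    neighbors = np.argsort(dists, axis=1)[:, :k]

    # Pick a random base row and one of its neighbors for each new sample.
    base = rng.integers(0, n, size=n_samples)
    picked = neighbors[base, rng.integers(0, k, size=n_samples)]

    # Place the new point at a random position on the segment between them.
    lam = rng.random((n_samples, 1))
    return X[base] + lam * (X[picked] - X[base])


X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_interpolate(X_min, n_samples=10, k_neighbors=2)
```

Each synthetic row lies on a line segment between two real rows of the same class, so it stays inside the convex hull of the data it was built from.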
For classification tasks, the provided target column is used directly for class-conditional oversampling. For regression tasks, a pseudo-binary target is derived by splitting the target at its median, following a strategy similar to the TabDDPM paper.
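The median split for regression targets can be sketched as follows. The helper name `pseudo_binary_target` is hypothetical, and whether values equal to the median fall into the lower or upper group is an assumption here; the library may handle ties differently.

```python
import pandas as pd


def pseudo_binary_target(y: pd.Series) -> pd.Series:
    """Label values above the median as 1, the rest as 0 (assumed tie rule)."""
    return (y > y.median()).astype(int)


y = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0])
labels = pseudo_binary_target(y)  # median is 3.0
```

The derived labels only drive the class-conditional oversampling; the continuous target itself would still be interpolated like any other numeric column.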
- Parameters:
target_column (str) – Name of the target column used to drive oversampling.
k_neighbors (int) – Number of nearest neighbors used during interpolation. Default: 5.
n_jobs (int) – Number of parallel jobs for nearest-neighbor search. Default: -1.
random_state (int) – Random seed for reproducibility. Default: 0.
**kwargs – Additional arguments passed to TabularBaseGenerator.
Example
>>> import pandas as pd
>>> from synthyverse.generators import SMOTEGenerator
>>>
>>> # Load data and define discrete features
>>> X = pd.read_csv("data.csv")
>>> discrete_features = ["target", "category_col"]
>>>
>>> # Create generator
>>> generator = SMOTEGenerator(
...     target_column="target",
...     k_neighbors=5,
...     random_state=42
... )
>>>
>>> # Fit and generate synthetic rows
>>> generator.fit(X, discrete_features)
>>> X_syn = generator.generate(1000)