CTABGAN
- class synthyverse.generators.ctabgan_generator.CTABGANGenerator(target_column, class_dim=(256, 256, 256, 256), random_dim=100, num_channels=64, l2scale=1e-05, batch_size=500, epochs=150, sides=[4, 8, 16, 24, 32, 64], random_state=0, **kwargs)
Bases: TabularBaseGenerator
Conditional Tabular GAN (CTABGAN).
This generator implements CTABGAN+ as described in the original paper. It improves on previous conditional GANs through convolutional layers and elaborate preprocessing schemes. Unlike the original implementation, it automatically detects feature-type categories (e.g., Gaussian-like columns) as part of preprocessing.
Paper: “CTAB-GAN+: Enhancing Tabular Data Synthesis” by Zhao et al. (2024).
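To give a feel for what automatic detection of a Gaussian-like column could involve, here is a minimal sketch. It is not the detection logic used by this generator: the function name `looks_gaussian`, the moment-based heuristic, and both thresholds are assumptions made purely for illustration.

```python
import random
import statistics


def looks_gaussian(values, max_abs_skew=0.5, max_abs_excess_kurtosis=1.0):
    """Hypothetical heuristic (not the library's method): treat a numeric
    column as Gaussian-like when its sample skewness and excess kurtosis
    are both close to zero. The thresholds are illustrative assumptions."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return False  # a constant column is not Gaussian-like
    z = [(v - mean) / std for v in values]
    skew = statistics.fmean([x ** 3 for x in z])
    excess_kurtosis = statistics.fmean([x ** 4 for x in z]) - 3.0
    return (abs(skew) <= max_abs_skew
            and abs(excess_kurtosis) <= max_abs_excess_kurtosis)


# Synthetic columns for illustration only.
rng = random.Random(0)
bell = [rng.gauss(0.0, 1.0) for _ in range(5000)]       # symmetric, bell-shaped
skewed = [rng.expovariate(1.0) for _ in range(5000)]    # heavily right-skewed
```

Under this heuristic the bell-shaped column would be flagged as Gaussian-like and the exponential one would not, so a preprocessing step could route them to different encodings.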
- Parameters:
target_column (str) – Name of the target column.
class_dim (tuple) – Tuple of dimensions for class-specific layers. Default: (256, 256, 256, 256).
random_dim (int) – Dimension of random noise vector. Default: 100.
num_channels (int) – Number of channels in generator. Default: 64.
l2scale (float) – L2 regularization scale. Default: 1e-5.
batch_size (int) – Batch size for training. Default: 500.
epochs (int) – Number of training epochs. Default: 150.
sides (list) – List of side dimensions for generator. Default: [4, 8, 16, 24, 32, 64].
random_state (int) – Random seed for reproducibility. Default: 0.
**kwargs – Additional arguments passed to TabularBaseGenerator.
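The role of `sides` can be illustrated with a small sketch. It assumes the behaviour of the original CTAB-GAN reference code, where each encoded row is padded and reshaped into a square image whose side is the smallest entry of `sides` large enough to hold the row. Whether this generator does exactly that is an assumption, and the helper names `pick_side` and `to_square` are hypothetical.

```python
def pick_side(encoded_width, sides=(4, 8, 16, 24, 32, 64)):
    """Return the smallest side whose square can hold `encoded_width` values.

    Hypothetical helper: mirrors the selection rule assumed above, not a
    function exposed by the library.
    """
    for side in sides:
        if side * side >= encoded_width:
            return side
    raise ValueError(
        f"encoded width {encoded_width} exceeds the largest side "
        f"({max(sides)} x {max(sides)})"
    )


def to_square(row, side):
    """Zero-pad a flat encoded row and reshape it into a side x side grid."""
    padded = list(row) + [0.0] * (side * side - len(row))
    return [padded[i * side:(i + 1) * side] for i in range(side)]


side = pick_side(600)            # 24 * 24 = 576 is too small, so 32 is chosen
grid = to_square([1.0, 2.0, 3.0], pick_side(3))
```

If this assumption holds, a dataset whose encoded width exceeds the square of the largest side would need a larger entry appended to `sides`.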
Example
>>> import pandas as pd
>>> from synthyverse.generators import CTABGANGenerator
>>>
>>> # Load data
>>> X = pd.read_csv("data.csv")
>>> discrete_features = ["category_col"]
>>>
>>> # Create generator (requires target column)
>>> generator = CTABGANGenerator(
...     target_column="target",
...     epochs=150,
...     batch_size=500,
...     random_state=42
... )
>>>
>>> # Fit and generate
>>> generator.fit(X, discrete_features)
>>> X_syn = generator.generate(1000)