CTABGAN

class synthyverse.generators.ctabgan_generator.CTABGANGenerator(target_column, class_dim=(256, 256, 256, 256), random_dim=100, num_channels=64, l2scale=1e-05, batch_size=500, epochs=150, sides=[4, 8, 16, 24, 32, 64], random_state=0, **kwargs)

Bases: TabularBaseGenerator

Conditional Tabular GAN (CTABGAN).

This is the CTABGAN+ implementation from the original paper. It improves on previous conditional GANs through convolutional layers and elaborate preprocessing schemes. Unlike the original implementation, we automatically detect feature-type categories (e.g., Gaussian-like columns) as part of preprocessing.

Paper: “CTAB-GAN+: Enhancing Tabular Data Synthesis” by Zhao et al. (2024).
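
The automatic detection of feature-type categories is not specified further here. As an illustration only, the sketch below shows one way a column could be flagged as Gaussian-like, using sample skewness and excess kurtosis. The function name looks_gaussian and both tolerances are hypothetical; this is not the rule synthyverse actually applies.

```python
import random
import statistics


def looks_gaussian(values, skew_tol=0.5, kurt_tol=1.0):
    """Hypothetical heuristic: near-zero skewness and excess kurtosis.

    Not the detection rule used by synthyverse; the tolerances are
    arbitrary illustration values.
    """
    n = len(values)
    if n < 8:
        return False  # too few points to judge the shape
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return False  # constant column
    # Standardized third and fourth moments of the sample.
    skew = sum(((v - mean) / std) ** 3 for v in values) / n
    excess_kurt = sum(((v - mean) / std) ** 4 for v in values) / n - 3.0
    return abs(skew) <= skew_tol and abs(excess_kurt) <= kurt_tol


# Stand-in columns: one normal, one strongly right-skewed.
rng = random.Random(0)
normal_col = [rng.gauss(50.0, 10.0) for _ in range(5000)]
skewed_col = [rng.expovariate(1.0) for _ in range(5000)]

normal_flag = looks_gaussian(normal_col)
skewed_flag = looks_gaussian(skewed_col)
```

A moment-based check like this is cheap but crude; a real preprocessing pipeline may well use a different criterion.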

Parameters:
  • target_column (str) – Name of the target column.

  • class_dim (tuple) – Tuple of dimensions for class-specific layers. Default: (256, 256, 256, 256).

  • random_dim (int) – Dimension of random noise vector. Default: 100.

  • num_channels (int) – Number of channels in generator. Default: 64.

  • l2scale (float) – L2 regularization scale. Default: 1e-5.

  • batch_size (int) – Batch size for training. Default: 500.

  • epochs (int) – Number of training epochs. Default: 150.

  • sides (list) – List of side dimensions for generator. Default: [4, 8, 16, 24, 32, 64].

  • random_state (int) – Random seed for reproducibility. Default: 0.

  • **kwargs – Additional arguments passed to TabularBaseGenerator.
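
The sides parameter is described only briefly above. In the reference CTAB-GAN code, as we understand it, each encoded row is zero-padded and reshaped into a square matrix for the convolutional networks, and the side length is the smallest candidate whose square covers the encoded row width. The sketch below assumes this class uses sides the same way; pick_side and pad_to_square are hypothetical helper names, not part of synthyverse.

```python
def pick_side(width, sides=(4, 8, 16, 24, 32, 64)):
    """Return the smallest side such that side * side >= width.

    Hypothetical helper mirroring how the reference CTAB-GAN code is
    believed to choose the image side; raising on overflow is our choice.
    """
    for side in sorted(sides):
        if side * side >= width:
            return side
    raise ValueError(f"encoded width {width} exceeds the largest side in {sides}")


def pad_to_square(row, side):
    """Zero-pad a flat row to side * side values and reshape it to side x side."""
    padded = list(row) + [0.0] * (side * side - len(row))
    return [padded[i * side:(i + 1) * side] for i in range(side)]


# A stand-in encoded row with 10 values fits into a 4 x 4 matrix.
row = [0.1 * i for i in range(10)]
side = pick_side(len(row))
matrix = pad_to_square(row, side)
```

Under this assumption, the default list supports encoded rows of up to 64 * 64 = 4096 values, so very wide tables (after one-hot and mode-specific encoding) would need larger entries in sides.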

Example

>>> import pandas as pd
>>> from synthyverse.generators import CTABGANGenerator
>>>
>>> # Load data
>>> X = pd.read_csv("data.csv")
>>> discrete_features = ["category_col"]
>>>
>>> # Create generator (requires target column)
>>> generator = CTABGANGenerator(
...     target_column="target",
...     epochs=150,
...     batch_size=500,
...     random_state=42
... )
>>>
>>> # Fit and generate
>>> generator.fit(X, discrete_features)
>>> X_syn = generator.generate(1000)
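
After generating, a quick sanity check is to compare category frequencies between the real and synthetic data. The sketch below is a minimal illustration with stand-in lists; it assumes generate() returns a DataFrame with the same columns as the training data, in which case X["category_col"] and X_syn["category_col"] could be passed instead. The helper names category_shares and max_share_gap are hypothetical.

```python
from collections import Counter


def category_shares(values):
    """Return {label: relative frequency} for a non-empty iterable of labels."""
    counts = Counter(values)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def max_share_gap(real_values, synthetic_values):
    """Largest absolute difference in relative frequency over all labels."""
    real = category_shares(real_values)
    synthetic = category_shares(synthetic_values)
    labels = set(real) | set(synthetic)
    return max(abs(real.get(label, 0.0) - synthetic.get(label, 0.0)) for label in labels)


# Stand-in columns (assumption: replace with the real and synthetic columns).
real_col = ["a"] * 60 + ["b"] * 30 + ["c"] * 10
synthetic_col = ["a"] * 55 + ["b"] * 35 + ["c"] * 10

gap = max_share_gap(real_col, synthetic_col)
```

A small gap on every discrete column does not prove the synthetic data is good, but a large one is an early sign that training needs more epochs or different settings.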