RealTabFormer
- class synthyverse.generators.realtabformer_generator.RealTabFormerGenerator(workspace, epochs=1000, batch_size=8, mask_rate=0, early_stopping_patience=5, early_stopping_threshold=0, random_state=0, **kwargs)
Bases: TabularBaseGenerator
Realistic Relational and Tabular Data using Transformers.
Fine-tunes GPT-2 for tabular synthetic data generation.
Uses the implementation from the realtabformer PyPI package.
Paper: “REaLTabFormer: Generating Realistic Relational and Tabular Data using Transformers” by Solatorio and Dupriez (2023).
- Parameters:
workspace (str) – Directory for storing checkpoints and samples.
epochs (int) – Number of training epochs. Default: 1000.
batch_size (int) – Batch size for training. Default: 8.
mask_rate (float) – Masking rate for training. Default: 0.
early_stopping_patience (int) – Patience for early stopping. Default: 5.
early_stopping_threshold (float) – Threshold for early stopping. Default: 0.
random_state (int) – Random seed for reproducibility. Default: 0.
**kwargs – Additional arguments passed to TabularBaseGenerator.
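The interaction between early_stopping_patience and early_stopping_threshold can be illustrated with a small, generic sketch. It is not taken from synthyverse or realtabformer. It assumes the common convention in which an evaluation counts as an improvement only when the monitored loss drops by more than the threshold, and training stops after patience consecutive evaluations without improvement. The function name epochs_until_stop and the loss values are hypothetical.

```python
def epochs_until_stop(losses, patience=5, threshold=0.0):
    """Return the 1-based evaluation index at which training would stop.

    Assumed convention (not confirmed for this generator): a loss counts
    as an improvement only if it is lower than the best loss so far by
    more than `threshold`. Returns None if the stopping rule never fires.
    """
    best = None   # best loss seen so far
    stale = 0     # consecutive evaluations without improvement
    for step, loss in enumerate(losses, start=1):
        if best is None or (best - loss) > threshold:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return step
    return None


# Hypothetical validation losses that plateau after the third evaluation.
plateau = [1.0, 0.8, 0.7, 0.7, 0.71, 0.7, 0.72, 0.7]
stop_default = epochs_until_stop(plateau, patience=5, threshold=0.0)
stop_short = epochs_until_stop(plateau, patience=2, threshold=0.0)
never = epochs_until_stop([1.0, 0.9, 0.8, 0.7], patience=2, threshold=0.0)
```

Under this convention, a large epochs value such as the default of 1000 acts as an upper bound, and the patience setting determines how soon a plateau ends training.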
Example
>>> import pandas as pd
>>> from synthyverse.generators import RealTabFormerGenerator
>>>
>>> # Load data
>>> X = pd.read_csv("data.csv")
>>> discrete_features = ["category_col"]
>>>
>>> # Create generator (requires workspace)
>>> generator = RealTabFormerGenerator(
...     workspace="./realtabformer_workspace",
...     epochs=1000,
...     batch_size=8,
...     random_state=42
... )
>>>
>>> # Fit and generate
>>> generator.fit(X, discrete_features)
>>> X_syn = generator.generate(1000)