ARF
- class synthyverse.generators.arf_generator.ARFGenerator(num_trees=20, delta=0.0, max_iters=10, early_stop=True, verbose=True, min_node_size=5, retain_value_ranges=False, random_state=0, **kwargs)
Bases: TabularBaseGenerator
Adversarial Random Forest (ARF).
ARF leverages random forests in alternating rounds of generation/discrimination to estimate densities and generate synthetic data.
Uses the arfpy package implementation.
Paper: “Adversarial random forests for density estimation and generative modeling” by Watson et al. (2023).
- Parameters:
num_trees (int) – Number of trees in the random forests. Default: 20.
delta (float) – Tolerance parameter for convergence. Default: 0.0.
max_iters (int) – Maximum number of adversarial iterations. Default: 10.
early_stop (bool) – Whether to use early stopping. Default: True.
verbose (bool) – Whether to print training progress. Default: True.
min_node_size (int) – Minimum number of samples in a leaf node of each tree. Default: 5.
retain_value_ranges (bool) – Whether to clip numerical features to training ranges after generation. Default: False.
random_state (int) – Random seed for reproducibility. Default: 0.
**kwargs – Additional arguments passed to TabularBaseGenerator.
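The interaction of delta, max_iters, and early_stop can be illustrated with a small sketch. It assumes a common stopping rule for adversarial loops of this kind: stop once the discriminating forest's accuracy falls to 0.5 + delta or below, stop early if accuracy rises from one round to the next, and otherwise stop at max_iters. The function arf_stop_reason and the accuracy traces are hypothetical and exist only for illustration; the exact rule is defined by arfpy and may differ.

```python
def arf_stop_reason(accuracies, delta=0.0, max_iters=10, early_stop=True):
    """Return (rounds_run, reason) for a trace of discriminator accuracies.

    accuracies[i] is a made-up accuracy for the forest that separates real
    rows from synthetic rows in round i. This is an assumed stopping rule,
    not code taken from arfpy or synthyverse.
    """
    capped = accuracies[:max_iters]  # never run more than max_iters rounds
    for i, acc in enumerate(capped):
        # Converged: the discriminator is no better than chance plus tolerance.
        if acc <= 0.5 + delta:
            return i + 1, "converged"
        # Early stop: accuracy went up compared with the previous round.
        if early_stop and i > 0 and acc > capped[i - 1]:
            return i + 1, "early_stop"
    # Neither condition fired within the allowed rounds.
    return len(capped), "max_iters"


print(arf_stop_reason([0.9, 0.7, 0.58, 0.5], delta=0.1))
print(arf_stop_reason([0.9, 0.7, 0.75, 0.6]))
print(arf_stop_reason([0.9, 0.8, 0.7], max_iters=2))
```

Under this reading, a larger delta accepts a discriminator that is still slightly better than chance, while early_stop=False lets the loop continue through rounds in which accuracy temporarily increases.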
Example
>>> import pandas as pd
>>> from synthyverse.generators import ARFGenerator
>>>
>>> # Load data
>>> X = pd.read_csv("data.csv")
>>> discrete_features = ["category_col"]
>>>
>>> # Create generator
>>> generator = ARFGenerator(
...     num_trees=50,
...     max_iters=10,
...     early_stop=True,
...     random_state=42
... )
>>>
>>> # Fit and generate
>>> generator.fit(X, discrete_features)
>>> X_syn = generator.generate(1000)
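The retain_value_ranges parameter is described as clipping numerical features to the training ranges after generation. The sketch below shows one plausible meaning of that description using plain dictionaries of columns so that it runs without extra dependencies. The helper clip_to_training_ranges and the toy data are hypothetical and are not part of synthyverse; with pandas, DataFrame.clip would be the natural tool for the same operation.

```python
def clip_to_training_ranges(train, synthetic, numerical_columns):
    """Clip numerical columns of `synthetic` to the [min, max] seen in `train`.

    Both tables are dicts mapping a column name to a list of values.
    Columns not listed in `numerical_columns` are copied unchanged.
    """
    clipped = {name: list(values) for name, values in synthetic.items()}
    for name in numerical_columns:
        low, high = min(train[name]), max(train[name])
        # Pull every value back into the observed training range.
        clipped[name] = [min(max(value, low), high) for value in synthetic[name]]
    return clipped


# Toy data standing in for the training table and a generated sample.
train = {"age": [18, 35, 62], "category_col": ["a", "b", "a"]}
synthetic = {"age": [12, 40, 70], "category_col": ["b", "a", "a"]}

clipped = clip_to_training_ranges(train, synthetic, ["age"])
print(clipped["age"])
```

Clipping of this kind keeps generated values inside the range observed during training, at the cost of piling mass onto the boundary values when the generator overshoots.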