ARF

class synthyverse.generators.arf_generator.ARFGenerator(num_trees=20, delta=0.0, max_iters=10, early_stop=True, verbose=True, min_node_size=5, retain_value_ranges=False, random_state=0, **kwargs)

Bases: TabularBaseGenerator

Adversarial Random Forest (ARF).

ARF trains random forests in alternating rounds of generation and discrimination to estimate the data density, then samples from that estimate to generate synthetic data.

Uses the arfpy package implementation.

Paper: “Adversarial random forests for density estimation and generative modeling” by Watson et al. (2023).
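
The alternating rounds can be pictured as a loop governed by a stopping rule. The sketch below is illustrative only and is not the synthyverse or arfpy source: should_stop is a hypothetical helper, and it assumes that training ends when the discriminator's accuracy is at most 0.5 + delta, when max_iters rounds have run, or (with early_stop) when accuracy rises from one round to the next.

```python
def should_stop(acc_history, delta=0.0, max_iters=10, early_stop=True):
    """Hypothetical helper: decide whether the adversarial loop should end.

    acc_history holds the discriminator's accuracy after each round,
    with index 0 being the initial round.
    """
    iters = len(acc_history) - 1  # rounds completed after the initial one
    current = acc_history[-1]

    # Assumed convergence rule: discriminator is at chance level, within delta.
    if current <= 0.5 + delta:
        return True

    # Iteration budget exhausted.
    if iters >= max_iters:
        return True

    # Assumed early-stopping rule: accuracy went up since the previous round.
    if early_stop and len(acc_history) > 1 and current > acc_history[-2]:
        return True

    return False


# Made-up accuracy trace, used only to exercise the helper.
trace = [0.92, 0.71, 0.58, 0.50]
history = []
for acc in trace:
    history.append(acc)
    if should_stop(history):
        break

rounds_run = len(history)
```

With this made-up trace the loop runs all four rounds and stops on the last one, where accuracy reaches 0.5. A larger delta would stop it sooner.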

Parameters:
  • num_trees (int) – Number of trees in the random forests. Default: 20.

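  • delta (float) – Tolerance parameter for convergence. In arfpy, the adversarial loop is treated as converged once the discriminator's out-of-bag accuracy is at most 0.5 + delta. Default: 0.0.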

  • max_iters (int) – Maximum number of adversarial iterations. Default: 10.

  • early_stop (bool) – Whether to stop the adversarial loop early if performance fails to improve from one round to the next. Default: True.

  • verbose (bool) – Whether to print training progress. Default: True.

  • min_node_size (int) – Minimum number of samples per leaf node in each tree. Default: 5.

  • retain_value_ranges (bool) – Whether to clip numerical features to training ranges after generation. Default: False.

  • random_state (int) – Random seed for reproducibility. Default: 0.

  • **kwargs – Additional arguments passed to TabularBaseGenerator.
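
To make the effect of retain_value_ranges concrete, here is a minimal sketch of clipping numerical columns to the ranges seen in the training data. It is an assumption about what such clipping looks like, not the library's implementation: clip_to_training_ranges is a hypothetical helper, the toy frames are invented, and every column not listed in discrete_features is assumed to be numerical.

```python
import pandas as pd


def clip_to_training_ranges(X_train, X_syn, discrete_features):
    """Hypothetical helper: clip numerical columns of X_syn to X_train's min/max."""
    # Assumption: every column not listed as discrete is numerical.
    numerical = [c for c in X_train.columns if c not in discrete_features]

    # Per-column bounds observed in the training data.
    lower = X_train[numerical].min()
    upper = X_train[numerical].max()

    # Work on a copy so the caller's synthetic frame is left untouched.
    clipped = X_syn.copy()
    clipped[numerical] = clipped[numerical].clip(lower=lower, upper=upper, axis=1)
    return clipped


# Invented toy data for illustration.
X_train = pd.DataFrame({"age": [20, 35, 50], "category_col": ["a", "b", "a"]})
X_syn = pd.DataFrame({"age": [15.0, 40.0, 62.5], "category_col": ["b", "a", "a"]})

clipped = clip_to_training_ranges(X_train, X_syn, ["category_col"])
print(clipped["age"].tolist())  # [20.0, 40.0, 50.0]
```

Values outside the training range (15.0 and 62.5) are pulled to the nearest observed bound, while discrete columns pass through unchanged.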

Example

>>> import pandas as pd
>>> from synthyverse.generators import ARFGenerator
>>>
>>> # Load data
>>> X = pd.read_csv("data.csv")
>>> discrete_features = ["category_col"]
>>>
>>> # Create generator
>>> generator = ARFGenerator(
...     num_trees=50,
...     max_iters=10,
...     early_stop=True,
...     random_state=42
... )
>>>
>>> # Fit and generate
>>> generator.fit(X, discrete_features)
>>> X_syn = generator.generate(1000)