XGenBoost AR
- class synthyverse.generators.xgenboost_generator.XGB_AR_Generator(target_column, conditioning='inference', xgboost_params={'device': 'cpu', 'early_stopping_rounds': 20, 'max_bin': 256, 'max_depth': 3, 'n_estimators': 30}, use_early_stopping=False, temperature=1.0, discretization='quantile', per_bin_sampling='eqf', cat_merge_type='clustering', cat_merge_n_infrequent=5, visit_order_method='naive', visit_order_mode='ascending', random_state=0, n_jobs_xgb=1, n_jobs=-1, H=5, route_method='routing', start_method='bootstrap', **kwargs)
Bases: XGenBoost
XGenBoost autoregressive generator.
Trains a hierarchical autoregressive model where conditionals are learned by XGBoost classifiers.
- Parameters:
target_column (str) – Name of the target column.
conditioning (str) – Conditioning mode. Options: “generation”, “inference”. Default: “inference”.
xgboost_params (dict) – Parameters passed to each underlying XGBoost model. Default: {“n_estimators”: 30, “max_depth”: 3, “max_bin”: 256, “early_stopping_rounds”: 20, “device”: “cpu”}.
use_early_stopping (bool) – Whether to use validation-based early stopping when validation data is provided. Default: False.
temperature (float) – Sampling temperature for posterior sampling. Default: 1.0.
discretization (str) – Numerical discretization strategy. Default: “quantile”.
per_bin_sampling (str) – Sampling method within numerical bins. Default: “eqf”.
cat_merge_type (str) – Strategy for merging infrequent categories. Default: “clustering”.
cat_merge_n_infrequent (int) – Number of infrequent category clusters to merge into. Default: 5.
visit_order_method (str) – Feature visit-order method. Default: “naive”.
visit_order_mode (str) – Visit-order direction. Options: “ascending”, “descending”. Default: “ascending”.
random_state (int) – Random seed for reproducibility. Default: 0.
n_jobs_xgb (int) – Number of threads used per XGBoost model. Default: 1.
n_jobs (int) – Number of parallel jobs used to train/sample across tasks. Default: -1.
H (int) – Meta-tree height for numerical features. The number of bins is 2**H. Default: 5.
route_method (str) – Numerical routing method. Options: “propagate”, “routing”. Default: “routing”.
start_method (str) – Initialization method for the first feature. Options: “bootstrap”, “eqf”. Default: “bootstrap”.
**kwargs – Additional arguments passed to TabularBaseGenerator.
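To make the relationship between H and the bin count concrete, here is a minimal sketch that builds 2**H quantile bins for a numerical column using NumPy. It is an illustration only and not the library's implementation; the helper name quantile_bin_edges is hypothetical, and the exact binning used internally by XGB_AR_Generator may differ.

```python
import numpy as np

def quantile_bin_edges(values, H):
    """Hypothetical helper: return the edges of 2**H quantile bins."""
    n_bins = 2 ** H
    # n_bins + 1 evenly spaced quantile levels between 0 and 1
    levels = np.linspace(0.0, 1.0, n_bins + 1)
    return np.quantile(values, levels)

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)

edges = quantile_bin_edges(x, H=5)

# Assign each value to a bin index in [0, 2**H - 1] using the interior edges
bin_ids = np.digitize(x, edges[1:-1])
n_bins_used = len(np.unique(bin_ids))
```

With the default H=5 this yields 32 bins (33 edges), each holding roughly the same number of samples. Raising H by one doubles the number of bins.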
Example
>>> import pandas as pd
>>> from synthyverse.generators import XGB_AR_Generator
>>>
>>> # Load data
>>> X = pd.read_csv("data.csv")
>>> discrete_features = ["target", "category_col"]
>>>
>>> # Create generator (requires target column)
>>> generator = XGB_AR_Generator(
...     target_column="target",
...     H=5,
...     random_state=42
... )
>>>
>>> # Fit and generate
>>> generator.fit(X, discrete_features)
>>> X_syn = generator.generate(1000)
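For intuition about the temperature parameter, the sketch below shows one common convention for temperature-scaled sampling from class probabilities: log-probabilities are divided by the temperature and renormalized. This page does not state how XGB_AR_Generator applies temperature internally, so treat this as an assumption; the helper name sample_with_temperature is hypothetical.

```python
import numpy as np

def sample_with_temperature(probs, temperature, rng):
    """Hypothetical helper: sample a class index from temperature-scaled probabilities."""
    probs = np.asarray(probs, dtype=float)
    # Divide log-probabilities by the temperature, then renormalize (softmax)
    logits = np.log(np.clip(probs, 1e-12, None)) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    scaled = np.exp(logits)
    scaled /= scaled.sum()
    return rng.choice(len(scaled), p=scaled), scaled

rng = np.random.default_rng(0)
probs = [0.7, 0.2, 0.1]

_, unchanged = sample_with_temperature(probs, 1.0, rng)  # distribution left as-is
_, sharper = sample_with_temperature(probs, 0.5, rng)    # mass concentrates on the mode
_, flatter = sample_with_temperature(probs, 2.0, rng)    # distribution moves toward uniform
```

Under this convention, a temperature of 1.0 leaves the predicted distribution unchanged, values below 1.0 make sampling more deterministic, and values above 1.0 increase diversity.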