Unmasking Trees
- class synthyverse.generators.unmaskingtrees_generator.UnmaskingTreesGenerator(depth=4, duplicate_K=50, xgboost_kwargs={}, strategy='kdiquantile', softmax_temp=1, cast_float32=True, tabpfn=False, random_state=0, **kwargs)
Bases: TabularBaseGenerator
Unmasking Trees.
Unmasking Trees is an autoregressive model that hierarchically partitions each feature into binary bins and then recursively trains XGBoost classifiers along the resulting meta-tree hierarchy.
This generator uses the implementation from the utrees PyPI package. Training can be costly for large datasets.
Paper: “Unmasking trees for tabular data” by C. McCarter (2024).
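To make the idea of hierarchical binary bins concrete, here is a minimal sketch for a single numeric feature. It is an illustration under stated assumptions, not the utrees implementation: the helper name binary_bin_path is hypothetical, and plain quantile binning stands in for the default "kdiquantile" strategy. With depth=4 a feature is split into 2**4 = 16 bins, and each bin corresponds to a sequence of four left/right decisions, one per level of the meta-tree.

```python
import numpy as np


def binary_bin_path(values, depth=4):
    """Hypothetical helper: quantile-bin a feature and return each value's
    bin index plus its root-to-leaf path of binary decisions."""
    n_bins = 2 ** depth
    # Interior quantile edges split the data into n_bins roughly equal-mass bins.
    # Assumption: simple quantile binning, not the package's "kdiquantile".
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.searchsorted(edges, values, side="right")  # integers in [0, n_bins)
    # The bits of the bin index, most significant first, are the left/right
    # decisions taken at each level of the binary hierarchy.
    shifts = np.arange(depth - 1, -1, -1)
    paths = (bins[:, None] >> shifts) & 1
    return bins, paths


rng = np.random.default_rng(0)
x = rng.normal(size=1000)
bins, paths = binary_bin_path(x, depth=4)
```

In this sketch, each column of paths is a binary target for one level of the hierarchy, which is the kind of target a binary classifier could be trained on.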
- Parameters:
depth (int) – Depth of the meta-tree. Default: 4.
duplicate_K (int) – Number of duplications for each sample. Default: 50.
xgboost_kwargs (dict) – Dictionary of additional XGBoost parameters. Default: {}.
strategy (str) – Strategy for quantization. Options: “quantile”, “uniform”, “kmeans”, “kdiquantile”. Default: “kdiquantile”.
softmax_temp (float) – Temperature for softmax. Default: 1.
cast_float32 (bool) – Whether to cast to float32. Default: True.
tabpfn (bool) – Whether to use TabPFN. Default: False.
random_state (int) – Random seed for reproducibility. Default: 0.
**kwargs – Additional arguments passed to TabularBaseGenerator.
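As a rough illustration of what a softmax temperature does in general, the sketch below rescales a vector of scores before normalizing them into probabilities. How utrees applies softmax_temp internally is not described here, so the function softmax_with_temperature and the example scores are assumptions made for illustration only. Temperatures below 1 concentrate probability on the highest score, while temperatures above 1 flatten the distribution.

```python
import numpy as np


def softmax_with_temperature(logits, temperature=1.0):
    """Hypothetical helper: temperature-scaled softmax over a 1-D score vector."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled = scaled - scaled.max()  # subtract the max for numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()


scores = [2.0, 1.0, 0.0]  # made-up scores, not output of any real model
p_default = softmax_with_temperature(scores, temperature=1.0)
p_sharp = softmax_with_temperature(scores, temperature=0.5)
p_flat = softmax_with_temperature(scores, temperature=2.0)
```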
Example
>>> import pandas as pd
>>> from synthyverse.generators import UnmaskingTreesGenerator
>>>
>>> # Load data
>>> X = pd.read_csv("data.csv")
>>> discrete_features = ["category_col"]
>>>
>>> # Create generator
>>> generator = UnmaskingTreesGenerator(
...     depth=4,
...     duplicate_K=50,
...     strategy="kdiquantile",
...     random_state=42
... )
>>>
>>> # Fit and generate
>>> generator.fit(X, discrete_features)
>>> X_syn = generator.generate(1000)