Unmasking Trees

class synthyverse.generators.unmaskingtrees_generator.UnmaskingTreesGenerator(depth=4, duplicate_K=50, xgboost_kwargs={}, strategy='kdiquantile', softmax_temp=1, cast_float32=True, tabpfn=False, random_state=0, **kwargs)

Bases: TabularBaseGenerator

Unmasking Trees.

Unmasking Trees is an autoregressive model that hierarchically partitions each feature into binary bins, then recursively trains XGBoost classifiers along the resulting meta-tree hierarchy.
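The hierarchical binary binning described above can be illustrated with a minimal, standard-library sketch. It is not the utrees implementation: the helper names build_meta_tree and encode are hypothetical, and median splits are assumed purely for illustration (the real split points depend on the chosen strategy). The sketch only shows how a meta-tree of a given depth turns one continuous value into a short sequence of binary decisions, each of which a binary classifier could predict.

```python
from statistics import median


def build_meta_tree(values, depth):
    """Recursively split values at their median, `depth` levels deep.

    Hypothetical helper for illustration only; returns None for a leaf bin.
    """
    if depth == 0 or len(values) < 2:
        return None
    split = median(values)
    left = [v for v in values if v < split]
    right = [v for v in values if v >= split]
    return {
        "split": split,
        "left": build_meta_tree(left, depth - 1),
        "right": build_meta_tree(right, depth - 1),
    }


def encode(value, node):
    """Return the binary decisions (0 = left, 1 = right) that locate `value`."""
    path = []
    while node is not None:
        bit = int(value >= node["split"])
        path.append(bit)
        node = node["right"] if bit else node["left"]
    return path


# Sixteen distinct values and depth 4 give 2**4 = 16 leaf bins.
values = list(range(16))
tree = build_meta_tree(values, depth=4)
path_for_5 = encode(5, tree)
n_distinct_bins = len({tuple(encode(v, tree)) for v in values})
```

With depth=4, every value is described by at most four binary decisions, so each feature is resolved into at most 16 bins.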

This generator uses the implementation from the utrees PyPI package. Training can be costly for large datasets.
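To get a feel for why cost grows with dataset size, the following back-of-envelope sketch may help. It rests on two labeled assumptions that are not confirmed by this page: that each duplicate becomes one training row, and that one binary classifier is trained per internal node of each feature's meta-tree. The function name estimate_training_cost is hypothetical, and the numbers are a rough guide rather than a measurement.

```python
def estimate_training_cost(n_rows, n_features, depth=4, duplicate_K=50):
    """Rough size estimate under the assumptions stated above (hypothetical helper)."""
    return {
        # Assumption: every sample is copied duplicate_K times before training.
        "rows_after_duplication": n_rows * duplicate_K,
        # A complete binary tree of this depth has 2**depth leaves (bins).
        "bins_per_feature": 2 ** depth,
        # Assumption: one binary classifier per internal node, per feature.
        "classifiers": n_features * (2 ** depth - 1),
    }


estimate = estimate_training_cost(n_rows=10_000, n_features=20)
```

Under these assumptions, lowering duplicate_K or depth is the most direct way to reduce training time, at a possible cost in sample quality.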

Paper: “Unmasking trees for tabular data” by C. McCarter (2024).

Parameters:
  • depth (int) – Depth of the meta-tree. Default: 4.

  • duplicate_K (int) – Number of duplications for each sample. Default: 50.

  • xgboost_kwargs (dict) – Dictionary of additional XGBoost parameters. Default: {}.

  • strategy (str) – Strategy for quantization. Options: “quantile”, “uniform”, “kmeans”, “kdiquantile”. Default: “kdiquantile”.

  • softmax_temp (float) – Temperature for softmax. Default: 1.

  • cast_float32 (bool) – Whether to cast to float32. Default: True.

  • tabpfn (bool) – Whether to use TabPFN in place of the XGBoost classifiers. Default: False.

  • random_state (int) – Random seed for reproducibility. Default: 0.

  • **kwargs – Additional arguments passed to TabularBaseGenerator.
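The effect of a softmax temperature such as softmax_temp can be illustrated with a generic sketch. How utrees applies the temperature internally is not described here, so this assumes the common convention of dividing the logits by the temperature before normalizing; the function name softmax_with_temperature is hypothetical.

```python
import math


def softmax_with_temperature(logits, temp=1.0):
    """Generic temperature-scaled softmax (assumed convention: logits / temp)."""
    scaled = [x / temp for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
p_sharp = softmax_with_temperature(logits, temp=0.5)    # more peaked
p_default = softmax_with_temperature(logits, temp=1.0)  # plain softmax
p_flat = softmax_with_temperature(logits, temp=2.0)     # closer to uniform
```

Under this convention, temperatures below 1 concentrate probability on the most likely bin, while temperatures above 1 spread it more evenly and increase sampling diversity.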

Example

>>> import pandas as pd
>>> from synthyverse.generators import UnmaskingTreesGenerator
>>>
>>> # Load data
>>> X = pd.read_csv("data.csv")
>>> discrete_features = ["category_col"]
>>>
>>> # Create generator
>>> generator = UnmaskingTreesGenerator(
...     depth=4,
...     duplicate_K=50,
...     strategy="kdiquantile",
...     random_state=42
... )
>>>
>>> # Fit and generate
>>> generator.fit(X, discrete_features)
>>> X_syn = generator.generate(1000)