Univariate

class synthyverse.generators.univariate_generator.UnivariateGenerator(random_state=0, n_quantiles=1000)

Bases: BaseGenerator

Registry name: univariate

Univariate baseline generator for tabular synthetic data.

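Generates each feature independently, so per-feature (marginal) distributions are approximated but dependencies between features are not preserved. Categorical features are sampled from their empirical category frequencies. Numerical features are fitted with sklearn.preprocessing.QuantileTransformer and sampled by drawing uniform values on [0, 1] and mapping them back through the fitted transformer's inverse transform.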
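The procedure described above can be sketched outside the library as follows. This is a minimal illustration and not synthyverse's actual implementation: the function name sample_univariate, its arguments, and the handling of missing values (dropped before fitting) are assumptions made for the example. Only pandas, NumPy, and scikit-learn calls are used.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import QuantileTransformer


def sample_univariate(X, discrete_features, n, n_quantiles=1000, seed=0):
    """Hypothetical helper: sample every column independently of the others."""
    rng = np.random.default_rng(seed)
    columns = {}
    for col in X.columns:
        values = X[col].dropna()  # assumption: missing values are ignored
        if col in discrete_features:
            # Categorical: draw categories with their empirical frequencies.
            freqs = values.value_counts(normalize=True)
            columns[col] = rng.choice(
                freqs.index.to_numpy(), size=n, p=freqs.to_numpy()
            )
        else:
            # Numerical: learn the empirical quantile function, then push
            # uniform draws on [0, 1] through its inverse.
            qt = QuantileTransformer(
                n_quantiles=min(n_quantiles, len(values)),
                output_distribution="uniform",
                random_state=seed,
            )
            qt.fit(values.to_numpy(dtype=float).reshape(-1, 1))
            u = rng.uniform(0.0, 1.0, size=(n, 1))
            columns[col] = qt.inverse_transform(u).ravel()
    return pd.DataFrame(columns)


X = pd.DataFrame(
    {
        "age": [20.0, 30.0, 40.0, 50.0, 60.0],
        "color": ["r", "g", "r", "b", "r"],
    }
)
X_syn = sample_univariate(X, discrete_features=["color"], n=50)
```

Because each column is drawn with its own random values, sampled numerical values stay within the observed range of that column, while any correlation between columns in X is lost in X_syn.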

Parameters:
  • random_state (int) – Random seed for reproducibility. Default: 0.

  • n_quantiles (int) – Maximum number of quantiles used by each numerical QuantileTransformer. The effective value is capped at the number of non-missing observations per feature. Default: 1000.
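The capping rule for n_quantiles can be illustrated with a small sketch. The helper effective_n_quantiles is hypothetical and is not part of synthyverse; it only shows the arithmetic described above, assuming that "non-missing" means non-NaN entries in a pandas Series.

```python
import pandas as pd


def effective_n_quantiles(column, n_quantiles=1000):
    """Hypothetical helper: cap n_quantiles at the non-missing count."""
    n_observed = int(column.notna().sum())
    return min(n_quantiles, n_observed)


short = pd.Series([1.5, None, 3.0, 4.2])        # 3 non-missing values
long = pd.Series(range(5000), dtype="float64")  # 5000 non-missing values

print(effective_n_quantiles(short))  # 3
print(effective_n_quantiles(long))   # 1000
```

Under this rule, small datasets use one quantile per observed value, and the default of 1000 only takes effect once a feature has at least 1000 non-missing observations.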

Example

>>> import pandas as pd
>>> from synthyverse.generators import UnivariateGenerator
>>>
>>> # Load data
>>> X = pd.read_csv("data.csv")
>>> discrete_features = ["category_col"]
>>>
>>> # Create generator
>>> generator = UnivariateGenerator(random_state=42)
>>>
>>> # Fit and generate
>>> generator.fit(X, discrete_features)
>>> X_syn = generator.generate(1000)

fit(X, discrete_features, X_val=None)

Fit the generator to tabular data.

Parameters:
  • X (DataFrame) – Training data in the generator’s input space.

  • discrete_features (list) – Names of categorical/discrete columns in X.

  • X_val (Optional[DataFrame]) – Optional validation data in the same schema as X.

Returns:

The fitted generator.

generate(n)

Generate synthetic tabular data.

Parameters:
  • n (int) – Number of synthetic rows to generate.

Returns:

Synthetic data in the generator’s model space.

classmethod load(path)

Load a generator persisted with the default pickle layout.

save(path)

Persist the generator state with the default pickle layout.