Univariate
- class synthyverse.generators.univariate_generator.UnivariateGenerator(random_state=0, n_quantiles=1000)
Bases: BaseGenerator
Registry name: univariate
Univariate baseline generator for tabular synthetic data.
Generates each feature independently. Categorical features are sampled from their empirical category frequencies. Numerical features are fitted with sklearn.preprocessing.QuantileTransformer and sampled by drawing uniform values and applying the inverse transformation.
- Parameters:
random_state (int) – Random seed for reproducibility. Default: 0.
n_quantiles (int) – Maximum number of quantiles used by each numerical QuantileTransformer. The effective value is capped at the number of non-missing observations per feature. Default: 1000.
Example
>>> import pandas as pd
>>> from synthyverse.generators import UnivariateGenerator
>>>
>>> # Load data
>>> X = pd.read_csv("data.csv")
>>> discrete_features = ["category_col"]
>>>
>>> # Create generator
>>> generator = UnivariateGenerator(random_state=42)
>>>
>>> # Fit and generate
>>> generator.fit(X, discrete_features)
>>> X_syn = generator.generate(1000)
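The per-feature sampling described in the class summary can be sketched roughly as follows. This is an illustrative approximation, not the library's actual implementation: the function name sample_univariate, the toy DataFrame, and the choice to drop missing values before fitting are all assumptions made for this example. Only QuantileTransformer, the n_quantiles cap, and the uniform-draw-then-inverse-transform step come from the description above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import QuantileTransformer


def sample_univariate(X, discrete_features, n, n_quantiles=1000, random_state=0):
    """Sample every column of X independently (hypothetical helper, sketch only)."""
    rng = np.random.default_rng(random_state)
    columns = {}
    for col in X.columns:
        # Assumption: missing values are simply dropped before fitting.
        observed = X[col].dropna()
        if col in discrete_features:
            # Categorical: draw categories with their empirical frequencies.
            freqs = observed.value_counts(normalize=True)
            columns[col] = rng.choice(
                freqs.index.to_numpy(), size=n, p=freqs.to_numpy()
            )
        else:
            # Numerical: fit a quantile map, capping n_quantiles at the
            # number of non-missing observations for this feature.
            qt = QuantileTransformer(
                n_quantiles=min(n_quantiles, len(observed)),
                output_distribution="uniform",
                random_state=random_state,
            )
            qt.fit(observed.to_numpy(dtype=float).reshape(-1, 1))
            # Uniform draws on [0, 1) are mapped back to the data scale.
            u = rng.uniform(size=(n, 1))
            columns[col] = qt.inverse_transform(u).ravel()
    return pd.DataFrame(columns)


# Toy data, invented for illustration.
X_toy = pd.DataFrame(
    {
        "color": ["red", "red", "blue", "green", "red", "blue"],
        "height": [1.0, 2.5, 3.0, 4.5, 5.0, 6.5],
    }
)
X_toy_syn = sample_univariate(X_toy, ["color"], n=200, random_state=42)
```

Because every column is drawn on its own, a sampler like this reproduces marginal distributions but discards any dependence between features, which is what makes it a baseline.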
- fit(X, discrete_features, X_val=None)
Fit the generator to tabular data.
- Parameters:
X (DataFrame) – Training data in the generator’s input space.
discrete_features (list) – Names of categorical/discrete columns in X.
X_val (Optional[DataFrame]) – Optional validation data in the same schema as X.
- Returns:
The fitted generator.
- generate(n)
Generate synthetic tabular data.
- Parameters:
n (int) – Number of synthetic rows to generate.
- Returns:
Synthetic data in the generator’s model space.
- classmethod load(path)
Load a generator persisted with the default pickle layout.
- save(path)
Persist the generator state with the default pickle layout.