Bayesian Network

class synthyverse.generators.bn_generator.BNGenerator(struct_learning_n_iter=1000, struct_learning_search_method='tree_search', struct_learning_score='k2', struct_max_indegree=4, encoder_max_clusters=10, encoder_noise_scale=0.1, random_state=0)

Bases: BaseGenerator

Registry name: bn

Bayesian Network (BN).

Uses Bayesian networks to model dependencies between variables and generate synthetic data by sampling from the learned joint distribution.

Uses the implementation from Synthcity (https://github.com/vanderschaarlab/synthcity/).
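
To make "sampling from the learned joint distribution" concrete, the sketch below draws rows from a tiny hand-written network by ancestral sampling: each node is sampled after its parents, conditioned on the values already drawn. The network, its probabilities, and the helper names (NETWORK, sample_row, sample_rows) are illustrative assumptions. This is not the Synthcity implementation and does not use the BNGenerator API.

```python
import random

# Toy network: rain -> wet_grass <- sprinkler (illustrative values only).
# Each node lists its parents and a conditional probability table (CPT)
# mapping a tuple of parent values to P(node is True).
NETWORK = {
    "rain": {"parents": [], "cpt": {(): 0.2}},
    "sprinkler": {"parents": [], "cpt": {(): 0.5}},
    "wet_grass": {
        "parents": ["rain", "sprinkler"],
        "cpt": {
            (False, False): 0.0,
            (False, True): 0.8,
            (True, False): 0.9,
            (True, True): 1.0,
        },
    },
}


def sample_row(network, rng):
    """Draw one row by ancestral sampling (parents before children)."""
    row = {}
    # NETWORK is written in topological order, so a plain loop visits
    # every parent before any of its children.
    for name, node in network.items():
        parent_values = tuple(row[p] for p in node["parents"])
        p_true = node["cpt"][parent_values]
        row[name] = rng.random() < p_true
    return row


def sample_rows(network, n, seed=0):
    """Draw n independent rows with a seeded random number generator."""
    rng = random.Random(seed)
    return [sample_row(network, rng) for _ in range(n)]


rows = sample_rows(NETWORK, 1000, seed=0)
```

Because every row is drawn from the conditional tables, dependencies encoded in the graph carry over to the samples: in this toy network, wet_grass is never True when both parents are False.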

Parameters:
  • struct_learning_n_iter (int) – Number of iterations for DAG learning. Default: 1000.

  • struct_learning_search_method (str) – Search method for DAG learning. Options: "hillclimb", "pc", "tree_search", "mmhc", "exhaustive". Default: "tree_search".

  • struct_learning_score (str) – Scoring function for DAG learning. Options: "k2", "bdeu", "bic", "bds". Default: "k2".

  • struct_max_indegree (int) – Maximum number of parents for each node. Decrease to reduce computational overhead. Default: 4.

  • encoder_max_clusters (int) – Maximum clusters for encoding continuous variables. Default: 10.

  • encoder_noise_scale (float) – Noise scale for encoding. Default: 0.1.

  • random_state (int) – Random seed for reproducibility. Default: 0.
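
As a rough illustration of why lowering struct_max_indegree reduces computational overhead, the sketch below counts the free parameters in one node's conditional probability table for a discrete network. The helper cpt_free_parameters is hypothetical, and the cardinality of 10 states per variable is an illustrative number, not a statement about how the encoder discretizes columns.

```python
from math import prod


def cpt_free_parameters(node_cardinality, parent_cardinalities):
    """Free parameters in one node's conditional probability table.

    Each combination of parent values needs its own distribution over
    the node's states, and each such distribution has
    (node_cardinality - 1) free parameters.
    """
    parent_configs = prod(parent_cardinalities)  # prod([]) == 1 for root nodes
    return (node_cardinality - 1) * parent_configs


# Table size for a 10-state node as the number of 10-state parents grows.
sizes = {k: cpt_free_parameters(10, [10] * k) for k in range(5)}
for indegree, size in sizes.items():
    print(indegree, size)
```

Under these assumptions the table grows by a factor of 10 with every added parent, from 9 parameters with no parents to 90,000 with four.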

Example

>>> import pandas as pd
>>> from synthyverse.generators import BNGenerator
>>>
>>> # Load data
>>> X = pd.read_csv("data.csv")
>>> discrete_features = ["category_col"]
>>>
>>> # Create generator
>>> generator = BNGenerator(
...     struct_learning_search_method="tree_search",
...     struct_learning_score="k2",
...     random_state=42
... )
>>>
>>> # Fit and generate
>>> generator.fit(X, discrete_features)
>>> X_syn = generator.generate(1000)

fit(X, discrete_features, X_val=None)

Fit the generator to tabular data.

Parameters:
  • X (DataFrame) – Training data in the generator’s input space.

  • discrete_features (list) – Names of categorical/discrete columns in X.

  • X_val (Optional[DataFrame]) – Optional validation data in the same schema as X.

Returns:

The fitted generator.

generate(n)

Generate synthetic tabular data.

Parameters:
  • n (int) – Number of synthetic rows to generate.

Returns:

Synthetic data in the generator’s model space.

classmethod load(path)

Load a generator persisted with the default pickle layout.

save(path)

Persist the generator state with the default pickle layout.
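
The save/load pair follows the usual pickle round-trip pattern. The sketch below shows that pattern with a hypothetical stand-in class, ToyGenerator, which pickles its state dictionary to a single file. The exact contents of the "default pickle layout" are not described here, so the file format in this sketch is an assumption rather than the library's actual behavior.

```python
import os
import pickle
import tempfile


class ToyGenerator:
    """Hypothetical stand-in for a generator; not the synthyverse class."""

    def __init__(self, random_state=0):
        self.random_state = random_state
        self.fitted = False

    def fit(self):
        # Placeholder for real training; returns self like the documented fit().
        self.fitted = True
        return self

    def save(self, path):
        # Assumed layout: the instance's state dict pickled to one file.
        with open(path, "wb") as f:
            pickle.dump(self.__dict__, f)

    @classmethod
    def load(cls, path):
        with open(path, "rb") as f:
            state = pickle.load(f)
        obj = cls.__new__(cls)  # rebuild without re-running __init__
        obj.__dict__.update(state)
        return obj


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "generator.pkl")
    ToyGenerator(random_state=42).fit().save(path)
    restored = ToyGenerator.load(path)
```

Unpickling can execute arbitrary code, so pickle files should only be loaded from trusted sources.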