Bayesian Network

class synthyverse.generators.bn_generator.BNGenerator(struct_learning_n_iter=1000, struct_learning_search_method='tree_search', struct_learning_score='k2', struct_max_indegree=4, encoder_max_clusters=10, encoder_noise_scale=0.1, random_state=0, **kwargs)

Bases: TabularBaseGenerator

Bayesian Network (BN).

Learns a Bayesian network over the variables to model their dependencies, then generates synthetic data by sampling from the learned joint distribution.

Uses the implementation from Synthcity (https://github.com/vanderschaarlab/synthcity/).
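
To make "sampling from the learned joint distribution" concrete, here is a minimal sketch of ancestral sampling on a toy two-node network. It is not the Synthcity implementation: the network (rain -> wet_grass), the probability tables, and the helper names draw and sample_rows are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy network: rain -> wet_grass. The structure and the
# probabilities are made up for illustration, not learned from data.
P_RAIN = {"yes": 0.3, "no": 0.7}
P_WET_GIVEN_RAIN = {
    "yes": {"wet": 0.9, "dry": 0.1},
    "no": {"wet": 0.2, "dry": 0.8},
}


def draw(dist, rng):
    """Draw one value from a {value: probability} table."""
    values = list(dist)
    weights = [dist[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


def sample_rows(n, seed=0):
    """Ancestral sampling: sample each parent first, then its children."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        rain = draw(P_RAIN, rng)                   # root node
        grass = draw(P_WET_GIVEN_RAIN[rain], rng)  # child given its parent
        rows.append({"rain": rain, "wet_grass": grass})
    return rows


rows = sample_rows(1000)
```

Each row is drawn in topological order of the graph, so every node is sampled only after its parents are known. Fixing the seed makes the draw repeatable in this sketch.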

Parameters:
  • struct_learning_n_iter (int) – Number of iterations for DAG learning. Default: 1000.

  • struct_learning_search_method (str) – Search method for DAG learning. Options: "hillclimb", "pc", "tree_search", "mmhc", "exhaustive". Default: "tree_search".

  • struct_learning_score (str) – Scoring function for DAG learning. Options: "k2", "bdeu", "bic", "bds". Default: "k2".

  • struct_max_indegree (int) – Maximum number of parents for each node. Decrease to reduce computational overhead. Default: 4.

  • encoder_max_clusters (int) – Maximum number of clusters used when encoding continuous variables. Default: 10.

  • encoder_noise_scale (float) – Noise scale for encoding. Default: 0.1.

  • random_state (int) – Random seed for reproducibility. Default: 0.

  • **kwargs – Additional arguments passed to TabularBaseGenerator.
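
As a rough illustration of why lowering struct_max_indegree reduces overhead, the sketch below counts the entries in one node's conditional probability table as the number of parents grows. It assumes fully discrete variables with 10 levels each, a figure picked only for the example. The helper cpt_size is hypothetical and says nothing about how the library stores its tables.

```python
from math import prod


def cpt_size(node_cardinality, parent_cardinalities):
    """Number of probabilities in one node's conditional probability table."""
    return node_cardinality * prod(parent_cardinalities)


# Hypothetical setup: every variable has 10 discrete levels.
for n_parents in range(5):
    print(n_parents, cpt_size(10, [10] * n_parents))
```

Under these assumptions the table grows by a factor of 10 with every extra parent, so each additional parent multiplies the number of parameters to estimate.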

Example

>>> import pandas as pd
>>> from synthyverse.generators import BNGenerator
>>>
>>> # Load data
>>> X = pd.read_csv("data.csv")
>>> discrete_features = ["category_col"]
>>>
>>> # Create generator
>>> generator = BNGenerator(
...     struct_learning_search_method="tree_search",
...     struct_learning_score="k2",
...     random_state=42
... )
>>>
>>> # Fit and generate
>>> generator.fit(X, discrete_features)
>>> X_syn = generator.generate(1000)