Bayesian Network
- class synthyverse.generators.bn_generator.BNGenerator(struct_learning_n_iter=1000, struct_learning_search_method='tree_search', struct_learning_score='k2', struct_max_indegree=4, encoder_max_clusters=10, encoder_noise_scale=0.1, random_state=0, **kwargs)
Bases: TabularBaseGenerator
Bayesian Network (BN).
Uses Bayesian networks to model dependencies between variables and generate synthetic data by sampling from the learned joint distribution.
Uses the implementation from Synthcity (https://github.com/vanderschaarlab/synthcity/).
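To illustrate what "sampling from the learned joint distribution" means, the following is a minimal sketch of ancestral sampling on a toy network. It is not the Synthcity or synthyverse implementation: the network, the variable names (rain, sprinkler, wet_grass), the probabilities, and the helpers sample_row and generate_rows are all hypothetical and chosen only for illustration. Each variable is drawn in topological order, conditioned on the values already drawn for its parents.

```python
import random

# Hypothetical toy network (illustrative values, not learned from data).
# Each node stores its parents and a conditional probability table (CPT)
# mapping a tuple of parent values to P(node = True).
NETWORK = {
    "rain": {"parents": (), "cpt": {(): 0.2}},
    "sprinkler": {"parents": ("rain",), "cpt": {(True,): 0.01, (False,): 0.4}},
    "wet_grass": {
        "parents": ("rain", "sprinkler"),
        "cpt": {
            (True, True): 0.99,
            (True, False): 0.8,
            (False, True): 0.9,
            (False, False): 0.0,  # grass cannot be wet without a cause
        },
    },
}

# Parents must appear before their children (a topological order of the DAG).
ORDER = ["rain", "sprinkler", "wet_grass"]


def sample_row(rng):
    """Draw one synthetic record by sampling each node given its parents."""
    row = {}
    for name in ORDER:
        node = NETWORK[name]
        parent_values = tuple(row[p] for p in node["parents"])
        row[name] = rng.random() < node["cpt"][parent_values]
    return row


def generate_rows(n, seed=0):
    """Draw n independent records; a fixed seed makes the output repeatable."""
    rng = random.Random(seed)
    return [sample_row(rng) for _ in range(n)]


rows = generate_rows(1000)
```

In this sketch the structure and the tables are fixed by hand; in a generator such as the one documented here, both would be learned from the training data before sampling.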
- Parameters:
struct_learning_n_iter (int) – Number of iterations for DAG learning. Default: 1000.
struct_learning_search_method (str) – Search method for DAG learning. Options: “hillclimb”, “pc”, “tree_search”, “mmhc”, “exhaustive”. Default: “tree_search”.
struct_learning_score (str) – Scoring function for DAG learning. Options: “k2”, “bdeu”, “bic”, “bds”. Default: “k2”.
struct_max_indegree (int) – Maximum number of parents for each node. Decrease to reduce computational overhead. Default: 4.
encoder_max_clusters (int) – Maximum number of clusters used when encoding continuous variables. Default: 10.
encoder_noise_scale (float) – Noise scale for encoding. Default: 0.1.
random_state (int) – Random seed for reproducibility. Default: 0.
**kwargs – Additional arguments passed to TabularBaseGenerator.
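The note that decreasing struct_max_indegree reduces computational overhead can be made concrete with a rough count of conditional probability table (CPT) sizes. The sketch below is a back-of-the-envelope calculation, not part of the library: the helper cpt_parameter_count is hypothetical, and the cardinality of 10 categories per variable is an arbitrary illustrative assumption. For a discrete node, the number of free parameters is (node cardinality - 1) multiplied by the number of joint parent configurations, so it grows exponentially with the number of parents.

```python
from math import prod


def cpt_parameter_count(node_cardinality, parent_cardinalities):
    """Free parameters in one discrete node's conditional probability table.

    Hypothetical helper for illustration only.
    """
    # Number of distinct joint parent configurations (1 if there are no parents).
    parent_configs = prod(parent_cardinalities)
    # Each configuration needs (cardinality - 1) probabilities; the last is implied.
    return (node_cardinality - 1) * parent_configs


# Assume every variable has 10 categories (an illustrative choice).
for indegree in range(5):
    count = cpt_parameter_count(10, [10] * indegree)
    print(f"in-degree {indegree}: {count} free parameters")
```

Under these assumptions a node with four parents needs 90000 free parameters, against 9 for a node with none, which is why a lower in-degree cap tends to make both structure search and parameter estimation cheaper.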
Example
>>> import pandas as pd
>>> from synthyverse.generators import BNGenerator
>>>
>>> # Load data
>>> X = pd.read_csv("data.csv")
>>> discrete_features = ["category_col"]
>>>
>>> # Create generator
>>> generator = BNGenerator(
...     struct_learning_search_method="tree_search",
...     struct_learning_score="k2",
...     random_state=42
... )
>>>
>>> # Fit and generate
>>> generator.fit(X, discrete_features)
>>> X_syn = generator.generate(1000)