Correlations
- class synthyverse.evaluation.fidelity.Correlations(discrete_features=[], numerical_correlation='pearson', img_save_path=None)
Bases: object
Registry name: correlations
Pairwise correlation matrix difference between real and synthetic data.
Builds a full correlation matrix for the real data and another for the synthetic data, then returns the L2 norm of their absolute difference. The correlation type is chosen automatically per feature pair: Pearson or Spearman (as set by numerical_correlation) for numerical-numerical pairs, Cramer’s V for categorical-categorical pairs, and the correlation ratio (eta-squared) for mixed pairs.
Lower scores indicate better preservation of feature dependencies.
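The description above can be sketched in plain pandas and NumPy. This is a minimal illustration and not the synthyverse implementation: the helper names (cramers_v, correlation_ratio, association_matrix, correlation_l2) are hypothetical, Cramer’s V is computed without bias correction, and the "L2 norm" is assumed to be the Frobenius norm of the element-wise absolute difference. The library may differ on any of these points.

```python
import numpy as np
import pandas as pd


def cramers_v(a: pd.Series, b: pd.Series) -> float:
    """Cramer's V (no bias correction) for two categorical series."""
    table = pd.crosstab(a, b).to_numpy(dtype=float)
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k))) if k > 0 else 0.0


def correlation_ratio(cat: pd.Series, num: pd.Series) -> float:
    """Eta-squared: between-group sum of squares over total sum of squares."""
    total = ((num - num.mean()) ** 2).sum()
    if total == 0:
        return 0.0
    groups = num.groupby(cat)
    between = (groups.count() * (groups.mean() - num.mean()) ** 2).sum()
    return float(between / total)


def association_matrix(df, discrete, method="pearson"):
    """Symmetric matrix with one association measure per feature pair."""
    cols = list(df.columns)
    out = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if a in discrete and b in discrete:
                v = cramers_v(df[a], df[b])
            elif a in discrete:
                v = correlation_ratio(df[a], df[b])
            elif b in discrete:
                v = correlation_ratio(df[b], df[a])
            elif method == "spearman":
                # Spearman is Pearson correlation computed on ranks.
                v = df[a].rank().corr(df[b].rank())
            else:
                v = df[a].corr(df[b])
            out.loc[a, b] = out.loc[b, a] = v
    return out


def correlation_l2(real, syn, discrete, method="pearson"):
    """Frobenius norm of |real matrix - synthetic matrix| (assumed norm)."""
    diff = association_matrix(real, discrete, method) - association_matrix(
        syn, discrete, method
    )
    return float(np.linalg.norm(np.abs(diff.to_numpy())))


# Toy data: the synthetic set reverses the x-y relationship.
real = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 4.0, 6.0, 8.0], "c": list("aabb")}
)
syn = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0, 4.0], "y": [8.0, 6.0, 4.0, 2.0], "c": list("abab")}
)
score = correlation_l2(real, syn, discrete=["c"])
print(score)
```

Under these assumptions, comparing a dataset with itself yields 0, and the score grows as pairwise dependencies in the synthetic data drift from those in the real data.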
- Parameters:
discrete_features (list) – List of discrete/categorical feature names. Default: [].
numerical_correlation (str) – Correlation method for numerical-numerical pairs. One of “spearman” or “pearson”. Default: “pearson”.
img_save_path (str, optional) – Directory where correlation matrix plots will be saved. If a file path is provided, its basename is used as a prefix. Default: None.
Example
>>> import pandas as pd
>>> from synthyverse.evaluation import Correlations
>>>
>>> # Prepare data
>>> X_real = pd.DataFrame(...)
>>> X_syn = pd.DataFrame(...)
>>> discrete_features = ["category_col"]
>>>
>>> # Create metric
>>> metric = Correlations(
...     discrete_features=discrete_features,
...     numerical_correlation="spearman",
...     img_save_path="results/correlations",
... )
>>>
>>> # Evaluate
>>> results = metric.evaluate(X_real, X_syn)
- evaluate(X_train, X_syn)
Evaluate synthetic data by comparing pairwise correlation matrices.
- Parameters:
X_train (DataFrame) – Real training data as a pandas DataFrame.
X_syn (DataFrame) – Synthetic data as a pandas DataFrame.
- Returns:
- Dictionary with keys:
“correlations.l2”: L2 norm of the absolute difference between the real and synthetic correlation matrices.
“correlations.img_paths”: Paths of the saved plots, when img_save_path is set.
- Return type:
dict