Correlations

class synthyverse.evaluation.fidelity.Correlations(discrete_features=[], numerical_correlation='pearson', img_save_path=None)

Bases: object

Registry name: correlations

Pairwise correlation matrix difference between real and synthetic data.

Builds a full correlation matrix for both real and synthetic data and returns the L2 norm of their absolute difference. The correlation type is chosen automatically per feature pair: Pearson or Spearman (as set by numerical_correlation) for numerical-numerical pairs, Cramér’s V for categorical-categorical pairs, and the correlation ratio (eta-squared) for mixed pairs.
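The two non-numerical association measures named above can be sketched as follows. This is a minimal illustration under stated assumptions, not synthyverse's own implementation: the function names cramers_v and eta_squared are hypothetical, Cramér's V is computed without bias correction, and the mixed-pair measure returns eta-squared (the correlation ratio eta is its square root).

```python
import numpy as np
import pandas as pd


def cramers_v(a: pd.Series, b: pd.Series) -> float:
    """Cramér's V (no bias correction) for two categorical series."""
    table = pd.crosstab(a, b).to_numpy(dtype=float)
    # Drop empty rows/columns so every expected count is positive.
    table = table[table.sum(axis=1) > 0]
    table = table[:, table.sum(axis=0) > 0]
    n = table.sum()
    k = min(table.shape) - 1
    if n == 0 or k <= 0:
        return 0.0
    # Expected counts under independence: outer product of the margins / n.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    return float(np.sqrt(chi2 / (n * k)))


def eta_squared(categories: pd.Series, values: pd.Series) -> float:
    """Between-group sum of squares divided by total sum of squares."""
    grand_mean = values.mean()
    total_ss = ((values - grand_mean) ** 2).sum()
    if total_ss == 0:
        return 0.0
    groups = values.groupby(categories)
    between_ss = (groups.size() * (groups.mean() - grand_mean) ** 2).sum()
    return float(between_ss / total_ss)
```

Both functions return values in [0, 1], which lets them share a matrix with the absolute values of Pearson or Spearman coefficients.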

Lower scores indicate better preservation of feature dependencies.
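For intuition about the score, here is a rough sketch of the computation for the all-numerical case only. It is not the library's code: correlation_l2 is a hypothetical name, and reading "L2 norm" of a matrix as the Frobenius norm is an assumption.

```python
import numpy as np
import pandas as pd


def correlation_l2(real: pd.DataFrame, syn: pd.DataFrame, method: str = "pearson") -> float:
    """Frobenius norm of |corr(real) - corr(syn)| for all-numeric frames."""
    columns = list(real.columns)  # use the same column order for both frames
    corr_real = real[columns].corr(method=method).to_numpy()
    corr_syn = syn[columns].corr(method=method).to_numpy()
    diff = np.abs(corr_real - corr_syn)
    # np.linalg.norm on a 2-D array defaults to the Frobenius norm.
    return float(np.linalg.norm(diff))


real = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [2.0, 4.0, 6.0, 8.0]})
flipped = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [8.0, 6.0, 4.0, 2.0]})

same_score = correlation_l2(real, real)        # identical dependencies
flipped_score = correlation_l2(real, flipped)  # sign of the a-b correlation reversed
```

Identical frames give a score of 0.0, while reversing the sign of the only off-diagonal correlation gives a clearly larger score, which is why lower is better.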

Parameters:
  • discrete_features (list) – List of discrete/categorical feature names. Default: [].

  • numerical_correlation (str) – Correlation method for numerical-numerical pairs. One of “spearman” or “pearson”. Default: “pearson”.

  • img_save_path (str, optional) – Directory where correlation matrix plots will be saved. If a file path is provided, its basename is used as a prefix. Default: None.
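If the discrete columns are not known up front, one possible way to build the discrete_features list is to infer it from column dtypes. The helper below is hypothetical and not part of synthyverse; it assumes that every non-numeric or boolean column should be treated as discrete, which will miss integer-coded categories.

```python
import pandas as pd


def infer_discrete_features(df: pd.DataFrame) -> list:
    """Hypothetical helper: names of non-numeric and boolean columns."""
    discrete = []
    for name in df.columns:
        col = df[name]
        # Booleans count as numeric in pandas, so check them separately.
        if pd.api.types.is_bool_dtype(col) or not pd.api.types.is_numeric_dtype(col):
            discrete.append(name)
    return discrete


frame = pd.DataFrame(
    {
        "age": [31, 45, 27],
        "income": [1.0, 2.5, 3.0],
        "category_col": ["a", "b", "a"],
        "flag": [True, False, True],
    }
)
discrete_features = infer_discrete_features(frame)
```

Integer-coded categorical columns need to be appended to the list by hand, since their dtype alone does not mark them as discrete.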

Example

>>> import pandas as pd
>>> from synthyverse.evaluation.fidelity import Correlations
>>>
>>> # Prepare data
>>> X_real = pd.DataFrame(...)
>>> X_syn = pd.DataFrame(...)
>>> discrete_features = ["category_col"]
>>>
>>> # Create metric
>>> metric = Correlations(
...     discrete_features=discrete_features,
...     numerical_correlation="spearman",
...     img_save_path="results/correlations",
... )
>>>
>>> # Evaluate
>>> results = metric.evaluate(X_real, X_syn)

evaluate(X_train, X_syn)

Evaluate synthetic data by comparing pairwise correlation matrices.

Parameters:
  • X_train (DataFrame) – Real training data as a pandas DataFrame.

  • X_syn (DataFrame) – Synthetic data as a pandas DataFrame.

Returns:

Dictionary with keys:
  • “correlations.l2”: L2 norm of the absolute difference between the real and synthetic correlation matrices.

  • “correlations.img_paths”: Paths of the saved plots (when img_save_path is set).

Return type:

dict
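Because the image-path entry is only documented for the case where img_save_path is set, code that consumes the result may want to treat that key as optional. The sketch below uses a hypothetical helper, describe_result, and assumes the paths are stored as a list of strings; the dictionary at the end is a hand-written placeholder, not real library output.

```python
def describe_result(results: dict) -> str:
    """Format the metric output; the image-path key is treated as optional."""
    score = results["correlations.l2"]
    # Assumption: the value under this key is a list of path strings.
    img_paths = results.get("correlations.img_paths") or []
    lines = [f"correlations.l2 = {score:.4f}"]
    lines.extend(f"plot: {path}" for path in img_paths)
    return "\n".join(lines)


# Hand-written placeholder values for illustration only.
placeholder = {"correlations.l2": 0.5}
print(describe_result(placeholder))
```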