Wasserstein
- class synthyverse.evaluation.fidelity.Wasserstein(discrete_features=None, blur=0.001, scaling=0.5, debias=True, backend='online')
Bases: object
Registry name: wasserstein
Multivariate Wasserstein distance between real and synthetic samples.
Sinkhorn approximation of the Wasserstein-1 distance using a Gower-like cost function.
- Parameters:
discrete_features (list) – List of discrete/categorical feature names. Default: None, treated as an empty list.
blur (float) – Entropic regularization scale passed to GeomLoss SamplesLoss. Smaller values are closer to exact optimal transport but can be slower or less stable. Default: 0.001.
scaling (float) – GeomLoss epsilon-scaling ratio. Default: 0.5.
debias (bool) – Whether to use the debiased Sinkhorn divergence form, which returns zero for identical empirical distributions. Default: True.
backend (str) – GeomLoss backend. The default "online" backend streams pairwise costs through KeOps and avoids materializing the full sample-by-sample cost matrix. Use "tensorized" only for small datasets or debugging. Default: "online".
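For reference, the debiased form that debias refers to is usually written as the Sinkhorn divergence. The notation below follows the common GeomLoss convention and is not taken from this library's source: \mathrm{OT}_\varepsilon denotes the entropy-regularized transport cost, with \varepsilon controlled by blur, and \alpha, \beta are the empirical distributions of the real and synthetic samples.

```latex
S_\varepsilon(\alpha, \beta)
  = \mathrm{OT}_\varepsilon(\alpha, \beta)
  - \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(\alpha, \alpha)
  - \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(\beta, \beta)
```

Setting \beta = \alpha makes the three terms cancel, which is why the debiased form returns zero for identical empirical distributions.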
Example
>>> import pandas as pd
>>> from synthyverse.evaluation import Wasserstein
>>>
>>> metric = Wasserstein(discrete_features=["category_col"])
>>> results = metric.evaluate(X_train, X_syn)
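The exact ground cost is not spelled out on this page, so the following is only a minimal sketch of what a "Gower-like" cost between two rows could look like. It assumes range-normalized absolute differences for numeric features and a 0/1 mismatch for discrete features, averaged over all features. The function name gower_like_cost and the numeric_ranges argument are hypothetical and are not part of synthyverse.

```python
from typing import Collection, Mapping


def gower_like_cost(
    a: Mapping[str, object],
    b: Mapping[str, object],
    numeric_ranges: Mapping[str, float],
    discrete_features: Collection[str] = (),
) -> float:
    """Illustrative Gower-style dissimilarity between two rows (hypothetical helper).

    Discrete features contribute 0 if equal and 1 otherwise.
    Numeric features contribute |a - b| divided by the feature's range.
    The result is the average contribution over all features, so it lies in [0, 1]
    when both values fall inside the given ranges.
    """
    if not a:
        raise ValueError("rows must contain at least one feature")
    total = 0.0
    for name in a:
        if name in discrete_features:
            total += 0.0 if a[name] == b[name] else 1.0
        else:
            span = numeric_ranges[name]
            # A constant feature (zero range) cannot separate the rows.
            total += abs(a[name] - b[name]) / span if span > 0 else 0.0
    return total / len(a)


real_row = {"age": 30, "income": 50000.0, "category_col": "a"}
syn_row = {"age": 40, "income": 50000.0, "category_col": "b"}
ranges = {"age": 50.0, "income": 100000.0}

cost = gower_like_cost(real_row, syn_row, ranges, discrete_features=["category_col"])
print(cost)
```

In this sketch the discrete_features list plays the same role as the constructor argument above: it decides which columns are compared by equality rather than by numeric difference.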
- evaluate(X_train, X_syn)
Evaluate synthetic data using multivariate Wasserstein distance.
- Parameters:
X_train (DataFrame) – Real training data as a pandas DataFrame.
X_syn (DataFrame) – Synthetic data as a pandas DataFrame.
- Returns:
- Dictionary with key:
"wasserstein.w1": Wasserstein-1 distance with L1 ground cost
- Return type:
dict
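To build intuition for the reported value, the simplest case of a Wasserstein-1 distance can be computed by hand: for two one-dimensional samples of equal size, it is the mean absolute difference between the sorted values. The helper below, wasserstein1_1d, is a hypothetical illustration of that special case only. It is not how this class computes its result, which is a multivariate Sinkhorn approximation and may therefore differ slightly from an exact value.

```python
from typing import Sequence


def wasserstein1_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Exact Wasserstein-1 distance between two equal-size 1-D samples.

    With uniform weights and equal sample sizes, the optimal transport plan
    matches the i-th smallest value of xs to the i-th smallest value of ys.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) for x, y in pairs) / len(xs)


real = [0.0, 1.0, 2.0]
shifted = [3.0, 1.0, 2.0]  # the real sample shifted by +1, in a different order

print(wasserstein1_1d(real, real))     # identical samples
print(wasserstein1_1d(real, shifted))  # every point moves by one unit
```

Lower values mean the synthetic sample sits closer to the real one under the chosen ground cost, and identical samples give zero.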