NMI

class synthyverse.evaluation.fidelity.NMI(discrete_features=None, n_bins_numerical=20)

Bases: object

Registry name: nmi

Pairwise normalized mutual information preservation.

Paper: “A Sobering Look at Tabular Data Generation via Probabilistic Circuits” by Scassola et al. (2026).

Computes normalized mutual information (NMI) for every feature pair, separately in the real and the synthetic data, then returns the weighted average of the per-pair preservation score 1 - abs(NMI_real - NMI_synthetic). Each pair is weighted by abs(NMI_real + NMI_synthetic), and the weights are normalized to sum to 1, so pairs that are strongly dependent in either dataset contribute most to the score.
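The weighting can be illustrated with a small NumPy sketch that starts from already-computed per-pair NMI values. The function name `weighted_nmi_preservation` is hypothetical, and so is the fallback to uniform weights when every pair has zero NMI in both datasets; how synthyverse handles that edge case is not stated here.

```python
import numpy as np


def weighted_nmi_preservation(nmi_real, nmi_syn):
    """Hypothetical helper: weighted mean of 1 - |NMI_real - NMI_syn| over feature pairs."""
    nmi_real = np.asarray(nmi_real, dtype=float)
    nmi_syn = np.asarray(nmi_syn, dtype=float)
    # Per-pair preservation: 1 when the synthetic NMI equals the real NMI.
    pair_scores = 1.0 - np.abs(nmi_real - nmi_syn)
    # Weight each pair by |NMI_real + NMI_syn|, then normalize the weights to sum to 1.
    weights = np.abs(nmi_real + nmi_syn)
    if weights.sum() == 0:
        # Assumption: use uniform weights if no pair shows any dependence.
        weights = np.ones_like(weights)
    weights = weights / weights.sum()
    return float(np.sum(weights * pair_scores))


# Two illustrative pairs: a strong dependence that is partly lost (0.8 -> 0.6)
# and a weak one that is kept exactly (0.1 -> 0.1).
score = weighted_nmi_preservation([0.8, 0.1], [0.6, 0.1])
print(round(score, 3))  # 0.825
```

Here the weights are 1.4/1.6 = 0.875 and 0.2/1.6 = 0.125, so the score is 0.875 * 0.8 + 0.125 * 1.0 = 0.825: the degraded strong pair dominates, while the perfectly preserved weak pair barely lifts the result.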

Numerical features are discretized into equal-width bins before computing NMI. The score lies in [0, 1], and higher scores indicate better preservation of feature dependencies.
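A minimal sketch of equal-width discretization, assuming the bin edges span the observed minimum and maximum of the values passed in (whether synthyverse derives edges per dataset or from the combined real and synthetic range is not stated here). `equal_width_codes` is a hypothetical helper, not part of synthyverse.

```python
import numpy as np


def equal_width_codes(values, n_bins=20):
    """Hypothetical helper: map numerical values to integer bin codes 0..n_bins-1."""
    if n_bins < 2:
        raise ValueError("n_bins must be >= 2")
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # A constant column carries no information; put everything in one bin.
        return np.zeros(len(values), dtype=int)
    # n_bins + 1 equally spaced edges; only the interior edges are passed to
    # np.digitize so the minimum maps to bin 0 and the maximum to bin n_bins - 1.
    interior = np.linspace(lo, hi, n_bins + 1)[1:-1]
    return np.digitize(values, interior)


print(equal_width_codes([0.0, 1.0, 2.5, 4.9, 5.1, 10.0], n_bins=2))  # [0 0 0 0 1 1]
```

Equal-width bins are cheap and deterministic, but a long-tailed feature can leave most observations in a few bins, which is worth keeping in mind when choosing n_bins_numerical.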

Parameters:

discrete_features (list) – List of discrete/categorical feature names; all other features are treated as numerical. Default: None (treated as an empty list).
n_bins_numerical (int) – Number of equal-width bins used when discretizing numerical features. Must be >= 2. Default: 20.

Example

>>> import pandas as pd
>>> from synthyverse.evaluation import NMI
>>>
>>> # Prepare data
>>> X_real = pd.DataFrame(...)
>>> X_syn = pd.DataFrame(...)
>>> discrete_features = ["category_col"]
>>>
>>> # Create metric
>>> metric = NMI(discrete_features=discrete_features)
>>>
>>> # Evaluate
>>> results = metric.evaluate(X_real, X_syn)

evaluate(X_train, X_syn)

Evaluate synthetic data by comparing pairwise NMI values.

Parameters:

X_train (DataFrame) – Real training data as a pandas DataFrame.
X_syn (DataFrame) – Synthetic data as a pandas DataFrame.

Returns:

Dictionary with key:

"nmi.score": Weighted pairwise NMI preservation score

Return type:

dict
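For readers who want to see the whole computation in one place, the sketch below reimplements it with NumPy and pandas and returns a dict with the same "nmi.score" key. It is not the synthyverse implementation: the function name `nmi_preservation`, the arithmetic-mean normalization (I(A;B) divided by the mean of H(A) and H(B)), the bin edges shared across real and synthetic values, and the uniform-weight fallback are all assumptions. It also assumes both frames have the same columns and no missing values.

```python
import itertools

import numpy as np
import pandas as pd


def _entropy(counts):
    # Shannon entropy (natural log) of a vector of counts; zero counts are skipped.
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)))


def _nmi(a, b):
    # Assumed normalization: mutual information over the mean of the marginal entropies.
    table = pd.crosstab(a, b).to_numpy(dtype=float)
    h_a, h_b = _entropy(table.sum(axis=1)), _entropy(table.sum(axis=0))
    h_ab = _entropy(table.ravel())
    denom = 0.5 * (h_a + h_b)
    if denom == 0:
        return 0.0
    return float(np.clip((h_a + h_b - h_ab) / denom, 0.0, 1.0))


def _discretize(real_col, syn_col, n_bins):
    # Assumption: equal-width edges spanning both the real and the synthetic values.
    r, s = real_col.to_numpy(dtype=float), syn_col.to_numpy(dtype=float)
    lo, hi = min(r.min(), s.min()), max(r.max(), s.max())
    if hi == lo:
        return np.zeros(len(r), dtype=int), np.zeros(len(s), dtype=int)
    interior = np.linspace(lo, hi, n_bins + 1)[1:-1]
    return np.digitize(r, interior), np.digitize(s, interior)


def nmi_preservation(X_real, X_syn, discrete_features=None, n_bins_numerical=20):
    """Hypothetical re-implementation of the documented score."""
    if n_bins_numerical < 2:
        raise ValueError("n_bins_numerical must be >= 2")
    if X_real.shape[1] < 2:
        raise ValueError("at least two features are needed to form a pair")
    discrete = set(discrete_features or [])
    real_codes, syn_codes = {}, {}
    for col in X_real.columns:
        if col in discrete:
            real_codes[col] = X_real[col].astype(str).to_numpy()
            syn_codes[col] = X_syn[col].astype(str).to_numpy()
        else:
            real_codes[col], syn_codes[col] = _discretize(X_real[col], X_syn[col], n_bins_numerical)
    pairs = list(itertools.combinations(X_real.columns, 2))
    nmi_r = np.array([_nmi(real_codes[a], real_codes[b]) for a, b in pairs])
    nmi_s = np.array([_nmi(syn_codes[a], syn_codes[b]) for a, b in pairs])
    weights = np.abs(nmi_r + nmi_s)
    if weights.sum() == 0:
        weights = np.ones_like(weights)  # assumption: uniform weights if nothing is dependent
    weights = weights / weights.sum()
    score = float(np.sum(weights * (1.0 - np.abs(nmi_r - nmi_s))))
    return {"nmi.score": score}


# Usage: a dependent numerical pair plus a categorical feature derived from x.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
X_real = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.1, size=500),
    "category_col": np.where(x > 0, "pos", "neg"),
})
# Shuffling each column independently destroys the cross-feature dependencies.
X_syn = pd.DataFrame({c: rng.permutation(X_real[c].to_numpy()) for c in X_real.columns})
print(nmi_preservation(X_real, X_syn, discrete_features=["category_col"]))
```

Comparing the real data with itself yields a score of 1, while the column-shuffled copy scores lower because the strong x/y and x/category_col dependencies are lost in the synthetic side.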