PRDC

class synthyverse.evaluation.fidelity.PRDC(discrete_features=None, k=5, n_jobs=-1)

Bases: object

Registry name: prdc

Precision, Recall, Density, and Coverage (PRDC) for tabular synthetic data. Precision and density measure fidelity; recall and coverage measure diversity.

Paper: “Reliable fidelity and diversity metrics for generative models” by Naeem et al. (2020).
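For reference, the definitions from Naeem et al. (2020) are restated below. $X_1,\dots,X_N$ are the real samples, $Y_1,\dots,Y_M$ the synthetic samples, $B(x, r)$ is the ball of radius $r$ centred at $x$, and $\mathrm{NND}_k(\cdot)$ is the distance from a sample to its $k$-th nearest neighbour within its own set (real or synthetic). This is the paper's notation; how synthyverse maps mixed-type rows to points in a metric space is not specified on this page.

```latex
\begin{aligned}
\text{precision} &= \frac{1}{M}\sum_{j=1}^{M} \mathbb{1}\Big[\, Y_j \in \bigcup_{i=1}^{N} B\big(X_i, \mathrm{NND}_k(X_i)\big) \Big] \\
\text{recall}    &= \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}\Big[\, X_i \in \bigcup_{j=1}^{M} B\big(Y_j, \mathrm{NND}_k(Y_j)\big) \Big] \\
\text{density}   &= \frac{1}{kM}\sum_{j=1}^{M}\sum_{i=1}^{N} \mathbb{1}\big[\, Y_j \in B\big(X_i, \mathrm{NND}_k(X_i)\big) \big] \\
\text{coverage}  &= \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}\big[\, \exists\, j : Y_j \in B\big(X_i, \mathrm{NND}_k(X_i)\big) \big]
\end{aligned}
```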

Parameters:

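discrete_features (list, optional) – Names of the discrete/categorical features. Default: None (no discrete features).
k (int) – Number of nearest neighbours used to estimate each sample's manifold radius; the radius is the distance from a sample to its k-th nearest neighbour within its own set. Default: 5.
n_jobs (int) – Number of parallel jobs used by scikit-learn's pairwise distance computation (-1 uses all available cores). Default: -1.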
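The role of `k` (and of `n_jobs`) can be illustrated with a short sketch of the radius estimate. It assumes the features are already numeric and uses Euclidean distance; `knn_radii` is a hypothetical helper for illustration, not part of synthyverse's API.

```python
import numpy as np
from sklearn.metrics import pairwise_distances


def knn_radii(X: np.ndarray, k: int = 5, n_jobs: int = -1) -> np.ndarray:
    """Hypothetical helper: distance from each row of X to its k-th nearest neighbour in X."""
    # Full pairwise Euclidean distance matrix; n_jobs is passed straight to scikit-learn
    dists = pairwise_distances(X, X, metric="euclidean", n_jobs=n_jobs)
    # Each row contains a 0 for the point itself, so the k-th neighbour (excluding self)
    # is the (k+1)-th smallest value, i.e. position k after a partial sort
    return np.partition(dists, k, axis=1)[:, k]
```

For the 1-D toy set `[[0], [1], [3], [6]]` with `k=1`, this returns `[1, 1, 2, 3]`; larger `k` gives larger radii and therefore a more permissive manifold estimate.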

Example

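>>> import numpy as np
>>> import pandas as pd
>>> from synthyverse.evaluation import PRDC
>>>
>>> # Toy real and synthetic tables with matching columns
>>> rng = np.random.default_rng(0)
>>> X_train = pd.DataFrame({"num_col": rng.normal(size=200),
...                         "category_col": rng.choice(["a", "b"], size=200)})
>>> X_syn = pd.DataFrame({"num_col": rng.normal(size=200),
...                       "category_col": rng.choice(["a", "b"], size=200)})
>>>
>>> metric = PRDC(discrete_features=["category_col"], k=5)
>>> results = metric.evaluate(X_train, X_syn)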

evaluate(X_train, X_syn)

Evaluate synthetic data using PRDC.

Parameters:

X_train (DataFrame) – Real training data as a pandas DataFrame.
X_syn (DataFrame) – Synthetic data as a pandas DataFrame.
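PRDC is distance-based, so both DataFrames have to be mapped to numeric vectors before any distances are computed. How synthyverse does this, and exactly how it uses `discrete_features`, is not documented here. The sketch below shows one common approach, assumed rather than confirmed: one-hot encode the discrete columns and standardise the remaining columns with statistics from the real data. `encode_for_distances` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def encode_for_distances(X_train, X_syn, discrete_features=None):
    """Hypothetical preprocessing, not synthyverse's implementation."""
    discrete_features = list(discrete_features or [])
    n_train = len(X_train)
    # Concatenate so both tables get identical one-hot columns, even when a
    # category appears in only one of them
    both = pd.concat([X_train, X_syn], axis=0, ignore_index=True)
    if discrete_features:
        onehot = pd.get_dummies(both[discrete_features].astype(str), dtype=float)
    else:
        onehot = pd.DataFrame(index=both.index)
    numeric_cols = [c for c in both.columns if c not in discrete_features]
    numeric = both[numeric_cols].astype(float)
    # Standardise with real-data statistics only; constant columns keep scale 1
    mean = numeric.iloc[:n_train].mean()
    std = numeric.iloc[:n_train].std(ddof=0).replace(0.0, 1.0)
    numeric = (numeric - mean) / std
    encoded = pd.concat([numeric, onehot], axis=1).to_numpy(dtype=float)
    return encoded[:n_train], encoded[n_train:]
```

Fitting the scaling on the real data only keeps the geometry of the real manifold fixed regardless of what the generator produces.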

Returns:

Dictionary with keys:

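"prdc.precision": Fraction of synthetic samples that fall inside the real manifold (the union of k-NN balls around the real samples)
"prdc.recall": Fraction of real samples that fall inside the synthetic manifold (the union of k-NN balls around the synthetic samples)
"prdc.density": Average number of real-sample k-NN balls containing each synthetic sample, divided by k; unlike precision, it can exceed 1
"prdc.coverage": Fraction of real samples whose k-NN ball contains at least one synthetic sample, i.e. whose nearest synthetic sample lies within that real sample's k-NN radius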
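To make the four definitions concrete, here is a minimal sketch that computes them from already-encoded numeric arrays, following Naeem et al. (2020). It is not synthyverse's implementation: `prdc_sketch` is a hypothetical name, each input is assumed to be a numeric NumPy array with more than `k` rows, and the strict `<` at the ball boundary is a choice made here (the library's boundary convention is not documented on this page).

```python
import numpy as np
from sklearn.metrics import pairwise_distances


def prdc_sketch(real: np.ndarray, fake: np.ndarray, k: int = 5, n_jobs: int = -1) -> dict:
    """Hypothetical PRDC computation on numeric arrays (Euclidean distance)."""
    # k-NN radius of each sample within its own set; position k skips the zero self-distance
    real_radii = np.partition(pairwise_distances(real, real, n_jobs=n_jobs), k, axis=1)[:, k]
    fake_radii = np.partition(pairwise_distances(fake, fake, n_jobs=n_jobs), k, axis=1)[:, k]
    # d[i, j] = distance from real sample i to synthetic sample j
    d = pairwise_distances(real, fake, n_jobs=n_jobs)
    # in_real_ball[i, j] is True when synthetic sample j lies inside real sample i's ball
    in_real_ball = d < real_radii[:, None]
    return {
        # synthetic sample inside at least one real ball
        "prdc.precision": float(in_real_ball.any(axis=0).mean()),
        # real sample inside at least one synthetic ball
        "prdc.recall": float((d < fake_radii[None, :]).any(axis=1).mean()),
        # number of real balls containing each synthetic sample, averaged, divided by k
        "prdc.density": float(in_real_ball.sum(axis=0).mean() / k),
        # nearest synthetic sample lies within the real sample's radius
        "prdc.coverage": float((d.min(axis=1) < real_radii).mean()),
    }
```

With real points `[[0], [1], [2], [3]]`, synthetic points `[[0.5], [10]]` and `k=1`, the sketch gives precision 0.5, recall 1.0, density 1.0 and coverage 0.5. The outlier at 10 lies outside every real ball, which halves precision; recall stays at 1 because the outlier inflates the synthetic radii to 9.5, which is the failure mode of recall that coverage is designed to avoid.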

Return type:

dict