PRDC

class synthyverse.evaluation.fidelity.PRDC(discrete_features=None, k=5, n_jobs=-1)

Bases: object

Registry name: prdc

Precision, Recall, Density, and Coverage for tabular synthetic data.

Paper: “Reliable fidelity and diversity metrics for generative models” by Naeem et al. (2020).

Parameters:
  • discrete_features (list) – List of discrete/categorical feature names. Default: None, treated as an empty list.

  • k (int) – Number of nearest neighbours used to estimate each sample’s manifold radius. Default: 5.

  • n_jobs (int) – Number of parallel jobs for sklearn pairwise distances; -1 uses all available cores. Default: -1.

Example

>>> import pandas as pd
>>> from synthyverse.evaluation import PRDC
>>>
>>> metric = PRDC(discrete_features=["category_col"], k=5)
>>> # X_train and X_syn are pandas DataFrames with the same columns
>>> results = metric.evaluate(X_train, X_syn)

evaluate(X_train, X_syn)

Evaluate synthetic data using PRDC.

Parameters:
  • X_train (DataFrame) – Real training data as a pandas DataFrame.

  • X_syn (DataFrame) – Synthetic data as a pandas DataFrame.

Returns:

Dictionary with keys:
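  • "prdc.precision": Fraction of synthetic samples that fall inside the real manifold, i.e. within the k-nearest-neighbour radius of at least one real sample

  • "prdc.recall": Fraction of real samples that fall inside the synthetic manifold, i.e. within the k-nearest-neighbour radius of at least one synthetic sample

  • "prdc.density": Number of real-sample neighbourhoods containing a synthetic sample, averaged over synthetic samples and divided by k in the definition of Naeem et al.; unlike the other three values it can exceed 1

  • "prdc.coverage": Fraction of real samples whose nearest synthetic sample lies within that real sample's k-nearest-neighbour radius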

Return type:

dict
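
To make the four definitions concrete, here is a minimal sketch that follows the formulas in Naeem et al. (2020) using only the Python standard library. It is not the synthyverse implementation: the helper names kth_nn_radius and prdc_sketch are hypothetical, all features are assumed to be numeric and compared with Euclidean distance, and discrete_features handling is left out entirely. Each data set must contain more than k points.

```python
from math import dist


def kth_nn_radius(points, k):
    """Distance from each point to its k-th nearest neighbour in the same set (self excluded)."""
    radii = []
    for i, p in enumerate(points):
        others = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(others[k - 1])
    return radii


def prdc_sketch(real, syn, k=5):
    """Illustrative PRDC on lists of numeric tuples (hypothetical helper, not the library API)."""
    n, m = len(real), len(syn)
    real_r = kth_nn_radius(real, k)  # radius of each real sample's k-NN ball
    syn_r = kth_nn_radius(syn, k)    # radius of each synthetic sample's k-NN ball

    # d[i][j] = Euclidean distance between real sample i and synthetic sample j
    d = [[dist(x, y) for y in syn] for x in real]

    # inside_real[i][j]: synthetic j lies in the ball around real i
    inside_real = [[d[i][j] < real_r[i] for j in range(m)] for i in range(n)]
    # inside_syn[i][j]: real i lies in the ball around synthetic j
    inside_syn = [[d[i][j] < syn_r[j] for j in range(m)] for i in range(n)]

    precision = sum(any(inside_real[i][j] for i in range(n)) for j in range(m)) / m
    recall = sum(any(row) for row in inside_syn) / n
    density = sum(sum(row) for row in inside_real) / (k * m)
    coverage = sum(any(row) for row in inside_real) / n

    return {
        "prdc.precision": precision,
        "prdc.recall": recall,
        "prdc.density": density,
        "prdc.coverage": coverage,
    }


# Tiny 1-D example: two synthetic points sit among the real ones, two are far away.
real = [(0.0,), (1.0,), (2.0,), (3.0,)]
syn = [(0.5,), (1.5,), (10.0,), (11.0,)]
scores = prdc_sketch(real, syn, k=1)
print(scores)
```

In this toy case half of the synthetic points land inside the real manifold, so precision is 0.5, while three of the four real points have a synthetic neighbour within their radius, so coverage is 0.75. The two outlying synthetic points lower precision but leave coverage unchanged, because coverage only asks whether each real sample has some synthetic sample nearby.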