DPI
- class synthyverse.evaluation.privacy.DPI(k=20, discrete_features=None, ref_prop=0.5, member_prop=1.0, repeats=1, subsample=False, random_state=0)
Bases: MIA
Registry name: mia.dpi
Data Plagiarism Index membership inference attack.
Paper: “Data plagiarism index: Characterizing the privacy risk of data-copying in tabular generative models” by Ward et al. (2024)
DPI scores each attack record by the ratio of synthetic to reference samples among its k nearest neighbors, drawn from the combined reference and synthetic pool.
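The neighborhood ratio can be illustrated with a small standalone sketch. It is not the library's implementation: the function name `dpi_score`, the Euclidean distance, and the additive smoothing term `eps` are all assumptions made for illustration, and discrete features are not handled.

```python
import math


def dpi_score(record, synthetic, reference, k=20, eps=1.0):
    """Illustrative DPI-style score (hypothetical helper, not the library API).

    Returns the smoothed synthetic-to-reference ratio among the k nearest
    neighbours of `record` in the pooled synthetic + reference set.
    Assumptions: Euclidean distance on numeric features, additive smoothing
    `eps` to avoid division by zero.
    """
    # Tag every pooled point with its origin: True = synthetic, False = reference.
    pool = [(math.dist(record, row), True) for row in synthetic]
    pool += [(math.dist(record, row), False) for row in reference]

    # Keep the k closest points from the combined pool.
    neighbours = sorted(pool, key=lambda pair: pair[0])[:k]
    n_syn = sum(1 for _, is_syn in neighbours if is_syn)
    n_ref = len(neighbours) - n_syn

    return (n_syn + eps) / (n_ref + eps)


synthetic = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
reference = [(5.0, 5.1), (5.1, 5.0), (0.2, 0.2), (9.0, 9.0)]

# A record surrounded by synthetic points scores high (likely copied).
crowded = dpi_score((0.0, 0.0), synthetic, reference, k=3)
# A record whose neighbourhood is mostly reference points scores low.
sparse = dpi_score((5.0, 5.05), synthetic, reference, k=3)
```

Under these assumptions, a higher score means the generator placed unusually many synthetic samples near the record compared with what independent real data would suggest, which the attack reads as evidence of membership.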
- Parameters:
k (int) – Number of nearest neighbors from the combined reference and synthetic pool. Default: 20.
discrete_features (list) – List of discrete/categorical feature names. Default: None (treated as an empty list).
ref_prop (float) – Proportion of test set to use as attacker reference non-members. Default: 0.5.
member_prop (float) – Proportion of train set to use as members. Default: 1.0.
repeats (int) – Number of repeated evaluations when subsampling records. Default: 1.
subsample (bool) – Whether to subsample synthetic and member sets to match reference and evaluation non-member sizes. Default: False.
random_state (int) – Random seed for reproducibility. Default: 0.
- evaluate(X_train, X_test, X_syn)
Evaluate membership inference risk.
- Parameters:
X_train (DataFrame) – Real training data whose rows are treated as members.
X_test (DataFrame) – Independent real test data split into reference records and evaluation non-members.
X_syn (DataFrame) – Synthetic data available to the attacker.
- Returns:
Dictionary with attack AUC and lift-at-k scores. Keys have the form “<attack_name>.<score>”.
- Return type:
dict
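To make the AUC entry easier to interpret, here is a minimal sketch of how an attack AUC can be computed from per-record scores. The helper name `attack_auc` is hypothetical, and this pairwise formulation is the standard definition of AUC, not necessarily how the library computes it.

```python
def attack_auc(member_scores, non_member_scores):
    """Probability that a random member outscores a random non-member.

    Ties count as half a win. Hypothetical helper for illustration only.
    """
    wins = 0.0
    for m in member_scores:
        for n in non_member_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(non_member_scores))


# Made-up attack scores for training rows (members) and held-out rows (non-members).
members = [4.0, 2.0, 1.0]
non_members = [0.5, 1.0, 3.0]
auc = attack_auc(members, non_members)
```

An AUC near 0.5 means the attack separates members from non-members no better than chance, while values approaching 1.0 indicate that members are reliably identified.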