AIA

class synthyverse.evaluation.privacy.AIA(quasi_identifiers=None, sensitive_features=None, discrete_features=None, model_name='xgboost', model_params=None, random_state=0)

Bases: object

Registry name: aia

Attribute Inference Attack (AIA) privacy metric.

Trains a supervised ML model on synthetic data to infer each sensitive feature from the quasi-identifiers, then scores the model's predictions against the true sensitive values in the real data. Higher attack performance indicates higher attribute disclosure risk.
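The train-on-synthetic, score-on-real idea can be illustrated with a minimal standard-library sketch. It is not the library's implementation: a majority-vote lookup table stands in for the XGBoost or scikit-learn model, and the function names, column names, and records are all hypothetical.

```python
from collections import Counter, defaultdict


def fit_lookup_attacker(synthetic_rows, quasi_identifiers, target):
    """Learn the most common target value per quasi-identifier combination.

    Stand-in for a real classifier; the attacker only ever sees synthetic rows.
    """
    votes = defaultdict(Counter)
    overall = Counter()
    for row in synthetic_rows:
        key = tuple(row[q] for q in quasi_identifiers)
        votes[key][row[target]] += 1
        overall[row[target]] += 1
    # Fallback guess for combinations never seen in the synthetic data.
    fallback = overall.most_common(1)[0][0]
    table = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}
    return table, fallback


def attack_accuracy(real_rows, quasi_identifiers, target, table, fallback):
    """Fraction of real records whose sensitive value the attacker guesses."""
    hits = 0
    for row in real_rows:
        key = tuple(row[q] for q in quasi_identifiers)
        hits += table.get(key, fallback) == row[target]
    return hits / len(real_rows)


# Hypothetical toy records (illustration only).
synthetic = [
    {"age_band": "30-39", "zip3": "100", "diagnosis": "flu"},
    {"age_band": "30-39", "zip3": "100", "diagnosis": "flu"},
    {"age_band": "60-69", "zip3": "945", "diagnosis": "diabetes"},
]
real = [
    {"age_band": "30-39", "zip3": "100", "diagnosis": "flu"},
    {"age_band": "60-69", "zip3": "945", "diagnosis": "asthma"},
]

qi = ["age_band", "zip3"]
table, fallback = fit_lookup_attacker(synthetic, qi, "diagnosis")
accuracy = attack_accuracy(real, qi, "diagnosis", table, fallback)
print(accuracy)  # 0.5: one of the two real records is inferred correctly
```

The attacker is fitted only on synthetic rows, so any accuracy it reaches on real rows reflects information about real individuals that survived in the synthetic data.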

Parameters:

quasi_identifiers (list) – Feature names used by the attacker. If None, all non-sensitive features are used. If sensitive_features is also None, all other features are used for each target feature.
sensitive_features (list) – Sensitive feature names to infer. If None, all features are evaluated as sensitive features.
discrete_features (list) – List of discrete/categorical feature names. Treated as the authoritative source when deciding whether a target is a classification or regression problem and how quasi-identifiers are preprocessed.
model_name (str) – Model family. Supported values include “xgboost”, “randomforest”, “decisiontree”, “linearregression”, and “svm”, plus some common aliases. All models except XGBoost are scikit-learn estimators. Default: “xgboost”.
model_params (dict) – Model parameters passed to the selected estimator.
random_state (int) – Random seed for reproducibility. Default: 0.
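The defaulting rules for quasi_identifiers and sensitive_features described above can be read as the following sketch. The helper resolve_attack_plan is hypothetical and not part of synthyverse. Dropping the target from an explicit quasi-identifier list is an assumption made here, not documented behavior.

```python
def resolve_attack_plan(columns, quasi_identifiers=None, sensitive_features=None):
    """Map each target feature to the input features an attacker would use.

    Hypothetical helper mirroring the documented defaults; the real class
    may resolve edge cases differently.
    """
    # sensitive_features=None: every feature is evaluated as a target.
    targets = list(columns) if sensitive_features is None else list(sensitive_features)

    plan = {}
    for target in targets:
        if quasi_identifiers is not None:
            # Assumption: the target is never used to predict itself.
            inputs = [c for c in quasi_identifiers if c != target]
        elif sensitive_features is None:
            # Both None: all other features are used for each target.
            inputs = [c for c in columns if c != target]
        else:
            # Only quasi_identifiers is None: use all non-sensitive features.
            inputs = [c for c in columns if c not in sensitive_features]
        plan[target] = inputs
    return plan


columns = ["age", "zip", "income"]  # hypothetical column names
plan_all = resolve_attack_plan(columns)
plan_sensitive = resolve_attack_plan(columns, sensitive_features=["income"])
plan_explicit = resolve_attack_plan(
    columns, quasi_identifiers=["age"], sensitive_features=["income"]
)
```

With both arguments left as None, every column becomes a target in turn, which means one attack model per column.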

evaluate(X_train, X_syn)

Evaluate AIA on real training data using models trained on synthetic data.

Parameters:

X_train (DataFrame) – Real training data as a pandas DataFrame.
X_syn (DataFrame) – Synthetic data used to train the attribute inference models.

Returns:

Dictionary with per-sensitive-feature attack scores. Keys have the form “aia.<sensitive_feature>.<score>”.

Return type:

dict
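Because the returned dictionary is flat, it can be convenient to regroup it by sensitive feature. The sketch below relies only on the documented key form “aia.<sensitive_feature>.<score>”. The helper group_aia_scores, the score names (“r2”, “accuracy”, “f1”), and the values are hypothetical, and the split assumes that score names contain no dots.

```python
from collections import defaultdict


def group_aia_scores(results, prefix="aia."):
    """Regroup flat 'aia.<feature>.<score>' keys into {feature: {score: value}}.

    Hypothetical helper; splits on the last dot, so feature names may
    contain dots but score names may not.
    """
    grouped = defaultdict(dict)
    for key, value in results.items():
        if not key.startswith(prefix):
            continue  # ignore keys produced by other metrics
        feature, sep, score_name = key[len(prefix):].rpartition(".")
        if not sep:
            continue  # no '<feature>.<score>' pair to split
        grouped[feature][score_name] = value
    return dict(grouped)


# Hypothetical result shaped like the documented return value.
example = {
    "aia.income.r2": 0.41,
    "aia.diagnosis.accuracy": 0.72,
    "aia.diagnosis.f1": 0.65,
}
grouped = group_aia_scores(example)
```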