AIA

class synthyverse.evaluation.privacy.AIA(quasi_identifiers=None, sensitive_features=None, discrete_features=None, model_name='xgboost', model_params=None, random_state=0)

Bases: object

Registry name: aia

Attribute Inference Attack (AIA) privacy metric.

Trains a supervised ML model on synthetic data to infer each sensitive feature from the quasi-identifiers, then scores the model's predictions against the true sensitive values in the real data. Higher attack performance indicates higher attribute disclosure risk.
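The train-on-synthetic, score-on-real idea can be illustrated with a minimal standard-library sketch. It is not this class's implementation: a majority-vote lookup table stands in for the XGBoost or scikit-learn model, and the helper names (fit_lookup_attacker, attack_accuracy) and the toy column names are hypothetical.

```python
from collections import Counter, defaultdict


def fit_lookup_attacker(synthetic_rows, quasi_identifiers, sensitive_feature):
    """Learn the most common sensitive value per quasi-identifier combination."""
    votes = defaultdict(Counter)
    overall = Counter()
    for row in synthetic_rows:
        key = tuple(row[q] for q in quasi_identifiers)
        votes[key][row[sensitive_feature]] += 1
        overall[row[sensitive_feature]] += 1
    # Global majority value, used when a combination never appears in synthetic data.
    fallback = overall.most_common(1)[0][0]
    table = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}
    return table, fallback


def attack_accuracy(real_rows, quasi_identifiers, sensitive_feature, table, fallback):
    """Fraction of real records whose sensitive value the attacker guesses correctly."""
    hits = 0
    for row in real_rows:
        key = tuple(row[q] for q in quasi_identifiers)
        hits += table.get(key, fallback) == row[sensitive_feature]
    return hits / len(real_rows)


# Toy data (hypothetical columns): the attacker only ever sees the synthetic rows.
synthetic = [
    {"age_band": "30-39", "zip3": "100", "diagnosis": "flu"},
    {"age_band": "30-39", "zip3": "100", "diagnosis": "flu"},
    {"age_band": "30-39", "zip3": "100", "diagnosis": "flu"},
    {"age_band": "60-69", "zip3": "945", "diagnosis": "diabetes"},
    {"age_band": "60-69", "zip3": "945", "diagnosis": "asthma"},
    {"age_band": "60-69", "zip3": "945", "diagnosis": "diabetes"},
]
real = [
    {"age_band": "30-39", "zip3": "100", "diagnosis": "flu"},
    {"age_band": "60-69", "zip3": "945", "diagnosis": "diabetes"},
    {"age_band": "60-69", "zip3": "945", "diagnosis": "asthma"},
    {"age_band": "20-29", "zip3": "606", "diagnosis": "flu"},
]

quasi = ["age_band", "zip3"]
table, fallback = fit_lookup_attacker(synthetic, quasi, "diagnosis")
accuracy = attack_accuracy(real, quasi, "diagnosis", table, fallback)
print(accuracy)
```

In this toy setup the attacker recovers three of the four real diagnoses, which is the kind of outcome the metric reads as elevated disclosure risk.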

Parameters:
  • quasi_identifiers (list) – Feature names available to the attacker. If None, all non-sensitive features are used. If sensitive_features is also None, each target feature is inferred from all remaining features.

  • sensitive_features (list) – Sensitive feature names to infer. If None, all features are evaluated as sensitive features.

  • discrete_features (list) – List of discrete/categorical feature names. Used as the authoritative source for classification vs. regression targets and quasi-identifier preprocessing.

  • model_name (str) – Model family. Supported values include “xgboost”, “randomforest”, “decisiontree”, “linearregression”, and “svm”; some common aliases are also accepted. Every model except XGBoost is a scikit-learn model. Default: “xgboost”.

  • model_params (dict) – Model parameters passed to the selected estimator.

  • random_state (int) – Random seed for reproducibility. Default: 0.
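The defaulting rules for quasi_identifiers and sensitive_features can be sketched as a small helper. This is one reading of the parameter descriptions above, not the library's code: resolve_attack_plan is a hypothetical name, and dropping the target from an explicitly supplied quasi-identifier list is an assumption.

```python
def resolve_attack_plan(columns, quasi_identifiers=None, sensitive_features=None):
    """Map each target feature to the predictor columns an attacker would use."""
    # With no sensitive features given, every column is treated as a target.
    targets = list(columns) if sensitive_features is None else list(sensitive_features)
    plan = {}
    for target in targets:
        if quasi_identifiers is not None:
            # Assumption: a target is never used to predict itself.
            predictors = [c for c in quasi_identifiers if c != target]
        elif sensitive_features is None:
            # Both None: all other features predict each target.
            predictors = [c for c in columns if c != target]
        else:
            # Only quasi_identifiers is None: use all non-sensitive features.
            predictors = [c for c in columns if c not in sensitive_features]
        plan[target] = predictors
    return plan


columns = ["age", "zip", "income", "diagnosis"]  # hypothetical column names
plan_default = resolve_attack_plan(columns)
plan_sensitive = resolve_attack_plan(columns, sensitive_features=["income", "diagnosis"])
plan_explicit = resolve_attack_plan(
    columns, quasi_identifiers=["age", "zip"], sensitive_features=["income"]
)
print(plan_default)
print(plan_sensitive)
print(plan_explicit)
```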

evaluate(X_train, X_syn)

Evaluate AIA on real training data using models trained on synthetic data.

Parameters:
  • X_train (DataFrame) – Real training data as a pandas DataFrame.

  • X_syn (DataFrame) – Synthetic data used to train the attribute inference models.

Returns:

Dictionary with per-sensitive-feature attack scores. Keys have the form “aia.<sensitive_feature>.<score>”.

Return type:

dict
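Because the returned keys are flat strings, it can be convenient to regroup them by sensitive feature. The helper below is a sketch under stated assumptions: group_scores is a hypothetical name, the score names and values in the example dictionary (accuracy, r2, mae) are invented for illustration and are not taken from the library, and the score segment is assumed to contain no dots.

```python
from collections import defaultdict


def group_scores(results, prefix="aia"):
    """Regroup flat 'aia.<sensitive_feature>.<score>' keys by sensitive feature."""
    grouped = defaultdict(dict)
    for key, value in results.items():
        head, _, rest = key.partition(".")
        if head != prefix or "." not in rest:
            continue  # skip keys that do not follow the documented pattern
        # Split on the last dot so feature names containing dots stay intact.
        feature, score_name = rest.rsplit(".", 1)
        grouped[feature][score_name] = value
    return dict(grouped)


# Hypothetical result dictionary; real score names and values may differ.
example = {
    "aia.diagnosis.accuracy": 0.81,
    "aia.income.r2": 0.42,
    "aia.income.mae": 1234.5,
}
by_feature = group_scores(example)
print(by_feature)
```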