ClassifierTest

class synthyverse.evaluation.fidelity.ClassifierTest(discrete_features=None, model_name='xgboost', model_params=None, tune=False, tuning_trials=32, random_state=0)

Bases: object

Registry name: classifier_test

ROC AUC score of a classifier trained to distinguish synthetic from real data.

Lower scores indicate higher-quality synthetic data (harder to distinguish from real data).
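
As a rough illustration of what this score measures (a sketch, not the library's implementation), the snippet below computes ROC AUC directly from classifier scores using the rank-based formulation: the probability that a randomly chosen synthetic row receives a higher score than a randomly chosen real row. The helper name roc_auc, the label convention (0 = real, 1 = synthetic), and the toy scores are assumptions made for this example.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Illustrative helper, not part of synthyverse.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Label convention assumed here: 0 = real row, 1 = synthetic row.
labels = [0, 0, 0, 1, 1, 1]

# Scores that separate the two groups perfectly: synthetic data is easy to spot.
separable = roc_auc(labels, [0.1, 0.2, 0.3, 0.7, 0.8, 0.9])

# Identical scores for every row: the classifier cannot tell the groups apart.
indistinguishable = roc_auc(labels, [0.5] * 6)
```

Here separable evaluates to 1.0 and indistinguishable to 0.5. An AUC of 0.5 corresponds to chance-level discrimination, which is the sense in which a lower score reflects synthetic data that is harder to tell apart from real data.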

Parameters:
  • discrete_features (list) – List of discrete/categorical feature names. Default: None (equivalent to an empty list).

  • random_state (int) – Random seed for reproducibility. Default: 0.

  • model_name (str) – Classifier family. Supported values include “xgboost”, “randomforest”, “decisiontree”, “linearregression”, and “svm”; some common aliases are also accepted. All models other than XGBoost are scikit-learn classifiers. Default: “xgboost”.

  • model_params (dict) – Parameters passed to the selected estimator. Default: None.

  • tune (bool) – Whether to tune hyperparameters. If True, validation data (X_val) must be passed to evaluate. Default: False.

  • tuning_trials (int) – Number of Optuna trials for hyperparameter tuning. Default: 32.

Example

>>> import pandas as pd
>>> from synthyverse.evaluation import ClassifierTest
>>>
>>> # Prepare data
>>> X_train = pd.DataFrame(...)
>>> X_test = pd.DataFrame(...)
>>> X_syn = pd.DataFrame(...)
>>> X_syn_test = pd.DataFrame(...)
>>> X_val = pd.DataFrame(...)
>>> discrete_features = ["category_col"]
>>>
>>> # Create metric
>>> metric = ClassifierTest(
...     discrete_features=discrete_features,
...     tune=True,
...     random_state=42
... )
>>>
>>> # Evaluate
>>> results = metric.evaluate(X_train, X_test, X_syn, X_syn_test, X_val)

evaluate(X_train, X_test, X_syn, X_syn_test, X_val=None)

Evaluate synthetic data using the classifier test.

Parameters:
  • X_train (DataFrame) – Real training data as a pandas DataFrame.

  • X_test (DataFrame) – Real test data as a pandas DataFrame.

  • X_syn (DataFrame) – Synthetic training data as a pandas DataFrame.

  • X_syn_test (DataFrame) – Synthetic test data as a pandas DataFrame.

  • X_val (Optional[DataFrame]) – Validation data as a pandas DataFrame; required when tune=True. Default: None.

Returns:

Dictionary with a “classifier_test.auc” key whose value is the AUC score.

Return type:

dict
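
A minimal sketch of consuming the returned dictionary follows. The results value is a made-up placeholder rather than real output, and describe_auc is a hypothetical helper; only the “classifier_test.auc” key is taken from the description above.

```python
def describe_auc(results, key="classifier_test.auc"):
    """Return the AUC and its absolute distance from chance level (0.5)."""
    auc = float(results[key])
    return auc, abs(auc - 0.5)


# Placeholder value for illustration only; real values come from metric.evaluate(...).
results = {"classifier_test.auc": 0.75}
auc, gap = describe_auc(results)
```

Reporting the gap from 0.5 alongside the raw AUC is one possible convention for comparing runs; it is not something the library is known to provide.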