ClassifierTest
- class synthyverse.evaluation.fidelity.ClassifierTest(discrete_features=None, model_name='xgboost', model_params=None, tune=False, tuning_trials=32, random_state=0)
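Bases: object
Registry name: classifier_test
ROC AUC score of a classifier trained to distinguish synthetic from real data.
Lower scores indicate higher-quality synthetic data, because the synthetic rows are harder to tell apart from real ones. A score near 0.5 means the classifier does no better than chance.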
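The idea behind this metric can be sketched with scikit-learn. This is an illustration only, not synthyverse's actual implementation: real rows are labeled 0 and synthetic rows 1, a classifier is fit on the training splits, and its ROC AUC is measured on the held-out splits. The function name classifier_test_auc, the random forest settings, and the toy Gaussian data are assumptions made for this example, and the sketch handles numeric columns only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score


def classifier_test_auc(X_train, X_test, X_syn, X_syn_test, random_state=0):
    """ROC AUC of a classifier separating real rows (0) from synthetic rows (1).

    Hypothetical helper for illustration; not part of synthyverse.
    """
    # Stack real and synthetic rows; the label records where each row came from.
    train = pd.concat([X_train, X_syn], ignore_index=True)
    y_train = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_syn))])
    test = pd.concat([X_test, X_syn_test], ignore_index=True)
    y_test = np.concatenate([np.zeros(len(X_test)), np.ones(len(X_syn_test))])

    clf = RandomForestClassifier(n_estimators=100, random_state=random_state)
    clf.fit(train, y_train)

    # Column 1 of predict_proba is the probability of the "synthetic" class.
    scores = clf.predict_proba(test)[:, 1]
    return roc_auc_score(y_test, scores)


# Toy data: "good" synthetic rows share the real distribution,
# "bad" synthetic rows are shifted far away from it.
rng = np.random.default_rng(0)
cols = ["a", "b"]


def sample(mean, n=500):
    return pd.DataFrame(rng.normal(mean, 1.0, size=(n, 2)), columns=cols)


real_train, real_test = sample(0.0), sample(0.0)
auc_good = classifier_test_auc(real_train, real_test, sample(0.0), sample(0.0))
auc_bad = classifier_test_auc(real_train, real_test, sample(3.0), sample(3.0))

print(f"good synthetic data: AUC = {auc_good:.3f}")
print(f"bad synthetic data:  AUC = {auc_bad:.3f}")
```

In this toy setup the shifted synthetic data should be far easier to separate from the real data than the well-matched synthetic data, so it should receive the higher (worse) score.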
- Parameters:
discrete_features (list) – List of discrete/categorical feature names. Default: None, which is treated as an empty list.
random_state (int) – Random seed for reproducibility. Default: 0.
model_name (str) – Classifier family. Supported values include “xgboost”, “randomforest”, “decisiontree”, “linearregression”, and “svm”, along with some common aliases. Every model except XGBoost is a scikit-learn classifier. Default: “xgboost”.
model_params (dict) – Classifier parameters passed to the selected estimator. Default: None.
tune (bool) – Whether to tune hyperparameters. If True, a validation set must be provided to evaluate() as X_val. Default: False.
tuning_trials (int) – Number of Optuna trials for hyperparameter tuning. Default: 32.
Example
>>> import pandas as pd
>>> from synthyverse.evaluation import ClassifierTest
>>>
>>> # Prepare data
>>> X_train = pd.DataFrame(...)
>>> X_test = pd.DataFrame(...)
>>> X_syn = pd.DataFrame(...)
>>> X_syn_test = pd.DataFrame(...)
>>> X_val = pd.DataFrame(...)
>>> discrete_features = ["category_col"]
>>>
>>> # Create metric
>>> metric = ClassifierTest(
...     discrete_features=discrete_features,
...     tune=True,
...     random_state=42
... )
>>>
>>> # Evaluate
>>> results = metric.evaluate(X_train, X_test, X_syn, X_syn_test, X_val)
- evaluate(X_train, X_test, X_syn, X_syn_test, X_val=None)
Evaluate synthetic data using the classifier test.
- Parameters:
X_train (DataFrame) – Real training data as a pandas DataFrame.
X_test (DataFrame) – Real test data as a pandas DataFrame.
X_syn (DataFrame) – Synthetic training data as a pandas DataFrame.
X_syn_test (DataFrame) – Synthetic test data as a pandas DataFrame.
X_val (Optional[DataFrame]) – Optional validation data used when tune=True.
- Returns:
Dictionary that maps the key “classifier_test.auc” to the AUC score.
- Return type:
dict
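Reading the returned dictionary might look like the sketch below. The results value is a made-up placeholder shaped like the return value described above, not real output, and auc_gap_from_chance is a hypothetical helper introduced only for this example.

```python
# Hypothetical result, shaped like the dictionary described under Returns.
results = {"classifier_test.auc": 0.57}


def auc_gap_from_chance(results):
    """Absolute distance between the reported AUC and chance level (0.5)."""
    return abs(results["classifier_test.auc"] - 0.5)


gap = auc_gap_from_chance(results)
print(f"AUC = {results['classifier_test.auc']:.2f}, gap from chance = {gap:.2f}")
```

A small gap suggests the classifier could barely separate synthetic rows from real ones, which is the desirable outcome for this metric.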