Utility Metrics
- class synthyverse.evaluation.utility.MLE(X_val=None, target_column='target', discrete_features=None, random_state=0, train_set='synthetic', model_name='xgboost', model_params=None, tune=False, tuning_trials=32)
Bases: object
Machine Learning Efficacy from configurable ML models.
Measures how well synthetic data can be used for downstream machine learning tasks compared to real data.
- Parameters:
X_val (pd.DataFrame, optional) – Validation data for hyperparameter tuning. Default: None.
target_column (str) – Name of the target column. Default: “target”.
discrete_features (list, optional) – List of discrete/categorical feature names. Default: None (treated as an empty list).
random_state (int) – Random seed for reproducibility. Default: 0.
train_set (str) – Which dataset to train on: “synthetic” for TSTR (train on synthetic, test on real) or “real” for TRTS (train on real, test on synthetic). Default: “synthetic”.
model_name (str) – Estimator name. Use “xgboost” for native XGBoost, or any sklearn estimator class name discoverable via sklearn.utils.discovery.all_estimators. Default: “xgboost”.
model_params (dict, optional) – Parameters passed to the selected estimator. Default: None.
tune (bool) – Whether to tune hyperparameters. Default: False.
tuning_trials (int) – Number of Optuna trials for hyperparameter tuning. Default: 32.
Example
>>> import pandas as pd
>>> from synthyverse.evaluation import MLE
>>>
>>> # Prepare data
>>> X_train = pd.DataFrame(...)
>>> X_test = pd.DataFrame(...)
>>> X_syn = pd.DataFrame(...)
>>> X_val = pd.DataFrame(...)
>>> discrete_features = ["category_col"]
>>>
>>> # Create metric
>>> metric = MLE(
...     X_val=X_val,
...     target_column="target",
...     discrete_features=discrete_features,
...     train_set="synthetic",
...     tune=True,
...     random_state=42
... )
>>>
>>> # Evaluate
>>> results = metric.evaluate(X_train, X_test, X_syn)
- evaluate(train, test, sd)
Evaluate synthetic data utility using machine learning efficacy.
- Parameters:
train (DataFrame) – Training data as a pandas DataFrame.
test (DataFrame) – Test data as a pandas DataFrame.
sd (DataFrame) – Synthetic data as a pandas DataFrame.
- Returns:
Dictionary with MLE metric scores, including both synthetic-to-real and real-to-real baseline scores.
- Return type:
dict
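To make the two kinds of score concrete, here is a minimal, standard-library-only sketch of a synthetic-to-real (TSTR) score next to a real-to-real baseline. Everything in it is an assumption for illustration: the toy data generator, the nearest-centroid classifier (a stand-in for XGBoost or a scikit-learn estimator), and the dictionary keys, which are not the keys that MLE.evaluate returns.

```python
import random
from statistics import mean


def make_data(n, shift, seed):
    """Toy binary dataset: one feature; class 1 is centered at `shift`."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = rng.gauss(shift * y, 1.0)
        rows.append((x, y))
    return rows


def fit_centroids(rows):
    """Nearest-centroid 'model': the mean feature value of each class."""
    return {c: mean(x for x, y in rows if y == c) for c in (0, 1)}


def accuracy(centroids, rows):
    """Fraction of rows whose nearest centroid matches the true label."""
    preds = [min(centroids, key=lambda c: abs(x - centroids[c])) for x, _ in rows]
    return mean(int(p == y) for p, (_, y) in zip(preds, rows))


real_train = make_data(500, shift=3.0, seed=0)
real_test = make_data(500, shift=3.0, seed=1)
# Stand-in for generator output: similar to the real data, but slightly off.
synthetic = make_data(500, shift=2.5, seed=2)

scores = {
    # Train on synthetic, test on real.
    "tstr_accuracy": accuracy(fit_centroids(synthetic), real_test),
    # Train on real, test on real: the baseline to compare against.
    "trtr_accuracy": accuracy(fit_centroids(real_train), real_test),
}
```

A synthetic-to-real score close to the real-to-real baseline suggests the synthetic data preserves the signal the downstream model needs.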