ARM
- class synthyverse.evaluation.fidelity.ARM(discrete_features=None, min_support=0.1, min_confidence=0.8, n_bins_numerical=5, max_itemset_size=2, print_rule_differences=True, max_rules_to_print=25)
Bases: object
Registry name: arm
Association Rule Mining preservation using Apriori.
Mines association rules in the real training data and in the synthetic data, then compares the two rule sets by exact rule matches: a rule counts as matched only if the same antecedent and consequent appear in both. Rules are represented as antecedent -> consequent over tabular column=value items. Before mining, numerical columns are discretized into equal-width bins fitted on the real training data, and the same bins are applied to the synthetic data. Higher precision and recall indicate better preservation of the association rules found in the real data.
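The discretization and item encoding can be illustrated with a minimal, stdlib-only sketch. It assumes rows are given as plain dicts rather than a DataFrame; the helper names (fit_equal_width_edges, assign_bin, to_transactions), the "binN" label format, and the clipping of out-of-range values are hypothetical choices for illustration, not the synthyverse implementation.

```python
# Hypothetical helpers sketching the preprocessing described above; the
# "binN" label format and the clipping of out-of-range values are assumptions.


def fit_equal_width_edges(values, n_bins):
    """Return n_bins + 1 equally spaced edges spanning min(values)..max(values)."""
    if n_bins < 2:
        raise ValueError("n_bins must be >= 2")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def assign_bin(x, edges):
    """Return the bin index of x; values outside the fitted range are clipped."""
    n_bins = len(edges) - 1
    for i in range(1, n_bins):
        if x < edges[i]:
            return i - 1
    return n_bins - 1


def to_transactions(rows, numerical_edges):
    """Convert dict rows into frozensets of "column=value" items."""
    transactions = []
    for row in rows:
        items = set()
        for col, val in row.items():
            if col in numerical_edges:
                val = f"bin{assign_bin(val, numerical_edges[col])}"
            items.add(f"{col}={val}")
        transactions.append(frozenset(items))
    return transactions


real = [{"age": 20, "city": "A"}, {"age": 40, "city": "B"}, {"age": 60, "city": "A"}]
syn = [{"age": 75, "city": "A"}]

# Edges are fitted on the real data only, then reused for the synthetic data.
edges = {"age": fit_equal_width_edges([r["age"] for r in real], n_bins=2)}
real_tx = to_transactions(real, edges)
syn_tx = to_transactions(syn, edges)
```

In this sketch the edges for age are [20.0, 40.0, 60.0]; the value 40 sits on the interior edge and falls into the upper bin, and the synthetic value 75, outside the real range, is clipped into the last bin.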
- Parameters:
discrete_features (list or None) – List of discrete/categorical feature names. Default: None, treated as an empty list.
min_support (float or int) – Minimum itemset support for Apriori. Floats in (0, 1] are interpreted as a fraction of rows; integers are interpreted as absolute row counts. Default: 0.1.
min_confidence (float) – Minimum confidence for generated rules. Default: 0.8.
n_bins_numerical (int) – Number of equal-width bins for numerical features. Must be >= 2. Default: 5.
max_itemset_size (int) – Maximum frequent itemset size mined by Apriori. Must be >= 2. Default: 2.
print_rule_differences (bool) – Whether to print missed and hallucinated rules after evaluation. Default: True.
max_rules_to_print (int or None) – Maximum number of missed and hallucinated rules to print per category. None prints all. Default: 25.
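To show how min_support, min_confidence and max_itemset_size interact, here is a small sketch of the textbook Apriori algorithm in plain Python. It operates on transactions that are frozensets of column=value items, treats a float min_support as a fraction of rows and an int as an absolute row count, and keeps a rule X -> Y when its confidence, support(X ∪ Y) / support(X), reaches min_confidence. The function names are hypothetical; this is an illustration, not the library's own miner.

```python
from itertools import combinations


def is_frequent(count, n_rows, min_support):
    """Float min_support is a fraction of rows; int is an absolute row count."""
    if isinstance(min_support, float):
        return count / n_rows >= min_support
    return count >= min_support


def mine_rules(transactions, min_support=0.1, min_confidence=0.8, max_itemset_size=2):
    """Return a set of (antecedent, consequent) rules, each a frozenset of items."""
    if max_itemset_size < 2:
        raise ValueError("max_itemset_size must be >= 2")
    n_rows = len(transactions)

    # Level 1: support counts of single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if is_frequent(c, n_rows, min_support)}
    all_frequent = dict(frequent)

    size = 1
    while frequent and size < max_itemset_size:
        size += 1
        # Join step: merge pairs of frequent (size-1)-itemsets into size-itemsets.
        # Prune step: keep a candidate only if all its (size-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in frequent for sub in combinations(union, size - 1)
            ):
                candidates.add(union)
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if is_frequent(c, n_rows, min_support)}
        all_frequent.update(frequent)

    # Rule generation: X -> Y with confidence = support(X | Y) / support(X).
    rules = set()
    for itemset, count in all_frequent.items():
        for k in range(1, len(itemset)):
            for ante in combinations(itemset, k):
                antecedent = frozenset(ante)
                if count / all_frequent[antecedent] >= min_confidence:
                    rules.add((antecedent, itemset - antecedent))
    return rules


transactions = [
    frozenset({"city=A", "age=bin0"}),
    frozenset({"city=A", "age=bin0"}),
    frozenset({"city=A", "age=bin1"}),
    frozenset({"city=B", "age=bin1"}),
]
rules = mine_rules(transactions)  # defaults: min_support=0.1, min_confidence=0.8
```

With these four rows, age=bin0 -> city=A has confidence 2/2 and is kept, while the reverse rule city=A -> age=bin0 has confidence 2/3 and is dropped, which is why association rules are directional.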
Example
>>> import pandas as pd
>>> from synthyverse.evaluation import ARM
>>>
>>> X_real = pd.DataFrame(...)
>>> X_syn = pd.DataFrame(...)
>>> discrete_features = ["category_col"]
>>>
>>> metric = ARM(
...     discrete_features=discrete_features,
...     min_support=0.05,
...     min_confidence=0.7,
... )
>>>
>>> results = metric.evaluate(X_real, X_syn)
- evaluate(X_train, X_syn)
Evaluate synthetic data by comparing mined association rules.
- Parameters:
X_train (DataFrame) – Real training data as a pandas DataFrame.
X_syn (DataFrame) – Synthetic data as a pandas DataFrame.
- Returns:
- Dictionary with keys:
"arm.precision": Fraction of synthetic rules also found in the real data
"arm.recall": Fraction of real rules also found in the synthetic data
"arm.n_rules_real": Number of rules mined in the real data
"arm.n_rules_syn": Number of rules mined in the synthetic data
- Return type:
dict
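The returned scores follow from two rule sets with plain set operations. A minimal sketch, assuming each rule is a hashable (antecedent, consequent) pair of frozensets; compare_rules and rule_differences are hypothetical helpers, and returning 0.0 when a rule set is empty is an assumption, since that edge case is not documented here. The "missed" and "hallucinated" sets mirror the categories that print_rule_differences reports.

```python
def compare_rules(rules_real, rules_syn):
    """Exact-match precision/recall between two sets of hashable rules."""
    matched = rules_real & rules_syn
    # Assumption: report 0.0 when the denominator rule set is empty.
    precision = len(matched) / len(rules_syn) if rules_syn else 0.0
    recall = len(matched) / len(rules_real) if rules_real else 0.0
    return {
        "arm.precision": precision,
        "arm.recall": recall,
        "arm.n_rules_real": len(rules_real),
        "arm.n_rules_syn": len(rules_syn),
    }


def rule_differences(rules_real, rules_syn):
    """Split mismatches into missed (real only) and hallucinated (synthetic only)."""
    return rules_real - rules_syn, rules_syn - rules_real


r1 = (frozenset({"city=A"}), frozenset({"age=bin0"}))
r2 = (frozenset({"city=B"}), frozenset({"age=bin1"}))
r3 = (frozenset({"age=bin1"}), frozenset({"city=A"}))

real_rules = {r1, r2}
syn_rules = {r1, r3}
result = compare_rules(real_rules, syn_rules)
missed, hallucinated = rule_differences(real_rules, syn_rules)
```

Because precision is normalized by the number of synthetic rules and recall by the number of real rules, a generator that produces many spurious rules lowers precision without changing recall.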