ARM

class synthyverse.evaluation.fidelity.ARM(discrete_features=None, min_support=0.1, min_confidence=0.8, n_bins_numerical=5, max_itemset_size=2, print_rule_differences=True, max_rules_to_print=25)

Bases: object

Registry name: arm

Association Rule Mining preservation using Apriori.

Mines association rules separately from the real training data and the synthetic data, then compares the two rule sets by exact match. Rules are represented as antecedent -> consequent over tabular column=value items. Numerical columns are discretized into equal-width bins fitted on the real training data before mining.
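
The discretization step can be illustrated with a minimal sketch. This is not the library's implementation: the helpers fit_equal_width_edges and to_items are hypothetical names, and clamping out-of-range synthetic values into the first or last bin is an assumption made here for illustration. The point it shows is that the bin edges come from the real data only and are then reused unchanged for the synthetic data, so both datasets share the same column=value items.

```python
import numpy as np
import pandas as pd


def fit_equal_width_edges(real_col: pd.Series, n_bins: int = 5) -> np.ndarray:
    """Hypothetical helper: equal-width bin edges computed from the real column only."""
    return np.linspace(real_col.min(), real_col.max(), n_bins + 1)


def to_items(col: pd.Series, edges: np.ndarray) -> pd.Series:
    """Hypothetical helper: map each value to a 'column=binK' item."""
    # Only the interior edges are passed to np.digitize, so values below the
    # real minimum land in bin 0 and values above the real maximum land in
    # the last bin (an assumption of this sketch).
    idx = np.digitize(col.to_numpy(), edges[1:-1])
    return pd.Series([f"{col.name}=bin{i}" for i in idx], index=col.index)


real = pd.DataFrame({"age": [20, 30, 40, 50, 60, 70]})
syn = pd.DataFrame({"age": [18, 35, 52, 75]})

# Fit on real data, then apply the same edges to both datasets.
edges = fit_equal_width_edges(real["age"], n_bins=5)
real_items = to_items(real["age"], edges)
syn_items = to_items(syn["age"], edges)
```

Reusing the real-data edges matters because rules are compared by exact match: if the synthetic data were binned separately, the same bin label could cover a different value range in each dataset.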

Higher precision and recall indicate better preservation of association rules found in the real data.

Parameters:
  • discrete_features (list or None) – List of discrete/categorical feature names. Default: None, treated as an empty list.

  • min_support (float or int) – Minimum itemset support for Apriori. Floats in (0, 1] are interpreted as a fraction of rows; integers are interpreted as absolute row counts. Default: 0.1.

  • min_confidence (float) – Minimum confidence for generated rules. Default: 0.8.

  • n_bins_numerical (int) – Number of equal-width bins for numerical features. Must be >= 2. Default: 5.

  • max_itemset_size (int) – Maximum frequent itemset size mined by Apriori. Must be >= 2. Default: 2.

  • print_rule_differences (bool) – Whether to print missed and hallucinated rules after evaluation. Default: True.

  • max_rules_to_print (int or None) – Maximum number of missed and hallucinated rules to print per category. None prints all. Default: 25.
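
To make the roles of min_support and min_confidence concrete, here is a rough brute-force sketch for rules over itemsets of size 2 (the default max_itemset_size). It is not Apriori and not the library's code: mine_pair_rules is a hypothetical function, the inclusive >= thresholds are an assumption, and a real Apriori implementation prunes candidate itemsets instead of enumerating every pair. Each row is modelled as a set of column=value strings.

```python
from itertools import permutations


def mine_pair_rules(rows, min_support=0.1, min_confidence=0.8):
    """Hypothetical sketch: return rules (antecedent, consequent) over single items."""
    n_rows = len(rows)
    # Integers are absolute row counts; floats are fractions of the rows.
    if isinstance(min_support, int):
        min_count = min_support
    else:
        min_count = min_support * n_rows

    def count(items):
        # Number of rows that contain every item in `items`.
        return sum(1 for row in rows if items <= row)

    all_items = sorted(set().union(*rows))
    rules = set()
    for a, b in permutations(all_items, 2):
        pair_count = count({a, b})
        # Support filter on the itemset {a, b} (inclusive threshold assumed).
        if pair_count == 0 or pair_count < min_count:
            continue
        # Confidence of a -> b is support({a, b}) / support({a}).
        confidence = pair_count / count({a})
        if confidence >= min_confidence:
            rules.add((a, b))
    return rules


rows = [
    {"color=red", "size=L"},
    {"color=red", "size=L"},
    {"color=red", "size=S"},
    {"color=blue", "size=S"},
]

rules_fraction = mine_pair_rules(rows, min_support=0.5, min_confidence=0.8)
rules_count = mine_pair_rules(rows, min_support=1, min_confidence=0.8)
```

Note the difference between min_support=1 (an integer, so one row is enough) and min_support=1.0 (a float, so the itemset must appear in every row). In the toy data, lowering the threshold to a single row admits the extra rule color=blue -> size=S, which is supported by only one row.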

Example

>>> import pandas as pd
>>> from synthyverse.evaluation import ARM
>>>
>>> X_real = pd.DataFrame(...)
>>> X_syn = pd.DataFrame(...)
>>> discrete_features = ["category_col"]
>>>
>>> metric = ARM(
...     discrete_features=discrete_features,
...     min_support=0.05,
...     min_confidence=0.7,
... )
>>>
>>> results = metric.evaluate(X_real, X_syn)

evaluate(X_train, X_syn)

Evaluate synthetic data by comparing mined association rules.

Parameters:
  • X_train (DataFrame) – Real training data as a pandas DataFrame.

  • X_syn (DataFrame) – Synthetic data as a pandas DataFrame.

Returns:

Dictionary with keys:
  • "arm.precision": Fraction of synthetic rules also found in real data.

  • "arm.recall": Fraction of real rules also found in synthetic data.

  • "arm.n_rules_real": Number of rules mined in real data.

  • "arm.n_rules_syn": Number of rules mined in synthetic data.

Return type:

dict
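
The returned values follow directly from set operations on the two rule sets. The sketch below is illustrative only: compare_rules is a hypothetical function, rules are modelled as (antecedent, consequent) tuples, and returning 0.0 when a rule set is empty is an assumption, since the behaviour for empty rule sets is not documented here.

```python
def compare_rules(rules_real, rules_syn):
    """Hypothetical sketch: exact-match precision and recall between two rule sets."""
    rules_real, rules_syn = set(rules_real), set(rules_syn)
    shared = rules_real & rules_syn
    # Assumption: report 0.0 instead of dividing by zero for an empty rule set.
    precision = len(shared) / len(rules_syn) if rules_syn else 0.0
    recall = len(shared) / len(rules_real) if rules_real else 0.0
    return {
        "arm.precision": precision,
        "arm.recall": recall,
        "arm.n_rules_real": len(rules_real),
        "arm.n_rules_syn": len(rules_syn),
    }


rules_real = {("a=1", "b=1"), ("b=1", "c=1"), ("c=1", "a=1")}
rules_syn = {("a=1", "b=1"), ("x=1", "y=1")}

scores = compare_rules(rules_real, rules_syn)
missed = rules_real - rules_syn        # real rules absent from the synthetic data
hallucinated = rules_syn - rules_real  # synthetic rules absent from the real data
```

In this toy case one of the two synthetic rules also occurs in the real data (precision 0.5), and one of the three real rules is recovered (recall 1/3). The missed and hallucinated sets correspond to the rule differences that print_rule_differences reports.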