RegionInterpreter


import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sheshe import ModalBoundaryClustering, RegionInterpreter

iris = load_iris()
X, y = iris.data, iris.target

# Fit the clustering model and turn its regions into rule cards
sh = ModalBoundaryClustering().fit(X, y)
cards = RegionInterpreter(feature_names=iris.feature_names).summarize(sh.regions_)
RegionInterpreter.pretty_print(cards)

# Plot the fitted classes
sh.plot_classes(X, y)
plt.show()

Converts ClusterRegion objects into compact, human-readable rule sets. It summarises each region with axis-aligned boxes, highlights informative 2D projections and offers helpers such as pretty_print for reporting. Optional LLM backends can turn the summaries into natural-language descriptions.

Mathematical formulation

For each feature $j$, the axis-aligned rule is the quantile interval $[Q_q(x_j),\ Q_{1-q}(x_j)]$ with $q$ = q_box. On its own, each interval keeps roughly a fraction $1-2q$ of the region's values of $x_j$.
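A minimal pure-Python sketch of the per-feature interval follows. It is not sheshe's code: `quantile` and `axis_rule` are hypothetical helpers, and the quantiles are assumed to use linear interpolation between sorted values (NumPy's default convention).

```python
from typing import Sequence, Tuple


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile with linear interpolation between sorted values (hypothetical helper)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)            # fractional position in the sorted data
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def axis_rule(values: Sequence[float], q: float = 0.05) -> Tuple[float, float]:
    """Interval [Q_q, Q_{1-q}] for a single feature (hypothetical, not sheshe's API)."""
    return quantile(values, q), quantile(values, 1 - q)


low, high = axis_rule(range(1, 11), q=0.05)   # feature values 1..10
print(round(low, 2), round(high, 2))          # 1.45 9.55
```

Because each interval is computed on one feature at a time, it keeps roughly a fraction 1 - 2q of that feature's values; the full box, being the intersection over all features, can contain fewer points.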

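A radius $r$ is flagged as capped when its z-score $z = (r-\mu)/\sigma$ satisfies $|z| >$ cap_threshold, where $\mu$ and $\sigma$ are the mean and standard deviation of the radii.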

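Example: with q_box = 0.05 and feature values 1 to 10, the rule becomes [1.45, 9.55], since linear interpolation over the ten sorted values gives $1 + 0.05 \cdot 9 = 1.45$ and $1 + 0.95 \cdot 9 = 9.55$. A radius $r = 3$ with $\mu = 2$ and $\sigma = 0.3$ yields $z = (3-2)/0.3 \approx 3.3$, which is below the default cap_threshold of 6.393, so that radius is not flagged.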
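The capping check can be sketched the same way. This is an illustration, not the library's implementation: `is_capped` is a hypothetical helper, and raising an error for a non-positive σ is an assumption added here to avoid dividing by zero.

```python
def is_capped(radius: float, mean: float, std: float, cap_threshold: float = 6.393) -> bool:
    """Return True when |(radius - mean) / std| exceeds cap_threshold (hypothetical helper)."""
    if std <= 0:
        # Assumption: a z-score is undefined without positive spread
        raise ValueError("std must be positive")
    z = (radius - mean) / std
    return abs(z) > cap_threshold


print(round((3 - 2) / 0.3, 2))                    # 3.33
print(is_capped(3, 2, 0.3))                       # False: 3.33 < 6.393
print(is_capped(3, 2, 0.3, cap_threshold=3.0))    # True with a stricter threshold
```

Taking the absolute value means unusually short radii are flagged as well as unusually long ones.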

Example


from sheshe import RegionInterpreter
ri = RegionInterpreter(feature_names=["sepal", "petal"])
summary = ri.summarize(region)  # region: a ClusterRegion over two features, e.g. an entry of sh.regions_

Usage examples


from sheshe import RegionInterpreter

ri = RegionInterpreter(feature_names=["sepal", "petal"])
ri.summarize(region)       # summarize a single region
ri.summarize([region])     # summarize a list of regions

Interpretability example


from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sheshe import ModalBoundaryClustering, RegionInterpreter

iris = load_iris()
X, y = iris.data, iris.target

sh = ModalBoundaryClustering(
    base_estimator=RandomForestClassifier(random_state=0),
    task="classification",
).fit(X, y)

# Tabular summary of the regions
print(sh.interpretability_summary(iris.feature_names).head())

# Human-readable rules per region
cards = RegionInterpreter(feature_names=iris.feature_names).summarize(sh.regions_)
RegionInterpreter.pretty_print(cards)

Parameters

  • feature_names (list[str] or None, default None): names for each feature used in the generated rules.
  • q_box (float, default 0.05): lower quantile q used to compute robust axis-aligned boxes; each feature's interval spans [Q_q, Q_{1-q}].
  • k_pairs (int, default 2): number of informative 2D projections to include.
  • decimals (int, default 2): decimal precision in the emitted rules.
  • cap_threshold (float, default 6.393): z-score threshold to mark capped radii.
  • near_const_tol (float, default 0.12): tolerance to report nearly constant dimensions.
  • inverse_transform (callable, optional): function applied to points before rule extraction (e.g. inverse scaling).
  • feature_bounds (Sequence[Tuple[float, float]] or None): hard bounds for each feature to clamp boxes.
  • include_center_in_box (bool, default True): include region centre when computing axis-aligned boxes.
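The box-related parameters above can be tied together in a short pure-Python sketch. It is an illustration under stated assumptions, not sheshe's implementation: `region_box` and `_quantile` are hypothetical helpers, the centre is assumed to be appended to the region's points as one extra sample when include_center_in_box is true, quantiles use linear interpolation, and feature_bounds is applied by clamping each interval after the quantiles are taken. The remaining parameters (k_pairs, cap_threshold, near_const_tol, inverse_transform) are not modelled.

```python
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[float, float]


def _quantile(xs: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of already sorted values."""
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def region_box(
    points: Sequence[Sequence[float]],
    center: Sequence[float],
    q_box: float = 0.05,
    decimals: int = 2,
    feature_bounds: Optional[Sequence[Interval]] = None,
    include_center_in_box: bool = True,
) -> List[Interval]:
    """Hypothetical per-feature box that reuses the parameter names above."""
    rows = [list(p) for p in points]
    if include_center_in_box:
        rows.append(list(center))      # assumption: the centre counts as one extra point
    box: List[Interval] = []
    for j in range(len(center)):
        col = sorted(row[j] for row in rows)
        lo, hi = _quantile(col, q_box), _quantile(col, 1 - q_box)
        if feature_bounds is not None:
            b_lo, b_hi = feature_bounds[j]
            lo, hi = max(lo, b_lo), min(hi, b_hi)   # clamp to the hard bounds
        box.append((round(lo, decimals), round(hi, decimals)))   # emit with `decimals` precision
    return box


points = [(float(x), 2.0 * x) for x in range(1, 11)]
center = (5.5, 11.0)
print(region_box(points, center, include_center_in_box=False))  # [(1.45, 9.55), (2.9, 19.1)]
print(region_box(points, center))                               # [(1.5, 9.5), (3.0, 19.0)]
print(region_box(points, center, feature_bounds=[(2.0, 9.0), (0.0, 15.0)],
                 include_center_in_box=False))                  # [(2.0, 9.0), (2.9, 15.0)]
```

The order of operations (quantiles, then clamping, then rounding) is a choice of this sketch; the library may order them differently.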

Methods

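  • summarize(regions) – return a list of dictionaries containing headlines, axis-aligned rules and pairwise projections for the provided regions (a single region or a list of regions).
  • pretty_print(cards) – print the summaries returned by summarize in a readable form; called on the class, e.g. RegionInterpreter.pretty_print(cards).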