InsideForest

InsideForest clusters the decision geometry of a RandomForestClassifier into human-readable rules and regions. It now supports both internally trained forests and externally supplied estimators, so you can reuse tuned ensembles that are already part of your workflow.

Quick start

from sklearn.datasets import load_iris
from sheshe import InsideForest

X, y = load_iris(return_X_y=True)
explorer = InsideForest()
explorer.fit(X, y)  # trains a forest internally and clusters its rules into regions
region_labels = explorer.transform(X, mode="best")  # one best region per sample
summary = explorer.explain(top_k=5)  # human-readable summary of the top five regions

Using a pretrained RandomForestClassifier

The estimator accepts a pretrained forest through the random_forest argument of fit. The example below mirrors the experiment that benchmarks runtime on the Wine dataset.

from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sheshe import InsideForest

X, y = load_wine(return_X_y=True)
pretrained_rf = RandomForestClassifier(
    n_estimators=200,
    min_samples_leaf=2,
    random_state=42,
    n_jobs=-1,
).fit(X, y)

explorer = InsideForest()
explorer.fit(X, y, random_forest=pretrained_rf)
print(len(explorer.get_rules()), "rules")
print(len(explorer.get_regions()), "regions")

Transforming samples and listing all covering regions

Experiments on the digits dataset evaluate both best-region assignment and the full coverage list. The snippet shows how to recover every covering region for each observation.

from sklearn.datasets import load_digits
from sheshe import InsideForest

X, y = load_digits(return_X_y=True)
explorer = InsideForest()
explorer.fit(X, y)
all_regions = explorer.transform(X, mode="all")  # every covering region per observation
first_sample_regions = all_regions[0]  # all regions that cover the first sample

Generating contrasting hypotheses

Hypothesis generation pairs regions that cover similar samples but differ sharply in class purity. The runtime experiment measures this step for every dataset.

from sklearn.datasets import load_iris
from sheshe import InsideForest

X, y = load_iris(return_X_y=True)
explorer = InsideForest()
explorer.fit(X, y)
hypotheses = explorer.generate_hypotheses(top_pairs=5)
for hypothesis in hypotheses:
    print(hypothesis["pair"], hypothesis["purity_delta"])
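A minimal sketch of the contrastive-pairing idea, under the assumption that each region carries a purity score and a coverage set over the training samples: pairs with high coverage overlap but a large purity gap are the most informative "why does one region succeed where its neighbor fails?" candidates. All names here (masks, contrast_pairs, the thresholds) are illustrative, not sheshe's internals.

```python
from itertools import combinations

# Toy regions: coverage as sample-index sets plus a class-purity score.
regions = {
    "A": {"mask": {0, 1, 2, 3}, "purity": 0.95},
    "B": {"mask": {1, 2, 3, 4}, "purity": 0.40},
    "C": {"mask": {7, 8, 9},    "purity": 0.90},
}

def contrast_pairs(regions, top_pairs=5, min_similarity=0.3):
    """Rank overlapping region pairs by their purity gap."""
    scored = []
    for a, b in combinations(regions, 2):
        ma, mb = regions[a]["mask"], regions[b]["mask"]
        jaccard = len(ma & mb) / len(ma | mb)
        if jaccard >= min_similarity:
            delta = abs(regions[a]["purity"] - regions[b]["purity"])
            scored.append({"pair": (a, b), "similarity": jaccard, "purity_delta": delta})
    scored.sort(key=lambda h: h["purity_delta"], reverse=True)
    return scored[:top_pairs]

for h in contrast_pairs(regions):
    print(h["pair"], round(h["purity_delta"], 2))  # -> ('A', 'B') 0.55
```

Only A and B overlap enough to be paired here; C covers a disjoint set of samples and is filtered out by the similarity threshold.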

Timing diagnostics

Set the verbose flag on fit, transform, explain, or generate_hypotheses to log how long each internal stage takes. The latest durations are also exposed through get_last_timings(), which feeds the runtime experiments below.

explorer = InsideForest()
explorer.fit(X, y, verbose=1)  # logs the duration of each internal stage
explorer.transform(X, mode="best", verbose=1)
print(explorer.get_last_timings()["fit"])  # per-stage durations from the last fit call
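The underlying pattern is simple to sketch: wrap each stage with a monotonic timer and keep only the latest durations, so the bottleneck can be inspected after the fact. The StageTimer class below is a hypothetical illustration of that pattern, not sheshe's implementation.

```python
import time

class StageTimer:
    """Record the most recent wall-clock duration of each named stage."""

    def __init__(self):
        self.last_timings = {}

    def run(self, name, fn, *args, **kwargs):
        start = time.perf_counter()  # monotonic clock, suited to interval timing
        result = fn(*args, **kwargs)
        self.last_timings[name] = time.perf_counter() - start
        return result

timer = StageTimer()
timer.run("train_random_forest", lambda: sum(range(10_000)))
timer.run("extract_rules", lambda: sorted(range(1_000), reverse=True))

# The slowest recorded stage is the bottleneck.
bottleneck = max(timer.last_timings, key=timer.last_timings.get)
print(bottleneck, timer.last_timings[bottleneck])
```

Storing durations keyed by stage name is what makes the bottleneck columns in the table below cheap to compute: a single max over the timing dict per phase.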

Runtime experiments

The experiments/inside_forest_runtime.py script runs on Iris, Wine, and Digits to compare internal training with reusing a pretrained forest. It records the per-stage timings surfaced by get_last_timings(), stores the results in benchmark/inside_forest_runtime.csv, and highlights the dominant bottlenecks.

Fit timings (seconds):

| dataset | mode     | n_rules | n_regions | fit_time_s | fit_bottleneck      | fit_bottleneck_time_s | fit_cluster_total_s | fit_extract_rules_s | fit_train_random_forest_s |
|---------|----------|---------|-----------|------------|---------------------|-----------------------|---------------------|---------------------|---------------------------|
| iris    | internal | 724     | 159       | 5.47       | cluster_total       | 4.67                  | 4.67                | 0.045               | 0.76                      |
| iris    | external | 724     | 159       | 4.51       | cluster_total       | 4.47                  | 4.47                | 0.043               | 0.00                      |
| wine    | internal | 842     | 214       | 15.03      | cluster_total       | 14.24                 | 14.24               | 0.055               | 0.74                      |
| wine    | external | 842     | 214       | 14.04      | cluster_total       | 13.98                 | 13.98               | 0.062               | 0.00                      |
| digits  | internal | 8660    | 10        | 2.04       | train_random_forest | 0.90                  | 0.35                | 0.80                | 0.89                      |
| digits  | external | 8660    | 10        | 1.07       | extract_rules       | 0.78                  | 0.29                | 0.78                | 0.00                      |

Transform and hypothesis bottlenecks (seconds):

| dataset | mode     | transform_bottleneck | transform_bottleneck_time_s | hypotheses_bottleneck | hypotheses_bottleneck_time_s |
|---------|----------|----------------------|-----------------------------|-----------------------|------------------------------|
| iris    | internal | select_best_region   | 0.0024                      | scan_pairs            | 0.25                         |
| iris    | external | select_best_region   | 0.0025                      | scan_pairs            | 0.24                         |
| wine    | internal | points_in_regions    | 0.0038                      | scan_pairs            | 1.11                         |
| wine    | external | points_in_regions    | 0.0063                      | scan_pairs            | 0.94                         |
| digits  | internal | select_best_region   | 0.026                       | scan_pairs            | 0.010                        |
| digits  | external | select_best_region   | 0.022                       | scan_pairs            | 0.0099                       |

Small datasets are dominated by rule clustering, especially the Jaccard distance computation (cluster_total). Digits, with 8.6k rules, pushes the fallback KMeans path; internal runs spend most of their time fitting the forest, whereas external runs spend it extracting rules.
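Why the Jaccard step dominates cluster_total on the small datasets can be sketched in a few lines: if each rule is reduced to the set of training samples it covers, clustering needs a distance between every pair of rules, which is quadratic in the rule count. The sample-set encoding below is a hypothetical simplification for illustration.

```python
def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b| for two sample-index sets (distance 1 if both empty)."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)

# Toy coverage sets: which training samples each rule matches.
rule_coverage = [
    {0, 1, 2, 3},
    {2, 3, 4, 5},
    {0, 1, 2},
]

# Condensed pairwise distances: O(n_rules**2) comparisons overall, which is
# the cost that balloons for iris (724 rules) and wine (842 rules).
distances = [
    (i, j, jaccard_distance(rule_coverage[i], rule_coverage[j]))
    for i in range(len(rule_coverage))
    for j in range(i + 1, len(rule_coverage))
]
for i, j, d in distances:
    print(i, j, round(d, 3))
```

With 8.6k rules the full pairwise matrix becomes impractical, which is consistent with digits falling back to the KMeans path instead.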