InsideForest

InsideForest clusters the decision geometry of a RandomForestClassifier into human-readable rules and regions. It now supports both internally trained forests and externally supplied estimators, so you can reuse tuned ensembles that are already part of your workflow.

Quick start

from sklearn.datasets import load_iris
from sheshe import InsideForest

X, y = load_iris(return_X_y=True)
explorer = InsideForest()
explorer.fit(X, y)  # trains a forest internally and clusters its rules into regions
region_labels = explorer.transform(X, mode="best")  # one best region per sample
summary = explorer.explain(top_k=5)  # human-readable summary of the top five regions

Using a pretrained RandomForestClassifier

The estimator accepts a pretrained forest through the random_forest argument of fit. The example below mirrors the experiment that benchmarks runtime on the Wine dataset.

from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sheshe import InsideForest

X, y = load_wine(return_X_y=True)
pretrained_rf = RandomForestClassifier(
    n_estimators=200,
    min_samples_leaf=2,
    random_state=42,
    n_jobs=-1,
).fit(X, y)

explorer = InsideForest()
explorer.fit(X, y, random_forest=pretrained_rf)
print(len(explorer.get_rules()), "rules")
print(len(explorer.get_regions()), "regions")

Transforming samples and listing all covering regions

Experiments on the digits dataset evaluate both best-region assignment and the full coverage list. The snippet shows how to recover every covering region for each observation.

from sklearn.datasets import load_digits
from sheshe import InsideForest

X, y = load_digits(return_X_y=True)
explorer = InsideForest()
explorer.fit(X, y)
all_regions = explorer.transform(X, mode="all")  # every covering region per observation
first_sample_regions = all_regions[0]  # all regions that cover the first sample

Generating contrasting hypotheses

Hypothesis generation pairs regions that cover similar samples but differ sharply in class purity. The runtime experiment measures this step for every dataset.

from sklearn.datasets import load_iris
from sheshe import InsideForest

X, y = load_iris(return_X_y=True)
explorer = InsideForest()
explorer.fit(X, y)
hypotheses = explorer.generate_hypotheses(top_pairs=5)
for hypothesis in hypotheses:
    print(hypothesis["pair"], hypothesis["purity_delta"])
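A minimal sketch of the contrastive-pairing idea, under the assumption that each region carries a purity score and a coverage set over the training samples: pairs with high coverage overlap but a large purity gap are the most informative "why does one region succeed where its neighbor fails?" candidates. All names here (masks, contrast_pairs, the thresholds) are illustrative, not sheshe's internals.

```python
from itertools import combinations

# Toy regions: coverage as sample-index sets plus a class-purity score.
regions = {
    "A": {"mask": {0, 1, 2, 3}, "purity": 0.95},
    "B": {"mask": {1, 2, 3, 4}, "purity": 0.40},
    "C": {"mask": {7, 8, 9},    "purity": 0.90},
}

def contrast_pairs(regions, top_pairs=5, min_similarity=0.3):
    """Rank overlapping region pairs by their purity gap."""
    scored = []
    for a, b in combinations(regions, 2):
        ma, mb = regions[a]["mask"], regions[b]["mask"]
        jaccard = len(ma & mb) / len(ma | mb)
        if jaccard >= min_similarity:
            delta = abs(regions[a]["purity"] - regions[b]["purity"])
            scored.append({"pair": (a, b), "similarity": jaccard, "purity_delta": delta})
    scored.sort(key=lambda h: h["purity_delta"], reverse=True)
    return scored[:top_pairs]

for h in contrast_pairs(regions):
    print(h["pair"], round(h["purity_delta"], 2))  # -> ('A', 'B') 0.55
```

Only A and B overlap enough to be paired here; C covers a disjoint set of samples and is filtered out by the similarity threshold.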

Timing diagnostics

Set the verbose flag on fit, transform, explain, or generate_hypotheses to log how long each internal stage takes. The latest durations are also exposed through get_last_timings(), which feeds the runtime experiments below.

explorer = InsideForest()
explorer.fit(X, y, verbose=1)  # logs the duration of each internal stage
explorer.transform(X, mode="best", verbose=1)
print(explorer.get_last_timings()["fit"])  # per-stage durations from the last fit call
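The underlying pattern is simple to sketch: wrap each stage with a monotonic timer and keep only the latest durations, so the bottleneck can be inspected after the fact. The StageTimer class below is a hypothetical illustration of that pattern, not sheshe's implementation.

```python
import time

class StageTimer:
    """Record the most recent wall-clock duration of each named stage."""

    def __init__(self):
        self.last_timings = {}

    def run(self, name, fn, *args, **kwargs):
        start = time.perf_counter()  # monotonic clock, suited to interval timing
        result = fn(*args, **kwargs)
        self.last_timings[name] = time.perf_counter() - start
        return result

timer = StageTimer()
timer.run("train_random_forest", lambda: sum(range(10_000)))
timer.run("extract_rules", lambda: sorted(range(1_000), reverse=True))

# The slowest recorded stage is the bottleneck.
bottleneck = max(timer.last_timings, key=timer.last_timings.get)
print(bottleneck, timer.last_timings[bottleneck])
```

Storing durations keyed by stage name is what makes the bottleneck columns in the table below cheap to compute: a single max over the timing dict per phase.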

Runtime experiments

The experiments/inside_forest_runtime.py script runs on Iris, Wine, and Digits to compare internal training with reusing a pretrained forest. It records the per-stage timings surfaced by get_last_timings(), stores the results in benchmark/inside_forest_runtime.csv, and highlights the dominant bottlenecks.

Fit timings (seconds):

| dataset | mode     | n_rules | n_regions | fit_time_s | fit_bottleneck      | fit_bottleneck_time_s | fit_cluster_total_s | fit_extract_rules_s | fit_train_random_forest_s |
|---------|----------|---------|-----------|------------|---------------------|-----------------------|---------------------|---------------------|---------------------------|
| iris    | internal | 724     | 159       | 5.47       | cluster_total       | 4.67                  | 4.67                | 0.045               | 0.76                      |
| iris    | external | 724     | 159       | 4.51       | cluster_total       | 4.47                  | 4.47                | 0.043               | 0.00                      |
| wine    | internal | 842     | 214       | 15.03      | cluster_total       | 14.24                 | 14.24               | 0.055               | 0.74                      |
| wine    | external | 842     | 214       | 14.04      | cluster_total       | 13.98                 | 13.98               | 0.062               | 0.00                      |
| digits  | internal | 8660    | 10        | 2.04       | train_random_forest | 0.90                  | 0.35                | 0.80                | 0.89                      |
| digits  | external | 8660    | 10        | 1.07       | extract_rules       | 0.78                  | 0.29                | 0.78                | 0.00                      |

Transform and hypothesis bottlenecks (seconds):

| dataset | mode     | transform_bottleneck | transform_bottleneck_time_s | hypotheses_bottleneck | hypotheses_bottleneck_time_s |
|---------|----------|----------------------|-----------------------------|-----------------------|------------------------------|
| iris    | internal | select_best_region   | 0.0024                      | scan_pairs            | 0.25                         |
| iris    | external | select_best_region   | 0.0025                      | scan_pairs            | 0.24                         |
| wine    | internal | points_in_regions    | 0.0038                      | scan_pairs            | 1.11                         |
| wine    | external | points_in_regions    | 0.0063                      | scan_pairs            | 0.94                         |
| digits  | internal | select_best_region   | 0.026                       | scan_pairs            | 0.010                        |
| digits  | external | select_best_region   | 0.022                       | scan_pairs            | 0.0099                       |

Small datasets are dominated by rule clustering, especially the Jaccard distance computation (cluster_total). Digits, with 8.6k rules, pushes the fallback KMeans path; internal runs spend most of their time fitting the forest, whereas external runs spend it extracting rules.
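Why the Jaccard step dominates cluster_total on the small datasets can be sketched in a few lines: if each rule is reduced to the set of training samples it covers, clustering needs a distance between every pair of rules, which is quadratic in the rule count. The sample-set encoding below is a hypothetical simplification for illustration.

```python
def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b| for two sample-index sets (distance 1 if both empty)."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)

# Toy coverage sets: which training samples each rule matches.
rule_coverage = [
    {0, 1, 2, 3},
    {2, 3, 4, 5},
    {0, 1, 2},
]

# Condensed pairwise distances: O(n_rules**2) comparisons overall, which is
# the cost that balloons for iris (724 rules) and wine (842 rules).
distances = [
    (i, j, jaccard_distance(rule_coverage[i], rule_coverage[j]))
    for i in range(len(rule_coverage))
    for j in range(i + 1, len(rule_coverage))
]
for i, j, d in distances:
    print(i, j, round(d, 3))
```

With 8.6k rules the full pairwise matrix becomes impractical, which is consistent with digits falling back to the KMeans path instead.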