ModalBoundaryClustering


import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sheshe import ModalBoundaryClustering

X, y = load_iris(return_X_y=True)
sh = ModalBoundaryClustering().fit(X, y)
sh.plot_classes(X, y)
plt.show()

Learns regions of high probability or predicted value by climbing to local maxima of a base estimator. Radial scans trace boundary surfaces around each mode, and gradient ascent refines the centres, enabling both classification and regression workflows.

Mathematical formulation

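Centres are refined with gradient ascent updates x_{k+1} = x_k + α∇f(x_k), where f is the score of the base estimator and α is the step size (grad_lr). Iteration stops when ‖∇f(x_k)‖ < grad_tol or after grad_max_iter steps.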
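
The update rule above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the library's implementation: `finite_diff_grad` and `gradient_ascent` are hypothetical helper names, the toy Gaussian score stands in for a fitted base estimator, and the default arguments simply mirror the documented defaults of grad_lr, grad_max_iter, grad_tol and grad_eps.

```python
import numpy as np


def finite_diff_grad(f, x, eps=1e-3):
    """Central finite-difference estimate of the gradient of f at x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return grad


def gradient_ascent(f, x0, lr=0.2, max_iter=80, tol=1e-5, eps=1e-3):
    """Iterate x_{k+1} = x_k + lr * grad f(x_k) until the gradient norm is below tol."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = finite_diff_grad(f, x, eps)
        if np.linalg.norm(g) < tol:
            break
        x = x + lr * g
    return x


# Toy score (an assumption for illustration): one Gaussian bump with its mode at (1, -2).
mode = np.array([1.0, -2.0])


def score(x):
    return float(np.exp(-0.5 * np.sum((x - mode) ** 2)))


# Start near the mode and climb; the result should land close to `mode`.
centre = gradient_ascent(score, x0=[0.5, -1.5])
```

A fixed step size is the simplest choice here; the ls_* parameters listed under Parameters suggest the library can also adapt the step with a line search, which this sketch leaves out.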

From each centre, radial scans sample scores f(x_k + r u) along directions u. A scan stops at the first inflection point along the ray, where d²f/dr² = 0 and changes sign, or when the score falls past a percentile or drop fraction of the peak.

These radii approximate a probability surface from which boundary polygons are built.
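
A single radial scan can be sketched as follows. This is a hedged illustration only: `scan_ray` is a hypothetical helper, the inflection point is detected as a sign change in the discrete second difference of the sampled scores, and the fallback assumes that "drop fraction" means the score has fallen to (1 - drop_fraction) times its value at the centre. The library may define these details differently.

```python
import numpy as np


def scan_ray(f, centre, direction, r_max=3.0, steps=24, drop_fraction=0.5):
    """Return the radius at which a scan along one ray stops."""
    centre = np.asarray(centre, dtype=float)
    u = np.asarray(direction, dtype=float)
    u = u / np.linalg.norm(u)  # work with a unit direction

    radii = np.linspace(0.0, r_max, steps + 1)
    scores = np.array([f(centre + r * u) for r in radii])

    # Discrete second derivative; second[i] belongs to the radius radii[i + 1].
    second = scores[2:] - 2.0 * scores[1:-1] + scores[:-2]
    sign_change = np.where(second[:-1] * second[1:] < 0)[0]
    if sign_change.size:
        i = sign_change[0]
        # Midpoint of the two radii that bracket the first sign change.
        return 0.5 * (radii[i + 1] + radii[i + 2])

    # Fallback (assumed semantics): first radius where the score has dropped
    # to (1 - drop_fraction) of the score at the centre.
    below = np.where(scores <= (1.0 - drop_fraction) * scores[0])[0]
    return radii[below[0]] if below.size else r_max


def gauss(x):
    """Toy score: unit-variance Gaussian bump centred at the origin."""
    return float(np.exp(-0.5 * np.sum(np.asarray(x) ** 2)))


# For a unit-variance Gaussian the inflection along any ray lies at r = 1,
# so the returned radius is within one scan step of 1.0.
radius = scan_ray(gauss, centre=[0.0, 0.0], direction=[1.0, 1.0])
```

Repeating the scan over a set of directions gives one radius per ray, which is the raw material for the boundary polygons described above.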

Example


from sheshe import ModalBoundaryClustering
mbc = ModalBoundaryClustering()
mbc.fit(X, y)
labels = mbc.predict(X)

Usage examples


from sheshe import ModalBoundaryClustering

mbc = ModalBoundaryClustering(random_state=0)
mbc.fit(X, y)                      # fit


from sheshe import ModalBoundaryClustering

mbc = ModalBoundaryClustering(random_state=0)
mbc.fit_predict(X, y)              # fit_predict


from sheshe import ModalBoundaryClustering

mbc = ModalBoundaryClustering(random_state=0)
mbc.fit_transform(X, y)            # fit_transform


from sheshe import ModalBoundaryClustering

mbc = ModalBoundaryClustering(random_state=0).fit(X, y)
mbc.transform(X)                   # transform


from sheshe import ModalBoundaryClustering

mbc = ModalBoundaryClustering(random_state=0).fit(X, y)
mbc.predict(X)                     # predict


from sheshe import ModalBoundaryClustering

mbc = ModalBoundaryClustering(random_state=0).fit(X, y)
mbc.predict_proba(X)               # predict_proba


from sheshe import ModalBoundaryClustering

mbc = ModalBoundaryClustering(random_state=0).fit(X, y)
mbc.decision_function(X)           # decision_function


from sheshe import ModalBoundaryClustering

mbc = ModalBoundaryClustering(random_state=0).fit(X, y)
mbc.predict_regions(X)             # predict_regions


from sheshe import ModalBoundaryClustering

mbc = ModalBoundaryClustering(random_state=0).fit(X, y)
mbc.score(X, y)                    # score


from sheshe import ModalBoundaryClustering

mbc = ModalBoundaryClustering(random_state=0).fit(X, y)
mbc.save("mbc.joblib")             # save


from sheshe import ModalBoundaryClustering

mbc = ModalBoundaryClustering.load("mbc.joblib")  # load a model written by save

Additional examples


from sklearn.datasets import load_iris
from sheshe import ModalBoundaryClustering

X, y = load_iris(return_X_y=True)
labels = ModalBoundaryClustering().fit_predict(X, y)

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sheshe import ModalBoundaryClustering

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Regression with the default base estimator
reg = ModalBoundaryClustering(task="regression").fit(X_train, y_train)

# Regression with an explicit base estimator
reg_retrained = ModalBoundaryClustering(
    base_estimator=RandomForestRegressor(random_state=0),
    task="regression",
).fit(X_train, y_train)

Parameters

base_estimator (BaseEstimator, default None): model used to compute probabilities or predictions. Defaults to LogisticRegression when None.
task (str, default "classification"): "classification" or "regression".
base_2d_rays (int, default 32): number of radial directions in 2D; automatically reduced for high-dimensional data when auto_rays_by_dim is True.
direction ({"center_out", "outside_in"}, default "center_out"): direction used to locate inflection points along each ray.
stop_criteria ({"inflexion", "percentile"}, default "inflexion"): rule to stop radial expansion.
percentile_bins (int, default 20): bins for percentile-based stopping.
scan_radius_factor (float, default 3.0): maximum scan radius as a multiple of the global standard deviation.
scan_steps (int, default 24): number of steps sampled along each ray.
smooth_window (int or None, default None): moving-average window to smooth radial scans.
drop_fraction (float, default 0.5): fallback drop from the peak when no inflection is found.
bounds_margin (float, default 0.05): margin added to data bounds to avoid clipping during scans.
grad_lr (float, default 0.2): learning rate for gradient ascent.
grad_max_iter (int, default 80): maximum iterations for gradient ascent.
grad_tol (float, default 1e-5): tolerance on gradient norm to stop the ascent.
grad_eps (float, default 1e-3): finite-difference step for gradients.
optim_method (str, default "gradient_ascent"): optimisation strategy; accepts "gradient_ascent" or "trust_region_newton".
n_max_seeds (int, default 2): number of random starting points.
random_state (int, default 42): seed for reproducibility.
percentile_sample_size (int, default 50000): sample size to compute percentile thresholds.
max_subspaces (int, default 20): maximum subspaces explored when X has more than three dimensions.
verbose (int, default 0): logging level; 0 silent, 1 summary, 2 detailed.
save_labels (bool, default False): store label assignments to disk.
prediction_within_region (bool, default False): evaluate the base estimator only within each region before predicting.
out_dir (str or Path, optional): directory where auxiliary files are written.
auto_rays_by_dim (bool, default True): automatically reduce the number of rays in high dimension.
ray_mode (str, default "grad"): strategy used to generate candidate rays.
use_spsa (bool, default True): use SPSA for gradient estimates when analytical gradients are unavailable.
spsa_delta (float, default 1e-2): SPSA perturbation size.
spsa_avg (int, default 4): number of SPSA evaluations per gradient estimate.
ls_alpha0 (float, default 0.5): initial step size for line search.
ls_shrink (float, default 0.5): multiplicative shrink factor during line search.
ls_min_alpha (float, default 1e-3): minimum allowed step size.
arc_max_steps (int, default 64): maximum steps for arc exploration.
arc_len_max (float, default 3.0): maximum arc length.
line_refine_steps (int, default 8): refinement steps when mapping boundaries.
use_adaptive_scan (bool or None, default None): enable adaptive radial scan when dimensionality is high.
batch_size (int, default 16384): batch size for model evaluations.
coarse_steps (int, default 12): number of coarse scan steps in high dimension.
refine_steps (int, default 4): number of refinement steps after the coarse scan.
early_exit_patience (int, default 1): early termination patience for flat regions.
density_alpha (float, default 0.0): exponent for density penalty.
density_k (int, default 15): neighbour count for density estimation.
cluster_metrics_cls (dict or None): callbacks to evaluate clusters in classification mode.
cluster_metrics_reg (dict or None): callbacks for regression mode.
fast_membership (bool, default False): enable approximate membership computation.
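
For readers unfamiliar with SPSA (simultaneous perturbation stochastic approximation), the idea behind use_spsa, spsa_delta and spsa_avg can be sketched as below. This is a generic textbook-style sketch, not the library's code: `spsa_gradient` is a hypothetical helper, perturbations are random ±1 vectors, and reading spsa_avg as "number of averaged perturbation draws" is an assumption.

```python
import numpy as np


def spsa_gradient(f, x, delta=1e-2, n_avg=4, rng=None):
    """Estimate the gradient of f at x from n_avg random two-sided perturbations."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for _ in range(n_avg):
        # Perturb every coordinate at once with a random +/-1 sign vector.
        signs = rng.choice([-1.0, 1.0], size=x.shape)
        diff = f(x + delta * signs) - f(x - delta * signs)
        # Dividing by a +/-1 entry is the same as multiplying by it.
        grad += diff / (2.0 * delta) * signs
    return grad / n_avg


def linear(x):
    """Toy function with known gradient (2, -1)."""
    return 2.0 * x[0] - x[1]


# Many averaged draws are used here only to make the estimate visibly accurate.
estimate = spsa_gradient(linear, [0.3, 0.7], n_avg=4000, rng=0)
```

Each draw costs two function evaluations regardless of the number of features, which is why SPSA is attractive in high dimension compared with coordinate-wise finite differences. The price is a noisier estimate, reduced by averaging more draws.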

Methods

fit(X, y) – learn regions from data and labels.
predict(X) – assign cluster ids to samples.
fit_predict(X, y=None) – convenience wrapper around fit + predict.
predict_proba(X) – return per-cluster probabilities.
decision_function(X) – base-estimator decision scores.
predict_regions(X, label_path=None) – cluster ids and optional label dump.
interpretability_summary(feature_names=None) – tabular region summary.
plot_pairs(X, y=None, max_pairs=None, show_histograms=False) – 2D decision plots for feature pairs with optional marginal histograms.
plot_pair_3d(X, pair, class_label=None, grid_res=50, alpha_surface=0.6, engine="matplotlib") – render a 3D surface for a feature pair.