# ModalBoundaryClustering

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

from sheshe import ModalBoundaryClustering

X, y = load_iris(return_X_y=True)
sh = ModalBoundaryClustering().fit(X, y)
sh.plot_classes(X, y)
plt.show()
```
`ModalBoundaryClustering` learns regions of high probability or predicted value by climbing the local maxima of a base estimator. Radial scans trace boundary surfaces around each mode, and gradient ascent refines the centres, enabling both classification and regression workflows.
## Mathematical formulation

Centres are refined with gradient-ascent updates `x_{k+1} = x_k + α ∇f(x_k)` until `‖∇f(x_k)‖ < grad_tol`. From each centre, radial scans sample scores `f(x_k + r u)` along directions `u`. A scan stops at a stationary point, where `∂f/∂r = 0`; an inflection point additionally requires `∂²f/∂r² = 0` with a sign change. If neither is found, the scan falls back to stopping where the score drops below a percentile threshold or a fixed fraction of the peak value. The resulting radii approximate a probability surface from which boundary polygons are built.
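To make the two steps concrete, here is a toy sketch in pure NumPy of gradient ascent with finite-difference gradients followed by a radial scan that stops at an inflection point. This is a simplified illustration, not the library's implementation; the score function and helper names are made up for the example.

```python
import numpy as np

def score(x):
    # Toy unimodal score: a Gaussian bump with its mode at (1, 2).
    return np.exp(-0.5 * np.sum((x - np.array([1.0, 2.0])) ** 2))

def fd_grad(f, x, eps=1e-3):
    # Central finite-difference gradient (the role played by grad_eps).
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2.0 * eps)
    return g

def climb(f, x0, lr=0.2, max_iter=80, tol=1e-5):
    # x_{k+1} = x_k + lr * grad f(x_k), stopped when ||grad f(x_k)|| < tol.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = fd_grad(f, x)
        if np.linalg.norm(g) < tol:
            break
        x = x + lr * g
    return x

def inflection_radius(f, centre, u, r_max=3.0, steps=24):
    # Sample f(centre + r u) along a ray and stop where the discrete
    # second derivative along r changes sign (the "inflexion" criterion).
    rs = np.linspace(0.0, r_max, steps)
    vals = np.array([f(centre + r * u) for r in rs])
    d2 = np.diff(vals, 2)  # second difference, aligned with rs[1:-1]
    cross = np.where((d2[:-1] < 0) & (d2[1:] >= 0))[0]
    return rs[cross[0] + 2] if cross.size else rs[-1]

centre = climb(score, np.zeros(2))  # converges near the mode (1, 2)
r = inflection_radius(score, centre, np.array([1.0, 0.0]))
```

For the Gaussian above, the inflection along any ray sits one standard deviation from the mode, so `r` lands near 1.0, up to the resolution of the 24-step scan.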
## Example

```python
from sheshe import ModalBoundaryClustering

mbc = ModalBoundaryClustering()
mbc.fit(X, y)
labels = mbc.predict(X)
```
## Usage examples

```python
from sheshe import ModalBoundaryClustering

mbc = ModalBoundaryClustering(random_state=0)
mbc.fit(X, y)              # fit
mbc.fit_predict(X, y)      # fit_predict
mbc.fit_transform(X, y)    # fit_transform
mbc.transform(X)           # transform
mbc.predict(X)             # predict
mbc.predict_proba(X)       # predict_proba
mbc.decision_function(X)   # decision_function
mbc.predict_regions(X)     # predict_regions
mbc.score(X, y)            # score
mbc.save("mbc.joblib")     # save
mbc = ModalBoundaryClustering.load("mbc.joblib")  # load
```
## Additional examples

```python
from sklearn.datasets import load_iris

from sheshe import ModalBoundaryClustering

X, y = load_iris(return_X_y=True)
labels = ModalBoundaryClustering().fit_predict(X, y)
```

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

from sheshe import ModalBoundaryClustering

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = ModalBoundaryClustering(task="regression").fit(X_train, y_train)
reg_retrained = ModalBoundaryClustering(
    base_estimator=RandomForestRegressor(random_state=0),
    task="regression",
).fit(X_train, y_train)
```
## Parameters

- `base_estimator` (`BaseEstimator`, default `None`): model used to compute probabilities or predictions. Defaults to `LogisticRegression` when `None`.
- `task` (`str`, default `"classification"`): `"classification"` or `"regression"`.
- `base_2d_rays` (`int`, default `32`): number of radial directions in 2D; automatically reduced for high-dimensional data when `auto_rays_by_dim` is `True`.
- `direction` (`{"center_out", "outside_in"}`, default `"center_out"`): direction used to locate inflection points along each ray.
- `stop_criteria` (`{"inflexion", "percentile"}`, default `"inflexion"`): rule to stop radial expansion.
- `percentile_bins` (`int`, default `20`): bins for percentile-based stopping.
- `scan_radius_factor` (`float`, default `3.0`): maximum scan radius as a multiple of the global standard deviation.
- `scan_steps` (`int`, default `24`): number of steps sampled along each ray.
- `smooth_window` (`int` or `None`, default `None`): moving-average window to smooth radial scans.
- `drop_fraction` (`float`, default `0.5`): fallback drop from the peak when no inflection is found.
- `bounds_margin` (`float`, default `0.05`): margin added to data bounds to avoid clipping during scans.
- `grad_lr` (`float`, default `0.2`): learning rate for gradient ascent.
- `grad_max_iter` (`int`, default `80`): maximum iterations for gradient ascent.
- `grad_tol` (`float`, default `1e-5`): tolerance on gradient norm to stop the ascent.
- `grad_eps` (`float`, default `1e-3`): finite-difference step for gradients.
- `optim_method` (`str`, default `"gradient_ascent"`): optimisation strategy; accepts `"gradient_ascent"` or `"trust_region_newton"`.
- `n_max_seeds` (`int`, default `2`): number of random starting points.
- `random_state` (`int`, default `42`): seed for reproducibility.
- `percentile_sample_size` (`int`, default `50000`): sample size to compute percentile thresholds.
- `max_subspaces` (`int`, default `20`): maximum subspaces explored when `X` has more than three dimensions.
- `verbose` (`int`, default `0`): logging level; `0` silent, `1` summary, `2` detailed.
- `save_labels` (`bool`, default `False`): store label assignments to disk.
- `prediction_within_region` (`bool`, default `False`): evaluate the base estimator only within each region before predicting.
- `out_dir` (`str` or `Path`, optional): directory where auxiliary files are written.
- `auto_rays_by_dim` (`bool`, default `True`): automatically reduce the number of rays in high dimension.
- `ray_mode` (`str`, default `"grad"`): strategy used to generate candidate rays.
- `use_spsa` (`bool`, default `True`): use SPSA for gradient estimates when analytical gradients are unavailable.
- `spsa_delta` (`float`, default `1e-2`): SPSA perturbation size.
- `spsa_avg` (`int`, default `4`): number of SPSA evaluations per gradient estimate.
- `ls_alpha0` (`float`, default `0.5`): initial step size for line search.
- `ls_shrink` (`float`, default `0.5`): multiplicative shrink factor during line search.
- `ls_min_alpha` (`float`, default `1e-3`): minimum allowed step size.
- `arc_max_steps` (`int`, default `64`): maximum steps for arc exploration.
- `arc_len_max` (`float`, default `3.0`): maximum arc length.
- `line_refine_steps` (`int`, default `8`): refinement steps when mapping boundaries.
- `use_adaptive_scan` (`bool` or `None`, default `None`): enable adaptive radial scan when dimensionality is high.
- `batch_size` (`int`, default `16384`): batch size for model evaluations.
- `coarse_steps` (`int`, default `12`): number of coarse scan steps in high dimension.
- `refine_steps` (`int`, default `4`): number of refinement steps after the coarse scan.
- `early_exit_patience` (`int`, default `1`): early-termination patience for flat regions.
- `density_alpha` (`float`, default `0.0`): exponent for the density penalty.
- `density_k` (`int`, default `15`): neighbour count for density estimation.
- `cluster_metrics_cls` (`dict` or `None`): callbacks to evaluate clusters in classification mode.
- `cluster_metrics_reg` (`dict` or `None`): callbacks for regression mode.
- `fast_membership` (`bool`, default `False`): enable approximate membership computation.
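As a toy illustration of the scan-stopping parameters (`scan_steps`, `scan_radius_factor`, `drop_fraction`), the sketch below applies the fallback drop rule to a score profile sampled along one ray. It is pure NumPy, not the library's implementation, and the helper name is invented for the example.

```python
import numpy as np

def drop_radius(vals, rs, drop_fraction=0.5):
    # Fallback rule: stop at the first sampled radius where the score
    # falls to drop_fraction of its peak (the value at the mode, rs[0]).
    below = np.where(vals <= drop_fraction * vals[0])[0]
    return rs[below[0]] if below.size else rs[-1]

# 24 scan steps out to radius 3, matching the defaults
# (scan_steps=24, scan_radius_factor=3.0) on unit-variance data.
rs = np.linspace(0.0, 3.0, 24)
vals = np.exp(-0.5 * rs ** 2)  # toy Gaussian score profile along the ray
r_stop = drop_radius(vals, rs)
```

With this profile the score halves at `r = sqrt(2 ln 2) ≈ 1.18`, so the scan stops at the first sampled radius past that point, about 1.30 with 24 steps.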
## Methods

- `fit(X, y)` – learn regions from data and labels.
- `predict(X)` – assign cluster ids to samples.
- `fit_predict(X, y=None)` – convenience wrapper around `fit` + `predict`.
- `predict_proba(X)` – return per-cluster probabilities.
- `decision_function(X)` – base-estimator decision scores.
- `predict_regions(X, label_path=None)` – cluster ids and optional label dump.
- `interpretability_summary(feature_names=None)` – tabular region summary.
- `plot_pairs(X, y=None, max_pairs=None, show_histograms=False)` – 2D decision plots for feature pairs with optional marginal histograms.
- `plot_pair_3d(X, pair, class_label=None, grid_res=50, alpha_surface=0.6, engine="matplotlib")` – render a 3D surface for a feature pair.