ModalBoundaryClustering


import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sheshe import ModalBoundaryClustering

X, y = load_iris(return_X_y=True)
sh = ModalBoundaryClustering().fit(X, y)
sh.plot_classes(X, y)
plt.show()

Learns regions of high probability (classification) or high predicted value (regression) by climbing to the local maxima of a base estimator.

Radial scans trace boundary surfaces around each mode, and gradient ascent refines the centers, so the same workflow supports both classification and regression.

Mathematical formulation

Centers are refined with gradient-ascent updates x_{k+1} = x_k + α∇f(x_k), where α is the step size (grad_lr), until ‖∇f(x_k)‖ < grad_tol or grad_max_iter iterations have been run.
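A minimal NumPy sketch of this update follows. It assumes f is a black-box scalar score (for instance one class's probability from the base estimator) whose gradient is estimated by central finite differences with step grad_eps. The helper names finite_diff_grad and gradient_ascent are hypothetical and not part of the sheshe API; the defaults only mirror grad_lr, grad_max_iter, grad_tol and grad_eps from the parameter list.

```python
import numpy as np


def finite_diff_grad(f, x, eps=1e-3):
    """Central-difference estimate of the gradient of a scalar function f at x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return grad


def gradient_ascent(f, x0, lr=0.2, max_iter=80, tol=1e-5, eps=1e-3):
    """Iterate x_{k+1} = x_k + lr * grad f(x_k) until ||grad f(x_k)|| < tol."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        g = finite_diff_grad(f, x, eps)
        if np.linalg.norm(g) < tol:  # stationary enough: treat x as the mode
            break
        x = x + lr * g
    return x
```

For example, on the concave stand-in score f(x) = -‖x - c‖² with c = (1, -2), starting from the origin, each step shrinks the distance to c by a factor of 0.6, so the loop stops well before 80 iterations.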

From each center, radial scans sample the scores f(x_k + r u) along unit directions u. A scan stops when ∂f/∂r = 0, which signals a stationary point (an inflection point additionally requires d²f/dr² = 0 with a change of sign), or when the score falls below a percentile threshold or a fixed fraction of the peak.
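The scan-and-stop logic for a single ray can be illustrated with the sketch below. It is a simplification under stated assumptions, not the sheshe implementation: it samples scan_steps + 1 evenly spaced radii up to a fixed maximum (the library instead scales the maximum by scan_radius_factor), applies only the inflection test through discrete second differences, and falls back to drop_fraction of the score at the center, as the parameter list describes. The stationary-point check, smoothing (smooth_window) and the percentile rule are omitted, and scan_ray is a hypothetical name.

```python
import numpy as np


def scan_ray(f, center, u, r_max=3.0, steps=24, drop_fraction=0.5, tol=1e-12):
    """Return the stopping radius of a radial scan of f(center + r * u).

    Hypothetical helper: stops at the first sign change of the discrete second
    derivative (an inflection point); if there is none, falls back to the first
    radius where the score has dropped to drop_fraction times the center score,
    and otherwise returns r_max.
    """
    center = np.asarray(center, dtype=float)
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)                   # unit direction
    radii = np.linspace(0.0, r_max, steps + 1)  # r = 0 is the center itself
    scores = np.array([f(center + r * u) for r in radii])

    # d2[j] approximates d²f/dr² at radii[j + 1]; zero out round-off noise
    d2 = np.diff(scores, n=2)
    d2[np.abs(d2) < tol] = 0.0
    flips = np.nonzero(d2[:-1] * d2[1:] < 0)[0]
    if flips.size > 0:
        j = flips[0]
        # The inflection lies between radii[j + 1] and radii[j + 2]; use the midpoint
        return 0.5 * (radii[j + 1] + radii[j + 2])

    # Fallback: first radius where the score is at most drop_fraction * peak
    peak = scores[0]
    below = np.nonzero(scores <= drop_fraction * peak)[0]
    return radii[below[0]] if below.size > 0 else r_max
```

On a standard Gaussian bump f(x) = exp(-‖x‖²/2), whose radial profile has its inflection at r = 1, the sketch returns a radius within one grid step of 1; on a linear profile with no inflection it uses the drop-from-peak fallback.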

These radii approximate a probability surface, from which the boundary polygons are built.
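In two dimensions, one way to turn per-direction radii into a boundary polygon is to place a vertex at c + r_j (cos θ_j, sin θ_j) for evenly spaced angles θ_j. The resulting polygon is star-shaped with respect to the center c, so testing whether a point lies inside reduces to a single edge check. This sketch assumes evenly spaced directions (as with base_2d_rays rays in 2D); the actual ray generation (ray_mode) may differ, and both function names are hypothetical.

```python
import numpy as np


def boundary_polygon(center, radii):
    """Vertices c + r_j (cos θ_j, sin θ_j) for n evenly spaced angles θ_j = 2πj/n."""
    center = np.asarray(center, dtype=float)
    radii = np.asarray(radii, dtype=float)
    angles = 2.0 * np.pi * np.arange(radii.size) / radii.size
    directions = np.column_stack([np.cos(angles), np.sin(angles)])
    return center + radii[:, None] * directions  # shape (n, 2), counter-clockwise


def star_contains(center, radii, point):
    """True if point lies inside the star-shaped polygon built from center and radii."""
    center = np.asarray(center, dtype=float)
    point = np.asarray(point, dtype=float)
    vertices = boundary_polygon(center, radii)
    n = len(vertices)
    d = point - center
    # Index of the angular sector [θ_j, θ_{j+1}) that contains the point
    angle = np.arctan2(d[1], d[0]) % (2.0 * np.pi)
    j = int(angle // (2.0 * np.pi / n)) % n
    a, b = vertices[j], vertices[(j + 1) % n]
    # Vertices run counter-clockwise around the center, so interior points lie on
    # the left of edge a -> b (non-negative 2D cross product)
    edge, rel = b - a, point - a
    return bool(edge[0] * rel[1] - edge[1] * rel[0] >= 0.0)
```

Only one edge is checked because, for a star-shaped polygon, the segment from the center to any point in a sector crosses the boundary at most once, on that sector's edge.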

Example


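from sklearn.datasets import load_iris
from sheshe import ModalBoundaryClustering

X, y = load_iris(return_X_y=True)
mbc = ModalBoundaryClustering()
mbc.fit(X, y)
labels = mbc.predict(X)  # cluster id for each sample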

Usage examples


from sheshe import ModalBoundaryClustering

mbc = ModalBoundaryClustering(random_state=0)
mbc.fit(X, y)                      # fit


from sheshe import ModalBoundaryClustering

mbc = ModalBoundaryClustering(random_state=0)
mbc.fit_predict(X, y)              # fit_predict


from sheshe import ModalBoundaryClustering

mbc = ModalBoundaryClustering(random_state=0)
mbc.fit_transform(X, y)            # fit_transform


from sheshe import ModalBoundaryClustering

mbc = ModalBoundaryClustering(random_state=0).fit(X, y)
mbc.transform(X)                   # transform


from sheshe import ModalBoundaryClustering

mbc = ModalBoundaryClustering(random_state=0).fit(X, y)
mbc.predict(X)                     # predict


from sheshe import ModalBoundaryClustering

mbc = ModalBoundaryClustering(random_state=0).fit(X, y)
mbc.predict_proba(X)               # predict_proba


from sheshe import ModalBoundaryClustering

mbc = ModalBoundaryClustering(random_state=0).fit(X, y)
mbc.decision_function(X)           # decision_function


from sheshe import ModalBoundaryClustering

mbc = ModalBoundaryClustering(random_state=0).fit(X, y)
mbc.predict_regions(X)             # predict_regions


from sheshe import ModalBoundaryClustering

mbc = ModalBoundaryClustering(random_state=0).fit(X, y)
mbc.score(X, y)                    # score


from sheshe import ModalBoundaryClustering

mbc = ModalBoundaryClustering(random_state=0).fit(X, y)
mbc.save("mbc.joblib")             # save


from sheshe import ModalBoundaryClustering

mbc = ModalBoundaryClustering.load("mbc.joblib")

Additional examples


from sklearn.datasets import load_iris
from sheshe import ModalBoundaryClustering

X, y = load_iris(return_X_y=True)
labels = ModalBoundaryClustering().fit_predict(X, y)

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Regression with the default base estimator
reg = ModalBoundaryClustering(task="regression").fit(X_train, y_train)
# Regression with a custom base estimator
reg_rf = ModalBoundaryClustering(
    base_estimator=RandomForestRegressor(random_state=0),
    task="regression",
).fit(X_train, y_train)

Parameters

base_estimator (BaseEstimator, default None): model used to compute probabilities or predictions. Defaults to LogisticRegression when None.
task (str, default "classification"): "classification" or "regression".
base_2d_rays (int, default 32): number of radial directions in 2D; automatically reduced for high-dimensional data when auto_rays_by_dim is True.
direction ({"center_out", "outside_in"}, default "center_out"): direction used to locate inflection points along each ray.
stop_criteria ({"inflexion", "percentile"}, default "inflexion"): rule to stop radial expansion.
percentile_bins (int, default 20): bins for percentile-based stopping.
scan_radius_factor (float, default 3.0): maximum scan radius as a multiple of the global standard deviation.
scan_steps (int, default 24): number of steps sampled along each ray.
smooth_window (int or None, default None): moving-average window to smooth radial scans.
drop_fraction (float, default 0.5): fallback drop from the peak when no inflection is found.
bounds_margin (float, default 0.05): margin added to data bounds to avoid clipping during scans.
grad_lr (float, default 0.2): learning rate for gradient ascent.
grad_max_iter (int, default 80): maximum iterations for gradient ascent.
grad_tol (float, default 1e-5): tolerance on gradient norm to stop the ascent.
grad_eps (float, default 1e-3): finite-difference step for gradients.
optim_method (str, default "gradient_ascent"): optimisation strategy; accepts "gradient_ascent" or "trust_region_newton".
n_max_seeds (int, default 2): number of random starting points.
random_state (int, default 42): seed for reproducibility.
percentile_sample_size (int, default 50000): sample size to compute percentile thresholds.
max_subspaces (int, default 20): maximum subspaces explored when X has more than three dimensions.
verbose (int, default 0): logging level; 0 silent, 1 summary, 2 detailed.
save_labels (bool, default False): store label assignments to disk.
prediction_within_region (bool, default False): evaluate the base estimator only within each region before predicting.
out_dir (str or Path, optional): directory where auxiliary files are written.
auto_rays_by_dim (bool, default True): automatically reduce the number of rays in high dimension.
ray_mode (str, default "grad"): strategy used to generate candidate rays.
use_spsa (bool, default True): use SPSA for gradient estimates when analytical gradients are unavailable.
spsa_delta (float, default 1e-2): SPSA perturbation size.
spsa_avg (int, default 4): number of SPSA evaluations per gradient estimate.
ls_alpha0 (float, default 0.5): initial step size for line search.
ls_shrink (float, default 0.5): multiplicative shrink factor during line search.
ls_min_alpha (float, default 1e-3): minimum allowed step size.
arc_max_steps (int, default 64): maximum steps for arc exploration.
arc_len_max (float, default 3.0): maximum arc length.
line_refine_steps (int, default 8): refinement steps when mapping boundaries.
use_adaptive_scan (bool or None, default None): enable adaptive radial scan when dimensionality is high.
batch_size (int, default 16384): batch size for model evaluations.
coarse_steps (int, default 12): number of coarse scan steps in high dimension.
refine_steps (int, default 4): number of refinement steps after the coarse scan.
early_exit_patience (int, default 1): early termination patience for flat regions.
density_alpha (float, default 0.0): exponent for density penalty.
density_k (int, default 15): neighbour count for density estimation.
cluster_metrics_cls (dict or None): callbacks to evaluate clusters in classification mode.
cluster_metrics_reg (dict or None): callbacks to evaluate clusters in regression mode.
fast_membership (bool, default False): enable approximate membership computation.
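Two of the numerical tools named in this list can be sketched as follows: an SPSA gradient estimate (spsa_delta, spsa_avg), which needs only two score evaluations per draw regardless of dimension, and a backtracking line search for an ascent step (ls_alpha0, ls_shrink, ls_min_alpha). These are generic versions of both techniques with hypothetical function names; the backtracking uses a simple-increase test rather than an Armijo sufficient-increase condition, and how sheshe combines them internally is not specified here.

```python
import numpy as np


def spsa_gradient(f, x, delta=1e-2, n_avg=4, rng=None):
    """Average of n_avg SPSA estimates of grad f(x) using Rademacher perturbations."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for _ in range(n_avg):
        pert = rng.choice([-1.0, 1.0], size=x.shape)
        diff = f(x + delta * pert) - f(x - delta * pert)
        # For ±1 entries, 1 / pert_i equals pert_i
        g += diff / (2.0 * delta) * pert
    return g / n_avg


def backtracking_ascent_step(f, x, g, alpha0=0.5, shrink=0.5, min_alpha=1e-3):
    """Shrink the step along g until f increases; give up once below min_alpha."""
    x = np.asarray(x, dtype=float)
    fx = f(x)
    alpha = alpha0
    while alpha >= min_alpha:
        candidate = x + alpha * g
        if f(candidate) > fx:
            return candidate, alpha
        alpha *= shrink
    return x, 0.0  # no improving step found
```

A single update could chain them as x_new, alpha = backtracking_ascent_step(f, x, spsa_gradient(f, x, rng=0)). Rademacher (±1) perturbations are a common SPSA choice because each component is its own reciprocal.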

Methods

fit(X, y) – learn regions from data and labels.
predict(X) – assign cluster ids to samples.
fit_predict(X, y=None) – convenience wrapper around fit + predict.
predict_proba(X) – return per-cluster probabilities.
decision_function(X) – base-estimator decision scores.
predict_regions(X, label_path=None) – cluster ids and optional label dump.
interpretability_summary(feature_names=None) – tabular region summary.
plot_pairs(X, y=None, max_pairs=None, show_histograms=False) – 2D decision plots for feature pairs with optional marginal histograms.
plot_pair_3d(X, pair, class_label=None, grid_res=50, alpha_surface=0.6, engine="matplotlib") – render a 3D surface for a feature pair.