ModalBoundaryClustering
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sheshe import ModalBoundaryClustering
X, y = load_iris(return_X_y=True)
sh = ModalBoundaryClustering().fit(X, y)
sh.plot_classes(X, y)
plt.show()
Learns regions of high probability or high predicted value by climbing the local maxima of a base estimator.
Radial scans trace boundary surfaces around each mode, and
gradient ascent refines the centers, which supports both classification and regression workflows.
Mathematical formulation
Centers are refined with gradient ascent updates x_{k+1} = x_k + α∇f(x_k), where α is the learning rate (grad_lr), until ‖∇f(x_k)‖ < grad_tol.
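As a rough illustration of this update rule, the sketch below refines a starting point on a toy score function using central finite differences. The helper names (finite_diff_grad, gradient_ascent, score) are hypothetical and not part of sheshe; the default values only mirror the documented grad_lr, grad_max_iter, grad_tol, and grad_eps, and the library's own implementation may differ.

```python
import numpy as np


def finite_diff_grad(f, x, eps=1e-3):
    """Central-difference estimate of the gradient of f at x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return grad


def gradient_ascent(f, x0, lr=0.2, max_iter=80, tol=1e-5, eps=1e-3):
    """Iterate x_{k+1} = x_k + lr * grad f(x_k) until the gradient norm drops below tol."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = finite_diff_grad(f, x, eps)
        if np.linalg.norm(g) < tol:
            break
        x = x + lr * g
    return x


# Toy score with a single mode at (1, -2); it stands in for a model's probability output.
def score(x):
    return np.exp(-0.5 * np.sum((x - np.array([1.0, -2.0])) ** 2))


center = gradient_ascent(score, x0=[0.5, -1.5])
print(center)
```

On this smooth, single-mode score the iterate settles at the mode well within the iteration budget; a flatter or multi-modal score would likely need a different step size or several starting points.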
From each center, radial scans sample scores f(x_k + r u) along directions u. A scan stops when ∂f/∂r = 0, which signals a stationary point (an inflection point additionally requires d²f/dr² = 0 with a sign change), or when the score drops past a percentile or a fraction of the peak.
These radii approximate a probability surface from which boundary polygons are built.
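The radial scan can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not sheshe's implementation: the function radial_scan and the toy score are hypothetical, the stop rule shown is only the inflection test (a sign change of the discrete second derivative) with a drop-fraction fallback, and the percentile rule is left out. The defaults echo the documented scan_steps and drop_fraction, and r_max stands in for the maximum scan radius.

```python
import numpy as np


def radial_scan(f, center, direction, r_max=3.0, steps=24, drop_fraction=0.5):
    """Return the radius along `direction` at which the scan stops.

    Stops at the first sign change of the discrete second derivative of the
    radial profile (an inflection), otherwise where the score falls below
    drop_fraction times the score at the center, otherwise at r_max.
    """
    center = np.asarray(center, dtype=float)
    u = np.asarray(direction, dtype=float)
    u = u / np.linalg.norm(u)
    radii = np.linspace(0.0, r_max, steps + 1)
    profile = np.array([f(center + r * u) for r in radii])

    # second[k] approximates the second derivative at radii[k + 1].
    second = profile[2:] - 2.0 * profile[1:-1] + profile[:-2]
    flips = np.where(np.sign(second[:-1]) * np.sign(second[1:]) < 0)[0]
    if flips.size:
        i = flips[0]
        # The inflection lies between radii[i + 1] and radii[i + 2].
        return 0.5 * (radii[i + 1] + radii[i + 2])

    below = np.where(profile < drop_fraction * profile[0])[0]
    return radii[below[0]] if below.size else r_max


# Toy isotropic score centered at (1, -2); its radial inflection sits at r = 1.
center = np.array([1.0, -2.0])


def score(x):
    return np.exp(-0.5 * np.sum((x - center) ** 2))


# Eight evenly spaced 2D directions; each stopping radius gives one polygon vertex.
angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
directions = np.column_stack([np.cos(angles), np.sin(angles)])
boundary_radii = np.array([radial_scan(score, center, u) for u in directions])
polygon = center + boundary_radii[:, None] * directions
```

Each stopping radius is resolved only to within one scan step, so a finer grid or a refinement pass would be needed for a tighter boundary.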
Example
from sklearn.datasets import load_iris
from sheshe import ModalBoundaryClustering
X, y = load_iris(return_X_y=True)
mbc = ModalBoundaryClustering()
mbc.fit(X, y)
labels = mbc.predict(X)
Usage examples
from sheshe import ModalBoundaryClustering
mbc = ModalBoundaryClustering(random_state=0)
mbc.fit(X, y) # fit
from sheshe import ModalBoundaryClustering
mbc = ModalBoundaryClustering(random_state=0)
mbc.fit_predict(X, y) # fit_predict
from sheshe import ModalBoundaryClustering
mbc = ModalBoundaryClustering(random_state=0)
mbc.fit_transform(X, y) # fit_transform
from sheshe import ModalBoundaryClustering
mbc = ModalBoundaryClustering(random_state=0).fit(X, y)
mbc.transform(X) # transform
from sheshe import ModalBoundaryClustering
mbc = ModalBoundaryClustering(random_state=0).fit(X, y)
mbc.predict(X) # predict
from sheshe import ModalBoundaryClustering
mbc = ModalBoundaryClustering(random_state=0).fit(X, y)
mbc.predict_proba(X) # predict_proba
from sheshe import ModalBoundaryClustering
mbc = ModalBoundaryClustering(random_state=0).fit(X, y)
mbc.decision_function(X) # decision_function
from sheshe import ModalBoundaryClustering
mbc = ModalBoundaryClustering(random_state=0).fit(X, y)
mbc.predict_regions(X) # predict_regions
from sheshe import ModalBoundaryClustering
mbc = ModalBoundaryClustering(random_state=0).fit(X, y)
mbc.score(X, y) # score
from sheshe import ModalBoundaryClustering
mbc = ModalBoundaryClustering(random_state=0).fit(X, y)
mbc.save("mbc.joblib") # save
from sheshe import ModalBoundaryClustering
mbc = ModalBoundaryClustering.load("mbc.joblib")
Additional examples
from sklearn.datasets import load_iris
from sheshe import ModalBoundaryClustering
X, y = load_iris(return_X_y=True)
labels = ModalBoundaryClustering().fit_predict(X, y)
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sheshe import ModalBoundaryClustering
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Regression with base_estimator left at its default
reg = ModalBoundaryClustering(task="regression").fit(X_train, y_train)
# Regression with an explicit random-forest base estimator
reg_retrained = ModalBoundaryClustering(
    base_estimator=RandomForestRegressor(random_state=0),
    task="regression",
).fit(X_train, y_train)
Parameters
- base_estimator (BaseEstimator, default None): model used to compute probabilities or predictions. Defaults to LogisticRegression when None.
- task (str, default "classification"): "classification" or "regression".
- base_2d_rays (int, default 32): number of radial directions in 2D; automatically reduced for high-dimensional data when auto_rays_by_dim is True.
- direction ({"center_out", "outside_in"}, default "center_out"): direction used to locate inflection points along each ray.
- stop_criteria ({"inflexion", "percentile"}, default "inflexion"): rule to stop radial expansion.
- percentile_bins (int, default 20): bins for percentile-based stopping.
- scan_radius_factor (float, default 3.0): maximum scan radius as a multiple of the global standard deviation.
- scan_steps (int, default 24): number of steps sampled along each ray.
- smooth_window (int or None, default None): moving-average window to smooth radial scans.
- drop_fraction (float, default 0.5): fallback drop from the peak when no inflection is found.
- bounds_margin (float, default 0.05): margin added to data bounds to avoid clipping during scans.
- grad_lr (float, default 0.2): learning rate for gradient ascent.
- grad_max_iter (int, default 80): maximum iterations for gradient ascent.
- grad_tol (float, default 1e-5): tolerance on gradient norm to stop the ascent.
- grad_eps (float, default 1e-3): finite-difference step for gradients.
- optim_method (str, default "gradient_ascent"): optimisation strategy; accepts "gradient_ascent" or "trust_region_newton".
- n_max_seeds (int, default 2): number of random starting points.
- random_state (int, default 42): seed for reproducibility.
- percentile_sample_size (int, default 50000): sample size to compute percentile thresholds.
- max_subspaces (int, default 20): maximum subspaces explored when X has more than three dimensions.
- verbose (int, default 0): logging level; 0 silent, 1 summary, 2 detailed.
- save_labels (bool, default False): store label assignments to disk.
- prediction_within_region (bool, default False): evaluate the base estimator only within each region before predicting.
- out_dir (str or Path, optional): directory where auxiliary files are written.
- auto_rays_by_dim (bool, default True): automatically reduce the number of rays in high dimension.
- ray_mode (str, default "grad"): strategy used to generate candidate rays.
- use_spsa (bool, default True): use SPSA for gradient estimates when analytical gradients are unavailable.
- spsa_delta (float, default 1e-2): SPSA perturbation size.
- spsa_avg (int, default 4): number of SPSA evaluations per gradient estimate.
- ls_alpha0 (float, default 0.5): initial step size for line search.
- ls_shrink (float, default 0.5): multiplicative shrink factor during line search.
- ls_min_alpha (float, default 1e-3): minimum allowed step size.
- arc_max_steps (int, default 64): maximum steps for arc exploration.
- arc_len_max (float, default 3.0): maximum arc length.
- line_refine_steps (int, default 8): refinement steps when mapping boundaries.
- use_adaptive_scan (bool or None, default None): enable adaptive radial scan when dimensionality is high.
- batch_size (int, default 16384): batch size for model evaluations.
- coarse_steps (int, default 12): number of coarse scan steps in high dimension.
- refine_steps (int, default 4): number of refinement steps after the coarse scan.
- early_exit_patience (int, default 1): early termination patience for flat regions.
- density_alpha (float, default 0.0): exponent for density penalty.
- density_k (int, default 15): neighbour count for density estimation.
- cluster_metrics_cls (dict or None): callbacks to evaluate clusters in classification mode.
- cluster_metrics_reg (dict or None): callbacks for regression mode.
- fast_membership (bool, default False): enable approximate membership computation.
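For readers unfamiliar with SPSA (simultaneous perturbation stochastic approximation), the sketch below shows the general idea behind the use_spsa, spsa_delta, and spsa_avg options: the gradient is estimated from two function evaluations per random sign vector and averaged over several draws. The function spsa_gradient is a hypothetical helper written for this illustration, not sheshe's API; only its defaults are taken from the parameter list above.

```python
import numpy as np


def spsa_gradient(f, x, delta=1e-2, n_avg=4, rng=None):
    """Average n_avg SPSA estimates of the gradient of f at x."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    estimate = np.zeros_like(x)
    for _ in range(n_avg):
        # Rademacher perturbation: every component is -1 or +1.
        signs = rng.choice([-1.0, 1.0], size=x.shape)
        diff = f(x + delta * signs) - f(x - delta * signs)
        # For +/-1 entries, dividing by the perturbation equals multiplying by it.
        estimate += diff / (2.0 * delta) * signs
    return estimate / n_avg


# One-dimensional quadratic with its maximum at x = 1; the true gradient at 0 is 2.
def score(x):
    return -np.sum((x - 1.0) ** 2)


g = spsa_gradient(score, np.array([0.0]), rng=0)
print(g)
```

Each draw costs two evaluations regardless of dimension, which is the appeal over coordinate-wise finite differences. In more than one dimension a single draw is noisy, and averaging more draws reduces that noise at the price of extra evaluations.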
Methods
- fit(X, y): learn regions from data and labels.
- predict(X): assign cluster ids to samples.
- fit_predict(X, y=None): convenience wrapper around fit + predict.
- predict_proba(X): return per-cluster probabilities.
- decision_function(X): base-estimator decision scores.
- predict_regions(X, label_path=None): cluster ids and optional label dump.
- interpretability_summary(feature_names=None): tabular region summary.
- plot_pairs(X, y=None, max_pairs=None, show_histograms=False): 2D decision plots for feature pairs with optional marginal histograms.
- plot_pair_3d(X, pair, class_label=None, grid_res=50, alpha_surface=0.6, engine="matplotlib"): render a 3D surface for a feature pair.