ModalScoutEnsemble


import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sheshe import ModalScoutEnsemble

X, y = load_iris(return_X_y=True)
mse = ModalScoutEnsemble().fit(X, y)
mse.plot_classes(X, y)
plt.show()

Ensemble that applies ModalBoundaryClustering on the most promising

subspaces discovered by SubspaceScout. Each submodel is weighted by scout

score, cross-validation and feature importance, and the ensemble can delegate

optimisation to ShuShu when ensemble_method="shushu".

Example


from sheshe import ModalScoutEnsemble
from sklearn.linear_model import LogisticRegression
mse = ModalScoutEnsemble(base_estimator=LogisticRegression())
mse.fit(X, y)
labels = mse.predict(X)

Usage examples


from sheshe import ModalScoutEnsemble
from sklearn.linear_model import LogisticRegression

mse = ModalScoutEnsemble(base_estimator=LogisticRegression(), random_state=0)
mse.fit(X, y)                      # fit

from sheshe import ModalScoutEnsemble
from sklearn.linear_model import LogisticRegression

mse = ModalScoutEnsemble(base_estimator=LogisticRegression(), random_state=0)
mse.fit_predict(X, y)              # fit_predict

from sheshe import ModalScoutEnsemble
from sklearn.linear_model import LogisticRegression

mse = ModalScoutEnsemble(base_estimator=LogisticRegression(), random_state=0)
mse.fit_transform(X, y)            # fit_transform

from sheshe import ModalScoutEnsemble
from sklearn.linear_model import LogisticRegression

mse = ModalScoutEnsemble(base_estimator=LogisticRegression(), random_state=0).fit(X, y)
mse.transform(X)                   # transform

from sheshe import ModalScoutEnsemble
from sklearn.linear_model import LogisticRegression

mse = ModalScoutEnsemble(base_estimator=LogisticRegression(), random_state=0).fit(X, y)
mse.predict(X)                     # predict

from sheshe import ModalScoutEnsemble
from sklearn.linear_model import LogisticRegression

mse = ModalScoutEnsemble(base_estimator=LogisticRegression(), random_state=0).fit(X, y)
mse.predict_proba(X)               # predict_proba

from sheshe import ModalScoutEnsemble
from sklearn.linear_model import LogisticRegression

mse = ModalScoutEnsemble(base_estimator=LogisticRegression(), random_state=0).fit(X, y)
mse.decision_function(X)           # decision_function

from sheshe import ModalScoutEnsemble
from sklearn.linear_model import LogisticRegression

mse = ModalScoutEnsemble(base_estimator=LogisticRegression(), random_state=0).fit(X, y)
mse.predict_regions(X)             # predict_regions

from sheshe import ModalScoutEnsemble
from sklearn.linear_model import LogisticRegression

mse = ModalScoutEnsemble(base_estimator=LogisticRegression(), random_state=0).fit(X, y)
mse.score(X, y)                    # score

from sheshe import ModalScoutEnsemble
from sklearn.linear_model import LogisticRegression

mse = ModalScoutEnsemble(base_estimator=LogisticRegression(), random_state=0).fit(X, y)
mse.save("mse.joblib")             # save

from sheshe import ModalScoutEnsemble
from sklearn.linear_model import LogisticRegression

mse = ModalScoutEnsemble.load("mse.joblib")

Additional examples


from sheshe import ModalScoutEnsemble
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target

mse = ModalScoutEnsemble(
    base_estimator=LogisticRegression(max_iter=200),
    task="classification",
    random_state=0,
    scout_kwargs={"max_order": 2, "top_m": 4, "sample_size": None},
    cv=2,
    # ensemble_method="shushu" would use the ShuShu optimizer
)
mse.fit(X, y)
print(mse.predict(X[:5]))
print(mse.predict_proba(X[:5]))

Parameters

  • base_estimator (BaseEstimator): model used to compute probabilities or predictions in each subspace.
  • task (str, optional): "classification" or "regression". Inferred from the base estimator if None.
  • ensemble_method (str, default "modal_scout"): either "modal_scout" to use the internal subspace ensemble or "shushu" to delegate to ShuShu.
  • top_k (int, default 8): maximum number of subspaces kept.
  • min_score (float or None): minimum score required for a subspace to be used.
  • max_order (int or None): maximum order of subspaces evaluated.
  • metric (str or None, default "mi_synergy"): criterion used to rank subspaces.
  • jaccard_threshold (float, default 0.55): minimum Jaccard similarity to consider two subspaces redundant.
  • alpha (float, default 0.5): exponent for the scout score in the final weighting.
  • beta (float, default 0.5): exponent for cross‑validation performance.
  • gamma (float, default 0.5): exponent for global feature importance.
  • cv (int or None, default 3): number of CV folds; 0 or None uses a holdout split.
  • cv_metric_cls (Callable, default balanced_accuracy_score): metric for classification CV.
  • cv_metric_reg (Callable, default r2_score): metric for regression CV.
  • cv_floor (float or None): discard subspaces with CV below this value.
  • n_jobs (int, default 1): number of parallel jobs for CV.
  • random_state (int or None, default 0): RNG seed.
  • base_2d_rays (int, default 8): base number of rays for MBC fits in each subspace.
  • ray_cap (int, default 48): maximum rays allowed per subspace.
  • time_budget_s (float or None): optional global time budget for fitting.
  • use_importances (bool, default True): include global feature importances in the weighting.
  • importance_sample_size (int or None, default 4096): sample size for computing global importances.
  • scout_kwargs (dict or None): parameters forwarded to SubspaceScout.
  • shushu_kwargs (dict or None): parameters forwarded to ShuShu when ensemble_method="shushu".
  • mbc_kwargs (dict or None): additional arguments passed to each ModalBoundaryClustering instance.
  • verbose (int, default 0): logging level.
  • prediction_within_region (bool, default False): evaluate base estimator only within each region during prediction.

Methods

  • fit(X, y) – train the ensemble on X and y.
  • predict(X) – predict labels or cluster ids.
  • predict_proba(X) – class probabilities aggregated across subspaces.
  • decision_function(X) – decision scores averaged across submodels.
  • predict_regions(X) – DataFrame with region assignments.
  • plot_pairs(X, y=None, show_histograms=False, **kwargs) – delegate to ModalBoundaryClustering.plot_pairs for a selected submodel, including optional marginal histograms.
  • plot_pair_3d(X, pair, **kwargs) – 3D surface for a feature pair of a submodel.