CheChe


import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sheshe import CheChe

X, y = load_iris(return_X_y=True)
che = CheChe().fit(X, y, max_pairs=1)
che.plot_classes(X, y)
plt.show()

CheChe computes convex-hull frontiers for selected feature pairs and provides simple 2D visualisations. It is useful for exploring decision boundaries or cluster shapes. It can subsample points via mapping_level and plot frontiers per class or for scalar score functions.

Frontiers can also follow a score function contour when score_frontier is provided.
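
One way to picture a score-based frontier is sketched below. It illustrates the idea and is not the library's actual implementation: the score function is evaluated on a regular grid over one feature pair, and the grid points whose score reaches the chosen level are kept. The helper names (grid_axis, score_region, toy_score) are hypothetical, and the toy score stands in for something like a class probability.

```python
def grid_axis(lo, hi, grid_res):
    """Return grid_res evenly spaced values from lo to hi (inclusive)."""
    step = (hi - lo) / (grid_res - 1)
    return [lo + k * step for k in range(grid_res)]


def score_region(score_fn, x_range, y_range, score_frontier, grid_res):
    """Grid points of one feature pair whose score is >= score_frontier."""
    points = [(x, y)
              for x in grid_axis(*x_range, grid_res)
              for y in grid_axis(*y_range, grid_res)]
    scores = score_fn(points)  # one score per grid point
    return [p for p, s in zip(points, scores) if s >= score_frontier]


def toy_score(points):
    """Hypothetical score: 1 at the origin, decaying with squared distance."""
    return [1.0 / (1.0 + x * x + y * y) for x, y in points]


inside = score_region(toy_score, (-2.0, 2.0), (-2.0, 2.0),
                      score_frontier=0.8, grid_res=41)
print(len(inside), "of", 41 * 41, "grid points lie inside the 0.8 contour")
```

For this toy score the 0.8 level corresponds to a disc of radius 0.5 around the origin. The outline of the kept points approximates the contour, and a larger grid_res traces it more finely at a higher evaluation cost.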

Mathematical formulation

For feature pair (i,j), CheChe projects samples with P_{ij}(x) = (x_i, x_j). QuickHull constructs the convex hull H of these projections.

The decision function outputs the negative distance from the projected point to the hull centroid: -‖P_{ij}(x) - c_H‖.
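
The formulation above can be sketched in plain Python under two stated assumptions. First, the hull is built with Andrew's monotone chain instead of QuickHull, only to keep the example dependency-free; both return the same hull. Second, the centroid c_H is taken to be the mean of the hull vertices, which the text does not specify. All function names here are hypothetical.

```python
import math


def project(x, i, j):
    """P_ij(x) = (x_i, x_j)."""
    return (x[i], x[j])


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_centroid(hull):
    """Assumption: c_H is the mean of the hull vertices."""
    n = len(hull)
    return (sum(p[0] for p in hull) / n, sum(p[1] for p in hull) / n)


def decision_value(x, i, j, centroid):
    """-||P_ij(x) - c_H||: larger (closer to 0) means nearer the centre."""
    px, py = project(x, i, j)
    return -math.hypot(px - centroid[0], py - centroid[1])


# Four-feature toy samples; the pair (0, 2) projects them onto a square
# with one interior point.
samples = [
    (0.0, 9.0, 0.0, 1.0),
    (4.0, 9.0, 0.0, 2.0),
    (4.0, 9.0, 4.0, 3.0),
    (0.0, 9.0, 4.0, 4.0),
    (2.0, 9.0, 2.0, 5.0),
]
hull = convex_hull([project(s, 0, 2) for s in samples])
centroid = hull_centroid(hull)
values = [decision_value(s, 0, 2, centroid) for s in samples]
```

Here the interior sample sits exactly on the centroid and scores 0, while the four corner samples share the same negative value.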

Example


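from sklearn.datasets import load_iris
from sheshe import CheChe

X, y = load_iris(return_X_y=True)  # same dataset as the opening example
cc = CheChe()
cc.fit(X, y)
cc.plot_classes(X, y)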

Usage examples


from sheshe import CheChe

che = CheChe(random_state=0)
che.fit(X, y)                      # fit

from sheshe import CheChe

che = CheChe(random_state=0)
che.fit_predict(X, y)              # fit_predict

from sheshe import CheChe

che = CheChe(random_state=0).fit(X, y)
che.predict(X)                     # predict

from sheshe import CheChe

che = CheChe(random_state=0).fit(X, y)
che.predict_proba(X)               # predict_proba

from sheshe import CheChe

che = CheChe(random_state=0).fit(X, y)
che.predict_regions(X)             # predict_regions

from sheshe import CheChe

che = CheChe(random_state=0).fit(X, y)
che.decision_function(X)           # decision_function

from sheshe import CheChe

che = CheChe(random_state=0).fit(X, y)
che.save("che.joblib")             # save

from sheshe import CheChe

che = CheChe.load("che.joblib")    # load

Additional examples


from sklearn.datasets import load_iris
from sheshe import CheChe

X, y = load_iris(return_X_y=True)
ch = CheChe().fit(
    X,
    y,
    feature_names=["sepal length", "sepal width", "petal length", "petal width"],
    mapping_level=2,  # use every other sample
)
ch.plot_classes(X, y)

from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=200).fit(X, y)
CheChe().fit(X, y, score_model=model)

def score_fn(Z):
    # scalar score: predicted probability of the first class
    return model.predict_proba(Z)[:, 0]
CheChe().fit(X, score_fn=score_fn)

che = CheChe().fit(X, score_fn=score_fn, score_frontier=0.8, grid_res=40)
che.plot_pairs(X)

Parameters

  • random_state (int or None, default None): seed for reproducibility.

Fit-time options

The fit method accepts additional arguments mirroring the ShuShu API:

  • score_fn (callable, optional): scalar score function when y is not provided.
  • feature_names (list[str] or None): names for features.
  • score_model (estimator, optional): model used to derive class probabilities.
  • score_fn_multi (callable, optional): multi-class score function.
  • score_fn_per_class (list[callable], optional): per-class score functions.
  • max_pairs (int or None, default 10): maximum number of feature pairs to analyse.
  • mapping_level (int or None, default None): down-sampling level for frontier computation.
  • score_frontier (float, optional): contour level for score-based frontiers when a score_fn is supplied.
  • grid_res (int, default 200): evaluation grid resolution used for score-based frontiers.
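
To give a feel for what max_pairs and mapping_level control, here is a small sketch. It rests on assumptions rather than on the library's internals: pairs are enumerated in index order and truncated to max_pairs, and mapping_level=k keeps every k-th sample (consistent with the "use every other sample" comment for mapping_level=2 above). The helper names feature_pairs and subsample are hypothetical.

```python
from itertools import combinations, islice


def feature_pairs(n_features, max_pairs=None):
    """Index pairs (i, j) with i < j, truncated to max_pairs if given.

    Assumption: pairs are taken in plain index order; the library may
    rank or select them differently.
    """
    pairs = combinations(range(n_features), 2)
    if max_pairs is None:
        return list(pairs)
    return list(islice(pairs, max_pairs))


def subsample(rows, mapping_level=None):
    """Keep every mapping_level-th row; None or 1 keeps all rows."""
    if mapping_level is None or mapping_level <= 1:
        return list(rows)
    return list(rows[::mapping_level])


all_pairs = feature_pairs(4)               # four features, as in iris
first_pair = feature_pairs(4, max_pairs=1)
kept = subsample(list(range(7)), mapping_level=2)
```

With four features there are six possible pairs, so a max_pairs of 10 would not truncate anything for iris, while max_pairs=1 limits the analysis to a single pair.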

Methods

  • fit(X, y=None, **kwargs) – estimate 2D frontiers for feature pairs.
  • fit_predict(X, y=None, **kwargs) – fit the model and immediately return predictions.
  • predict(X) – assign region ids or class labels.
  • predict_proba(X) – return class probabilities in multiclass mode.
  • predict_regions(X) – DataFrame with labels and region ids.
  • decision_function(X) – negative distances to region centres.
  • plot_pairs(X, class_index=None, feature_names=None, show_histograms=False) – scatter plots with frontier overlays for stored pairs and optional marginal histograms.
  • plot_classes(X, y, ...) – plot frontiers for each class when in supervised mode.