CheChe


import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sheshe import CheChe

X, y = load_iris(return_X_y=True)
che = CheChe().fit(X, y, max_pairs=1)
che.plot_classes(X, y)
plt.show()

CheChe computes convex-hull frontiers for selected feature pairs and provides simple 2D visualisations. It is useful for exploring decision boundaries or cluster shapes. It can subsample points via mapping_level and plot frontiers per class or for scalar score functions.

Frontiers can also follow a score function contour when score_frontier is provided.
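
To make the idea of a score-based frontier concrete, the sketch below evaluates a scalar score on a regular grid over one feature-pair plane and keeps the grid points whose score reaches a chosen level. It is a pure-Python illustration, not the library's implementation: the helper grid_points_above and the toy_score function are hypothetical, and how CheChe actually uses score_frontier and grid_res internally is an assumption here.

```python
def grid_points_above(score_fn, x_range, y_range, level, grid_res):
    """Return the grid points (gx, gy) whose score is at least `level`.

    Hypothetical helper: `level` plays the role of score_frontier and
    `grid_res` the number of grid points per axis.
    """
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    step_x = (x_max - x_min) / (grid_res - 1)
    step_y = (y_max - y_min) / (grid_res - 1)
    kept = []
    for a in range(grid_res):
        for b in range(grid_res):
            point = (x_min + a * step_x, y_min + b * step_y)
            if score_fn(point) >= level:
                kept.append(point)
    return kept


def toy_score(point):
    """Toy score (assumption): equals 1.0 at the origin and decays with distance."""
    return 1.0 / (1.0 + point[0] ** 2 + point[1] ** 2)


# Grid points inside the 0.8 contour of the toy score.
inside = grid_points_above(toy_score, (-2.0, 2.0), (-2.0, 2.0), level=0.8, grid_res=41)
```

The outer boundary of the kept points approximates the contour at the chosen level. A finer grid tracks the contour more closely at a higher evaluation cost.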

Mathematical formulation

For feature pair (i,j), CheChe projects samples with P_{ij}(x) = (x_i, x_j). QuickHull constructs the convex hull H of these projections.

The decision function outputs the negative distance from the projected point to the hull centroid: -‖P_{ij}(x) - c_H‖.
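
The formulation above can be traced by hand with a small pure-Python sketch. Two assumptions are worth flagging: the hull is built with Andrew's monotone chain rather than QuickHull (both yield the same convex hull), and the centroid c_H is taken as the mean of the hull vertices, which the text does not specify. All function names are illustrative and not part of the sheshe API.

```python
import math


def project(x, i, j):
    """P_ij(x) = (x_i, x_j): keep only features i and j."""
    return (x[i], x[j])


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain (stand-in for QuickHull); vertices in CCW order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_centroid(hull):
    """Assumption: c_H is the mean of the hull vertices."""
    n = len(hull)
    return (sum(p[0] for p in hull) / n, sum(p[1] for p in hull) / n)


def decision_value(x, i, j, centroid):
    """-||P_ij(x) - c_H||: values closer to zero mean closer to the centroid."""
    px, py = project(x, i, j)
    return -math.hypot(px - centroid[0], py - centroid[1])


# Five 3-feature samples; features 0 and 2 form a square with one interior point.
samples = [
    (0.0, 9.0, 0.0),
    (2.0, 9.0, 0.0),
    (2.0, 9.0, 2.0),
    (0.0, 9.0, 2.0),
    (1.0, 9.0, 1.0),  # interior point, not a hull vertex
]
i, j = 0, 2
hull = convex_hull([project(x, i, j) for x in samples])
c_H = hull_centroid(hull)
score_inside = decision_value((1.0, 0.0, 1.0), i, j, c_H)  # at the centroid
score_far = decision_value((4.0, 0.0, 5.0), i, j, c_H)     # 5 units away
```

The middle feature is ignored by the projection, so the two query points are scored only on features 0 and 2.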

Example


from sklearn.datasets import load_iris
from sheshe import CheChe

X, y = load_iris(return_X_y=True)
cc = CheChe()
cc.fit(X, y)
cc.plot_classes(X, y)

Usage examples

Each snippet assumes X and y are already loaded, for example with load_iris(return_X_y=True) as in the first example.


from sheshe import CheChe

che = CheChe(random_state=0)
che.fit(X, y)                      # fit


from sheshe import CheChe

che = CheChe(random_state=0)
che.fit_predict(X, y)              # fit_predict


from sheshe import CheChe

che = CheChe(random_state=0).fit(X, y)
che.predict(X)                     # predict


from sheshe import CheChe

che = CheChe(random_state=0).fit(X, y)
che.predict_proba(X)               # predict_proba


from sheshe import CheChe

che = CheChe(random_state=0).fit(X, y)
che.predict_regions(X)             # predict_regions


from sheshe import CheChe

che = CheChe(random_state=0).fit(X, y)
che.decision_function(X)           # decision_function


from sheshe import CheChe

che = CheChe(random_state=0).fit(X, y)
che.save("che.joblib")             # save


from sheshe import CheChe

che = CheChe.load("che.joblib")    # load

Additional examples


from sklearn.datasets import load_iris
from sheshe import CheChe

X, y = load_iris(return_X_y=True)
ch = CheChe().fit(
    X,
    y,
    feature_names=["sepal length", "sepal width", "petal length", "petal width"],
    mapping_level=2,  # use every other sample
)
ch.plot_classes(X, y)


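from sklearn.linear_model import LogisticRegression

# Derive class probabilities from a fitted model.
model = LogisticRegression(max_iter=200).fit(X, y)
CheChe().fit(X, y, score_model=model)

# Scalar score without labels: probability of class 0.
def score_fn(Z):
    return model.predict_proba(Z)[:, 0]

CheChe().fit(X, score_fn=score_fn)

# Follow the 0.8 contour of the score, evaluated on a grid of resolution 40.
che = CheChe().fit(X, score_fn=score_fn, score_frontier=0.8, grid_res=40)
che.plot_pairs(X)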

Parameters

random_state (int or None, default None): seed for reproducibility.

Fit-time options

The fit method accepts additional arguments mirroring the ShuShu API:

score_fn (callable, optional): scalar score function, used when y is not provided.
feature_names (list[str] or None): names for features.
score_model (estimator, optional): model used to derive class probabilities.
score_fn_multi (callable, optional): multi-class score function.
score_fn_per_class (list[callable], optional): per-class score functions.
max_pairs (int or None, default 10): maximum number of feature pairs to analyse.
mapping_level (int or None, default None): down-sampling level for frontier computation.
score_frontier (float, optional): contour level for score-based frontiers when a score_fn is supplied.
grid_res (int, default 200): evaluation grid resolution used for score-based frontiers.
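
As a rough illustration of what max_pairs and mapping_level control, the sketch below enumerates feature pairs up to a cap and keeps every k-th sample. Both helpers are hypothetical. In particular, taking pairs in lexicographic order and subsampling with a plain stride are assumptions made for the example; the library may select pairs or samples differently.

```python
from itertools import combinations, islice


def candidate_pairs(n_features, max_pairs=10):
    """Feature-index pairs (i, j) with i < j, capped at max_pairs.

    Assumption: pairs are taken in lexicographic order; None means no cap.
    """
    pairs = combinations(range(n_features), 2)
    if max_pairs is None:
        return list(pairs)
    return list(islice(pairs, max_pairs))


def subsample(rows, mapping_level=None):
    """Keep every mapping_level-th row (assumption); None keeps all rows."""
    if mapping_level is None or mapping_level <= 1:
        return list(rows)
    return list(rows[::mapping_level])


all_pairs = candidate_pairs(4)               # 4 features give 6 pairs
one_pair = candidate_pairs(4, max_pairs=1)   # only the first pair
every_other = subsample(list(range(5)), mapping_level=2)
```

With four features, as in the iris data, the default cap of 10 does not bind, while max_pairs=1 restricts the analysis to a single pair.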

Methods

fit(X, y=None, **kwargs) – estimate 2D frontiers for feature pairs.
fit_predict(X, y=None, **kwargs) – fit the model and immediately return predictions.
predict(X) – assign region ids or class labels.
predict_proba(X) – return class probabilities in multiclass mode.
predict_regions(X) – DataFrame with labels and region ids.
decision_function(X) – negative distances to region centres.
plot_pairs(X, class_index=None, feature_names=None, show_histograms=False) – scatter plots with frontier overlays for stored pairs and optional marginal histograms.
plot_classes(X, y, ...) – plot frontiers for each class when in supervised mode.