CheChe
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sheshe import CheChe
X, y = load_iris(return_X_y=True)
che = CheChe().fit(X, y, max_pairs=1)
che.plot_classes(X, y)
plt.show()
CheChe computes convex-hull frontiers for selected feature pairs and provides simple
2D visualisations. It is useful for exploring decision boundaries or cluster shapes:
it can subsample points via mapping_level and plot frontiers either per class or
for a scalar score function.
Frontiers can also follow a contour of the score function when score_frontier is provided.
Mathematical formulation
For a feature pair (i, j), CheChe projects each sample with P_{ij}(x) = (x_i, x_j). QuickHull then constructs the convex hull H of these projections.
The decision function outputs the negative distance from the projected point to the hull centroid c_H: -‖P_{ij}(x) - c_H‖. Values closer to zero therefore indicate points nearer the centre of the region.
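The formulation above can be illustrated with a small dependency-free sketch. It is not the sheshe implementation: the helper names (convex_hull, project, hull_centroid, decision_value) are hypothetical, the hull is built with Andrew's monotone chain rather than QuickHull (both yield the same convex hull), and the centroid is assumed to be the mean of the hull vertices, which the text does not specify.

```python
import math

def convex_hull(points):
    """Convex hull of 2D points via Andrew's monotone chain (not QuickHull).

    Returns the hull vertices in counter-clockwise order.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each chain: it repeats the start of the other
    return lower[:-1] + upper[:-1]

def project(x, i, j):
    """P_ij(x) = (x_i, x_j)."""
    return (x[i], x[j])

def hull_centroid(hull):
    # Assumption: c_H is the mean of the hull vertices
    n = len(hull)
    return (sum(p[0] for p in hull) / n, sum(p[1] for p in hull) / n)

def decision_value(x, i, j, centroid):
    """-||P_ij(x) - c_H||"""
    px, py = project(x, i, j)
    return -math.hypot(px - centroid[0], py - centroid[1])

# Toy data: three features, of which features 0 and 2 form the pair (i, j)
samples = [
    (0.0, 9.0, 0.0),
    (4.0, 9.0, 0.0),
    (4.0, 9.0, 4.0),
    (0.0, 9.0, 4.0),
    (2.0, 9.0, 2.0),  # projects to the interior of the square
]
i, j = 0, 2
hull = convex_hull([project(x, i, j) for x in samples])
centroid = hull_centroid(hull)
scores = [decision_value(x, i, j, centroid) for x in samples]
```

In this toy case the hull is the square spanned by the four corner points, so the interior sample sits on the centroid and receives the highest possible score, while the corners receive the most negative one.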
Example
from sheshe import CheChe
cc = CheChe()
cc.fit(X, y)  # X, y loaded as in the first snippet
cc.plot_classes(X, y)
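Usage examples
The snippets below assume that X and y are already loaded, for example with load_iris as in the first snippet.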
from sheshe import CheChe
che = CheChe(random_state=0)
che.fit(X, y) # fit
from sheshe import CheChe
che = CheChe(random_state=0)
che.fit_predict(X, y) # fit_predict
from sheshe import CheChe
che = CheChe(random_state=0).fit(X, y)
che.predict(X) # predict
from sheshe import CheChe
che = CheChe(random_state=0).fit(X, y)
che.predict_proba(X) # predict_proba
from sheshe import CheChe
che = CheChe(random_state=0).fit(X, y)
che.predict_regions(X) # predict_regions
from sheshe import CheChe
che = CheChe(random_state=0).fit(X, y)
che.decision_function(X) # decision_function
from sheshe import CheChe
che = CheChe(random_state=0).fit(X, y)
che.save("che.joblib") # save
from sheshe import CheChe
che = CheChe.load("che.joblib")
Additional examples
from sklearn.datasets import load_iris
from sheshe import CheChe
X, y = load_iris(return_X_y=True)
ch = CheChe().fit(
X,
y,
feature_names=["sepal length", "sepal width", "petal length", "petal width"],
mapping_level=2, # use every other sample
)
ch.plot_classes(X, y)
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(max_iter=200).fit(X, y)
CheChe().fit(X, y, score_model=model)
score_fn = lambda Z: model.predict_proba(Z)[:, 0]
CheChe().fit(X, score_fn=score_fn)
che = CheChe().fit(X, score_fn=score_fn, score_frontier=0.8, grid_res=40)
che.plot_pairs(X)
Parameters
random_state (int or None, default None): seed for reproducibility.
Fit-time options
The fit method accepts additional arguments mirroring the ShuShu API:
score_fn (callable, optional): scalar score function used when y is not provided.
feature_names (list[str] or None): names for features.
score_model (estimator, optional): model used to derive class probabilities.
score_fn_multi (callable, optional): multi-class score function.
score_fn_per_class (list[callable], optional): per-class score functions.
max_pairs (int or None, default 10): maximum number of feature pairs to analyse.
mapping_level (int or None, default None): down-sampling level for frontier computation.
score_frontier (float, optional): contour level for score-based frontiers when a score_fn is supplied.
grid_res (int, default 200): evaluation grid resolution used for score-based frontiers.
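To make the roles of score_frontier and grid_res more concrete, here is a rough sketch of one way a score-based frontier could be located: evaluate a score function on a regular grid over a feature pair and keep the grid points whose score reaches the contour level. How CheChe does this internally is not described here, so the approach and the helpers (grid_points, region_above_level, toy_score) are illustrative assumptions rather than the library's code.

```python
def grid_points(x_min, x_max, y_min, y_max, grid_res):
    """Regular grid_res x grid_res grid over a 2D bounding box."""
    step_x = (x_max - x_min) / (grid_res - 1)
    step_y = (y_max - y_min) / (grid_res - 1)
    return [
        (x_min + a * step_x, y_min + b * step_y)
        for a in range(grid_res)
        for b in range(grid_res)
    ]

def region_above_level(score_fn, points, level):
    """Grid points whose score is at least `level` (the contour level)."""
    return [p for p in points if score_fn(p) >= level]

def toy_score(p):
    # Hypothetical score: 1 at the origin, decaying with squared distance
    return 1.0 / (1.0 + p[0] ** 2 + p[1] ** 2)

# A 41 x 41 grid over [-2, 2] x [-2, 2]; level plays the role of score_frontier
grid = grid_points(-2.0, 2.0, -2.0, 2.0, grid_res=41)
inside = region_above_level(toy_score, grid, level=0.8)
```

With this toy score the retained points fill a disc of radius 0.5 around the origin, and the outline of that set approximates the 0.8 contour. A larger grid_res gives a finer approximation at the cost of more score evaluations, which is the usual trade-off behind a resolution parameter of this kind.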
Methods
fit(X, y=None, **kwargs) – estimate 2D frontiers for feature pairs.
fit_predict(X, y=None, **kwargs) – fit the model and immediately return predictions.
predict(X) – assign region ids or class labels.
predict_proba(X) – return class probabilities in multiclass mode.
predict_regions(X) – DataFrame with labels and region ids.
decision_function(X) – negative distances to region centres.
plot_pairs(X, class_index=None, feature_names=None, show_histograms=False) – scatter plots with frontier overlays for stored pairs and optional marginal histograms.
plot_classes(X, y, ...) – plot frontiers for each class when in supervised mode.