Multiclass Classification
Multiclass classification extends binary classification to three or more outcome classes. One of the most common use cases from Advances in Financial Machine Learning is the triple-barrier method, which produces three distinct labels: +1 (profit target hit), 0 (time expiry), and -1 (stop-loss hit).
With task="multiclass", train_model passes the raw integer labels directly to the estimator and reports per-class precision, recall, F1, and a full N×N confusion matrix.
The triple-barrier method
The triple-barrier method, introduced by Marcos López de Prado, labels each observation according to which of three barriers, all anchored to a price series, is touched first. No actual positions need to be opened. For each observation bar you anchor:
- Upper barrier: a profit-taking level at entry + distance
- Lower barrier: a stop-loss level at entry - distance
- Vertical barrier: a maximum holding period (number of bars)
The label is determined by which barrier price touches first:
| Barrier touched first | Label |
|---|---|
| Upper (profit target) | +1 |
| Lower (stop-loss) | -1 |
| Vertical (time expiry) | 0 |
This is used when you do not have a primary model that tells you the side of the bet. Because the barriers are symmetric, the label is determined purely by which direction price moves first. The 0 class captures periods where price drifts without conviction: the holding window expires before price touches either horizontal barrier.
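The labeling rule above can be sketched outside of any strategy as a plain function over a list of prices. This is only an illustration under simple assumptions: triple_barrier_label and its parameters are hypothetical names, the barrier distance is passed in as a fixed number, and a barrier counts as touched when a single price per bar reaches it.

```python
from typing import Optional, Sequence


def triple_barrier_label(
    prices: Sequence[float],
    start: int,
    distance: float,
    max_bars: int,
) -> Optional[int]:
    """Label the observation anchored at prices[start].

    Returns +1, -1 or 0, or None when the series ends before any barrier is reached.
    """
    entry = prices[start]
    upper = entry + distance  # profit-taking barrier
    lower = entry - distance  # stop-loss barrier

    for offset in range(1, max_bars + 1):
        i = start + offset
        if i >= len(prices):
            return None  # not enough future data to decide
        if prices[i] >= upper:
            return 1
        if prices[i] <= lower:
            return -1
    return 0  # vertical barrier: holding period expired


prices = [100.0, 100.5, 101.2, 102.1, 101.0, 99.0, 97.5]
label_up = triple_barrier_label(prices, start=0, distance=2.0, max_bars=5)
label_down = triple_barrier_label(prices, start=3, distance=2.0, max_bars=5)
label_flat = triple_barrier_label(prices, start=0, distance=5.0, max_bars=4)
print(label_up, label_down, label_flat)
```

Returning None for observations that run off the end of the series keeps undecided samples out of the dataset instead of mislabeling them as 0.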
TIP
When you do have a primary model that sets the side (long or short), use meta-labeling instead. The labeling machinery is identical, but because the side is known the three outcomes collapse to two: profit target hit = True (the primary signal was correct), stop or time expiry = False (the primary signal was wrong). See the Meta-Labeling page for that pattern.
Label convention
For task="multiclass", train_model calls int(label_value) for each sample. Your record_label calls must produce values that can be cleanly cast to int:
self.record_label("triple_barrier", 1) # upper barrier hit
self.record_label("triple_barrier", -1) # lower barrier hit
self.record_label("triple_barrier", 0) # vertical barrier (time expiry)
WARNING
Do not use task="multiclass" with boolean labels. Boolean True and False cast to 1 and 0, so class -1 would never exist. Use task="binary" for boolean labels.
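The effect of the int() cast described above is easy to check in plain Python. The snippet below is a small illustration, not part of train_model: it shows which class sets survive the cast for boolean versus integer labels.

```python
# Boolean labels collapse to two classes after int(); -1 can never appear.
bool_labels = [True, False, True, False]
bool_classes = sorted({int(v) for v in bool_labels})

# Integer triple-barrier labels keep all three classes.
int_labels = [1, -1, 0, 1]
int_classes = sorted({int(v) for v in int_labels})

print(bool_classes)  # [0, 1]
print(int_classes)   # [-1, 0, 1]
```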
Strategy example: triple-barrier in before
The triple-barrier pattern runs entirely in the before hook, without opening any actual position. Features and barrier levels are anchored once per observation window. On each subsequent bar the strategy checks whether a barrier has been touched.
import jesse.indicators as ta
import numpy as np
from jesse.strategies import Strategy


class MyStrategy(Strategy):
    vertical_barrier = 50  # maximum holding period in bars

    # --- internal state ---
    _features_recorded = False
    _record_index = 0
    _barrier_upper = None
    _barrier_lower = None

    @property
    def _distance(self):
        return ta.atr(self.candles) * 2

    # --- single source of truth for features ---
    def ml_features(self) -> dict:
        atr = ta.atr(self.candles) + 1e-9
        price = self.price
        ema9 = ta.ema(self.candles, 9) + 1e-9
        ema21 = ta.ema(self.candles, 21) + 1e-9
        ema50 = ta.ema(self.candles, 50) + 1e-9
        closes = self.candles[:, 2]
        log_ret_1 = float(np.log(closes[-1] / closes[-2])) if closes[-2] != 0 else 0.0
        log_ret_5 = float(np.log(closes[-1] / closes[-6])) if closes[-6] != 0 else 0.0
        keltner = ta.keltner(self.candles)
        keltner_w = (keltner.upperband - keltner.lowerband) + 1e-9
        return {
            "adx_centered": (float(ta.adx(self.candles)) - 25) / 25,
            "atr_pct": atr / price,
            "ema21_50_ratio": (ema21 - ema50) / ema50,
            "ema9_21_ratio": (ema9 - ema21) / ema21,
            "ema9_dist": (price - ema9) / ema9,
            "keltner_pos": (price - keltner.lowerband) / keltner_w,
            "log_return_1": log_ret_1,
            "log_return_5": log_ret_5,
            "rsi_centered": (float(ta.rsi(self.candles)) - 50) / 50,
            "supertrend_dist": (price - ta.supertrend(self.candles).trend) / atr,
        }

    def should_long(self) -> bool:
        return False  # no actual trades placed in gather mode

    def should_short(self) -> bool:
        return False  # no actual trades placed in gather mode

    def should_cancel_entry(self) -> bool:
        return True

    def before(self) -> None:
        if self.ml_mode != "gather":
            return

        if not self._features_recorded:
            self.record_features(self.ml_features())

            # --- Anchor barriers to this bar's price ---
            price = self.price
            self._barrier_upper = price + self._distance
            self._barrier_lower = price - self._distance
            self._record_index = self.index
            self._features_recorded = True
            return

        # --- On subsequent bars: check whether a barrier was touched ---
        upper_touched = self.price >= self._barrier_upper
        lower_touched = self.price <= self._barrier_lower
        vertical_touched = (self.index - self._record_index) >= self.vertical_barrier

        if upper_touched or lower_touched or vertical_touched:
            label = 1 if upper_touched else (-1 if lower_touched else 0)
            self.record_label("triple_barrier", label)

            # --- Reset for the next observation ---
            self._features_recorded = False
            self._barrier_upper = None
            self._barrier_lower = None

Training
# train_multiclass.py
from sklearn.ensemble import RandomForestClassifier

from jesse.research import load_ml_data_csv, train_model

STRATEGY = "MyStrategy"

data = load_ml_data_csv(STRATEGY)

result = train_model(
    data=data,
    estimator=RandomForestClassifier(
        n_estimators=300,
        max_depth=8,
        class_weight="balanced",  # handles -1/0/+1 imbalance automatically
        random_state=42,
    ),
    task="multiclass",
    test_ratio=0.2,
    save_to=f"strategies/{STRATEGY}",
    name=STRATEGY,
)

acc = result["metrics"]["accuracy"]
auc = result["metrics"]["roc_auc_macro"]  # macro one-vs-rest AUC
mcc = result["metrics"]["mcc"]

print(f"Accuracy        : {acc:.1%}")
print(f"ROC AUC (macro) : {auc:.3f}")
print(f"MCC             : {mcc:+.3f}")

Choosing an estimator
Most sklearn classifiers support multiclass natively, so no special wrapping is needed.
| Classifier | Best dataset size | Handles noisy data | Handles class imbalance | Label constraints | Training speed | Notes |
|---|---|---|---|---|---|---|
| Random Forest | Medium–Large (5 k – 500 k) | ✅ Good | ✅ class_weight="balanced" | Any integers incl. negatives | Fast | Best default for triple-barrier; low overfitting risk |
| XGBoost | Large (> 50 k) | ✅ Good | ✅ sample_weight (scale_pos_weight applies to binary tasks only) | Non-negative integers only; must remap {-1, 0, 1} → {0, 1, 2} | Fast (GPU support) | Highest accuracy ceiling; requires label remapping for triple-barrier labels |
| Multiclass SVM | Small (< 10 k) | ⚠️ Can struggle on weak features | ✅ class_weight="balanced" | Any integers incl. negatives | Slow on large data | Uses one-vs-one strategy internally; requires probability=True for predict_proba |
| Gradient Boosting | Medium–Large (5 k – 500 k) | ✅ Good | ⚠️ Use sample_weight; no class_weight support | Any integers incl. negatives | Medium | Does not support class_weight directly; handle imbalance via sample_weight |
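For estimators without a class_weight option, per-sample weights can approximate the same effect. The sketch below uses the n_samples / (n_classes × class_count) rule that sklearn documents for class_weight="balanced". balanced_sample_weights is a hypothetical helper written in plain Python; whether train_model forwards sample weights to the estimator is not covered on this page, so treat that wiring as an assumption to verify.

```python
from collections import Counter
from typing import List, Sequence


def balanced_sample_weights(labels: Sequence[int]) -> List[float]:
    """Weight each sample by n_samples / (n_classes * count_of_its_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return [n_samples / (n_classes * counts[y]) for y in labels]


labels = [1, 1, 1, 1, -1, -1, -1, 0]  # made-up imbalanced labels
weights = balanced_sample_weights(labels)

# Rare classes get larger weights; each class ends up with the same total weight.
print(dict(zip(labels, weights)))
```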
Random Forest: the recommended starting point for triple-barrier. class_weight="balanced" handles the common imbalance between +1, 0, and -1 classes automatically.
from sklearn.ensemble import RandomForestClassifier

RandomForestClassifier(n_estimators=300, max_depth=8, class_weight="balanced")

XGBoost: higher accuracy on large datasets (> 50 k samples). For multiclass use objective="multi:softprob" and set num_class:
from xgboost import XGBClassifier

XGBClassifier(
    objective="multi:softprob",
    num_class=3,
    n_estimators=300,
    max_depth=4,
    eval_metric="mlogloss",
)

WARNING
XGBoost expects class labels to be non-negative integers starting from 0. If your labels are {-1, 0, 1}, you must remap them before passing data to train_model, and reverse the mapping when interpreting predictions in deploy mode:
# Remap before training: -1→0, 0→1, 1→2
for p in data:
    p["label"]["value"] = {-1: 0, 0: 1, 1: 2}[p["label"]["value"]]

# In deploy mode, reverse: model.classes_ will be [0, 1, 2]
# class index 0 = original -1 (short), 1 = original 0 (neutral), 2 = original +1 (long)

This remapping is not needed for sklearn estimators (RandomForest, SVC, GradientBoosting); they handle integer labels, including negative ones, natively.
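Reversing the mapping at prediction time can be done with an inverted dictionary. The sketch below assumes the probabilities arrive as a dict keyed by the remapped classes 0, 1 and 2; restore_labels and the sample probabilities are hypothetical and only illustrate the bookkeeping.

```python
# Forward mapping used before training, and its inverse for deploy mode.
TO_XGB = {-1: 0, 0: 1, 1: 2}
FROM_XGB = {remapped: original for original, remapped in TO_XGB.items()}


def restore_labels(probs: dict) -> dict:
    """Translate {remapped_class: probability} back to {-1, 0, +1} keys."""
    return {FROM_XGB[int(cls)]: p for cls, p in probs.items()}


example = {0: 0.30, 1: 0.15, 2: 0.55}  # made-up probabilities
restored = restore_labels(example)
print(restored)  # {-1: 0.3, 0: 0.15, 1: 0.55}
```

Keeping both dictionaries next to each other makes it harder for the training script and the strategy to disagree about which index means which direction.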
Multiclass SVM: sklearn's SVC supports multiclass via one-vs-one by default. Set probability=True for predict_proba support (needed by train_model).
from sklearn.svm import SVC

SVC(probability=True, kernel="rbf", C=1.0, gamma="scale", class_weight="balanced")

Understanding the training report
Dataset section
Shows the count and percentage of each class (triple_barrier = 1, triple_barrier = 0, triple_barrier = -1). A typical triple-barrier dataset is roughly balanced between +1 and -1, with fewer 0 samples.
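The same class breakdown can be checked before training. The sketch below assumes each record exposes its label at p["label"]["value"], as in the other snippets on this page; class_distribution is a hypothetical helper and the sample records are made up.

```python
from collections import Counter


def class_distribution(data):
    """Return {class: (count, share)} for records shaped like {"label": {"value": ...}}."""
    counts = Counter(int(p["label"]["value"]) for p in data)
    total = sum(counts.values())
    return {cls: (n, n / total) for cls, n in sorted(counts.items())}


sample = [{"label": {"value": v}} for v in [1, -1, 1, 0, -1, 1, -1, 1]]
dist = class_distribution(sample)

for cls, (n, share) in dist.items():
    print(f"triple_barrier = {cls:+d}: {n} ({share:.1%})")
```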
Feature importance
Same four-method consensus table as binary classification (RFE, ANOVA F-value, |Corr|, CV-Impact), but the proxy estimator and F-test are evaluated against all three classes.
Model performance
Summary metrics
| Accuracy | ROC AUC (macro OVR) | MCC |
|---|---|---|
| 47.1% | 0.609 | +0.044 |
Confusion matrix
| | Pred -1 | Pred 0 | Pred +1 |
|---|---|---|---|
| Actual -1 | 253 | 8 | 234 |
| Actual 0 | 40 | 10 | 45 |
| Actual +1 | 245 | 14 | 258 |
Per-class precision / recall / F1
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| -1 | 0.470 | 0.511 | 0.490 | 495 |
| 0 | 0.312 | 0.105 | 0.157 | 95 |
| +1 | 0.480 | 0.499 | 0.490 | 517 |
Key metrics:
- Accuracy: the fraction of correct predictions across all three classes. On its own this can be misleading, because a model that always predicts the majority class can look decent while being useless. Check MCC and F1 alongside it.
- ROC AUC (macro OVR): one-vs-rest AUC averaged across classes. Accessed as result["metrics"]["roc_auc_macro"]. For each class the model is asked "can you rank this class above the others?"; 0.5 means random, 1.0 means perfect. Macro averaging weights all three classes equally regardless of how many samples they have.
- MCC: Matthews Correlation Coefficient extended to multiclass. Ranges from -1 to +1; 0 means the model is no better than random guessing, +1 is perfect. It is the most reliable single-number summary on imbalanced data because it accounts for all cells of the confusion matrix at once. Accessed as result["metrics"]["mcc"].
- Confusion matrix: rows are the true labels, columns are what the model predicted. The diagonal cells (top-left to bottom-right) are correct predictions; everything off the diagonal is a mistake. Large off-diagonal numbers tell you which pairs of classes the model confuses most. For triple-barrier data, +1 and -1 being confused with each other is far more costly than either being confused with 0.
- Precision (per class): of all the times the model predicted this class, how often was it actually correct? High precision means few false alarms. In a trading context: if precision for +1 is 0.48, the model's "go long" signals are right only 48% of the time.
- Recall (per class): of all the samples that truly belong to this class, how many did the model catch? High recall means few missed signals. Low recall on +1 means many real winning opportunities were ignored.
- F1 (per class): the harmonic mean of precision and recall, 2 × (precision × recall) / (precision + recall). It balances the two: a model that achieves high precision by being very selective (low recall) will be penalised, and so will a model that catches everything (high recall) but fires too many false alarms (low precision). The 0 (neutral) class typically has the lowest F1 because range-bound bars are genuinely ambiguous; this is expected and acceptable.
- Support: the number of test samples in each class. Useful for judging how reliable the precision/recall/F1 numbers are: a class with only 95 samples (like 0 above) will show noisier metrics than one with 500+.
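Most of the numbers in the example report can be reproduced from the confusion matrix alone, which is a useful way to see how the metrics relate. The sketch below is plain Python and is not how train_model computes its report; the MCC line uses the standard multiclass generalisation of the coefficient. ROC AUC is left out because it needs predicted probabilities, not just counts.

```python
from math import sqrt

labels = [-1, 0, 1]
# Rows = actual class, columns = predicted class, both in the order of `labels`.
cm = [
    [253, 8, 234],
    [40, 10, 45],
    [245, 14, 258],
]

n = len(labels)
total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(n))
actual = [sum(cm[i]) for i in range(n)]  # support per class (row sums)
predicted = [sum(cm[r][c] for r in range(n)) for c in range(n)]  # column sums

accuracy = correct / total
precision = [cm[i][i] / predicted[i] for i in range(n)]
recall = [cm[i][i] / actual[i] for i in range(n)]
f1 = [2 * p * r / (p + r) for p, r in zip(precision, recall)]

# Multiclass MCC computed from the confusion-matrix marginals.
numerator = correct * total - sum(p * t for p, t in zip(predicted, actual))
denominator = sqrt(
    (total**2 - sum(p * p for p in predicted))
    * (total**2 - sum(t * t for t in actual))
)
mcc = numerator / denominator

print(f"Accuracy: {accuracy:.1%}  MCC: {mcc:+.3f}")
for cls, p, r, f, s in zip(labels, precision, recall, f1, actual):
    print(f"{cls:+d}  precision={p:.3f}  recall={r:.3f}  f1={f:.3f}  support={s}")
```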
Feature impact
Same as binary: the model is retrained without each feature in turn and test accuracy is compared to the baseline. The ±1.5% dead zone applies here too: changes smaller than 1.5% are reported as neutral since they are within the noise margin of retraining on a small financial dataset.
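The dead-zone rule amounts to a three-way comparison. The sketch below is an illustration only: classify_impact and its return strings are hypothetical, and it assumes the 1.5% threshold is measured in absolute accuracy points.

```python
def classify_impact(
    baseline_acc: float,
    acc_without_feature: float,
    dead_zone: float = 0.015,
) -> str:
    """Classify a feature by how test accuracy changes when it is removed."""
    delta = baseline_acc - acc_without_feature  # positive: removing the feature hurt
    if abs(delta) < dead_zone:
        return "neutral"  # inside the noise margin
    return "helpful" if delta > 0 else "harmful"


print(classify_impact(0.471, 0.460))  # neutral
print(classify_impact(0.471, 0.440))  # helpful
print(classify_impact(0.471, 0.495))  # harmful
```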
The 0 class: keep or drop?
A common question is whether to include the 0 (time-expiry) samples in training at all. There are two schools of thought:
Keep them: the 0 class captures periods of genuine indecision. A model that correctly predicts 0 avoids entering trades that would just expire neutral, saving on fees.
Drop them: if you only care about direction (+1 vs -1), filtering out 0 samples turns the problem back into binary and often produces higher accuracy on the classes that matter.
To train only on +1 vs -1, filter the data before passing it to train_model. Because the > 0 mapping rule applies, +1 becomes class 1 and -1 becomes class 0 automatically when using task="binary":
from sklearn.ensemble import RandomForestClassifier

from jesse.research import load_ml_data_csv, train_model

data = load_ml_data_csv("MyStrategy")

# Keep only the directional samples (+1 and -1)
filtered = [p for p in data if p["label"]["value"] != 0]
print(f"Kept {len(filtered):,} directional samples "
      f"(dropped {len(data) - len(filtered):,} neutral)")

result = train_model(
    data=filtered,
    estimator=RandomForestClassifier(
        n_estimators=300, class_weight="balanced", random_state=42
    ),
    task="binary",  # +1 → class 1, -1 → class 0 (via the > 0 rule)
    test_ratio=0.2,
    save_to="strategies/MyStrategy",
    name="MyStrategy",
)

TIP
This filtered binary model is often the most useful variant of a triple-barrier dataset: it tells you "when price moved significantly, did it go up or down?" without the ambiguity of the neutral 0 class.
Using the model in deploy mode
In deploy mode, call self.ml_predict_proba() directly. It automatically loads the model, calls self.ml_features(), scales the features, and returns a {class_label: probability} dict.
def should_long(self) -> bool:
    if self.ml_mode == "gather":
        return False
    probs = self.ml_predict_proba()
    prob_up = probs.get(1, 0.0)
    prob_down = probs.get(-1, 0.0)
    # Enter long only when the model strongly favours the +1 class
    return prob_up >= 0.55 and prob_up > prob_down * 1.2

def should_short(self) -> bool:
    if self.ml_mode == "gather":
        return False
    probs = self.ml_predict_proba()
    prob_down = probs.get(-1, 0.0)
    prob_up = probs.get(1, 0.0)
    # Enter short only when the model strongly favours the -1 class
    return prob_down >= 0.55 and prob_down > prob_up * 1.2

Because ml_features() is defined once and used by both before() (gather mode) and ml_predict_proba() (deploy mode), the features cannot drift apart between training and deployment. See the Deploying in a Strategy page for the full combined template.
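The entry rule can also be pulled into a pure function so the thresholds are easy to test outside a backtest. This is a sketch, not part of the strategy API: direction_from_probs, min_prob and edge are hypothetical names, and the probability dicts are made up.

```python
def direction_from_probs(probs: dict, min_prob: float = 0.55, edge: float = 1.2) -> str:
    """Map a {class_label: probability} dict to 'long', 'short' or 'flat'."""
    prob_up = probs.get(1, 0.0)
    prob_down = probs.get(-1, 0.0)
    if prob_up >= min_prob and prob_up > prob_down * edge:
        return "long"
    if prob_down >= min_prob and prob_down > prob_up * edge:
        return "short"
    return "flat"


signal_long = direction_from_probs({1: 0.60, 0: 0.10, -1: 0.30})
signal_flat = direction_from_probs({1: 0.50, 0: 0.05, -1: 0.45})
signal_short = direction_from_probs({1: 0.30, 0: 0.12, -1: 0.58})
print(signal_long, signal_flat, signal_short)
```

When the probabilities sum to 1, the 0.55 floor already implies the 1.2 ratio check (the opposite class can be at most 0.45, and 0.45 × 1.2 = 0.54), so the ratio only starts to matter if min_prob is lowered.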
