Skip to content

Multiclass Classification ​

Multiclass classification extends binary classification to three or more outcome classes. One of the most common use cases from Advances in Financial Machine Learning is the triple-barrier method, which produces three distinct labels: +1 (profit target hit), 0 (time expiry), and -1 (stop-loss hit).

With task="multiclass", train_model passes the raw integer labels directly to the estimator and reports per-class precision, recall, F1, and a full NΓ—N confusion matrix.

The triple-barrier method ​

The triple-barrier method, introduced by Marcos LΓ³pez de Prado, labels each observation according to the first barrier touched out of three barriers anchored to a price series β€” no actual positions need to be opened. For each observation bar you anchor:

  • Upper barrier β€” a profit-taking level at entry + distance
  • Lower barrier β€” a stop-loss level at entry - distance
  • Vertical barrier β€” a maximum holding period (number of bars)

The label is determined by which barrier price touches first:

Barrier touched firstLabel
Upper (profit target)+1
Lower (stop-loss)-1
Vertical (time expiry)0

This is used when you do not have a primary model that tells you the side of the bet β€” because the barriers are symmetric, the label is determined purely by which direction price moves first. The 0 class captures periods where price drifts without conviction β€” it expires within the holding window without touching either horizontal barrier.

TIP

When you do have a primary model that sets the side (long or short), use meta-labeling instead. The labeling machinery is identical, but because the side is known the three outcomes collapse to two: profit target hit = True (the primary signal was correct), stop or time expiry = False (the primary signal was wrong). See the Meta-Labeling page for that pattern.

Label convention ​

For task="multiclass", train_model calls int(label_value) for each sample. Your record_label calls must produce values that can be cleanly cast to int:

python
self.record_label("triple_barrier", 1)    # upper barrier hit
self.record_label("triple_barrier", -1)   # lower barrier hit
self.record_label("triple_barrier", 0)    # vertical barrier (time expiry)

WARNING

Do not use task="multiclass" with boolean labels. Boolean True and False cast to 1 and 0, so class -1 would never exist. Use task="binary" for boolean labels.

Strategy example β€” triple-barrier in before ​

The triple-barrier pattern runs entirely in the before hook, without opening any actual position. Features and barrier levels are anchored once per observation window. On each subsequent bar the strategy checks whether a barrier has been touched.

python
import jesse.indicators as ta
import numpy as np
from jesse.strategies import Strategy


class MyStrategy(Strategy):
    vertical_barrier = 50         # maximum holding period in bars

    # ── internal state ──────────────────────────────────────────────────────
    _features_recorded = False
    _record_index      = 0
    _barrier_upper     = None
    _barrier_lower     = None

    @property
    def _distance(self):
        return ta.atr(self.candles) * 2

    # ── single source of truth for features ─────────────────────────────────

    def ml_features(self) -> dict:
        atr   = ta.atr(self.candles) + 1e-9
        price = self.price

        ema9  = ta.ema(self.candles, 9)  + 1e-9
        ema21 = ta.ema(self.candles, 21) + 1e-9
        ema50 = ta.ema(self.candles, 50) + 1e-9

        closes    = self.candles[:, 2]
        log_ret_1 = float(np.log(closes[-1] / closes[-2])) if closes[-2] != 0 else 0.0
        log_ret_5 = float(np.log(closes[-1] / closes[-6])) if closes[-6] != 0 else 0.0

        keltner   = ta.keltner(self.candles)
        keltner_w = (keltner.upperband - keltner.lowerband) + 1e-9

        return {
            "adx_centered":    (float(ta.adx(self.candles)) - 25) / 25,
            "atr_pct":         atr / price,
            "ema21_50_ratio":  (ema21 - ema50) / ema50,
            "ema9_21_ratio":   (ema9  - ema21) / ema21,
            "ema9_dist":       (price - ema9)  / ema9,
            "keltner_pos":     (price - keltner.lowerband) / keltner_w,
            "log_return_1":    log_ret_1,
            "log_return_5":    log_ret_5,
            "rsi_centered":    (float(ta.rsi(self.candles)) - 50) / 50,
            "supertrend_dist": (price - ta.supertrend(self.candles).trend) / atr,
        }

    def should_long(self) -> bool:
        return False   # no actual trades placed in gather mode

    def should_short(self) -> bool:
        return False   # no actual trades placed in gather mode

    def should_cancel_entry(self) -> bool:
        return True

    def before(self) -> None:
        if self.ml_mode != "gather":
            return

        if not self._features_recorded:
            self.record_features(self.ml_features())

            # ── Anchor barriers to this bar's price ──────────────────────
            price = self.price
            self._barrier_upper    = price + self._distance
            self._barrier_lower    = price - self._distance
            self._record_index     = self.index
            self._features_recorded = True
            return

        # ── On subsequent bars: check whether a barrier was touched ──────
        upper_touched    = self.price >= self._barrier_upper
        lower_touched    = self.price <= self._barrier_lower
        vertical_touched = (self.index - self._record_index) >= self.vertical_barrier

        if upper_touched or lower_touched or vertical_touched:
            label = 1 if upper_touched else (-1 if lower_touched else 0)
            self.record_label("triple_barrier", label)

            # ── Reset for the next observation ────────────────────────────
            self._features_recorded = False
            self._barrier_upper     = None
            self._barrier_lower     = None

Training ​

python
# train_multiclass.py
from sklearn.ensemble import RandomForestClassifier
from jesse.research import load_ml_data_csv, train_model

STRATEGY = "MyStrategy"
data     = load_ml_data_csv(STRATEGY)

result = train_model(
    data=data,
    estimator=RandomForestClassifier(
        n_estimators=300,
        max_depth=8,
        class_weight="balanced",   # handles -1/0/+1 imbalance automatically
        random_state=42,
    ),
    task="multiclass",
    test_ratio=0.2,
    save_to=f"strategies/{STRATEGY}",
    name=STRATEGY,
)

acc = result["metrics"]["accuracy"]
auc = result["metrics"]["roc_auc_macro"]   # macro one-vs-rest AUC
mcc = result["metrics"]["mcc"]
print(f"Accuracy        : {acc:.1%}")
print(f"ROC AUC (macro) : {auc:.3f}")
print(f"MCC             : {mcc:+.3f}")

Choosing an estimator ​

Most sklearn classifiers support multiclass natively β€” no special wrapping needed.

ClassifierBest dataset sizeHandles noisy dataHandles class imbalanceLabel constraintsTraining speedNotes
Random ForestMedium–Large (5 k – 500 k)βœ… Goodβœ… class_weight="balanced"Any integers incl. negativesFastBest default for triple-barrier; low overfitting risk
XGBoostLarge (> 50 k)βœ… Goodβœ… scale_pos_weight / sample_weightNon-negative integers only β€” must remap {-1, 0, 1} β†’ {0, 1, 2}Fast (GPU support)Highest accuracy ceiling; requires label remapping for triple-barrier labels
Multiclass SVMSmall (< 10 k)⚠️ Can struggle on weak featuresβœ… class_weight="balanced"Any integers incl. negativesSlow on large dataUses one-vs-one strategy internally; requires probability=True for predict_proba
Gradient BoostingMedium–Large (5 k – 500 k)βœ… Good⚠️ Use sample_weight β€” no class_weight supportAny integers incl. negativesMediumDoes not support class_weight directly; handle imbalance via sample_weight

Random Forest β€” the recommended starting point for triple-barrier. class_weight="balanced" handles the common imbalance between +1, 0, and -1 classes automatically.

python
from sklearn.ensemble import RandomForestClassifier

RandomForestClassifier(n_estimators=300, max_depth=8, class_weight="balanced")

XGBoost β€” higher accuracy on large datasets (> 50 k samples). For multiclass use objective="multi:softprob" and set num_class:

python
from xgboost import XGBClassifier

XGBClassifier(
    objective="multi:softprob",
    num_class=3,
    n_estimators=300,
    max_depth=4,
    eval_metric="mlogloss",
)

WARNING

XGBoost expects class labels to be non-negative integers starting from 0. If your labels are {-1, 0, 1}, you must remap them before passing data to train_model, and reverse the mapping when interpreting predictions in deploy mode:

python
# Remap before training: -1β†’0, 0β†’1, 1β†’2
for p in data:
    p["label"]["value"] = {-1: 0, 0: 1, 1: 2}[p["label"]["value"]]

# In deploy mode, reverse: model.classes_ will be [0, 1, 2]
# class index 0 = original -1 (short), 1 = original 0 (neutral), 2 = original +1 (long)

This remapping is not needed for sklearn estimators (RandomForest, SVC, GradientBoosting) β€” they handle integer labels including negative ones natively.

Multiclass SVM β€” sklearn's SVC supports multiclass via one-vs-one by default. Set probability=True for predict_proba support (needed by train_model).

python
from sklearn.svm import SVC

SVC(probability=True, kernel="rbf", C=1.0, gamma="scale", class_weight="balanced")

Understanding the training report ​

Dataset section ​

Shows the count and percentage of each class (triple_barrier = 1, triple_barrier = 0, triple_barrier = -1). A typical triple-barrier dataset is roughly balanced between +1 and -1, with fewer 0 samples.

Feature importance ​

Same four-method consensus table as binary classification (RFE, ANOVA F-value, |Corr|, CV-Impact), but the proxy estimator and F-test are evaluated against all three classes.

Model performance ​

Summary metrics

AccuracyROC AUC (macro OVR)MCC
47.1%0.609+0.044

Confusion matrix

Pred -1Pred 0Pred +1
Actual -12538234
Actual 0401045
Actual +124514258

Per-class precision / recall / F1

ClassPrecisionRecallF1Support
-10.4700.5110.490495
00.3120.1050.15795
+10.4800.4990.490517

Key metrics:

  • Accuracy β€” fraction of correct predictions across all three classes. On its own this can be misleading: a model that always predicts the majority class can look decent while being useless. Check MCC and F1 alongside it.

  • ROC AUC (macro OVR) β€” one-vs-rest AUC averaged across classes. Accessed as result["metrics"]["roc_auc_macro"]. For each class the model is asked "can you rank this class above the others?" β€” 0.5 means random, 1.0 means perfect. Macro averaging weights all three classes equally regardless of how many samples they have.

  • MCC β€” Matthews Correlation Coefficient extended to multiclass. Ranges from βˆ’1 to +1; 0 means the model is no better than random guessing, +1 is perfect. It is the most reliable single-number summary on imbalanced data because it accounts for all cells of the confusion matrix at once. Accessed as result["metrics"]["mcc"].

  • Confusion matrix β€” rows are the true labels, columns are what the model predicted. The diagonal cells (top-left to bottom-right) are correct predictions; everything off the diagonal is a mistake. Large off-diagonal numbers tell you which pairs of classes the model confuses most β€” for triple-barrier data, +1 and -1 being confused with each other is far more costly than either being confused with 0.

  • Precision (per class) β€” of all the times the model predicted this class, how often was it actually correct? High precision means few false alarms. In a trading context: if precision for +1 is 0.48, the model's "go long" signals are right only 48 % of the time.

  • Recall (per class) β€” of all the samples that truly belong to this class, how many did the model catch? High recall means few missed signals. Low recall on +1 means many real winning opportunities were ignored.

  • F1 (per class) β€” the harmonic mean of precision and recall: 2 Γ— (precision Γ— recall) / (precision + recall). It balances the two β€” a model that achieves high precision by being very selective (low recall) will be penalised, and so will a model that catches everything (high recall) but fires too many false alarms (low precision). The 0 (neutral) class typically has the lowest F1 because range-bound bars are genuinely ambiguous; this is expected and acceptable.

  • Support β€” the number of test samples in each class. Useful for judging how reliable the precision/recall/F1 numbers are: a class with only 95 samples (like 0 above) will show noisier metrics than one with 500+.

Feature impact ​

Same as binary: the model is retrained without each feature in turn and test accuracy is compared to the baseline. The Β±1.5% dead zone applies here too β€” changes smaller than 1.5% are reported as neutral since they are within the noise margin of retraining on a small financial dataset.

The 0 class β€” keep or drop? ​

A common question is whether to include the 0 (time-expiry) samples in training at all. There are two schools of thought:

Keep them β€” The 0 class captures periods of genuine indecision. A model that correctly predicts 0 avoids entering trades that would just expire neutral, saving on fees.

Drop them β€” If you only care about direction (+1 vs -1), filtering out 0 samples turns the problem back into binary and often produces higher accuracy on the classes that matter.

To train only on +1 vs -1, filter the data before passing it to train_model. Because the > 0 mapping rule applies, +1 becomes class 1 and -1 becomes class 0 automatically when using task="binary":

python
from sklearn.ensemble import RandomForestClassifier
from jesse.research import load_ml_data_csv, train_model

data     = load_ml_data_csv("MyStrategy")
filtered = [p for p in data if p["label"]["value"] != 0]

print(f"Kept {len(filtered):,} directional samples "
      f"(dropped {len(data) - len(filtered):,} neutral)")

result = train_model(
    data=filtered,
    estimator=RandomForestClassifier(
        n_estimators=300, class_weight="balanced", random_state=42
    ),
    task="binary",   # +1 β†’ class 1,  -1 β†’ class 0  (via the > 0 rule)
    test_ratio=0.2,
    save_to=f"strategies/MyStrategy",
    name="MyStrategy",
)

TIP

This filtered binary model is often the most useful variant of a triple-barrier dataset: it tells you "when price moved significantly, did it go up or down?" β€” without the ambiguity of the neutral 0 class.

Using the model in deploy mode ​

In deploy mode, call self.ml_predict_proba() directly. It automatically loads the model, calls self.ml_features(), scales the features, and returns a {class_label: probability} dict.

python
def should_long(self) -> bool:
    if self.ml_mode == "gather":
        return False
    probs     = self.ml_predict_proba()
    prob_up   = probs.get(1, 0.0)
    prob_down = probs.get(-1, 0.0)
    # Enter long only when the model strongly favours the +1 class
    return prob_up >= 0.55 and prob_up > prob_down * 1.2

def should_short(self) -> bool:
    if self.ml_mode == "gather":
        return False
    probs     = self.ml_predict_proba()
    prob_down = probs.get(-1, 0.0)
    prob_up   = probs.get(1, 0.0)
    return prob_down >= 0.55 and prob_down > prob_up * 1.2

Because ml_features() is defined once and used by both before() (gather mode) and ml_predict_proba() (deploy mode), train/deploy feature skew is impossible. See the Deploying in a Strategy page for the full combined template.

We do NOT guarantee profitable trading results in anyways. USE THE SOFTWARE AT YOUR OWN RISK. THE AUTHORS AND ALL AFFILIATES ASSUME NO RESPONSIBILITY FOR YOUR TRADING RESULTS. Do not risk money which you are afraid to lose. There might be bugs in the code - this software DOES NOT come with ANY warranty. All investments carry risk! Past performance is no guarantee of future results! Be aware of overfitting!