Binary Classification β
Binary classification is the most common ML task in algorithmic trading: given a set of market features, predict whether an outcome is positive (class 1) or negative (class 0). Typical examples:
- Will this trade be profitable?
- Will price reach the take-profit before the stop-loss?
- Is this an above-average volatility bar?
Label convention β
For task="binary", train_model maps any label value to 0 or 1 using this rule:
| Label value | Class |
|---|---|
True (bool) | 1 β positive |
False (bool) | 0 β negative |
Any number > 0 | 1 β positive |
0, negative numbers, False | 0 β negative |
This means all of the following produce the same binary split:
self.record_label("win", True) # bool
self.record_label("win", 1) # int
self.record_label("win", 0.034) # float > 0 β positive
self.record_label("win", -0.012) # float β€ 0 β negativeTIP
For triple-barrier labels (+1, 0, -1), use task="multiclass" instead. If you collapse them to binary here, both 0 (time expiry) and -1 (stop-loss) become the negative class β which is valid but loses information. See the Multiclass page.
Strategy example β label on trade close β
The simplest gather pattern records features through a shared helper called for example _record_features() that is called from both should_long() and should_short(), then records the binary outcome when the trade closes.
import jesse.indicators as ta
from jesse import utils
from jesse.strategies import Strategy
class MyStrategy(Strategy):
def _record_features(self, side: str) -> None:
if self.ml_mode != "gather":
return
atr = ta.atr(self.candles)
price = self.price
self.record_features({
"side": 1 if side == "long" else -1,
"rsi_centered": (ta.rsi(self.candles) - 50) / 50,
"atr_pct": atr / price,
"ema200_dist": (price - ta.ema(self.candles, 200)) / price,
"keltner_pos": (price - ta.keltner(self.candles).lowerband)
/ (ta.keltner(self.candles).upperband - ta.keltner(self.candles).lowerband + 1e-9),
})
def should_long(self) -> bool:
signal = (
ta.rsi(self.candles) < 35
and self.price > ta.ema(self.candles, 200)
)
if signal:
self._record_features("long")
return signal
def should_short(self) -> bool:
signal = (
ta.rsi(self.candles) > 65
and self.price < ta.ema(self.candles, 200)
)
if signal:
self._record_features("short")
return signal
def should_cancel_entry(self) -> bool:
return True
def go_long(self):
entry = self.price
stop = entry - ta.atr(self.candles) * 2.5
qty = utils.risk_to_qty(
self.available_margin, 2, entry, stop, fee_rate=self.fee_rate
)
self.buy = qty, entry
def go_short(self):
entry = self.price
stop = entry + ta.atr(self.candles) * 2.5
qty = utils.risk_to_qty(
self.available_margin, 2, entry, stop, fee_rate=self.fee_rate
)
self.sell = qty, entry
def on_close_position(self, order, closed_trade) -> None:
if self.ml_mode != "gather":
return
self.record_label("profitable", closed_trade.pnl > 0) # boolTraining β
# train_binary.py
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import SVC
from jesse.research import load_ml_data_csv, train_model
STRATEGY = "MyStrategy"
data = load_ml_data_csv(STRATEGY)
# Compute class weights from the data
n_pos = sum(1 for p in data if p["label"]["value"] is True)
n_neg = len(data) - n_pos
cw = {0: 1.0, 1: n_neg / n_pos} if n_pos > 0 else {0: 1.0, 1: 1.0}
result = train_model(
data=data,
estimator=CalibratedClassifierCV(
SVC(kernel="rbf", C=1.0, gamma="scale", class_weight=cw),
method="sigmoid",
cv=5,
),
task="binary",
test_ratio=0.2,
save_to=f"strategies/{STRATEGY}",
name=STRATEGY,
)
print(f"Accuracy : {result['metrics']['accuracy']:.1%}")
print(f"ROC AUC : {result['metrics']['roc_auc']:.3f}")
print(f"MCC : {result['metrics']['mcc']:+.3f}")Choosing an estimator β
Because train_model accepts any sklearn-compatible classifier, the choice of algorithm is entirely up to you.
Quick comparison β
| Classifier | Best dataset size | Handles noisy data | Handles class imbalance | Needs calibration | Training speed | Notes |
|---|---|---|---|---|---|---|
| Gradient Boosting | MediumβLarge (5 k β 500 k) | β Good | β οΈ Use sample_weight | β
Via CalibratedClassifierCV | Medium | Best default for financial tabular data; does not support class_weight directly |
| Random Forest | MediumβLarge (5 k β 500 k) | β Good | β
class_weight="balanced" | β Built-in probabilities are reasonable | Fast | More robust to overfitting than Gradient Boosting; easy to tune |
| Calibrated SVM | Small (< 10 k) | β οΈ Can collapse to majority class | β οΈ Use class_weight | β
Wrap with CalibratedClassifierCV | Slow on large data | Raw SVM probabilities are unreliable β always calibrate; avoid on weak/noisy features |
| XGBoost | Large (> 50 k) | β Good | β
scale_pos_weight | β οΈ May need calibration | Fast (GPU support) | Highest accuracy ceiling but most prone to overfitting; requires careful tuning |
Gradient Boosting β the recommended starting point for financial tabular data. Ensemble methods generalise better than SVMs on the noisy, low-signal features typical of price-derived indicators. Unlike SVMs, they rarely collapse to predicting the majority class on imbalanced data.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import GradientBoostingClassifier
CalibratedClassifierCV(
GradientBoostingClassifier(
n_estimators=300, max_depth=3, learning_rate=0.05, subsample=0.8,
),
method="isotonic",
cv=3,
)Random Forest β more robust to overfitting, faster to train, and handles class imbalance automatically with class_weight="balanced".
from sklearn.ensemble import RandomForestClassifier
RandomForestClassifier(n_estimators=300, max_depth=8, class_weight="balanced")Calibrated SVM β works well on small, clean datasets (< 10 k samples). Raw SVM probabilities (probability=True) are often poorly calibrated; wrapping with CalibratedClassifierCV applies Platt scaling for more reliable confidence scores. On noisy financial data with weak features, SVMs can collapse to predicting the majority class β if this happens, switch to an ensemble method.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import SVC
CalibratedClassifierCV(
SVC(kernel="rbf", C=1.0, gamma="scale", class_weight=cw),
method="sigmoid",
cv=5,
)XGBoost β highest accuracy on large datasets (> 50 k samples) but more prone to overfitting. Use scale_pos_weight to handle class imbalance.
from xgboost import XGBClassifier
XGBClassifier(n_estimators=300, max_depth=4, scale_pos_weight=n_neg / n_pos)Understanding the training report β
Dataset section β
Shows sample counts, the chronological train/test split, class balance, and suggested class weights. The split is always chronological (oldest β train, newest β test) to avoid look-ahead leakage.
Feature importance table β
Four methods are combined into a consensus rank:
| Column | Method | Interpretation |
|---|---|---|
| RFE | Recursive Feature Elimination (linear SVM proxy) | Which features a linear model finds most useful. Lower rank = better. |
| F-val | ANOVA F-statistic | How well each feature separates the two classes in isolation. Higher = better. |
| |Corr| | Absolute Pearson correlation with the label | Strength of linear relationship. Higher = better. |
| CV-Impact | Accuracy drop when feature is removed (RBF-SVM proxy) | Whether the feature actively improves cross-validation accuracy. Positive = helpful. |
| Score | Average of the four per-method ranks | Overall consensus β lower is more consistently important. |
Model performance β
- Accuracy β fraction of correct predictions on the test set. On imbalanced data, accuracy alone is misleading β check MCC and F1.
- ROC AUC β area under the ROC curve. 0.5 = random, 1.0 = perfect. Values above 0.55 on financial data are meaningful.
- MCC β Matthews Correlation Coefficient. Ranges from β1 to +1. Robust to class imbalance. Values above +0.10 indicate a useful model.
- Confusion matrix β shows TP, TN, FP, FN counts. Helps you understand whether the model errs towards false positives or false negatives.
Probability calibration β
Groups predictions by their predicted confidence bucket (e.g. [0.6β0.7)) and shows whether the actual positive rate in that bucket matches the predicted confidence. A well-calibrated model means "when the model says 65% confident, it is right ~65% of the time."
Use this to choose your ML_THRESHOLD in deploy mode. A bucket where the actual positive rate meaningfully exceeds the dataset base rate is a good operating point.
Feature impact β
Retrains the model with each feature removed in turn and measures the change in test accuracy. Interpreting the verdict:
| Verdict | Meaning |
|---|---|
β important β keep | Removing this feature drops accuracy by more than 1.5% β it is genuinely useful |
β noisy β consider dropping | Removing this feature improves accuracy by more than 1.5% β it is adding noise |
neutral | Accuracy changes by less than Β±1.5% β the feature is redundant or its signal is captured by other features |
TIP
The Β±1.5% dead zone is intentional. Retraining on a small financial dataset has inherent randomness; differences smaller than 1.5% are within the noise margin and should not drive feature selection decisions. Focus on features clearly marked as important or noisy.
Precision vs confidence threshold sweep β
Shows how precision and coverage change as you raise the minimum confidence required to allow a signal through. Use this alongside the calibration section to find the optimal ML_THRESHOLD.
Threshold Allowed Precision Coverage
0.50 1,107 54.2% 100.0%
0.55 843 57.1% 76.2%
0.60 521 61.8% 47.1%
0.65 289 66.4% 26.1%
0.70 141 70.9% 12.7%A useful operating point is where precision meaningfully exceeds the base rate while enough signals remain to produce a tradeable number of entries.
Class weights β
Class imbalance is common in financial data. If 70 % of your samples are negative (no-win), an untrained model can achieve 70 % accuracy by always predicting negative β without learning anything. Class weights penalise the model more heavily for mistakes on the minority class.
The training report prints suggested class weights in the Dataset section. You can read them off and hardcode them directly:
# Suggested class weights: 0: 1.00 / 1: 2.43
RandomForestClassifier(class_weight={0: 1.00, 1: 2.43})Or compute them from the data programmatically:
n_pos = sum(1 for p in data if p["label"]["value"] is True)
n_neg = len(data) - n_pos
cw = {0: 1.0, 1: n_neg / n_pos} if n_pos > 0 else {0: 1.0, 1: 1.0}WARNING
Class weights must be configured on your estimator before passing it to train_model. The function always clones your estimator and never modifies the object you pass in. Note that GradientBoostingClassifier does not support class_weight directly β handle imbalance via sample_weight in a custom pipeline instead. CalibratedClassifierCV is for probability calibration only and does not address class imbalance.
