
Machine Learning

Experimental API

The machine-learning API described in this section is experimental and may change at any point without prior notice. Do not rely on it for production systems until it is marked stable.

To collaborate, share feedback, or follow updates, join the community on Discord.

The jesse.research module provides a complete machine-learning pipeline for Jesse strategies, from collecting labelled training data during a backtest, to training and evaluating a model, to deploying it live inside your strategy to filter or score signals.

The system is built around three public functions:

| Function | Purpose |
| --- | --- |
| gather_ml_data | Run a backtest in "gather mode" and collect labelled feature samples |
| train_model | Train any scikit-learn–compatible estimator on that data; produces a full report |
| load_ml_data_csv | Reload previously saved data points from a CSV without re-running a backtest |

Typical workflow

1.  Write your strategy (self.ml_mode defaults to "gather" automatically)
    → call record_features({...}) at each signal bar
    → call record_label(name, value) when the outcome is known
    → use before() / after() for continuous observation loops (vertical barrier)
    → use on_open_position / on_close_position for trade-based labelling

2.  Run gather_ml_data() over a long historical window
    → auto-saves <Name>_data.csv inside strategies/<Name>/ml_data/
    → aim for 1,000+ samples; 2,000–5,000 is a healthy range

3.  Run train_model() with your chosen estimator and task type
    → compare multiple estimators with GridSearchCV before committing
    → inspect feature importance, metrics, calibration, threshold sweep
    → saves model.pkl + scaler.pkl + feature_importance.pkl inside strategies/<Name>/

4.  Switch your strategy to deploy mode (set self.ml_mode = "deploy")
    → call ml_predict() for regression or ml_predict_proba() for classification
    → model loading, scaling, and feature ordering are all handled automatically
    → gate or weight entry signals using the returned scalar or probability dict

5.  Backtest the filtered strategy and compare against the baseline
    → iterate on features, task type, estimator, and threshold
    → re-gather and re-train whenever you change the primary signal or features
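Step 4 mentions gating or weighting entry signals with the returned probability dict. The sketch below shows one way that decision could look in plain Python. It does not call Jesse at all: should_enter and size_multiplier are hypothetical helpers, and the assumption that the dict is keyed by class label (for example {0: 0.3, 1: 0.7}) is ours, not something this page specifies.

```python
from typing import Mapping


def should_enter(proba: Mapping[int, float],
                 threshold: float = 0.6,
                 positive_class: int = 1) -> bool:
    """Gate: allow the entry only when the positive-class probability
    reaches the threshold. The dict layout is an assumption."""
    return proba.get(positive_class, 0.0) >= threshold


def size_multiplier(proba: Mapping[int, float],
                    threshold: float = 0.6,
                    positive_class: int = 1) -> float:
    """Weight: scale position size linearly from 0.0 at the threshold
    up to 1.0 at probability 1.0. Below the threshold the result is 0.0."""
    p = proba.get(positive_class, 0.0)
    if p < threshold or threshold >= 1.0:
        return 0.0
    return (p - threshold) / (1.0 - threshold)


# Hypothetical model output for one signal bar.
example = {0: 0.2, 1: 0.8}
print(should_enter(example), size_multiplier(example))
```

The threshold itself is something to choose from the threshold sweep in the training report rather than a fixed constant.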

Import

```python
from jesse.research import (
    gather_ml_data,
    load_ml_data_csv,
    train_model,
)
```

Task types

train_model accepts a task parameter that controls how the label is interpreted, which estimator type is expected, and what metrics and report sections are produced.

| task | Label type | Estimator | Key metrics |
| --- | --- | --- | --- |
| "binary" | bool, or any numeric (> 0 = positive) | Classifier with predict_proba | Accuracy, ROC AUC, MCC, calibration, precision/threshold sweep |
| "multiclass" | Integer class label (-1, 0, +1, …) | Multi-class classifier with predict_proba | Accuracy, macro ROC AUC, MCC, per-class precision/recall/F1, N×N confusion matrix |
| "regression" | Continuous float | Regressor | MAE, RMSE, R², Spearman ρ |

Estimator requirements by task

  • "binary" and "multiclass": the estimator must implement predict_proba. For SVC, either set probability=True or wrap it in CalibratedClassifierCV.
  • "regression": pass any sklearn regressor (one for which sklearn.base.is_regressor returns True).
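These requirements can be checked before a long training run. The sketch below is a dependency-free, duck-typed pre-flight check. check_estimator and the two stand-in classes are hypothetical, and this is not how train_model validates its input. For the regression branch it only looks for a predict method; real code with scikit-learn installed would call sklearn.base.is_regressor, as stated above.

```python
CLASSIFICATION_TASKS = {"binary", "multiclass"}


def check_estimator(estimator, task: str) -> None:
    """Raise early if the estimator cannot serve the given task."""
    if task in CLASSIFICATION_TASKS:
        # Classification tasks need class probabilities.
        if not callable(getattr(estimator, "predict_proba", None)):
            raise TypeError(
                f"task={task!r} needs an estimator with predict_proba"
            )
    elif task == "regression":
        # Simplification: with scikit-learn available, prefer
        # sklearn.base.is_regressor(estimator) here.
        if not callable(getattr(estimator, "predict", None)):
            raise TypeError("task='regression' needs an estimator with predict")
    else:
        raise ValueError(f"unknown task: {task!r}")


class WithProba:
    """Stand-in for a classifier that exposes predict_proba."""
    def predict(self, X):
        return [0 for _ in X]

    def predict_proba(self, X):
        return [[0.5, 0.5] for _ in X]


class WithoutProba:
    """Stand-in for a classifier without probability output."""
    def predict(self, X):
        return [0 for _ in X]


check_estimator(WithProba(), "binary")         # passes silently
check_estimator(WithoutProba(), "regression")  # passes: it has predict
```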

TIP

If you are just starting out, use "binary". It is the simplest to reason about, trains the fastest, and produces the most actionable output (probability calibration + confidence threshold sweep) for live trading. Move to "multiclass" when you need to distinguish direction from a neutral/no-trade outcome, and to "regression" only when you need predicted return magnitudes for position sizing.
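To make the idea of a confidence threshold sweep concrete, here is a minimal sketch in plain Python. It is not the report that train_model produces: threshold_sweep is a hypothetical helper, and the probabilities and outcomes are made-up illustration values. For each threshold it counts how many signals would still be taken and what fraction of those were winners.

```python
def threshold_sweep(probabilities, labels, thresholds=(0.5, 0.6, 0.7, 0.8)):
    """For each threshold, keep samples whose predicted positive-class
    probability is at least the threshold, then report how many remain
    and their precision (share of kept samples with label 1)."""
    rows = []
    for t in thresholds:
        kept = [y for p, y in zip(probabilities, labels) if p >= t]
        n = len(kept)
        precision = sum(kept) / n if n else None  # None: nothing passed
        rows.append({"threshold": t, "signals": n, "precision": precision})
    return rows


# Made-up predicted probabilities and true outcomes (1 = win, 0 = loss).
probs = [0.55, 0.62, 0.71, 0.83, 0.90, 0.58]
wins = [0, 1, 0, 1, 1, 0]

for row in threshold_sweep(probs, wins):
    print(row)
```

Raising the threshold usually trades signal count for precision, which is why the sweep is useful when deciding how strictly to filter live entries.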

Label types

record_label(name, value) accepts three value types. All of them survive the CSV round-trip correctly: bool, int, and float are restored to their natural Python types when reloaded via load_ml_data_csv. They are interpreted by train_model according to the rules below.

| Python type | Example call | How "binary" task maps it | How "multiclass" task maps it | How "regression" task maps it |
| --- | --- | --- | --- | --- |
| bool | record_label("win", True) | True → class 1, False → class 0 | Not applicable; use int | Not applicable; use float |
| int | record_label("triple_barrier", 1) | > 0 → class 1, ≤ 0 → class 0 | Passed as-is via int(value) (-1, 0, 1) | Cast to float |
| float | record_label("return_pct", 0.034) | > 0 → class 1, ≤ 0 → class 0 | Not applicable; use "regression" | Passed as-is via float(value) |
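The mapping rules in the table can be written down as a small function. This is only a restatement of the table, not Jesse's internal code: map_label is a hypothetical name, and raising TypeError for the "not applicable" cells is our assumption, since this page does not say how train_model reacts to them.

```python
def map_label(value, task: str):
    """Map a recorded label value to a training target, following the
    rules in the table above."""
    if task == "binary":
        # bool, int, and float all reduce to: positive -> 1, otherwise 0.
        return 1 if value > 0 else 0
    if task == "multiclass":
        # Only integer class labels apply. bool is excluded explicitly
        # because bool is a subclass of int in Python.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError("multiclass expects an int class label")
        return int(value)
    if task == "regression":
        if isinstance(value, bool):
            raise TypeError("regression expects a numeric label, not bool")
        return float(value)  # int is cast, float passes through
    raise ValueError(f"unknown task: {task!r}")


print(map_label(True, "binary"))        # 1
print(map_label(-0.012, "binary"))      # 0
print(map_label(-1, "multiclass"))      # -1
print(map_label(1, "regression"))       # 1.0
```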

WARNING

Multiple labels per data point are not supported. Each call to record_label finalises the current data point and clears it. record_features called again after record_label starts a fresh data point; it does not append to the previous one. To predict two independent outcomes (e.g. direction and volatility regime), run separate gather + train passes, each with its own label name and CSV file.
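The finalise-and-clear behaviour is easier to see in a toy model. The class below is not Jesse's implementation: ToyRecorder is a hypothetical stand-in that only mimics the lifecycle described in the warning. Two details are our assumptions: a second record_features call before any label replaces the pending features, and a record_label call with no pending features raises an error.

```python
class ToyRecorder:
    """Toy stand-in that mimics one-label-per-data-point semantics."""

    def __init__(self):
        self.pending = None   # features waiting for their label
        self.samples = []     # finalised data points

    def record_features(self, features: dict) -> None:
        # Starts a fresh data point; nothing is appended to earlier ones.
        self.pending = dict(features)

    def record_label(self, name: str, value) -> None:
        if self.pending is None:
            # Assumption of this toy: labelling with nothing pending fails.
            raise RuntimeError("no pending features to label")
        self.samples.append(
            {"features": self.pending, "label_name": name, "label": value}
        )
        self.pending = None   # the data point is finalised and cleared


rec = ToyRecorder()
rec.record_features({"rsi": 0.31, "atr_ratio": 1.4})
rec.record_label("direction", 1)       # finalises the data point

try:
    rec.record_label("vol_regime", 0)  # a second label has nothing to attach to
except RuntimeError as exc:
    print("second label rejected:", exc)

print(len(rec.samples))
```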

Pages in this section

  • Gathering Data: how to write the gather-mode strategy and run gather_ml_data; vertical-barrier vs trade-based labelling patterns
  • Stationarity: why features must be stationary, common non-stationary pitfalls, and how to transform raw financial data into stationary inputs
  • Binary Classification: task="binary", boolean labels, calibration, confidence threshold sweep, estimator recommendations
  • Multiclass Classification: task="multiclass", triple-barrier labels, per-class metrics, the 0-class decision
  • Regression: task="regression", forward log-return targets, MAE/R²/Spearman, interpreting weak results
  • Meta-Labeling: secondary model that learns bet size on top of a primary directional signal; F1-score workflow, confidence-based position sizing, vertical-barrier gather pattern
  • Deploying in a Strategy: ml_features() as single source of truth, ml_predict() / ml_predict_proba() for zero-boilerplate inference, signal-first model calling, live trading considerations

train_model return value

train_model returns a dict with the following keys. Some keys are only present for specific task types.

| Key | Present for | Description |
| --- | --- | --- |
| model | all | Fitted estimator |
| scaler | all | Fitted StandardScaler |
| feature_names | all | Sorted list of feature names used in training |
| metrics | all | Task-specific metrics dict (see below) |
| feature_importance | all | RFE ranks, F-values, correlations, CV impacts, consensus ranks |
| feature_impact | all | Per-feature accuracy/MAE delta when retrained without each feature |
| train_test_info | all | Train/test sizes and date ranges |
| calibration | "binary" only | Probability calibration buckets list |
| class_weights | "binary" only | Suggested {0: float, 1: float} class weight dict |

Metrics dict keys by task

| Task | Metric keys |
| --- | --- |
| "binary" | accuracy, roc_auc, mcc, confusion_matrix, precision, recall, f1, support, tn, fp, fn, tp |
| "multiclass" | accuracy, roc_auc_macro, mcc, confusion_matrix, classes, precision, recall, f1, support |
| "regression" | mae, rmse, r2, spearman |
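For the "binary" task, several of these keys are related through the four confusion counts tn, fp, fn, and tp. The sketch below recomputes the standard positive-class definitions from those counts, which can serve as a sanity check when reading a report. binary_metrics_from_counts is a hypothetical helper and the counts are made up. Whether train_model stores precision, recall, and f1 as single numbers or per class is not stated on this page, so treat the output shape here as an assumption.

```python
import math


def binary_metrics_from_counts(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Standard binary metrics derived from confusion-matrix counts.
    Undefined ratios (zero denominator) are reported as 0.0."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Made-up counts for illustration.
print(binary_metrics_from_counts(tn=50, fp=10, fn=5, tp=35))
```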

We do NOT guarantee profitable trading results in any way. USE THE SOFTWARE AT YOUR OWN RISK. THE AUTHORS AND ALL AFFILIATES ASSUME NO RESPONSIBILITY FOR YOUR TRADING RESULTS. Do not risk money which you are afraid to lose. There might be bugs in the code - this software DOES NOT come with ANY warranty. All investments carry risk! Past performance is no guarantee of future results! Be aware of overfitting!