Classifier bettor
This example illustrates how to use the ClassifierBettor class
and evaluate its performance on historical soccer data.
# Author: Georgios Douzas <gdouzas@icloud.com>
# Licence: MIT
from sklearn.impute import SimpleImputer
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sportsbet.datasets import SoccerDataLoader
from sportsbet.evaluation import ClassifierBettor, backtest
Extracting the training data
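We extract the training data for the Spanish soccer league. The earliest match in the output is played in August 2019, so each year in the parameter grid refers to the season that ends in it. We also drop input columns with too many missing values, as controlled by drop_na_thres (some missing values remain), and select the market maximum odds.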
dataloader = SoccerDataLoader(param_grid={'league': ['Spain'], 'year': [2020, 2021, 2022]})
X_train, Y_train, O_train = dataloader.extract_train_data(drop_na_thres=0.5, odds_type='market_maximum')
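As a rough illustration of what a missing-value threshold does, the sketch below keeps only the columns whose fraction of non-missing values reaches a threshold. The helper drop_sparse_columns is hypothetical, and the exact rule that extract_train_data applies for drop_na_thres may differ; consult the sportsbet documentation for its precise semantics.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, thres: float) -> pd.DataFrame:
    """Hypothetical helper: keep columns whose non-missing ratio is at least `thres`."""
    # Fraction of non-NaN values in each column.
    non_missing_ratio = df.notna().mean()
    return df.loc[:, non_missing_ratio >= thres]


# Toy data: column 'b' is 75% missing, column 'c' is 25% missing.
df = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, 4.0],
    'b': [np.nan, np.nan, np.nan, 1.0],
    'c': [np.nan, 2.0, 3.0, 4.0],
})
print(list(drop_sparse_columns(df, 0.5).columns))  # ['a', 'c']
```

Under this rule a threshold of 0.0 keeps every column and 1.0 keeps only fully observed columns.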
The input data:
X_train
Out:
league ... away__adj_goals_against__latest_avg
date ...
2019-08-16 Spain ... NaN
2019-08-17 Spain ... NaN
2019-08-17 Spain ... NaN
2019-08-17 Spain ... NaN
2019-08-17 Spain ... NaN
... ... ... ...
2022-05-29 Spain ... 0.866667
2022-05-29 Spain ... 2.043333
2022-05-29 Spain ... 1.906667
2022-05-29 Spain ... 0.990000
2022-05-29 Spain ... 1.960000
[2526 rows x 39 columns]
The multi-output targets:
Y_train
Out:
output__home_win__full_time_goals ... output__under_2.5__full_time_goals
0 True ... True
1 False ... True
2 False ... True
3 False ... False
4 True ... True
... ... ... ...
2521 False ... False
2522 False ... True
2523 True ... True
2524 False ... False
2525 True ... False
[2526 rows x 5 columns]
The odds data:
O_train
Out:
odds__market_maximum__home_win__full_time_goals ... odds__market_maximum__under_2.5__full_time_goals
0 5.50 ... 2.11
1 2.55 ... 1.67
2 3.00 ... 1.52
3 1.56 ... 1.87
4 2.00 ... 1.70
... ... ... ...
2521 1.61 ... 2.13
2522 4.70 ... 1.73
2523 10.00 ... 2.09
2524 6.00 ... 2.03
2525 1.38 ... 2.55
[2526 rows x 5 columns]
To keep the classifier simple, we retain only the numerical (float) features of the input data:
num_cols = X_train.columns[['float' in col_type.name for col_type in X_train.dtypes]]
X_train = X_train[num_cols]
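The list comprehension above matches every column whose dtype name contains "float". A possibly more idiomatic alternative, shown here on a made-up toy frame rather than the real X_train, is pandas' select_dtypes with np.floating, which selects all floating-point columns.

```python
import numpy as np
import pandas as pd

# Toy frame mixing a string column, float columns and an integer column.
X = pd.DataFrame({
    'league': ['Spain', 'Spain'],
    'home__avg_goals': [1.2, 0.8],
    'away__avg_goals': [0.5, 1.1],
    'matchday': [1, 2],
})

# Selection used in the example: dtype name contains 'float'.
num_cols = X.columns[['float' in col_type.name for col_type in X.dtypes]]

# Alternative: let pandas filter on the floating-point dtype family.
num_cols_alt = X.select_dtypes(include=[np.floating]).columns

print(list(num_cols))  # ['home__avg_goals', 'away__avg_goals']
```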
Classifier bettor
We can use the ClassifierBettor class to create a classifier-based bettor. The classifier is a pipeline that chains an imputer, which fills in missing values, with a KNN classifier.
clf = make_pipeline(SimpleImputer(), KNeighborsClassifier())
bettor = ClassifierBettor(clf)
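SimpleImputer() uses strategy='mean' by default: during fit it learns the mean of each column, ignoring missing values, and transform replaces missing entries with those learned means. The pandas sketch below mimics that behaviour on toy data; it is only an illustration of the idea, not a substitute for the imputer inside the pipeline.

```python
import numpy as np
import pandas as pd

# Toy training data with one missing value per column.
X = pd.DataFrame({'f1': [1.0, np.nan, 3.0], 'f2': [np.nan, 4.0, 6.0]})

# "fit": one mean per column (DataFrame.mean skips NaN by default).
train_means = X.mean()

# "transform": fill missing entries with the training means.
X_imputed = X.fillna(train_means)

# New data is filled with the means learned on the training data.
X_new = pd.DataFrame({'f1': [np.nan], 'f2': [np.nan]})
X_new_imputed = X_new.fillna(train_means)

print(X_imputed.to_numpy())
```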
Any bettor is a classifier, so we can fit it on the training data.
_ = bettor.fit(X_train, Y_train)
We can predict the probability of the positive class for each output.
bettor.predict_proba(X_train)
Out:
array([[0.2, 0. , 0.8, 0.8, 0.2],
[0.2, 0.4, 0.4, 0.2, 0.8],
[0.2, 0.4, 0.4, 0. , 1. ],
...,
[0.2, 0. , 0.8, 0.6, 0.4],
[0.4, 0.2, 0.4, 0.2, 0.8],
[0.8, 0. , 0.2, 0.8, 0.2]])
We can also predict the class label.
bettor.predict(X_train)
Out:
array([[False, False, True, True, False],
[False, False, False, False, True],
[False, False, False, False, True],
...,
[False, False, True, True, False],
[False, False, False, False, True],
[ True, False, False, True, False]])
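The probabilities above are multiples of 0.2 because KNeighborsClassifier uses five neighbours with uniform weights by default, so each value is the fraction of the five nearest matches with a positive outcome. The printed labels agree with thresholding those probabilities at 0.5, which is what a majority vote over an odd number of neighbours amounts to. The check below applies that threshold to the six rows printed above; reading predict as predict_proba > 0.5 is an assumption about this particular setup, not a documented property of ClassifierBettor.

```python
import numpy as np

# The six rows of bettor.predict_proba(X_train) shown in the output above.
proba = np.array([
    [0.2, 0.0, 0.8, 0.8, 0.2],
    [0.2, 0.4, 0.4, 0.2, 0.8],
    [0.2, 0.4, 0.4, 0.0, 1.0],
    [0.2, 0.0, 0.8, 0.6, 0.4],
    [0.4, 0.2, 0.4, 0.2, 0.8],
    [0.8, 0.0, 0.2, 0.8, 0.2],
])

# Majority vote among 5 neighbours: positive when at least 3 of 5 agree.
labels = proba > 0.5
print(labels)
```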
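Finally, we can evaluate its cross-validation accuracy. TimeSeriesSplit preserves the temporal order of the matches, so each fold is trained on earlier matches and tested on later ones.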
cross_val_score(bettor, X_train, Y_train, cv=TimeSeriesSplit(), scoring='accuracy').mean()
Out:
0.16342042755344416
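Because Y_train has five boolean outputs, scikit-learn's 'accuracy' scorer computes subset accuracy: a match counts as correct only if all five outputs are predicted correctly. This is much stricter than the fraction of correctly predicted individual labels, which helps explain the low score. The arrays below are made up purely to compare the two measures.

```python
import numpy as np

# Hypothetical true and predicted labels for 4 matches and 5 outputs.
y_true = np.array([
    [True, False, False, True, False],
    [False, True, False, False, True],
    [False, False, True, True, False],
    [True, False, False, False, True],
])
y_pred = np.array([
    [True, False, False, True, False],   # all 5 outputs correct
    [False, False, True, False, True],   # 3 of 5 correct
    [False, False, True, False, True],   # 3 of 5 correct
    [True, False, False, False, False],  # 4 of 5 correct
])

# Subset accuracy: a row counts only if every output matches.
subset_acc = np.mean(np.all(y_true == y_pred, axis=1))

# Per-label accuracy: fraction of individual labels that match.
label_acc = np.mean(y_true == y_pred)

print(subset_acc, label_acc)  # 0.25 0.75
```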
Backtesting the bettor
We can backtest the bettor using the historical data.
backtesting_results = backtest(bettor, X_train, Y_train, O_train)
Various statistics are calculated for each backtesting period.
backtesting_results
Out:
Number of betting days ... Yield percentage per bet (under_2.5__full_time_goals)
Training start Training end Testing start Testing end ...
2019-08-16 2020-01-04 2020-01-04 2020-08-07 71 ... 3.8
2020-08-07 2020-09-12 2021-01-23 87 ... -0.7
2021-01-23 2021-01-23 2021-05-30 94 ... -4.6
2021-05-30 2021-08-13 2022-01-02 93 ... -4.3
2022-01-02 2022-01-02 2022-05-29 92 ... 0.6
[5 rows x 15 columns]
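Among the statistics is a yield percentage per bet for each market. A common definition of yield, assumed here and not taken from the sportsbet source, is the net profit divided by the total amount staked, times 100. With a unit stake at decimal odds o, a winning bet earns o - 1 and a losing bet loses the stake. The helper yield_percentage and the bets below are hypothetical.

```python
def yield_percentage(odds, won):
    """Hypothetical helper: yield (%) of unit-stake bets at decimal odds.

    odds: decimal odds of each placed bet
    won:  whether each bet won
    """
    if not odds:
        return 0.0
    # Net profit per bet: (odds - 1) when it wins, -1 when it loses.
    profit = sum((o - 1.0) if w else -1.0 for o, w in zip(odds, won))
    # Each stake is 1, so the total stake equals the number of bets.
    return 100.0 * profit / len(odds)


# Three hypothetical bets: wins at 2.10 and 1.80, a loss at 2.50.
print(round(yield_percentage([2.10, 1.80, 2.50], [True, True, False]), 1))  # 30.0
```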
Estimating the value bets
We extract the fixtures data to estimate the value bets.
X_fix, _, Odds_fix = dataloader.extract_fixtures_data()
X_fix = X_fix[num_cols]
assert Odds_fix is not None
We can estimate the value bets by using the fitted bettor.
_ = bettor.bet(X_fix, Odds_fix)
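A bet is usually called a value bet when the estimated probability of the outcome times the offered decimal odds exceeds 1, meaning its expected profit per unit stake is positive. The sketch below applies that rule to hypothetical probabilities and odds; whether bet uses exactly this rule, or adds a margin, is an assumption not confirmed by this example.

```python
import numpy as np

# Hypothetical estimated probabilities (rows: fixtures, columns: outcomes).
proba = np.array([
    [0.60, 0.20, 0.20],
    [0.30, 0.30, 0.40],
])
# Hypothetical decimal odds offered for the same outcomes.
odds = np.array([
    [1.50, 4.00, 6.00],
    [3.50, 3.20, 2.20],
])

# Expected profit per unit stake: p * o - 1.
expected_profit = proba * odds - 1.0

# Flag the outcomes with positive expected profit.
value_bets = expected_profit > 0
print(value_bets)
```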