Classifier bettor
This example illustrates how to use ClassifierBettor
and evaluate its performance on historical soccer data.
# Author: Georgios Douzas <gdouzas@icloud.com>
# Licence: MIT
from sklearn.impute import SimpleImputer
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sportsbet.datasets import SoccerDataLoader
from sportsbet.evaluation import ClassifierBettor, backtest
Extracting the training data
We extract the training data for the Spanish soccer league. We also drop columns with a high proportion of missing values, as controlled by drop_na_thres, and select the market maximum odds.
dataloader = SoccerDataLoader(param_grid={'league': ['Spain'], 'year': [2020, 2021, 2022]})
X_train, Y_train, O_train = dataloader.extract_train_data(drop_na_thres=0.5, odds_type='market_maximum')
The input data:
X_train
Out:
league ... away__adj_goals_against__latest_avg
date ...
2019-08-16 Spain ... NaN
2019-08-17 Spain ... NaN
2019-08-17 Spain ... NaN
2019-08-17 Spain ... NaN
2019-08-17 Spain ... NaN
... ... ... ...
2022-05-29 Spain ... 2.043333
2022-05-29 Spain ... 0.990000
2022-05-29 Spain ... 2.980000
2022-05-29 Spain ... 1.210000
2022-05-29 Spain ... 0.866667
[2526 rows x 39 columns]
The multi-output targets:
Y_train
Out:
output__home_win__full_time_goals ... output__under_2.5__full_time_goals
0 True ... True
1 False ... True
2 False ... True
3 False ... False
4 True ... True
... ... ... ...
2521 False ... True
2522 False ... False
2523 True ... False
2524 False ... True
2525 False ... False
[2526 rows x 5 columns]
The odds data:
O_train
Out:
odds__market_maximum__home_win__full_time_goals ... odds__market_maximum__under_2.5__full_time_goals
0 5.50 ... 2.11
1 2.55 ... 1.67
2 3.00 ... 1.52
3 1.56 ... 1.87
4 2.00 ... 1.70
... ... ... ...
2521 4.70 ... 1.73
2522 6.00 ... 2.03
2523 1.60 ... 2.10
2524 6.97 ... 2.00
2525 1.61 ... 2.13
[2526 rows x 5 columns]
In order to simplify the selected classifier, we keep only numerical features of the input data:
num_cols = X_train.columns[['float' in col_type.name for col_type in X_train.dtypes]]
X_train = X_train[num_cols]
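A similar selection can be written with pandas' `select_dtypes`. The sketch below uses a toy frame whose column names are made up for illustration and are not the loader's real columns. Note that `include="number"` is broader than the dtype-name check above, since it would also keep integer columns.

```python
import pandas as pd

# Toy stand-in for X_train; the column names are hypothetical.
X_toy = pd.DataFrame(
    {
        "league": ["Spain", "Spain"],
        "home__points__latest_avg": [1.5, 2.0],
        "away__points__latest_avg": [0.5, None],
    }
)

# Approach used in this example: keep columns whose dtype name contains 'float'.
mask = ["float" in col_type.name for col_type in X_toy.dtypes]
num_cols_toy = X_toy.columns[mask]

# Alternative: let pandas pick every numeric column (floats and integers).
num_cols_alt = X_toy.select_dtypes(include="number").columns

print(list(num_cols_toy))
print(list(num_cols_alt))
```

On this toy frame both approaches drop the string column `league` and keep the two float columns.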
Classifier bettor
We can use the ClassifierBettor class to create a classifier-based bettor. We use a pipeline of an imputer to handle missing values and a KNN classifier.
clf = make_pipeline(SimpleImputer(), KNeighborsClassifier())
bettor = ClassifierBettor(clf)
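With its default settings, `SimpleImputer()` replaces each missing value with the mean of its column. The plain-Python sketch below illustrates that computation only; the helper `mean_impute` is hypothetical and is not the scikit-learn implementation.

```python
import math


def mean_impute(rows):
    """Replace NaN entries by the mean of the observed values in each column.

    Assumes every column has at least one non-missing value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        means.append(sum(observed) / len(observed))
    return [
        [means[j] if math.isnan(value) else value for j, value in enumerate(row)]
        for row in rows
    ]


toy = [[1.0, math.nan], [3.0, 4.0], [math.nan, 8.0]]
imputed = mean_impute(toy)
print(imputed)
```

Placing the imputer inside the pipeline means the column means are learned from the training portion only each time the pipeline is refitted.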
Any bettor is a classifier, therefore we can fit it on the training data.
_ = bettor.fit(X_train, Y_train)
We can predict probabilities for the positive class.
bettor.predict_proba(X_train)
Out:
array([[0.2, 0. , 0.8, 0.8, 0.2],
[0.2, 0.4, 0.4, 0.2, 0.8],
[0.2, 0.4, 0.4, 0. , 1. ],
...,
[0.6, 0.2, 0.2, 0.6, 0.4],
[0.2, 0.2, 0.6, 0.4, 0.6],
[0.8, 0. , 0.2, 0.8, 0.2]])
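The probabilities above are all multiples of 0.2. This is consistent with `KNeighborsClassifier()` using its default of five neighbors with uniform weights, so that each probability is the share of the five nearest training samples with a positive label. A minimal sketch of that idea on one-dimensional toy data (the helper `knn_positive_proba` and the data are illustrative assumptions):

```python
def knn_positive_proba(train_x, train_y, query, k=5):
    """Fraction of the k nearest training points (1-D distance) labelled True."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    return sum(train_y[i] for i in nearest) / k


train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0]
train_y = [True, False, True, True, False, True]

# The five nearest points to 2.0 are 0.0 to 4.0; three of them are True.
proba = knn_positive_proba(train_x, train_y, query=2.0)
print(proba)
```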
We can also predict the class label.
bettor.predict(X_train)
Out:
array([[False, False, True, True, False],
[False, False, False, False, True],
[False, False, False, False, True],
...,
[ True, False, False, True, False],
[False, False, True, False, True],
[ True, False, False, True, False]])
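In the rows displayed above, a label is True exactly where the corresponding predicted probability exceeds 0.5. The sketch below reproduces the first two rows under the assumption that each output is thresholded independently at 0.5; it is not taken from the library's source.

```python
# First two rows of the predict_proba output shown above.
proba_rows = [
    [0.2, 0.0, 0.8, 0.8, 0.2],
    [0.2, 0.4, 0.4, 0.2, 0.8],
]

# Assumption: each output column is thresholded independently at 0.5.
label_rows = [[p > 0.5 for p in row] for row in proba_rows]
print(label_rows)
```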
Finally, we can evaluate its cross-validation accuracy.
cross_val_score(bettor, X_train, Y_train, cv=TimeSeriesSplit(), scoring='accuracy').mean()
Out:
0.16484560570071258
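For multi-label targets, scikit-learn's accuracy metric is the subset accuracy: a row counts as correct only when all of its labels are predicted correctly. This is a stricter criterion than the per-label match rate, which may explain part of the low score. A plain-Python sketch of the difference on made-up labels:

```python
def subset_accuracy(y_true, y_pred):
    """Share of rows whose predicted labels all match the true labels."""
    exact = [true_row == pred_row for true_row, pred_row in zip(y_true, y_pred)]
    return sum(exact) / len(exact)


def per_label_accuracy(y_true, y_pred):
    """Share of individual labels that match, ignoring row boundaries."""
    pairs = [
        (t, p)
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    ]
    return sum(t == p for t, p in pairs) / len(pairs)


# Illustrative labels only, not taken from the dataset.
y_true = [[True, False, False], [False, True, False], [False, False, True], [True, False, False]]
y_pred = [[True, False, False], [False, False, False], [False, True, True], [False, False, False]]

print(subset_accuracy(y_true, y_pred))
print(per_label_accuracy(y_true, y_pred))
```

Here only one of four rows is fully correct, even though nine of the twelve individual labels match.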
Backtesting the bettor
We can backtest the bettor using the historical data.
backtesting_results = backtest(bettor, X_train, Y_train, O_train)
Various backtesting statistics are calculated.
backtesting_results
Out:
                                                      Number of betting days  ...  Yield percentage per bet (under_2.5__full_time_goals)
Training start Training end Testing start Testing end                         ...
2019-08-16     2020-01-04   2020-01-04    2020-08-07                      71  ...   1.4
               2020-08-07   2020-09-12    2021-01-23                      87  ...   0.6
               2021-01-23   2021-01-23    2021-05-30                      94  ...   3.4
               2021-05-30   2021-08-13    2022-01-02                      93  ...  -2.5
               2022-01-02   2022-01-02    2022-05-29                      92  ...  -2.9
[5 rows x 15 columns]
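In the table above the training start is fixed while the training end moves forward, which is the expanding-window pattern that `TimeSeriesSplit` produces. The sketch below generates such splits in plain Python, mirroring the default `TimeSeriesSplit` arithmetic with no gap. That `backtest` uses exactly this scheme by default is an assumption based on the table, and the helper name is hypothetical.

```python
def expanding_window_splits(n_samples, n_splits=5):
    """Yield (train_indices, test_indices) with a growing training window."""
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        train = list(range(test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test


splits = list(expanding_window_splits(n_samples=12, n_splits=5))
for train, test in splits:
    print(f"train 0..{train[-1]}  test {test[0]}..{test[-1]}")
```

Each test block follows its training block in time, so the bettor is never evaluated on matches that precede its training data.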
Estimating the value bets
We extract the fixtures data to estimate the value bets.
X_fix, _, Odds_fix = dataloader.extract_fixtures_data()
X_fix = X_fix[num_cols]
assert Odds_fix is not None
We can estimate the value bets by using the fitted classifier.
_ = bettor.bet(X_fix, Odds_fix)
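A bet is commonly called a value bet when the estimated probability multiplied by the decimal odds exceeds 1, which means a positive expected profit per unit stake. The sketch below illustrates that rule with made-up numbers. Whether `bettor.bet` applies exactly this criterion is an assumption, and both helper functions are hypothetical.

```python
def is_value_bet(probability, decimal_odds):
    """True when the expected return per unit stake exceeds the stake."""
    return probability * decimal_odds > 1.0


def expected_profit(probability, decimal_odds):
    """Expected profit per unit stake: win (odds - 1) with p, lose 1 otherwise."""
    return probability * (decimal_odds - 1.0) - (1.0 - probability)


# Illustrative (probability, odds) pairs, not taken from the fixtures data.
candidates = [(0.6, 2.0), (0.2, 4.7)]
flags = [is_value_bet(p, o) for p, o in candidates]
profits = [expected_profit(p, o) for p, o in candidates]
print(flags)
print(profits)
```

The first pair has a positive expected profit and is flagged; the second has a negative one and is not.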