Classifier bettor

This example illustrates how to use ClassifierBettor and evaluate its performance on historical soccer data.

# Author: Georgios Douzas <gdouzas@icloud.com>
# Licence: MIT

from sklearn.impute import SimpleImputer
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

from sportsbet.datasets import SoccerDataLoader
from sportsbet.evaluation import ClassifierBettor, backtest

Extracting the training data

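We extract the training data for the Spanish soccer league. We also drop columns with too many missing values (controlled by drop_na_thres) and select the market maximum odds. Some missing values remain in the input data, as the output below shows.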

dataloader = SoccerDataLoader(param_grid={'league': ['Spain'], 'year': [2020, 2021, 2022]})
X_train, Y_train, O_train = dataloader.extract_train_data(drop_na_thres=0.5, odds_type='market_maximum')

The input data:

X_train

Out:

           league  ...  away__adj_goals_against__latest_avg
date               ...                                     
2019-08-16  Spain  ...                                  NaN
2019-08-17  Spain  ...                                  NaN
2019-08-17  Spain  ...                                  NaN
2019-08-17  Spain  ...                                  NaN
2019-08-17  Spain  ...                                  NaN
...           ...  ...                                  ...
2022-05-29  Spain  ...                             2.043333
2022-05-29  Spain  ...                             0.990000
2022-05-29  Spain  ...                             2.980000
2022-05-29  Spain  ...                             1.210000
2022-05-29  Spain  ...                             0.866667

[2526 rows x 39 columns]

The multi-output targets:

Y_train

Out:

      output__home_win__full_time_goals  ...  output__under_2.5__full_time_goals
0                                  True  ...                                True
1                                 False  ...                                True
2                                 False  ...                                True
3                                 False  ...                               False
4                                  True  ...                                True
...                                 ...  ...                                 ...
2521                              False  ...                                True
2522                              False  ...                               False
2523                               True  ...                               False
2524                              False  ...                                True
2525                              False  ...                               False

[2526 rows x 5 columns]

The odds data:

O_train

Out:

      odds__market_maximum__home_win__full_time_goals  ...  odds__market_maximum__under_2.5__full_time_goals
0                                                5.50  ...                                              2.11
1                                                2.55  ...                                              1.67
2                                                3.00  ...                                              1.52
3                                                1.56  ...                                              1.87
4                                                2.00  ...                                              1.70
...                                               ...  ...                                               ...
2521                                             4.70  ...                                              1.73
2522                                             6.00  ...                                              2.03
2523                                             1.60  ...                                              2.10
2524                                             6.97  ...                                              2.00
2525                                             1.61  ...                                              2.13

[2526 rows x 5 columns]

To keep the classifier simple, we keep only the numerical features of the input data, i.e. the columns with a float dtype:

num_cols = X_train.columns[['float' in col_type.name for col_type in X_train.dtypes]]
X_train = X_train[num_cols]
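
As a minimal sketch of what this filter does, the same expression can be applied to a small toy frame. The frame and its column names are made up for illustration and are not part of the sportsbet data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for X_train; the column names are hypothetical.
X_toy = pd.DataFrame(
    {
        'league': ['Spain', 'Spain', 'Spain'],
        'year': [2020, 2021, 2022],
        'home_goals_avg': [1.5, np.nan, 2.0],
        'away_goals_avg': [0.9, 1.1, np.nan],
    }
)

# Same expression as above: keep columns whose dtype name contains 'float'.
num_cols_toy = X_toy.columns[['float' in col_type.name for col_type in X_toy.dtypes]]
X_num = X_toy[num_cols_toy]
print(list(num_cols_toy))
```

Note that the filter matches on the dtype name, so integer columns such as `year` are dropped along with the text columns, while float columns are kept even when they contain missing values.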

Classifier bettor

We can use the ClassifierBettor class to create a classifier-based bettor. The classifier is a pipeline of an imputer, which handles the missing values, and a KNN classifier.

clf = make_pipeline(SimpleImputer(), KNeighborsClassifier())
bettor = ClassifierBettor(clf)
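
The pipeline itself is plain scikit-learn, so its behavior can be checked in isolation. The sketch below uses synthetic data and a single boolean target (both are assumptions made for illustration, not the sportsbet data). With default settings, SimpleImputer fills missing values with the column mean and KNeighborsClassifier votes among 5 neighbors, so every predicted probability is a multiple of 0.2.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Synthetic features and a single boolean target (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X[:, 0] > 0

# Knock out roughly 10% of the entries to mimic missing values.
X[rng.random(X.shape) < 0.1] = np.nan

# Same pipeline as above: mean imputation followed by 5-nearest neighbors.
clf = make_pipeline(SimpleImputer(), KNeighborsClassifier())
clf.fit(X, y)
proba = clf.predict_proba(X)
print(proba[:3])
```

This is consistent with the probabilities shown further down, which only take the values 0, 0.2, 0.4, 0.6, 0.8 and 1.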

Any bettor is a classifier, so we can fit it on the training data.

_ = bettor.fit(X_train, Y_train)

We can predict the probability of the positive class for each of the five targets.

bettor.predict_proba(X_train)

Out:

array([[0.2, 0. , 0.8, 0.8, 0.2],
       [0.2, 0.4, 0.4, 0.2, 0.8],
       [0.2, 0.4, 0.4, 0. , 1. ],
       ...,
       [0.6, 0.2, 0.2, 0.6, 0.4],
       [0.2, 0.2, 0.6, 0.4, 0.6],
       [0.8, 0. , 0.2, 0.8, 0.2]])

We can also predict the class labels.

bettor.predict(X_train)

Out:

array([[False, False,  True,  True, False],
       [False, False, False, False,  True],
       [False, False, False, False,  True],
       ...,
       [ True, False, False,  True, False],
       [False, False,  True, False,  True],
       [ True, False, False,  True, False]])

Finally, we can evaluate its cross-validation accuracy.

cross_val_score(bettor, X_train, Y_train, cv=TimeSeriesSplit(), scoring='accuracy').mean()

Out:

0.16484560570071258
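
The score looks low partly because of how accuracy is defined for multi-output targets. In scikit-learn, accuracy on multilabel targets is the subset accuracy: a row counts as correct only when every output matches. The sketch below, on made-up labels, contrasts this with the per-label accuracy. It assumes the 'accuracy' scorer treats the bettor's five boolean outputs this way.

```python
import numpy as np

# Hypothetical true and predicted labels for four matches and five targets.
y_true = np.array([
    [True, False, False, True, False],
    [False, True, False, False, True],
    [False, False, True, True, False],
    [True, False, False, False, True],
])
y_pred = np.array([
    [True, False, False, True, False],  # all five outputs match
    [False, False, True, False, True],  # two outputs differ
    [False, False, True, False, True],  # two outputs differ
    [True, False, False, False, True],  # all five outputs match
])

# Subset accuracy: a row is correct only if all of its outputs are correct.
subset_accuracy = np.all(y_true == y_pred, axis=1).mean()

# Per-label accuracy: fraction of individual outputs that are correct.
per_label_accuracy = (y_true == y_pred).mean()

print(subset_accuracy, per_label_accuracy)
```

Here 16 of the 20 individual outputs are correct, but only 2 of the 4 rows are fully correct, so the subset accuracy is much lower than the per-label accuracy.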

Backtesting the bettor

We can backtest the bettor using the historical data.

backtesting_results = backtest(bettor, X_train, Y_train, O_train)
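
Backtesting on time-ordered data is usually done walk-forward: train on the past, test on the period that follows, then extend the training window. The sketch below shows this splitting scheme with TimeSeriesSplit, the same splitter used for cross-validation above. Whether backtest uses exactly this splitter by default is an assumption here.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Twelve time-ordered samples standing in for matches sorted by date.
X = np.arange(12).reshape(-1, 1)

# Each split trains on an expanding window and tests on the block after it.
tscv = TimeSeriesSplit(n_splits=3)
splits = [(train.tolist(), test.tolist()) for train, test in tscv.split(X)]

for train, test in splits:
    print('train:', train, 'test:', test)
```

Every test block starts after its training window ends, so the model is never evaluated on matches that precede its training data.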

Various backtesting statistics are calculated.

backtesting_results

Out:

                                                       Number of betting days  ...  Yield percentage per bet (under_2.5__full_time_goals)
Training start Training end Testing start Testing end                          ...                                                       
2019-08-16     2020-01-04   2020-01-04    2020-08-07                       71  ...                                                1.4    
               2020-08-07   2020-09-12    2021-01-23                       87  ...                                                0.6    
               2021-01-23   2021-01-23    2021-05-30                       94  ...                                                3.4    
               2021-05-30   2021-08-13    2022-01-02                       93  ...                                               -2.5    
               2022-01-02   2022-01-02    2022-05-29                       92  ...                                               -2.9    

[5 rows x 15 columns]
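
The yield columns can be read as profit relative to the amount staked. The sketch below uses the common definition, net profit divided by total stake, on made-up bets with a unit stake. The exact formula sportsbet applies is not shown in this example, so treat this as an assumption.

```python
import numpy as np

# Hypothetical decimal odds and outcomes for four placed bets.
odds = np.array([2.10, 1.80, 3.50, 1.95])
won = np.array([True, False, False, True])
stake = 1.0

# A winning bet returns stake * (odds - 1); a losing bet loses the stake.
profits = np.where(won, stake * (odds - 1), -stake)

# Yield percentage per bet: net profit as a percentage of the total stake.
yield_pct = 100 * profits.sum() / (stake * len(odds))
print(round(yield_pct, 2))
```

With two wins and two losses the net profit is 0.05 units on 4 units staked, i.e. a yield of 1.25%.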

Estimating the value bets

We extract the fixtures data to estimate the value bets.

X_fix, _, Odds_fix = dataloader.extract_fixtures_data()
X_fix = X_fix[num_cols]
assert Odds_fix is not None

We can estimate the value bets by using the fitted classifier.

_ = bettor.bet(X_fix, Odds_fix)
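
A value bet is conventionally one whose expected return is positive, i.e. the predicted probability times the decimal odds exceeds 1. The sketch below applies that textbook rule to made-up probabilities and odds. It is not taken from the sportsbet source, and the rule used by bettor.bet may differ.

```python
import numpy as np

# Hypothetical predicted probabilities (home win, draw, away win) for two fixtures.
proba = np.array([[0.6, 0.2, 0.2], [0.2, 0.4, 0.4]])

# Hypothetical decimal odds for the same outcomes.
odds = np.array([[1.9, 3.6, 4.5], [2.5, 3.2, 2.4]])

# Expected return per unit stake: probability * odds - 1.
expected_return = proba * odds - 1

# Flag the outcomes with a positive expected return.
value_bets = expected_return > 0
print(value_bets)
```

In this toy case the home win of the first fixture (0.6 * 1.9 = 1.14) and the draw of the second (0.4 * 3.2 = 1.28) are flagged as value bets.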
