Classifier bettor

This example illustrates how to use ClassifierBettor and evaluate its performance on historical soccer data.

# Author: Georgios Douzas <gdouzas@icloud.com>
# Licence: MIT

from sklearn.impute import SimpleImputer
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sportsbet.datasets import SoccerDataLoader
from sportsbet.evaluation import ClassifierBettor, backtest

Extracting the training data

We extract the training data for the Spanish soccer league. We also drop columns with too many missing values, as controlled by drop_na_thres, and select the market maximum odds.

dataloader = SoccerDataLoader(param_grid={'league': ['Spain'], 'year': [2020, 2021, 2022]})
X_train, Y_train, O_train = dataloader.extract_train_data(drop_na_thres=0.5, odds_type='market_maximum')
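
The param_grid argument reads like a scikit-learn parameter grid: each key lists candidate values, and data are selected for their combinations. A minimal sketch of that expansion in plain Python, assuming the usual Cartesian-product semantics (the helper name expand_grid is hypothetical and not part of sportsbet):

```python
from itertools import product


def expand_grid(param_grid):
    """Expand a dict of lists into a list of parameter combinations."""
    keys = sorted(param_grid)
    value_lists = [param_grid[key] for key in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]


# Same grid as in the example above: one league, three years.
combos = expand_grid({'league': ['Spain'], 'year': [2020, 2021, 2022]})
for combo in combos:
    print(combo)
```

Under this assumption, the grid above corresponds to three league and year combinations.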

The input data:

X_train

Out:

           league  ...  away__adj_goals_against__latest_avg
date               ...                                     
2019-08-16  Spain  ...                                  NaN
2019-08-17  Spain  ...                                  NaN
2019-08-17  Spain  ...                                  NaN
2019-08-17  Spain  ...                                  NaN
2019-08-17  Spain  ...                                  NaN
...           ...  ...                                  ...
2022-05-29  Spain  ...                             0.866667
2022-05-29  Spain  ...                             2.043333
2022-05-29  Spain  ...                             1.906667
2022-05-29  Spain  ...                             0.990000
2022-05-29  Spain  ...                             1.960000

[2526 rows x 39 columns]

The multi-output targets:

Y_train

Out:

      output__home_win__full_time_goals  ...  output__under_2.5__full_time_goals
0                                  True  ...                                True
1                                 False  ...                                True
2                                 False  ...                                True
3                                 False  ...                               False
4                                  True  ...                                True
...                                 ...  ...                                 ...
2521                              False  ...                               False
2522                              False  ...                                True
2523                               True  ...                                True
2524                              False  ...                               False
2525                               True  ...                               False

[2526 rows x 5 columns]

The odds data:

O_train

Out:

      odds__market_maximum__home_win__full_time_goals  ...  odds__market_maximum__under_2.5__full_time_goals
0                                                5.50  ...                                              2.11
1                                                2.55  ...                                              1.67
2                                                3.00  ...                                              1.52
3                                                1.56  ...                                              1.87
4                                                2.00  ...                                              1.70
...                                               ...  ...                                               ...
2521                                             1.61  ...                                              2.13
2522                                             4.70  ...                                              1.73
2523                                            10.00  ...                                              2.09
2524                                             6.00  ...                                              2.03
2525                                             1.38  ...                                              2.55

[2526 rows x 5 columns]

To keep the classifier simple, we keep only the numerical features of the input data:

num_cols = X_train.columns[['float' in col_type.name for col_type in X_train.dtypes]]
X_train = X_train[num_cols]

Classifier bettor

We can use the ClassifierBettor class to create a classifier-based bettor. The classifier is a pipeline of an imputer, which handles the missing values, and a KNN classifier.

clf = make_pipeline(SimpleImputer(), KNeighborsClassifier())
bettor = ClassifierBettor(clf)
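
SimpleImputer with its default settings replaces each missing value with the mean of its column. The following plain-Python sketch shows that step on a toy matrix; the helper mean_impute is hypothetical, and it assumes every column has at least one observed value:

```python
import math


def mean_impute(rows):
    """Replace NaN entries with the mean of the observed values in the same column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        # Assumes each column has at least one non-missing value.
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        means.append(sum(observed) / len(observed))
    return [
        [means[j] if math.isnan(value) else value for j, value in enumerate(row)]
        for row in rows
    ]


# Toy data, not taken from the soccer dataset.
X = [[1.0, float('nan')], [3.0, 4.0], [float('nan'), 8.0]]
X_imputed = mean_impute(X)
print(X_imputed)
```

Imputing before the KNN step matters because distance computations are not defined for rows that still contain missing values.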

Any bettor is a classifier; therefore, we can fit it on the training data.

_ = bettor.fit(X_train, Y_train)

We can predict the probability of the positive class for each target.

bettor.predict_proba(X_train)

Out:

array([[0.2, 0. , 0.8, 0.8, 0.2],
       [0.2, 0.4, 0.4, 0.2, 0.8],
       [0.2, 0.4, 0.4, 0. , 1. ],
       ...,
       [0.2, 0. , 0.8, 0.6, 0.4],
       [0.4, 0.2, 0.4, 0.2, 0.8],
       [0.8, 0. , 0.2, 0.8, 0.2]])
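
Every probability above is a multiple of 0.2. That is what one would expect from KNeighborsClassifier with its default of five uniformly weighted neighbors, where the predicted probability is the fraction of neighbors with a positive label. A small sketch of that calculation (the function name and the neighbor labels are made up for illustration):

```python
def knn_positive_fraction(neighbor_labels):
    """Estimate the positive-class probability as the share of positive neighbors."""
    return sum(neighbor_labels) / len(neighbor_labels)


k = 5  # default number of neighbors in KNeighborsClassifier
possible = [round(i / k, 1) for i in range(k + 1)]
print(possible)

# Hypothetical labels of the five nearest neighbors of one match.
example = knn_positive_fraction([True, True, True, True, False])
print(example)
```

With so few distinct values, the estimates are coarse; a larger number of neighbors would give a finer probability scale.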

We can also predict the class labels.

bettor.predict(X_train)

Out:

array([[False, False,  True,  True, False],
       [False, False, False, False,  True],
       [False, False, False, False,  True],
       ...,
       [False, False,  True,  True, False],
       [False, False, False, False,  True],
       [ True, False, False,  True, False]])
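
The labels shown above are consistent with thresholding each predicted probability at 0.5. Whether the library applies exactly this rule internally is an assumption; the sketch below only reproduces the displayed rows from the displayed probabilities:

```python
# First, second and last rows of the predict_proba output shown earlier.
proba_rows = [
    [0.2, 0.0, 0.8, 0.8, 0.2],
    [0.2, 0.4, 0.4, 0.2, 0.8],
    [0.8, 0.0, 0.2, 0.8, 0.2],
]

# Assumed rule: a target is predicted True when its probability exceeds 0.5.
labels = [[p > 0.5 for p in row] for row in proba_rows]
for row in labels:
    print(row)
```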

Finally, we can evaluate its cross-validation accuracy.

cross_val_score(bettor, X_train, Y_train, cv=TimeSeriesSplit(), scoring='accuracy').mean()

Out:

0.16342042755344416
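
The score looks low partly because of how accuracy is defined for multi-output targets: assuming the scorer treats the targets as multilabel, a row counts as correct only when every output is predicted correctly (subset accuracy). The sketch below contrasts this with per-target accuracy on toy data; both helper functions are hypothetical:

```python
def subset_accuracy(y_true, y_pred):
    """Share of rows where all outputs match exactly."""
    exact = sum(true_row == pred_row for true_row, pred_row in zip(y_true, y_pred))
    return exact / len(y_true)


def per_target_accuracy(y_true, y_pred):
    """Accuracy computed separately for each output column."""
    n_targets = len(y_true[0])
    return [
        sum(t[j] == p[j] for t, p in zip(y_true, y_pred)) / len(y_true)
        for j in range(n_targets)
    ]


# Toy targets and predictions, not taken from the soccer dataset.
y_true = [[True, False, False], [False, True, False], [False, False, True], [True, False, False]]
y_pred = [[True, False, False], [False, False, False], [False, False, True], [False, False, True]]

subset = subset_accuracy(y_true, y_pred)
per_target = per_target_accuracy(y_true, y_pred)
print(subset, per_target)
```

In this toy case each target is predicted correctly 75% of the time, yet only half of the rows are correct on all targets at once.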

Backtesting the bettor

We can backtest the bettor using the historical data.

backtesting_results = backtest(bettor, X_train, Y_train, O_train)
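
Backtesting on time-ordered data is typically done with expanding-window splits: the model is trained on everything up to a cut-off and tested on the period that follows. The sketch below mirrors the default index arithmetic of scikit-learn's TimeSeriesSplit with five splits; that backtest splits the data this way when no splitter is passed is an assumption, and the function name is hypothetical:

```python
def expanding_window_splits(n_samples, n_splits=5):
    """Return (train, test) index ranges with a growing training window."""
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    splits = []
    for test_start in range(first_test_start, n_samples, test_size):
        train = range(0, test_start)  # everything before the test window
        test = range(test_start, test_start + test_size)
        splits.append((train, test))
    return splits


# 2526 is the number of rows in the training data above.
splits = expanding_window_splits(2526)
for train, test in splits:
    print(len(train), len(test))
```

Each split trains only on matches that precede the test window, so later results never leak into the training data.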

Various backtesting statistics are calculated.

backtesting_results

Out:

                                                       Number of betting days  ...  Yield percentage per bet (under_2.5__full_time_goals)
Training start Training end Testing start Testing end                          ...                                                       
2019-08-16     2020-01-04   2020-01-04    2020-08-07                       71  ...                                                3.8    
               2020-08-07   2020-09-12    2021-01-23                       87  ...                                               -0.7    
               2021-01-23   2021-01-23    2021-05-30                       94  ...                                               -4.6    
               2021-05-30   2021-08-13    2022-01-02                       93  ...                                               -4.3    
               2022-01-02   2022-01-02    2022-05-29                       92  ...                                                0.6    

[5 rows x 15 columns]
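
The yield columns can be read as net profit relative to the amount staked. A minimal sketch, assuming flat unit stakes and the common definition of yield as profit divided by total stake, expressed as a percentage (the library's exact computation is not shown here, and all numbers below are made up):

```python
def yield_percentage(outcomes, odds, stake=1.0):
    """Net profit as a percentage of the total amount staked, with flat stakes."""
    total_staked = stake * len(outcomes)
    profit = sum(
        stake * (decimal_odds - 1.0) if won else -stake
        for won, decimal_odds in zip(outcomes, odds)
    )
    return 100.0 * profit / total_staked


# Four hypothetical bets: whether each bet won and its decimal odds.
outcomes = [True, False, True, False]
odds = [2.5, 1.8, 2.0, 3.0]
result = yield_percentage(outcomes, odds)
print(result)
```

A positive yield means the bets returned more than was staked over the testing period; a negative yield means a net loss.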

Estimating the value bets

We extract the fixtures data to estimate the value bets.

X_fix, _, Odds_fix = dataloader.extract_fixtures_data()
X_fix = X_fix[num_cols]
assert Odds_fix is not None

We can estimate the value bets by using the fitted classifier.

_ = bettor.bet(X_fix, Odds_fix)
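
A bet is usually called a value bet when the estimated probability of the outcome multiplied by the decimal odds exceeds 1, meaning the expected return per unit staked is positive. The sketch below applies that rule to made-up numbers; the function name is hypothetical, and the exact criterion used by bettor.bet is an assumption:

```python
def find_value_bets(probabilities, odds):
    """Flag outcomes whose expected return per unit staked is above 1."""
    return [p * o > 1.0 for p, o in zip(probabilities, odds)]


# Hypothetical probabilities and decimal odds for three outcomes of one match.
probabilities = [0.2, 0.0, 0.8]
odds = [5.5, 4.0, 1.2]
value_bets = find_value_bets(probabilities, odds)
print(value_bets)
```

Only the first outcome qualifies here: a 20% chance at odds of 5.5 has an expected return of 1.1 per unit staked, while the favorite at odds of 1.2 is priced below the model's estimate.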

Total running time of the script: ( 0 minutes 2.004 seconds)