Evaluation
This module provides tools to evaluate the performance of predictive models.
BettorGridSearchCV(estimator, param_grid, *, scoring=None, n_jobs=None, refit=True, cv=TSCV, verbose=0, pre_dispatch='2*n_jobs', error_score=np.nan, return_train_score=False)
Bases: GridSearchCV, _BaseBettor
Exhaustive search over specified parameter values for a bettor.
BettorGridSearchCV implements a fit, a predict, a predict_proba, a bet and a score method.
The parameters of the bettor used to apply these methods are optimized by cross-validated grid-search over a parameter grid.
Read more in the user guide.
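Conceptually, an exhaustive grid search expands param_grid into every combination of parameter values, scores each candidate on every cross-validation fold and keeps the candidate with the highest mean score. The plain-Python sketch below illustrates that loop under stated assumptions: `expand_grid`, `grid_search` and `toy_score` are hypothetical helpers written for illustration, not part of sportsbet or scikit-learn, and BettorGridSearchCV itself delegates this work to scikit-learn's GridSearchCV.

```python
from itertools import product
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Fold = Tuple[List[int], List[int]]


def expand_grid(param_grid: Dict[str, Sequence]) -> List[Dict]:
    """Hypothetical helper: list every combination of parameter values."""
    keys = sorted(param_grid)  # sorted keys give a deterministic candidate order
    return [dict(zip(keys, values)) for values in product(*(param_grid[k] for k in keys))]


def grid_search(
    param_grid: Dict[str, Sequence],
    folds: Sequence[Fold],
    score_fn: Callable[[Dict, List[int], List[int]], float],
) -> Tuple[Dict, float]:
    """Hypothetical helper: return the candidate with the highest mean fold score."""
    best_params: Dict = {}
    best_score = float("-inf")
    for params in expand_grid(param_grid):
        # Score the candidate on every (train, test) fold, then average.
        score = mean(score_fn(params, train, test) for train, test in folds)
        if score > best_score:  # strict '>' keeps the first candidate on ties
            best_params, best_score = params, score
    return best_params, best_score


# Two expanding-window folds over 10 chronologically ordered samples.
folds = [([0, 1, 2, 3], [4, 5, 6]), ([0, 1, 2, 3, 4, 5, 6], [7, 8, 9])]


def toy_score(params: Dict, train: List[int], test: List[int]) -> float:
    # Illustration only: a made-up score that peaks at alpha = 0.05.
    return -abs(params["alpha"] - 0.05)


print(grid_search({"alpha": [0.02, 0.05, 0.1]}, folds, toy_score))
```

In the real class, the score of each candidate comes from the scoring parameter (or the bettor's own score method) rather than from a toy function.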
Parameters:
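Name | Type | Description | Default |
---|---|---|---|
estimator | _BaseBettor | This is assumed to implement the bettor interface. | required |
param_grid | dict \| list | Dictionary with parameter names (str) as keys and lists of parameter settings to try as values, or a list of such dictionaries, in which case the grids spanned by each dictionary in the list are explored. | required |
scoring | str \| Callable \| list \| tuple \| dict[str, Callable] \| None | Strategy to evaluate the performance of the cross-validated model on the test set. If scoring represents a single score, one can use a single string or a callable that returns a single value. If scoring represents multiple scores, one can use a list or tuple of unique strings, a callable returning a dictionary with metric names as keys and metric scores as values, or a dictionary with metric names as keys and callables as values. | None |
n_jobs | int \| None | Number of jobs to run in parallel. | None |
refit | bool \| str \| Callable | Refit an estimator using the best found parameters on the whole dataset. For multiple metric evaluation, this needs to be a str denoting the scorer that is used to find the best parameters for refitting the estimator at the end. Where there are considerations other than maximum score in choosing a best estimator, refit can be set to a function that returns the selected best_index_ given cv_results_; in that case, best_estimator_ and best_params_ are set according to the returned best_index_, while the best_score_ attribute is not available. The refitted estimator is made available at the best_estimator_ attribute and permits using predict directly on this instance. Also for multiple metric evaluation, the attributes best_index_, best_score_ and best_params_ are only available if refit is set, and all of them are determined with respect to this specific scorer. See the scoring parameter to know more about multiple metric evaluation. | True |
cv | TimeSeriesSplit | Provides train/test indices to split time series data samples that are observed at fixed time intervals into train/test sets. | TSCV |
verbose | int | Controls the verbosity: the higher, the more messages. | 0 |
pre_dispatch | int \| str | Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be: None, in which case all the jobs are immediately created and spawned; an int, giving the exact number of total jobs that are spawned; or a str, giving an expression as a function of n_jobs, as in '2*n_jobs'. | '2*n_jobs' |
error_score | str \| float \| int | Value to assign to the score if an error occurs in estimator fitting. If set to 'raise', the error is raised. If a numeric value is given, a FitFailedWarning is raised. This parameter does not affect the refit step, which always raises the error. | nan |
return_train_score | bool | If False, the cv_results_ attribute does not include training scores. Computing training scores gives insight into how different parameter settings affect the overfitting/underfitting trade-off, but it can be computationally expensive and is not strictly required to select the parameters that yield the best generalization performance. | False |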
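The default cross-validator splits the samples into expanding-window folds: each test fold comes strictly after its training data, so no future match leaks into training. The plain-Python sketch below reproduces the index arithmetic of scikit-learn's TimeSeriesSplit for its default settings (no gap, no maximum training size). `time_series_splits` is a hypothetical helper written for illustration, and it assumes the samples are already sorted by date.

```python
from typing import Iterator, List, Tuple


def time_series_splits(n_samples: int, n_splits: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Hypothetical helper mirroring TimeSeriesSplit's default index layout."""
    n_folds = n_splits + 1
    if n_folds > n_samples:
        raise ValueError("Cannot have more folds than samples.")
    test_size = n_samples // n_folds
    # Test folds are placed back to back at the end of the series.
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        train = list(range(test_start))  # everything observed before the test fold
        test = list(range(test_start, test_start + test_size))
        yield train, test


for train, test in time_series_splits(10, n_splits=2):
    print(train, test)
# [0, 1, 2, 3] [4, 5, 6]
# [0, 1, 2, 3, 4, 5, 6] [7, 8, 9]
```

Because the training window only grows, later folds are fitted on more history, which mimics how a bettor would be retrained as a season progresses.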
Attributes:
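Name | Type | Description |
---|---|---|
cv_results_ | | A dict with keys as column headers and values as columns, that can be imported into a pandas DataFrame. The key 'params' is used to store a list of parameter settings dicts for all the parameter candidates. The mean_fit_time, std_fit_time, mean_score_time and std_score_time values are all in seconds. For multi-metric evaluation, the scores for all the scorers are available in the cv_results_ dict at the keys ending with that scorer's name ('_<scorer_name>') instead of '_score'. |
best_estimator_ | | Estimator that was chosen by the search, i.e. the estimator which gave the highest score (or smallest loss if specified) on the left-out data. Not available if refit=False. |
best_score_ | | Mean cross-validated score of the best_estimator_. For multi-metric evaluation, this is present only if refit is specified. This attribute is not available if refit is a function. |
best_params_ | | Parameter setting that gave the best results on the hold-out data. For multi-metric evaluation, this is present only if refit is specified. |
best_index_ | | The index (of the cv_results_ arrays) which corresponds to the best candidate parameter setting. The dict at cv_results_['params'][best_index_] gives the parameter setting for the best model, the one with the highest mean score (best_score_). For multi-metric evaluation, this is present only if refit is specified. |
scorer_ | | Scorer function used on the held-out data to choose the best parameters for the model. For multi-metric evaluation, this attribute holds the validated scoring dict, which maps the scorer key to the scorer callable. |
n_splits_ | | The number of cross-validation splits (folds/iterations). |
refit_time_ | | Seconds used for refitting the best model on the whole dataset. This is present only if refit is not False. |
multimetric_ | | Whether or not the scorers compute several metrics. |
classes_ | list | The class labels. This is present only if refit is specified and the underlying estimator is a classifier. |
n_features_in_ | int | Number of features seen during fit. Only defined if best_estimator_ is defined and exposes n_features_in_ when fit. |
feature_names_in_ | list | Names of features seen during fit. Only defined if best_estimator_ is defined and exposes feature_names_in_ when fit. |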
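These attributes are linked: best_index_ points into the cv_results_ arrays, and best_params_ and best_score_ are read at that index. The sketch below shows the relationship on a hand-written, hypothetical cv_results_-like dictionary; the scores are made up for illustration (not real backtesting output), and it assumes a single metric where higher is better.

```python
from typing import Any, Dict, List

# Hypothetical cv_results_-like dict (made-up scores, one entry per candidate).
cv_results: Dict[str, List[Any]] = {
    "params": [{"alpha": 0.02}, {"alpha": 0.05}, {"alpha": 0.1}],
    "split0_test_score": [0.10, 0.30, 0.20],
    "split1_test_score": [0.20, 0.10, 0.30],
}
n_splits = 2
n_candidates = len(cv_results["params"])

# Mean test score of each candidate across the folds.
mean_test_score = [
    sum(cv_results[f"split{i}_test_score"][c] for i in range(n_splits)) / n_splits
    for c in range(n_candidates)
]
# Index of the highest mean score; max() returns the first one on ties.
best_index = max(range(n_candidates), key=mean_test_score.__getitem__)
best_params = cv_results["params"][best_index]
best_score = mean_test_score[best_index]
print(best_index, best_params, best_score)
# 2 {'alpha': 0.1} 0.25
```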
Examples:
>>> from sportsbet.evaluation import BettorGridSearchCV, OddsComparisonBettor, backtest
>>> from sportsbet.datasets import SoccerDataLoader
>>> from sklearn.model_selection import TimeSeriesSplit
>>> # Select only backtesting data for the Italian and Spanish leagues and years 2019 - 2022
>>> param_grid = {'league': ['Italy', 'Spain'], 'year': [2019, 2020, 2021, 2022]}
>>> dataloader = SoccerDataLoader(param_grid)
>>> # Select the market maximum odds
>>> X, Y, O = dataloader.extract_train_data(
... odds_type='market_maximum',
... )
>>> # Backtest the bettor
>>> bettor = BettorGridSearchCV(
... estimator=OddsComparisonBettor(),
... param_grid={'alpha': [0.02, 0.05, 0.1, 0.2, 0.3]},
... cv=TimeSeriesSplit(2),
... )
>>> backtest(bettor, X, Y, O, cv=TimeSeriesSplit(2)).reset_index()
Training start ... Yield percentage per bet (under_2.5__full_time_goals)
...
Source code in src/sportsbet/evaluation/_model_selection.py
bet(X, O)
Predict the value bets for the provided input data and odds.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | The input data. | required |
O | DataFrame | The odds data. | required |
Returns:
Name | Type | Description |
---|---|---|
B | BoolData | The value bets. |
Source code in src/sportsbet/evaluation/_model_selection.py
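A value bet is commonly defined as one with positive expected return: the predicted probability of the outcome multiplied by its decimal odds exceeds 1. The sketch below applies that rule element-wise to plain lists. It assumes decimal odds and illustrates the concept only; `value_bets` is a hypothetical helper, not sportsbet's internal implementation, which operates on DataFrames.

```python
from typing import List


def value_bets(probabilities: List[List[float]], odds: List[List[float]]) -> List[List[bool]]:
    """Hypothetical helper: flag outcomes with positive expected value.

    Each row is a match and each column an outcome; odds are decimal odds.
    """
    return [
        [proba * price > 1.0 for proba, price in zip(proba_row, odds_row)]
        for proba_row, odds_row in zip(probabilities, odds)
    ]


# Two matches, three outcomes each (e.g. home win, draw, away win).
probabilities = [[0.50, 0.30, 0.20], [0.40, 0.35, 0.25]]
odds = [[2.10, 3.20, 4.50], [2.40, 2.80, 4.20]]
print(value_bets(probabilities, odds))
# [[True, False, False], [False, False, True]]
```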
fit(X, Y, O=None)
Fit the bettor to the input data and multi-output targets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | The input data. | required |
Y | DataFrame | The multi-output targets. | required |
O | DataFrame \| None | The odds data. | None |
Returns:
Name | Type | Description |
---|---|---|
self | Self | The fitted bettor object. |
Source code in src/sportsbet/evaluation/_model_selection.py
predict(X)
Predict class labels for multi-output targets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | The input data. | required |
Returns:
Name | Type | Description |
---|---|---|
Y | BoolData | The positive class labels. |
Source code in src/sportsbet/evaluation/_model_selection.py
predict_proba(X)
Predict class probabilities for multi-output targets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | The input data. | required |
Returns:
Name | Type | Description |
---|---|---|
Y | Data | The positive class probabilities. |
Source code in src/sportsbet/evaluation/_model_selection.py
ClassifierBettor(classifier, betting_markets=None, init_cash=None, stake=None)
Bases: MetaEstimatorMixin, _BaseBettor
Bettor based on a Scikit-Learn classifier.
Read more in the user guide.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
classifier | BaseEstimator | A scikit-learn classifier object implementing | required |
betting_markets | list[str] \| None | Select the betting markets from the ones included in the data. | None |
init_cash | float \| None | The initial cash to use when betting. | None |
stake | float \| None | The stake of each bet. | None |
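init_cash and stake determine how the available cash evolves while betting. The sketch below assumes a flat-staking scheme, where every bet risks the same stake and a winning bet returns stake times the decimal odds. `bankroll_history` is a hypothetical helper and the bets are illustrative; sportsbet's own accounting (for example, how it handles several bets on the same date) may differ.

```python
from typing import List, Tuple


def bankroll_history(init_cash: float, stake: float, bets: List[Tuple[float, bool]]) -> List[float]:
    """Hypothetical helper: cash after each bet, given (decimal_odds, won) pairs."""
    cash = init_cash
    history = [cash]
    for odds, won in bets:
        cash -= stake  # the stake is always paid
        if won:
            cash += stake * odds  # a win returns stake times the decimal odds
        history.append(cash)
    return history


print(bankroll_history(1000.0, 50.0, [(2.0, True), (3.0, False), (1.5, True)]))
# [1000.0, 1050.0, 1000.0, 1025.0]
```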
Attributes:
Name | Type | Description |
---|---|---|
tscv_ | TimeSeriesSplit | The checked value of the time series cross-validator object. If |
init_cash_ | float | The checked value of the initial cash. If |
backtesting_results_ | DataFrame | The backtesting results. |
Examples:
>>> from sklearn.tree import DecisionTreeClassifier
>>> from sklearn.preprocessing import OneHotEncoder
>>> from sklearn.impute import SimpleImputer
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.compose import make_column_transformer
>>> from sportsbet.evaluation import ClassifierBettor, backtest
>>> from sportsbet.datasets import SoccerDataLoader
>>> # Select only backtesting data for the Italian league and years 2020, 2021
>>> param_grid = {'league': ['Italy'], 'year': [2020, 2021]}
>>> dataloader = SoccerDataLoader(param_grid)
>>> # Select the market average odds
>>> X, Y, O = dataloader.extract_train_data(
... odds_type='market_average',
... drop_na_thres=1.0
... )
>>> # Create a pipeline to handle categorical features and missing values
>>> clf_pipeline = make_pipeline(
... make_column_transformer(
... (OneHotEncoder(handle_unknown='ignore'), ['league', 'home_team', 'away_team']),
... remainder='passthrough'
... ),
... SimpleImputer(),
... DecisionTreeClassifier(random_state=0)
... )
>>> # Backtest the bettor
>>> bettor = ClassifierBettor(clf_pipeline)
>>> backtest(bettor, X, Y, O).reset_index()
Training start ... Yield percentage per bet (under_2.5__full_time_goals)
...
Source code in src/sportsbet/evaluation/_classifier.py
bet(X, O)
Predict the value bets for the provided input data and odds.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | The input data. | required |
O | DataFrame | The odds data. | required |
Returns:
Name | Type | Description |
---|---|---|
B | BoolData | The value bets. |
Source code in src/sportsbet/evaluation/_classifier.py
fit(X, Y, O=None)
Fit the bettor to the input data and multi-output targets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | The input data. | required |
Y | DataFrame | The multi-output targets. | required |
O | DataFrame \| None | The odds data. | None |
Returns:
Name | Type | Description |
---|---|---|
self | Self | The fitted bettor object. |
Source code in src/sportsbet/evaluation/_classifier.py
predict(X)
Predict class labels for multi-output targets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | The input data. | required |
Returns:
Name | Type | Description |
---|---|---|
Y | BoolData | The positive class labels. |
Source code in src/sportsbet/evaluation/_classifier.py
predict_proba(X)
Predict class probabilities for multi-output targets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | The input data. | required |
Returns:
Name | Type | Description |
---|---|---|
Y | Data | The positive class probabilities. |
Source code in src/sportsbet/evaluation/_classifier.py
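For binary outputs, the labels returned by predict are typically the most probable class per output, which amounts to thresholding the positive-class probabilities returned by predict_proba at 0.5. The sketch below shows that relationship on plain lists. It is an assumption about a typical scikit-learn classifier, not a guarantee for every classifier wrapped by ClassifierBettor, and `labels_from_probabilities` is a hypothetical helper.

```python
from typing import List


def labels_from_probabilities(probabilities: List[List[float]], threshold: float = 0.5) -> List[List[bool]]:
    """Hypothetical helper: positive label when the probability exceeds the threshold."""
    # Strict '>' mirrors argmax tie-breaking, which favours the negative class at exactly 0.5.
    return [[proba > threshold for proba in row] for row in probabilities]


# Rows are matches, columns are binary outputs (e.g. home win, draw, away win).
print(labels_from_probabilities([[0.7, 0.2, 0.1], [0.5, 0.45, 0.05]]))
# [[True, False, False], [False, False, False]]
```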
OddsComparisonBettor(odds_types=None, alpha=0.05, betting_markets=None, init_cash=None, stake=None)
Bases: _BaseBettor
Bettor based on comparison of odds.
It implements the betting strategy described in the paper "Beating the bookies with their own numbers". Predicted probabilities of events are based on the average of the selected odds types for the corresponding events, adjusted by a constant value called alpha. You can read more in the user guide.
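The rule can be made concrete with a small worked example. This sketch assumes that the consensus probability of an outcome is the inverse of the mean of the selected odds types, that subtracting alpha gives the predicted probability, and that a bet is placed when that probability times the offered odds exceeds 1. `consensus_value_bets` is a hypothetical helper, and sportsbet's implementation may differ in details such as the handling of missing odds.

```python
from statistics import mean
from typing import List


def consensus_value_bets(
    selected_odds: List[List[float]], offered_odds: List[float], alpha: float = 0.05
) -> List[bool]:
    """Hypothetical helper returning one bet decision per outcome.

    selected_odds[i] holds the odds of outcome i across the selected odds types;
    offered_odds[i] is the price at which the bet would actually be placed.
    """
    decisions = []
    for odds_row, offered in zip(selected_odds, offered_odds):
        consensus_proba = 1.0 / mean(odds_row)  # inverse of the average odds
        predicted_proba = consensus_proba - alpha  # shift by the constant alpha
        decisions.append(predicted_proba * offered > 1.0)
    return decisions


# Outcome 1: consensus ~0.50, predicted ~0.45, 0.45 * 2.4 ~ 1.08 -> bet.
# Outcome 2: consensus ~0.25, predicted ~0.20, 0.20 * 4.5 ~ 0.90 -> no bet.
print(consensus_value_bets([[1.95, 2.05], [3.9, 4.1]], [2.4, 4.5]))
# [True, False]
```

With alpha=0.0 the second outcome would also qualify (0.25 × 4.5 = 1.125), which shows how a larger alpha makes the rule more conservative.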
Parameters:
Name | Type | Description | Default |
---|---|---|---|
odds_types | list[str] \| None | The odds types to use for the calculation of consensus probabilities. The default value corresponds to | None |
alpha | float | An adjustment term that corresponds to the difference between the consensus and real probabilities. | 0.05 |
betting_markets | list[str] \| None | Select the betting markets from the ones included in the data. | None |
init_cash | float \| None | The initial cash to use when betting. | None |
stake | float \| None | The stake of each bet. | None |
|
Attributes:
Name | Type | Description |
---|---|---|
odds_types_ | Index | The checked value of the odds types. |
alpha_ | float | The checked value of the alpha parameter. |
output_keys_ | list[str] | The keys of the output columns. They are used to identify the consensus columns. |
backtesting_results_ | DataFrame | The backtesting results. |
Examples:
>>> from sportsbet.evaluation import OddsComparisonBettor, backtest
>>> from sportsbet.datasets import SoccerDataLoader
>>> # Select only backtesting data for the Italian and Spanish leagues and years 2019 - 2022
>>> param_grid = {'league': ['Italy', 'Spain'], 'year': [2019, 2020, 2021, 2022]}
>>> dataloader = SoccerDataLoader(param_grid)
>>> # Select the market maximum odds
>>> X, Y, O = dataloader.extract_train_data(
... odds_type='market_maximum',
... )
>>> # Backtest the bettor
>>> bettor = OddsComparisonBettor(alpha=0.03)
>>> backtest(bettor, X, Y, O).reset_index()
Training start ... Yield percentage per bet (under_2.5__full_time_goals)
...
Source code in src/sportsbet/evaluation/_rules.py
bet(X, O)
Predict the value bets for the provided input data and odds.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | The input data. | required |
O | DataFrame | The odds data. | required |
Returns:
Name | Type | Description |
---|---|---|
B | BoolData | The value bets. |
Source code in src/sportsbet/evaluation/_rules.py
fit(X, Y, O=None)
Fit the bettor to the input data and multi-output targets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | The input data. | required |
Y | DataFrame | The multi-output targets. | required |
O | DataFrame \| None | The odds data. | None |
Returns:
Name | Type | Description |
---|---|---|
self | Self | The fitted bettor object. |
Source code in src/sportsbet/evaluation/_rules.py
predict(X)
Predict class labels for multi-output targets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | The input data. | required |
Returns:
Name | Type | Description |
---|---|---|
Y | BoolData | The positive class labels. |
Source code in src/sportsbet/evaluation/_rules.py
predict_proba(X)
Predict class probabilities for multi-output targets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | The input data. | required |
Returns:
Name | Type | Description |
---|---|---|
Y | Data | The positive class probabilities. |
Source code in src/sportsbet/evaluation/_rules.py
backtest(bettor, X, Y, O, cv=None, n_jobs=-1, verbose=0)
Backtest the bettor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
bettor
|
_BaseBettor
|
The bettor object. |
required |
X
|
DataFrame
|
The input data. Each row of |
required |
Y
|
DataFrame
|
The multi-output targets. Each row of |
required |
O
|
DataFrame
|
The odds data. The column names follow the convention for the odds
data |
required |
cv
|
TimeSeriesSplit | None
|
Provides train/test indices to split time series data samples
that are observed at fixed time intervals, in train/test sets. The
default value of the parameter is |
None
|
n_jobs
|
int
|
Number of CPU cores to use when parallelizing the backtesting runs.
The default value of |
-1
|
verbose
|
int
|
The verbosity level. |
0
|
Returns:
Name | Type | Description |
---|---|---|
results |
DataFrame
|
The backtesting results. |
Source code in src/sportsbet/evaluation/_model_selection.py
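The backtesting results include, among other metrics, a yield percentage per bet for each betting market. In betting, yield is usually defined as net profit divided by the total amount staked. The sketch below computes it for flat stakes from hypothetical (decimal_odds, won) pairs; `yield_percentage` is a hypothetical helper, and the exact definition used by backtest is an assumption here.

```python
from typing import List, Tuple


def yield_percentage(bets: List[Tuple[float, bool]], stake: float = 1.0) -> float:
    """Hypothetical helper: net profit as a percentage of the total amount staked."""
    if not bets:
        return float("nan")  # yield is undefined when nothing was staked
    # A win earns stake * (odds - 1); a loss costs the stake.
    profit = sum(stake * (odds - 1.0) if won else -stake for odds, won in bets)
    return 100.0 * profit / (stake * len(bets))


print(round(yield_percentage([(2.5, True), (2.0, False), (3.0, False), (1.8, True)]), 2))
# 7.5
```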
load_bettor(path)
Load the bettor object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | The path of the pickled bettor file. | required |
Returns:
Name | Type | Description |
---|---|---|
bettor | _BaseBettor | The bettor object. |
Source code in src/sportsbet/evaluation/_base.py
save_bettor(bettor, path)
Save the bettor object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
bettor | _BaseBettor | The bettor object. | required |
path | str | The path to save the object. | required |
Returns:
Name | Type | Description |
---|---|---|
- | None | Nothing is returned; the bettor object is saved to the given path. |
Source code in src/sportsbet/evaluation/_base.py
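Since load_bettor reads a pickled file, a save/load round trip behaves like pickling and unpickling the object. The sketch below illustrates that round trip with the standard pickle module and a plain dict standing in for a fitted bettor (a hypothetical stand-in). It does not call save_bettor or load_bettor, whose exact serialization backend is not stated here.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted bettor: any picklable object behaves the same way.
bettor_state = {"alpha": 0.03, "odds_types": ["market_maximum"]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "bettor.pkl")
    # Save: serialize the object to a binary file.
    with open(path, "wb") as file:
        pickle.dump(bettor_state, file)
    # Load: rebuild an equal but independent copy from the file.
    with open(path, "rb") as file:
        restored = pickle.load(file)

print(restored == bettor_state, restored is bettor_state)
# True False
```

Unpickling can execute arbitrary code, so only load bettor files from sources you trust.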