Datasets
This module provides the tools to extract sports betting data.
DummySoccerDataLoader(param_grid=None)
Bases: _BaseDataLoader
Dataloader for soccer dummy data.
The data are provided only for convenience, since they require no downloading, and to familiarize the user with the methods of the dataloader objects.
Read more in the user guide.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
param_grid | ParamGrid \| None | It selects the type of information that the data include. The keys of the dictionaries might be parameters like 'league' or 'year'. | None |
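The attributes table lists the checked grid as a ParameterGrid, which suggests that param_grid expands into single parameter combinations in the way scikit-learn's ParameterGrid does. The following is a minimal sketch of that expansion under this assumption; `expand_grid` is a hypothetical helper written only for illustration and is not part of sportsbet.

```python
from itertools import product


def expand_grid(param_grid):
    """Expand a dict (or a list of dicts) of allowed values into single combinations."""
    grids = [param_grid] if isinstance(param_grid, dict) else list(param_grid)
    combinations = []
    for grid in grids:
        keys = sorted(grid)
        # One combination per element of the Cartesian product of the allowed values.
        for values in product(*(grid[key] for key in keys)):
            combinations.append(dict(zip(keys, values)))
    return combinations


# Two leagues for a single year give two parameter combinations.
selected = expand_grid({'league': ['England', 'Spain'], 'year': [2020]})
print(selected)
# [{'league': 'England', 'year': 2020}, {'league': 'Spain', 'year': 2020}]
```

Under the same assumption, a list of dictionaries would describe a union of grids, each expanded independently.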
Attributes:
Name | Type | Description |
---|---|---|
param_grid_ | ParameterGrid | The checked value of the parameters grid. It includes all possible parameters if param_grid is None. |
dropped_na_cols_ | Index | The columns with missing values that are dropped. |
drop_na_thres_ | float | The checked value of drop_na_thres. |
odds_type_ | str \| None | The checked value of odds_type. |
input_cols_ | Index | The columns of X. |
output_cols_ | Index | The columns of Y. |
odds_cols_ | Index | The columns of O. |
target_cols_ | Index | The columns used for the extraction of output and odds columns. |
train_data_ | TrainData | The tuple (X, Y, O) that represents the training data as extracted from the method extract_train_data. |
fixtures_data_ | FixturesData | The tuple (X, Y, O) that represents the fixtures data as extracted from the method extract_fixtures_data. |
Examples:
>>> from sportsbet.datasets import DummySoccerDataLoader
>>> import pandas as pd
>>> # Get all available parameters to select the training data
>>> DummySoccerDataLoader.get_all_params()
[{'division': 1, 'year': 1998}, ...
>>> # Select only the training data for the Spanish league
>>> dataloader = DummySoccerDataLoader(param_grid={'league': ['Spain']})
>>> # Get available odds types
>>> dataloader.get_odds_types()
['interwetten', 'williamhill']
>>> # Select the odds of Interwetten bookmaker for training data
>>> X_train, Y_train, O_train = dataloader.extract_train_data(
... odds_type='interwetten')
>>> # Extract the corresponding fixtures data
>>> X_fix, Y_fix, O_fix = dataloader.extract_fixtures_data()
>>> # Training and fixtures input and odds data have the same column names
>>> pd.testing.assert_index_equal(X_train.columns, X_fix.columns)
>>> pd.testing.assert_index_equal(O_train.columns, O_fix.columns)
>>> # Fixtures data never include an output
>>> Y_fix is None
True
Source code in src/sportsbet/datasets/_dummy.py
extract_fixtures_data()
Extract the fixtures data.
Read more in the user guide.
It returns fixtures data that can be used to make predictions for upcoming matches based on a betting strategy.
Before calling the extract_fixtures_data method for the first time, the extract_train_data method should be called, in order to match the columns of the input, output and odds data.
The data contain information about the matches known before the start of the match, i.e. the input data X and the odds data O. The multi-output targets Y are always equal to None and are only included for consistency with the method extract_train_data.
The param_grid parameter of the initialization method has no effect on the fixtures data.
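The call order described above can be illustrated with a minimal sketch. `TinyLoader` is a hypothetical stand-in, not the sportsbet implementation: it remembers the input columns during the training extraction and reuses them for the fixtures. Raising an error when the order is violated is an assumption of the sketch, and the odds data are left out to keep it short.

```python
class TinyLoader:
    """Hypothetical stand-in for a dataloader; not the sportsbet implementation."""

    def __init__(self, train_rows, fixtures_rows):
        self._train_rows = train_rows
        self._fixtures_rows = fixtures_rows
        self.input_cols_ = None  # set by extract_train_data

    def extract_train_data(self):
        # Remember the input columns so that the fixtures data can reuse them.
        self.input_cols_ = [col for col in self._train_rows[0] if col != 'target']
        X = [{col: row[col] for col in self.input_cols_} for row in self._train_rows]
        Y = [row['target'] for row in self._train_rows]
        return X, Y

    def extract_fixtures_data(self):
        if self.input_cols_ is None:
            # Assumed behaviour: fail loudly when the column layout is unknown.
            raise RuntimeError('Call extract_train_data first.')
        # Keep only the training columns, in the training order.
        X = [{col: row.get(col) for col in self.input_cols_} for row in self._fixtures_rows]
        return X, None


# Illustrative rows; the column names and values are made up.
train_rows = [{'home_team': 'Barcelona', 'away_team': 'Real Madrid', 'target': 1}]
fixtures_rows = [{'away_team': 'Valencia', 'home_team': 'Sevilla', 'referee': 'unknown'}]

loader = TinyLoader(train_rows, fixtures_rows)
X_train, Y_train = loader.extract_train_data()
X_fix, Y_fix = loader.extract_fixtures_data()
print(list(X_fix[0]) == list(X_train[0]), Y_fix)
```

The fixtures rows end up with the same columns, in the same order, as the training rows, and the extra fixtures-only column is ignored.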
Returns:
Type | Description |
---|---|
(X, None, O) | Each of the components represents, respectively, the fixtures input data X, the multi-output targets Y (always None) and the odds data O. |
Source code in src/sportsbet/datasets/_dummy.py
extract_train_data(drop_na_thres=0.0, odds_type=None)
Extract the training data.
Read more in the user guide.
It returns historical data that can be used to create a betting strategy based on heuristics or machine learning models.
The data contain information about the matches that belongs to two categories. The first category includes any information known before the start of the match, i.e. the training data X and the odds data O. The second category includes the outcomes of matches, i.e. the multi-output targets Y.
The method selects only the data allowed by the param_grid parameter of the initialization method. Additionally, columns with missing values are dropped through the drop_na_thres parameter, while the type of odds returned is defined by the odds_type parameter.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
drop_na_thres | float | The threshold that specifies the input columns to drop. It is a float in the [0.0, 1.0] range. | 0.0 |
odds_type | str \| None | The selected odds type. It should be one of the available odds column prefixes returned by the method get_odds_types. | None |
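How a threshold can select the columns to drop is easier to see on a toy example. The sketch below assumes that a column is kept when its fraction of non-missing values is at least drop_na_thres, so that 0.0 keeps every column and 1.0 keeps only complete columns. This rule is an assumption, and `drop_sparse_columns` as well as the column names are hypothetical.

```python
def drop_sparse_columns(rows, drop_na_thres=0.0):
    """Return the rows without sparse columns, plus the names of the dropped columns."""
    kept, dropped = [], []
    for col in rows[0]:
        # Fraction of rows in which the column has a value.
        non_missing = sum(row[col] is not None for row in rows) / len(rows)
        # Assumed rule: keep the column when enough of its values are present.
        (kept if non_missing >= drop_na_thres else dropped).append(col)
    return [{col: row[col] for col in kept} for row in rows], dropped


rows = [
    {'home_points': 10, 'away_points': None, 'attendance': None},
    {'home_points': 12, 'away_points': 7, 'attendance': None},
]
for threshold in (0.0, 0.5, 1.0):
    _, dropped_cols = drop_sparse_columns(rows, threshold)
    print(threshold, dropped_cols)
# 0.0 []
# 0.5 ['attendance']
# 1.0 ['away_points', 'attendance']
```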
Returns:
Type | Description |
---|---|
(X, Y, O) | Each of the components represents, respectively, the training input data X, the multi-output targets Y and the odds data O. |
Source code in src/sportsbet/datasets/_dummy.py
SoccerDataLoader(param_grid=None)
Bases: _BaseDataLoader
Dataloader for soccer data.
It downloads historical and fixtures data for various leagues, years and divisions.
Read more in the user guide.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
param_grid
 | ParamGrid \| None | It selects the type of information that the data include. The keys of the dictionaries might be parameters like 'league' or 'year'. | None |
Attributes:
Name | Type | Description |
---|---|---|
param_grid_ | ParameterGrid | The checked value of the parameters grid. It includes all possible parameters if param_grid is None. |
dropped_na_cols_ | Index | The columns with missing values that are dropped. |
drop_na_thres_ | float | The checked value of drop_na_thres. |
odds_type_ | str \| None | The checked value of odds_type. |
input_cols_ | Index | The columns of X. |
output_cols_ | Index | The columns of Y. |
odds_cols_ | Index | The columns of O. |
target_cols_ | Index | The columns used for the extraction of output and odds columns. |
train_data_ | TrainData | The tuple (X, Y, O) that represents the training data as extracted from the method extract_train_data. |
fixtures_data_ | FixturesData | The tuple (X, Y, O) that represents the fixtures data as extracted from the method extract_fixtures_data. |
Examples:
>>> from sportsbet.datasets import SoccerDataLoader
>>> import pandas as pd
>>> # Get all available parameters to select the training data
>>> SoccerDataLoader.get_all_params()
[{'division': 1, 'league': 'Argentina', ...
>>> # Select only the training data for the English and Spanish leagues of 2020
>>> dataloader = SoccerDataLoader(
... param_grid={'league': ['England', 'Spain'], 'year':[2020]})
>>> # Get available odds types
>>> dataloader.get_odds_types()
['market_average', 'market_maximum']
>>> # Select the market average odds and drop columns with missing values
>>> X_train, Y_train, O_train = dataloader.extract_train_data(
... odds_type='market_average')
>>> # Odds data include the selected market average odds
>>> O_train.columns
Index(['odds__market_average__home_win__full_time_goals',...
>>> # Extract the corresponding fixtures data
>>> X_fix, Y_fix, O_fix = dataloader.extract_fixtures_data()
>>> # Training and fixtures input and odds data have the same column names
>>> pd.testing.assert_index_equal(X_train.columns, X_fix.columns)
>>> pd.testing.assert_index_equal(O_train.columns, O_fix.columns)
>>> # Fixtures data never include an output
>>> Y_fix is None
True
Source code in src/sportsbet/datasets/_soccer/_data.py
extract_fixtures_data()
Extract the fixtures data.
Read more in the user guide.
It returns fixtures data that can be used to make predictions for upcoming matches based on a betting strategy.
Before calling the extract_fixtures_data method for the first time, the extract_train_data method should be called, in order to match the columns of the input, output and odds data.
The data contain information about the matches known before the start of the match, i.e. the input data X and the odds data O. The multi-output targets Y are always equal to None and are only included for consistency with the method extract_train_data.
The param_grid parameter of the initialization method has no effect on the fixtures data.
Returns:
Type | Description |
---|---|
(X, None, O) | Each of the components represents, respectively, the fixtures input data X, the multi-output targets Y (always None) and the odds data O. |
Source code in src/sportsbet/datasets/_soccer/_data.py
extract_train_data(drop_na_thres=0.0, odds_type=None)
Extract the training data.
Read more in the user guide.
It returns historical data that can be used to create a betting strategy based on heuristics or machine learning models.
The data contain information about the matches that belongs to two categories. The first category includes any information known before the start of the match, i.e. the training data X and the odds data O. The second category includes the outcomes of matches, i.e. the multi-output targets Y.
The method selects only the data allowed by the param_grid parameter of the initialization method. Additionally, columns with missing values are dropped through the drop_na_thres parameter, while the type of odds returned is defined by the odds_type parameter.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
drop_na_thres | float | The threshold that specifies the input columns to drop. It is a float in the [0.0, 1.0] range. | 0.0 |
odds_type | str \| None | The selected odds type. It should be one of the available odds column prefixes returned by the method get_odds_types. | None |
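The class example shows an odds column named odds__market_average__home_win__full_time_goals, which suggests that the odds type is the second token of a double-underscore naming scheme. The sketch below illustrates how odds types could be derived from column names and used as a prefix filter under that assumption. Both helper functions are hypothetical, only the first odds column name comes from the example, and treating None as "no odds columns" is also an assumption.

```python
def list_odds_types(columns):
    """Collect the distinct odds types, i.e. the second '__'-separated token."""
    return sorted({col.split('__')[1] for col in columns if col.startswith('odds__')})


def select_odds_columns(columns, odds_type=None):
    """Return the odds columns that belong to the selected odds type."""
    if odds_type is None:
        # Assumed behaviour: no odds type selected means no odds columns.
        return []
    if odds_type not in list_odds_types(columns):
        raise ValueError(f'Unknown odds type: {odds_type!r}')
    prefix = f'odds__{odds_type}__'
    return [col for col in columns if col.startswith(prefix)]


columns = [
    'home_team',
    'odds__market_average__home_win__full_time_goals',
    'odds__market_average__draw__full_time_goals',  # hypothetical column
    'odds__market_maximum__home_win__full_time_goals',  # hypothetical column
]
print(list_odds_types(columns))
# ['market_average', 'market_maximum']
print(select_odds_columns(columns, 'market_maximum'))
# ['odds__market_maximum__home_win__full_time_goals']
```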
Returns:
Type | Description |
---|---|
(X, Y, O) | Each of the components represents, respectively, the training input data X, the multi-output targets Y and the odds data O. |
Source code in src/sportsbet/datasets/_soccer/_data.py
load_dataloader(path)
Load the dataloader object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | The path of the dataloader pickled file. | required |
Returns:
Name | Type | Description |
---|---|---|
dataloader | _BaseDataLoader | The dataloader object. |
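Since the path points to a pickled file, loading presumably amounts to unpickling the stored object. The round trip below is a minimal sketch with the standard pickle module; `save_object`, `load_object` and the SimpleNamespace stand-in for a dataloader are hypothetical and only illustrate the mechanism.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace


def save_object(obj, path):
    """Serialize an object to a pickle file."""
    with open(path, 'wb') as file:
        pickle.dump(obj, file)


def load_object(path):
    """Deserialize an object from a pickle file."""
    with open(path, 'rb') as file:
        return pickle.load(file)


# Stand-in for a configured dataloader; the attribute values are illustrative.
loader_state = SimpleNamespace(param_grid={'league': ['Spain']}, odds_type_='interwetten')

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'dataloader.pkl')
    save_object(loader_state, path)
    restored = load_object(path)

print(restored == loader_state)
# True
```

Unpickling can execute arbitrary code, so only files from a trusted source should be loaded this way.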
Source code in src/sportsbet/datasets/_base.py