Datasets
This module provides tools to extract sports betting data.
BaseDataLoader(param_grid=None)
The base class for dataloaders.
Warning: This class should not be used directly. Use the derived classes instead.
Source code in src/sportsbet/datasets/_base.py
extract_fixtures_data()
Extract the fixtures data.
Read more in the user guide.
It returns fixtures data that can be used to make predictions for upcoming matches based on a betting strategy.
Before calling the extract_fixtures_data method for the first time, the extract_train_data method should be called, in order to match the columns of the input, output and odds data.
The data contain information about the matches known before the start of the match, i.e. the input data X and the odds data O. The multi-output targets Y are always equal to None and are included only for consistency with the method extract_train_data.
The param_grid parameter of the initialization method has no effect on the fixtures data.
Returns:
Type | Description |
---|---|
(X, None, O) | Each of the components represents the fixtures input data |
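To make the required call order concrete, here is a minimal sketch in plain Python. ToyDataLoader is hypothetical and is not the sportsbet implementation: it assumes the fixtures columns are aligned by reusing the input columns recorded during extract_train_data, it represents tables as lists of dicts instead of pandas DataFrames, and it omits the odds data O for brevity.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]


class ToyDataLoader:
    """Hypothetical loader that illustrates the call order (not sportsbet's code)."""

    def __init__(self, train_rows: List[Row], fixture_rows: List[Row]) -> None:
        self._train_rows = train_rows
        self._fixture_rows = fixture_rows
        # Set by extract_train_data; None means the training data were not extracted yet.
        self.input_cols_: Optional[List[str]] = None

    def extract_train_data(self) -> List[Row]:
        # Record the training input columns so the fixtures can reuse them.
        self.input_cols_ = sorted(self._train_rows[0])
        return [{col: row.get(col) for col in self.input_cols_} for row in self._train_rows]

    def extract_fixtures_data(self) -> Tuple[List[Row], None]:
        if self.input_cols_ is None:
            raise RuntimeError("Call extract_train_data before extract_fixtures_data.")
        # Align the fixtures to the training columns: unknown columns are dropped
        # and missing ones are filled with None. Y is always None for fixtures.
        X = [{col: row.get(col) for col in self.input_cols_} for row in self._fixture_rows]
        return X, None
```

Recording the columns at training time is what lets a model fitted on X_train accept X_fix unchanged.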
Source code in src/sportsbet/datasets/_base.py
extract_train_data(drop_na_thres=0.0, odds_type=None)
Extract the training data.
Read more in the user guide.
It returns historical data that can be used to create a betting strategy based on heuristics or machine learning models.
The data contain information about the matches and fall into two categories. The first category includes any information known before the start of the match, i.e. the training data X and the odds data O. The second category includes the outcomes of the matches, i.e. the multi-output targets Y.
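As an illustration of how the outcomes Y and the odds O work together when evaluating a heuristic strategy, the sketch below computes the net profit of always backing the home team. It assumes decimal odds and uses plain lists in place of the actual DataFrame columns; backtest_home_win is a hypothetical helper, not part of sportsbet.

```python
from typing import Sequence


def backtest_home_win(
    y_home_win: Sequence[bool], o_home_win: Sequence[float], stake: float = 1.0
) -> float:
    """Net profit of staking `stake` on the home team in every match."""
    if len(y_home_win) != len(o_home_win):
        raise ValueError("Outcomes and odds must have the same length.")
    profit = 0.0
    for won, odds in zip(y_home_win, o_home_win):
        # With decimal odds a winning bet returns stake * odds, so the net gain
        # is stake * (odds - 1); a losing bet forfeits the stake.
        profit += stake * (odds - 1.0) if won else -stake
    return profit


# Three hypothetical matches: the home team wins the first and the third.
print(backtest_home_win([True, False, True], [2.0, 1.5, 3.0]))  # 2.0
```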
The method selects only the data allowed by the param_grid parameter of the initialization method. Additionally, columns with missing values are dropped through the drop_na_thres parameter, while the type of odds returned is defined by the odds_type parameter.
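The exact semantics of drop_na_thres are defined by the library. The sketch below assumes the threshold is compared with the fraction of non-missing values in each column, so that under this assumption 0.0 keeps every column and 1.0 keeps only complete ones. drop_na_columns is a hypothetical helper that works on a dict of columns instead of a DataFrame.

```python
from typing import Dict, List, Tuple

Columns = Dict[str, List[object]]


def drop_na_columns(columns: Columns, drop_na_thres: float) -> Tuple[Columns, List[str]]:
    """Hypothetical helper: keep columns whose non-missing ratio reaches the threshold.

    Assumption: drop_na_thres is compared with the fraction of non-missing
    (non-None) values; this rule is not taken from the sportsbet source.
    """
    kept: Columns = {}
    dropped: List[str] = []
    for name, values in columns.items():
        non_missing = sum(value is not None for value in values)
        # Treat an empty column as complete to avoid dividing by zero.
        ratio = non_missing / len(values) if values else 1.0
        if ratio >= drop_na_thres:
            kept[name] = values
        else:
            dropped.append(name)
    return kept, dropped
```

The returned list of dropped names plays the same role as the dropped_na_cols_ attribute.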
Parameters:
Name | Type | Description | Default |
---|---|---|---|
drop_na_thres | float | The threshold that specifies the input columns to drop. It is a float in the | 0.0 |
odds_type | str \| None | The selected odds type. It should be one of the available odds column prefixes returned by the method | None |
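The odds columns shown in the SoccerDataLoader example below follow the pattern odds__<odds_type>__<outcome>__<target>, e.g. odds__market_average__home_win__full_time_goals. Assuming that naming holds in general, listing the odds types and selecting one of them reduce to string operations on the column names. available_odds_types and select_odds_columns are hypothetical helpers, and the None default of odds_type is not modelled here.

```python
from typing import List


def available_odds_types(odds_cols: List[str]) -> List[str]:
    # Assumed naming: odds__<odds_type>__<outcome>__<target>.
    return sorted({col.split("__")[1] for col in odds_cols if col.startswith("odds__")})


def select_odds_columns(odds_cols: List[str], odds_type: str) -> List[str]:
    """Hypothetical helper: keep the odds columns of a single odds type."""
    types = available_odds_types(odds_cols)
    if odds_type not in types:
        raise ValueError(f"odds_type should be one of {types}, got {odds_type!r}.")
    # The trailing separator prevents 'market' from matching 'market_average'.
    prefix = f"odds__{odds_type}__"
    return [col for col in odds_cols if col.startswith(prefix)]
```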
Returns:
Type | Description |
---|---|
(X, Y, O) | Each of the components represents the training input data |
Source code in src/sportsbet/datasets/_base.py
get_all_params()
classmethod
Get the available parameters.
It can be used to get the allowed names and values for the param_grid parameter of the dataloader object.
Returns:
Name | Type | Description |
---|---|---|
param_grid | list[Param] | A list of all allowed parameters and values. |
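Because get_all_params returns a flat list of parameter dictionaries, collecting the allowed values of each parameter takes only a few lines. The sketch uses made-up entries in the documented shape; values_per_param is a hypothetical helper, not part of sportsbet.

```python
from collections import defaultdict
from typing import Dict, List, Set


def values_per_param(all_params: List[Dict[str, object]]) -> Dict[str, List[object]]:
    """Hypothetical helper: collect the allowed values of each parameter name."""
    values: Dict[str, Set[object]] = defaultdict(set)
    for params in all_params:
        for name, value in params.items():
            values[name].add(value)
    # Sort for a stable, readable output; the values of one parameter share a type.
    return {name: sorted(vals) for name, vals in values.items()}


# Made-up entries in the documented shape of the get_all_params() output.
all_params = [
    {"division": 1, "league": "Spain", "year": 2020},
    {"division": 2, "league": "Spain", "year": 2020},
    {"division": 1, "league": "England", "year": 2021},
]
print(values_per_param(all_params))
# {'division': [1, 2], 'league': ['England', 'Spain'], 'year': [2020, 2021]}
```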
Source code in src/sportsbet/datasets/_base.py
get_odds_types()
Get the available odds types.
It can be used to get the allowed odds types of the dataloader's method extract_train_data.
Returns:
Name | Type | Description |
---|---|---|
odds_types | list[str] | A list of available odds types. |
Source code in src/sportsbet/datasets/_base.py
save(path)
Save the dataloader object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | The path to save the object. | required |
Returns:
Name | Type | Description |
---|---|---|
self | Self | The dataloader object. |
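load_dataloader reads a pickled file, which suggests that save serializes the object with pickle. Below is a minimal sketch of a save method under that assumption, on a hypothetical PicklableLoader class; it returns self, as documented above, so calls can be chained.

```python
import pickle
from typing import Optional


class PicklableLoader:
    """Hypothetical stand-in for a dataloader; not sportsbet's class."""

    def __init__(self, param_grid: Optional[dict] = None) -> None:
        self.param_grid = param_grid

    def save(self, path: str) -> "PicklableLoader":
        # Assumption: the object is serialized with pickle, matching the
        # "pickled file" that load_dataloader expects.
        with open(path, "wb") as file:
            pickle.dump(self, file)
        # Returning self allows chaining, e.g. PicklableLoader().save(path).param_grid.
        return self
```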
Source code in src/sportsbet/datasets/_base.py
DummySoccerDataLoader(param_grid=None)
Bases: BaseDataLoader
Dataloader for soccer dummy data.
The data are provided only for convenience, since they require no downloading, and to familiarize the user with the methods of the dataloader objects.
Read more in the user guide.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
param_grid | ParamGrid \| None | It selects the type of information that the data include. The keys of dictionaries might be parameters like | None |
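The param_grid_ attribute is a scikit-learn ParameterGrid, which expands a dict of lists (or a list of such dicts) into every combination of values. The sketch below reproduces that expansion with itertools.product and then assumes that an available parameter set is selected when it matches some grid point on the keys that point specifies. This rule is consistent with the example below, where param_grid={'league': ['Spain']} leaves division and year unrestricted, but it is not taken from the sportsbet source. expand_grid and select_params are hypothetical helpers.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Union

Params = Dict[str, object]
Grid = Dict[str, Sequence[object]]


def expand_grid(param_grid: Union[Grid, List[Grid]]) -> List[Params]:
    """Expand a dict of lists (or a list of them) into all value combinations."""
    grids = [param_grid] if isinstance(param_grid, dict) else param_grid
    points: List[Params] = []
    for grid in grids:
        keys = sorted(grid)
        for values in product(*(grid[key] for key in keys)):
            points.append(dict(zip(keys, values)))
    return points


def select_params(
    all_params: List[Params], param_grid: Optional[Union[Grid, List[Grid]]] = None
) -> List[Params]:
    """Hypothetical selection rule: keep a parameter set matching any grid point."""
    if param_grid is None:
        # Assumption: no grid selects every available parameter set.
        return list(all_params)
    points = expand_grid(param_grid)
    return [
        params
        for params in all_params
        if any(all(params.get(key) == value for key, value in point.items()) for point in points)
    ]
```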
Attributes:
Name | Type | Description |
---|---|---|
param_grid_ | ParameterGrid | The checked value of the parameter grid. It includes all possible parameters if |
dropped_na_cols_ | Index | The columns with missing values that are dropped. |
drop_na_thres_ | float | The checked value of |
odds_type_ | str \| None | The checked value of |
input_cols_ | Index | The columns of |
output_cols_ | Index | The columns of |
odds_cols_ | Index | The columns of |
target_cols_ | Index | The columns used for the extraction of output and odds columns. |
train_data_ | TrainData | The tuple (X, Y, O) that represents the training data as extracted from the method |
fixtures_data_ | FixturesData | The tuple (X, Y, O) that represents the fixtures data as extracted from the method |
Examples:
>>> from sportsbet.datasets import DummySoccerDataLoader
>>> import pandas as pd
>>> # Get all available parameters to select the training data
>>> DummySoccerDataLoader.get_all_params()
[{'division': 1, 'year': 1998}, ...
>>> # Select only the training data for the Spanish league
>>> dataloader = DummySoccerDataLoader(param_grid={'league': ['Spain']})
>>> # Get available odds types
>>> dataloader.get_odds_types()
['interwetten', 'williamhill']
>>> # Select the odds of Interwetten bookmaker for training data
>>> X_train, Y_train, O_train = dataloader.extract_train_data(
... odds_type='interwetten')
>>> # Extract the corresponding fixtures data
>>> X_fix, Y_fix, O_fix = dataloader.extract_fixtures_data()
>>> # Training and fixtures input and odds data have the same column names
>>> pd.testing.assert_index_equal(X_train.columns, X_fix.columns)
>>> pd.testing.assert_index_equal(O_train.columns, O_fix.columns)
>>> # Fixtures data never include outputs
>>> Y_fix is None
True
Source code in src/sportsbet/datasets/_dummy.py
extract_fixtures_data()
Extract the fixtures data.
Read more in the user guide.
It returns fixtures data that can be used to make predictions for upcoming matches based on a betting strategy.
Before calling the extract_fixtures_data method for the first time, the extract_train_data method should be called, in order to match the columns of the input, output and odds data.
The data contain information about the matches known before the start of the match, i.e. the input data X and the odds data O. The multi-output targets Y are always equal to None and are included only for consistency with the method extract_train_data.
The param_grid parameter of the initialization method has no effect on the fixtures data.
Returns:
Type | Description |
---|---|
(X, None, O) | Each of the components represents the fixtures input data |
Source code in src/sportsbet/datasets/_dummy.py
extract_train_data(drop_na_thres=0.0, odds_type=None)
Extract the training data.
Read more in the user guide.
It returns historical data that can be used to create a betting strategy based on heuristics or machine learning models.
The data contain information about the matches and fall into two categories. The first category includes any information known before the start of the match, i.e. the training data X and the odds data O. The second category includes the outcomes of the matches, i.e. the multi-output targets Y.
The method selects only the data allowed by the param_grid parameter of the initialization method. Additionally, columns with missing values are dropped through the drop_na_thres parameter, while the type of odds returned is defined by the odds_type parameter.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
drop_na_thres | float | The threshold that specifies the input columns to drop. It is a float in the | 0.0 |
odds_type | str \| None | The selected odds type. It should be one of the available odds column prefixes returned by the method | None |
Returns:
Type | Description |
---|---|
(X, Y, O) | Each of the components represents the training input data |
Source code in src/sportsbet/datasets/_dummy.py
SoccerDataLoader(param_grid=None)
Bases: BaseDataLoader
Dataloader for soccer data.
It downloads historical and fixtures data for various leagues, years and divisions.
Read more in the user guide.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
param_grid | ParamGrid \| None | It selects the type of information that the data include. The keys of dictionaries might be parameters like | None |
Attributes:
Name | Type | Description |
---|---|---|
param_grid_ | ParameterGrid | The checked value of the parameter grid. It includes all possible parameters if |
dropped_na_cols_ | Index | The columns with missing values that are dropped. |
drop_na_thres_ | float | The checked value of |
odds_type_ | str \| None | The checked value of |
input_cols_ | Index | The columns of |
output_cols_ | Index | The columns of |
odds_cols_ | Index | The columns of |
target_cols_ | Index | The columns used for the extraction of output and odds columns. |
train_data_ | TrainData | The tuple (X, Y, O) that represents the training data as extracted from the method |
fixtures_data_ | FixturesData | The tuple (X, Y, O) that represents the fixtures data as extracted from the method |
Examples:
>>> from sportsbet.datasets import SoccerDataLoader
>>> import pandas as pd
>>> # Get all available parameters to select the training data
>>> SoccerDataLoader.get_all_params()
[{'division': 1, 'league': 'Argentina', ...
>>> # Select only the training data for the English and Spanish leagues of the year 2020
>>> dataloader = SoccerDataLoader(
... param_grid={'league': ['England', 'Spain'], 'year':[2020]})
>>> # Get available odds types
>>> dataloader.get_odds_types()
['market_average', 'market_maximum']
>>> # Select the market average odds
>>> X_train, Y_train, O_train = dataloader.extract_train_data(
... odds_type='market_average')
>>> # Odds data include the selected market average odds
>>> O_train.columns
Index(['odds__market_average__home_win__full_time_goals',...
>>> # Extract the corresponding fixtures data
>>> X_fix, Y_fix, O_fix = dataloader.extract_fixtures_data()
>>> # Training and fixtures input and odds data have the same column names
>>> pd.testing.assert_index_equal(X_train.columns, X_fix.columns)
>>> pd.testing.assert_index_equal(O_train.columns, O_fix.columns)
>>> # Fixtures data never include outputs
>>> Y_fix is None
True
Source code in src/sportsbet/datasets/_soccer/_data.py
extract_fixtures_data()
Extract the fixtures data.
Read more in the user guide.
It returns fixtures data that can be used to make predictions for upcoming matches based on a betting strategy.
Before calling the extract_fixtures_data method for the first time, the extract_train_data method should be called, in order to match the columns of the input, output and odds data.
The data contain information about the matches known before the start of the match, i.e. the input data X and the odds data O. The multi-output targets Y are always equal to None and are included only for consistency with the method extract_train_data.
The param_grid parameter of the initialization method has no effect on the fixtures data.
Returns:
Type | Description |
---|---|
(X, None, O) | Each of the components represents the fixtures input data |
Source code in src/sportsbet/datasets/_soccer/_data.py
extract_train_data(drop_na_thres=0.0, odds_type=None)
Extract the training data.
Read more in the user guide.
It returns historical data that can be used to create a betting strategy based on heuristics or machine learning models.
The data contain information about the matches and fall into two categories. The first category includes any information known before the start of the match, i.e. the training data X and the odds data O. The second category includes the outcomes of the matches, i.e. the multi-output targets Y.
The method selects only the data allowed by the param_grid parameter of the initialization method. Additionally, columns with missing values are dropped through the drop_na_thres parameter, while the type of odds returned is defined by the odds_type parameter.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
drop_na_thres | float | The threshold that specifies the input columns to drop. It is a float in the | 0.0 |
odds_type | str \| None | The selected odds type. It should be one of the available odds column prefixes returned by the method | None |
Returns:
Type | Description |
---|---|
(X, Y, O) | Each of the components represents the training input data |
Source code in src/sportsbet/datasets/_soccer/_data.py
load_dataloader(path)
Load the dataloader object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | The path of the dataloader pickled file. | required |
Returns:
Name | Type | Description |
---|---|---|
dataloader | BaseDataLoader | The dataloader object. |
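A minimal loading counterpart, assuming the file was written with pickle as the description above states. load_pickled is a hypothetical helper that adds a type check; with sportsbet one might pass BaseDataLoader as expected_type, since that is the documented return type. Unpickling runs code embedded in the file, so only load files from trusted sources.

```python
import pickle


def load_pickled(path: str, expected_type: type = object) -> object:
    """Hypothetical helper: unpickle an object and check its type."""
    # Only unpickle trusted files: loading a pickle can execute arbitrary code.
    with open(path, "rb") as file:
        obj = pickle.load(file)
    if not isinstance(obj, expected_type):
        raise TypeError(f"Expected {expected_type.__name__}, got {type(obj).__name__}.")
    return obj
```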
Source code in src/sportsbet/datasets/_base.py