Performance Metrics
Match result statistics
- pyTSPA.metrics.result_stats(df: DataFrame) → dict
Computes the number of home wins, draws, and away wins from the full-time result column (‘FTR’).
- Parameters:
df (pd.DataFrame) – DataFrame containing match data with an ‘FTR’ column indicating match outcomes:
  - ‘H’ for Home Win
  - ‘D’ for Draw
  - ‘A’ for Away Win
- Returns:
a dictionary with the count of each result type, structured as {‘Home Wins’: int, ‘Draws’: int, ‘Away Wins’: int}
- Return type:
dict
- Raises:
ValueError – if the ‘FTR’ column is not found in the DataFrame
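For readers who want to check these counts by hand, the following is a minimal pandas sketch of the behaviour described above, assuming ‘FTR’ uses exactly the codes ‘H’, ‘D’ and ‘A’. The name `result_stats_sketch` is hypothetical, this is not pyTSPA’s own implementation, and the toy DataFrame is made-up illustrative data.

```python
import pandas as pd


def result_stats_sketch(df: pd.DataFrame) -> dict:
    """Hypothetical re-implementation: count H/D/A outcomes in 'FTR'."""
    if "FTR" not in df.columns:
        raise ValueError("'FTR' column not found in the DataFrame")
    # value_counts() omits codes that never occur, so default those to 0
    counts = df["FTR"].value_counts()
    return {
        "Home Wins": int(counts.get("H", 0)),
        "Draws": int(counts.get("D", 0)),
        "Away Wins": int(counts.get("A", 0)),
    }


# Made-up results for five matches
matches = pd.DataFrame({"FTR": ["H", "D", "A", "H", "H"]})
print(result_stats_sketch(matches))  # {'Home Wins': 3, 'Draws': 1, 'Away Wins': 1}
```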
Team performance statistics
- pyTSPA.metrics.team_performance(df: DataFrame, team_name: str) → dict
Computes a team’s performance statistics across a season.
This function summarizes key performance metrics for a specified team, including the number of matches played, wins, draws, losses, goals scored, goals conceded, goal difference, and points.
- Parameters:
df (pd.DataFrame) – DataFrame containing match data with columns ‘HomeTeam’, ‘AwayTeam’, ‘FTHG’, ‘FTAG’, ‘FTR’.
team_name (str) – the name of the team for which the performance metrics will be calculated
- Returns:
a dictionary containing the team’s performance metrics, structured as {‘Team’: str, ‘Matches’: int, ‘Wins’: int, ‘Draws’: int, ‘Losses’: int, ‘Goals For’: int, ‘Goals Against’: int, ‘Goal Difference’: int, ‘Points’: int}
- Return type:
dict
- Raises:
ValueError – if any of the required columns (‘HomeTeam’, ‘AwayTeam’, ‘FTHG’, ‘FTAG’, ‘FTR’) are missing
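The metrics above can be derived by splitting the team’s matches into home and away fixtures. The sketch below does this with pandas; it is not pyTSPA’s code, `team_performance_sketch` is a hypothetical name, and the points rule (3 for a win, 1 for a draw) is an assumption because this reference does not state the scoring. The three-match season is made-up data.

```python
import pandas as pd

REQUIRED = ["HomeTeam", "AwayTeam", "FTHG", "FTAG", "FTR"]


def team_performance_sketch(df: pd.DataFrame, team_name: str) -> dict:
    """Hypothetical re-implementation of a per-team season summary."""
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    home = df[df["HomeTeam"] == team_name]
    away = df[df["AwayTeam"] == team_name]

    # A home win is FTR == 'H'; an away win is FTR == 'A'
    wins = int((home["FTR"] == "H").sum() + (away["FTR"] == "A").sum())
    draws = int((home["FTR"] == "D").sum() + (away["FTR"] == "D").sum())
    matches = len(home) + len(away)
    goals_for = int(home["FTHG"].sum() + away["FTAG"].sum())
    goals_against = int(home["FTAG"].sum() + away["FTHG"].sum())

    return {
        "Team": team_name,
        "Matches": matches,
        "Wins": wins,
        "Draws": draws,
        "Losses": matches - wins - draws,
        "Goals For": goals_for,
        "Goals Against": goals_against,
        "Goal Difference": goals_for - goals_against,
        "Points": 3 * wins + draws,  # assumed 3/1/0 scoring
    }


# Made-up three-match season
season = pd.DataFrame({
    "HomeTeam": ["Team A", "Team B", "Team A"],
    "AwayTeam": ["Team B", "Team A", "Team C"],
    "FTHG": [2, 1, 0],
    "FTAG": [0, 1, 1],
    "FTR": ["H", "D", "A"],
})
print(team_performance_sketch(season, "Team A"))
```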
All teams
- pyTSPA.metrics.get_all_teams(df: DataFrame) → ndarray
Extracts all unique team names from the ‘HomeTeam’ and ‘AwayTeam’ columns.
Names from both columns are combined, so a team appears in the result even if it only ever played at home or only away.
- Parameters:
df (pd.DataFrame) – DataFrame containing match data with ‘HomeTeam’ and ‘AwayTeam’ columns
- Returns:
a sorted array of unique team names
- Return type:
np.ndarray
- Raises:
ValueError – if either ‘HomeTeam’ or ‘AwayTeam’ columns are missing
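A sorted array of unique names from two columns maps directly onto `numpy.union1d`, which returns the sorted unique values found in either input. This is an illustrative sketch with a hypothetical name, not necessarily how pyTSPA does it.

```python
import numpy as np
import pandas as pd


def get_all_teams_sketch(df: pd.DataFrame) -> np.ndarray:
    """Hypothetical re-implementation: sorted unique teams from both columns."""
    for col in ("HomeTeam", "AwayTeam"):
        if col not in df.columns:
            raise ValueError(f"Missing required column: {col}")
    # union1d concatenates both arrays, then returns their sorted unique values
    return np.union1d(df["HomeTeam"].to_numpy(), df["AwayTeam"].to_numpy())


# Made-up fixtures: 'Team C' only appears as an away side
fixtures = pd.DataFrame({
    "HomeTeam": ["Team B", "Team A"],
    "AwayTeam": ["Team A", "Team C"],
})
print(get_all_teams_sketch(fixtures))  # ['Team A' 'Team B' 'Team C']
```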
Statistics for all teams
- pyTSPA.metrics.each_team_performance(df: DataFrame) → DataFrame
Computes performance statistics for every team in the dataset.
This function calculates the performance metrics for each team using the team_performance() function and returns a DataFrame summarizing each team’s performance.
- Parameters:
df (pd.DataFrame) – DataFrame containing match data with ‘HomeTeam’, ‘AwayTeam’, ‘FTHG’, ‘FTAG’, ‘FTR’ columns
- Returns:
a DataFrame in which each row summarizes one team’s performance, sorted by points in descending order. Columns:
  - ‘Team’: team name
  - ‘Matches’: matches played
  - ‘Wins’: wins
  - ‘Draws’: draws
  - ‘Losses’: losses
  - ‘Goals For’: goals scored
  - ‘Goals Against’: goals conceded
  - ‘Goal Difference’: goals scored minus goals conceded
  - ‘Points’: points accumulated
- Return type:
pd.DataFrame
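Calling a per-team function in a loop works, but the same league table can also be built in one pass by reshaping each match into two rows, one from each team’s perspective, and grouping by team. The sketch below shows that alternative; it is not pyTSPA’s implementation (which, per the description above, calls team_performance()), `each_team_performance_sketch` is hypothetical, and 3/1/0 points scoring is assumed.

```python
import pandas as pd


def each_team_performance_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical vectorized league table (assumes 3/1/0 points)."""
    # One row per team per match, seen from that team's perspective
    home = pd.DataFrame({"Team": df["HomeTeam"], "GF": df["FTHG"], "GA": df["FTAG"],
                         "Win": df["FTR"] == "H", "Draw": df["FTR"] == "D"})
    away = pd.DataFrame({"Team": df["AwayTeam"], "GF": df["FTAG"], "GA": df["FTHG"],
                         "Win": df["FTR"] == "A", "Draw": df["FTR"] == "D"})
    long = pd.concat([home, away], ignore_index=True)

    table = long.groupby("Team").agg(
        Matches=("GF", "size"),
        Wins=("Win", "sum"),
        Draws=("Draw", "sum"),
        GoalsFor=("GF", "sum"),
        GoalsAgainst=("GA", "sum"),
    ).reset_index()
    table["Losses"] = table["Matches"] - table["Wins"] - table["Draws"]
    table["Goal Difference"] = table["GoalsFor"] - table["GoalsAgainst"]
    table["Points"] = 3 * table["Wins"] + table["Draws"]
    table = table.rename(columns={"GoalsFor": "Goals For", "GoalsAgainst": "Goals Against"})

    cols = ["Team", "Matches", "Wins", "Draws", "Losses",
            "Goals For", "Goals Against", "Goal Difference", "Points"]
    return table[cols].sort_values("Points", ascending=False).reset_index(drop=True)


# Made-up three-match season
season = pd.DataFrame({
    "HomeTeam": ["Team A", "Team B", "Team A"],
    "AwayTeam": ["Team B", "Team A", "Team C"],
    "FTHG": [2, 1, 0],
    "FTAG": [0, 1, 1],
    "FTR": ["H", "D", "A"],
})
print(each_team_performance_sketch(season))
```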
Win percentage
- pyTSPA.metrics.win_percentage(df: DataFrame, team_name: str) → float
Calculates the win percentage for a specified team.
- Parameters:
df (pd.DataFrame) – DataFrame containing match data with ‘HomeTeam’, ‘AwayTeam’, and ‘FTR’ columns
team_name (str) – the name of the team to calculate win percentage for
- Returns:
the win percentage as a value between 0 and 1
- Return type:
float
- Raises:
ValueError – if any of the required columns (‘HomeTeam’, ‘AwayTeam’, ‘FTR’) are missing
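Win percentage here is wins divided by matches played, returned as a fraction rather than a value out of 100. The sketch below illustrates that calculation; `win_percentage_sketch` is hypothetical, and returning 0.0 for a team with no matches is an assumption, since this reference does not say what pyTSPA does in that case.

```python
import pandas as pd


def win_percentage_sketch(df: pd.DataFrame, team_name: str) -> float:
    """Hypothetical re-implementation: wins / matches as a value in [0, 1]."""
    missing = [c for c in ("HomeTeam", "AwayTeam", "FTR") if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    home = df["HomeTeam"] == team_name
    away = df["AwayTeam"] == team_name
    matches = int(home.sum() + away.sum())
    if matches == 0:
        return 0.0  # assumption: no matches played -> 0.0
    # Home wins are FTR == 'H', away wins are FTR == 'A'
    wins = int(((df["FTR"] == "H") & home).sum() + ((df["FTR"] == "A") & away).sum())
    return wins / matches


# Made-up three-match season
season = pd.DataFrame({
    "HomeTeam": ["Team A", "Team B", "Team A"],
    "AwayTeam": ["Team B", "Team A", "Team C"],
    "FTR": ["H", "D", "A"],
})
print(win_percentage_sketch(season, "Team A"))  # one win in three matches
```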
Win percentage for all teams
- pyTSPA.metrics.each_win_percentage(df: DataFrame) → DataFrame
Calculates the win percentage for every team and returns it as a separate DataFrame.
- Parameters:
df (pd.DataFrame) – DataFrame containing match data with ‘HomeTeam’, ‘AwayTeam’, and ‘FTR’ columns
- Returns:
DataFrame with ‘Team’ and ‘WinPercentage’ columns
- Return type:
pd.DataFrame
Pythagorean expectation
- pyTSPA.metrics.pythagorean_expectation(df: DataFrame, team_name: str, exponent: float = 2.0) → float
Calculates the Pythagorean Expectation for a specified team using match data.
- Parameters:
df (pd.DataFrame) – DataFrame containing match data with ‘HomeTeam’, ‘AwayTeam’, ‘FTHG’, ‘FTAG’ columns
team_name (str) – the name of the team to calculate the Pythagorean Expectation for
exponent (float) – exponent value for the calculation, default is 2.0
- Returns:
the Pythagorean Expectation as a value between 0 and 1
- Return type:
float
- Raises:
ValueError – if any of the required columns (‘HomeTeam’, ‘AwayTeam’, ‘FTHG’, ‘FTAG’) are missing
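The usual form of the Pythagorean Expectation is GF^x / (GF^x + GA^x), where GF and GA are the goals a team scored and conceded and x is the exponent (2.0 by default above). This reference does not spell out pyTSPA’s exact formula, so the sketch below assumes that standard form; `pythagorean_expectation_sketch` is hypothetical, and returning 0.5 when no goals were scored either way is an assumption.

```python
import pandas as pd


def pythagorean_expectation_sketch(df: pd.DataFrame, team_name: str,
                                   exponent: float = 2.0) -> float:
    """Hypothetical re-implementation of GF^x / (GF^x + GA^x)."""
    missing = [c for c in ("HomeTeam", "AwayTeam", "FTHG", "FTAG") if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    home = df[df["HomeTeam"] == team_name]
    away = df[df["AwayTeam"] == team_name]
    # Goals scored/conceded across home and away fixtures
    goals_for = float(home["FTHG"].sum() + away["FTAG"].sum())
    goals_against = float(home["FTAG"].sum() + away["FTHG"].sum())
    gf_pow = goals_for ** exponent
    ga_pow = goals_against ** exponent
    if gf_pow + ga_pow == 0:
        return 0.5  # assumption: no goals either way -> neutral 0.5
    return gf_pow / (gf_pow + ga_pow)


# Made-up three-match season: Team A scores 3 and concedes 2
season = pd.DataFrame({
    "HomeTeam": ["Team A", "Team B", "Team A"],
    "AwayTeam": ["Team B", "Team A", "Team C"],
    "FTHG": [2, 1, 0],
    "FTAG": [0, 1, 1],
})
print(pythagorean_expectation_sketch(season, "Team A"))  # 9 / 13
```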
Pythagorean expectation for all teams
- pyTSPA.metrics.each_pythagorean_expectation(df: DataFrame, exponent: float = 2.0) → DataFrame
Calculates the Pythagorean Expectation for every team and returns it as a separate DataFrame.
- Parameters:
df (pd.DataFrame) – DataFrame containing match data with ‘HomeTeam’, ‘AwayTeam’, ‘FTHG’, ‘FTAG’ columns
exponent (float) – exponent value for the calculation, default is 2.0
- Returns:
DataFrame with ‘Team’ and ‘PythagoreanExpectation’ columns
- Return type:
pd.DataFrame
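For every team at once, goal totals can be aggregated with a single groupby and the same GF^x / (GF^x + GA^x) form applied column-wise. This sketch assumes that standard formula, uses the hypothetical name `each_pythagorean_expectation_sketch`, and is not pyTSPA’s own code; teams with no goals either way come out as NaN because pandas evaluates 0/0 to NaN.

```python
import pandas as pd


def each_pythagorean_expectation_sketch(df: pd.DataFrame,
                                        exponent: float = 2.0) -> pd.DataFrame:
    """Hypothetical vectorized Pythagorean Expectation for all teams."""
    # Each match contributes one row for the home side and one for the away side
    long = pd.concat([
        pd.DataFrame({"Team": df["HomeTeam"], "GF": df["FTHG"], "GA": df["FTAG"]}),
        pd.DataFrame({"Team": df["AwayTeam"], "GF": df["FTAG"], "GA": df["FTHG"]}),
    ], ignore_index=True)
    totals = long.groupby("Team")[["GF", "GA"]].sum().astype(float)
    gf_pow = totals["GF"] ** exponent
    ga_pow = totals["GA"] ** exponent
    pe = gf_pow / (gf_pow + ga_pow)  # NaN if a team has 0 GF and 0 GA
    return pe.rename("PythagoreanExpectation").reset_index()


# Made-up three-match season
season = pd.DataFrame({
    "HomeTeam": ["Team A", "Team B", "Team A"],
    "AwayTeam": ["Team B", "Team A", "Team C"],
    "FTHG": [2, 1, 0],
    "FTAG": [0, 1, 1],
})
print(each_pythagorean_expectation_sketch(season))
```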
Logistic regression prediction
- pyTSPA.metrics.logistic_regression_prediction(df: DataFrame) → dict
Predicts match outcomes (Win/Draw/Loss) using multinomial logistic regression with oversampling and additional features.
- Parameters:
df (pd.DataFrame) – DataFrame containing match data with necessary metrics calculated
- Returns:
a dictionary containing model accuracy, confusion matrix, predictions, and the trained model
- Return type:
dict
Match outcome prediction
- pyTSPA.metrics.predict_match_outcome(home_team: str, away_team: str, model: LogisticRegression, df: DataFrame) → dict
Predicts the outcome of a specific match between two teams using the trained logistic regression model.
- Parameters:
home_team (str) – name of the home team
away_team (str) – name of the away team
model (LogisticRegression) – trained logistic regression model
df (pd.DataFrame) – DataFrame containing the match data
- Returns:
a dictionary containing predicted outcome and probabilities
- Return type:
dict
Season half prediction
- pyTSPA.metrics.season_half_prediction(df: DataFrame) → DataFrame
Predicts the outcomes of the second half of the season based on the Pythagorean Expectation values calculated from the first half of the season.
- Parameters:
df (pd.DataFrame) – DataFrame containing match data with ‘Date’, ‘HomeTeam’, ‘AwayTeam’ and ‘FTR’ columns
- Returns:
DataFrame with predicted outcomes for the second half of the season
- Return type:
pd.DataFrame
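One way to realize the idea above is to split the matches by date, compute each team’s Pythagorean Expectation from first-half goals only, and predict each second-half match from the two teams’ values. The sketch below does this under several assumptions that this reference does not confirm: the “first half” is the first half of matches in date order, the standard GF^x / (GF^x + GA^x) formula applies, ‘FTHG’ and ‘FTAG’ are also present, and the team with the higher value is predicted to win (equal or unknown values predict a draw). `season_half_prediction_sketch` and its ‘Predicted’ column are hypothetical.

```python
import numpy as np
import pandas as pd


def season_half_prediction_sketch(df: pd.DataFrame, exponent: float = 2.0) -> pd.DataFrame:
    """Hypothetical second-half prediction from first-half Pythagorean Expectation."""
    # Assumption: split at the midpoint of matches sorted by date
    ordered = df.assign(Date=pd.to_datetime(df["Date"])).sort_values("Date").reset_index(drop=True)
    mid = len(ordered) // 2
    first, second = ordered.iloc[:mid], ordered.iloc[mid:]

    # Goal totals per team from the first half only
    long = pd.concat([
        pd.DataFrame({"Team": first["HomeTeam"], "GF": first["FTHG"], "GA": first["FTAG"]}),
        pd.DataFrame({"Team": first["AwayTeam"], "GF": first["FTAG"], "GA": first["FTHG"]}),
    ], ignore_index=True)
    totals = long.groupby("Team")[["GF", "GA"]].sum().astype(float)
    pe = totals["GF"] ** exponent / (totals["GF"] ** exponent + totals["GA"] ** exponent)

    # Look up each side's value; teams unseen in the first half map to NaN
    home_pe = second["HomeTeam"].map(pe)
    away_pe = second["AwayTeam"].map(pe)
    # Higher value wins; ties and NaN comparisons fall through to 'D'
    predicted = np.select([home_pe > away_pe, home_pe < away_pe], ["H", "A"], default="D")

    result = second[["Date", "HomeTeam", "AwayTeam", "FTR"]].copy()
    result["Predicted"] = predicted
    return result.reset_index(drop=True)


# Made-up four-match season
season = pd.DataFrame({
    "Date": ["2023-08-01", "2023-08-08", "2023-09-01", "2023-09-08"],
    "HomeTeam": ["Team A", "Team C", "Team B", "Team A"],
    "AwayTeam": ["Team B", "Team B", "Team A", "Team C"],
    "FTHG": [3, 2, 0, 1],
    "FTAG": [0, 1, 2, 1],
    "FTR": ["H", "H", "A", "D"],
})
print(season_half_prediction_sketch(season))
```

Comparing the ‘Predicted’ column with the actual ‘FTR’ column then gives a simple accuracy check for the second half.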