Performance Metrics

Match result statistics

pyTSPA.metrics.result_stats(df: DataFrame) → dict

Computes the number of home wins, draws, and away wins from the full-time result column (‘FTR’).

Parameters:

df (pd.DataFrame) – DataFrame containing match data with a column ‘FTR’ indicating match outcomes:

  • ‘H’ for Home Win

  • ‘D’ for Draw

  • ‘A’ for Away Win

Returns:

a dictionary with the counts of each result type, structured as:

{‘Home Wins’: int, ‘Draws’: int, ‘Away Wins’: int}

Return type:

dict

Raises:

ValueError – if the ‘FTR’ column is not found in the DataFrame
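The counting behaviour described above can be illustrated with a minimal pandas sketch. This is not the pyTSPA implementation: the name result_stats_sketch and the toy data are hypothetical, and only the documented inputs, outputs, and error are reproduced.

```python
import pandas as pd


def result_stats_sketch(df: pd.DataFrame) -> dict:
    """Count home wins, draws, and away wins from the 'FTR' column.

    Hypothetical stand-in that mirrors the documented behaviour only.
    """
    if "FTR" not in df.columns:
        raise ValueError("Column 'FTR' not found in DataFrame")

    # value_counts() omits codes that never occur, so default them to 0.
    counts = df["FTR"].value_counts()
    return {
        "Home Wins": int(counts.get("H", 0)),
        "Draws": int(counts.get("D", 0)),
        "Away Wins": int(counts.get("A", 0)),
    }


# Toy data invented for illustration only.
matches = pd.DataFrame({"FTR": ["H", "D", "A", "H"]})
stats = result_stats_sketch(matches)
print(stats)
```

Defaulting absent result codes to zero keeps all three keys present even when, for example, a dataset contains no draws.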

Team performance statistics

pyTSPA.metrics.team_performance(df: DataFrame, team_name: str) → dict

Computes a team’s performance statistics across a season.

This function summarizes key performance metrics for a specified team, including the number of matches played, wins, draws, losses, goals scored, goals conceded, goal difference, and points.

Parameters:
  • df (pd.DataFrame) – DataFrame containing match data with columns ‘HomeTeam’, ‘AwayTeam’, ‘FTHG’, ‘FTAG’, ‘FTR’.

  • team_name (str) – the name of the team for which the performance metrics will be calculated

Returns:

a dictionary containing the team’s performance metrics with the following structure:

{‘Team’: str, ‘Matches’: int, ‘Wins’: int, ‘Draws’: int, ‘Losses’: int, ‘Goals For’: int, ‘Goals Against’: int, ‘Goal Difference’: int, ‘Points’: int}

Return type:

dict

Raises:

ValueError – if any of the required columns (‘HomeTeam’, ‘AwayTeam’, ‘FTHG’, ‘FTAG’, ‘FTR’) are missing
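As a rough illustration of how these metrics could be derived from the listed columns, the sketch below splits a team's matches into home and away fixtures and aggregates each side. It is not the library's code: team_performance_sketch and the toy data are hypothetical, and the points rule (3 for a win, 1 for a draw) is an assumption, since the reference does not state it.

```python
import pandas as pd

REQUIRED_COLUMNS = ["HomeTeam", "AwayTeam", "FTHG", "FTAG", "FTR"]


def team_performance_sketch(df: pd.DataFrame, team_name: str) -> dict:
    """Summarise one team's results (hypothetical stand-in)."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    home = df[df["HomeTeam"] == team_name]
    away = df[df["AwayTeam"] == team_name]

    # 'FTR' is written from the home side's perspective, so an away
    # win for this team shows up as 'A' in its away fixtures.
    wins = int((home["FTR"] == "H").sum() + (away["FTR"] == "A").sum())
    draws = int((home["FTR"] == "D").sum() + (away["FTR"] == "D").sum())
    losses = int((home["FTR"] == "A").sum() + (away["FTR"] == "H").sum())
    goals_for = int(home["FTHG"].sum() + away["FTAG"].sum())
    goals_against = int(home["FTAG"].sum() + away["FTHG"].sum())

    return {
        "Team": team_name,
        "Matches": len(home) + len(away),
        "Wins": wins,
        "Draws": draws,
        "Losses": losses,
        "Goals For": goals_for,
        "Goals Against": goals_against,
        "Goal Difference": goals_for - goals_against,
        # Assumption: 3 points per win, 1 per draw.
        "Points": 3 * wins + draws,
    }


# Toy data invented for illustration only.
matches = pd.DataFrame(
    {
        "HomeTeam": ["Alpha", "Beta", "Gamma", "Beta"],
        "AwayTeam": ["Beta", "Gamma", "Alpha", "Alpha"],
        "FTHG": [2, 0, 1, 2],
        "FTAG": [1, 0, 3, 0],
        "FTR": ["H", "D", "A", "H"],
    }
)
alpha = team_performance_sketch(matches, "Alpha")
print(alpha)
```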

All teams

pyTSPA.metrics.get_all_teams(df: DataFrame) → ndarray

Extracts the unique team names from the ‘HomeTeam’ and ‘AwayTeam’ columns.

Names from both columns are combined so that every team in the dataset appears exactly once in the result, regardless of whether it played at home, away, or both.

Parameters:

df (pd.DataFrame) – DataFrame containing match data with ‘HomeTeam’ and ‘AwayTeam’ columns

Returns:

a sorted array of unique team names

Return type:

np.ndarray

Raises:

ValueError – if either the ‘HomeTeam’ or the ‘AwayTeam’ column is missing
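One way to obtain a sorted array of unique names from two columns is numpy's unique function, which both de-duplicates and sorts. The sketch below assumes that approach; get_all_teams_sketch and the toy data are hypothetical, and the library may build the array differently.

```python
import numpy as np
import pandas as pd


def get_all_teams_sketch(df: pd.DataFrame) -> np.ndarray:
    """Return the sorted unique team names (hypothetical stand-in)."""
    for column in ("HomeTeam", "AwayTeam"):
        if column not in df.columns:
            raise ValueError(f"Column '{column}' not found in DataFrame")

    # Stack both columns, then let np.unique de-duplicate and sort.
    names = np.concatenate(
        [df["HomeTeam"].to_numpy(), df["AwayTeam"].to_numpy()]
    )
    return np.unique(names)


# Toy data invented for illustration only.
matches = pd.DataFrame(
    {
        "HomeTeam": ["Gamma", "Alpha", "Beta"],
        "AwayTeam": ["Alpha", "Beta", "Gamma"],
    }
)
teams = get_all_teams_sketch(matches)
print(teams)
```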

Statistics for all teams

pyTSPA.metrics.each_team_performance(df: DataFrame) → DataFrame

Computes performance statistics for every team in the dataset.

This function calculates the performance metrics for each team using the team_performance() function and returns a DataFrame summarizing each team’s performance.

Parameters:

df (pd.DataFrame) – DataFrame containing match data with ‘HomeTeam’, ‘AwayTeam’, ‘FTHG’, ‘FTAG’, ‘FTR’ columns

Returns:

a DataFrame where each row represents a team’s performance summary, sorted by points in descending order. Columns include:

  • ‘Team’: Team name

  • ‘Matches’: Matches played

  • ‘Wins’: Wins

  • ‘Draws’: Draws

  • ‘Losses’: Losses

  • ‘Goals For’: Goals scored

  • ‘Goals Against’: Goals conceded

  • ‘Goal Difference’: Goal difference

  • ‘Points’: Points accumulated

Return type:

pd.DataFrame
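The per-team aggregation and sort described for each_team_performance() follow a common pandas pattern: build one dictionary per team, collect them into a DataFrame, and sort by ‘Points’. The sketch below shows only that pattern with a reduced set of columns. The helper points_summary is hypothetical, the 3/1/0 points rule is an assumption, and resetting the index after sorting is a choice made here rather than documented behaviour.

```python
import pandas as pd


def points_summary(df: pd.DataFrame, team: str) -> dict:
    """Reduced per-team summary (hypothetical helper, fewer columns)."""
    home = df[df["HomeTeam"] == team]
    away = df[df["AwayTeam"] == team]
    wins = int((home["FTR"] == "H").sum() + (away["FTR"] == "A").sum())
    draws = int((home["FTR"] == "D").sum() + (away["FTR"] == "D").sum())
    # Assumption: 3 points per win, 1 per draw.
    return {"Team": team, "Wins": wins, "Draws": draws,
            "Points": 3 * wins + draws}


# Toy data invented for illustration only.
matches = pd.DataFrame(
    {
        "HomeTeam": ["Alpha", "Beta", "Gamma", "Beta"],
        "AwayTeam": ["Beta", "Gamma", "Alpha", "Alpha"],
        "FTR": ["H", "D", "A", "H"],
    }
)

teams = sorted(set(matches["HomeTeam"]) | set(matches["AwayTeam"]))

# One row per team, ordered from most to fewest points.
table = (
    pd.DataFrame([points_summary(matches, team) for team in teams])
    .sort_values("Points", ascending=False)
    .reset_index(drop=True)
)
print(table)
```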

Win percentage

pyTSPA.metrics.win_percentage(df: DataFrame, team_name: str) → float

Calculates the win percentage for a specified team.

Parameters:
  • df (pd.DataFrame) – DataFrame containing match data with ‘HomeTeam’, ‘AwayTeam’, and ‘FTR’ columns

  • team_name (str) – the name of the team to calculate win percentage for

Returns:

the win percentage expressed as a fraction between 0 and 1 (not as a value from 0 to 100)

Return type:

float

Raises:

ValueError – if the required columns are missing
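A value between 0 and 1 is consistent with computing wins divided by matches played. The sketch below assumes that definition; win_percentage_sketch and the toy data are hypothetical, and returning 0.0 for a team with no matches is an assumption, as the reference does not describe that case.

```python
import pandas as pd


def win_percentage_sketch(df: pd.DataFrame, team_name: str) -> float:
    """Fraction of matches won by a team (hypothetical stand-in)."""
    for column in ("HomeTeam", "AwayTeam", "FTR"):
        if column not in df.columns:
            raise ValueError(f"Column '{column}' not found in DataFrame")

    is_home = df["HomeTeam"] == team_name
    is_away = df["AwayTeam"] == team_name
    played = int(is_home.sum() + is_away.sum())
    if played == 0:
        # Assumption: a team with no matches gets 0.0 rather than an error.
        return 0.0

    wins = int(
        (is_home & (df["FTR"] == "H")).sum()
        + (is_away & (df["FTR"] == "A")).sum()
    )
    return wins / played


# Toy data invented for illustration only.
matches = pd.DataFrame(
    {
        "HomeTeam": ["Alpha", "Beta", "Gamma", "Beta"],
        "AwayTeam": ["Beta", "Gamma", "Alpha", "Alpha"],
        "FTR": ["H", "D", "A", "H"],
    }
)
alpha_rate = win_percentage_sketch(matches, "Alpha")
print(round(alpha_rate, 3))
```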

Win percentage for all teams

pyTSPA.metrics.each_win_percentage(df: DataFrame) → DataFrame

Calculates the win percentage for every team and returns it as a separate DataFrame.

Parameters:

df (pd.DataFrame) – DataFrame containing match data with ‘HomeTeam’, ‘AwayTeam’, and ‘FTR’ columns

Returns:

DataFrame with ‘Team’ and ‘WinPercentage’ columns

Return type:

pd.DataFrame

Pythagorean expectation

pyTSPA.metrics.pythagorean_expectation(df: DataFrame, team_name: str, exponent: float = 2.0) → float

Calculates the Pythagorean Expectation for a specified team using match data.

Parameters:
  • df (pd.DataFrame) – DataFrame containing match data with ‘HomeTeam’, ‘AwayTeam’, ‘FTHG’, ‘FTAG’ columns

  • team_name (str) – the name of the team to calculate the Pythagorean Expectation for

  • exponent (float) – exponent value for the calculation, default is 2.0

Returns:

the Pythagorean Expectation as a value between 0 and 1

Return type:

float

Raises:

ValueError – if the required columns are missing
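The reference does not spell out the formula, so the sketch below assumes the standard form of the Pythagorean expectation, GF^x / (GF^x + GA^x), where GF is goals scored, GA is goals conceded, and x is the exponent. The name pythagorean_expectation_sketch and the toy data are hypothetical, and returning 0.0 when a team has neither scored nor conceded is an assumption.

```python
import pandas as pd


def pythagorean_expectation_sketch(
    df: pd.DataFrame, team_name: str, exponent: float = 2.0
) -> float:
    """Standard Pythagorean expectation (hypothetical stand-in)."""
    for column in ("HomeTeam", "AwayTeam", "FTHG", "FTAG"):
        if column not in df.columns:
            raise ValueError(f"Column '{column}' not found in DataFrame")

    home = df[df["HomeTeam"] == team_name]
    away = df[df["AwayTeam"] == team_name]
    goals_for = float(home["FTHG"].sum() + away["FTAG"].sum())
    goals_against = float(home["FTAG"].sum() + away["FTHG"].sum())

    scored = goals_for ** exponent
    conceded = goals_against ** exponent
    if scored + conceded == 0:
        # Assumption: no goals either way yields 0.0 instead of dividing by zero.
        return 0.0
    return scored / (scored + conceded)


# Toy data invented for illustration only.
matches = pd.DataFrame(
    {
        "HomeTeam": ["Alpha", "Beta", "Gamma", "Beta"],
        "AwayTeam": ["Beta", "Gamma", "Alpha", "Alpha"],
        "FTHG": [2, 0, 1, 2],
        "FTAG": [1, 0, 3, 0],
    }
)
# Alpha scored 5 and conceded 4, so the value is 25 / (25 + 16).
alpha_expectation = pythagorean_expectation_sketch(matches, "Alpha")
print(round(alpha_expectation, 4))
```

A larger exponent pushes the result further from 0.5 for the same goal totals, which is why the exponent is exposed as a parameter.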

Pythagorean expectation for all teams

pyTSPA.metrics.each_pythagorean_expectation(df: DataFrame, exponent: float = 2.0) → DataFrame

Calculates the Pythagorean Expectation for every team and returns it as a separate DataFrame.

Parameters:
  • df (pd.DataFrame) – DataFrame containing match data with ‘HomeTeam’, ‘AwayTeam’, ‘FTHG’, ‘FTAG’ columns

  • exponent (float) – exponent value for the calculation, default is 2.0

Returns:

DataFrame with ‘Team’ and ‘PythagoreanExpectation’ columns

Return type:

pd.DataFrame

Logistic regression prediction

pyTSPA.metrics.logistic_regression_prediction(df: DataFrame) → dict

Predicts match outcomes (Win/Draw/Loss) using multinomial logistic regression with oversampling and additional features.

Parameters:

df (pd.DataFrame) – DataFrame containing match data with necessary metrics calculated

Returns:

a dictionary containing model accuracy, confusion matrix, predictions, and the trained model

Return type:

dict

Match outcome prediction

pyTSPA.metrics.predict_match_outcome(home_team: str, away_team: str, model: LogisticRegression, df: DataFrame) → dict

Predicts the outcome of a specific match between two teams using the trained logistic regression model.

Parameters:
  • home_team (str) – name of the home team

  • away_team (str) – name of the away team

  • model (LogisticRegression) – trained logistic regression model

  • df (pd.DataFrame) – DataFrame containing the match data

Returns:

a dictionary containing predicted outcome and probabilities

Return type:

dict

Season half prediction

pyTSPA.metrics.season_half_prediction(df: DataFrame) → DataFrame

Predicts the outcomes of the second half of the season based on the Pythagorean Expectation values calculated from the first half of the season.

Parameters:

df (pd.DataFrame) – DataFrame containing match data with ‘Date’, ‘HomeTeam’, ‘AwayTeam’ and ‘FTR’ columns

Returns:

DataFrame with predicted outcomes for the second half of the season

Return type:

pd.DataFrame
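The reference does not say how the season is split or how expectation values are turned into predictions, so the following is only a sketch under several stated assumptions: matches are sorted by ‘Date’ and split at the midpoint by row count, the goal columns ‘FTHG’ and ‘FTAG’ are available, the side with the higher first-half expectation is predicted to win (a tie predicts a draw), and teams with no goal data default to 0.5. All function names, the ‘Predicted’ column, and the toy data are hypothetical.

```python
import pandas as pd


def expectation(df: pd.DataFrame, team: str, exponent: float = 2.0) -> float:
    """Standard Pythagorean expectation; 0.5 when no goals are recorded."""
    home = df[df["HomeTeam"] == team]
    away = df[df["AwayTeam"] == team]
    scored = float(home["FTHG"].sum() + away["FTAG"].sum()) ** exponent
    conceded = float(home["FTAG"].sum() + away["FTHG"].sum()) ** exponent
    total = scored + conceded
    return scored / total if total > 0 else 0.5  # assumption: neutral default


def season_half_sketch(df: pd.DataFrame, exponent: float = 2.0) -> pd.DataFrame:
    """Predict second-half results from first-half expectations."""
    ordered = df.sort_values("Date").reset_index(drop=True)
    midpoint = len(ordered) // 2  # assumption: split by row count
    first_half = ordered.iloc[:midpoint]
    second_half = ordered.iloc[midpoint:].copy()

    teams = set(first_half["HomeTeam"]) | set(first_half["AwayTeam"])
    strength = {team: expectation(first_half, team, exponent) for team in teams}

    def predict(row: pd.Series) -> str:
        home = strength.get(row["HomeTeam"], 0.5)
        away = strength.get(row["AwayTeam"], 0.5)
        if home > away:
            return "H"
        if away > home:
            return "A"
        return "D"

    second_half["Predicted"] = second_half.apply(predict, axis=1)
    return second_half


# Toy data invented for illustration only.
matches = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2024-08-10", "2024-08-17", "2024-12-07", "2024-12-14"]
        ),
        "HomeTeam": ["Alpha", "Beta", "Gamma", "Beta"],
        "AwayTeam": ["Beta", "Gamma", "Alpha", "Alpha"],
        "FTHG": [2, 0, 1, 2],
        "FTAG": [1, 0, 3, 0],
        "FTR": ["H", "D", "A", "H"],
    }
)
predictions = season_half_sketch(matches)
print(predictions[["HomeTeam", "AwayTeam", "FTR", "Predicted"]])
```

Comparing the ‘Predicted’ column with the actual ‘FTR’ values for the second half gives a simple check of how well first-half form carried over.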