Calculate the average minute of the first substitution per decade.
# Pandas: df has one row per substitution; 'substitute_out' holds the minute of the change
first_sub = df.dropna(subset=['substitute_out']).groupby(['match_id', 'year'], as_index=False)['substitute_out'].min()
avg_sub_time = first_sub.groupby(first_sub['year'] // 10 * 10)['substitute_out'].mean()  # bucket years into decades

In the 1980s, the average first substitution came in the 75th minute. By 2022, it came in the 58th. The output captures the tactical revolution: managers now treat the bench as a weapon, not a lifeboat.

4. The Anomaly Detection: Own Goals and Disciplinary Records

Because appearances.csv includes own_goals and red_cards at the player-match level, you can ask bizarre, wonderful questions.
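As a minimal sketch of one such question, the code below flags player-match rows where the same player both scored an own goal and was sent off, then totals both columns per player. It assumes integer `own_goals` and `red_cards` columns as described above, plus `match_id` and `player_id` keys; the rows are made-up toy data, not real tournament records.

```python
import pandas as pd

# Toy player-match rows (hypothetical, not real World Cup data).
# Assumes integer own_goals and red_cards columns, as described in the text.
appearances = pd.DataFrame({
    "match_id":  ["M1", "M1", "M2", "M3"],
    "player_id": ["P1", "P2", "P1", "P3"],
    "own_goals": [1, 0, 0, 1],
    "red_cards": [1, 1, 0, 0],
})

# Nightmare games: the same player scored an own goal AND was sent off.
nightmare_games = appearances[
    (appearances["own_goals"] > 0) & (appearances["red_cards"] > 0)
]

# Per-player totals, most own goals first.
own_goal_leaders = (
    appearances.groupby("player_id")[["own_goals", "red_cards"]]
    .sum()
    .sort_values("own_goals", ascending=False)
)

print(nightmare_games)
print(own_goal_leaders)
```

On the real file, the same boolean mask works unchanged once the column names are confirmed against the CSV header.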
import pandas as pd

appearances = pd.read_csv('https://raw.githubusercontent.com/jfjelstul/worldcup/master/data-csv/appearances.csv')
goals = pd.read_csv('https://raw.githubusercontent.com/jfjelstul/worldcup/master/data-csv/goals.csv')

# Filter for substitutes (game_started = FALSE)
subs = appearances[appearances['game_started'] == False]

# Merge with goals: keep only goals scored by players who came off the bench in that match
sub_goals = goals.merge(subs, on=['match_id', 'player_id'])

# The _x suffix implies both tables carry player_name; pandas tags the left (goals) copy _x
sub_goals_count = sub_goals.groupby('player_name_x').size().reset_index(name='goals')
sub_goals_count.sort_values('goals', ascending=False).head(10)
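The join logic above is easiest to check on a few made-up rows. This sketch assumes, as the original code implies, that both tables share `match_id` and `player_id` keys and a `player_name` column, and that `game_started` is boolean; the names and rows are hypothetical. Passing `suffixes` explicitly keeps the goals table's `player_name` unsuffixed, which avoids the opaque `player_name_x`.

```python
import pandas as pd

# Toy appearances (hypothetical): one row per player per match.
appearances = pd.DataFrame({
    "match_id": [1, 1, 2, 2],
    "player_id": ["A", "B", "A", "C"],
    "player_name": ["Alpha", "Bravo", "Alpha", "Charlie"],
    "game_started": [True, False, False, False],
})

# Toy goals (hypothetical): one row per goal.
goals = pd.DataFrame({
    "match_id": [1, 1, 2, 2, 2],
    "player_id": ["A", "B", "A", "A", "C"],
    "player_name": ["Alpha", "Bravo", "Alpha", "Alpha", "Charlie"],
})

# Bench players only (game_started is boolean here, so ~ negates it).
subs = appearances[~appearances["game_started"]]

# Inner join: a goal survives only if its scorer started on the bench in that match.
# Alpha's goal in match 1 is dropped because Alpha started that game.
sub_goals = goals.merge(subs, on=["match_id", "player_id"], suffixes=("", "_app"))

sub_goals_count = (
    sub_goals.groupby("player_name").size().reset_index(name="goals")
    .sort_values("goals", ascending=False)
)
print(sub_goals_count)
```

Because the default merge is an inner join, one appearance row fans out to one output row per goal, so a two-goal cameo counts twice, which is exactly what the tally needs.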
For the analyst, this file is a playground of temporal logic. For the fan, it is a reminder that every minute on that pitch is a dataset of one. Load the CSV. Run the join. Ask who really worked the hardest. The answer is waiting in the rows of appearances.csv.