NoSix
09 Jun 2007, 06:29 AM
This thread was inspired by Illinizizou’s statistical analysis threads. While I applaud his objective and efforts, my methods will differ from his in certain respects.
The starting point of my analysis is the observation that teams win soccer games by scoring more goals and conceding fewer goals than their opponents. I adopt the viewpoint that goals scored and conceded are team, not individual, statistics. My objective is to determine the relationship between team goals scored and conceded and selected individual statistics of the players that make up a team.
The process I will use in order to accomplish my objective is as follows:
1) Propose simple models for the relationship between team goals scored and conceded and selected individual player statistics.
2) Define the selected individual statistics as objectively as possible, to improve the reliability of the data collected and avoid the introduction of subjective biases into the analysis.
3) Use a simplified match tracking system in order to collect data for the individual statistics for USA MNT matches.
4) Use standard regression techniques to determine from the data the proper weights for the different variables (statistics) in the model.
On offense, I choose a simple model which assumes that team goals for (GF) is a linear function of shots on goal (S) and passes completed (P), i.e.,
GF=a*S+b*P
where a and b are constants to be determined from the data.
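For anyone who wants to try the fit themselves, here is a minimal sketch of how a and b could be estimated by ordinary least squares with no intercept. The function name `fit_offense`, the constants `TRUE_A` and `TRUE_B`, and the four team rows are all made up for illustration; they are not the league data mentioned in this post, and the synthetic goals are generated from the made-up constants so that the fit simply recovers them.

```python
def fit_offense(rows):
    """Least-squares fit of GF = a*S + b*P (no intercept).

    rows: iterable of (shots_on_goal, passes_completed, goals_for),
    one tuple per team. Solves the 2x2 normal equations directly.
    """
    ss = sp = pp = sg = pg = 0.0
    for s, p, gf in rows:
        ss += s * s
        sp += s * p
        pp += p * p
        sg += s * gf
        pg += p * gf
    det = ss * pp - sp * sp
    if det == 0:
        raise ValueError("S and P are collinear; a and b cannot be separated")
    a = (sg * pp - pg * sp) / det
    b = (pg * ss - sg * sp) / det
    return a, b


# Hypothetical constants and synthetic (S, P) pairs, for illustration only.
TRUE_A, TRUE_B = 0.2, 0.002
teams = [(5, 400), (8, 350), (3, 500), (6, 450)]
rows = [(s, p, TRUE_A * s + TRUE_B * p) for s, p in teams]

a, b = fit_offense(rows)
print(round(a, 6), round(b, 6))
```

With real data the goals will not fall exactly on the line, so the fitted a and b are only estimates, and many team-matches would be needed before they mean much.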
On theoretical grounds I justify this choice of model by noting that it is the rare goal that is scored without the benefit of a preceding shot, and that while such skills as trapping and dribbling no doubt make some contribution to goal scoring, the vast majority of goals stem from successive possessions of the ball by multiple teammates. Each individual possession in such a chain may (or may not) include traps or dribbles, but to be successful (lead to a goal) all must end with a pass or a shot. [As a practical matter, for those of you statistically inclined, I find based on limited data that even this simple model explains upwards of 70% of the variation in goals scored between different teams.]
On defense, I choose a simple model which assumes that team goals against (GA) is a linear function of KIC’s and BV’s, where KIC’s are the sum of tackles (K), intercepted passes (I), and goalkeeper catches (C), and BV’s are the sum of blocked shots (B) and saves (V), i.e.,
GA=c*KIC+d*BV+e
where c, d, and e are constants to be determined from the data.
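As a rough sketch of how the defensive formula would be evaluated once the constants are known: the function names and the values of c, d, and e below are placeholders I picked purely so the code runs. They are not fitted values, and even their signs are an assumption.

```python
def kic(tackles, interceptions, catches):
    """KIC = K + I + C."""
    return tackles + interceptions + catches


def bv(blocks, saves):
    """BV = B + V."""
    return blocks + saves


def goals_against(kic_total, bv_total, c, d, e):
    """GA = c*KIC + d*BV + e."""
    return c * kic_total + d * bv_total + e


# Placeholder constants (assumptions, not results of any regression).
C_HYP, D_HYP, E_HYP = -0.01, 0.1, 2.0

team_kic = kic(tackles=20, interceptions=76, catches=6)
team_bv = bv(blocks=0, saves=2)
print(team_kic, team_bv, goals_against(team_kic, team_bv, C_HYP, D_HYP, E_HYP))
```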
On theoretical grounds I justify this choice of model by analogy (however imperfect) with the offensive model: BV’s being the defensive analog of shots on goal and KIC’s being the defensive analog of passes completed. [The practical foundations of this model are on somewhat shaky ground, as I have no data whatsoever to demonstrate its validity.]
Now, the “nattering nabobs of negativism” around here aren’t going to like this, but note that only successful touches (offensive and defensive) are included in these simple models. More shots on target and passes completed contribute to more goals scored, but there are no penalties for shots off target or incomplete passes. In the case of the offensive model, the available data indicate that once shots on target are included in the model, the inclusion of shots off target does not explain any more of the variation in goals scored, in line with intuition. A similar argument applies to incomplete passes (while they may or may not correlate to how many goals your team concedes, it is the completed ones that influence how many goals your team scores).
Definitions:
Pass (Successful) – a touch by an offensive player which results in transfer of possession of the ball to a teammate. A pass is defined to be completed (successful) if it is first touched by a teammate of the passing player.
Shot (Successful) – a shot on goal, i.e., a shot that is a goal, is saved by the opposing goalkeeper, or is blocked by a defender standing in the opposing goal area.
tacKle (Successful) – a touch by a defensive player which results in transfer of possession of the ball from an offensive player, whose last touch was a trap or dribble, to a defensive player.
Intercepted pass (Successful) – a touch by a defensive player which results in transfer of possession of the ball from an offensive player, whose last touch was a pass, to a defensive player.
Catch (Successful) – a touch by a goalkeeper with their hands which results in transfer of possession of the ball from an offensive player, whose last touch was a pass, to the goalkeeper.
Block (Successful) – a touch by a defensive player while standing in their own goal area which prevents a shot by an offensive player from becoming a goal.
saVe (Successful) – a touch by a goalkeeper which prevents a shot by an offensive player from becoming a goal.
Note that these definitions are designed to be as objective as possible. To record data for these stats does not require you to make any value judgments as to whether a particular pass is good or bad, or whether a particular passer or receiver is more to blame for an incomplete pass. By definition, if a pass is touched by a teammate first, it is successful; if it is touched by the opposing team first (or goes out of bounds) it is not. You are simply observing touches and recording what you see to the best of your ability.
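The definitions above amount to a small decision rule, which can be sketched as follows. The function names, argument names, and string labels are my own inventions for illustration. The sketch also assumes the tracker has already decided that the defensive touch was successful (possession actually changed hands, or the shot was actually kept out), since the rule only assigns the letter code.

```python
def pass_completed(passer_team, first_toucher_team):
    """A pass is completed only if a teammate touches it first.

    first_toucher_team is None when the ball goes out of bounds.
    """
    return first_toucher_team is not None and first_toucher_team == passer_team


def classify_defensive_touch(last_offensive_touch, by_goalkeeper=False,
                             with_hands=False, in_own_goal_area=False):
    """Return the stat code for a successful defensive touch.

    last_offensive_touch: "trap", "dribble", "pass", or "shot".
    Returns "K", "I", "C", "B", "V", or None if no definition applies.
    """
    if last_offensive_touch in ("trap", "dribble"):
        return "K"  # tackle
    if last_offensive_touch == "pass":
        # A goalkeeper using their hands records a catch; anyone else
        # (or a keeper using their feet) records an interception.
        return "C" if (by_goalkeeper and with_hands) else "I"
    if last_offensive_touch == "shot":
        if by_goalkeeper:
            return "V"  # save
        # Blocks only count when the defender is in their own goal area.
        return "B" if in_own_goal_area else None
    raise ValueError(f"unknown touch type: {last_offensive_touch!r}")


print(classify_defensive_touch("dribble"))
print(classify_defensive_touch("pass", by_goalkeeper=True, with_hands=True))
print(pass_completed("USA", None))
```

Nothing here requires judging whether a touch was good or bad; the only inputs are who touched the ball and what the previous touch was.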
As an example, data from Thursday’s USA v Guatemala match are shown in the table below:
  P  S   K   I  B  C  V  KIC  BV  Player
 43  2   1   6  0  0  0    7   0  Donovan
 54  1   6   5  0  0  0   11   0  Feilhaber
 16  1   0   0  0  0  0    0   0  Dempsey
 58  1   3  12  0  0  0   15   0  Bradley
 58  1   1  13  0  0  0   14   0  Bocanegra
 42  1   2   3  0  0  0    5   0  Beasley
 45  0   0  10  0  0  0   10   0  Onyewu
 55  0   3   7  0  0  0   10   0  Hejduk
 46  0   1  12  0  0  0   13   0  Bornstein
  5  0   3   1  0  0  0    4   0  DeMerit
 10  0   0   1  0  0  0    1   0  Ralston
 12  0   0   1  0  0  0    1   0  Johnson
 19  0   0   2  0  0  0    2   0  Twellman
 20  0   0   3  0  6  2    9   2  Howard
483  7  20  76  0  6  2  102   2  Team
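If you keep your own tally sheet, the derived KIC and BV columns and the Team row can be recomputed from the raw counts as a sanity check. The sketch below uses the numbers from the table above; the variable and function names are just illustrative.

```python
# Raw counts per player, in column order: P, S, K, I, B, C, V.
match = {
    "Donovan":   (43, 2, 1, 6, 0, 0, 0),
    "Feilhaber": (54, 1, 6, 5, 0, 0, 0),
    "Dempsey":   (16, 1, 0, 0, 0, 0, 0),
    "Bradley":   (58, 1, 3, 12, 0, 0, 0),
    "Bocanegra": (58, 1, 1, 13, 0, 0, 0),
    "Beasley":   (42, 1, 2, 3, 0, 0, 0),
    "Onyewu":    (45, 0, 0, 10, 0, 0, 0),
    "Hejduk":    (55, 0, 3, 7, 0, 0, 0),
    "Bornstein": (46, 0, 1, 12, 0, 0, 0),
    "DeMerit":   (5, 0, 3, 1, 0, 0, 0),
    "Ralston":   (10, 0, 0, 1, 0, 0, 0),
    "Johnson":   (12, 0, 0, 1, 0, 0, 0),
    "Twellman":  (19, 0, 0, 2, 0, 0, 0),
    "Howard":    (20, 0, 0, 3, 0, 6, 2),
}


def derived(row):
    """Return (KIC, BV) for one row of raw counts."""
    p, s, k, i, b, c, v = row
    return k + i + c, b + v


# Column-wise sums give the Team row.
team = tuple(sum(col) for col in zip(*match.values()))
team_kic, team_bv = derived(team)

print(team, team_kic, team_bv)
```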
Data from a single match is obviously insufficient to fit the models. For the sake of illustration, however, in the table below I show an example calculation of goal (for) equivalents using constants derived from league data:
  P  S   GF*  Player
 43  2  0.41  Donovan
 54  1  0.31  Feilhaber
 16  1  0.29  Dempsey
 58  1  0.29  Bradley
 58  1  0.29  Bocanegra
 42  1  0.25  Beasley
 45  0  0.12  Onyewu
 55  0  0.12  Hejduk
 46  0  0.10  Bornstein
  5  0  0.10  DeMerit
 10  0  0.10  Ralston
 12  0  0.08  Johnson
 19  0  0.05  Twellman
 20  0  0.04  Howard
483  7  2.18  Team
In this example, Donovan would have contributed the most goal (for) equivalents of any USA player in the match, 0.41 out of an expected team total of 2.18 goals, based on the recorded number of shots on goal and passes completed.