
Super Simple Match Tracking: USA v Guatemala [R]



NoSix
09 Jun 2007, 06:29 AM
This thread was inspired by Illinizizou’s statistical analysis threads. While I applaud his objective and efforts, my methods will differ from his in certain respects.

The starting point of my analysis is the observation that teams win soccer games by scoring more goals and conceding fewer goals than their opponents. I adopt the viewpoint that goals scored and conceded are team, not individual, statistics. My objective is to determine the relationship between team goals scored and conceded and selected individual statistics of the players that make up a team.

The process I will use in order to accomplish my objective is as follows:

1) Propose simple models for the relationship between team goals scored and conceded and selected individual player statistics.
2) Define the selected individual statistics as objectively as possible, to improve the reliability of the data collected and avoid the introduction of subjective biases into the analysis.
3) Use a simplified match tracking system in order to collect data for the individual statistics for USA MNT matches.
4) Use standard regression techniques to determine from the data the proper weights for the different variables (statistics) in the model.

On offense, I choose a simple model which assumes that team goals for (GF) is a linear function of shots on goal (S) and passes completed (P), i.e.,

GF=a*S+b*P

Where a and b are constants to be determined from the data.
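The offensive model and step 4 of the process can be sketched as below. This is a minimal illustration, not the fit used in this post: the function names are hypothetical, the sample rows are made up from arbitrary constants (a = 0.2, b = 0.003) only to check that the fit recovers them, and the regression is plain least squares with no intercept, solved through the 2x2 normal equations.

```python
def goals_for(shots_on_goal, passes_completed, a, b):
    """Offensive model from the post: GF = a*S + b*P."""
    return a * shots_on_goal + b * passes_completed


def fit_offense(rows):
    """Least-squares estimate of (a, b) from (S, P, GF) rows, no intercept.

    Solves the normal equations
        [sum S*S  sum S*P] [a]   [sum S*GF]
        [sum S*P  sum P*P] [b] = [sum P*GF]
    by Cramer's rule, so only the standard library is needed.
    """
    sss = sum(s * s for s, p, g in rows)
    spp = sum(p * p for s, p, g in rows)
    ssp = sum(s * p for s, p, g in rows)
    ssg = sum(s * g for s, p, g in rows)
    spg = sum(p * g for s, p, g in rows)
    det = sss * spp - ssp * ssp
    if det == 0:
        raise ValueError("S and P are collinear; a and b cannot be separated")
    a = (ssg * spp - spg * ssp) / det
    b = (spg * sss - ssg * ssp) / det
    return a, b


# Made-up team-match rows (S, P, GF). NOT real league data: GF is generated
# from the arbitrary constants a = 0.2 and b = 0.003.
sample = [(s, p, goals_for(s, p, 0.2, 0.003))
          for s, p in [(4, 300), (7, 480), (2, 350), (9, 410), (5, 520)]]
a, b = fit_offense(sample)
print(a, b)
```

With real data the rows would be one per team per match (or per season), and the fit would not be exact.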

On theoretical grounds I justify this choice of model by noting that it is the rare goal that is scored without the benefit of a preceding shot, and that while such skills as trapping and dribbling no doubt make some contribution to goal scoring, the vast majority of goals stem from successive possessions of the ball by multiple teammates. Each individual possession in such a chain may (or may not) include traps or dribbles, but to be successful (lead to a goal) all must end with a pass or a shot. [As a practical matter, for those of you statistically inclined, I find based on limited data that even this simple model explains upwards of 70% of the variation in goals scored between different teams.]

On defense, I choose a simple model which assumes that team goals against (GA) is a linear function of KIC’s and BV’s, where KIC’s are the sum of tackles (K), intercepted passes (I), and goalkeeper catches (C), and BV’s are the sum of blocked shots (B) and saves (V), i.e.,

GA=c*KIC+d*BV+e

Where c, d, and e are constants to be determined from the data.
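As a minimal sketch of how the defensive quantities combine, assuming nothing beyond the definitions in this post: the function names are hypothetical, and no values are suggested for c, d, and e because none have been fitted. The example counts are Howard's and the team's lines from the USA v Guatemala table in this post.

```python
def kic(tackles, interceptions, catches):
    """KIC = K + I + C."""
    return tackles + interceptions + catches


def bv(blocks, saves):
    """BV = B + V."""
    return blocks + saves


def goals_against(kic_total, bv_total, c, d, e):
    """Defensive model from the post: GA = c*KIC + d*BV + e.

    c, d and e must come from a regression on real data; this function
    only evaluates the formula for whatever constants are passed in.
    """
    return c * kic_total + d * bv_total + e


# Howard: K=0, I=3, C=6, B=0, V=2
howard_kic = kic(0, 3, 6)   # 9
howard_bv = bv(0, 2)        # 2

# Team: K=20, I=76, C=6, B=0, V=2
team_kic = kic(20, 76, 6)   # 102
team_bv = bv(0, 2)          # 2
```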

On theoretical grounds I justify this choice of model by analogy (however imperfect) with the offensive model: BV’s being the defensive analog of shots on goal and KIC’s being the defensive analog of passes completed. [The practical foundations of this model are on somewhat shaky ground, as I have no data whatsoever to demonstrate its validity.]

Now, the “nattering nabobs of negativism” around here aren’t going to like this, but note that only successful touches (offensive and defensive) are included in these simple models. More shots on target and passes completed contribute to more goals scored, but there are no penalties for shots off target or incomplete passes. In the case of the offensive model, the available data indicate that once shots on target are included in the model, the inclusion of shots off target does not explain any more of the variation in goals scored, in line with intuition. A similar argument applies to incomplete passes (while they may or may not correlate to how many goals your team concedes, it is the completed ones that influence how many goals your team scores).

Definitions:

Pass (Successful) – a touch by an offensive player which results in transfer of possession of the ball to a teammate. A pass is defined to be completed (successful) if it is first touched by a teammate of the passing player.

Shot (Successful) – a shot on goal, i.e., a shot that is a goal, is saved by the opposing goalkeeper, or is blocked by a defender standing in the opposing goal area.

tacKle (Successful) – a touch by a defensive player which results in transfer of possession of the ball from an offensive player, whose last touch was a trap or dribble, to a defensive player.

Intercepted pass (Successful) – a touch by a defensive player which results in transfer of possession of the ball from an offensive player, whose last touch was a pass, to a defensive player.

Catch (Successful) – a touch by a goalkeeper with their hands which results in transfer of possession of the ball from an offensive player, whose last touch was a pass, to the goalkeeper.

Block (Successful) – a touch by a defensive player while standing in their own goal area which prevents a shot by an offensive player from becoming a goal.

saVe (Successful) – a touch by a goalkeeper which prevents a shot by an offensive player from becoming a goal.

Note that these definitions are designed to be as objective as possible. To record data for these stats does not require you to make any value judgments as to whether a particular pass is good or bad, or whether a particular passer or receiver is more to blame for an incomplete pass. By definition, if a pass is touched by a teammate first, it is successful; if it is touched by the opposing team first (or goes out of bounds) it is not. You are simply observing touches and recording what you see to the best of your ability.
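To show how mechanical these definitions are, here is a rough sketch of the recording rules as code. The function names, the string labels for touch types, and the use of None for a ball that goes out of bounds are all assumptions made for illustration; they are not part of the tracking system itself.

```python
def pass_completed(passer_team, first_toucher_team):
    """A pass is successful if a teammate of the passer touches it first.

    first_toucher_team is None when the ball goes out of bounds.
    """
    return first_toucher_team is not None and first_toucher_team == passer_team


def classify_turnover(last_offensive_touch, keeper_hands=False):
    """Label a defensive touch that wins possession.

    last_offensive_touch: "trap", "dribble" or "pass" (the offensive
    player's last touch before losing the ball).
    keeper_hands: True if the ball was won by the goalkeeper using hands.
    Returns "K" (tackle), "I" (intercepted pass) or "C" (catch).
    """
    if last_offensive_touch in ("trap", "dribble"):
        return "K"  # ball won from a player who had it under control
    if last_offensive_touch == "pass":
        return "C" if keeper_hands else "I"
    raise ValueError("unknown touch type: %r" % (last_offensive_touch,))


print(classify_turnover("dribble"))                 # K
print(classify_turnover("pass"))                    # I
print(classify_turnover("pass", keeper_hands=True)) # C
```

Nothing in these rules asks whether a pass was a good idea, only who touched the ball next.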

As an example, data from Thursday’s USA v Guatemala match are shown in the table below:

Player       P  S  K  I  B  C  V KIC BV
Donovan     43  2  1  6  0  0  0   7  0
Feilhaber   54  1  6  5  0  0  0  11  0
Dempsey     16  1  0  0  0  0  0   0  0
Bradley     58  1  3 12  0  0  0  15  0
Bocanegra   58  1  1 13  0  0  0  14  0
Beasley     42  1  2  3  0  0  0   5  0
Onyewu      45  0  0 10  0  0  0  10  0
Hejduk      55  0  3  7  0  0  0  10  0
Bornstein   46  0  1 12  0  0  0  13  0
DeMerit      5  0  3  1  0  0  0   4  0
Ralston     10  0  0  1  0  0  0   1  0
Johnson     12  0  0  1  0  0  0   1  0
Twellman    19  0  0  2  0  0  0   2  0
Howard      20  0  0  3  0  6  2   9  2
Team       483  7 20 76  0  6  2 102  2
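For anyone re-keying the table, a quick consistency check is sketched below. The numbers are copied from the table above; the helper names are hypothetical. It confirms that KIC = K + I + C and BV = B + V on every row, and that the Team row equals the column sums.

```python
# (name, P, S, K, I, B, C, V, KIC, BV), copied from the match table
rows = [
    ("Donovan",   43, 2, 1,  6, 0, 0, 0,  7, 0),
    ("Feilhaber", 54, 1, 6,  5, 0, 0, 0, 11, 0),
    ("Dempsey",   16, 1, 0,  0, 0, 0, 0,  0, 0),
    ("Bradley",   58, 1, 3, 12, 0, 0, 0, 15, 0),
    ("Bocanegra", 58, 1, 1, 13, 0, 0, 0, 14, 0),
    ("Beasley",   42, 1, 2,  3, 0, 0, 0,  5, 0),
    ("Onyewu",    45, 0, 0, 10, 0, 0, 0, 10, 0),
    ("Hejduk",    55, 0, 3,  7, 0, 0, 0, 10, 0),
    ("Bornstein", 46, 0, 1, 12, 0, 0, 0, 13, 0),
    ("DeMerit",    5, 0, 3,  1, 0, 0, 0,  4, 0),
    ("Ralston",   10, 0, 0,  1, 0, 0, 0,  1, 0),
    ("Johnson",   12, 0, 0,  1, 0, 0, 0,  1, 0),
    ("Twellman",  19, 0, 0,  2, 0, 0, 0,  2, 0),
    ("Howard",    20, 0, 0,  3, 0, 6, 2,  9, 2),
]
team = ("Team", 483, 7, 20, 76, 0, 6, 2, 102, 2)


def derived_columns_ok(row):
    """Check KIC = K + I + C and BV = B + V for one row."""
    _, p, s, k, i, b, c, v, kic, bv = row
    return kic == k + i + c and bv == b + v


def column_totals(player_rows):
    """Sum every numeric column across the player rows."""
    return tuple(sum(col) for col in zip(*(r[1:] for r in player_rows)))


all_rows_ok = all(derived_columns_ok(r) for r in rows + [team])
totals_match = column_totals(rows) == team[1:]
print(all_rows_ok, totals_match)  # True True
```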


Data from a single match is obviously insufficient to fit the models. For the sake of illustration, however, in the table below I show an example calculation of goal (for) equivalents using constants derived from league data:

Player       P  S   GF*
Donovan     43  2  0.41
Feilhaber   54  1  0.31
Dempsey     16  1  0.29
Bradley     58  1  0.29
Bocanegra   58  1  0.29
Beasley     42  1  0.25
Onyewu      45  0  0.12
Hejduk      55  0  0.12
Bornstein   46  0  0.10
DeMerit      5  0  0.10
Ralston     10  0  0.10
Johnson     12  0  0.08
Twellman    19  0  0.05
Howard      20  0  0.04
Team       483  7  2.18


In this example, Donovan would have contributed the most goal (for) equivalents of any USA player in the match, 0.41, against an expected team total of 2.18 goals, based on the recorded number of shots on goal and passes completed.

Sam Hamwich
09 Jun 2007, 07:33 AM
Excellent post. Thanks for the data.

Unless I am reading your numbers incorrectly, Dempsey would appear to have had far less statistical impact on the match than in reality.

OWN(yewu)ED
09 Jun 2007, 07:45 AM
That is just downright freaky how accurate that is. great post.

Nutmeg
09 Jun 2007, 09:50 AM
I'm pretty impressed with this model. One thing that stands out to me is that this is a pretty good example of why leaving out shots off target doesn't really matter. Look at where Twellman rates for this game.

Good stuff, NoSix. What league data are you using for your constants? MLS?

USA4Life
09 Jun 2007, 10:01 AM
That looked like a lot of work. Nice Job.

I like statistics; however, game-breaking plays don't show up in your analysis. I would trade 50 good passes for one good shot that goes into the back of the net or the pass that leads to the goal.

Based on the statistics, Twellman looked like he sucked; however, his great left-footed pass back across the goal to Dempsey was the difference between winning and tying.

Players like Twellman who can hustle and create a play that wins a game need to be accounted for. (Poland in Europe last year)

Speaking of Twellman, I believe he said that when given a real chance to play with the first team that he would get the job done.

If you look strictly at results I believe this has been the case.

FirstStar
09 Jun 2007, 10:46 AM
That is just downright freaky how accurate that is. great post.

Yes because Timmy clearly had nothing to do with our win. Either exclude keepers or go back to the drawing board.

cpwilson80
09 Jun 2007, 12:19 PM
That looked like a lot of work. Nice Job.

I like statistics; however, game-breaking plays don't show up in your analysis. I would trade 50 good passes for one good shot that goes into the back of the net or the pass that leads to the goal.


I think the limitations of the model are apparent. However, over time, you'd expect those who rate highly in the model to be performing well.

Two things come to mind regarding this model:

1) Aren't free kick goals something like 25% of goals scored? I think I remember seeing that stat in the technical report for WC 2002.

2) Should time on the field factor into the analysis?

ugaaccountant
09 Jun 2007, 12:31 PM
These statistics seem to have real value. In a game where we dominated possession and shots on goal, I am not surprised at all to find that it was our midfielders doing the majority of the good work.

Twellman scored low because he did a low volume of good work. He had one excellent assist, and the statistic that shows that is assists. The same goes for Dempsey and goals.

Howard had an incredibly dull night. He was near perfect on the night, but he did not do a lot and thus shows up very little.

With the way we dominated possession we should have had the second goal the model predicted, which coincidentally was the one Twellman blew, or could have been Boca's header, or several other legit chances. The average viewer felt we scored less than we should have given the run of play, and this model quantified that.

I think this model can go a long way in helping understand who helps us have possession and shots on goal, which are very key building blocks to a good result. The model seems to have a purpose, be objectively defined, and for this one game yielded the correct results.

Great job!

rollo
09 Jun 2007, 12:50 PM
A good test of this model is the offensive ranking of Twellman. He has almost the lowest GF metric for the game. Does this really make sense? Twellman was inches away from registering two goals!! Wow! Also, he seemed to be working "hard" off the ball and coming deep to help out the team. It would be nice to see the GA using the league stats for constants, to see if Twellman has a higher-than-expected GA metric for a forward. Otherwise, was his contribution to the game really poor, or defensive (and shows up on the GA metric), or so different that it is not visible from this model? Over time, if Twellman converts some of those chances, one would assume that his GF metric would improve. What's interesting, though, is that all his work still did not show up in terms of passes.

casoccerdad47
09 Jun 2007, 12:59 PM
Yes because Timmy clearly had nothing to do with our win. Either exclude keepers or go back to the drawing board.

I believe the example he provided was for the offensive statistics, not the defensive ones. The only two variables he showed in the example were passes and shots.

casoccerdad47
09 Jun 2007, 01:14 PM
As an example, data from Thursday’s USA v Guatemala match are shown in the table below:

Player       P  S  K  I  B  C  V KIC BV
Donovan     43  2  1  6  0  0  0   7  0
Feilhaber   54  1  6  5  0  0  0  11  0
Dempsey     16  1  0  0  0  0  0   0  0
Bradley     58  1  3 12  0  0  0  15  0
Bocanegra   58  1  1 13  0  0  0  14  0
Beasley     42  1  2  3  0  0  0   5  0
Onyewu      45  0  0 10  0  0  0  10  0
Hejduk      55  0  3  7  0  0  0  10  0
Bornstein   46  0  1 12  0  0  0  13  0
DeMerit      5  0  3  1  0  0  0   4  0
Ralston     10  0  0  1  0  0  0   1  0
Johnson     12  0  0  1  0  0  0   1  0
Twellman    19  0  0  2  0  0  0   2  0
Howard      20  0  0  3  0  6  2   9  2
Team       483  7 20 76  0  6  2 102  2




I find the raw data fascinating by itself. For example, DeMerit had 3 successful tackles in just over 10 minutes of play compared to 0 for Gooch, and Bradley appears to have had a better game (KIC=15 and P=58, both the top numbers on the team) than many on Big Soccer have given him credit for.

Academically it's fascinating, but I'm really not sure you need to try to combine these statistics in a single number. Give me these numbers plus a percentage of attempted passes completed and a percentage of shots on goal to shots and I'll be happy.

Nutmeg
09 Jun 2007, 01:18 PM
That's correct. These are just the offensive results.

I think there may be something wrong with them right now. For example, Bradley and Bocanegra, both with 58 successful passes and 1 successful shot each, should be scoring higher than Feilhaber, who has slightly fewer passes (54) and 1 shot.

I like the model, but I don't understand the conclusions yet, nor do I understand the constants that are used in the calculations.

As far as Twellman goes, I think this model identifies his "contributions" perfectly. Fact is, he blew a wide open header. Had he managed to put even that one shot on frame, he would have had a goal, and he would have shown up a lot higher in this model. As it is, he didn't complete many passes, wasn't all that involved, and aside from the one play (which was awesome), was ineffective against Guatemala. Take out that 5 second piece, Twellman was pretty awful. The statistics rightly bear that out.

Craig P
09 Jun 2007, 01:19 PM
Excellent post. Thanks for the data.

Unless I am reading your numbers incorrectly, Dempsey would appear to have had far less statistical impact on the match than in reality.

Since the model is showing the weight of team play, I think you should always expect the scorers to have less than a full goal in game contribution.

Ghosting
09 Jun 2007, 03:15 PM
That's correct. These are just the offensive results.

I think there may be something wrong with them right now. For example, Bradley and Bocanegra, both with 58 successful passes and 1 successful shot each, should be scoring higher than Feilhaber, who has slightly fewer passes (54) and 1 shot.

I like the model, but I don't understand the conclusions yet, nor do I understand the constants that are used in the calculations.

As far as Twellman goes, I think this model identifies his "contributions" perfectly. Fact is, he blew a wide open header. Had he managed to put even that one shot on frame, he would have had a goal, and he would have shown up a lot higher in this model. As it is, he didn't complete many passes, wasn't all that involved, and aside from the one play (which was awesome), was ineffective against Guatemala. Take out that 5 second piece, Twellman was pretty awful. The statistics rightly bear that out.

My guess is that LD came out so high because shots on goal is very heavily weighted, and he had twice as many as anyone else (yeah... I know it's only two, but that's how the model would work).

This points out the weakness of the model as well. If Twellman had hit his header softly straight at the keeper instead of over the crossbar, he would have had a much better game.

I'd be curious to see the outcome of the defensive model. I think you might want to normalize it using time of possession. The number of passes on offense essentially helps determine time of possession for that model, but in theory you could have a team completely control the match so that its defense has very few tackles and blocked shots. Conversely, you could have a defense that is completely dominated yet has very few tackles and blocked shots. If you normalized tackles and blocked shots by time of possession, it could mitigate this.
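One way the normalization suggested here might look, as a rough sketch: divide the defensive count by the minutes the opponent actually had the ball. The function name, the per-10-minute scale, and every number in the example are invented for illustration; no possession times were recorded for this match.

```python
def per_opponent_possession(actions, opp_possession_min, per=10.0):
    """Defensive actions per `per` minutes of opponent possession."""
    if opp_possession_min <= 0:
        raise ValueError("opponent possession time must be positive")
    return actions * per / opp_possession_min


# Two hypothetical defenses with the same raw count of 12 actions.
# One faced 20 minutes of opponent possession, the other 60.
dominant_side = per_opponent_possession(12, 20)   # 6.0 per 10 minutes
dominated_side = per_opponent_possession(12, 60)  # 2.0 per 10 minutes
print(dominant_side, dominated_side)
```

The raw counts are identical, but the rates separate the defense that had few chances to win the ball from the one that had many chances and took few of them.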

NoSix
10 Jun 2007, 04:25 AM
Unless I am reading your numbers incorrectly, Dempsey would appear to have had far less statistical impact on the match than in reality.

Obviously, real goals are discrete (0, 1, 2, ...), while goal equivalents can take any value. In any single match it will be common for the goal equivalents to be higher or lower than real goals, but over several matches the sums of the goal equivalents and real goals should converge towards the same number.

Dempsey's rating reflects the fact that he scored a goal with his only shot on goal. On average about one in four shots on goal is a goal.

NoSix
10 Jun 2007, 04:33 AM
Good stuff, NoSix. What league data are you using for your constants? MLS?

I used three seasons of EPL data (Optastats), this season and two older ones scrounged from the depths of my hard drive. If anyone has more seasons archived somewhere, please PM me.

OWN(yewu)ED
10 Jun 2007, 04:45 AM
I believe the example he provided was for the offensive statistics, not the defensive ones. The only two variables he showed in the example were passes and shots.

I thought that was kind of assumed as obvious as well. Guess I was wrong.

No, it's not a good value rater for your keep. But give the man some credit; he did his homework on this one.

NoSix
10 Jun 2007, 04:56 AM
I like statistics; however, game-breaking plays don't show up in your analysis.


No, they don't, but we have TV highlights to serve that purpose. The models proposed here are more of a step towards an objective match rating, if you will, that attempts to encompass the entirety of a player's contribution to a match.

I would trade 50 good passes for one good shot that goes into the back of the net or the pass that leads to the goal.


I congratulate you on your good intuition. The EPL data indicates that it takes about 59 good passes to equal 1 shot on goal.
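For readers asking what constants that ratio implies, here is a rough back-calculation. It assumes only two numbers from this thread, the 59-to-1 ratio above and the Team row of the GF* table (S = 7, P = 483, GF* = 2.18), and takes both at face value. The actual fitted constants are not published here, so treat the result as an approximation and not as the values used for the ratings.

```python
PASSES_PER_SHOT = 59                    # "about 59 good passes to equal 1 shot on goal"
TEAM_S, TEAM_P, TEAM_GF = 7, 483, 2.18  # Team row of the GF* table

# GF = a*S + b*P with a = 59*b  =>  GF = b*(59*S + P)
b = TEAM_GF / (PASSES_PER_SHOT * TEAM_S + TEAM_P)
a = PASSES_PER_SHOT * b

print(round(a, 3), round(b, 4))
```

Rounding in the quoted ratio and in the table means the individual GF* values will only be reproduced approximately by these numbers.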

NoSix
10 Jun 2007, 05:04 AM
I think there may be something wrong with them right now. For example, Bradley and Bocanegra, both with 58 successful passes and 1 successful shot each, should be scoring higher than Feilhaber, who has slightly fewer passes (54) and 1 shot.


Feilhaber completed his 54 passes and 1 shot in only 80 minutes, while Bradley and Bocanegra went 90. Perhaps I should have mentioned that individual player stats are normalized to 90 min in calculating the ratings, since teams always play 90 min matches.
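The normalization is simple proportional scaling, sketched below with a hypothetical helper name. The 80 and 90 minute figures and the pass and shot counts are the ones quoted in this reply.

```python
def per_90(count, minutes_played, match_minutes=90):
    """Scale a raw count to a full-match (90 minute) equivalent."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return count * match_minutes / minutes_played


# Feilhaber: 54 passes and 1 shot on goal in 80 minutes
feilhaber_p90 = per_90(54, 80)  # 60.75
feilhaber_s90 = per_90(1, 80)   # 1.125

# Bradley and Bocanegra: 58 passes and 1 shot on goal in 90 minutes
bradley_p90 = per_90(58, 90)    # 58.0
bradley_s90 = per_90(1, 90)     # 1.0

print(feilhaber_p90, feilhaber_s90, bradley_p90, bradley_s90)
```

Because both of Feilhaber's per-90 figures end up above Bradley's and Bocanegra's, any positive weights a and b will rate him higher.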

NoSix
10 Jun 2007, 05:13 AM
Since the model is showing the weight of team play, I think you should always expect the scorers to have less than a full goal in game contribution.

A player's goal equivalents would exceed one if they were to generate 4 shots on goal, or even 3 shots on goal depending on the number of passes completed.