Keeper stats
beineke
27 Sep 2003, 11:52 AM
FYI, over in the Youth National Teams Forum, we've been having an interesting discussion of goalkeepers' numbers.
http://www.bigsoccer.com/forum/showthread.php?threadid=74250
By the way, Moderator -- is it possible to make Stats and Analysis posts appear on the front page? Thx.
beineke
28 Sep 2003, 12:43 PM
Here are the results of the study put together by ChrisE and me. For more detailed discussion, see the link above.
Offside-Adjusted Ratings for 2001-03
1. Metros +4.17% (Howard, Walker)
2. San Jose +2.31% (Cannon, Onstad)
3. Fire +1.93% (Thornton)
4. Crew +1.61% (Presthus, Busch)
5. LA +1.24% (Hartman)
6. NE -0.61% (Sommer, Brown)
7. DC -0.78% (Rimando, Ammann)
8. KC -1.28% (Meola)
9. Colorado -1.61% (Garlick)
10. Dallas -6.05% (Jordan, Countess)
That is to say, the Metros saved 4.17% more shots than you would've expected from looking at the number of times they pulled their opponents offsides.
A few more points:
-- In addition to the numbers above, the 2001 Fusion were at +2.78%, and the 2001 Mutiny were at -5.65% ... note that those numbers are more scattered.
-- If we hadn't adjusted for offsides, the Metros would still be #1, but their lead would be less dramatic. The Crew would be #2, and the Revs would be #9.
-- In a couple of cases (Rimando at DC, Brown at NE), the current keeper has put up markedly better numbers than his predecessor. In other cases, there doesn't appear to be much difference.
ChrisE
20 Apr 2004, 01:48 AM
So, I'm resurrecting a very very old thread, and I'm going to do so rather poorly, but if I don't, it's going to languish here, unfinished, for god knows how long.
So, months ago, I gathered up the 8 seasons of goalkeeper info (less a couple of unavailable TB/Miami seasons), but didn't really have much to do with it. Seeing as I collected these stats months ago, I can't be sure of their accuracy. For offsides, I simply took offsides/games (which is not as good as offsides/minutes, but I'm sorry, I'm not going to go back now and change it); for save %, I removed PKs, and simply took (sog-goals)/sog - that may not be exactly what we're trying to measure, but it should be measuring the same thing across all 8 years.
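A minimal sketch of the save % calculation described above. It assumes penalty kicks are removed from both the shots-on-goal count and the goals count, which is one reading of "I removed PKs"; the function name, parameters, and the season line are made up for illustration.

```python
def save_pct(shots_on_goal, goals, pk_shots=0, pk_goals=0):
    """Save % = (sog - goals) / sog, computed on non-penalty shots only."""
    sog = shots_on_goal - pk_shots      # non-penalty shots on goal
    allowed = goals - pk_goals          # non-penalty goals allowed
    if sog <= 0:
        raise ValueError("need at least one non-penalty shot on goal")
    return (sog - allowed) / sog


# Hypothetical season: 150 shots on goal, 40 goals, 5 penalties faced, 4 converted.
example = save_pct(150, 40, pk_shots=5, pk_goals=4)
print(f"{example:.3f}")
```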
Team Year Offsides/game Save %
Clb 1996 3.22 0.731
Col 1996 6.38 0.725
Dal 1996 2.56 0.820
DC 1996 4.50 0.723
KC 1996 3.13 0.692
LA 1996 3.81 0.787
Met 1996 3.28 0.786
NE 1996 3.19 0.753
SJ 1996 1.56 0.759
TB 1996 2.69 0.790
Clb 1997 2.75 0.803
Col 1997 4.81 0.744
Dal 1997 2.25 0.831
DC 1997 6.34 0.757
KC 1997 2.25 0.727
LA 1997 5.03 0.755
Met 1997 4.31 0.780
NE 1997 4.34 0.717
SJ 1997 3.59 0.719
TB 1997 3.09 0.724
clb 1998 2.16 0.752
Chi 1998 2.09 0.744
Col. 1998 3.72 0.722
Dal. 1998 2.63 0.752
DC 1998 6.25 0.727
KC 1998 2.75 0.697
LA 1998 4.16 0.735
Met. 1998 4.84 0.755
NE 1998 3.25 0.689
SJ 1998 3.16 0.690
clb 1999 1.00 0.777
Chi 1999 2.13 0.778
Col. 1999 2.47 0.802
Dal. 1999 1.63 0.819
DC 1999 5.25 0.760
KC 1999 3.06 0.730
LA 1999 2.63 0.823
Met. 1999 3.47 0.702
NE 1999 5.25 0.722
SJ 1999 2.59 0.743
clb 2000 1.97 0.741
Chi. 2000 2.97 0.727
Col. 2000 2.94 0.749
Dal. 2000 2.69 0.738
DC 2000 3.97 0.683
KC 2000 2.44 0.836
LA 2000 3.13 0.775
Met. 2000 4.19 0.797
Mia. 2000 3.84 0.741
NE 2000 3.69 0.717
SJ 2000 4.28 0.791
TB 2000 4.03 0.804
clb 2001 3.08 0.812
Chi. 2001 3.31 0.810
Col. 2001 4.23 0.772
Dal. 2001 2.31 0.664
DC 2001 3.65 0.691
KC 2001 3.31 0.720
LA 2001 1.77 0.755
Met. 2001 3.42 0.829
Mia. 2001 3.65 0.792
NE 2001 3.35 0.751
SJ 2001 2.96 0.807
TB 2001 4.62 0.720
clb 2002 2.07 0.770
Chi. 2002 3.00 0.792
Col. 2002 2.96 0.724
Dal. 2002 1.82 0.761
DC 2002 2.50 0.802
KC 2002 3.46 0.732
LA 2002 1.93 0.801
Met. 2002 3.82 0.777
NE 2002 4.43 0.751
SJ 2002 3.61 0.796
clb 2003 1.86 0.784
Chi. 2003 2.69 0.788
Col. 2003 3.17 0.746
Dal. 2003 2.10 0.716
DC 2003 2.03 0.808
KC 2003 3.72 0.770
LA 2003 2.83 0.830
Met. 2003 1.83 0.817
NE 2003 4.76 0.712
SJ 2003 2.38 0.786
For all 8 years, we therefore get totals of:
Year Offsides/game Save %
1996 3.431 0.757
1997 3.878 0.756
1998 3.500 0.726
1999 2.947 0.766
2000 3.344 0.758
2001 3.304 0.760
2002 2.961 0.771
2003 2.738 0.776
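The yearly totals appear to be plain unweighted averages of the team rows for each season. A quick check on the 1996 rows from the table above (the averaging rule is inferred from the numbers, not stated in the post):

```python
# 1996 team rows copied from the table above: (team, offsides/game, save %).
rows_1996 = [
    ("Clb", 3.22, 0.731), ("Col", 6.38, 0.725), ("Dal", 2.56, 0.820),
    ("DC", 4.50, 0.723), ("KC", 3.13, 0.692), ("LA", 3.81, 0.787),
    ("Met", 3.28, 0.786), ("NE", 3.19, 0.753), ("SJ", 1.56, 0.759),
    ("TB", 2.69, 0.790),
]

# Unweighted mean across teams (every team counts equally).
mean_off = sum(r[1] for r in rows_1996) / len(rows_1996)
mean_save = sum(r[2] for r in rows_1996) / len(rows_1996)

print(f"1996 offsides/game {mean_off:.3f}, save % {mean_save:.3f}")
```

Small differences in the last digit against the posted totals are expected, since the team rows are themselves rounded.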
We see that offsides/game have been declining significantly over the years (r=-.77), while save percentage has been rising (r=.60).
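The two trend correlations can be rechecked from the eight yearly totals with a hand-rolled Pearson r. This is only a sketch on the rounded figures posted above, so the results may differ from the quoted values in the second decimal place.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


years = list(range(1996, 2004))
offsides = [3.431, 3.878, 3.500, 2.947, 3.344, 3.304, 2.961, 2.738]
save_pct = [0.757, 0.756, 0.726, 0.766, 0.758, 0.760, 0.771, 0.776]

r_off = pearson(years, offsides)    # trend in offsides/game over time
r_save = pearson(years, save_pct)   # trend in save % over time
print(f"offsides vs year r = {r_off:.2f}, save % vs year r = {r_save:.2f}")
```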
Using my extremely meager linear regression abilities, I get, from the preceding list, a formula that looks like offsides = 79% - 1.009%*offsides/game.
This, of course, is where I stop. I think the next logical step would be to adjust for the decrease in offsides over time (which I've done, and improves correlation of predicted offsides from .27 to .37), but I'm reluctant to do that without someone's approval. Obviously, I ought to test for significance, but I haven't got the faintest clue how to do that.
mellon002
20 Apr 2004, 08:28 AM
By the way, Moderator -- is it possible to make Stats and Analysis posts appear on the front page? Thx.
I know you posted this a while ago but to answer your question the only thing I can think of is BigSoccer live which shows the most recent posts. Did you mean like a part of the blog section?
mpruitt
21 Apr 2004, 05:31 PM
Yeah that was taken care of by Huss during the switch which was great for him. Although I haven't been participating much recently, the forum appears to be really healthy and I will be redoubling my efforts shortly. Another nice thing from the switchover that you'll notice is the ability to have spreadsheets scroll left and right. It makes things we do around here infinitely more readable.
In terms of Chris' numbers it's interesting that the save percentages leaguewide haven't changed all that much. The only dip you see is during an expansion year which would seem to make sense. Another interesting comparison might be to see how these numbers compare with other leagues. However, I think that the leaguewide goal save percentages could eventually be an interesting baseline as to looking for what are the chances on average of a shot going in at any given shot.
ur_land
21 Apr 2004, 08:37 PM
Using my extremely meager linear regression abilities, I get, from the preceding list, a formula that looks like offsides = 79% - 1.009%*offsides/game.
This, of course, is where I stop. I think the next logical step would be to adjust for the decrease in offsides over time (which I've done, and improves correlation of predicted offsides from .27 to .37), but I'm reluctant to do that without someone's approval. Obviously, I ought to test for significance, but I haven't got the faintest clue how to do that.
The regression equation I get for that is slightly different (and by the way, I'm assuming you made a typo and were really trying to predict saves from offside calls/game):
save%=0.8451 - .02646 (off/game)
This equation's r-squared (i.e., the amount of the variance in save% that it explains) is 0.41, which is pretty good. Incidentally, take the square root of the r-squared, and you get the magnitude of r, the correlation coefficient of save% and off/game, which is about .65ish or so (negative here, since the slope is negative). Not too bad of a correlation, but we only have 8 data points.
And here's what the equation means: Offside calls/game is a marginally significant predictor of save% (p<.09). This means that as offside calls/game go up by one, save percentage goes DOWN by about 2.6 percentage points. More offside calls lead to lower save percentage.
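For anyone who wants to recheck the fit on the 8 season averages, here is a sketch of ordinary least squares done by hand on the yearly totals posted earlier in the thread. It should land close to the quoted slope, intercept, and r-squared, within rounding of the inputs; the helper name is made up.

```python
def ols(xs, ys):
    """Simple linear regression of ys on xs; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Yearly league averages, 1996 through 2003, as posted by ChrisE.
offsides = [3.431, 3.878, 3.500, 2.947, 3.344, 3.304, 2.961, 2.738]
save_pct = [0.757, 0.756, 0.726, 0.766, 0.758, 0.760, 0.771, 0.776]

slope, intercept, r_squared = ols(offsides, save_pct)
print(f"save% = {intercept:.4f} + ({slope:.5f}) * off/game, r^2 = {r_squared:.2f}")
```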
As to why this is (more aggressive defenses lead to more 1 vs 1 opportunities?), I'm not entirely sure.
You're right, however, that offside calls have been declining over time (the correlation is significant at p<.02). Interestingly, when you control for time (i.e., adjust for the decrease in offside calls over time), the relationship between offside calls and save% is no longer significant (p=.41). So any effect of offside on save% was really a function of offside calls going down over time and save% going up over time. That and the fact that this is a small sample, which is not ideal from an inferential standpoint.
So, bottom line, it looks like there's not really any relationship (in this small sample) between offsides-trap-like defenses and save percentage, once you control for time.
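One way to see what "controlling for time" does on these 8 points is the first-order partial correlation between offsides/game and save %, holding year fixed. This is a sketch of that idea, not necessarily the software procedure ur_land ran; with one control variable it is equivalent to testing the offsides coefficient in a regression that also includes year. It stops at the t statistic (5 degrees of freedom), because the Python standard library has no t distribution to turn it into a p-value.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


years = list(range(1996, 2004))
offsides = [3.431, 3.878, 3.500, 2.947, 3.344, 3.304, 2.961, 2.738]
save_pct = [0.757, 0.756, 0.726, 0.766, 0.758, 0.760, 0.771, 0.776]

r_os = pearson(offsides, save_pct)   # offsides vs save %
r_ot = pearson(offsides, years)      # offsides vs time
r_st = pearson(save_pct, years)      # save % vs time

# Partial correlation of offsides and save %, with year partialled out.
partial = (r_os - r_ot * r_st) / sqrt((1 - r_ot ** 2) * (1 - r_st ** 2))

# t statistic for the partial correlation; df = n - 3 with one control variable.
df = len(years) - 3
t_stat = partial * sqrt(df) / sqrt(1 - partial ** 2)
print(f"raw r = {r_os:.2f}, partial r = {partial:.2f}, t = {t_stat:.2f} on {df} df")
```

The raw correlation shrinks noticeably once year is held fixed, which is the pattern described above.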
ChrisE
21 Apr 2004, 10:04 PM
Yeah that was taken care of by Huss during the switch which was great for him. Although I haven't been participating much recently, the forum appears to be really healthy and I will be redoubling my efforts shortly. Another nice thing from the switchover that you'll notice is the ability to have spreadsheets scroll left and right. It makes things we do around here infinitely more readable.
I guess. I sort of liked to have 70 entries on a single page. Maybe I just like things being unreadable.
In terms of Chris' numbers it's interesting that the save percentages leaguewide haven't changed all that much. The only dip you see is during an expansion year which would seem to make sense. Another interesting comparison might be to see how these numbers compare with other leagues. However, I think that the leaguewide goal save percentages could eventually be an interesting baseline as to looking for what are the chances on average of a shot going in at any given shot.
The change in the expansion year isn't surprising, but there's no reason I'd have predicted it to go down instead of up. I mean, people generally make the argument about expansion weakening pitching in the majors, but it ought to weaken hitting too - same here, while goalkeeping/defense may have weakened (although Chicago's keeper was Zach Thornton and Miami's was Jeff Cassar - not significantly weaker), I don't see any reason that it would weaken more than offense. What makes it especially strange is that there was no concurrent jump when the league contracted in 2002 (although 1998 was the first year Ian Feuer ever got significant minutes - maybe he can be blamed).
I think it might be interesting to compare these numbers to other leagues, but I'm not sure exactly what it would show you. Higher save %'s don't necessarily mean a league has better goalkeepers or worse strikers - it might be the case that shot selection is different (I believe England takes a lot more low-percentage long-range shots), or defenses have different strategies, or a host of other factors. They would tell you 'or what are the chances on average of a shot going in at any given shot,' but I'm not sure what that tells you.
You're right, however, that offside calls have been declining over time (the correlation is significant at p<.02). Interestingly, when you control for time (i.e., adjust for the decrease in offside calls over time), the relationship between offside calls and save% is no longer significant (p=.41). So any effect of offside on save% was really a function of offside calls going down over time and save% going up over time. That and the fact that this is a small sample, which is not ideal from an inferential standpoint.
Why would you need to adjust for the decrease in offside calls over time, unless there's an indication that this is an officiating change rather than a tactical change?
ChrisE
21 Apr 2004, 10:23 PM
The regression equation I get for that is slightly different (and by the way, I'm assuming you made a typo and were really trying to predict saves from offside calls/game):
save%=0.8451 - .02646 (off/game)
Yeah, my bad on the typo. Thanks a lot for the input, ur_land. It looks to me that the difference between our numbers (certainly not slight!) is that you just used the 8 season averages; although I suspect I didn't make it clear, I used the 80 or so individual team-seasons. It doesn't make sense to me to use just the 8 seasons, since you eliminate the teams that would show the most distinct effect.
statistics stuff...
As to why this is (more aggressive defenses lead to more 1 vs 1 opportunities?), I'm not entirely sure.
That was exactly the theory (Marvin Fischer and beineke's, I believe). More offsides traps means more blown offsides traps means better shooting percentage for the shots that players actually get (probably produces fewer shots in general, also).
You're right, however, that offside calls have been declining over time (the correlation is significant at p<.02). Interestingly, when you control for time (i.e., adjust for the decrease in offside calls over time), the relationship between offside calls and save% is no longer significant (p=.41). So any effect of offside on save% was really a function of offside calls going down over time and save% going up over time. That and the fact that this is a small sample, which is not ideal from an inferential standpoint.
So, bottom line, it looks like there's not really any relationship (in this small sample) between offsides-trap-like defenses and save percentage, once you control for time.
Let me here apologize for screwing up and making my initial post unclear, and try to post some results from the regression that I did. R squared in this case is a measly .075 - offsides clearly don't have as significant an effect as a lot of other factors do. Adjusted r squared is even lower, .063, although I don't know what that means. Excel is kind enough to give me a whole lot of numbers I don't understand, but I think (significance F = .011881) means I can pull one of these (p<.02).
I'm still not gonna mess with adjusting for time unless someone gives me some help.
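On the Excel output: adjusted R squared is just R squared penalized for the number of predictors, and "significance F" is the p-value of the overall F test. A sketch of how the two relate to R squared for a one-predictor regression follows. It assumes n = 84 team-seasons, a figure that comes from a later post rather than from the Excel output, so treat it as an assumption.

```python
r_squared = 0.075   # as reported above
n = 84              # assumed number of team-seasons
p = 1               # one predictor: offsides/game

# Adjusted R^2 shrinks R^2 according to how many predictors were used.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Overall F statistic for the regression, on (p, n - p - 1) degrees of freedom.
f_stat = (r_squared / p) / ((1 - r_squared) / (n - p - 1))

print(f"adjusted R^2 = {adj_r_squared:.3f}, F = {f_stat:.2f} on ({p}, {n - p - 1}) df")
```

With these inputs the adjusted figure comes out close to the .063 reported; the small gap is what you would expect from plugging in a rounded R squared.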
ChrisE
21 Apr 2004, 10:37 PM
Why would you need to adjust for the decrease in offside calls over time, unless there's an indication that this is an officiating change rather than a tactical change?
One reason that I think it might be useful is that the yearly error in the regressed save percentages correlates very strongly to the average yearly difference from the mean. Maybe this is to be expected, I'm sort of lost, but here's the data:
mean = .759
1996 0.757 -0.002
1997 0.756 -0.003
1998 0.726 -0.032
1999 0.766 0.007
2000 0.758 0.000
2001 0.760 0.002
2002 0.771 0.012
2003 0.776 0.017
Total 0.759 0.000
Regression Average
1996 -0.002 0.002
1997 -0.040 0.003
1998 0.292 0.032
1999 -0.043 -0.007
2000 -0.014 0.000
2001 -0.033 -0.002
2002 -0.094 -0.012
2003 -0.124 -0.017
r=.97455
mpruitt
22 Apr 2004, 12:01 AM
This means that as offside calls/game go up by one, save percentage goes DOWN by about 2.6 percentage points. More offside calls lead to lower save percentage.
As to why this is (more aggressive defenses lead to more 1 vs 1 opportunities?), I'm not entirely sure.
There's a mental connection that I'm having trouble making here. If save percentage is going down, then that should mean that more goals are being scored, yes? It would seem to be the obvious inference. However, it might be interesting to see offside rate as it relates to goals per game or shots per game.
It'd seem to me that the offside trap, while it can be an effective defensive tool, is usually more of a bail-out maneuver. How many times do you watch a game and see a team that is attacking, attacking and attacking, and they'll have a couple of big offsides calls go against them? I think it's possible that the correlation we're seeing between more offsides and fewer saves is that teams are doing the lion's share of attacking.
Problem being that we're dealing with a ratio here. So back at the beginning, what's special about this situation where keepers aren't getting shots off? Essentially, I'd have no idea, and because this damn sport is so weak with independent events it'd be terribly hard to tell. However, I'd be willing to bet that teams who are getting more offsides calls against them are also getting more shots on goal.
Problem again, so what's the recommendation to coaches? Have your guys get called offsides more :) ?
ur_land
22 Apr 2004, 09:55 AM
Why would you need to adjust for the decrease in offside calls over time, unless there's an indication that this is an officiating change rather than a tactical change?
To find the true relationship between offside calls and save percentage.
When you control for time, the relationship goes away. So there is SOMETHING out there (changes in officials? changes in tactics?) that is affecting both offside calls and save percentage. I'm not sure what it is, but it is affecting both offside calls and save percentage. And when you don't control for it, you see a spurious relationship between offside calls and save%.
ur_land
22 Apr 2004, 10:22 AM
Yeah, my bad on the typo. Thanks a lot for the input, ur_land. It looks to me that the difference between our numbers (certainly not slight!) is that you just used the 8 season averages; although I suspect I didn't make it clear, I used the 80 or so individual team-seasons. It doesn't make sense to me to use just the 8 seasons, since you eliminate the teams that would show the most distinct effect.
You're right in that using just the 8 season averages has a few problems. It doesn't eliminate the teams that would show the most distinct effect, as they contribute to that year's average, but using all of the teams separately gives you much greater statistical power. I just used the 8 years because it was easy and because I didn't want to go to the trouble of doing the regression with all of the teams in the correct way.
But since you used the 80 or so team seasons......well, I guess I'll have to explain it anyway.
I would not place too much stock on the regression results you obtained, because the regression you did was done in a less than optimal manner. When you have two or more individual data points from the same group (same marriage, classroom, league season, person, etc.) you need to make sure that your data isn't corrupted by dependency. One of the assumptions of regression is that all of the errors of your observations are independent. When you have dependency, the errors are not independent--because of some grouping variable, some errors are correlated together.
(this is from http://www.uu.nl/uupublish/onderzoek/onderzoekcentra/iops/research/stat/30956main.html )
Many data sets in the social sciences have a nested structure with persons nested within groups, which may themselves be nested within higher order groups and so on. Examples are pupils nested within classes within schools, patients nested within therapy groups, and employees nested within companies. Outcomes of persons within a group are likely to be correlated due to mutual influence and group norms. For instance, the smoking behaviour of a pupil within a class is likely to depend of that of pupils within the same class and (to a lesser degree) school, that of teachers, and the school policy towards smoking. Nested data structures also occur in longitudinal studies in which repeated measurements are nested within persons. Examples are a study on the effect of a hormone treatment on the psychosocial function of children with a disease and studies on the change in gender effect on test scores over time.
And another example would be looking at the effect of offside calls on save% for 10+ teams over 10 years. The errors are likely to be correlated with each other within a year more than across years, and this could cause your regression to be biased.
So there are a couple of things that can be done to correct for this: you can do a within subjects regression (http://www.visualstatistics.net/Visual%20Statistics%20Multimedia/multiple_regression_within_the_repeated_measures_design.htm), you can do a multilevel model (http://www.ssicentral.com/hlm/hlm.htm), or you can average across all teams for a year. I did the third way because it was easy and quick. The other methods (which actually are much better, as they give you greater statistical power) take a little longer, and I'm supposed to be analyzing my dissertation data, not analyzing MLS stats! If you don't want to tackle this (and it's not something that can easily be done in Excel), I'll see if I can con one of my buddies into doing it for us.......
numerista
22 Apr 2004, 12:24 PM
When you control for time, the relationship goes away.
You're missing JG's point. The purpose of the study is to relate tactics (the offside trap) to save %. We know that teams have decreased their use of the offside trap over time, so by adjusting for time, you're adjusting out the signal of interest.
I have other issues with some things you've claimed, but this one is probably the biggest.
ur_land
22 Apr 2004, 12:50 PM
We know that teams have decreased their use of the offside trap over time, so by adjusting for time, you're adjusting out the signal of interest.
I have other issues with some things you've claimed, but this one is probably the biggest.
How do we know that teams have decreased their use of the offside trap over time? We know that offside calls have gone down, but does that mean trap usage has necessarily gone down too? The two are not necessarily related. If they have declined their usage of the trap, then yes, that would be partialed out when controlling for time.
What are your other issues?
numerista
22 Apr 2004, 01:52 PM
How do we know that teams have decreased their use of the offside trap over time? We know that offside calls have gone down, but does that mean trap usage has necessarily gone down too? The two are not necessarily related. If they have declined their usage of the trap, then yes, that would be partialed out when controlling for time.
Take a look at individual teams over time. For instance, Bob Bradley came to New York last season and got rid of the offsides trap. They drew 50 fewer offsides calls in 2003 than 2002, by far the biggest change in the league.
Other issues...
1) Even after the adjustments you've made, errors are clearly correlated, due to the fact that we're observing some of the same players and coaches across multiple seasons.
2) Correlated errors are a much bigger issue in significance testing than parameter estimation, so ChrisE's results are still the best we've got. To illustrate, even if within-year observations were perfectly correlated (this is the situation for which your adjustment is correct), then his fitted model would be (essentially) identical to yours. Because your model is different, this suggests that you're discarding valuable information.
3) Given the way you're looking at within-season data (by averaging), I don't see how a reduction in significance implies that "you've seen a spurious relationship."
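Point 2 can be illustrated with a toy version of the limiting case: if every team in a season had exactly that season's average (perfect within-year correlation), a regression on the team-seasons gives the same line as a regression on the 8 yearly averages. The sketch below uses the yearly totals posted earlier and assumes, purely for simplicity, 10 identical teams per year.

```python
def ols(xs, ys):
    """Simple linear regression of ys on xs; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Yearly league averages, 1996 through 2003.
offsides = [3.431, 3.878, 3.500, 2.947, 3.344, 3.304, 2.961, 2.738]
save_pct = [0.757, 0.756, 0.726, 0.766, 0.758, 0.760, 0.771, 0.776]

# Fit on the 8 yearly averages.
slope_avg, intercept_avg = ols(offsides, save_pct)

# Hypothetical limiting case: every team in a year equals that year's average.
TEAMS_PER_YEAR = 10  # illustrative; the real table has 10 or 12 rows per year
rep_x = [x for x in offsides for _ in range(TEAMS_PER_YEAR)]
rep_y = [y for y in save_pct for _ in range(TEAMS_PER_YEAR)]
slope_rep, intercept_rep = ols(rep_x, rep_y)

print(f"yearly averages: slope {slope_avg:.5f}")
print(f"replicated rows: slope {slope_rep:.5f}")
```

Since the real team-season fit gives a much flatter slope than the yearly-average fit, the actual data are evidently far from this limiting case.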
numerista
22 Apr 2004, 02:12 PM
Regression Average
1996 -0.002 0.002
1997 -0.040 0.003
1998 0.292 0.032
1999 -0.043 -0.007
2000 -0.014 0.000
2001 -0.033 -0.002
2002 -0.094 -0.012
2003 -0.124 -0.017
r=.97455
Afraid I don't understand this chart, Chris. The righthand column is the difference between the yearly average and the global average. What's the lefthand column?
ChrisE
22 Apr 2004, 03:15 PM
Afraid I don't understand this chart, Chris. The righthand column is the difference between the yearly average and the global average. What's the lefthand column?
You can only expect me to be so clear when I'm talking about things I don't understand. For the first column, I predicted the save percentage from offsides/game for all 84 teams. I then subtracted that number from the actual save percentage for that team, that year. I then summed them up by year.
It's not actually the average, as I believe I said, it's the sum, but it hardly makes any difference.
numerista
22 Apr 2004, 03:56 PM
You can only expect me to be so clear when I'm talking about things I don't understand. For the first column, I predicted the save percentage from offsides/game for all 84 teams. I then subtracted that number from the actual save percentage for that team, that year. I then summed them up by year.
It's not actually the average, as I believe I said, it's the sum, but it hardly makes any difference.
... so if you had divided the left-hand column by the number of teams, you'd have the yearly average minus the predicted yearly average (in stats, we might call this the average residual for that year).
Because the two columns are correlated, something other than offsides is changing from year to year to modify save percentage. What's striking to me is the low save %age in 1998. 1998 was an expansion year, meaning that two new keepers were needed. In addition, Friedel and Zenga had left the league, Dodd was getting old, and shotstoppers like Howard and Cannon (#1 and #2 in adjusted save percentage, 01-03) were not yet playing.
My theory is that in 1998, there was a drop in the quality of goalkeeping, and that since then, keeping has improved. That's why you've found this pattern.
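The "average residual" idea can be sketched from the numbers already in the thread: divide ChrisE's yearly residual sums by the number of team rows in each season, and recheck the correlation between his two columns. Team counts are taken from the team-season table; the signs are kept exactly as posted, and the correlation is computed on rounded values, so it will only approximate the quoted r.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


years = list(range(1996, 2004))
teams = [10, 10, 10, 10, 12, 12, 10, 10]   # rows per season in the team table
resid_sum = [-0.002, -0.040, 0.292, -0.043, -0.014, -0.033, -0.094, -0.124]
average_col = [0.002, 0.003, 0.032, -0.007, 0.000, -0.002, -0.012, -0.017]

# Average residual per season = summed residuals / number of teams that season.
avg_resid = {y: s / k for y, s, k in zip(years, resid_sum, teams)}

# Correlation between the two posted columns.
r = pearson(resid_sum, average_col)

for y in years:
    print(y, f"{avg_resid[y]:+.4f}")
print(f"r = {r:.3f}")
```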
numerista
22 Apr 2004, 05:05 PM
A few more notes ...
-- I fit a regression using offsides AND team id, but team was clearly not a significant predictor of save %age (min pval 0.11).
-- Ignoring correlations between observations, the estimated effect is 1.01%, with a plus-minus (2 std errors) of 0.78%. This implies that even with optimistic assumptions, we're not sure how big the effect really is.
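The plus-minus above translates into a rough interval and t ratio with a couple of lines of arithmetic. This is only a restatement of the posted numbers, still under the optimistic assumption that observations are uncorrelated.

```python
effect = 1.01    # estimated drop in save %, in percentage points, per extra offside/game
two_se = 0.78    # posted plus-minus, equal to two standard errors

std_err = two_se / 2
low, high = effect - two_se, effect + two_se   # rough 2-standard-error interval
t_ratio = effect / std_err                     # estimate divided by its standard error

print(f"interval: {low:.2f}% to {high:.2f}%, t = {t_ratio:.2f}")
```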
Top five seasons: adjusted save percentage
1. 2001 Metros, Tim Howard +7.2%
2. 2000 Wizards, Tony Meola +6.9%
3. 2003 Galaxy, Kevin Hartman +6.7%
4. 1997 Burn, Mark Dodd +6.2%
5. 1999 Galaxy, Kevin Hartman +5.8%
Bottom five (worst first)
1. 2001 Burn, Matt Jordan -10.4%
2. 1998 Revs, Ian Feuer -7.0%
3. 1998 SJ, David Kramer/Andy Kirk -7.0%
4. 2000 DC, Mark Simpson/Tom Presthus -6.9%
5. 1996 Wizards, Garth Lagerwey -6.8%
2003
1. Galaxy, Hartman +6.7%
2. Metros, Howard/Walker +4.4%
3. DC, Rimando +3.7%
4. Fire, Thornton +2.4%
5. SJ, Onstad +1.8%
6. KC, Meola +1.6%
7. Clb, Busch +1.1%
8. Clr, Garlick -1.4%
9. NE, Brown -3.2%
10. Dal, Countess -5.4%
Hartman's numbers have varied quite a bit through the years, but here are Rimando's three full seasons ... 2003 +3.7, 2002 +3.7, 2001 +3.6.
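A rough sketch of how an offside-adjusted save percentage like the ones above could be computed: take the team's actual save % and subtract the save % predicted from its offsides/game. The coefficients below are the rounded ones ChrisE posted earlier in the thread (79% and 1.009% per offside/game), so this is an approximation and will not reproduce the lists exactly; the function and variable names are made up.

```python
INTERCEPT = 0.79     # rounded intercept from ChrisE's team-season regression
SLOPE = -0.01009     # change in save % per extra offside/game


def adjusted_save_pct(offsides_per_game, save_pct):
    """Actual save % minus the save % predicted from offsides/game."""
    predicted = INTERCEPT + SLOPE * offsides_per_game
    return save_pct - predicted


# (label, offsides/game, save %) copied from the team-season table.
seasons = [
    ("2001 Metros", 3.42, 0.829),
    ("2000 Wizards", 2.44, 0.836),
    ("2003 Galaxy", 2.83, 0.830),
    ("1997 Burn", 2.25, 0.831),
    ("1999 Galaxy", 2.63, 0.823),
    ("2001 Burn", 2.31, 0.664),
]

adjusted = {label: adjusted_save_pct(off, sv) for label, off, sv in seasons}
ranking = sorted(adjusted, key=adjusted.get, reverse=True)

for label in ranking:
    print(f"{label}: {adjusted[label] * 100:+.1f}%")
```

With the rounded intercept, the values come out a little above the posted ones, but the ordering of these six seasons is the same.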