FYI, over in the Youth National Teams Forum, we've been having an interesting discussion of goalkeepers' numbers. https://www.bigsoccer.com/forum/showthread.php?threadid=74250 By the way, Moderator -- is it possible to make Stats and Analysis posts appear on the front page? Thx.
Here are the results of the study put together by ChrisE and me. For more detailed discussion, see the link above.

Offside-Adjusted Ratings for 2001-03
1. Metros +4.17% (Howard, Walker)
2. San Jose +2.31% (Cannon, Onstad)
3. Fire +1.93% (Thornton)
4. Crew +1.61% (Presthus, Busch)
5. LA +1.24% (Hartman)
6. NE -0.61% (Sommer, Brown)
7. DC -0.78% (Rimando, Ammann)
8. KC -1.28% (Meola)
9. Colorado -1.61% (Garlick)
10. Dallas -6.05% (Jordan, Countess)

That is to say, the Metros saved 4.17% more shots than you would've expected from looking at the number of times they pulled their opponents offside.

A few more points:
-- In addition to the numbers above, the 2001 Fusion were at +2.78%, and the 2001 Mutiny were at -5.65% ... note that those numbers are more scattered.
-- If we hadn't adjusted for offsides, the Metros would still be #1, but their lead would be less dramatic. The Crew would be #2, and the Revs would be #9.
-- In a couple of cases (Rimando at DC, Brown at NE), the current keeper has put up markedly better numbers than his predecessor. In other cases, there doesn't appear to be much difference.
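For anyone who wants to tinker, here is a rough sketch of one way ratings like these could be computed: regress save percentage on offside calls drawn per game across all team-seasons, then take each team's residual (actual minus predicted save %). The file and column names are made up, and this is not necessarily the exact procedure used for the list above.

Code:
# Sketch: offside-adjusted save percentage. Column names (team, year,
# offsides_per_game, save_pct) are assumed, not taken from the thread.
import numpy as np
import pandas as pd

df = pd.read_csv("keepers.csv")  # hypothetical file, one row per team-season

# Fit save_pct = intercept + slope * offsides_per_game by least squares
slope, intercept = np.polyfit(df["offsides_per_game"], df["save_pct"], 1)

# The "offside-adjusted rating" is the residual: actual minus predicted save %
df["adjusted_rating"] = df["save_pct"] - (intercept + slope * df["offsides_per_game"])

print(df.sort_values("adjusted_rating", ascending=False).head(10))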
So, I'm resurrecting a very, very old thread, and I'm going to do so rather poorly, but if I don't, it's going to languish here, unfinished, for god knows how long. Months ago, I gathered up the 8 seasons of goalkeeper info (less a couple of unavailable TB/Miami seasons), but didn't really have much to do with it. Since I collected these stats months ago, I can't be sure of their accuracy. For offsides, I simply took offsides/games (which is not as good as offsides/minutes, but I'm sorry, I'm not going to go back now and change it); for save %, I removed PKs and simply took (SOG - goals)/SOG. That may not be exactly what we're trying to measure, but it should be measuring the same thing across all 8 years.

Code:
           Offsides   Save %
Clb  1996    3.22     0.731
Col  1996    6.38     0.725
Dal  1996    2.56     0.820
DC   1996    4.50     0.723
KC   1996    3.13     0.692
LA   1996    3.81     0.787
Met  1996    3.28     0.786
NE   1996    3.19     0.753
SJ   1996    1.56     0.759
TB   1996    2.69     0.790
Clb  1997    2.75     0.803
Col  1997    4.81     0.744
Dal  1997    2.25     0.831
DC   1997    6.34     0.757
KC   1997    2.25     0.727
LA   1997    5.03     0.755
Met  1997    4.31     0.780
NE   1997    4.34     0.717
SJ   1997    3.59     0.719
TB   1997    3.09     0.724
Clb  1998    2.16     0.752
Chi  1998    2.09     0.744
Col  1998    3.72     0.722
Dal  1998    2.63     0.752
DC   1998    6.25     0.727
KC   1998    2.75     0.697
LA   1998    4.16     0.735
Met  1998    4.84     0.755
NE   1998    3.25     0.689
SJ   1998    3.16     0.690
Clb  1999    1.00     0.777
Chi  1999    2.13     0.778
Col  1999    2.47     0.802
Dal  1999    1.63     0.819
DC   1999    5.25     0.760
KC   1999    3.06     0.730
LA   1999    2.63     0.823
Met  1999    3.47     0.702
NE   1999    5.25     0.722
SJ   1999    2.59     0.743
Clb  2000    1.97     0.741
Chi  2000    2.97     0.727
Col  2000    2.94     0.749
Dal  2000    2.69     0.738
DC   2000    3.97     0.683
KC   2000    2.44     0.836
LA   2000    3.13     0.775
Met  2000    4.19     0.797
Mia  2000    3.84     0.741
NE   2000    3.69     0.717
SJ   2000    4.28     0.791
TB   2000    4.03     0.804
Clb  2001    3.08     0.812
Chi  2001    3.31     0.810
Col  2001    4.23     0.772
Dal  2001    2.31     0.664
DC   2001    3.65     0.691
KC   2001    3.31     0.720
LA   2001    1.77     0.755
Met  2001    3.42     0.829
Mia  2001    3.65     0.792
NE   2001    3.35     0.751
SJ   2001    2.96     0.807
TB   2001    4.62     0.720
Clb  2002    2.07     0.770
Chi  2002    3.00     0.792
Col  2002    2.96     0.724
Dal  2002    1.82     0.761
DC   2002    2.50     0.802
KC   2002    3.46     0.732
LA   2002    1.93     0.801
Met  2002    3.82     0.777
NE   2002    4.43     0.751
SJ   2002    3.61     0.796
Clb  2003    1.86     0.784
Chi  2003    2.69     0.788
Col  2003    3.17     0.746
Dal  2003    2.10     0.716
DC   2003    2.03     0.808
KC   2003    3.72     0.770
LA   2003    2.83     0.830
Met  2003    1.83     0.817
NE   2003    4.76     0.712
SJ   2003    2.38     0.786

For all 8 years, we therefore get totals of:

Code:
        Offsides   Save %
1996     3.431     0.757
1997     3.878     0.756
1998     3.500     0.726
1999     2.947     0.766
2000     3.344     0.758
2001     3.304     0.760
2002     2.961     0.771
2003     2.738     0.776

We see that offsides/game have been declining significantly over the years (r = -.77), while save percentage has been rising (r = .60). Using my extremely meager linear regression abilities, I get, from the preceding list, a formula that looks like offsides = 79% - 1.009%*offsides/game.

This, of course, is where I stop. I think the next logical step would be to adjust for the decrease in offsides over time (which I've done, and it improves the correlation of the predicted save percentages from .27 to .37), but I'm reluctant to do that without someone's approval. Obviously, I ought to test for significance, but I haven't got the faintest clue how to do that.
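As a quick sanity check, here is a small sketch that reproduces the yearly-trend correlations quoted above straight from the totals table, plus the same kind of least-squares fit. The numbers are copied from the table; nothing new is introduced.

Code:
# Sketch: yearly trends in offsides/game and save %, using the totals above.
# Checks the r = -.77 (offsides vs. year) and r = .60 (save % vs. year) claims.
import numpy as np

years    = np.arange(1996, 2004)
offsides = np.array([3.431, 3.878, 3.500, 2.947, 3.344, 3.304, 2.961, 2.738])
save_pct = np.array([0.757, 0.756, 0.726, 0.766, 0.758, 0.760, 0.771, 0.776])

print(np.corrcoef(years, offsides)[0, 1])   # offsides/game vs. year
print(np.corrcoef(years, save_pct)[0, 1])   # save % vs. year

# Simple least-squares fit of save % on offsides/game
slope, intercept = np.polyfit(offsides, save_pct, 1)
print(intercept, slope)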
I know you posted this a while ago, but to answer your question, the only thing I can think of is BigSoccer Live, which shows the most recent posts. Did you mean something like part of the blog section?
Yeah, that was taken care of by Huss during the switch, which was great of him. Although I haven't been participating much recently, the forum appears to be really healthy, and I'll be redoubling my efforts shortly. Another nice thing from the switchover that you'll notice is the ability to have spreadsheets scroll left and right. It makes the things we do around here infinitely more readable.

In terms of Chris's numbers, it's interesting that the leaguewide save percentages haven't changed all that much. The only dip you see is during an expansion year, which would seem to make sense. Another interesting comparison might be to see how these numbers stack up against other leagues. I also think the leaguewide save percentage could eventually be an interesting baseline: what the chances are, on average, of any given shot going in.
The regression equation I get for that is slightly different (and by the way, I'm assuming you made a typo and were really trying to predict save % from offside calls/game):

save% = 0.8451 - 0.02646 * (off/game)

This equation's r-squared (i.e., the amount of the variance in save% that it explains) is 0.41, which is pretty good. Incidentally, take the square root of the r-squared and you get r, the correlation coefficient of save% and off/game, which is about .65 or so. Not too bad of a correlation, but we only have 8 data points.

And here's what the equation means: offside calls/game is a marginally significant predictor of save% (p<.09). As offside calls/game go up by one, save percentage goes DOWN by 2.6%. More offside calls lead to lower save percentage. As to why this is (more aggressive defenses lead to more 1 v 1 opportunities?), I'm not entirely sure.

You're right, however, that offside calls have been declining over time (the correlation is significant at p<.02). Interestingly, when you control for time (i.e., adjust for the decrease in offside calls over time), the relationship between offside calls and save% is no longer significant (p=.41). So any effect of offsides on save% was really a function of offside calls going down over time and save% going up over time. That, and the fact that this is a small sample, which is not ideal from an inferential standpoint.

So, bottom line: it looks like there's not really any relationship (in this small sample) between offside-trap-like defenses and save percentage, once you control for time.
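For concreteness, here is one way the two regressions described above could be run on the 8 yearly averages: a simple fit of save % on offside calls/game, and then the same fit with year added as a second predictor ("controlling for time"). The software used in the post isn't stated, so treat this only as a sketch of the idea.

Code:
# Sketch: save % regressed on offsides/game, with and without year as a
# control, using the 8 yearly averages from the earlier post.
import numpy as np
import statsmodels.api as sm

year     = np.arange(1996, 2004)
offsides = np.array([3.431, 3.878, 3.500, 2.947, 3.344, 3.304, 2.961, 2.738])
save_pct = np.array([0.757, 0.756, 0.726, 0.766, 0.758, 0.760, 0.771, 0.776])

# Simple regression: save % on offside calls/game
m1 = sm.OLS(save_pct, sm.add_constant(offsides)).fit()
print(m1.params, m1.rsquared, m1.pvalues)

# "Controlling for time": add year as a second predictor
X2 = sm.add_constant(np.column_stack([offsides, year]))
m2 = sm.OLS(save_pct, X2).fit()
print(m2.pvalues)  # the offsides coefficient's p-value is the one of interest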
I guess. I sort of liked having 70 entries on a single page. Maybe I just like things being unreadable.

The change in the expansion year isn't surprising, but there's no reason I'd have predicted it to go down instead of up. I mean, people generally make the argument that expansion weakens pitching in the majors, but it ought to weaken hitting too. Same here: while goalkeeping/defense may have weakened (although Chicago's keeper was Zach Thornton and Miami's was Jeff Cassar, not significantly weaker), I don't see any reason it would weaken more than offense. What makes it especially strange is that there was no concurrent jump when the league contracted in 2002 (although 1998 was the first year Ian Feuer ever got significant minutes, so maybe he can be blamed).

I think it might be interesting to compare these numbers to other leagues, but I'm not sure exactly what it would show you. Higher save percentages don't necessarily mean a league has better goalkeepers or worse strikers: it might be that shot selection is different (I believe England takes a lot more low-percentage long-range shots), or that defenses have different strategies, or a host of other factors. They would tell you what the chances are, on average, of any given shot going in, but I'm not sure what that tells you.
Why would you need to adjust for the decrease in offside calls over time, unless there's an indication that this is an officiating change rather than a tactical change?
Yeah, my bad on the typo. Thanks a lot for the input, ur_land. It looks to me like the difference between our numbers (certainly not slight!) is that you used just the 8 season averages; although I suspect I didn't make that clear, I used the 80 or so individual team-seasons. It doesn't make sense to me to use just the 8 seasons, since you eliminate the teams that would show the most distinct effect.

That was exactly the theory (Marvin Fischer's and beineke's, I believe): more offside traps means more blown offside traps, which means a better shooting percentage on the shots attackers actually get (it probably produces fewer shots in general, too).

Let me apologize here for making my initial post unclear, and try to post some results from the regression that I did. R-squared in this case is a measly .075; offsides clearly don't have as large an effect as a lot of other factors do. Adjusted R-squared is even lower, .063, although I don't know what that means. Excel is kind enough to give me a whole lot of numbers I don't understand, but I think (significance F = .011881) means I can claim (p<.02). I'm still not gonna mess with adjusting for time unless someone gives me some help.
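If it helps to see what Excel's regression output corresponds to, here is a sketch of the same regression run over all the individual team-seasons. R-squared, adjusted R-squared, and "Significance F" (the p-value of the overall F-test, which for a one-predictor regression is the same test as the slope's p-value) all come out of the model summary. The file and column names are made up.

Code:
# Sketch: regression of save % on offsides/game across all team-seasons.
# Column names (offsides_per_game, save_pct) are assumed, not from the post.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("keepers.csv")  # hypothetical file, one row per team-season

model = smf.ols("save_pct ~ offsides_per_game", data=df).fit()
print(model.rsquared)       # Excel's "R Square"
print(model.rsquared_adj)   # Excel's "Adjusted R Square"
print(model.f_pvalue)       # Excel's "Significance F"
print(model.summary())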
One reason I think it might be useful is that the yearly error in the regressed save percentages correlates very strongly with the difference between each year's average save % and the overall mean. Maybe this is to be expected (I'm sort of lost), but here's the data. The overall mean is .759; the columns below are the year, that year's save %, and its difference from the mean:

Code:
1996    0.757   -0.002
1997    0.756   -0.003
1998    0.726   -0.032
1999    0.766    0.007
2000    0.758    0.000
2001    0.760    0.002
2002    0.771    0.012
2003    0.776    0.017
Total   0.759    0.000

Code:
       Regression   Average
1996     -0.002      0.002
1997     -0.040      0.003
1998      0.292      0.032
1999     -0.043     -0.007
2000     -0.014      0.000
2001     -0.033     -0.002
2002     -0.094     -0.012
2003     -0.124     -0.017

r = .97455
There's a mental connection I'm having trouble making here. If save percentage is going down, then that should mean more goals are being scored, yes? It would seem to be the obvious inference. However, it might be interesting to see how offside rate relates to goals per game or shots per game. It seems to me that the offside trap, while it can be an effective defensive tool, is usually more of a bail-out maneuver. How many times do you watch a game where a team is attacking, attacking, and attacking, and they have a couple of big offside calls go against them?

I think it's possible that the correlation we're seeing between more offsides and fewer saves is that certain teams are doing the lion's share of the attacking. The problem is that we're dealing with a ratio here. So, back to the beginning: what's special about this situation where keepers aren't facing shots? Essentially, I'd have no idea, and because this damn sport is so weak on independent events, it'd be terribly hard to tell. However, I'd be willing to bet that teams who are getting more offside calls against them are also getting more shots on goal. The problem, again: what's the recommendation to coaches? Have your guys get called offside more?
To find the true relationship between offside calls and save percentage. When you control for time, the relationship goes away. So there is SOMETHING out there (changes in officiating? changes in tactics?) that is affecting both offside calls and save percentage. I'm not sure what it is, but when you don't control for it, you see a spurious relationship between offside calls and save%.
You're right in that using just the 8 season averages has a few problems. It doesn't eliminate the teams that would show the most distinct effect, as they contribute to that year's average, but using all of the teams separately gives you much greater statistical power. I just used the 8 years because it was easy and because I didn't want to go to the trouble of doing the regression with all of the teams in the correct way. But since you used the 80 or so team-seasons ... well, I guess I'll have to explain it anyway.

I would not place too much stock in the regression results you obtained, because the regression was done in a less than optimal manner. When you have two or more individual data points from the same group (same marriage, classroom, league season, person, etc.), you need to make sure that your data isn't corrupted by dependency. One of the assumptions of regression is that all of the errors of your observations are independent. When you have dependency, the errors are not independent: because of some grouping variable, some errors are correlated with each other. An example would be looking at the effect of offside calls on save% for 10+ teams over 10 years. The errors are likely to be correlated with each other within a year more than across years, and this could bias your regression.

There are a couple of things that can be done to correct for this: you can do a within-subjects regression (http://www.visualstatistics.net/Vis...ssion_within_the_repeated_measures_design.htm), you can do a multilevel model (http://www.ssicentral.com/hlm/hlm.htm), or you can average across all teams for a year. I did the third because it was easy and quick. The other methods (which actually are much better, as they give you greater statistical power) take a little longer, and I'm supposed to be analyzing my dissertation data, not MLS stats! If you don't want to tackle this (and it's not something that can easily be done in Excel), I'll see if I can con one of my buddies into doing it for us ...
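For what it's worth, here is a minimal sketch of the multilevel option using a random intercept for each season, which is one standard way to handle the within-year dependency described above. It is not the HLM software linked in the post, just the same idea in statsmodels, and the column names are assumed.

Code:
# Sketch: mixed-effects (multilevel) regression of save % on offsides/game,
# with a random intercept per season to account for within-year dependency.
# Column names (save_pct, offsides_per_game, year) are assumed.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("keepers.csv")  # hypothetical team-season data

model = smf.mixedlm("save_pct ~ offsides_per_game", data=df, groups=df["year"])
result = model.fit()
print(result.summary())  # look at the offsides_per_game coefficient and its p-value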
You're missing JG's point. The purpose of the study is to relate tactics (the offside trap) to save %. We know that teams have decreased their use of the offside trap over time, so by adjusting for time, you're adjusting out the signal of interest. I have other issues with some things you've claimed, but this one is probably the biggest.
How do we know that teams have decreased their use of the offside trap over time? We know that offside calls have gone down, but does that mean trap usage has necessarily gone down too? The two are not necessarily related. If teams have decreased their use of the trap, then yes, that would be partialed out when controlling for time. What are your other issues?
Take a look at individual teams over time. For instance, Bob Bradley came to New York last season and got rid of the offside trap. They drew 50 fewer offside calls in 2003 than in 2002, by far the biggest change in the league.

Other issues...

1) Even after the adjustments you've made, errors are clearly correlated, due to the fact that we're observing some of the same players and coaches across multiple seasons.

2) Correlated errors are a much bigger issue in significance testing than in parameter estimation, so ChrisE's results are still the best we've got. To illustrate: even if within-year observations were perfectly correlated (this is the situation for which your adjustment is correct), his fitted model would be (essentially) identical to yours. Because your model is different, this suggests that you're discarding valuable information.

3) Given the way you're looking at within-season data (by averaging), I don't see how a reduction in significance implies that "you've seen a spurious relationship."
Afraid I don't understand this chart, Chris. The righthand column is the difference between the yearly average and the global average. What's the lefthand column?
You can only expect me to be so clear when I'm talking about things I don't understand. For the first column, I predicted the save percentage from offsides/game for all 84 teams, then subtracted that prediction from the actual save percentage for that team in that year, and then summed those up by year. It's not actually the average, as I believe I said; it's the sum, but it hardly makes any difference.
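For anyone following along at home, here is a rough sketch of that computation: predict save % from offsides/game across all the team-seasons, take each team's residual (actual minus predicted), sum the residuals by year, and correlate the result with each year's gap from the overall mean. The file and column names are made up, and signs may differ from the posted tables depending on which direction you subtract.

Code:
# Sketch of the residual-by-year computation described above. Column names
# (team, year, offsides_per_game, save_pct) are assumed, not from the thread.
import numpy as np
import pandas as pd

df = pd.read_csv("keepers.csv")  # hypothetical file, one row per team-season

# Predict save % from offsides/game, then take residuals (actual - predicted)
slope, intercept = np.polyfit(df["offsides_per_game"], df["save_pct"], 1)
df["residual"] = df["save_pct"] - (intercept + slope * df["offsides_per_game"])

# Sum residuals by year; compare each year's average save % to the overall mean
yearly = df.groupby("year").agg(resid_sum=("residual", "sum"),
                                save_mean=("save_pct", "mean"))
yearly["diff_from_mean"] = yearly["save_mean"] - df["save_pct"].mean()

print(np.corrcoef(yearly["resid_sum"], yearly["diff_from_mean"])[0, 1])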
... so if you had divided the left-hand column by the number of teams, you'd have the yearly average minus the predicted yearly average (in stats, we might call this the average residual for that year). Because the two columns are correlated, something other than offsides is changing from year to year to modify save percentage. What's striking to me is the low save %age in 1998. 1998 was an expansion year, meaning that two new keepers were needed. In addition, Friedel and Zenga had left the league, Dodd was getting old, and shotstoppers like Howard and Cannon (#1 and #2 in adjusted save percentage, 01-03) were not yet playing. My theory is that in 1998, there was a drop in the quality of goalkeeping, and that since then, keeping has improved. That's why you've found this pattern.
A few more notes ...

-- I fit a regression using offsides AND team id, but team was clearly not a significant predictor of save %age (min pval 0.11).
-- Ignoring correlations between observations, the estimated effect is 1.01%, with a plus-minus (2 std errors) of 0.78%. This implies that even with optimistic assumptions, we're not sure how big the effect really is.

Top five seasons: adjusted save percentage
1. 2001 Metros, Tim Howard +7.2%
2. 2000 Wizards, Tony Meola +6.9%
3. 2003 Galaxy, Kevin Hartman +6.7%
4. 1997 Burn, Mark Dodd +6.2%
5. 1999 Galaxy, Kevin Hartman +5.8%

Bottom five (worst first)
1. 2001 Burn, Matt Jordan -10.4%
2. 1998 Revs, Ian Feuer -7.0%
3. 1998 SJ, David Kramer/Andy Kirk -7.0%
4. 2000 DC, Mark Simpson/Tom Presthus -6.9%
5. 1996 Wizards, Garth Lagerwey -6.8%

2003
1. Galaxy, Hartman +6.7%
2. Metros, Howard/Walker +4.4%
3. DC, Rimando +3.7%
4. Fire, Thornton +2.4%
5. SJ, Onstad +1.8%
6. KC, Meola +1.6%
7. Clb, Busch +1.1%
8. Clr, Garlick -1.4%
9. NE, Brown -3.2%
10. Dal, Countess -5.4%

Hartman's numbers have varied quite a bit through the years, but here are Rimando's three full seasons: 2003 +3.7, 2002 +3.7, 2001 +3.6.
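Since the notes above mention a regression with offsides and team id, and a plus-minus of two standard errors on the offsides effect, here is a rough sketch of both. It is not necessarily the exact model used, the column names are assumed, and it ignores the correlated-errors caveat discussed earlier.

Code:
# Sketch: save % regressed on offsides/game plus team dummies, and a rough
# 2-standard-error interval for the offsides effect. Column names are assumed.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("keepers.csv")  # hypothetical team-season data

model = smf.ols("save_pct ~ offsides_per_game + C(team)", data=df).fit()
print(model.pvalues)  # includes per-team dummy p-values (is team a useful predictor?)

beta = model.params["offsides_per_game"]
se = model.bse["offsides_per_game"]
print(beta - 2 * se, beta + 2 * se)  # rough plus/minus 2 standard errors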
So Howard saved about 13.5 goals compared to his expected save percentage...that's almost four wins due to his goalkeeping.
To clarify, JG: The Metros had 42 points in 2001. Replacing Howard with an average keeper, we would've expected them to get only about 30 points? If so, wow ...
I hadn't actually run the numbers before -- it turns out that the extra 13 goals would only be a 9-point difference. But the Metros overachieved a bit in 2001 compared to their goal differential: a team with a GD of 38-48 (the Metros' GD with an "average" keeper) would be expected to get 30 points from a 26-game schedule. Presumably we could get "point values" for every goalie this way.
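JG's actual points model comes from the thread linked in the next post; as a stand-in, here is one simple way to get "expected points from goal differential": fit historical points per game against goal difference per game, then plug in the hypothetical GD. The linear form, file, and column names are assumptions, not his method.

Code:
# Sketch: expected points from goal differential, via a linear fit of
# historical points-per-game on goal-difference-per-game. A stand-in only;
# column names (points, gf, ga, games) are assumed.
import numpy as np
import pandas as pd

hist = pd.read_csv("team_seasons.csv")  # hypothetical historical results
x = (hist["gf"] - hist["ga"]) / hist["games"]
y = hist["points"] / hist["games"]
slope, intercept = np.polyfit(x, y, 1)

# Example: a team with a GD of 38-48 over a 26-game schedule
gd_per_game = (38 - 48) / 26
expected_points = 26 * (intercept + slope * gd_per_game)
print(expected_points)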
JG, would you mind expanding on this? I don't understand how you're calculating expected goals or how you're relating that to points. Please forgive my ignorance with some of this stuff.
Exactly why we need a website. I had the same reaction when I read this a month ago; it's based on a thread JG posted here: https://www.bigsoccer.com/forum/showthread.php?t=80294