Winning Percentage, Goal Differential, and the Shootout

Discussion in 'Statistics and Analysis' started by ChrisE, Feb 11, 2004.

  1. ChrisE

    ChrisE Member

    Jul 1, 2002
    Brooklyn
    Club:
    --other--
    Nat'l Team:
    American Samoa


    Well, there's this idea, in baseball at least, that run differential is in fact a more accurate evaluation of how good a team is than how many games it wins. The reasoning is that winning close games, like one-run games, presumably has a lot more to do with luck than with skill. Admittedly, strategy may play a more important role in preserving one-goal leads in soccer, but I think it's at least something worth looking at.



    I don't have any problem taking out shootout wins (and losses), or more precisely just calling them ties; I think they're pretty much random. Furthermore, since a shootout win was only worth one point (and you've only got a 50% chance of winning it), I can't imagine anybody playing for a shootout, so you're not distorting anything about how teams played by extracting shootout wins.



    In the sabermetric stats and analysis thread, somebody mentioned (or, I believe, linked to an article indicating) that scoring doesn't increase much when a win is worth 3 points instead of 2. So I think there are two ways of doing this: one, the way you described, points won/points available, but also something like (W + T/2)/(W + L + T), more akin to winning percentage.
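A minimal Python sketch of the two measures, assuming a win is worth 3 points and a tie (including a shootout counted as a tie) is worth 1. The function names are hypothetical and the 19-9-4 record is made up for illustration.

```python
def win_pct(wins: int, losses: int, ties: int) -> float:
    """Winning percentage with each tie counted as half a win: (W + T/2) / (W + L + T)."""
    games = wins + losses + ties
    return (wins + 0.5 * ties) / games


def points_pct(wins: int, losses: int, ties: int, win_points: int = 3) -> float:
    """Points won / points available, with win_points per win and 1 point per tie."""
    games = wins + losses + ties
    return (win_points * wins + ties) / (win_points * games)


# Hypothetical 32-game season: 19 wins, 9 losses, 4 ties.
print(round(win_pct(19, 9, 4), 3))     # 0.656
print(round(points_pct(19, 9, 4), 3))  # 0.635
```

With `win_points=2` the two functions always agree, since (2W + T)/(2G) = (W + T/2)/G; with 3-point wins the only difference is that a tie is worth a third of a win instead of half of one.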
     
  2. ChrisE

    To begin with, these are each club's shootout percentages (W/[W+L]) from the shootout's institution in 1996 to its unfortunate demise in 1999:

    Code:
    SO Record  1996   1997   1998   1999
    Colorado   0.333  0.400  0.500  0.667
    Columbus   0.444  0.429  0.000  0.600
    D.C.       0.250  0.500  0.700  0.625
    Dallas     0.625  0.600  0.667  0.333
    K. City    0.714  0.778  0.333  0.250
    L.Angeles  0.500  0.333  0.500  0.429
    Metros     0.600  0.500  1.000  0.429
    N. Eng     0.750  0.500  0.333  0.417
    San Jose   0.333  0.333  0.375  0.769
    Tampa Bay  0.250  0.600  0.167  0.417
    Miami      ------ ------ 1.000  0.556
    Chicago    ------ ------ 0.667  0.375
    Correlating a team's shootout percentage in one year with its shootout percentage the following year gives a very, very small correlation of 0.043 (I don't know a lot about statistics, but I know that's not very strong). Now, the problem may just be that we're using extremely small samples here (an average of 3.5 shootouts per team per year); I have no idea.
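The year-to-year calculation can be sketched in Python as below, assuming the table is stored as a dict of club to {year: shootout %}. `pearson` computes the same Pearson coefficient as Excel's CORREL; `year_over_year` is a hypothetical helper that pairs each season with the same club's next season and simply skips missing years.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient (the statistic Excel's CORREL returns)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def year_over_year(table):
    """Pair every club's value in year t with its value in year t + 1.

    `table` maps club -> {year: value}; a season with no following season
    (club folded, or the data ends) contributes no pair.
    """
    this_year, next_year = [], []
    for seasons in table.values():
        for year, value in seasons.items():
            if year + 1 in seasons:
                this_year.append(value)
                next_year.append(seasons[year + 1])
    return this_year, next_year


# Two rows of the table above; the full table would be entered the same way.
so_pct = {
    "Colorado": {1996: 0.333, 1997: 0.400, 1998: 0.500, 1999: 0.667},
    "Dallas": {1996: 0.625, 1997: 0.600, 1998: 0.667, 1999: 0.333},
}
x, y = year_over_year(so_pct)
print(len(x), round(pearson(x, y), 3))
```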

    Clubs' all-time shootout percentages look like this:

    Code:
    Club       All-time
    Chicago    0.455
    Colorado   0.500
    Columbus   0.419
    D.C.       0.567
    Dallas     0.536
    K. City    0.533
    L.Angeles  0.440
    Metros     0.579
    N. Eng     0.500
    San Jose   0.487
    Miami      0.714
    Tampa Bay  0.370
    which looks like a pretty random distribution to me.
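One rough way to put a number on "random-looking" is to compare the spread of the clubs' percentages with the spread you would expect if every shootout were a 50/50 coin flip, where a club with n shootouts has a standard deviation of about sqrt(0.25/n). A sketch, assuming you have each club's shootout wins and losses; the function name is hypothetical and the counts in the usage line are made up, not MLS data.

```python
from math import sqrt
from statistics import pstdev


def coin_flip_check(records):
    """Return (observed SD, coin-flip SD) of clubs' shootout percentages.

    `records` maps club -> (shootout wins, shootout losses). The coin-flip
    figure is the root-mean-square of sqrt(0.25 / n) over clubs, i.e. roughly
    the spread expected around .500 if nobody had any shootout skill.
    """
    pcts = [w / (w + l) for w, l in records.values()]
    observed = pstdev(pcts)
    expected = sqrt(sum(0.25 / (w + l) for w, l in records.values()) / len(records))
    return observed, expected


# Hypothetical counts for three clubs.
print(coin_flip_check({"A": (8, 6), "B": (5, 9), "C": (7, 7)}))
```

If the observed spread is not noticeably larger than the coin-flip spread, the numbers give no evidence that some clubs are better at shootouts than others. The comparison is approximate, since `pstdev` measures spread around the observed average rather than exactly .500.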
     
  3. ChrisE

    So, assuming that shootouts were pretty much random, I counted shootout wins and shootout losses as ties. Then, for comparison's sake, I drew up two ways of measuring how good a team is: points/points available, and winning % (where a tie counts as half a win and half a loss). If anyone wants me to post the W-L-T numbers, just ask.

    Here are those two:

    Code:
    Win %      1996   1997   1998   1999   2000   2001   2002   2003
    Chicago    ------ ------ 0.609  0.594  0.625  0.685  0.464  0.633
    Colorado   0.375  0.453  0.500  0.578  0.469  0.346  0.536  0.483
    Columbus   0.484  0.484  0.547  0.563  0.422  0.615  0.482  0.467
    D.C.       0.531  0.656  0.688  0.688  0.344  0.346  0.411  0.483
    Dallas     0.500  0.484  0.438  0.641  0.500  0.481  0.554  0.283
    K. City    0.484  0.578  0.406  0.313  0.641  0.463  0.482  0.517
    L.Angeles  0.594  0.531  0.750  0.641  0.563  0.635  0.625  0.450
    Metros     0.453  0.406  0.422  0.234  0.578  0.558  0.429  0.517
    N. Eng     0.406  0.469  0.375  0.406  0.500  0.370  0.464  0.550
    San Jose   0.516  0.422  0.438  0.484  0.344  0.615  0.554  0.617
    Miami      ------ ------ 0.391  0.391  0.453  0.712  ------ ------
    Tampa Bay  0.656  0.516  0.438  0.469  0.563  0.185  ------ ------
    and:

    Code:
    Pts/Avail. 1996   1997   1998   1999   2000   2001   2002   2003
    Chicago    ------ ------ 0.594  0.552  0.594  0.654  0.440  0.589
    Colorado   0.344  0.427  0.479  0.531  0.448  0.295  0.512  0.444
    Columbus   0.438  0.448  0.521  0.510  0.396  0.577  0.452  0.422
    D.C.       0.510  0.615  0.635  0.646  0.313  0.333  0.381  0.433
    Dallas     0.458  0.458  0.406  0.594  0.479  0.449  0.512  0.256
    K. City    0.448  0.531  0.375  0.271  0.594  0.444  0.429  0.467
    L.Angeles  0.552  0.500  0.729  0.604  0.521  0.603  0.607  0.400
    Metros     0.427  0.385  0.406  0.198  0.563  0.538  0.417  0.467
    N. Eng     0.365  0.427  0.344  0.344  0.469  0.333  0.452  0.500
    San Jose   0.469  0.375  0.396  0.417  0.302  0.577  0.536  0.567
    Miami      ------ ------ 0.365  0.344  0.427  0.679  ------ ------
    Tampa Bay  0.635  0.490  0.406  0.406  0.542  0.173  ------ ------
    They are appreciably different, but they have a .993 correlation, so I don't really think it matters which you use.
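The shootouts-as-ties conversion can be sketched like this, assuming the official totals include shootout results (so a club listed 20-12 with 1 shootout win and 3 shootout losses went 19-9 outside the shootout). That is an assumption about how the standings were recorded; the function name and the record are hypothetical.

```python
def shootouts_as_ties(total_wins, total_losses, so_wins, so_losses):
    """Recast a 1996-99 style record as W-L-T, counting every shootout as a tie.

    Assumes total_wins includes shootout wins and total_losses includes
    shootout losses; the number of games played is unchanged.
    """
    if so_wins > total_wins or so_losses > total_losses:
        raise ValueError("shootout results cannot exceed the totals")
    wins = total_wins - so_wins
    losses = total_losses - so_losses
    ties = so_wins + so_losses
    return wins, losses, ties


w, l, t = shootouts_as_ties(20, 12, 1, 3)
print(w, l, t)                      # 19 9 4
print((w + 0.5 * t) / (w + l + t))  # 0.65625
```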
     
  4. mpruitt

    mpruitt Member

    Feb 11, 2002
    E. Somerville
    Club:
    New England Revolution
    Thanks for the response. To my mind, the most accurate way to tally winning percentage would be total points divided by possible points. While counting a tie as half a win makes sense, it's not quite in line with how results are recorded. However, I'm still wondering how MLS calculates winning percentage for official purposes.

    As for shootout wins (SOW), I think I should probably just recast them as ties and adjust from there. It's not our fault that MLS had a stupid idea, and really the point would be to analyze historical trends or compare teams on a historical basis. Taking away the SOW would certainly be in line with trying to reduce luck as a factor in greatness.

    As for using GF/GA as a more accurate means of measuring a team's greatness, that's just fine; when comparing factors related to a team's greatness, the more ways the better. It would just tell us something different. I think deciding whether goal differential is a better measure of a team's greatness than winning percentage is getting a bit ahead of ourselves.
     
  5. ChrisE

    Next, I thought I'd compare them to goal differential. I used a formula that beineke had suggested, GF/(GF+GA), which seemed simple enough for me to implement. This is what you get:

    Code:
    GF/(GF+GA) 1996   1997   1998   1999   2000   2001   2002   2003
    Chicago    ------ ------ 0.579  0.586  0.568  0.625  0.531  0.552
    Colorado   0.427  0.459  0.473  0.494  0.422  0.434  0.473  0.471
    Columbus   0.496  0.506  0.545  0.552  0.453  0.576  0.506  0.500
    D.C.       0.525  0.569  0.607  0.602  0.411  0.457  0.437  0.514
    Dallas     0.510  0.529  0.422  0.607  0.500  0.505  0.506  0.354
    K. City    0.492  0.528  0.474  0.384  0.618  0.384  0.451  0.522
    L.Angeles  0.546  0.556  0.659  0.628  0.560  0.591  0.571  0.500
    Metros     0.489  0.448  0.462  0.333  0.533  0.521  0.466  0.500
    N. Eng     0.434  0.430  0.445  0.418  0.490  0.402  0.500  0.539
    San Jose   0.500  0.482  0.444  0.495  0.412  0.618  0.563  0.563
    Miami      ------ ------ 0.404  0.416  0.491  0.613  ------ ------
    Tampa Bay  0.564  0.478  0.447  0.505  0.554  0.320  ------ ------
    (correlation to the other two record-based numbers was approximately .92)
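beineke's measure is the share of all goals in a club's games that the club scored. A small sketch (hypothetical function names and goal totals) also shows how it relates to goal differential: GF/(GF+GA) = 0.5 + (GF - GA)/(2(GF + GA)), i.e. goal differential scaled by total goals and centred on .500.

```python
def goal_pct(goals_for: int, goals_against: int) -> float:
    """Share of all goals in a club's games scored by the club: GF / (GF + GA)."""
    return goals_for / (goals_for + goals_against)


def goal_pct_from_differential(goals_for: int, goals_against: int) -> float:
    """The same value written as .500 plus goal differential over twice the total goals."""
    return 0.5 + (goals_for - goals_against) / (2 * (goals_for + goals_against))


print(round(goal_pct(50, 40), 3))                    # 0.556
print(round(goal_pct_from_differential(50, 40), 3))  # 0.556
```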

    So then, in a rather dubious attempt to see which is the more accurate measure of how good a team is, I checked how well a club's figure in one season predicted its performance the next year. Maybe there's a better way, but I've recently been introduced to the correlation function in Excel, so that's what I used. You get:

    Code:
    Comparison   %-%    Pts-%  b.-%   Pts-Pts  b.-Pts  b.-b.
    Correlation  0.181  0.179  0.230  0.191    0.252   0.242
    So, in predicting a club's winning percentage, points/possible points, or GF/(GF+GA) from one season to the next, beineke's GF/(GF+GA) number was a good deal better than the record-based measures (though none was particularly good). I don't know whether this is significant, but I think it makes a pretty good case that GF/(GF+GA) is at least as good a predictor as any kind of win %.
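The comparisons in that table can be sketched in Python as follows. This assumes each metric is stored as a dict of club to {year: value}; `predictive_correlation` is a hypothetical helper, `pearson` reproduces Excel's CORREL, and the `lag` parameter (1 for the table above) is an added option for comparing seasons more than one year apart.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient (the statistic Excel's CORREL returns)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def predictive_correlation(predictor, target, lag=1):
    """Correlate one metric in year t with another metric in year t + lag.

    Both arguments map club -> {year: value}. For example, predictor = goal %
    and target = win % with lag=1 corresponds to the "b.-%" column.
    """
    xs, ys = [], []
    for club, seasons in predictor.items():
        later = target.get(club, {})
        for year, value in seasons.items():
            if year + lag in later:
                xs.append(value)
                ys.append(later[year + lag])
    return pearson(xs, ys)
```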
     
  6. beineke

    beineke New Member

    Sep 13, 2000
    Correlation is a very good way to look at these things ... this is a nice finding.

    A success story of goal percentage is Chicago this past season. Even though they had a poor win percentage (.464) in 2002, their goal percentage (.531) was third best in the league. That was a solid predictor of their success in 2003.

    Bodes well for DC this coming year, not so well for Colorado.
     
  7. ChrisE

    As a postscript:

    I admittedly don't know a whole lot about correlations, so I wondered whether regression to the mean (the tendency of exceptionally good teams to be worse the next year and exceptionally poor teams to be better) was diluting the strength of the year-to-next-year correlations.

    So, I divided winning percentages into two categories: greater than or equal to .500, and less than .500. I then did the same calculation (read: typing) as I had previously, comparing the winning percentages to those of the subsequent year. I got what, in my mind, was a truly surprising result.

    When winning percentages were at or above .500, the correlation with winning percentages in the subsequent year was not far from what I originally calculated: .26 (vs. an original .18). However, for teams below .500, things were very different. Instead of simply getting a slightly stronger result, I got the opposite: a correlation of -.05.

    Results were even stronger using goal differential rather than win %:
    >.500: .217
    <.500: -.268

    I had a pretty good guess what the deal was with winning percentages: while really good teams tend to be really good in subsequent years, and average teams tend to be pretty much average, the teams that end up at the bottom are propped up with allocations and high draft picks etc. in the name of parity. I've got no answers, however, as to why the worst goal differential teams would tend to be better than the slightly below average ones...

    (Of course, the numbers probably mean less than they appear to. Since there have only been 88 team-seasons played in MLS history, we're looking at a really small sample. One significantly different result, e.g., the 2000 Metros sucking about as badly as the 1999 Metros did, would make things look quite different. I should probably be checking p-values or something... :))
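For the p-value question, a quick approximation uses the Fisher z-transformation of the correlation. This is a standard technique swapped in here, not anything computed in the thread; the function name is hypothetical, and n should be the number of year-to-next-year pairs actually correlated, which is smaller than the 88 team-seasons because a club's last season has no following year.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for the null hypothesis "true correlation = 0".

    Uses the Fisher z-transformation: under the null, atanh(r) * sqrt(n - 3)
    is roughly standard normal. The approximation gets cruder as n shrinks.
    """
    if n < 4:
        raise ValueError("need at least 4 pairs")
    if abs(r) >= 1:
        return 0.0
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical example: r = 0.26 computed from 40 pairs.
print(round(correlation_p_value(0.26, 40), 3))
```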
     
  8. ChrisE

    Some more semi-related stuff:

    Teams who score fewer goals should, one assumes, expect to have more ties. Testing this out using my now ubiquitous correlation function, we find exactly that: we get a correlation of -.379 between total goals scored/game and ties/game.

    Likewise, teams who have more ties should expect to have winning percentages closer to .500. So I divided the total goals/game numbers into a top half and a bottom half. Unsurprisingly, their average winning percentages are both very close to .500: .497 for fewer goals and .503 for more goals. However, their standard deviations are markedly different: .0924 for the fewer-goals group and .1239 (34% greater) for the more-goals group.

    I think that any formula that attempts to predict team performance ought to take this into account.
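A sketch of that split, assuming each team-season is a (total goals per game, win %) pair, where "total goals" is read as goals for plus goals against. The function name is hypothetical, the standard deviation is the sample version (like Excel's STDEV), and the team-seasons in the usage line are made up.

```python
from statistics import mean, stdev


def split_by_scoring(seasons):
    """Compare win % in the lower- and higher-scoring halves of team-seasons.

    `seasons` is a list of (total goals per game, win %) tuples. Returns
    ((mean, SD) for the lower half, (mean, SD) for the upper half), using the
    sample standard deviation. Each half needs at least two team-seasons.
    """
    ranked = sorted(seasons, key=lambda s: s[0])
    half = len(ranked) // 2
    low = [pct for _, pct in ranked[:half]]
    high = [pct for _, pct in ranked[half:]]
    return (mean(low), stdev(low)), (mean(high), stdev(high))


# Made-up team-seasons: (total goals per game, win %).
low, high = split_by_scoring([(2.0, 0.50), (2.2, 0.50), (3.5, 0.30), (3.8, 0.70)])
print(low, high)
```

The "34% greater" figure is just the ratio of the two standard deviations minus one: .1239/.0924 - 1 is about .34.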


    (Further side note: it occurred to me that there should also be something of a positive correlation between goals for per game and goals against per game. Teams who play an aggressive game should not only score more, but also be scored against more. Likewise, conservative teams should not only concede fewer goals, but also score fewer. Surprisingly, I got a weak negative correlation (-.116) between goals for and goals against. Perhaps team differences in ability are strong enough to drown this out; I really don't know.)
     
  9. ChrisE

    So most of my previous post was wrong.

    I had tested the standard deviation of win % for high-goal and low-goal teams because I figured the ties would make it a slam dunk: teams with more ties would have significantly lower standard deviations than teams with fewer ties. Apparently not the case:

    Code:
    Group       Win %   St. Dev.
    More Ties   0.511   0.111
    Fewer Ties  0.489   0.106
    The goals-scored effect was created (I think) because teams that are really good tend to score more goals than average teams (so they're in the top half), and teams that are really bad tend to be scored on more than average (so they're also in the top half). If you eliminate the teams above .600 and below .400 and divide the rest into high-goal and low-goal groups, you get almost exactly the same win % variance in both.

    The negative correlation between goals for and goals against disappears when you get rid of the really good and really bad teams, too. For the same .400-.600 range (57 of the 88 team-seasons), the correlation between goals scored and goals against is .379.
     
  10. ChrisE

    Interested in the >.500/<.500 effect, I divided historical MLS winning percentages into quartiles.

    This is what comes out:

    Code:
    Quartile  Yr. 1   Yr. 2   Correlation
    1st       0.375   0.461   -0.437
    2nd       0.463   0.517   -0.073
    3rd       0.530   0.473   -0.008
    4th       0.638   0.549    0.168
    Of interest here is that we see some pretty significant recovery by the bottom quartile, and, most interestingly, within that quartile the worst teams seem to do better the following year than the best of the worst (a correlation of -0.437). Teams in the middle quartiles seem to drift around aimlessly, with slightly-below-.500 teams tending to improve and slightly-above-.500 teams tending to decline. The fourth quartile was the only group where one year's strong performance was predictive of strong performance in the next year.
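The quartile breakdown can be sketched like this, assuming the input is a list of (year-1 win %, year-2 win %) pairs for consecutive seasons of the same club. Cutting the sorted list into four near-equal groups is one reasonable convention; the thread doesn't say how boundary cases were handled, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient (the statistic Excel's CORREL returns)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def quartile_table(pairs):
    """Split (yr. 1 win %, yr. 2 win %) pairs into quartiles by the yr. 1 value.

    Returns one row per quartile, worst first: (mean yr. 1, mean yr. 2,
    correlation between yr. 1 and yr. 2 within the quartile).
    """
    ranked = sorted(pairs)
    n = len(ranked)
    rows = []
    for q in range(4):
        group = ranked[q * n // 4:(q + 1) * n // 4]
        xs = [a for a, _ in group]
        ys = [b for _, b in group]
        rows.append((mean(xs), mean(ys), pearson(xs, ys)))
    return rows
```

Within-quartile correlations rest on only a couple of dozen pairs each, so a single unusual turnaround can swing them a lot.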
     
  11. beineke

    I think the effect here is that teams that miss the playoffs get special treatment.
     
  12. ChrisE

    Yeah, I agree. I still think it's surprising.

    Edit: actually, there were 19 team-seasons in the bottom quartile. Two teams missed the playoffs each year when there were 10 teams, and four when there were 12. I'd imagine that most of these teams, not just the terrible ones, missed the playoffs.
     
  13. beineke

    Below, the records for each year are for consistency in a pairwise comparison from one year to the next, with ties omitted (with attention restricted to the teams that missed the playoffs). The "consistency" record is 13-6-5, quite good. [I used Win% for these numbers and didn't check any other metrics of strength.]

    96 -- NE and Color. missed playoffs. NE was better in both 96 and 97 (Record: 1-0)
    97 -- Met and SJ missed playoffs. NY was better in both 97 and 98 (Record: 1-0)
    98 -- SJ/TB tie, KC, NE -> SJ, TB, NE, KC (Record: 4-1-1)
    99 -- SJ, NE, KC, Met -> KC, Met, NE, SJ (Record: 1-4-1)
    00 -- Mia, Colum, SJ/DC tie -> Mia, Colum/SJ tie, DC (Record: 4-0-2)
    01 -- NE, DC/Color tie -> Color, NE, DC (Record: 1-1-1)
    02 -- Met, DC -> Met, DC (Record: 1-0)
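Here is a sketch of beineke's pairwise count, based on one reading of the rule: a pair of clubs is consistent if the same club finishes ahead in both seasons, inconsistent if the order flips, and a tie if the two are level in either season (per the later clarification that a tie means equal win %). The function name is hypothetical. Run on the 1998 and 1999 win % figures posted earlier, it reproduces the 98 line's 4-1-1.

```python
from itertools import combinations


def pairwise_consistency(year1, year2):
    """Return (consistent, reversed, tied) counts over all pairs of clubs.

    year1 and year2 map club -> win % in consecutive seasons; both must
    contain the same clubs.
    """
    wins = losses = ties = 0
    for a, b in combinations(sorted(year1), 2):
        d1 = year1[a] - year1[b]
        d2 = year2[a] - year2[b]
        if d1 == 0 or d2 == 0:
            ties += 1        # level in either season
        elif (d1 > 0) == (d2 > 0):
            wins += 1        # same club ahead both years
        else:
            losses += 1      # order flipped
    return wins, losses, ties


y1998 = {"SJ": 0.438, "TB": 0.438, "KC": 0.406, "NE": 0.375}
y1999 = {"SJ": 0.484, "TB": 0.469, "NE": 0.406, "KC": 0.313}
print(pairwise_consistency(y1998, y1999))  # (4, 1, 1)
```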
     
  14. ChrisE

    Thanks, beineke; clearly this is a more sophisticated way of looking at things than simply using a correlation (which was distorted because the two worst teams in MLS history that played the following year had pretty huge turnarounds).

    Can you explain to me what qualifies as a tie in these standings?
     
  15. beineke

    "Tie" means equal in the win% numbers that you posted earlier in this thread.

    (I went through them quickly -- did you notice any mistakes?)
     
  16. ChrisE

    So, because win % and points/possible measure pretty much the same thing, I just decided to use win % and the Pythagorean formula. I used an exponent of 1.5, which minimizes the average error. I think it's pretty common to alter the formula like this; I believe the best exponent grows with the number of points scored in a game.
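A sketch of the Pythagorean formula with an adjustable exponent, plus a simple grid search for the exponent that minimizes the average error. Measuring "average error" as mean absolute error against actual win % is an assumption, as are the function names and the candidate grid; the data in the usage lines are synthetic, generated with an exponent of 1.5 so the search should recover it.

```python
def pythagorean(goals_for, goals_against, exponent=1.5):
    """Pythagorean expectation GF^k / (GF^k + GA^k); k = 1 gives GF / (GF + GA)."""
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return gf / (gf + ga)


def best_exponent(seasons, candidates):
    """Return the candidate exponent with the lowest mean absolute error.

    `seasons` is a list of (goals for, goals against, actual win %) tuples.
    """
    def mean_abs_error(k):
        errors = [abs(pythagorean(gf, ga, k) - pct) for gf, ga, pct in seasons]
        return sum(errors) / len(errors)

    return min(candidates, key=mean_abs_error)


# Synthetic check: data generated with k = 1.5 should recover 1.5.
grid = [round(1.0 + 0.1 * i, 1) for i in range(11)]  # 1.0, 1.1, ..., 2.0
synthetic = [(gf, ga, pythagorean(gf, ga, 1.5)) for gf, ga in [(50, 40), (35, 45), (60, 30)]]
print(best_exponent(synthetic, grid))  # 1.5
```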

    So I compared the effects across seasons, x years apart:

    Code:
    x (yrs)  %-%(+x)  b-b(+x)  b-%(+x)
    1         0.181    0.242    0.230
    2         0.104    0.244    0.169
    3         0.150    0.232    0.150
    4        -0.117    0.059   -0.029
    5         0.028    0.239    0.163
    What we see, unsurprisingly, is that the Pythagorean prediction outperforms the other two at every gap. Equally impressive, goal differential predicts future winning percentage as well as or better than winning percentage itself at every gap.

    A little bit surprising was the strength of the correlations. I'd imagine that small sample sizes are obscuring the decrease in the strength of the goal differential correlation (especially in the 4 and 5 year examples).

    However, it's clear that performance in year x-1 isn't a whole lot more indicative of performance in year x than performance in x-2 or x-3.
     
  17. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix


    I can't help but be struck by the reversion-to-the-mean behavior evident in these numbers. Any economists out there ready to propose that the "market" for MLS results is an efficient one? (i.e., we might as well be flipping coins as deciding the results on the pitch?!)
     
  18. ChrisE

    I don't know exactly what you mean by this (specifically, "the "market" for MLS results is an efficient one? (i.e., we might as well be flipping coins as deciding the results on the pitch?!)"), but that's what I had in mind when I posted these numbers.

    Most interesting was that slightly-above-.500 teams tend to finish below .500 the next year, while slightly-below-.500 teams tend to finish above it, a stronger effect than reversion to the mean would predict (I think). It implies to me that this isn't simply reversion to the mean, but an actual concerted pressure exerted by the league to make good teams worse and bad teams better (not allocations, presumably, since they don't go to the middle teams; perhaps the salary cap, and a kind of laissez-faire attitude towards slightly-above-average teams, which don't compensate enough for a steadily improving league).
     
  19. ChrisE

    So, seeing as years x-2 and x-3 are about as good predictors of year x performance as year x-1, I was curious whether combining them would give better predictions (using goal differential). I couldn't think of any really clever ways of combining the numbers (calling beineke), so I just multiplied them together. Years (x-1)(x-2) have a correlation of .283 to year x. Years (x-1)(x-2)(x-3) have a correlation of .288. Both of those are, obviously, better than anything I'd gotten using just one year. (I tried weighting different years differently, and there seemed to be a very slight increase when more distant years were weighted more heavily; I have no reason to think this isn't simply chance, though.)
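A sketch of the multiply-the-seasons predictor, with optional weights. The function name, the choice to apply weights as exponents (a weighted product), and the club history are hypothetical; each club's combined value would then be correlated with its year-x figure, e.g. with Excel's CORREL.

```python
from math import prod


def combined_predictor(history, year, back=2, weights=None):
    """Multiply a club's goal % over the `back` seasons before `year`.

    `history` maps year -> goal %. Optional `weights` (one per season, most
    recent first) are used as exponents, giving a weighted product. Returns
    None if any of the needed seasons is missing.
    """
    seasons = [year - i for i in range(1, back + 1)]  # x-1, x-2, ...
    if any(s not in history for s in seasons):
        return None
    if weights is None:
        weights = [1.0] * back
    return prod(history[s] ** w for s, w in zip(seasons, weights))


# Hypothetical club history.
club = {2000: 0.50, 2001: 0.60, 2002: 0.40}
print(combined_predictor(club, 2003))                      # (x-1)(x-2)
print(combined_predictor(club, 2003, back=3))              # (x-1)(x-2)(x-3)
print(combined_predictor(club, 2003, weights=[1.0, 1.5]))  # x-2 weighted more
```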
     
  20. NoSix

    The market I'm referring to is the stock market:

    A Random Walk Down Wall Street

    You could also probably find an earlier version of this book at your local library.
     
  21. ChrisE

    Those quotes around "market" were yours. I'm vaguely familiar with the idea that the stock market is highly efficient (and certainly familiar with the term 'market'); however, I don't see what that has to do with MLS.

    What I don't understand is why 'we might as well be flipping coins as deciding the results on the pitch.' I mean, it's not like results for individual games, or even for individual seasons, are wholly unrelated to previous years' results.

    ?
     
  22. NoSix

    If the stock market were completely efficient, then it can be shown that changes in stock prices would follow a random walk (hence the title of the book), and monkeys flipping coins would be as successful in picking stocks as seasoned analysts. In reality, even though markets are not completely efficient, study after study has shown the performance of seasoned analysts to be no better than chance in the long run, and to demonstrate a strong tendency to revert to the mean.

    Again, I'm making a joke and you are taking me seriously. I really need to remember to add one of these guys: ;)
     
  23. henryo

    henryo Member+

    Jun 26, 2007
    Time for an update in 2014? (10 years on!!) ;)
     
  24. nnelli

    nnelli Member

    May 10, 2014
    Club:
    FC Barcelona
  25. nnelli

