Pythag in Soccer, another look
voros
10 Feb 2006, 08:31 PM
I'll make the post as quick as I can.
I've been trying to work on a pythagorean translation for soccer and I've come to two conclusions:
1) The standard baseball method doesn't work because the baseball method assumes a team that scores 0 runs has a win% of .000. In soccer you can go goalless and still pick up half a win.
2) The weakness in the baseball method (the best fit exponent shifting depending on how many runs per game are scored) also exists in soccer and is exacerbated by the greater variation in scoring in soccer compared with baseball.
Recognizing these two factors, I've come up with a best fit formula (based on my data sample) of:
A = Goals Scored per game
B = Goals Allowed per game
Actual Win% = (Wins + (0.5 * Draws)) / Games Played
X = (A + B)^0.3948
Predicted Win% = ((0.3235 + A)^X) / (((0.3235 + A)^X) + ((0.3235 + B)^X))
The formula does a couple of good things:
1) A 0 to 0 draw comes out as worth a .5 win%.
2) A 10 to 5 advantage in goals per game generates a much higher win% than a 2 to 1 advantage in goals per game.
3) It fits in with our current best knowledge of the way it works in baseball (that the best fit exponent shifts based on the scoring environment).
4) It allows us to assign win% equivalents on a game by game basis. For example, a 3 to 1 win is worth .83, a 1 to 0 win is worth .80, a 7 to 2 win is worth .94, etc.
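For anyone who wants to check the numbers, here is a minimal Python sketch of the formula as posted. The function name pythag_win_pct and the parameter names offset and power are just labels chosen for this illustration; the constants 0.3235 and 0.3948 are the ones given in the post.

```python
def pythag_win_pct(goals_for, goals_against, offset=0.3235, power=0.3948):
    """Expected win% (wins plus half of draws) from goals scored and allowed.

    offset and power are the best-fit constants quoted in the post;
    the names themselves are only illustrative.
    """
    # The exponent grows with the scoring environment (total goals).
    exponent = (goals_for + goals_against) ** power
    scored = (offset + goals_for) ** exponent
    allowed = (offset + goals_against) ** exponent
    return scored / (scored + allowed)


# Single-match scorelines mentioned in the post.
for gf, ga in [(0, 0), (1, 0), (3, 1), (7, 2)]:
    print(f"{gf} to {ga}: {pythag_win_pct(gf, ga):.2f}")
```

Because the exponent is zero when no goals are scored, both terms become 1 and a 0 to 0 result comes out at exactly .5.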
Any thoughts?
NoSix
11 Feb 2006, 11:59 PM
Would it be useful to derive a similar formula for points percentage? Just thinking: whether or not a draw should count for 50% of a win is debatable, but a draw is certainly worth 1 point compared to 3 for a win.
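To make the difference concrete, here is a small Python sketch comparing win% (draw counted as half a win) with points percentage (3 for a win, 1 for a draw). The function names and the two sample records are made up for illustration.

```python
def win_pct(wins, draws, losses):
    """Win% with a draw counted as half a win."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


def points_pct(wins, draws, losses, win_points=3, draw_points=1):
    """Share of the maximum possible points actually earned."""
    games = wins + draws + losses
    return (win_points * wins + draw_points * draws) / (win_points * games)


# Two hypothetical records with the same win% but different points%.
for record in [(5, 0, 5), (0, 10, 0)]:
    print(record, round(win_pct(*record), 3), round(points_pct(*record), 3))
```

Both records have a win% of .500, but the all-draws record earns only a third of the available points, which is why the two measures can rank teams differently.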
Karl K
12 Feb 2006, 09:18 PM
Interesting.
Have you run the appropriate statistical tests on sample sets, and how did the results turn out?
Also, what is the minimum number of games necessary for the formula to start having predictive/explanatory value?
voros
13 Feb 2006, 07:00 PM
Quote (NoSix): Would it be useful to derive a similar formula for points percentage? Just thinking: whether or not a draw should count for 50% of a win is debatable, but a draw is certainly worth 1 point compared to 3 for a win.
I think you can probably work backward pretty easily once you have the win% to get standard proportions of wins, draws and losses from that win%. I'm working on a common sense formula to do that, and will then develop one empirically and see how they compare.
The R-squared for the prediction on the data sample I used (500+ teams) was around .95, a little higher than for simple goal differential or a pythag type operation with a fixed exponent. The movable exponent comes not from this, but from studies in baseball which show that the best exponent does in fact shift based on the scoring environment. Standard error of the predictions is around .036 of actual win%, so a prediction of .800 would generate an actual win% between .728 and .872 about 95% of the time. I still need to do work on predictive value and I will do it, I just need a little time. This was done because I needed a tool for something that didn't yet exist.
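The 95% range quoted above is the prediction plus or minus two standard errors. A minimal Python sketch of that arithmetic, assuming roughly normal prediction errors (an assumption made for this illustration, not something tested in the post); the function name is hypothetical:

```python
def approx_95_interval(predicted, std_error=0.036):
    """Rough 95% range: prediction plus or minus two standard errors.

    Assumes approximately normal errors; the result is clipped to the
    0-to-1 range because a win% cannot fall outside it.
    """
    half_width = 2 * std_error
    low = max(0.0, predicted - half_width)
    high = min(1.0, predicted + half_width)
    return low, high


low, high = approx_95_interval(0.800)
print(f"{low:.3f} to {high:.3f}")
```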
Testing on other data sets needs to be done, and when you're doing things based simply on "best fit," whatever those numbers are that give you "best fit" invariably shift at least a little as you change data samples. So I'm more interested in folks' thoughts on the structure of the formula, rather than on the specific constants being used in it.
The structure was designed to produce non-zero results for zero goals scored (as exists in real life for soccer), to be able to come up with expected win% for individual matches (useful for the team ranking systems I use), and also for the expected win% to always fall between 1 and 0 even at the extremes (I want Australia's win% for their 31-0 whitewash of American Samoa to still be between 1 and 0). The structure seems to do that; my question is whether anyone sees any potential pitfalls within the structure that I may need to address.
NoSix
14 Feb 2006, 01:44 AM
Quote (voros): The structure was designed to produce non-zero results for zero goals scored (as exists in real life for soccer), to be able to come up with expected win% for individual matches (useful for the team ranking systems I use), and also for the expected win% to always fall between 1 and 0 even at the extremes (I want Australia's win% for their 31-0 whitewash of American Samoa to still be between 1 and 0). The structure seems to do that; my question is whether anyone sees any potential pitfalls within the structure that I may need to address.
The obvious choice for a benchmark is the tried and true Poisson distribution, which gives a 100% chance of a draw for a 0-0 result (0.500 W%), 63.2%/36.8%/0.0% W-D-L for a 1-0 result (0.816 W%), and 76.7%/13.1%/9.4% for a 3-1 result (0.833 W%).
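A short Python sketch of that benchmark, assuming each side's goals follow an independent Poisson distribution whose mean equals the observed goal count (one reading of the post, not necessarily the exact method used). The function names and the max_goals cutoff are choices made for this illustration.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k goals for a Poisson variable with this mean."""
    return exp(-mean) * mean ** k / factorial(k)


def poisson_wdl(mean_for, mean_against, max_goals=40):
    """Win/draw/loss probabilities for two independent Poisson scores.

    max_goals truncates the sums; 40 leaves a negligible tail for
    soccer-sized means.
    """
    win = draw = loss = 0.0
    for i in range(max_goals + 1):
        p_i = poisson_pmf(i, mean_for)
        for j in range(max_goals + 1):
            p = p_i * poisson_pmf(j, mean_against)
            if i > j:
                win += p
            elif i == j:
                draw += p
            else:
                loss += p
    return win, draw, loss


for gf, ga in [(0, 0), (1, 0), (3, 1)]:
    w, d, l = poisson_wdl(gf, ga)
    print(f"{gf}-{ga}: W {w:.3f}  D {d:.3f}  L {l:.3f}  W% {w + 0.5 * d:.3f}")
```

The three figures quoted for the 3-1 result add up to 99.2%, so a full sum like this one may give a slightly different win share than the one quoted.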
khucke
15 Feb 2006, 05:52 AM
Voros,
I also tried to find a way to translate goals scored and goals given up into a measure of success for a team. My approach was to find the relation between goals and points (3 for a win, 1 for a draw). I looked at several years of German Bundesliga play and calculated that 1 goal scored more or given up less than the league average translates to .75 point. This value proved to be quite a nice fit, except for the very good and the very bad teams. The good teams' point totals were always lower than their goal differential suggested, while the bad teams' point totals were higher. The reason for this is that good and bad teams are more likely to be involved in lopsided games. For further analysis I decided to stunt (is this the right expression?) these scores, because they exaggerate the real difference between the teams involved, as the losing team gives up on the game or plays foolishly offensively to make up a deficit of, say, 2 goals.
One would expect that in a low-scoring league like the Italian one the value of .75 would be higher, but, wait, due to the lower scoring rate there the number of draws is higher, which leads to fewer points awarded league-wide. There are still a lot of things to do and not enough time.....
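A rough Python sketch of the linear approach described above, with the limiting of lopsided scores included. The .75 points per goal is the value from the post; the league-average points figure, the 3-goal cap, and the function names are hypothetical choices for illustration only, since the post does not state them.

```python
def expected_points(goals_for, goals_against, league_avg_points,
                    points_per_goal=0.75):
    """Linear estimate: league-average points plus .75 per goal of difference."""
    return league_avg_points + points_per_goal * (goals_for - goals_against)


def capped_goal_difference(results, cap=3):
    """Season goal difference with each match margin limited to +/- cap.

    results is a list of (goals_for, goals_against) per match.
    The cap of 3 is a made-up value, not one given in the post.
    """
    return sum(max(-cap, min(cap, gf - ga)) for gf, ga in results)


# Hypothetical season totals and a hypothetical league average of 46 points.
print(expected_points(50, 40, league_avg_points=46.0))

# A 7-0 win counts as +3 and a 0-5 loss as -3 once margins are capped.
print(capped_goal_difference([(7, 0), (1, 2), (0, 5)]))
```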
P.S.: Are you the Voros McCracken who invented DIPS?
scaryice
15 Feb 2006, 06:05 AM
Quote (khucke): P.S.: Are you the Voros McCracken who invented DIPS?
Yeah, that's him:
http://www.bigsoccer.com/forum/showpost.php?p=1432043&postcount=70