Problem with Poisson Win%
voros
17 Jul 2006, 06:03 PM
Have run into an issue with Poisson win% that appears to be driving the differences I get between the expected win% of favorites in my system and how often they actually win. They tend to win more often than the system projects. Why?
I'm 90% sure the biggest problem lies within Poisson. All favorites, by definition, are more likely to win than to lose. How much more depends on the favorite, of course. Still, here's the problem: for every win, poisson credits that team with less than the full win they actually get, and for every loss, poisson credits the loser with more than the zero wins they actually get. The credit for draws, of course, is the same either way.
So _any_ team that is significantly more likely to win than to lose will tend to rack up fewer poisson wins than actual wins. This holds whether the team is a top-of-the-table club side or you're simply looking at betting favorites in matches. If you never lose, there's no possible way to have a higher poisson win% than actual win%.
For example, in the 2003/2004 Premiership season:
Poisson rank     Pwin%   Actual win%
Top 5 teams      .634    .679
Teams 6-10       .508    .513
Teams 11-15      .464    .439
Bottom 5 teams   .395    .368
Because I ranked the teams by poisson win% instead of actual win%, I should avoid the regressive fallacy. And yet it's clear that the top poisson teams win more often than their poisson wins say they should.
Because I can run my system several different ways (by goals, by poisson win%, and by strict win%, i.e., ignoring goal totals and just using results), I can test how this affects predictions. Strict win% was the only one where the gap between expected and actual win% for favorites was relatively small. For the first two, we see that same .040 to .050 gap between expected and actual. However, the first two systems do give a more accurate picture of the actual _ranking_ of teams, so simply using the 'strict win' system isn't optimal either.
So I guess my problem here is what to make of all of this. You can't run an optimization on the individual scores so that they closely track eventual win%, because we all know the optimization formula already: a win by any margin = 1, a draw = 0.5, a loss by any margin = 0. So I'm at a loss as to how to treat this.
Ideas?
numerista
17 Jul 2006, 07:47 PM
voros wrote:
> Because I ranked the teams by poisson win% instead of actual, I should avoid the regressive fallacy. And yet it's clear that the top poisson teams win more often than their poisson wins say they should.
If I'm understanding correctly (pardon the slowness) ...
1. If reality matched the basic Poisson model, the predicted win% would closely match the actual. Is this right?
2. Because the predicted win% doesn't match the actual, the basic Poisson model is clearly a bad fit.
3. Is there a positive correlation between the scores of opponents? How about if you subtract out their respective predicted scores first?
voros
17 Jul 2006, 08:05 PM
numerista wrote:
> If I'm understanding correctly (pardon the slowness) ...
> 1. If reality matched the basic Poisson model, the predicted win% would closely match the actual. Is this right?
> 2. Because the predicted win% doesn't match the actual, the basic Poisson model is clearly a bad fit.
> 3. Is there a positive correlation between the scores of opponents? How about if you subtract out their respective predicted scores first?
I'm not sure exactly what you mean with number 3, so I'll start with the first two.
I guess the main question is: is #1 actually true? Is it perfectly reasonable for favorites in such a system to be undervalued, i.e., a non-binary model being used to predict strictly binary results? (Well, not strictly binary, since we have draws, but we already know the correct weight for those.)
On the one hand, there's no such thing as a result that's better than a draw but less than a win, and so by shoehorning non-binary considerations into determining binary outcomes, is it inevitable this problem occurs?
On the other hand (and I think this answers 3), it is abundantly clear that if I stick to the binary 'strict win' system to derive ratings, the goal differentials do correlate quite well with the level of mismatch. A 9-0 win _is_ far more likely to occur in a mismatch than between evenly matched teams, so a 9-0 result does indeed tell us different things about the strength of two teams than a 2-0 result would.
My basic problem is where the correction needs to be made in order to get things 'correct.'
Sagy
17 Jul 2006, 10:09 PM
I'm not sure if this is along the right lines. One approach that I thought about (but never tested :() is to give less value to each additional goal in the margin of victory. As a first cut, I was going to use the square root of the actual margin as the effective margin of victory.
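A minimal sketch of the square-root idea, assuming a signed integer goal margin (positive for a win, negative for a loss). The helper name `effective_margin` is hypothetical, and as noted above the idea itself was untested.

```python
import math

def effective_margin(margin: int) -> float:
    """Sign-preserving square root of a goal margin (hypothetical helper).

    Each extra goal of margin adds less than the previous one:
    1 -> 1.0, 4 -> 2.0, 9 -> 3.0, and losses mirror wins.
    """
    return math.copysign(math.sqrt(abs(margin)), margin)

for m in (1, 2, 4, 9, -4):
    print(m, round(effective_margin(m), 3))
```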
voros
17 Jul 2006, 10:24 PM
Sagy wrote:
> I'm not sure if this is along the right lines. One approach that I thought about (but never tested :() is to give lesser value for each additional goal in margin of victory. At first cut I was going to use the square root of the actual margin as the effective margin of victory.
Poisson does that. An 8-0 victory is roughly equivalent to a 31-0 victory, but a 2-0 victory is worth considerably more than a 1-0 victory.
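A minimal Python sketch of the per-match Poisson win% under discussion. It assumes each side's goal total in a match is used as the mean of an independent Poisson variable and that a draw counts as half a win; this is consistent with the 1-0 → .81 and 2-1 → .71 figures quoted later in the thread, but the function names are hypothetical.

```python
import math

def poisson_pmf(lam: float, n: int) -> list[float]:
    """P(X = k) for k = 0..n-1 with X ~ Poisson(lam), built by recurrence."""
    probs = [math.exp(-lam)]
    for k in range(1, n):
        probs.append(probs[-1] * lam / k)
    return probs

def poisson_win_pct(goals_for: float, goals_against: float, n: int = 100) -> float:
    """P(win) + 0.5 * P(draw) with goals for/against as independent Poissons.

    Sums are truncated at n goals per side, ample for football-sized means.
    """
    pf = poisson_pmf(goals_for, n)
    pa = poisson_pmf(goals_against, n)
    win = draw = 0.0
    below = 0.0  # running P(goals_against < k)
    for k in range(n):
        win += pf[k] * below
        draw += pf[k] * pa[k]
        below += pa[k]
    return win + 0.5 * draw

for score in [(1, 0), (2, 0), (2, 1), (8, 0), (31, 0)]:
    print(score, round(poisson_win_pct(*score), 4))
```

With these assumptions an 8-0 and a 31-0 both land within a fraction of a percent of 1.0, while going from 1-0 (about .816) to 2-0 (about .932) adds more than .11, which is the diminishing-returns shape described above.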
I think the problem I'm having is that any victory regardless of the margin gives you less credit than the credit you get in real life, and I think that's skewing the system to underrate favorites more than it should.
numerista
18 Jul 2006, 09:02 AM
voros wrote:
> I guess the main question is: is #1 actually true?
A simulation ought to answer this. If you generate a Poisson season, do the percentages match up? (I suspect they should.)
Clarified Question 3: You have two columns of numbers. One is the home team's score minus its predicted score. The other is the corresponding away team's score minus its predicted score. What is the correlation between the two columns?
I'm asking this because it wouldn't be too hard to make predictions using correlated Poisson variables. If it works, it'd be a convenient fix.
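One common way to build correlated Poisson scores (a standard construction, not something specified in the thread) is the "common shock" or trivariate-reduction method: home = X1 + X3 and away = X2 + X3 with independent Poisson X1, X2, X3. Each score stays Poisson, and the covariance equals the mean of X3, so this version can only produce non-negative correlation. Python's `random` module has no Poisson generator, so the sketch uses Knuth's multiplication method; all names and parameter values are hypothetical.

```python
import math
import random

def poisson_sample(lam: float, rng: random.Random) -> int:
    """Knuth's method: count uniforms until their product drops below e^-lam."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def correlated_scores(lam_home, lam_away, lam_shared, rng):
    """Return (home, away) goals sharing a common Poisson(lam_shared) part.

    Marginal means are lam_home + lam_shared and lam_away + lam_shared;
    Cov(home, away) = lam_shared.
    """
    shared = poisson_sample(lam_shared, rng)
    return (poisson_sample(lam_home, rng) + shared,
            poisson_sample(lam_away, rng) + shared)

rng = random.Random(2006)
games = [correlated_scores(1.4, 0.7, 0.2, rng) for _ in range(20000)]
mean_h = sum(h for h, _ in games) / len(games)
mean_a = sum(a for _, a in games) / len(games)
cov = sum((h - mean_h) * (a - mean_a) for h, a in games) / len(games)
# In theory: means 1.6 and 0.9, covariance 0.2.
print(round(mean_h, 2), round(mean_a, 2), round(cov, 2))
```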
tachyon1
18 Jul 2006, 10:56 AM
Not sure where your problem is.
The top five sides in that season averaged 1.6368 goals for and 0.8842 goals against. A bog-standard poisson gives you a 0.55129 win probability (104.75 expected wins) and a 0.2462 draw probability (46.78 expected draws), or a 0.6744 win% if you choose to express it that way.
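Spelling out the arithmetic behind those figures: the top five sides played 5 × 38 = 190 matches between them, and a draw counts as half a win.

```latex
\begin{aligned}
\text{win\%} &= P(\text{win}) + \tfrac{1}{2}P(\text{draw}) = 0.55129 + \tfrac{1}{2}(0.2462) \approx 0.6744 \\
\text{expected wins} &= 190 \times 0.55129 \approx 104.75 \\
\text{expected draws} &= 190 \times 0.2462 \approx 46.78
\end{aligned}
```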
Your own Pwin% equation in an earlier thread gives a Pwin% of 0.6749 using these figures.
They both compare well with the actual win% of 0.679, so I'm not sure where your 0.634 figure is coming from.
Do other seasons show the same effect?
The EPL has a fairly predictable change in goal environment as the season progresses, but applying that would tend to slightly depress the predicted poisson win% of the better teams.
mtr8967
18 Jul 2006, 12:12 PM
Perhaps I'm just dense here, but I don't understand why not giving full credit for a win is a problem. Suppose Chelsea has an 80% chance to win every game it plays. While it's true they'll never get .8 of a game in their win column, they should slip up occasionally and allow draws or even lose, and it should all average out (eventually).
Going back to the main problem, I think it's likely, as numerista suggested, that the scores of the two teams aren't independent. A team leading 2-0 has less incentive to try for another goal than in a 2-1 or 2-2 game. That would produce exactly the behavior you describe: the system sees favorites as weaker than they really are unless you throw out goal differential.
I don't know what you can do about it. Try to catch the size of the effect and...?
voros
18 Jul 2006, 04:50 PM
tachyon1 wrote:
> Not sure where your problem is.
> The top five sides in that season averaged 1.6368 goals for and 0.8842 goals against. A bog-standard poisson gives you a 0.55129 win probability (104.75 expected wins) and a 0.2462 draw probability (46.78 expected draws), or a 0.6744 win% if you choose to express it that way.
> Your own Pwin% equation in an earlier thread gives a Pwin% of 0.6749 using these figures.
> They both compare well with the actual win% of 0.679, so I'm not sure where your 0.634 figure is coming from.
Because you've aggregated the goal scoring for the season, taken an average and calculated a pwin% from there. However if you calculate a pwin% for every single match and then aggregate the total pwin% from the individual pwin%s from each match, you get the effect I describe.
As for whether it fits other data sets, the answer is yes, it does. 2004/2005 Premiership:
Poisson rank     Pwin%   Actual win%
Top 5 teams      .639    .695
Teams 6-10       .508    .511
Teams 11-15      .454    .439
Bottom 5 teams   .399    .355
voros
18 Jul 2006, 05:18 PM
mtr8967 wrote:
> Perhaps I'm just dense here, but I don't understand why not giving full credit for a win is a problem. Suppose Chelsea has an 80% chance to win every game it plays. While it's true they'll never get .8 of a game in their win column, they should slip up occasionally and allow draws or even lose, and it should all average out (eventually).
But draws don't help. If a series of wins inevitably puts a team ahead of its expected poisson, all a draw does is slightly bring down the average gap per game; the original gap remains. Why? Because a draw in poisson is worth exactly the same as it is in real life: 0.5. Losses should correct it some, but if you take the top 3 Premiership teams from 2004/2005, you come up with a record of 76 wins, 27 draws and 11 losses.
Take 2003/2004 Chelsea, for example: 24 wins, 7 draws and 7 losses. In their 24 wins, the average gap between 1 and their poisson win% was 0.161. In their 7 losses, the gap between their poisson win% and 0 was greater: 0.269. So while the gap is greater for the 7 losses, it sure isn't 3.5 times greater, which it needs to be since Chelsea wins around 3.5 times more often than it loses.
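A quick check of how Chelsea's per-match gaps add up over the season, using only the figures quoted in the post; the variable names are illustrative.

```python
wins, draws, losses = 24, 7, 7   # Chelsea 2003/2004
win_gap, loss_gap = 0.161, 0.269  # average per-match gaps quoted above

shortfall = wins * win_gap       # win credit lost across the wins
surplus = losses * loss_gap      # win credit gained across the losses
net = shortfall - surplus        # draws are worth 0.5 either way, so they drop out
games = wins + draws + losses

print(f"shortfall {shortfall:.3f}, surplus {surplus:.3f}")
print(f"net {net:.3f} wins over {games} games = {net / games:.3f} win%")
print(f"loss gap needed to break even: {win_gap * wins / losses:.3f}")
```

The net shortfall is about 1.98 wins, or roughly .052 of win% over 38 games, close to the .040 to .050 gap seen in the tables; to break even, the average loss gap would have to be about .55 rather than .269.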
mtr8967
18 Jul 2006, 06:27 PM
voros wrote:
> But draws don't help. If a series of wins inevitably puts a team ahead of its expected poisson, all a draw does is slightly bring down the average gap per game; the original gap remains. Why? Because a draw in poisson is worth exactly the same as it is in real life: 0.5. Losses should correct it some, but if you take the top 3 Premiership teams from 2004/2005, you come up with a record of 76 wins, 27 draws and 11 losses.
> Take 2003/2004 Chelsea, for example: 24 wins, 7 draws and 7 losses. In their 24 wins, the average gap between 1 and their poisson win% was 0.161. In their 7 losses, the gap between their poisson win% and 0 was greater: 0.269. So while the gap is greater for the 7 losses, it sure isn't 3.5 times greater, which it needs to be since Chelsea wins around 3.5 times more often than it loses.
Are you saying that in Chelsea's 7 losses they averaged a 26.9% chance to win? And in their 24 wins they averaged an 83.9% chance to win? I must be misunderstanding something - that's too big a change.
But what about the main suggestion that the team's scores aren't independent?
Edit: Oh, I see what you mean on the win%. Not that they were predicted to have a 26.9% chance going in, but that after the score was analyzed it produced a 26.9% chance of a win.
voros
18 Jul 2006, 06:47 PM
mtr8967 wrote:
> Going back to the main problem, I think it's likely, as numerista suggested, that the scores of the two teams aren't independent. A team leading 2-0 has less incentive to try for another goal than in a 2-1 or 2-2 game. That would produce exactly the behavior you describe: the system sees favorites as weaker than they really are unless you throw out goal differential.
That doesn't _seem_ to be it. Let's take the average goals for and goals allowed for the top 5 prem teams from 2004/2005. That would be 1.64 goals for and 0.88 goals allowed. Poisson tells us to expect a .676 win% given those numbers (my estimate is close at .675). Now let's run a poisson simulation of 5000 matches based on those numbers, with goals scored and allowed completely independent of one another (i.e., however many goals are scored has no effect on the random variable for goals allowed). Here are the results of the 5000 sims:
Wins = 2760
Draws = 1243
Losses = 997
Goals For = 8205
Goals Against = 4411
GF/g = 1.64
GA/g = 0.88
Win% = .676
As you can see, everything turns out exactly as expected: the right goals and goals-allowed totals and the right win percentage. Now, for the score of every game, we assign a poisson win% based on that score. So a 1-0 score gets you .81 wins, a 2-1 score gets you .71, etc. Here are the totals:
Poisson wins = 3150.72
pwin% = .630
We get the same dynamic as in the real-life numbers. The aggregated individual poisson wins are .040 to .050 lower than the actual win%. So the dynamic persists even in an environment where we know goals for and against are independent.
The problem exists because mathematically there's no reason to believe that (and this might be hard to understand without mathematical notation):
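The simulation can be reproduced with a short Monte Carlo sketch. It assumes independent Poisson goals with means 1.64 and 0.88 and the same per-match credit as described above (each scoreline treated as a pair of Poisson means, draws worth half). The seed, sample size and helper names are arbitrary choices, so the figures will differ slightly from the 5000-match run.

```python
import math
import random
from functools import lru_cache

def poisson_sample(lam: float, rng: random.Random) -> int:
    """Knuth's method for drawing a Poisson(lam) integer."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

@lru_cache(maxsize=None)
def poisson_win_pct(gf: float, ga: float, n: int = 60) -> float:
    """P(win) + 0.5 * P(draw), treating a scoreline gf-ga as Poisson means."""
    pf, pa = [math.exp(-gf)], [math.exp(-ga)]
    for k in range(1, n):
        pf.append(pf[-1] * gf / k)
        pa.append(pa[-1] * ga / k)
    win = draw = below = 0.0  # below = running P(goals against < k)
    for k in range(n):
        win += pf[k] * below
        draw += pf[k] * pa[k]
        below += pa[k]
    return win + 0.5 * draw

rng = random.Random(17)
n_games, gf_mean, ga_mean = 20000, 1.64, 0.88
actual = credited = 0.0
for _ in range(n_games):
    gf, ga = poisson_sample(gf_mean, rng), poisson_sample(ga_mean, rng)
    actual += 1.0 if gf > ga else 0.5 if gf == ga else 0.0
    credited += poisson_win_pct(gf, ga)  # per-match credit from the scoreline

actual_pct = actual / n_games
credited_pct = credited / n_games
season_avg_pct = poisson_win_pct(gf_mean, ga_mean)  # credit of the averages
print(f"actual win%               {actual_pct:.3f}")
print(f"Poisson win% of averages  {season_avg_pct:.3f}")
print(f"mean per-match Poisson    {credited_pct:.3f}")
```

Under these assumptions the actual win% and the Poisson win% of the averages should agree closely, while the mean of the per-match credits comes out several points lower, mirroring the .630 versus .676 above.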
sum (function (x) ) = function (sum (x) )
For some functions this is true, but clearly for poisson win% it is not.
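In notation, with the per-match credit f defined as above (goals treated as independent Poisson means, draws worth half), the claim is that the mean of per-match credits need not equal the credit of the mean scoreline:

```latex
f(\lambda,\mu) = \sum_{j>k} p_\lambda(j)\,p_\mu(k) + \tfrac{1}{2}\sum_{k} p_\lambda(k)\,p_\mu(k),
\qquad p_\lambda(k) = \frac{e^{-\lambda}\lambda^{k}}{k!}

\frac{1}{n}\sum_{i=1}^{n} f(g_i, a_i) \;\neq\; f\!\left(\frac{1}{n}\sum_{i=1}^{n} g_i,\; \frac{1}{n}\sum_{i=1}^{n} a_i\right) \quad \text{(in general)}
```

For example, a team that wins 2-1 and loses 0-1 earns per-match credits of about 0.71 and 0.18, averaging about 0.45, whereas its average scoreline of 1-1 gives f(1,1) = 0.5, which matches its actual record of one win and one loss.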
So again the problem becomes: how do we assign win% based on the results of an individual game so that, when those win% are aggregated, they reflect the actual win% that team should expect to generate? And how do we do it while still using goal differential?
NoSix
19 Jul 2006, 01:19 AM
voros wrote:
> So again the problem becomes how do we assign win% based on the results of an individual game so that when those win% are aggregated, they reflect the actual win% that team should expect to generate?
Why is it necessary/desirable to assign win% based on the results of individual matches?
voros
19 Jul 2006, 03:09 AM
NoSix wrote:
> Why is it necessary/desirable to assign win% based on the results of individual matches?
Because for some of the ranking systems I do, it can be exorbitantly helpful, particularly when dealing with smaller samples. It provably tells you more about team strength than simple won/loss results do.
And if you don't do it that way, problems can occur. You can get away with it in a league setup, but in a setup where teams play vastly different strengths of schedule, using aggregated totals like goals scored and goals allowed to calculate win% can lead you astray.
tachyon1
19 Jul 2006, 07:34 AM
voros wrote:
> Because you've aggregated the goal scoring for the season, taken an average and calculated a pwin% from there. However if you calculate a pwin% for every single match and then aggregate the total pwin% from the individual pwin%s from each match, you get the effect I describe.
Cheers, Voros, I'm with you now.
I tend to agree with NoSix: the individual games are the problem, particularly the ones a team either wins or loses to nil.
Chelsea win many more to nil than they lose to nil.
For the ones they win to nil, they're only getting around an 80% credit because the poisson approach allows for the possibility of a 0-0. They lose many fewer to nil, so they don't get enough poisson credit back on games we know they actually lost.
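Under the per-match credit discussed in this thread (scorelines treated as Poisson means, draws worth half), a win to nil has a simple closed form: the opponent's mean is 0, so the only non-winning outcome is a 0-0, with probability e^{-g}.

```latex
f(g, 0) = \left(1 - e^{-g}\right) + \tfrac{1}{2}e^{-g} = 1 - \tfrac{1}{2}e^{-g},
\qquad f(0, g) = \tfrac{1}{2}e^{-g}

f(1,0) \approx 0.816, \quad f(2,0) \approx 0.932, \quad f(3,0) \approx 0.975, \quad f(0,1) \approx 0.184
```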
Quickly looking at Chelsea's results, they won 15 games to nil but lost only 1 to nil. They won 7 where both teams scored and lost 6 where both teams scored.
It's the opposite way around for the bottom teams, and the effect probably evens itself out for mid-table outfits.
I think the crux is that the distribution of certain individual scorelines at the extremes of the table (especially where one team is goalless) is poorly representative of a team's long-term expected goals for and goals against average.
T.
numerista
19 Jul 2006, 09:20 AM
tachyon1 wrote:
> Voros, I'm with you now.
Yeah, me too ...
Follow-up question:
Does your algorithm require "probabilities" to be between 0 and 1? Obviously, it isn't logical to talk about a 5-0 win being 120% of a victory, but it might be a functional improvement.
mtr8967
19 Jul 2006, 02:02 PM
Ok, now I understand the problem and yes, it doesn't look like it's psychological. I must admit I don't have a good solution. Since you know the size of the problem you could just adjust the numbers to fix it, which would be about as ugly as it gets.
voros
19 Jul 2006, 04:40 PM
numerista wrote:
> Yeah, me too ...
> Follow-up question:
> Does your algorithm require "probabilities" to be between 0 and 1? Obviously, it isn't logical to talk about a 5-0 win being 120% of a victory, but it might be a functional improvement.
Yeah, it has to be. It's a feature: once all the iterating is done, everything lines up perfectly, and every team's expected goal or win total matches its actual goal or win total. Going outside the 0 to 1 nature of each match would foul that up something fierce.
The question I have is:
I can very easily use this algorithm and then simply make the adjustments after the ratings are calculated. So if the ratings give a favorite a 68% chance of winning, it's easy to adjust that to 72%.
The problem is, if I have to do that, does that invalidate the ratings, or do the ratings accurately represent something other than strict expected win% and it's just a matter of making the above adjustment after?
Functionally, the system works well. Testing on 2005 qualifying results and then on the more important (because of inter-confederation matches) 2006 World Cup results, it mopped the floor with FIFA's old ratings and also outperformed ELO (though the gap between this and ELO is smaller than the gap between ELO and old FIFA). I should note, however, that a system that ignores scorelines and uses strictly wins, draws and losses also comfortably outperformed old FIFA, and slightly outperformed ELO as well.
But if there's some tightening up to do mathematically, I'd like to do it, as that always seems to improve accuracy a bit. I could switch to ignoring scorelines, an approach that might be preferable if the rankings were officially sanctioned, since it would discourage running up scores and emphasize results over everything. But ultimately that gives a less accurate view of team _ranking_, so why should I do that?
mtr8967
19 Jul 2006, 05:32 PM
I hate throwing out information, so I'd tweak the numbers rather than toss goal differential. Of course the question is how much tweaking them helps and might you overfit?
voros
19 Jul 2006, 06:20 PM
mtr8967 wrote:
> I hate throwing out information, so I'd tweak the numbers rather than toss goal differential. Of course the question is how much tweaking them helps and might you overfit?
Well I'm not really concerned with "overfitting" too much because I know the standard shape of the adjustments that need to be made:
f(1) = 1
f(0) = 0
f(0.5) = 0.5
f(1-x) = 1-f(x)
I can pretty easily come up with a formula that does all of this, and then fit the details to match thousands of real life samples. If the above are true, and the details are correctly added, the result should be an improvement over doing nothing.
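One family of functions that meets all four conditions is the power (logit-scaling) transform f(x) = x^k / (x^k + (1 - x)^k), which is equivalent to multiplying the log-odds by k. This is a suggestion, not something from the thread; the name `calibrate` and the single-point fit of k to the 68% → 72% example are hypothetical, and in practice k would be fitted against many real results as described above.

```python
import math

def calibrate(x: float, k: float) -> float:
    """Power/logit-scaling transform: f(0)=0, f(0.5)=0.5, f(1)=1, f(1-x)=1-f(x).

    k > 1 pushes probabilities away from 0.5 (favorites get more credit),
    k < 1 pulls them toward 0.5, and k = 1 leaves them unchanged.
    """
    a, b = x ** k, (1.0 - x) ** k
    return a / (a + b)

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

# Hypothetical single calibration point: choose k so that 0.68 maps to 0.72.
k = logit(0.72) / logit(0.68)
print(f"k = {k:.3f}")
for x in (0.0, 0.25, 0.5, 0.68, 0.9, 1.0):
    print(x, round(calibrate(x, k), 3))

# Because f(x) + f(1 - x) = 1, the two sides of a match still share exactly one win.
assert abs(calibrate(0.68, k) + calibrate(0.32, k) - 1.0) < 1e-12
```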
The way the ratings converge, theoretically I could do this before the final ratings are generated and then converge to the new win%. However, so far in practice that breaks the system, because the total win% of all teams' results does not equal 0.5 (and the numbers never converge).