Problem with Poisson Win%

voros · Aug 11, 2006

Re: ******** the Poisson

happyforever said:

The formula that I use to predict the win% is:

chance = e^(A*rating+B)/(1+e^(A*rating+B))

Good Luck!

What is the 'rating' part of the formula?

I'm not sure I'm following it. I'd like to try it so I can see if I can't find a way around this problem.

happyforever · Aug 11, 2006


voros said:

What is the 'rating' part of the formula?

Sorry, 'rating' is the rating difference between the two teams. You could also call it the handicap, i.e. the predicted goal difference.
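A minimal Python sketch of the logistic formula, assuming `rating` is the predicted goal difference (handicap) as described. The function name `win_chance` is illustrative, and A and B stay as parameters because they have to be fitted to real results; the values in the demo are placeholders, not fitted constants.

```python
import math


def win_chance(rating: float, a: float, b: float) -> float:
    """Logistic win chance e^(a*rating + b) / (1 + e^(a*rating + b)).

    rating: rating difference between the two teams (the handicap, i.e.
    the predicted goal difference). a, b: constants to be fitted to data.
    """
    z = a * rating + b
    # 1 / (1 + e^-z) equals e^z / (1 + e^z) but does not overflow for large positive z.
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Placeholder constants, NOT fitted values.
    for rating in (0.0, 0.5, 1.0):
        print(rating, round(win_chance(rating, a=1.0, b=0.0), 3))
```

With the placeholder values A = 1 and B = 0, a rating of 1 gives e/(1 + e) ≈ 0.731 and a rating of 0 gives exactly 0.5; as long as A > 0, the chance rises with the handicap.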

voros · Aug 11, 2006

I must be copying the formula wrong because I'm getting unusual results.

For example, for both 1 - 1 and 1 - 0.5 I'm getting 0.731.

Any help?

tachyon1 · Aug 11, 2006

Interesting idea, HFE.

Aren't you losing information on the likelihood of a draw, though, by going straight to win shares? You don't really know how the win shares are broken down.

And how do you deal with different goal environments?

Higher goal environments tend to drive down the win shares of superior teams.

Using Poisson, the win share for a team with a 1.2-goal superiority can be anything from 0.76 to 0.74, even over a relatively small range of total goals. Using the expo method, it will always be 0.768.
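This point can be checked numerically. The sketch below assumes each side's goals are independent Poisson variables (the standard double-Poisson set-up, which may differ in detail from the model under discussion), holds the goal superiority at 1.2 and varies the total goals, counting a draw as half a win. Function names are illustrative. The logistic form, by contrast, depends only on the goal difference, so it returns the same value for every total.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_win_share(gf: float, ga: float, max_goals: int = 20) -> float:
    """Expected win share (win = 1, draw = 0.5) when the two sides' goals
    are independent Poisson variables with means gf and ga.

    The score grid is truncated at max_goals per side, which leaves a
    negligible tail at typical football scoring rates.
    """
    win = draw = 0.0
    for i in range(max_goals + 1):
        p_i = poisson_pmf(i, gf)
        for j in range(max_goals + 1):
            p = p_i * poisson_pmf(j, ga)
            if i > j:
                win += p
            elif i == j:
                draw += p
    return win + 0.5 * draw


if __name__ == "__main__":
    superiority = 1.2
    for total in (2.0, 2.5, 3.0, 3.5):
        gf = (total + superiority) / 2
        ga = (total - superiority) / 2
        print(f"total={total}: win share={poisson_win_share(gf, ga):.3f}")
```

Because the variance of the goal difference equals the total goals under this assumption, a higher total spreads the outcomes and pulls the win share of the stronger side toward 0.5.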

Going back to the original "Problem with Poisson..." question: isn't the problem that the Poisson requires the use of an average, where the variance more or less equals the mean, and that condition isn't met if you take individual game scores?
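The variance-equals-mean property can be checked on a list of per-game goal counts. A minimal sketch using Python's `statistics` module; the function name is illustrative and the sample numbers are made up.

```python
from statistics import mean, pvariance


def dispersion_index(goals: list[int]) -> float:
    """Variance-to-mean ratio of per-game goal counts.

    A Poisson variable has variance equal to its mean, so a ratio near 1.0
    is consistent with Poisson scoring; clearly above 1.0 means
    over-dispersion, clearly below 1.0 under-dispersion.
    """
    m = mean(goals)
    if m == 0:
        raise ValueError("mean is zero; the ratio is undefined")
    return pvariance(goals, mu=m) / m


if __name__ == "__main__":
    # Made-up goal counts for illustration only.
    sample = [0, 1, 1, 2, 0, 3, 1, 2, 4, 1]
    print(round(dispersion_index(sample), 3))
```

A team that scored the same number of goals in every game would give a ratio of 0.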

T.

voros · Aug 11, 2006

tachyon1 said:

Going back to the original "Problem with Poisson..." question: isn't the problem that the Poisson requires the use of an average, where the variance more or less equals the mean, and that condition isn't met if you take individual game scores?

Well, yeah, unless you have a team that produces the exact same score in every game.

Here's the issue as it stands now:

My basic formula for estimating a win percentage based on goals scored and goals allowed is:

C = Constant 1
K = Constant 2
gf = Goals for per game
ga = Goals allowed per game

win% = ((gf+C)^((gf+ga)^K))/(((gf+C)^((gf+ga)^K))+((ga+C)^((gf+ga)^K)))

And all that changes in the various scenarios are the two constants.

For Poisson, my estimates for the constants were:

C = 0.2975
K = 0.4143
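The formula translates directly into Python; a minimal sketch with an illustrative function name. The demo plugs in the Poisson-based constants just quoted, for a hypothetical team averaging 2.0 goals for and 1.0 against.

```python
def win_pct(gf: float, ga: float, c: float, k: float) -> float:
    """win% = (gf+C)^X / ((gf+C)^X + (ga+C)^X), with X = (gf+ga)^K.

    gf, ga: goals for / allowed per game; c, k: the two fitted constants.
    """
    x = (gf + ga) ** k
    num = (gf + c) ** x
    return num / (num + (ga + c) ** x)


if __name__ == "__main__":
    # Poisson-based constants from the post; the team is hypothetical.
    print(round(win_pct(2.0, 1.0, c=0.2975, k=0.4143), 3))
```

By construction the formula gives exactly 0.5 when gf equals ga, and the values for (gf, ga) and (ga, gf) add up to 1.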

I then looked at the win% and goal numbers for 16 leagues, 3 seasons each, for a total of 814 individual team seasons, and found the constants that minimized the squared error between predicted and actual win%. Those constants were somewhat different from the Poisson ones:

C = 0.1632
K = 0.3256
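The fitting step can be sketched as a nonlinear least-squares problem. Levenberg-Marquardt is a swapped-in, standard solver here, not necessarily what was actually used. It relies on the fact that the formula is a logistic curve, 1 / (1 + e^-z) with z = X * ln((gf+C)/(ga+C)), which gives simple analytic derivatives. The demo data are synthetic: targets are generated from the league-fitted constants above, so the fit, started from the Poisson constants, should recover them. All names are illustrative.

```python
import math


def predict(gf: float, ga: float, c: float, k: float) -> tuple[float, float, float]:
    """Return (win%, d win%/dC, d win%/dK) for the win% formula.

    (gf+C)^X / ((gf+C)^X + (ga+C)^X) with X = (gf+ga)^K equals the
    logistic 1 / (1 + e^-z) with z = X * ln((gf+C) / (ga+C)).
    """
    x = (gf + ga) ** k
    log_ratio = math.log(gf + c) - math.log(ga + c)
    p = 1.0 / (1.0 + math.exp(-x * log_ratio))
    slope = p * (1.0 - p)  # dp/dz of the logistic
    dz_dc = x * (1.0 / (gf + c) - 1.0 / (ga + c))
    dz_dk = x * math.log(gf + ga) * log_ratio
    return p, slope * dz_dc, slope * dz_dk


def sse(teams, targets, c, k):
    """Sum of squared errors between predicted and actual win%."""
    if any(gf + c <= 0 or ga + c <= 0 for gf, ga in teams):
        return math.inf  # formula undefined for these constants
    return sum((predict(gf, ga, c, k)[0] - y) ** 2
               for (gf, ga), y in zip(teams, targets))


def fit_c_k(teams, targets, c=0.2975, k=0.4143, max_iter=200):
    """Levenberg-Marquardt fit of C and K, starting from the Poisson values."""
    lam = 1e-3
    cur = sse(teams, targets, c, k)
    for _ in range(max_iter):
        a11 = a12 = a22 = g1 = g2 = 0.0
        for (gf, ga), y in zip(teams, targets):
            p, jc, jk = predict(gf, ga, c, k)
            r = p - y
            a11 += jc * jc
            a12 += jc * jk
            a22 += jk * jk
            g1 += jc * r
            g2 += jk * r
        # Damped normal equations: (J'J + lam * diag(J'J)) step = -J'r
        m11, m22 = a11 * (1.0 + lam), a22 * (1.0 + lam)
        det = m11 * m22 - a12 * a12
        if det <= 0.0:
            break
        step_c = (-g1 * m22 + g2 * a12) / det
        step_k = (-g2 * m11 + g1 * a12) / det
        trial = sse(teams, targets, c + step_c, k + step_k)
        if trial < cur:  # accept; lean toward Gauss-Newton
            c, k, cur = c + step_c, k + step_k, trial
            lam = max(lam / 10.0, 1e-12)
        else:  # reject; lean toward gradient descent
            lam *= 10.0
        if cur < 1e-28 or lam > 1e15:
            break
    return c, k, cur


if __name__ == "__main__":
    # Synthetic per-game averages (gf, ga); targets come from C=0.1632, K=0.3256.
    teams = [(0.8, 1.6), (1.0, 1.3), (1.3, 1.0), (1.6, 0.9), (2.0, 0.8),
             (2.4, 1.1), (1.1, 2.0), (1.9, 1.4), (0.9, 0.7), (2.6, 0.6)]
    targets = [predict(gf, ga, 0.1632, 0.3256)[0] for gf, ga in teams]
    c, k, err = fit_c_k(teams, targets)
    print(f"C = {c:.4f}, K = {k:.4f}, SSE = {err:.2e}")
```

With real data, `teams` would hold each team-season's goals for and against per game and `targets` its actual win%.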

Until now I couldn't run a minimized-error test on individual game scores against win%, because, quite obviously, the formula that minimizes that error is simply 1.0 for any win, 0.5 for any draw, and 0 for any loss.

But if I minimize the squared error between the aggregated individual-game values and this predicted win%, I do get a pair of constants:

C = 0.1431
K = 1.1152

So if I dole out wins in individual games based on these constants, the season totals come closer to the aggregate prediction from the previous set of constants.

What happens now is that a one-goal win is worth anywhere from 0.89 to 0.91 wins, a two-goal win from around 0.990 to 0.997 wins, and a win by three or more goals essentially 1.0 (a 3-0 win gets you 0.999973 wins). This seems a bit counterintuitive, but empirically that's what seems to work.
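A sketch of doling out per-game credit with the per-game constants (C = 0.1431, K = 1.1152), assuming, as the 3-0 example implies, that the same formula is applied with a game's actual score in place of per-game averages. Function names are illustrative.

```python
def game_value(goals_for: int, goals_against: int,
               c: float = 0.1431, k: float = 1.1152) -> float:
    """Win credit for one game: the win% formula applied to the actual score."""
    x = (goals_for + goals_against) ** k
    num = (goals_for + c) ** x
    return num / (num + (goals_against + c) ** x)


def season_win_share(scores: list[tuple[int, int]]) -> float:
    """Average per-game credit over a season of (for, against) scores."""
    return sum(game_value(f, a) for f, a in scores) / len(scores)


if __name__ == "__main__":
    for score in [(1, 0), (2, 1), (3, 2), (2, 0), (3, 1), (3, 0), (1, 1)]:
        print(score, round(game_value(*score), 6))
```

With these constants a 1-0 win comes out at about 0.889 and a 3-0 win at about 0.999973; a draw, including 0-0, is always exactly 0.5, and a loss earns 1 minus the winner's credit.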

The problem as I now see it is that both the aggregate estimate and the individual game estimate are still undershooting their marks for the best teams (by enough to bother me).

Another question: could the fact that I'm fitting to squared error be an issue? Could switching to absolute error fix the problem? Percentage error?
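The three error measures in question, written as interchangeable objective functions for the same fit (a sketch; names are illustrative). Squared error weights large misses most heavily, absolute error weights every miss linearly, and percentage error divides each miss by the actual win%, so misses on weak teams count for more.

```python
def squared_error(pred: list[float], actual: list[float]) -> float:
    """Sum of squared differences: large misses dominate."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual))


def absolute_error(pred: list[float], actual: list[float]) -> float:
    """Sum of absolute differences: every miss counts linearly."""
    return sum(abs(p - a) for p, a in zip(pred, actual))


def percentage_error(pred: list[float], actual: list[float]) -> float:
    """Sum of misses relative to the actual value; small actual values weigh more."""
    if any(a == 0 for a in actual):
        raise ValueError("percentage error is undefined when an actual value is 0")
    return sum(abs(p - a) / a for p, a in zip(pred, actual))
```

Any of these can stand in for the squared-error sum as the quantity the solver minimizes. Absolute and percentage error are not differentiable where a miss is exactly zero, which can trouble gradient-based solvers.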

happyforever · Aug 11, 2006

tachyon1 said:

And how do you deal with different goal environments?

I'm not saying it isn't important! It is, and I do have opinions on it. But even without looking at it, just considering the predicted goal difference of a match (which shouldn't be far off the Asian handicaps offered by bookies) should already give an unbiased win%.
First I want voros to work out the appropriate A and B, just to see if they're much different from mine. I'm the last person to claim I've got it all right, though I do know that the Poisson distribution is incorrect, and I know why. I'll leave that until later.

voros,
What do you mean by 1 - 1 and 1 - 0.5?
The larger the handicap, the higher the chance, unless A is negative, which is not possible.

voros · Aug 11, 2006


Oh, I see now. A and B are constants chosen to best fit the actual win percentages. Lemme see...
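One way to get A and B is logistic regression by maximum likelihood, scoring each match's result as a win share (1 for a win, 0.5 for a draw, 0 for a loss) against its rating. This is a swapped-in standard technique, not necessarily how the original constants were fitted; minimizing squared error is an alternative. Newton's method with step halving is used because the log-likelihood is concave. Names are illustrative and the demo data are synthetic.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(ratings, shares, a, b):
    """Bernoulli log-likelihood with fractional (win-share) targets."""
    total = 0.0
    for x, y in zip(ratings, shares):
        p = sigmoid(a * x + b)
        total += y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total


def fit_a_b(ratings, shares, max_iter=50):
    """Fit chance = sigmoid(A*rating + B) by Newton's method with step halving."""
    a = b = 0.0
    ll = log_likelihood(ratings, shares, a, b)
    for _ in range(max_iter):
        grad_a = grad_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(ratings, shares):
            p = sigmoid(a * x + b)
            w = p * (1.0 - p)
            grad_a += (y - p) * x
            grad_b += y - p
            h_aa += w * x * x
            h_ab += w * x
            h_bb += w
        det = h_aa * h_bb - h_ab * h_ab
        if det <= 0.0:
            break  # e.g. all ratings identical: A and B cannot be separated
        # Newton step: solve (negative Hessian) * step = gradient
        step_a = (h_bb * grad_a - h_ab * grad_b) / det
        step_b = (h_aa * grad_b - h_ab * grad_a) / det
        t = 1.0
        new_a, new_b = a + step_a, b + step_b
        new_ll = log_likelihood(ratings, shares, new_a, new_b)
        while new_ll < ll and t > 1e-8:
            t /= 2.0
            new_a, new_b = a + t * step_a, b + t * step_b
            new_ll = log_likelihood(ratings, shares, new_a, new_b)
        if new_ll < ll:
            break  # no improving step left: converged to rounding precision
        done = abs(new_a - a) + abs(new_b - b) < 1e-12
        a, b, ll = new_a, new_b, new_ll
        if done:
            break
    return a, b


if __name__ == "__main__":
    # Synthetic data: win shares generated from A = 1.2, B = 0.1.
    ratings = [-2.0 + 0.5 * i for i in range(9)]
    shares = [sigmoid(1.2 * r + 0.1) for r in ratings]
    print(fit_a_b(ratings, shares))
```

With real data, `ratings` would be each match's predicted goal difference and `shares` the actual result as a win share; a fitted A > 0 means the chance rises with the handicap.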

tachyon1 · Aug 13, 2006

voros said:

The problem as I now see it is that both the aggregate estimate and the individual game estimate are still undershooting their marks for the best teams (by enough to bother me).

Maybe you need to find constants using only the results of the better teams? There are a lot more average sides than truly outstanding ones, so any catch-all equation is going to perform better for the majority of cases.

Most of the very top teams are going to have a goal superiority in the region of one goal for most of their matches, and that's also going to raise the total goal expectation for their games to a level that average teams rarely encounter.

Or maybe you could look at wins/draws/losses as opposed to win shares. Then you'd be able to see if you're underestimating the better teams' outright wins, perhaps at the expense of overestimating their draws.

One novel way to test the accuracy of your model is to derive win/draw/loss percentages for, say, a season's worth of games. Use those figures to work out how many wins, draws and losses a team should have got, and use a chi-squared test to compare that with how many they actually got.
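The chi-squared check, sketched: sum each game's predicted (win, draw, loss) probabilities into expected counts, then compare them with the actual counts. Assumptions: the per-game probabilities come from whatever model is being tested, and with three categories the test has 2 degrees of freedom (strictly, only if the probabilities weren't fitted on these same games). For 2 degrees of freedom the chi-squared tail probability is exactly e^(-x/2), so no statistics library is needed. Function names and demo numbers are hypothetical.

```python
import math


def expected_counts(game_probs: list[tuple[float, float, float]]) -> list[float]:
    """Sum per-game (win, draw, loss) probabilities into expected W/D/L counts."""
    return [sum(p[i] for p in game_probs) for i in range(3)]


def chi_squared(observed: list[int], expected: list[float]) -> float:
    """Pearson statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_2df(stat: float) -> float:
    """Upper-tail probability of a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


if __name__ == "__main__":
    # Hypothetical model output: every game predicted 50% W, 30% D, 20% L.
    probs = [(0.5, 0.3, 0.2)] * 38
    exp_wdl = expected_counts(probs)
    stat = chi_squared([24, 8, 6], exp_wdl)
    print([round(e, 1) for e in exp_wdl], round(stat, 2), round(p_value_2df(stat), 3))
```

The chi-squared approximation is usually considered reliable only when each expected count is at least about 5, so a single team-season can be on the small side.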

T

Serie Zed · Jan 11, 2007

Voros, I'm fairly far over my head with this, so this might be way wide of the net.

But I'm wondering if the current score affects your goals-for and goals-against ratings. To do anything with that, you'd need a record of the order in which the goals were scored in each game in your database, but it's possible that, say, Chelsea's goals-against rating goes way up or down when the score is tied (or when they're winning by one, etc.).

voros · Jan 22, 2007


Yeah, what I've done (and it's far from perfect) is devise a predicted win% from average goals scored and allowed. Then, using real-life data, I used a solving algorithm (the one provided by Excel works well enough) to optimize individual-game win% values that, when aggregated, come as close to the predicted win% as possible. The end result basically comes down to something like:

Win by one goal = 0.85
Win by two goals = 0.98
Win by three goals = 0.999
Win by four or more = 1

That's not exactly it, but that's the general upshot.
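Because a team's season win share is linear in the per-margin values (a win by margin m earns v[m], a loss by m earns 1 - v[m], a draw earns 0.5), the solving step can be sketched as ordinary least squares in place of Excel's Solver. Assumptions: margins of four or more share one value, the values are unconstrained, and `targets` holds each team's predicted win% from average goals. All names are illustrative; the demo uses synthetic targets built from the approximate table above.

```python
BUCKETS = 4  # margins 1, 2, 3 and "4 or more"


def bucket(gf: int, ga: int) -> int:
    """Index of the margin bucket for a decisive game."""
    return min(abs(gf - ga), BUCKETS) - 1


def season_share(scores, values):
    """Season win share given per-margin win credits `values`."""
    total = 0.0
    for gf, ga in scores:
        if gf == ga:
            total += 0.5
        else:
            v = values[bucket(gf, ga)]
            total += v if gf > ga else 1.0 - v
    return total / len(scores)


def linear_row(scores, target):
    """Write share = const + sum(row[m] * v[m]); return (row, target - const)."""
    row = [0.0] * BUCKETS
    const = 0.0
    for gf, ga in scores:
        if gf == ga:
            const += 0.5
        elif gf > ga:
            row[bucket(gf, ga)] += 1.0
        else:  # a loss by m is worth 1 - v[m]
            row[bucket(gf, ga)] -= 1.0
            const += 1.0
    n = len(scores)
    return [r / n for r in row], target - const / n


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[r][j] -= f * aug[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def fit_margin_values(seasons, targets):
    """Least-squares per-margin credits so season shares match the targets."""
    rows, rhs = zip(*(linear_row(s, t) for s, t in zip(seasons, targets)))
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(BUCKETS)]
           for i in range(BUCKETS)]
    atb = [sum(r[i] * y for r, y in zip(rows, rhs)) for i in range(BUCKETS)]
    return solve(ata, atb)


if __name__ == "__main__":
    table = [0.85, 0.98, 0.999, 1.0]  # approximate values from the post
    seasons = [
        [(1, 0)] * 5 + [(0, 0)] * 2,
        [(2, 0)] * 3 + [(0, 1)],
        [(3, 0)] * 2 + [(1, 1)],
        [(4, 0), (5, 1), (0, 2)],
        [(1, 0), (2, 0), (3, 0), (4, 0), (0, 1)],
    ]
    targets = [season_share(s, table) for s in seasons]
    print([round(v, 4) for v in fit_margin_values(seasons, targets)])
```

With real data there are many more teams than buckets, so the fit is a compromise rather than an exact solve, and nothing here keeps the fitted values between 0.5 and 1.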

You still don't quite get a big enough range of win%, but given the uncertainty of predicting future performances, that could very well be as much of a feature as it is a bug.

The new ratings based on this have been kept here:

http://numeridicalcio.wordpress.com/

I like the way they have worked out thus far.