View Full Version : Question about use of Poisson distribution


voros
24 Jan 2004, 03:03 AM
Originally posted by microbrew
Don't you mean 2.7182818284% of them?
Which is scarier?

a) That I know what that number is? (It is the red-headed step-child of 'Pi.' It is 'e'.)

b) That I know what the 'e' stands for? (It's from its creator's name, 'Euler'; it is Euler's number.)

c) That I know what its most basic, important use is? (e raised to the x is a function that remains the same function when you differentiate it.)

d) That I know how to calculate it? (It's 1/0! + 1/1! + 1/2! + 1/3! + ... + 1/infinity!)

2.718281828) All of the above?

God I hate Mathematics for putting all of this geeky stuff into my brain. :)
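
For anyone who wants to check (d), here is a minimal sketch of the partial sums of that series. The function name e_partial_sum is made up for illustration; the only assumption is that truncating the infinite sum after a handful of terms is good enough.

```python
from math import e, factorial


def e_partial_sum(terms: int) -> float:
    """Return 1/0! + 1/1! + ... + 1/(terms-1)!, a truncated version of the series."""
    return sum(1 / factorial(n) for n in range(terms))


# Show how quickly the partial sums close in on math.e.
for terms in (1, 2, 5, 10, 15):
    approx = e_partial_sum(terms)
    print(terms, approx, abs(approx - e))
```

The factorials grow so fast that around fifteen terms already agree with math.e to better than ten decimal places.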

NoSix
04 Mar 2004, 11:47 PM
I will be posting weekly power rankings of MLS teams in the MLS forum:

2004 MLS Power Rankings (http://www.bigsoccer.com/forum/showthread.php?s=&threadid=98858)

ChrisE asked about the impact of strength of schedule adjustments on the rankings, so I calculated them with and without strength of schedule adjustments (SOSA), and the differences in points percentage are significant:


Rank  Team  w/SOSA  wo/SOSA   DIF
1     NE    0.593   0.481     0.112
2     SJ    0.593   0.333     0.260
3     CHI   0.537   0.481     0.056
4     KC    0.463   0.426     0.037
5     CLB   0.444   0.370     0.074
6     DC    0.426   0.352     0.074
7     LA    0.407   0.370     0.037
8     COL   0.389   0.426    -0.037
9     MET   0.370   0.333     0.037
10    DAL   0.074   0.204    -0.130


The dramatic differences may tell you more about the prediction method than the strengths of the teams - without the (multiplicative) strength of schedule adjustments, the estimators for each match become simple (additive) weighted averages of HGF and AGA, and HGA and AGF. As noted by Voros in an earlier post, this averaging tends to obscure the differences between teams, leading (in the most extreme case) to a prediction of a 1-1 draw for every single San Jose match!

ChrisE
07 Mar 2004, 08:35 PM
Originally posted by NoSix

The dramatic differences may tell you more about the prediction method than the strengths of the teams - without the (multiplicative) strength of schedule adjustments, the estimators for each match become simple (additive) weighted averages of HGF and AGA, and HGA and AGF. As noted by Voros in an earlier post, this averaging tends to obscure the differences between teams, leading (in the most extreme case) to a prediction of a 1-1 draw for every single San Jose match!

Thanks for doing the strength of schedule adjustments, NoSix.

An initial question. I still don't understand what exactly you're doing (despite reading through your outline; it'll take time), but it seems that you're using the Poisson formula to predict the outcomes, but then simply picking the most likely. Is this correct? Why would you want to use a probabilistic prediction method and then discard all the possible results except the most likely one?

(I feel really dull right now.)

NoSix
07 Mar 2004, 09:49 PM
Originally posted by ChrisE
Thanks for doing the strength of schedule adjustments, NoSix.

You're welcome.


An initial question. I still don't understand what exactly you're doing (despite reading through your outline; it'll take time), but it seems that you're using the Poisson formula to predict the outcomes, but then simply picking the most likely. Is this correct? Why would you want to use a probabilistic prediction method and then discard all the possible results except the most likely one?

Because each match can only have one outcome, and I wanted to calculate both GD and point PCT.

This is a good question, though, because sometimes the highest probability exact score is the lowest probability result (e.g., a predicted 1-1 draw). One alternative would be to use the highest probability result (W, L, or D) and only calculate point PCT.
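
A small sketch of that effect, assuming independent Poisson goals for each side. The goal expectancies of 1.3 apiece are made-up illustration values, not figures from the rankings, and the helper names are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expectancy is lam."""
    return exp(-lam) * lam**k / factorial(k)


def score_grid(lam_home: float, lam_away: float, max_goals: int = 10) -> dict:
    """Probability of every exact score up to max_goals each (independence assumed)."""
    return {
        (h, a): poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
        for h in range(max_goals + 1)
        for a in range(max_goals + 1)
    }


def result_probs(grid: dict) -> tuple:
    """Collapse exact scores into (home win, draw, away win) probabilities."""
    win = sum(p for (h, a), p in grid.items() if h > a)
    draw = sum(p for (h, a), p in grid.items() if h == a)
    loss = sum(p for (h, a), p in grid.items() if h < a)
    return win, draw, loss


grid = score_grid(1.3, 1.3)  # hypothetical expectancies
best_score = max(grid, key=grid.get)
win, draw, loss = result_probs(grid)
print("most likely exact score:", best_score)
print("win/draw/loss:", round(win, 3), round(draw, 3), round(loss, 3))
```

With these inputs the single most likely score is 1-1, yet the draw is the least likely of the three results, which is the situation described above.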

Finally, the alternative you seem to be hinting at with your question would be to calculate the probability of all possible outcomes for each team (a total of 10*3^9=196,830 outcomes) and choose the highest probability finish for each team as their ranking, or something along those lines. This is doable if you have programmed the algorithm, but is not really practical for me using Excel. Note for a league like the EPL with 20 teams, that would be 20*3^19=2.32*10^10 outcomes!
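
The counts quoted above can be reproduced directly. This sketch only restates the formula as written in the post (teams times 3 to the power of teams minus one); the function name is made up.

```python
def outcome_count(teams: int) -> int:
    """Outcome count as given in the post: teams * 3 ** (teams - 1)."""
    return teams * 3 ** (teams - 1)


print(outcome_count(10))  # 196830
print(outcome_count(20))  # 23245229340
```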

Or maybe I'm misinterpreting your question, and you had something else in mind?

tachyon1
23 Apr 2004, 07:37 AM
If anyone wants to compare their average goal expectancies with those compiled by the professionals, they should check the trades posted by the British spread firms.

The nature of their markets compels them to reveal how many goals they expect teams to score against each other. All major leagues are priced up.

An example from last week: they quoted Chelsea as 1.3 goals superior to Middlesbrough. They quoted total goals for the game as 2.8.
From those figures it's a few easy steps to see that they expected Chelsea to average 2.05 goals & M'bro 0.75 goals. They were miles out, as it turned out.
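
Those "few easy steps" can be sketched as follows. The function name goal_expectancies is hypothetical; the arithmetic is just half the sum and half the difference of the two quotes.

```python
def goal_expectancies(supremacy: float, total: float) -> tuple:
    """Split a supremacy quote and a total-goals quote into two team expectancies.

    Solves favourite + underdog = total and favourite - underdog = supremacy.
    """
    favourite = (total + supremacy) / 2
    underdog = (total - supremacy) / 2
    return favourite, underdog


# The Chelsea v Middlesbrough quotes from the post: supremacy 1.3, total 2.8.
chelsea, boro = goal_expectancies(1.3, 2.8)
print(round(chelsea, 2), round(boro, 2))
```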

If the figures you'd have posted are around these quotes then your method is probably sound.

Aside from a slight dependency in low-scoring games, the power of the Poisson is almost entirely dependent on the accuracy of your average.

A decent method is to treat your goals scored/conceded as a weighted time series and apply an exponential smoothing constant to the figures. Armed with a decent average, as you point out, you can predict everything from correct scores and match outcomes to time of first goal for the match or either team, most likely team to score first, etc.
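
One way such a weighted average could look is sketched below. This is an assumption-laden illustration, not the method actually used in the post: the decay value of 0.8 and the sample scores are invented, and in practice the constant would be fitted against historical results.

```python
def exp_weighted_average(goals: list, decay: float = 0.8) -> float:
    """Exponentially weighted mean of goals, listed oldest match first.

    A match played k games ago gets weight decay**k, so the latest match
    has weight 1 and older matches count for progressively less.
    """
    if not goals:
        raise ValueError("need at least one match")
    n = len(goals)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * g for w, g in zip(weights, goals)) / sum(weights)


# Hypothetical goals scored in a team's last six matches, oldest first.
recent_goals = [0, 1, 1, 2, 3, 2]
print(round(exp_weighted_average(recent_goals), 3))
print(round(sum(recent_goals) / len(recent_goals), 3))  # plain mean, for comparison
```

Because the later matches carry more weight, a team that has started scoring more recently gets a higher figure than the plain mean would give it.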

One novel use that may interest is to use the Poisson to predict the probability of a result at any point in the game. Say, for example, you lead 1-0 after 50 mins.
Provided you are still happy that your original assessment of a team's goal expectancy is still valid, you can calculate the goal expectancy for the remainder of the game from A x (B^C), where A is the original expectancy, B is the proportion of the game remaining & C is a constant, usually around 0.84.

Plug this expectancy into a Poisson & remember to add on the actual score at the time, and you can come up with "in running" probabilities for the leading team hanging on, the trailing team making a comeback, or the game ending level.
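
The A x (B^C) adjustment is easy to sketch. The function name is made up, and 0.84 is simply the "usually around" value quoted above, so treat it as a default rather than a fixed truth.

```python
def remaining_expectancy(original: float, fraction_left: float, c: float = 0.84) -> float:
    """Goal expectancy for the rest of the match: original * fraction_left ** c."""
    if not 0.0 <= fraction_left <= 1.0:
        raise ValueError("fraction_left must be between 0 and 1")
    return original * fraction_left**c


# 1-0 after 50 minutes leaves 40 of 90 minutes; 1.5 is a hypothetical pre-match expectancy.
print(round(remaining_expectancy(1.5, 40 / 90), 3))

# Share of a team's goals expected after half time under this formula.
print(round(remaining_expectancy(1.0, 0.5), 3))
```

Because C is below 1, the share of goals still to come is larger than the share of time left: at half time the formula leaves about 0.56 of the original expectancy, which reflects goals being somewhat more frequent later in matches.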

T1.

AussieVamp2
17 May 2004, 01:46 AM
--
One novel use that may interest is to use the Poisson to predict the probability of a result at any point in the game. Say, for example, you lead 1-0 after 50 mins.
Provided you are still happy that your original assessment of a team's goal expectancy is still valid, you can calculate the goal expectancy for the remainder of the game from A x (B^C), where A is the original expectancy, B is the proportion of the game remaining & C is a constant, usually around 0.84.

Plug this expectancy into a Poisson & remember to add on the actual score at the time, and you can come up with "in running" probabilities for the leading team hanging on, the trailing team making a comeback, or the game ending level.
--

that is interesting, would it be possible to give an example?

tachyon1
20 May 2004, 04:57 AM
Hi AV,
I've stuck up an example from a recent Celtic/Rangers game in another thread, but here's the basic maths if anyone wants to try it themselves.

Assuming goals are independent & ignoring injury time.

Let's take a typical spread bookie's quote.

Say the original assessment is that team A is 0.4 of a goal superior to team B & the total goals for the match is pitched at around 2.6 goals.

To get the goal expectancies for both teams you have to solve a pair of simultaneous equations, namely:

A+B=2.6 &
A-B=0.4

Add these two equations together (the B's disappear), divide by 2 & you get A's goal expectancy as 1.5. B's is therefore 1.1.

Stick these average goal expectancies into a Poisson & you get A's chances of winning at around 46%, B's at 28%, and 26% for the draw.
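
"Sticking the expectancies into a Poisson" can be sketched like this, assuming the two teams' goals are independent Poisson counts. The helper names are hypothetical, and scores are cut off at 15 goals a side, which loses a negligible amount of probability at these expectancies.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expectancy is lam."""
    return exp(-lam) * lam**k / factorial(k)


def match_probs(lam_a: float, lam_b: float, max_goals: int = 15) -> tuple:
    """Return (A wins, draw, B wins) by summing over every exact score."""
    pa = [poisson_pmf(k, lam_a) for k in range(max_goals + 1)]
    pb = [poisson_pmf(k, lam_b) for k in range(max_goals + 1)]
    a_win = sum(pa[i] * pb[j] for i in range(max_goals + 1) for j in range(i))
    draw = sum(pa[i] * pb[i] for i in range(max_goals + 1))
    b_win = sum(pa[i] * pb[j] for j in range(max_goals + 1) for i in range(j))
    return a_win, draw, b_win


a_win, draw, b_win = match_probs(1.5, 1.1)
print(f"A win {a_win:.0%}, draw {draw:.0%}, B win {b_win:.0%}")
```

The result should land close to the 46% / 26% / 28% split quoted above.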

Let's say A score 30 minutes in, so it's now 1-0 to team A.

There are 60 minutes left, or two-thirds (0.667) of the game.
0.667^0.84=0.711, so both teams have 0.711 of their original goal expectancy left, which comes to 1.07 of a goal for team A & 0.78 for B.

Now you stick these figures into a Poisson.

A will win the match as a whole if they "win" the remainder of the match or draw it (remember they now hold a one-goal lead).

From the new Poisson, the chance of them winning this mini match is 42% and the chance of them drawing it is 32%.

Therefore the chance of them going on to win from the 1-0 position they enjoyed after 30 mins is 42+32=74%.

Next, the game as a whole will end as a draw if team B "win" the mini game by exactly one goal. So the significant scorelines are 1-0 (12.3%), 2-1 (5.1%), 3-2 (0.7%) or 4-3 (0.05%); any other scoreline, whilst possible, is very, very unlikely.
These total 18%.

So the chance the game ends level after team A takes a 30th minute lead is 18%.

B only win the game by winning the mini game by two goals or more.
You can add up all these probabilities, but as all probabilities must total 100%, simply take the previous two probabilities away from 100.

The chance of team B coming back to win is therefore 8%.
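
The whole worked example can be sketched in a few lines, under the same assumptions stated in the post (independent goals, no injury time, C = 0.84). The function and parameter names are made up for illustration.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expectancy is lam."""
    return exp(-lam) * lam**k / factorial(k)


def in_running_probs(lam_a, lam_b, lead_a, fraction_left, c=0.84, max_goals=15):
    """Return (A wins, draw, B wins) for the full match, given the current lead.

    lam_a and lam_b are the pre-match expectancies; lead_a is A's current
    goal margin (negative if A trail); fraction_left is the share of time left.
    """
    scale = fraction_left**c              # the A x (B^C) adjustment
    rest_a, rest_b = lam_a * scale, lam_b * scale
    a_win = draw = b_win = 0.0
    for i in range(max_goals + 1):        # goals A add in the remainder
        for j in range(max_goals + 1):    # goals B add in the remainder
            p = poisson_pmf(i, rest_a) * poisson_pmf(j, rest_b)
            margin = lead_a + i - j       # add on the actual score
            if margin > 0:
                a_win += p
            elif margin == 0:
                draw += p
            else:
                b_win += p
    return a_win, draw, b_win


a_win, draw, b_win = in_running_probs(1.5, 1.1, lead_a=1, fraction_left=60 / 90)
print(f"A win {a_win:.0%}, draw {draw:.0%}, B win {b_win:.0%}")
```

This should come out close to the 74% / 18% / 8% figures above; small differences are down to rounding the intermediate expectancies by hand.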

T.

numerista
20 May 2004, 11:27 AM
Now you stick these figures into a Poisson.


Perhaps you won't want to give this info away, but you also mentioned that the betting houses don't simply use Poisson modeling to produce their halftime lines. Do you have a sense for what they actually do?

Thx.

mpruitt
20 May 2004, 12:20 PM
Perhaps you won't want to give this info away, but you also mentioned that the betting houses don't simply use Poisson modeling to produce their halftime lines. Do you have a sense for what they actually do?

Thx.
Also tachyon, I've noticed some software around the web that gives you the ability to control and manipulate some basic stats from teams in different leagues around the world. Some of them provide weekly updates so that you can do all the number crunching of an informed bettor in an up-to-date and analytical way. I was wondering if you've ever used any of these types of software?

NoSix
21 May 2004, 10:36 PM
A decent method is to treat your goals scored/conceded as a weighted time series and apply an exponential smoothing constant to the figures.

That sounds interesting. Any particular reason for the choice of an exponential?

tachyon1
22 May 2004, 07:11 AM
Hi guys, thanks for the feedback.

Firstly the way books set prices.

It's dependent upon a wide range of factors, but the most obvious ones are expected liability, actual liability, the type of bets that you expect to predominate (usually accumulators (parlays) or singles) & of course the true probability of the outcome you're betting on.

The liability factors are driven by punter preference & at the moment anyway soccer betting in the UK is predominantly multiple betting centred on home teams & better teams.

Odds on home favs therefore tend to be cramped, although not as much as you would anticipate, because stringing such bets together in parlays multiplies up the bookie's already in-built advantage, or vig.

You can test this theory that away dogs are more likely to be value by looking at the returns from handicap betting. Here game odds are "levelled up" by giving the outsider, or dog, a start ranging from half a goal up to two or more for huge mismatches.

At straight-up prices home favs are shorter than they should be, so away dogs are longer. The handicap prices posted have to bear some relationship to the straight-up odds, but again the away dogs are underbet; therefore away dogs on the handicap get a double dose of underbetting & are priced accordingly.

Any half-decent rating system should pick out enough value dogs to turn a profit. A couple I tried last season gave return on investment of up to double figures with reasonable numbers of bets.

The half-time lines are going to see much more singles-only betting, so it should be easier for books to manipulate the prices to get a balanced book.

The main considerations are going to be any accumulated liabilities and what price they need to post on a team to get the desired amount of action.

Punters will be put off backing the team leading because they will probably be long odds-on & "money buying" is considered dumb even though in many cases it won't be. Therefore the aim will be to price the draw or the losing team at a largish price that appears generous... but actually isn't.

Most people are very poor at subjectively putting a figure on the chances of something happening. Tell someone that an event is "unlikely", then ask for their estimate of the probability of that event & you're likely to get anything from 50% downwards.

One thing that seems to be largely neglected in setting half-time lines is a team's apparent scoring profile. Even if teams buck the usual 45%/55% scoring split by a large margin, this doesn't seem to filter through to the posted odds. Probably such scoring trends are transient.

In short, punters are unfamiliar with methods of pricing half-time lines and there is a time constraint, so if you can deduce which outcome the bookie wants to take money on & he aggressively markets it, then you can find value seeping out on the other side of the line.

M, I haven't tried any of the software, but I have collaborated in writing a few :-). The guy who codes them runs them through a database of historical odds to see if they prove any good & occasionally sticks the results in a betting column he writes for a weekly paper over here.

The basis for most of the software is goals scored/conceded, who they were scored/conceded against & how recent the games were, and not a lot else. Occasionally I'll use games won/lost or drawn records, but a lot of the other data like time of possession, shots on target, strike rates etc., whilst interesting, only muddy the waters & give you a worse fit overall.

Even splitting stats into home and away records, which you would imagine would vastly improve the fit of a model, does exactly the opposite. You have to deal with these factors in a more general way.

These "secondary" stats give, imo, at best weak indicators of a team's goal expectancy. Take time of possession. Do good teams have the ball longer? Some do, but then some cede possession and soak up pressure when they get ahead. Others defend by keeping possession.

Out of the 32 Champions League teams this year, the team ranked 25th for TOP was Monaco. They've had the ball for around 48% of their matches. So they are very poor on this stat & the bookies agreed: they were rank outsiders (up to 40/1) at every knockout round.

Based on goals alone, however, they were the top-ranked team at every knockout stage... & they're now in the final.

Software based on hard stats doesn't allow for inflated reputations. Monaco were outsiders because they were French (perceived as third-rate club sides) & not likely to be popular with punters regardless of their actual results on the field.

N6, the choice of an exponential smoothing constant comes by getting the best fit for historical results.

More later, it's almost time to see if Tim Howard can keep out the might of Millwall, who have a 1.5 of a goal start at 0.96/1.

T.

NoSix
22 May 2004, 01:05 PM
N6, the choice of an exponential smoothing constant comes by getting the best fit for historical results.


Can you provide a reference, or is this your own work?

numerista
22 May 2004, 01:33 PM
It's dependent upon a wide range of factors, but the most obvious ones are expected liability, actual liability, the type of bets that you expect to predominate (usually accumulators (parlays) or singles) & of course the true probability of the outcome you're betting on.

Any half-decent rating system should pick out enough value dogs to turn a profit.

Thanks for all the info ... I think I pretty much understand what you're saying, but I'll give it a second pass later when I'm a bit more awake. The reason I asked is that I'm curious as to what extent it's necessary to produce a "half decent rating system." I'm thinking

(1) The betting houses have excellent rating systems.
(2) It's possible to predict how far quoted odds will deviate from the betting house's estimate for the true probability.

In that case, cooking up your own rating system would only add noise to the system.

tachyon1
23 May 2004, 06:36 AM
Hi N6

All the stuff I use is my own & it's either been given a shakedown by me or for me. The problem with using third-party stuff is that it may contain adjustments that (quite rightly) aren't revealed. So if you start messing around with the rules yourself, you're just as likely to be doubly accounting for some factor.

Most of the books over here on rating sides through goals scored/conceded begin & end with "add the goals scored by A in their last five games to the goals conceded by their opponents today in the last five games & divide by the number of games played"... & this is plainly wrong.

Two of the most important factors in determining a reasonable value for a team's goal expectancy are;

the number of games(the more the better)
& how recent the data is (more recent is better)

So an expo-smoothed time series is the obvious choice.

N, agree soccer modelling is pretty noisy, mainly as a result of the low number of goals scored.

However I do prefer to have a repeatable procedure to estimate game odds.

The betting environment in the UK perhaps isn't as spot on as you might think, especially when firms take on new types of bets or delve into the nether regions of world football.

One of the top three bookies made an absolute pig's ear of setting their Asian handicap odds on a recent Euro Championship... they almost lost the company.
Whether by accident or design, they didn't know how to price a game up where one team gets a 0.75 of a goal start or a whole goal start.

Another firm regularly prices up Australian State games (park-standard soccer at best) on the same basis they use to price the EPL, even though the Oz games have getting on for twice as many goals & virtually no home advantage.

Even better is the advent of betting exchanges, where anyone can log onto a site & lay a price on anything they want.

If established bookies don't know how to price Asian odds (& it's not that difficult), then most members of the public definitely don't.

Last season, using an Elo chess-rating-based soccer rating system & backing any away dog on the Asian handicap when the price on a public betting exchange exceeded your true estimate by 5% yielded 150+ bets & a 12% return on level stakes.
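
A rough sketch of that kind of staking rule follows. It rests on several assumptions that are not in the post: prices are decimal odds, "exceeded your true estimate by 5%" is read as the exchange price being at least 5% above the fair price implied by your own probability, and bets are simple win-or-lose (the half-win and refund outcomes of Asian handicaps are ignored). The function names and sample bets are invented.

```python
def is_value(exchange_price: float, true_prob: float, edge: float = 0.05) -> bool:
    """True if the decimal price on offer beats the fair price by at least `edge`."""
    fair_price = 1 / true_prob
    return exchange_price >= fair_price * (1 + edge)


def level_stakes_roi(bets: list) -> float:
    """Return on investment for one-unit stakes; bets are (decimal_price, won) pairs."""
    profit = sum((price - 1) if won else -1 for price, won in bets)
    return profit / len(bets)


# Hypothetical record of four one-unit bets.
sample_bets = [(2.20, True), (2.10, False), (2.30, False), (2.15, True)]
print(is_value(2.20, 0.50))            # fair price 2.00, threshold 2.10
print(round(level_stakes_roi(sample_bets), 3))
```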

In short, some people can't price soccer properly & a rating system can alert you to when this might have happened.

Don't really want to hammer the betting side because I'm really much more interested in the many different ways people find to evaluate a team's true worth. The two, however, inevitably overlap.

T.

numerista
23 May 2004, 09:57 AM
Don't really want to hammer the betting side because I'm really much more interested in the many different ways people find to evaluate a team's true worth. The two, however, inevitably overlap.

Thx Tach ... I feel like we should be paying tuition. :)

NoSix
24 May 2004, 01:00 AM
Two of the most important factors in determining a reasonable value for a team's goal expectancy are;

the number of games(the more the better)
& how recent the data is (more recent is better)

So an expo-smoothed time series is the obvious choice.

Is it? Why is it obvious that an exponentially decaying time series is better than, say, a linearly decaying one?

tachyon1
24 May 2004, 02:21 PM
Hi Nosix,
maybe "obvious" was a poor choice of words.

I should have said "having tried out a variety of ways to give different weightings to more recent results, used those figures to generate predicted odds for w/l/d outcomes & then compared those predictions to the actual outcomes, the best solution I've come up with so far is to use an exponentially smoothed type of moving average" :-).

On a slightly different tack, has anyone given a least squares approach a go? It seems to deal admirably with an unbalanced schedule & is no respecter of reputations.

T.

numerista
24 May 2004, 02:54 PM
On a slightly different tack, has anyone given a least squares approach a go?

Least squares estimation certainly makes sense ... IIRC, a preferred approach would be to use the square root of goals as the response -- this helps adjust for the fact that games with high predicted scores also have high variance.

An easy wrinkle on this idea is weighted least squares, with exponentially decreasing weights, depending on how long ago a game was played.

NoSix
24 May 2004, 04:21 PM
I should have said "having tried out a variety of ways to give different weightings to more recent results, used those figures to generate predicted odds for w/l/d outcomes & then compared those predictions to the actual outcomes, the best solution I've come up with so far is to use an exponentially smoothed type of moving average" :-).

tachyon1,

If you choose some arbitrary level of significance, say 1%, over what time period do you find that previous results become insignificant? Is this time period similar for different leagues? Do you have any data that you could share here as an example?

NoSix
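
For a generic exponential weighting, the cut-off in that question has a closed form. This is only a sketch: it assumes a match played k games ago gets weight decay**k relative to the latest match, and the decay values are invented rather than taken from anyone's fitted model.

```python
from math import ceil, log


def games_until_insignificant(decay: float, threshold: float = 0.01) -> int:
    """Smallest k for which decay**k is at or below threshold."""
    if not 0.0 < decay < 1.0:
        raise ValueError("decay must be strictly between 0 and 1")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    return ceil(log(threshold) / log(decay))


for decay in (0.8, 0.9, 0.95):  # hypothetical smoothing constants
    print(decay, games_until_insignificant(decay))
```

The slower the decay, the longer old results hang around: at a decay of 0.8 a match drops below 1% of the latest match's weight after 21 games, while at 0.9 it takes 44.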

ZeekLTK
25 May 2004, 02:51 PM
Hey what is this "Poisson" thing everyone keeps mentioning? Is it some kind of program I can download to use for predictions, or is it a mathematical equation, or what?

Anyways, I did something similar to what you guys are trying to do (at least on the first couple posts, since I didn't read the whole thread) with CONCACAF World Cup Qualifying by using GF and GA for each home game and away game. For example with the United States vs Grenada I got that the USA would score 7.86 goals in their home game and Grenada would score 0.23 in their away game, so therefore I predicted the score of the game in Columbus will be 8-0 for the USA. On the return leg I got that the USA will score 4.39 goals, and Grenada will score 0.63 goals, so my prediction for that is 4-1 for the USA.
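
On the "Poisson thing": it is a probability formula rather than a program. If a team is expected to score lam goals, the chance of exactly k goals is e^(-lam) * lam^k / k!. The sketch below applies it to the USA v Grenada expectancies above, treating the two teams' goals as independent (an assumption), and also shows the rounding step used for the predictions. The function names are made up.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expectancy is lam."""
    return exp(-lam) * lam**k / factorial(k)


def rounded_score(exp_home: float, exp_away: float) -> tuple:
    """Turn two goal expectancies into a single predicted score by rounding.

    Note that Python's round() sends exact .5 values to the nearest even number.
    """
    return round(exp_home), round(exp_away)


home, away = rounded_score(7.86, 0.23)
p_exact = poisson_pmf(home, 7.86) * poisson_pmf(away, 0.23)
print("predicted score:", home, "-", away)
print("Poisson chance of exactly that score:", round(p_exact, 3))
```

Even the single predicted score is far from certain, which is why the earlier posts work with whole distributions of scores rather than one rounded figure.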

To get the numbers used to determine scores, I look at how one team did against teams of similar skill as their opponents in previous World Cup Qualifying. For example in the Nicaragua vs St. Vincent / Grenadines pick I looked at St. Vincent's scores against teams like Antigua & Barbuda, Surinam, Bermuda, and Grenada, and Nicaragua's scores against teams like Panama, St. Lucia, St. Kitts & Nevis, and Barbados... all from previous World Cup Qualifying matches and figured that St. Vincent / Grenadines would win 2-0 in their home match, and then tie 0-0 at Nicaragua, to win the aggregate.

Here's what I got for all my predictions:

Round 2
Group 1: United States defeats Grenada 12-1 (HW 8-0, AW 4-1)
Group 2: El Salvador defeats Bermuda 4-2 (HW 4-1, AL 0-1)
Group 3: Jamaica defeats Haiti 3-1 (HW 2-0, AT 1-1)
Group 4: Panama defeats St. Lucia 4-0 (HW 4-0, AT 0-0)
Group 5: Costa Rica defeats Cuba 6-1 (HW 4-1, AW 2-0)
Group 6: Guatemala defeats Surinam 5-1 (HW 4-1, AW 1-0)
Group 7: Honduras defeats Netherlands Antilles 8-2 (HW 5-1, AW 3-1)
Group 8: Canada defeats Belize 9-1 (HW 5-0, HW 4-1)
Group 9: Mexico defeats Dominica 12-0 (HW 8-0, AW 4-0)
Group 10: St. Kitts & Nevis defeats Barbados 5-2 (HW 4-0, AL 1-2)
Group 11: Trinidad & Tobago defeats Dominican Republic 8-0 (HW 6-0, AW 2-0)
Group 12: St. Vincent / Grenadines defeats Nicaragua 2-0 (HW 2-0, AT 0-0)


Semifinals
Group A:
1. United States - 14
2. Jamaica - 11
3. El Salvador - 6
4. Panama - 1

United States defeats Jamaica 2-0 (HW 2-0, AT 0-0)
United States defeats El Salvador 4-1 (HW 3-0, AT 1-1)
United States defeats Panama 10-0 (HW 7-0, AW 3-0)
Jamaica defeats El Salvador 2-1 (HW 1-0, AT 1-1)
Jamaica defeats Panama 3-0 (HW 2-0, AW 1-0)
El Salvador defeats Panama 3-2 (HW 2-1, AT 1-1)


Group B:
1. Honduras - 9
2. Costa Rica - 9
3. Canada - 8
4. Guatemala - 5

Honduras defeats Costa Rica 3-2 (HT 1-1, AW 2-1)
Honduras defeats Guatemala 2-1 (HW 1-0, AT 1-1)
Costa Rica defeats Guatemala 4-3 (HW 2-1, AT 2-2)
Costa Rica defeats Canada 2-1 (HT 1-1, AW 1-0)
Canada defeats Honduras 3-2 (HW 2-1, AT 1-1)
Guatemala defeats Canada 2-2 (HW 1-0, AL 1-2)

Group C:
1. Mexico - 15
2. Trinidad & Tobago - 15
3. St. Vincent / Grenadines - 3
4. St. Kitts & Nevis - 3

Mexico defeats Trinidad & Tobago 6-2 (HW 5-0, AL 1-2)
Mexico defeats St. Vincent / Grenadines 10-0 (HW 7-0, AW 3-0)
Mexico defeats St. Kitts & Nevis 8-1 (HW 6-1, AW 2-0)
Trinidad & Tobago defeats St. Vincent / Grenadines 8-1 (HW 5-1, AW 3-0)
Trinidad & Tobago defeats St. Kitts & Nevis 6-2 (HW 4-1, AW 2-1)
St. Vincent / Grenadines defeats St. Kitts & Nevis 3-3 (HW 1-0, AL 2-3)


Finals
1. United States - 22
2. Mexico - 18
3. Costa Rica - 18
4. Honduras - 10
5. Jamaica - 7
6. Trinidad & Tobago - 6

United States defeats Costa Rica 2-2 (HW 1-0, AL 1-2)
United States defeats Honduras 4-2 (HW 2-1, AW 2-1)
United States defeats Jamaica 3-1 (HW 2-1, AW 1-0)
United States defeats Trinidad & Tobago 2-1 (HW 2-1, AT 0-0)
Mexico defeats United States 2-1 (HW 2-0, AL 0-1)
Mexico defeats Costa Rica 2-1 (HW 2-1, AT 0-0)
Mexico defeats Honduras 3-1 (HW 2-0, AT 1-1)
Mexico defeats Jamaica 6-1 (HW 5-0, AT 1-1)
Mexico defeats Trinidad & Tobago 4-2 (HW 4-1, AL 0-1)
Costa Rica defeats Honduras 2-1 (HT 1-1, AW 1-0)
Costa Rica defeats Jamaica 5-1 (HW 4-0, AT 1-1)
Costa Rica defeats Trinidad & Tobago 3-0 (HW 2-0, AW 1-0)
Honduras defeats Trinidad & Tobago 3-2 (HT 1-1, AW 2-1)
Honduras defeats Jamaica 3-1 (HW 2-0, AT 1-1)
Jamaica defeats Trinidad & Tobago 3-1 (HW 2-0, AT 1-1)

*Letters indicate where the match is played and my pick for the team listed first. H is home, A is away, W is win, T is tie, L is loss.
Example: United States defeats Jamaica 2-0 (HW 2-0, AT 0-0) means USA wins at home 2-0 and ties in Jamaica 0-0.