View Full Version : Question about use of Poisson distribution


AussieVamp2
01 Jun 2004, 02:59 AM
tachyon1,

If you choose some arbitrary level of significance, say 1%, over what time period do you find that previous results become insignificant? Is this time period similar for different leagues? Do you have any data that you could share here as an example?

NoSix


Here's a question though :- Should you vary a team's ability (or change it as quickly) later in a season as earlier?

AussieVamp2
01 Jun 2004, 03:05 AM
On a slightly different tack, has anyone given a least squares approach a go? It seems to deal admirably with an unbalanced schedule & is no respecter of reputations.

T.

Yeah, seems to be OK. It's more complicated to calculate, of course, and you don't directly get the handicap probabilities (to win by 2 goals, etc.) that you were mentioning, either. The same in-game score-state dependency still remains as well, of course.

AussieVamp2
01 Jun 2004, 03:09 AM
You can also incorporate trend &/or control-type limits to try to spot if a team has greatly improved or otherwise and you've missed it.

T

Out of interest, have you ever tried something like an extra variable (another rating, I guess) and adjusted that up or down more quickly based on recent results, and combined that with a rating? Would only be a small improvement if any, I am sure, but would be interesting perhaps.

tachyon1
01 Jun 2004, 06:57 AM
Hi AV,
in no particular order.

Betting favs etc. on the Asian handicaps doesn't cut it at present & turns in a small loss even at best price & even if you give yourself a fairly large margin for error for predicted price versus available price.

The leagues looked at were all English, with the EPL & 1st division predominating.

Profitability of away dogs is down to having to price up the chances of the least favoured team winning, which intuitively isn't very easy. Most exchange players retreat to the misplaced safety of the fav.

Also most asian prices take their lead from the usual home/draw/away odds so any price biases present in those can get carried thru.

It also helps that no one can at present make a cast-iron case for a fav-longshot or a longshot-fav bias, or maybe any bias at all, in English odds setting. So the waters are very muddied.

I'm reluctant to give too much extra credence to recent form because of the very low scoring in soccer. You could just be factoring in luck or perverse reffing. Bristol City, for example, followed 10 straight wins this season with one win in 8, but that kind of run is very unusual. Their true ability was probably somewhere between those two extremes.

IMO a trend value approach would be the way to go. It might also be useful to deal with the seasonal goal trends that repeat year on year.

Don't forget reversion to the mean which is certainly going to need to be allowed for.

I do like the LS approach for leagues or competitions where you've got to make some pretty quick evaluations because teams are dramatically different from last year's model (Aussie state leagues, plus only around 20 games per team per season) or where teams haven't met before on a regular basis (Champions League, again a limited number of games in an incomplete schedule).

Also, the lower down the talent scale you go, the larger the team news factor becomes, which probably accounts for the reluctance of books to price up lower Scottish leagues completely.

By contrast most major European domestic leagues are much less volatile on a season to season basis even with the added complication of relegated & promoted teams.

Some great ideas for further discussion.

T

AussieVamp2
01 Jun 2004, 07:57 AM
Betting favs etc. on the Asian handicaps doesn't cut it at present & turns in a small loss even at best price & even if you give yourself a fairly large margin for error for predicted price versus available price.

-- ok, so you are identifying a general pricing flaw with the help of a rating to some degree then it would seem

Profitability of away dogs is down to having to price up the chances of the least favoured team winning, which intuitively isn't very easy. Most exchange players retreat to the misplaced safety of the fav.

-- yeah, not much fun laying 8.00 prices either !


Also most asian prices take their lead from the usual home/draw/away odds so any price biases present in those can get carried thru.

-- yes, although improved a little, I think


It also helps that no one can at present make a cast-iron case for a fav-longshot or a longshot-fav bias, or maybe any bias at all, in English odds setting. So the waters are very muddied.


-- still some, but not as much as previously (at least in the home favorite case) as far as I can tell

I'm reluctant to give too much extra credence to recent form because of the very low scoring in soccer. You could just be factoring in luck or perverse reffing. Bristol City, for example, followed 10 straight wins this season with one win in 8, but that kind of run is very unusual. Their true ability was probably somewhere between those two extremes.

IMO a trend value approach would be the way to go. It might also be useful to deal with the seasonal goal trends that repeat year on year.

-- not sure I understand what you mean by trend value in this case?

Don't forget reversion to the mean which is certainly going to need to be allowed for.

--- Ok, what was that guy and the 'grand means' thing, guess I will remember shortly, is that what you mean?

I do like the LS approach for leagues or competitions where you've got to make some pretty quick evaluations because teams are dramatically different from last year's model (Aussie state leagues, plus only around 20 games per team per season) or where teams haven't met before on a regular basis (Champions League, again a limited number of games in an incomplete schedule).

-- Australian bookies not silly enough to do these, although Scandinavian equivalents abound.


Also, the lower down the talent scale you go, the larger the team news factor becomes, which probably accounts for the reluctance of books to price up lower Scottish leagues completely.


-- Yes, and more to the point, if you get it wrong you will get three 5-buck bets and one max bet from a pro, so not much chance of winning, given no one else is interested in something watched by 200 people, 3 dogs and a goat.

By contrast most major European domestic leagues are much less volatile on a season to season basis even with the added complication of relegated & promoted teams.

-- On how much team ratings change you mean?

Some great ideas for further discussion.

AussieVamp2
01 Jun 2004, 08:33 AM
Also, the lower down the talent scale you go, the larger the team news factor becomes, which probably accounts for the reluctance of books to price up lower Scottish leagues completely.

T

So, do you mean your look at the values of players suggests that key 'spine' players, or at least first-choice spine players, are relatively more important in lower divisions?

AussieVamp2
01 Jun 2004, 08:41 AM
Last season, using an ELO chess-rating-based soccer rating system & backing any away dog on the Asian handicap when the price on a public betting exchange exceeded your true estimate by 5% yielded 150+ bets & a 12% return on level stakes.

T.

The question I of course forgot is: why use an ELO rating rather than the poisson rating you had been discussing? Did it work better, or was it just for interest?

tachyon1
01 Jun 2004, 11:35 AM
Hi AV,

re regression/reversion to the mean.
Generally speaking the great aren't as great as they seem & the really poor aren't quite as poor as they seem. Teams' numbers (ratings etc.) tend to home in towards the mean for that league, as the extreme performances are likely to be the result of accumulated good or bad luck.

In the absence of any fundamental change to the team's make-up, sides that are good in the first half of a season usually remain good in the second half, but by not quite as much. The same goes for poor sides.

The baseball guys regress every stat that moves & you probably should also in soccer.

re team news.
Mostly anecdotal, this one. I know guys who bet exclusively on minor leagues in Scotland & State leagues in Aus & they maintain that the presence or absence of certain players can greatly affect outcomes. National league players turning up in the State leagues in Aus is a case in point.

Most English books will willingly give you 2.4 Campbelltown, 2.6 Adelaide R & 3.75 the draw all the way thru to the start of the new English season. Most wisely steer clear of the Tas leagues though.

The ELO wasn't mine, I just crunched the results.

More later.
T.

AussieVamp2
01 Jun 2004, 11:44 AM
Hi AV,

re regression/reversion to the mean.
Generally speaking the great aren't as great as they seem & the really poor aren't quite as poor as they seem. Teams' numbers (ratings etc.) tend to home in towards the mean for that league, as the extreme performances are likely to be the result of accumulated good or bad luck.

-- right - so other than smoothing are you specifically adjusting for this on top, so to speak?

In the absence of any fundamental change to the team's make-up, sides that are good in the first half of a season usually remain good in the second half, but by not quite as much. The same goes for poor sides.

The baseball guys regress every stat that moves & you probably should also in soccer.

-- was thinking more of game to game - and now I recall what I was looking for : James-Stein estimation -- and part of the question of should you adjust teams as quickly later on in the season?


re team news.
Mostly anecdotal, this one. I know guys who bet exclusively on minor leagues in Scotland & State leagues in Aus & they maintain that the presence or absence of certain players can greatly affect outcomes. National league players turning up in the State leagues in Aus is a case in point.


-- yeah, well in general be hard pressed to find out who is playing for who!

Most English books will willingly give you 2.4 Campbelltown, 2.6 Adelaide R & 3.75 the draw all the way thru to the start of the new English season. Most wisely steer clear of the Tas leagues though.

-- Have to have a look at the state leagues, interesting.

The ELO wasn't mine,I just crunched the results.

-- ok, fair enough.

More later.

-- Cool, look forward to it :)
T.

AussieVamp2
02 Jun 2004, 12:16 AM
The ELO wasn't mine,I just crunched the results.

More later.
T.


Wasn't soccerratings was it, out of interest?

Zebby
15 May 2005, 11:46 AM
It helps reel in the chicks that don't immediately run screaming from us.

Which is about 2.17364579843% of them.

Don't you mean 2.7182818284% of them?

Running away screaming is all part of the mating ritual.

Are you really interested in the 2.7182818284% of chicks that don't run away? They're 'e'asy.

johnh00
05 Jun 2006, 05:02 PM
Thought I'd resurrect this old thread, as it makes for some good reading, plus I had a question related to it. ;)

I've been working on a computer program for analyzing sports statistics. It can be used to analyze a lot of stats, but as part of the testing of my theories and my programming, I've been analyzing soccer players' stats and using them to produce some predictions for results. I am able to create a poisson distribution on the predicted results for goals for/against and manually figure out odds of wins/ties/losses based on this, but I wondered if there is any elegant way (read: a simple mathematical formula) to do this instead? If not, I'll just write a sub-routine to handle the problem, but I was feeling a bit lazy and figured someone may know of an easier way around this.

- Lee

numerista
06 Jun 2006, 07:15 PM
I wondered if there is any elegant way (read: a simple mathematical formula) to do this instead? If not, I'll just write a sub-routine to handle the problem, but I was feeling a bit lazy and figured someone may know of an easier way around this.

Off the top of my head, I don't think there is a simple formula, although I'm feeling a bit lazy myself. :)

johnh00
06 Jun 2006, 07:46 PM
Off the top of my head, I don't think there is a simple formula, although I'm feeling a bit lazy myself. :)
Looking at it, I would think it's possible to come up with one, but I think I'll just create a function to process this out, instead. I could write the program in a couple of hours, but if I start trying to come up with a mathematical solution, it will probably take me days, and I'm not even sure it would work. :p
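
A sub-routine of the kind described here can be quite short. The following is only a minimal sketch, assuming the two teams' goal counts are independent Poisson variables; the function names, the example means and the 15-goal cut-off are illustrative choices, not anything posted in the thread.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k goals for a Poisson with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def match_probabilities(home_mean, away_mean, max_goals=15):
    """Return (home win, draw, away win) by summing the scoreline grid.

    Assumes independent Poisson goal counts; scorelines above
    max_goals are ignored, which is negligible for soccer-sized means.
    """
    home = [poisson_pmf(k, home_mean) for k in range(max_goals + 1)]
    away = [poisson_pmf(k, away_mean) for k in range(max_goals + 1)]
    win = draw = loss = 0.0
    for h, p_home in enumerate(home):
        for a, p_away in enumerate(away):
            p = p_home * p_away
            if h > a:
                win += p
            elif h == a:
                draw += p
            else:
                loss += p
    return win, draw, loss


if __name__ == "__main__":
    # Hypothetical goal expectancies, for illustration only.
    print(match_probabilities(1.5, 1.1))
```

The double loop is just the "manual" calculation automated: every scoreline's probability is the product of the two individual Poisson probabilities, and each one is added to the win, draw or loss bucket.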

voros
16 Jun 2006, 11:26 PM
John, I've found that the following formula is pretty close (as close as I can get so far):

Win% = (wins + (draws*0.5))/ games
Gf = Goals for
Ga = Goals against
C = 1st constant = .307158 (I have no idea how many decimals are needed to get it accurate)
K = 2nd constant = .432928

Poisson win% (approximately) = (Gf + C)^E / ((Gf + C)^E + (Ga + C)^E), where E = (Gf + Ga)^K

Break it out by individual parens and it will be easier to see what's going on.

The formula takes into account not just a ratio of goals scored over total goals raised to an exponent, but also allows the exponent to float based on scoring environment.

Sample scores for each compared to one another:

Score   Poiss   Math
Ties    50.0%   50.0%
1-0     81.6%   81.0%
2-0     93.2%   93.8%
2-1     71.2%   71.4%
3-0     97.5%   97.9%
3-1     84.1%   84.4%
3-2     66.9%   67.3%

And so forth

Now one possible mistake I made is I best fit those constants counting each possible scoreline as a single data point. I compiled a total of around 50 different scorelines that occurred in international matches. You could argue that I should count 1-0 as many more datapoints than say 7-0 since 1-0 is a far more common scoreline. The end result of doing it that way would likely make the predictions for normal scores a bit more accurate and for the more lopsided scores less accurate.

The good thing is I'd only have to change the constants, the formula itself should remain the same. Also the amount of accuracy gained would be minor I think.
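
For anyone wanting to check the approximation against the exact figure, here is a rough sketch. It assumes the "Poiss" column is wins plus half of draws for two independent Poisson variables whose means are the goals for and against, which reproduces the 1-0 and 2-1 rows. The function names and the 40-goal cut-off are my own; the constants are the ones given above.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k goals for a Poisson with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def poisson_win_pct(gf, ga, max_goals=40):
    """Exact win% (wins + half of draws) for independent Poisson scores."""
    total = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = poisson_pmf(x, gf) * poisson_pmf(y, ga)
            if x > y:
                total += p
            elif x == y:
                total += 0.5 * p
    return total


def approx_win_pct(gf, ga, c=0.307158, k=0.432928):
    """Closed-form approximation with the floating exponent (Gf + Ga)^K."""
    e = (gf + ga) ** k
    return (gf + c) ** e / ((gf + c) ** e + (ga + c) ** e)


if __name__ == "__main__":
    for gf, ga in [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]:
        print(f"{gf}-{ga}  {poisson_win_pct(gf, ga):.1%}  {approx_win_pct(gf, ga):.1%}")
```

Passing different values for `c` and `k` lets you try other fitted constants without touching the formula itself.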

voros
17 Jun 2006, 12:37 AM
John, a quick update. Weighting the best-fit test to count 1-0 scores far more heavily than less common scores like 10-1, the two constants are:

C = 1st Constant = .297464
K = 2nd Constant = .41427

The above sample would now look like:

Score   Poiss   Math
Ties    50.0%   50.0%
1-0     81.6%   81.4%
2-0     93.2%   93.8%
2-1     71.2%   71.1%
3-0     97.5%   97.8%
3-1     84.1%   84.2%
3-2     66.9%   66.9%

The downside is that as the scores become more and more lopsided, the formula increasingly underrates the winning team (though the difference between 99.94% and 99.77% might not be that big a deal for use in most systems).

CaptainJack
02 Aug 2006, 04:36 PM
If anyone wants to compare their average goal expectancies with those compiled by the professionals they should check the trades posted by the British spread firms.

The nature of their markets compels them to reveal how many goals they expect teams to score against each other. All major leagues are priced up.

An example from last week: they quoted Chelsea as 1.3 goals superior to Middlesbrough. They quoted total goals for the game as 2.8.
From those figures it's a few easy steps to see that they expected Chelsea to average 2.05 goals & M'bro 0.75 goals. They were miles out as it turned out.
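
The "few easy steps" amount to solving two equations: favourite minus underdog equals the supremacy, and favourite plus underdog equals the total. A tiny sketch (the function name is made up for illustration):

```python
def split_expectancy(supremacy, total_goals):
    """Turn a supremacy quote and a total-goals quote into two team expectancies."""
    favourite = (total_goals + supremacy) / 2
    underdog = (total_goals - supremacy) / 2
    return favourite, underdog


if __name__ == "__main__":
    # Chelsea v Middlesbrough quotes from the post: supremacy 1.3, total 2.8.
    print(split_expectancy(1.3, 2.8))
```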

If the figures you'd have posted are around these quotes then your method is probably sound.

Aside from a slight dependency in low-scoring games, the power of the poisson depends almost entirely on the accuracy of your average.

A decent method is to treat your goals scored/conceded as a weighted time series and apply an exponential smoothing constant to the figures. Armed with a decent average, as you point out, you can predict everything from correct scores, match outcomes, time of first goal for the match or either team, most likely team to score first, etc.
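
One simple way to read "exponential smoothing" here is sketched below. The smoothing constant of 0.1, the choice to seed the estimate with the oldest game, and the sample goal sequence are all assumptions for illustration; the post does not say which constant or starting value was used.

```python
def smoothed_average(goals, alpha=0.1):
    """Simple exponential smoothing of a goals series, oldest game first.

    alpha is a hypothetical smoothing constant: higher values make the
    estimate react faster to recent results.
    """
    estimate = float(goals[0])
    for g in goals[1:]:
        estimate = alpha * g + (1 - alpha) * estimate
    return estimate


if __name__ == "__main__":
    # Made-up sequence of goals scored, oldest first.
    print(smoothed_average([1, 0, 2, 3, 1, 0, 2], alpha=0.1))
```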

One novel use that may interest is to use the poisson to predict the probability of a result at any point in the game. Say for example you lead 1-0 after 50 mins.
Provided you are still happy that your original assessment of a team's goal expectancy is still valid, you can calculate the goal expectancy for the remainder of the game from A x (B^C), where A is the original expectancy, B is the proportion of the game remaining & C is a constant, usually around 0.84.

Plug this expectancy into a poisson & remember to add on the actual score at the time, and you can come up with "in running" probabilities for either A hanging on, B making a comeback or the game ending level.
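
A rough sketch of that in-running calculation follows. It assumes independent Poisson goals for the rest of the game, a 90-minute match and C = 0.84. The function names and the pre-match expectancies in the example are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k goals for a Poisson with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def remaining_expectancy(initial, minutes_played, c=0.84, match_length=90):
    """A x (B^C): initial expectancy scaled by the proportion of game left."""
    proportion_left = (match_length - minutes_played) / match_length
    return initial * proportion_left ** c


def in_running(score_a, score_b, ge_a, ge_b, minutes_played, max_goals=15):
    """Return (A wins, draw, B wins) given the current score and time."""
    rest_a = remaining_expectancy(ge_a, minutes_played)
    rest_b = remaining_expectancy(ge_b, minutes_played)
    a_win = draw = b_win = 0.0
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            p = poisson_pmf(i, rest_a) * poisson_pmf(j, rest_b)
            final_a, final_b = score_a + i, score_b + j  # add on actual score
            if final_a > final_b:
                a_win += p
            elif final_a == final_b:
                draw += p
            else:
                b_win += p
    return a_win, draw, b_win


if __name__ == "__main__":
    # Team A leads 1-0 after 50 minutes; pre-match expectancies are made up.
    print(in_running(1, 0, 1.5, 1.0, 50))
```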

T1.

How do you calculate which team will score first, and in which 15-minute interval the goal will be scored, if you have a decent average of how many goals the home team and away team will score?

I have tried some methods on my own and compared the probabilities to what the sportsbooks have, but I am miles away from their valuation.

Could someone please give me some help here?

tachyon1
03 Aug 2006, 12:33 PM
Hi CJ,

the negative binomial distribution's a good place to start. You can run it in Excel or OpenOffice.
Essentially you're looking at the probability of x number of failures before you get a success.

Say a team has a goal expectancy of 1.2 goals/game. Divide that by 92 to get the probability of scoring in any minute of the game. You get 0.0130434.

So the probability of scoring their first goal in the first 60 seconds is 0.0130434.

The prob of scoring their first goal in the second 60 seconds is
(1-0.0130434)*(0.0130434), in other words the prob of NOT scoring in the first minute multiplied by the prob of scoring in the second minute.

and so on and so on.

Excel's quicker. The formula's something like =NEGBINOMDIST(0;1;0.0130434) for the first minute, =NEGBINOMDIST(1;1;0.0130434) for the second, =NEGBINOMDIST(2;1;0.0130434) for the third, etc. (the first argument is the number of scoreless minutes before the goal).

You want the cumulative probability, not each individual minute's probability, so the last step is to total them up.

For this example the cumulative total crosses 50% between 52 and 53 minutes.

Time   Individual   Cumulative
52     0.0066772    0.494759
53     0.0065901    0.501349

So at that point you're more likely to have scored than not and that's the time you take as your most likely first goal.
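
The same bare-bones calculation can be sketched outside a spreadsheet. This assumes, as above, a constant per-minute scoring probability of goal expectancy / 92; the function names are illustrative only.

```python
def first_goal_table(goal_expectancy, minutes=92):
    """Rows of (minute, individual prob, cumulative prob) for the first goal.

    Uses a constant per-minute scoring probability, so the first-goal
    minute is geometric (a negative binomial with one success).
    """
    p = goal_expectancy / minutes
    rows = []
    cumulative = 0.0
    for minute in range(1, minutes + 1):
        # minute - 1 scoreless minutes, then a goal in this minute
        individual = (1 - p) ** (minute - 1) * p
        cumulative += individual
        rows.append((minute, individual, cumulative))
    return rows


def first_goal_minute(goal_expectancy, minutes=92):
    """First minute at which the cumulative probability reaches 50%.

    Returns None when the team is more likely than not to fail to score.
    """
    for minute, _, cumulative in first_goal_table(goal_expectancy, minutes):
        if cumulative >= 0.5:
            return minute
    return None


if __name__ == "__main__":
    for row in first_goal_table(1.2)[50:53]:
        print(row)
    print(first_goal_minute(1.2))
```

For low goal expectancies the function returns None, which is the "1st goal times in excess of 90" problem in code form.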

That's the bare bones, but in reality it needs tweaking.

Low goal expectancies quickly give 1st goal times in excess of 90, plus your chance of scoring a goal isn't constant. Scoring increases as the game progresses. So in this model, in high goal-scoring environments the first goal comes too quickly; in low ones, too late.

If you address these problems you should get something like this.

Goal expectancy   1st goal time
0.5               72
0.7               67
0.8               64
1.0               60
1.2               56
1.5               50
1.7               47
2.0               43
2.2               41
2.5               36
2.8               33
3.0               32
3.5               29

Here's a comparison I did between actual 1st goal times and those predicted from end-of-season goal averages for the EPL, 2004/2005 season IIRC.

Actual average first goal time in minutes is followed after the slash by the predicted first goal time using the above model.

           Home                 Away
Team       Scored   Conceded    Scored   Conceded
Chelsea    46/46    77/85       47/44    68/71
Arsenal    30/33    63/60       55/46    71/62
Manu       46/49    68/68       54/53    65/66
Lpool      50/49    61/65       61/59    46/51
Everton    58/55    58/65       61/59    50/49
Bolton     52/54    56/60       46/53    54/55
Mbro       53/51    59/60       57/56    54/52
ManC       47/55    63/65       51/55    49/53
Totten     51/45    55/57       70/70    59/60
AVilla     53/53    57/61       54/60    35/43
Charl      40/50    53/50       68/67    62/50
Birming    53/55    69/66       60/63    46/48
Fulham     56/50    56/53       63/57    42/45
Newc       52/54    54/54       53/56    51/48
Blackb     52/58    54/57       70/70    64/58
Pmouth     44/48    39/53       58/67    45/46
WBA        65/63    60/56       57/60    38/43

One thing to note: Chelsea's actual (77) and predicted (85) isn't a great match. That's because Chelsea conceded only 6 home goals (only 5 were 1st goals because Bolton scored twice), therefore the time of the first goal conceded in their many goalless games is taken as 90. Which is obviously artificial.

T.

tachyon1
05 Aug 2006, 05:21 AM
Slightly more time so here's a few more little pointers.

The major problem when teams don't have a high goal expectation is the goalless games. A negbino approach gives 1st goal times in excess of 90 minutes.

Here's a workaround.

Say a team only scores 0.5 goals per game. Stick that in a poisson and you've got around a 60% chance of that team scoring zero goals. Over a 38-game EPL season that's about 23 games. If that's the case, those 23 games will in the end-of-season stats have a 1st goal time for that team of 90 mins (or 93, depending on how you're going to treat injury time).

Assume next that the 19 goals they did score were scored in the remaining 15 games. That's 1.266 goals/game. Stick that figure into a negbino, further assuming that goals are equally likely in any minute (again untrue), and you get the time for the first goal at about 50 minutes.

Therefore in total you have 23 games with first goal time at 90 and 15 games with first goal time at 50.

Average those and you get an average 1st goal time for a team with a goal expectancy of 0.5 of 74 minutes. That's about a minute later than teams of that profile actually score their 1st goal.
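
The workaround can be sketched as follows. This assumes a 38-game season, 90 minutes recorded for blank games, a constant per-minute probability of (goals per scoring game) / 92, and the 50% cumulative crossing point as the first-goal time. The function names are hypothetical.

```python
import math


def first_goal_minute(rate_per_game, minutes=92):
    """Minute at which the geometric cumulative probability reaches 50%."""
    p = rate_per_game / minutes
    return math.ceil(math.log(0.5) / math.log(1 - p))


def average_first_goal_time(goal_expectancy, games=38, blank_time=90):
    """Blend goalless games (recorded as blank_time) with scoring games."""
    # Expected number of games with zero goals, from the Poisson P(0).
    blank_games = round(games * math.exp(-goal_expectancy))
    scoring_games = games - blank_games  # sketch assumes at least one
    # Assume all the season's goals fall in the scoring games.
    rate_when_scoring = goal_expectancy * games / scoring_games
    minute = first_goal_minute(rate_when_scoring)
    return (blank_games * blank_time + scoring_games * minute) / games


if __name__ == "__main__":
    print(average_first_goal_time(0.5))
```

For a goal expectancy of 0.5 this gives 23 blank games at 90 minutes and 15 scoring games at 50 minutes, averaging out at about 74.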

The next fudge is at the other extreme.

Say teams score 2 goals/game. Over an EPL season goalless games now account for only 5 games, so that isn't the area where the major errors will occur.

Now you need to concentrate on more accurately defining a team's goal expectancy for every one-minute section of a game.

To do this you need to know that the goal expectancy for the remainder of a game is the initial goal expectancy multiplied by the (proportion of the game remaining) raised to 0.83.

So to work out the goal expectancy for the first minute, work out the goal expectancy for the last 89 minutes (ignoring injury time) and subtract this from the initial expectancy.

For an initial GE of 1.0 it'll be 0.00923 of a goal for the 1st minute.

Then work out the GE for just the second minute.
(Work out the GE for the last 88 minutes, add the GE you already calculated for the 1st minute and subtract the sum from the initial whole-game GE.)

And the 3rd and so on.

You can't now just stick all these separate, but more realistic, individual GEs into a negbino because that demands a constant probability of success or failure. But you are on the road to improving your model for an elevated goal environment.
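
The minute-by-minute GEs can be generated directly from the remaining-expectancy rule, as in the sketch below. The second function is one possible next step and goes beyond what is described above: it treats each minute's GE as that minute's scoring probability and multiplies through the no-goal probabilities. That is an approximation, and the function names are illustrative.

```python
def per_minute_expectancy(initial_ge, minutes=90, exponent=0.83):
    """Goal expectancy for each one-minute section of the game.

    The GE remaining after m minutes is
    initial_ge * ((minutes - m) / minutes) ** exponent,
    so each minute's GE is the drop in remaining GE across that minute.
    """
    def remaining(elapsed):
        return initial_ge * ((minutes - elapsed) / minutes) ** exponent

    return [remaining(m - 1) - remaining(m) for m in range(1, minutes + 1)]


def first_goal_distribution(per_minute):
    """P(first goal in each minute) with a non-constant scoring chance.

    Assumption: each minute's GE is used as that minute's probability
    of scoring. Returns (per-minute probabilities, P(no goal at all)).
    """
    no_goal_so_far = 1.0
    distribution = []
    for ge in per_minute:
        distribution.append(no_goal_so_far * ge)
        no_goal_so_far *= 1 - ge
    return distribution, no_goal_so_far


if __name__ == "__main__":
    ge = per_minute_expectancy(1.0)
    print(round(ge[0], 5))  # GE for the 1st minute
    dist, blank = first_goal_distribution(ge)
    print(round(blank, 3))  # chance of not scoring at all
```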

T

arrplayr
14 Sep 2006, 03:05 PM
Hi,
I am working on an application for a bookmaker and I need guidance on two things.
1. How can I calculate (set) the odds for Halftime/Fulltime (1-1, 1-X, 1-2...) lines knowing only the Home/Draw/Away odds?
2. The same question for the Halftime result.
I have managed to find the formula for Double Chance lines, but I am not able to find the formula for the above two lines.
Here is a quick example of one bookie's odds (perhaps you can help me with these exact odds):
1 = 3.35
X = 3.10
2 = 2.00

Halftime: 1 = 4.00; X = 1.93; 2 = 2.60
HalfTime/Fulltime: 1-1 = 6.15; 1-X = 13.0; 1-2 = 30; X-1 = 7.25; X-X = 4.85; X-2 = 4.75; 2-1 = 35.00; 2-X = 13.00; and 2-2 = 3.35.

Even if there is no formula for this, I would very much appreciate it if you could just point me in the right direction. THANKS.