Question about use of Poisson distribution
tachyon1
26 May 2004, 05:51 AM
Hi Nosix,
I can run through the basic routes I've taken to arrive at my ideas, but recording r figures, significance tests, etc. weren't exactly top of my agenda when I started x number of years ago.
Most teams don't vary in their ability very quickly, & when they do there's usually a transparent reason (a huge input of roubles (Chelsea) or getting themselves put into administration (Leeds)).
Taking EPL teams over a time span of 100+ games, about 2.5 seasons, the average change in ability of teams is not much more than 0.2 of a goal.
If you just want to look at goals scored/conceded, you don't want a recent result saw-toothing the average too much, & you do want enough results present that a team's opponents are reasonably balanced. Given that, I've used results that go back a few seasons & have used a very small smoothing constant.
The method has been evaluated by running least-squares regression on predicted versus actual margins of victory &/or by generating home/away/draw odds & comparing those with the actual percentage occurrences, again by regression.
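A minimal sketch of the first evaluation step, assuming you already have paired lists of predicted and actual margins (the function name and sample numbers are hypothetical): an ordinary least-squares fit of actual on predicted. A slope near 1 and an intercept near 0 suggest the predictions are well calibrated, and r shows how much of the spread they capture.

```python
import math

def evaluate_margins(predicted, actual):
    """Least-squares fit of actual margins on predicted margins.

    Returns (slope, intercept, r). A well-calibrated model should give
    slope close to 1 and intercept close to 0.
    """
    n = len(predicted)
    mx = sum(predicted) / n
    my = sum(actual) / n
    sxx = sum((x - mx) ** 2 for x in predicted)
    syy = sum((y - my) ** 2 for y in actual)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, actual))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r

# Hypothetical predicted vs actual (home minus away) margins.
print(evaluate_margins([0.4, -0.2, 1.1, 0.0, 0.7], [1, -1, 2, 0, 0]))
```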
You can also incorporate trend &/or control-type limits to try to spot whether a team has greatly improved or otherwise and you've missed it.
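One possible form of a control-type limit, as a sketch only: a Shewhart-style check on a team's residuals (actual minus predicted goals). The window of 6 games and the 3-standard-error limit are hypothetical choices, and the check assumes the model's residuals should average zero.

```python
import math
import statistics

def looks_out_of_control(residuals, window=6, k=3.0):
    """Flag a team whose recent residuals (actual - predicted) drift from zero.

    The older residuals estimate the game-to-game spread; the team is flagged
    if the mean of the last `window` residuals lies more than k standard
    errors from zero. Needs at least 2 residuals before the window.
    """
    history, recent = residuals[:-window], residuals[-window:]
    sd = statistics.stdev(history)
    limit = k * sd / math.sqrt(window)
    return abs(statistics.mean(recent)) > limit

# Hypothetical residuals: steady for 20 games, then 6 games well above prediction.
print(looks_out_of_control([1, -1] * 10 + [2] * 6))
```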
Getting a decent average is only half the problem. Your figure probably only gives you the likely number of goals that your team will score against an average team (not including themselves) from that league.
If they're playing either the very worst or the very best from the league, you need to take that into account, probably by looking at scoring rates... & of course you need to allow for home field.
It's fair to say that not all divisions are alike. Money talks louder in the English lower divisions. Fulham & to a lesser degree Portsmouth (improved by over half a goal) made huge advances in ability on relatively modest investment by Premiership standards.
Hi Numerista,
I've just played around with LS (least squares) for soccer. Like the sqrt idea, I'm currently capping at a supremacy of around 3 goals & toying with the idea of extra credit for teams who win narrowly.
Weighting's also a likely improver.
I'm getting some nice results on relatively few games in terms of ranking, not quite so good on rating. For example, for the EPL an LS approach based on half a dozen games for each side is giving a difference from the best in the league to the worst of around 3 goals, when a more typical and believable value would be something in the region of 1.6 goals.
To get margins that can be slotted into a Poisson, you seem to need to be well into double figures in games played.
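A sketch of one way to compute least-squares ratings with the 3-goal cap described above: model each capped margin as home rating minus away rating plus a home-advantage term, then repeatedly set each team's rating to the average margin it implies (a Gauss-Seidel style iteration on the least-squares equations). The `hga` value, iteration count and tuple layout are assumptions, and the sqrt compression and weighting mentioned are left out.

```python
def ls_ratings(games, cap=3.0, hga=0.0, iters=200):
    """Least-squares team ratings from (home, away, home_goals, away_goals) tuples.

    Model: capped margin ~= rating[home] - rating[away] + hga.
    Ratings are re-centred to average zero after every sweep.
    """
    teams = sorted({team for h, a, _, _ in games for team in (h, a)})
    rating = {t: 0.0 for t in teams}
    for _ in range(iters):
        for t in teams:
            implied = []
            for h, a, hg, ag in games:
                margin = max(-cap, min(cap, hg - ag))  # cap blowouts at +/- cap goals
                if h == t:
                    implied.append(margin - hga + rating[a])
                elif a == t:
                    implied.append(rating[h] + hga - margin)
            if implied:
                rating[t] = sum(implied) / len(implied)
        mean = sum(rating.values()) / len(rating)
        rating = {t: r - mean for t, r in rating.items()}
    return rating

# Hypothetical results; A's 5-0 win over C counts as a 3-goal win.
games = [("A", "B", 2, 0), ("B", "C", 1, 0), ("A", "C", 5, 0)]
print(ls_ratings(games))
```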
Hi Zeek,
a Poisson is a distribution that represents the number of events occurring randomly in a fixed time at a constant average rate.
So if you know the average number of goals that a team scores in a game, you can assign probabilities to that team scoring 0, 1, 2, 3, etc. goals in a game.
Do the same for their opponents & in theory you can find the probability of a game ending 1-1 (the probability of team A scoring 1 goal multiplied by the probability of team B scoring 1 goal, which treats the two teams' goal counts as independent) or any other score.
Hence you can add up all the individual score probabilities where the game ends level (0-0, 1-1, 2-2, 3-3, 4-4; the rest are probably too unlikely to bother with) & derive odds for the game to finish level.
You can also do the same for scores where team A or team B wins.
T
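A short sketch of the calculation described above, assuming independent Poisson goal counts for the two sides. It sums the score grid up to 10 goals each rather than stopping at 4-4, which costs nothing in code; the function names and the expectancies in the usage lines are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def match_probabilities(lam_a, lam_b, max_goals=10):
    """(A win, draw, B win) probabilities from two independent Poisson scores."""
    win_a = draw = win_b = 0.0
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            p = poisson_pmf(i, lam_a) * poisson_pmf(j, lam_b)  # P(score i-j)
            if i > j:
                win_a += p
            elif i == j:
                draw += p
            else:
                win_b += p
    return win_a, draw, win_b

a_win, draw, b_win = match_probabilities(1.5, 1.1)
print(f"A win {a_win:.3f}  draw {draw:.3f}  B win {b_win:.3f}")
print(f"fair decimal odds for the draw: {1 / draw:.2f}")
```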
numerista
26 May 2004, 12:39 PM
To get margins that can be slotted into a Poisson, you seem to need to be well into double figures in games played.
Presumably, with a smaller number of games, you'd do well to bias all of your estimates towards the league average. Of course, regularized Poisson regression starts to make the coding into real work.
Incidentally, do you have data in a handy format to share? In such a case, the odds of me coding things up would increase dramatically.
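With few games, one simple way to bias estimates towards the league average is credibility-style shrinkage: treat the league average as if it were worth a fixed number of extra games. This is a stand-in for, not an implementation of, regularized Poisson regression; the function name and `prior_games=10` are hypothetical choices.

```python
def shrunk_rate(team_goals, league_avg, prior_games=10):
    """Pull a team's goals-per-game rate towards the league average.

    Acts as if the team had also played `prior_games` extra games at exactly
    the league-average rate, so with few real games the league average
    dominates and with many games the team's own record does.
    """
    n = len(team_goals)
    return (sum(team_goals) + prior_games * league_avg) / (n + prior_games)

# Hypothetical: six games averaging 2 goals, league average 1.44333.
print(round(shrunk_rate([2, 3, 1, 2, 2, 2], 1.44333), 3))
```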
NoSix
27 May 2004, 12:04 PM
Given that, I've used results that go back a few seasons & have used a very small smoothing constant.
How small is "very small"? How about giving us a typical number?
microbrew
27 May 2004, 01:46 PM
Hey, what is this "Poisson" thing everyone keeps mentioning? Is it some kind of program I can download to use for predictions, or is it a mathematical equation, or what?
Read post 13 and later of this thread (http://www.bigsoccer.com/forum/showpost.php?p=1979779&postcount=13). It's a few posts describing Poisson processes in layman's terms.
tachyon1,
I'm impressed. There's an academic paper hidden in this thread, undergoing peer review.
Getting a decent average is only half the problem. Your figure probably only gives you the likely number of goals that your team will score against an average team (not including themselves) from that league.
If they're playing either the very worst or the very best from the league, you need to take that into account, probably by looking at scoring rates... & of course you need to allow for home field.
Voros suggested Bayes' theorem as a possible answer to this issue in this post:
http://www.bigsoccer.com/forum/showpost.php?p=1986634&postcount=28
tachyon1
27 May 2004, 02:16 PM
How small is "very small"? How about giving us a typical number?
Hi N6,
sorry mate, I thought I had posted a figure, & you're correct, I haven't.
So between 1/20 & 1/25.
T
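A minimal sketch of the kind of exponential smoothing being discussed, assuming one period per game (as tachyon1 explains later in the thread) and a smoothing constant of 1/20. The function name and the starting rate of 1.44 are hypothetical.

```python
def smoothed_rate(goals_per_game, alpha=1/20, start=1.44):
    """Exponentially smoothed goals-per-game rate.

    Each new game moves the estimate a fraction `alpha` of the way towards
    the latest result, so a game k matches ago keeps a relative weight of
    (1 - alpha) ** k.
    """
    rate = start
    for goals in goals_per_game:  # oldest game first
        rate += alpha * (goals - rate)
    return rate

# Hypothetical goals scored in the last five games, oldest first.
print(round(smoothed_rate([2, 0, 1, 3, 1]), 3))
```

With a constant this small, a single freak result moves the average by only a twentieth of the surprise, which is the point about not wanting a recent result to saw-tooth the average.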
tachyon1
28 May 2004, 04:23 AM
Hi all,
just quickly checked out the Voros link.
Here's another way, or it may be the same way in a different guise.
I'll use the same figures so we can compare results.
SJ score 1.5 g/g
Dallas concede 2.133 g/g
An imaginary match-up between two average teams at a neutral venue should, on average & long-term, end in a draw with each side scoring the league average of 1.44333 g/g.
Divide 1.5/1.44333=1.03926 or in other words SJ score at 1.03926 times the league's average rate.
Divide 2.133/1.44333=1.4778 or Dallas concede at 1.4778 times the league average rate.
Multiply the two answers;
1.03926x1.4778=1.5358
So in a game on neutral ground SJ should score at 1.5358 times the league average against Dallas.
The league average is 1.44333 g/g, so:
1.44333 x 1.5358 = 2.217 goals scored by SJ in a game against Dallas.
V's method came to 2.214, IIRC.
Intuitively the answer looks OK, & dimensionally it's OK as well.
It's worth bearing in mind that this g/g figure only applies if the two teams meet on neutral turf or if the league has no home-field advantage, which is unlikely.
To address this, if we go back to the start, we need the additional information about goals scored by the home sides & goals scored by the away sides.
Let's say home advantage is a fairly typical 0.4 of a goal, & the home teams score 1.6433 g/g & the away sides 1.2433 g/g (still averaging 1.4433 overall).
Now, in addition to the two rates we used previously, we need to include one for the home side.
Say SJ are at home.
Divide 1.6433/1.44333 = 1.1386, so home teams score at 1.1386 times the average rate.
Include this in the calculation (2.217 x 1.1386 = 2.524) & now SJ can expect to score, on average & long-term, 2.524 g/g at home to Dallas.
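The ratio method above written as a small function, as a sketch (the function and parameter names are hypothetical). Passing the home sides' average (1.6433) as `venue_avg` reproduces the home figure; for the away side you would pass the away average (1.2433) instead.

```python
def expected_goals(attack_rate, opp_concede_rate, league_avg, venue_avg=None):
    """Multiplicative goal expectancy using the ratio method described above.

    attack_rate:      the team's goals scored per game
    opp_concede_rate: the opponent's goals conceded per game
    league_avg:       league-average goals per team per game
    venue_avg:        goals per game for the team's venue (home or away);
                      None means a neutral ground
    """
    attack = attack_rate / league_avg        # e.g. 1.5 / 1.44333 = 1.03926
    defence = opp_concede_rate / league_avg  # e.g. 2.133 / 1.44333 = 1.4778
    venue = 1.0 if venue_avg is None else venue_avg / league_avg
    return attack * defence * venue * league_avg

print(round(expected_goals(1.5, 2.133, 1.44333), 3))          # neutral: 2.217
print(round(expected_goals(1.5, 2.133, 1.44333, 1.6433), 3))  # SJ at home: 2.524
```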
Another approach is to estimate a supremacy for the match-up (least squares using goal difference??).
There is usually a fair relationship between supremacy & total goals: as the former increases, so does the latter, albeit slowly.
Evaluate the supremacy, calculate the expected total goals that goes with that supremacy, & then you've got each team's individual goal expectancy.
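The last step of the supremacy approach is just two equations: home minus away equals the supremacy, and home plus away equals the total. A sketch, where the linear supremacy-to-total relationship (`base_total=2.5`, `slope=0.2`) is a hypothetical placeholder you would fit from your own league data:

```python
def split_supremacy(supremacy, base_total=2.5, slope=0.2):
    """Turn a supremacy (expected home-minus-away margin) into two goal expectancies.

    base_total and slope are hypothetical placeholders for the
    supremacy/total-goals relationship; total goals rise slowly with |supremacy|.
    """
    total = base_total + slope * abs(supremacy)
    home = (total + supremacy) / 2
    away = (total - supremacy) / 2
    return home, away

print(split_supremacy(0.5))
```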
I prefer the first approach.
T
numerista
28 May 2004, 09:39 AM
Here's another way, or it may be the same way in a different guise.
Pretty sure you're describing the same procedure. Microbrew ... I'm pretty sure this has been done in the literature, in Chance Magazine about eight years ago.
NoSix
28 May 2004, 08:54 PM
Hi N6,
sorry mate, I thought I had posted a figure, & you're correct, I haven't.
So between 1/20 & 1/25.
T
What units of time are you using? I would have thought days, but apparently not...
AussieVamp2
29 May 2004, 02:09 AM
Hi AV,
I've stuck up an example from a recent Celtic/Rangers game in another thread, but here's the basic maths if anyone wants to try it themselves.
T.
Thanks tach (sorry for delay, been moving house!) :)
AussieVamp2
29 May 2004, 02:19 AM
Hi Nosix,
Hi Numerista,
I've just played around with LS (least squares) for soccer. Like the sqrt idea, I'm currently capping at a supremacy of around 3 goals & toying with the idea of extra credit for teams who win narrowly.
T
Right, you have to reduce these when looking at actual matches, whereas the actual ratings can have a bigger spread (before HGA, i.e. home-ground advantage).
AussieVamp2
29 May 2004, 02:22 AM
For least squares, that should probably get closer by the end of the season.
For example, for Germany this year I have something roughly like this at the end of the season:
WERDER BREMEN 0.154
LEVERKUSEN -1.012
WOLFSBURG -1.036
DORTMUND -1.147
BOCHUM -1.24
BAYERN MUNICH -1.379
FREIBURG -1.381
SCHALKE 04 -1.397
HANNOVER -1.435
HANSA ROSTOCK -1.436
STUTTGART -1.502
FC KOLN -1.63
HERTHA BERLIN -1.707
1860 MUNCHEN -1.716
HAMBURGER SV -1.747
MOENCHENGLADBACH -1.751
FRANKFURT -1.807
KAISERSLAUTERN -1.934
AussieVamp2
29 May 2004, 02:42 AM
There's 60 minutes left, or 0.667 (two-thirds) of the game.
0.667^0.84 = 0.711, so both teams have 0.711 of their original goal expectancy left, which comes to 1.07 of a goal for team A & 0.78 for B.
Good stuff. Is the 0.84 exponent in this example one you came up with from your own work?
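The in-play adjustment quoted above, as a sketch: scale each side's full-match expectancy by the fraction of time remaining raised to the 0.84 power. The full-match expectancies of 1.5 and 1.1 are hypothetical, chosen so the outputs roughly match the quoted 1.07 and 0.78; the function name is also hypothetical.

```python
def remaining_expectancy(full_match_expectancy, minutes_left,
                         exponent=0.84, match_minutes=90):
    """Scale a full-match goal expectancy down to the time remaining.

    Uses (fraction of match left) ** exponent, with the 0.84 exponent
    quoted in the thread.
    """
    fraction_left = minutes_left / match_minutes
    return full_match_expectancy * fraction_left ** exponent

print(round(remaining_expectancy(1.5, 60), 2), round(remaining_expectancy(1.1, 60), 2))  # 1.07 0.78
```

Because the exponent is below 1, two-thirds of the match leaves more than two-thirds of the goals (0.711 rather than 0.667), which is consistent with scoring rates picking up later in games.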
AussieVamp2
29 May 2004, 02:45 AM
I've just played around with LS (least squares) for soccer. Like the sqrt idea, I'm currently capping at a supremacy of around 3 goals & toying with the idea of extra credit for teams who win narrowly.
You could try capping based on the standard deviations involved too...
tachyon1
29 May 2004, 06:12 AM
Hi AV2
neat feedback.
I've been meaning to get some use out of SDs for ages & have never gotten around to it.
Interesting figures on the Bundesliga. Here are some LS figures from the EPL: the first set covers only the immediately preceding 5 games (to illustrate how unrepresentatively large the differences can get on a small sample), whilst the second set covers the season from the start to mid-February.
Team        5 games  Season to Feb
Arsenal        2.94           2.32
Chelsea        2.75           2.12
AVilla         2.21           1.17
B'ham          2.02           1.01
Newcastle      1.92           1.39
Spurs          1.82           0.92
Bolton         1.45           0.89
B'burn         1.38           0.94
Lpool          1.30           1.47
M'bro          1.17           0.94
Fulham         1.08           1.15
ManC           1.00           0.99
Charlton       0.99           1.23
P'mouth        0.96           0.67
ManU           0.75           2.02
Soton          0.72           1.10
Leeds          0.35           0.00
Wolves         0.21           0.01
Everton        0.10           0.87
Leicester      0.00           0.58
I've tentatively thought about using standard scores to compare ratings where the spread & size of ratings varies.
Using this method, Arsenal's 2.94 five-game rating is equivalent to a rating of 2.30 on the "season to Feb" scale, which, in this case anyway, compares well with their actual Feb rating of 2.32 & indicates that Arsenal's five-match form was fairly indicative of their probable season-long form.
In short, comparing mid-season LS ratings with average season-long ratings for various completed years in that division via standard scores (which incorporate SDs) might work :-).
T.
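The standard-score mapping can be checked against the two columns of the table above: convert a rating to a z-score on its own scale, then read off the value with the same z-score on the target scale. Using population standard deviations is an assumption; sample SDs give the same mapping here because both columns list the same number of teams. The function name is hypothetical.

```python
import statistics

five_game = [2.94, 2.75, 2.21, 2.02, 1.92, 1.82, 1.45, 1.38, 1.30, 1.17,
             1.08, 1.00, 0.99, 0.96, 0.75, 0.72, 0.35, 0.21, 0.10, 0.00]
season_to_feb = [2.32, 2.12, 1.17, 1.01, 1.39, 0.92, 0.89, 0.94, 1.47, 0.94,
                 1.15, 0.99, 1.23, 0.67, 2.02, 1.10, 0.00, 0.01, 0.87, 0.58]

def rescale(x, from_scale, to_scale):
    """Map a rating onto another scale by matching standard scores (z-scores)."""
    z = (x - statistics.mean(from_scale)) / statistics.pstdev(from_scale)
    return statistics.mean(to_scale) + z * statistics.pstdev(to_scale)

# Arsenal's 2.94 five-game rating maps to roughly 2.30 on the season scale.
print(round(rescale(2.94, five_game, season_to_feb), 2))
```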
NoSix
29 May 2004, 01:52 PM
Taking EPL teams over a time span of 100+ games, about 2.5 seasons, the average change in ability of teams is not much more than 0.2 of a goal.
If you just want to look at goals scored/conceded, you don't want a recent result saw-toothing the average too much, & you do want enough results present that a team's opponents are reasonably balanced. Given that, I've used results that go back a few seasons & have used a very small smoothing constant.
Not trying to give you a hard time, but something doesn't add up here. You claim to be using data that go back 2+ seasons, but the weighting constants you give make any result older than 92 days (for 1/20) or 115 days (for 1/25) insignificant (weight < 0.01). What gives?
tachyon1
29 May 2004, 03:06 PM
Hi Nosix,
most EPL teams play a game on a Saturday & then play their next game the following Saturday, so why would I give a game played 7 days earlier (say) a 75% weighting (0.96 compounded daily over 7 days, 0.96^7 ≈ 0.75) rather than 96%?
It's a well-known fact that most EPL footballers spend their time between games mowing their golf-course-sized manorial lawns in deepest Hertfordshire, giving their private secretary Spanish lessons, deliberately missing drugs tests, arranging their bail surety & purchasing Mini Coopers in various colours (yellow with a black roof, please). None of which has very much effect on their ability to play football.
Therefore arranging the periods so that they contained exactly one game would seem sensible.
However, this can cause problems, as games are frequently abandoned due to the legendary behaviour of alcohol-inflamed English football "hooligans" justifiably incensed by the referee wrongly awarding a throw-in to their hated local rivals.
Hence a sensible compromise would seem to be to take one game as a period regardless of the actual timespan.
If you can suggest a superior alternative then I'm all ears:-).
All the best T.
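NoSix's 92 and 115 appear to follow from a continuous decay weight e^(-αt): it falls below 0.01 once t > ln(100)/α. The plain smoothing weight (1 − α)^t gives a slightly shorter horizon. A sketch of both (function names hypothetical):

```python
import math

def continuous_horizon(alpha, threshold=0.01):
    """Periods until exp(-alpha * t) drops below threshold."""
    return math.log(1 / threshold) / alpha

def discrete_horizon(alpha, threshold=0.01):
    """First whole period at which the smoothing weight (1 - alpha) ** t < threshold."""
    return math.ceil(math.log(threshold) / math.log(1 - alpha))

for alpha in (1 / 20, 1 / 25):
    print(alpha, round(continuous_horizon(alpha)), discrete_horizon(alpha))
```

Counted in days, that horizon is about three to four months. With one game per period, as in the reply above, the same 92 to 115 periods become 92 to 115 games, roughly 2.4 to 3 EPL seasons of 38 games, which squares with going back a few seasons.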
NoSix
29 May 2004, 04:25 PM
I see. That would certainly simplify the calculation of the weights. Thanks for answering my questions.
AussieVamp2
31 May 2004, 09:43 PM
If you wanted to, you could use a half-week as a period. That takes care of midweek games and is only messed up by the occasional case where a team had 3 games in a week, at the end of the season or for some other catch-up reason...
...or perhaps the English Christmas period :)
AussieVamp2
01 Jun 2004, 01:40 AM
Answer: not necessarily.
Somebody once defined statistics along the following lines: "the art of drawing correct conclusions from wrong assumptions." Depending on what you're modeling, you can often get away with using crude approximations to reality.
That's the simplest answer. A more complex answer is that this particular phenomenon -- that the observed variance doesn't match what it "should" be -- is very common in applied statistics. It's known as overdispersion, and there are techniques for adjusting for it. If somebody's really interested, they should be able to find a good reference ... IIRC, there are a couple of pages of discussion in McCullagh and Nelder's book on Generalized Linear Models.
A good reference on a technique usable for that (or underdispersion, as the case may be) in in-season soccer models would be the most useful, of course ;-)
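A quick check for the overdispersion mentioned above is the Pearson dispersion statistic used with quasi-Poisson models: the Pearson chi-square divided by the residual degrees of freedom. This is a sketch; `expected` would come from whatever model produced the goal expectancies, `n_params` is the number of fitted parameters, and the function name is hypothetical.

```python
def pearson_dispersion(observed, expected, n_params=1):
    """Pearson chi-square / residual degrees of freedom.

    Roughly 1 if the goal counts really are Poisson; > 1 suggests
    overdispersion, < 1 underdispersion.
    """
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, expected))
    return chi2 / (len(observed) - n_params)

# Hypothetical goals scored in five games against an expectancy of 2 each time.
print(pearson_dispersion([0, 1, 2, 3, 4], [2, 2, 2, 2, 2]))
```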
AussieVamp2
01 Jun 2004, 01:54 AM
Another firm regularly prices up Australian State games (park-standard soccer at best) on the same basis they use to price the EPL, even though the Oz games have getting on for twice as many goals & virtually no home advantage.
T.
-- Yeah, or the same thing for the Scottish lower divisions until recently, etc.
Last season, using an Elo (chess) rating-based soccer rating system & backing any away underdog on the Asian handicap whenever the price on a public betting exchange exceeded your true estimate by 5% yielded 150+ bets & a 12% return on level stakes.
-- Nice. I take it home faves, dogs, and away faves weren't as much joy? Was that just one league?
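A sketch of the two ingredients in that system: the standard chess Elo expectation and update, plus the "price beats my estimate by 5%" betting rule. The K-factor of 20 is a hypothetical choice, soccer variants usually add a home-advantage term that this omits, and the bet rule treats the wager as a simple two-way proposition, ignoring the push and half-win cases of an Asian handicap.

```python
def elo_expected(r_a, r_b):
    """Expected score of A against B under the standard (chess) Elo curve."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a, r_b, score_a, k=20):
    """New ratings after a game; score_a is 1 for a win, 0.5 draw, 0 loss."""
    delta = k * (score_a - elo_expected(r_a, r_b))
    return r_a + delta, r_b - delta

def is_value_bet(true_prob, decimal_price, edge=0.05):
    """Back only if the offered decimal price beats the fair price 1/true_prob by `edge`."""
    return decimal_price >= (1 + edge) / true_prob

print(elo_update(1500, 1500, 1))  # (1510.0, 1490.0)
print(is_value_bet(0.5, 2.2))     # True: fair price 2.0, needs at least 2.1
```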