Question about use of Poisson distribution

Discussion in 'Statistics and Analysis' started by NoSix, Dec 30, 2003.

  1. tachyon1

    tachyon1 Member

    Apr 23, 2004
    Hi guys, thanks for the feedback.

    Firstly, the way books set prices.

    It's dependent upon a wide range of factors, but the most obvious ones are expected liability, actual liability, the type of bets that you expect to predominate (usually accumulators, i.e. parlays, or singles) & of course the true probability of the outcome you're betting on.

    The liability factors are driven by punter preference, & at the moment, anyway, soccer betting in the UK is predominantly multiple betting centred on home teams & better teams.

    Odds on home favs therefore tend to be cramped, although not as much as you would anticipate, because stringing such bets together in parlays multiplies up the bookies' already built-in advantage, or vig.

    You can test the theory that away dogs are more likely to be value by looking at the returns from handicap betting. Here game odds are "levelled up" by giving the outsider, or dog, a start ranging from half a goal up to two or more for huge mismatches.

    At straight-up prices home favs are shorter than they should be, so away dogs are longer. The handicap prices posted have to bear some relationship to the straight-up odds, but again the away dogs are underbet; therefore away dogs on the handicap get a double dose of underbetting & are priced accordingly.

    Any half-decent rating system should pick out enough value dogs to turn a profit. A couple I tried last season gave a return on investment of up to double figures with reasonable numbers of bets.

    Half-time lines are going to see much more singles-only betting, so it should be easier for books to manipulate the prices to get a balanced book.

    The main considerations are going to be any accumulated liabilities and what price they need to post on a team to get the desired amount of action.

    Punters will be put off backing the team that's leading because it will probably be long odds-on, & "money buying" is considered dumb even though in many cases it won't be. Therefore the aim will be to price the draw or the losing team at a largish price that appears generous... but actually isn't.

    Most people are very poor at subjectively putting a figure on the chances of something happening. Tell someone that an event is "unlikely", then ask for their estimate of the probability of that event, & you're likely to get anything from 50% downwards.

    One thing that seems to be largely neglected in setting half-time lines is a team's apparent scoring profile. Even if teams buck the usual 45%/55% first-half/second-half scoring split by a large margin, this doesn't seem to filter through to the posted odds. Probably such scoring trends are transient.

    In short, punters are unfamiliar with methods of pricing half-time lines & there is a time constraint, so if you can deduce which outcome the bookie wants to take money on & he aggressively markets it, then you can find value seeping out on the other side of the line.

    M, I haven't tried any of the software, but I have collaborated in writing a few :). The guy who codes them runs them through a database of historical odds to see if they prove any good & occasionally sticks the results in a betting column he writes for a weekly paper over here.

    The basis for most of the software is goals scored/conceded, who they were scored/conceded against & how recent the games were, & not a lot else. Occasionally I'll use won/lost/drawn records, but a lot of the other data like time of possession, shots on target, strike rates etc., whilst interesting, only muddy the waters & give you a worse fit overall.

    Even splitting stats into home & away records, which you would imagine would vastly improve the fit of a model, does exactly the opposite. You have to deal with these factors in a more general way.

    These "secondary" stats give, imo, at best weak indicators of a team's goal expectancy. Take time of possession. Do good teams have the ball longer? Some do, but some cede possession & soak up pressure when they get ahead. Others defend by keeping possession.

    Out of the 32 Champions League teams this year, the 25th for time of possession (TOP) was Monaco. They've had the ball for around 48% of the time in their matches. So they are very poor on this stat, & the bookies agreed: they were rank outsiders (up to 40/1) at every knockout round.

    Based on goals alone, however, they were the top-ranked team at every knockout stage... & they're now in the final.

    Software based on hard stats doesn't allow for inflated reputations. Monaco were outsiders because they were French (French clubs being perceived as third-rate sides) & not likely to be popular with punters regardless of their actual results on the field.

    N6, the choice of an exponential smoothing constant comes from getting the best fit to historical results.

    More later, it's almost time to see if Tim Howard can keep out the might of Millwall, who have a 1.5-goal start at 0.96/1.

    T.
     
  2. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    Can you provide a reference, or is this your own work?
     
  3. numerista

    numerista New Member

    Mar 21, 2004
    Thanks for all the info ... I think I pretty much understand what you're saying, but I'll give it a second pass later when I'm a bit more awake. The reason I asked is that I'm curious as to what extent it's necessary to produce a "half decent rating system." I'm thinking

    (1) The betting houses have excellent rating systems.
    (2) It's possible to predict how far quoted odds will deviate from the betting house's estimate for the true probability.

    In that case, cooking up your own rating system would only add noise to the system.
     
  4. tachyon1

    tachyon1 Member

    Apr 23, 2004
    Hi N6

    all the stuff I use is my own, & it's either been given a shakedown by me or for me. The problem with using third-party stuff is that it may contain adjustments that (quite rightly) aren't revealed. So if you start messing around with the rules yourself, you're just as likely to be double-counting some factor.

    Most of the books over here on rating sides through goals scored/conceded begin & end with "add the goals scored by A in their last five games to the goals conceded by today's opponents in their last five games & divide by the number of games played"... & this is plainly wrong.

    Two of the most important factors in determining a reasonable value for a team's goal expectancy are:

    the number of games (the more the better)
    & how recent the data is (more recent is better)

    So an exponentially smoothed time series is the obvious choice.
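    A minimal sketch of an exponentially smoothed goals average, assuming the standard recursion estimate <- estimate + alpha x (latest - estimate), applied once per match and seeded with a league average. `smoothed_rate`, the seed and the sample goals are hypothetical, not tachyon1's actual settings. Under this recursion a result k matches old carries weight alpha x (1 - alpha)^k, so recent games count most while a long run of older games still contributes.

```python
def smoothed_rate(goals, alpha, seed):
    """Exponentially smoothed goals per game (hypothetical helper).

    goals: goals in each match, ordered oldest -> newest
    alpha: smoothing constant, 0 < alpha <= 1; small values = long memory
    seed:  starting estimate, e.g. the league average (an assumption)
    """
    estimate = seed
    for g in goals:
        # move the estimate a fraction alpha of the way towards the latest result
        estimate += alpha * (g - estimate)
    return estimate


# hypothetical goals scored by one team, oldest first
history = [1, 0, 2, 3, 1, 1, 2]
print(round(smoothed_rate(history, alpha=1 / 20, seed=1.4), 3))
```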

    N, agree soccer modelling is pretty noisy, mainly as a result of the low number of goals scored.

    However I do prefer to have a repeatable procedure to estimate game odds.

    The betting environment in the UK perhaps isn't as spot-on as you might think, especially when firms take on new types of bets or delve into the nether regions of world football.

    One of the top three bookies made an absolute pig's ear of setting their Asian handicap odds on a recent Euro Championship... they almost lost the company.
    Whether by accident or design, they didn't know how to price up a game where one team gets a 0.75-goal start or a whole-goal start.
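    The 0.75-goal and whole-goal starts above are Asian handicap lines. Under the usual convention a whole-goal line refunds the stake when the adjusted result is level, and a quarter line such as +0.75 is settled as two half-stakes, one on +0.5 and one on +1.0. A minimal settlement sketch, assuming decimal odds and a one-unit stake; `settle_line` and `settle_asian` are hypothetical names.

```python
def settle_line(margin, handicap, odds, stake):
    """Return for one half-goal or whole-goal handicap line.

    margin:   goals scored minus goals conceded by the backed team
    handicap: start given to the backed team (e.g. +1.0, +0.5, -0.5)
    odds:     decimal odds (payout includes the stake)
    """
    adjusted = margin + handicap
    if adjusted > 0:
        return stake * odds   # win
    if adjusted == 0:
        return stake          # push: stake refunded
    return 0.0                # loss


def settle_asian(margin, handicap, odds, stake=1.0):
    """Settle any Asian handicap, splitting quarter lines into two half-stakes."""
    quarters = round(handicap * 4)
    if quarters % 2 != 0:  # quarter line, e.g. +0.75 -> +0.5 and +1.0
        lower = (quarters - 1) / 4
        upper = (quarters + 1) / 4
        return (settle_line(margin, lower, odds, stake / 2)
                + settle_line(margin, upper, odds, stake / 2))
    return settle_line(margin, handicap, odds, stake)
```

    For example, backing a dog at +0.75 and decimal odds of 1.96, a one-goal defeat returns half the stake (half lost, half refunded) and a draw returns the full 1.96.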

    Another firm regularly prices up Australian state games (park-standard soccer at best) on the same basis they use to price the EPL, even though the Oz games have getting on for twice as many goals & virtually no home advantage.

    Even better is the advent of betting exchanges,where anyone can log onto a site & lay a price on anything they want.

    If established bookies don't know how to price Asian odds (& it's not that difficult), then most members of the public definitely don't.

    Last season, using a soccer rating system based on the Elo chess rating & backing any away dog on the Asian handicap when the price on a public betting exchange exceeded my true estimate by 5% yielded 150+ bets & a 12% return on level stakes.
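    The selection rule above can be sketched as a filter plus a level-stakes tally. This assumes "exceeded the true estimate by 5%" means the exchange's decimal odds are at least 5% above the fair odds 1/p implied by the rating system's win probability p; bets are treated as plain win/lose (Asian handicap pushes and half-wins, and exchange commission, are ignored). `Bet`, `is_value` and `level_stake_roi` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Bet:
    model_prob: float  # rating system's probability that the bet wins
    odds: float        # decimal odds available on the exchange
    won: bool          # settled result


def is_value(model_prob, odds, edge=0.05):
    """True if the offered odds beat the fair odds 1/p by at least `edge`."""
    fair_odds = 1.0 / model_prob
    return odds >= fair_odds * (1.0 + edge)


def level_stake_roi(bets, edge=0.05):
    """Return (number of bets, ROI) staking one unit on each qualifying bet."""
    selected = [b for b in bets if is_value(b.model_prob, b.odds, edge)]
    if not selected:
        return 0, 0.0
    profit = sum((b.odds - 1.0) if b.won else -1.0 for b in selected)
    return len(selected), profit / len(selected)
```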

    In short, some people can't price soccer properly, & a rating system can alert you when this might have happened.

    I don't really want to hammer the betting side, because I'm much more interested in the many different ways people find to evaluate a team's true worth. The two, however, inevitably overlap.

    T.
     
  5. numerista

    numerista New Member

    Mar 21, 2004
    Thx Tach ... I feel like we should be paying tuition. :)
     
  6. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    Is it? Why is it obvious that an exponentially decaying time series is better than, say, a linearly decaying one?
     
  7. tachyon1

    tachyon1 Member

    Apr 23, 2004
    Hi Nosix,
    maybe "obvious" was a poor choice of words.

    I should have said "having tried out a variety of ways to give different weightings to more recent results, used those figures to generate predicted odds for w/l/d outcomes & then compared those predictions to the actual outcomes, the best solution I've come up with so far is to use an exponentially smoothed type of moving average" :).

    On a slightly different tack, has anyone given a least squares approach a go? It seems to deal admirably with an unbalanced schedule & is no respecter of reputations.

    T.
     
  8. numerista

    numerista New Member

    Mar 21, 2004
    Least squares estimation certainly makes sense ... IIRC, a preferred approach would be to use the square root of goals as the response -- this helps adjust for the fact that games with high predicted scores also have high variance.

    An easy wrinkle on this idea is weighted least squares, with exponentially decreasing weights, depending on how long ago a game was played.
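    A minimal sketch of weighted least squares ratings, assuming the model capped goal margin = rating[home] - rating[away] + h, with each match weighted by (1 - alpha)^age. It uses the goal margin as the response rather than the square-root-of-goals response suggested above, the 3-goal cap and alpha are assumptions, and it solves the weighted normal equations by plain coordinate descent so no libraries are needed. `ls_ratings` is a hypothetical name.

```python
def ls_ratings(matches, alpha=0.05, cap=3, sweeps=500):
    """Weighted least squares team ratings (hypothetical helper).

    matches: list of (home, away, home_goals, away_goals), oldest first.
    Assumed model: capped margin ~ rating[home] - rating[away] + h.
    Match weight is (1 - alpha) ** age, where age 0 is the latest match.
    Returns (ratings centred on zero, home advantage h).
    """
    n = len(matches)
    rows = []
    for i, (home, away, hg, ag) in enumerate(matches):
        margin = max(-cap, min(cap, hg - ag))  # cap blowouts at +/- cap goals
        weight = (1 - alpha) ** (n - 1 - i)    # older matches count for less
        rows.append((home, away, margin, weight))

    teams = sorted({team for row in rows for team in row[:2]})
    rating = {t: 0.0 for t in teams}
    h = 0.0
    total_w = sum(w for _, _, _, w in rows)
    for _ in range(sweeps):
        # coordinate descent: exact weighted-LS update for one team at a time
        for t in teams:
            num = den = 0.0
            for home, away, m, w in rows:
                if t == home:
                    num += w * (m - h + rating[away])
                    den += w
                elif t == away:
                    num += w * (rating[home] + h - m)
                    den += w
            if den > 0:
                rating[t] = num / den
        # exact weighted-LS update for the home advantage
        h = sum(w * (m - rating[home] + rating[away])
                for home, away, m, w in rows) / total_w
        # only rating differences matter, so pin the average rating at zero
        avg = sum(rating.values()) / len(rating)
        rating = {t: v - avg for t, v in rating.items()}
    return rating, h
```

    Called on a list of results, it returns one rating per team plus a home advantage, both in goals; a smaller alpha spreads the weight over more of the history.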
     
  9. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    tachyon1,

    If you choose some arbitrary level of significance, say 1%, over what time period do you find that previous results become insignificant? Is this time period similar for different leagues? Do you have any data that you could share here as an example?

    NoSix
     
  10. ZeekLTK

    ZeekLTK Member

    Mar 5, 2004
    Michigan
    Nat'l Team:
    Norway
    Hey, what is this "Poisson" thing everyone keeps mentioning? Is it some kind of program I can download to use for predictions, or is it a mathematical equation, or what?

    Anyways, I did something similar to what you guys are trying to do (at least in the first couple of posts, since I didn't read the whole thread) with CONCACAF World Cup Qualifying by using GF and GA for each home game and away game. For example, with the United States vs Grenada I got that the USA would score 7.86 goals in their home game and Grenada would score 0.23 in their away game, so I predicted the score of the game in Columbus will be 8-0 for the USA. On the return leg I got that the USA will score 4.39 goals and Grenada will score 0.63 goals, so my prediction for that is 4-1 for the USA.

    To get the numbers used to determine scores, I look at how one team did against teams of similar skill as their opponents in previous World Cup Qualifying. For example in the Nicaragua vs St. Vincent / Grenadines pick I looked at St. Vincent's scores against teams like Antigua & Barbuda, Surinam, Bermuda, and Grenada, and Nicaragua's scores against teams like Panama, St. Lucia, St. Kitts & Nevis, and Barbados... all from previous World Cup Qualifying matches and figured that St. Vincent / Grenadines would win 2-0 in their home match, and then tie 0-0 at Nicaragua, to win the aggregate.

    Here's what I got for all my predictions:

     
  11. tachyon1

    tachyon1 Member

    Apr 23, 2004
    Hi Nosix,

    I can run through the basic routes I've taken to arrive at my ideas, but recording r figures, significance tests etc. weren't exactly top of my agenda when I started x number of years ago.

    Most teams don't vary in their ability very quickly, & when they do there's usually a transparent reason (a huge input of roubles, as at Chelsea, or getting themselves put into administration, as at Leeds).

    Taking EPL teams over a time span of 100+ games,about 2.5 seasons,the average change in ability of teams is not much more than 0.2 of a goal.

    If you just want to look at goals scored/conceded, you don't want a recent result sawtoothing the average too much, & you do want enough results present that a team's opponents are reasonably balanced. Given that, I've used results that go back a few seasons & have used a very small smoothing constant.

    The method has been evaluated by running least squares regression on predicted versus actual margins of victory &/or generating home/away/draw odds & comparing those with actual percentage occurrences, again by regression.
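    The regression check can be sketched as an ordinary least squares fit of actual margins on predicted margins: a well-calibrated model should give a slope near 1 and an intercept near 0, and the correlation r summarises the fit. This is a plain-Python sketch; `fit_line` is a hypothetical name and it assumes the predictions are not all identical.

```python
from math import sqrt


def fit_line(predicted, actual):
    """OLS fit of actual = a + b * predicted; returns (a, b, r)."""
    n = len(predicted)
    mx = sum(predicted) / n
    my = sum(actual) / n
    sxx = sum((x - mx) ** 2 for x in predicted)
    syy = sum((y - my) ** 2 for y in actual)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, actual))
    b = sxy / sxx              # slope: ~1 if predicted margins are well scaled
    a = my - b * mx            # intercept: ~0 if there is no systematic bias
    r = sxy / sqrt(sxx * syy)  # correlation between predicted and actual
    return a, b, r
```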

    You can also incorporate trend &/or control-type limits to try to spot if a team has greatly improved or otherwise & you've missed it.

    Getting a decent average is only half the problem. Your figure probably only gives you the likely number of goals that your team will score against an average team (not including themselves) from that league.

    If they're playing either the very worst or very best from the league, you need to take that into account, probably by looking at scoring rates... & of course you need to allow for home field.

    It's fair to say that not all divisions are alike. Money talks louder in the English lower divisions. Fulham &, to a lesser degree, Portsmouth (improved by over half a goal) made huge advances in ability on relatively modest investment by Premiership standards.

    Hi Numerista,

    I've just played around with LS for soccer. I like the sqrt idea; I'm currently capping at around a supremacy of 3 goals & toying with the idea of extra credit for teams who win narrowly.

    Weighting's also a likely improver.

    I'm getting some nice results on relatively few games in terms of ranking,not quite so good on rating.For example for the EPL a LS approach based on half a dozen games for each side is giving a difference from best in the league to worst of around 3 goals,when a more typical and believable value would be something in the region of 1.6 goals.


    To get margins that can be slotted into a Poisson you seem to need to be well into double figure games.

    Hi Zeek,
    a Poisson's a distribution that represents the number of events occurring randomly (& independently) in a fixed time at a constant average rate.

    So if you know the average number of goals that a team scores in a game, you can assign probabilities to that team scoring 0, 1, 2, 3 etc. goals in a game.
    Do the same for their opponents & you can in theory find the probability of a game ending 1-1 (the probability of team A scoring 1 goal multiplied by the probability of team B scoring 1 goal) or any other score.

    Hence you can add up all the individual score probabilities where the game ends level (0-0, 1-1, 2-2, 3-3, 4-4; the rest are probably too unlikely to bother with) & derive odds for the game to finish level.

    You can also do the same for scores where team A or team B wins.
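    The procedure above, sketched in Python: it treats the two teams' goal counts as independent Poisson variables (the simplification the post describes), builds each scoreline probability, and sums them into home win, draw and away win. The 10-goal cut-off per side and the means 1.6 and 1.1 are assumptions for illustration; fair decimal odds for an outcome are 1/p before any bookmaker margin.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """P(X = k) for a Poisson variable with the given mean."""
    return exp(-mean) * mean ** k / factorial(k)


def match_probabilities(mean_a, mean_b, max_goals=10):
    """Win A / draw / win B probabilities from independent Poisson goal counts."""
    win_a = draw = win_b = 0.0
    for a in range(max_goals + 1):
        for b in range(max_goals + 1):
            # scoreline probability = P(A scores a) * P(B scores b)
            p = poisson_pmf(a, mean_a) * poisson_pmf(b, mean_b)
            if a > b:
                win_a += p
            elif a == b:
                draw += p
            else:
                win_b += p
    return win_a, draw, win_b


# hypothetical goal expectancies: chance of exactly 1-1, and fair draw odds
p_1_1 = poisson_pmf(1, 1.6) * poisson_pmf(1, 1.1)
home, draw, away = match_probabilities(1.6, 1.1)
print(round(p_1_1, 4), round(draw, 3), round(1 / draw, 2))
```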


    T
     
  12. numerista

    numerista New Member

    Mar 21, 2004
    Presumably, with a smaller number of games, you'd do well to bias all of your estimates towards the league average. Of course, regularized Poisson regression starts to make the coding into real work.
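    One simple way to bias early estimates towards the league average (a basic shrinkage estimate, not the full regularized Poisson regression) is to treat the league average as if it had been observed over a number of extra "pseudo-games". `shrunk_rate` and its `prior_games` value are hypothetical; the number of pseudo-games is a tuning parameter that would need fitting on historical data.

```python
def shrunk_rate(team_goals, games_played, league_avg, prior_games=10):
    """Blend a team's scoring rate with the league average (hypothetical helper).

    Equivalent to adding `prior_games` imaginary matches at the league
    average rate: with few games the estimate stays near the league
    average, with many games it approaches the team's own rate.
    """
    return (team_goals + prior_games * league_avg) / (games_played + prior_games)
```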

    Incidentally, do you have data in a handy format to share? In such a case, the odds of me coding things up would increase dramatically.
     
  13. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    How small is "very small"? How about giving us a typical number?
     
  14. microbrew

    microbrew New Member

    Jun 29, 2002
    NJ
    Read post 13 and later of this thread. It's a few posts describing Poisson processes in layman's terms.


    tachyon1,

    I'm impressed. There's an academic paper hidden in this thread, undergoing peer review.

    voros suggested Bayes' Theorem as a possible answer to this issue in this post
    https://www.bigsoccer.com/forum/showpost.php?p=1986634&postcount=28
     
  15. tachyon1

    tachyon1 Member

    Apr 23, 2004
    Hi N6,
    sorry mate, I thought I had posted a figure & you're correct, I haven't.


    So between 1/20 & 1/25.

    T
     
  16. tachyon1

    tachyon1 Member

    Apr 23, 2004
    Hi all,
    just quickly checked out the Voros link.

    Here's another way, or it may be the same way in a different guise.

    I'll use the same figures so we can compare results.

    SJ score 1.5 g/g
    Dallas concede 2.133 g/g
    An imaginary match-up between two average teams at a neutral venue should, on average & long-term, end in a draw with each side scoring 1.44333 g/g.

    Divide 1.5/1.44333 = 1.03926; in other words, SJ score at 1.03926 times the league's average rate.

    Divide 2.133/1.44333 = 1.4778, so Dallas concede at 1.4778 times the league average rate.

    Multiply the two answers:
    1.03926 x 1.4778 = 1.5358

    So in a game on neutral ground SJ should score at 1.5358 times the league average against Dallas.

    The league average is 1.4433 g/g so;

    1.44333 x 1.5358 = 2.217 goals scored by SJ in a game against Dallas.

    V's method came to 2.214 iirc.

    Intuitively the answer looks OK, & dimensionally it's OK as well.

    Worth bearing in mind that this g/g average only applies if the two teams meet on neutral turf or if the league has no home field advantage, which is unlikely.

    To address this, going back to the start, we need the additional information about goals scored by the home sides & goals scored by the away sides.

    Let's say home advantage is a fairly typical 0.4 of a goal, with home teams scoring 1.6433 g/g & away sides 1.2433 g/g (these average out to the league rate of 1.4433).

    Now, in addition to the two rates we used previously, we need to include one for the home side.

    Say SJ are at home.

    Divide 1.6433/1.4433 = 1.1386, so home teams score at 1.1386 times the average rate.

    Include this in the calculation (2.217 x 1.1386) & now SJ will expect to score, on average & long-term, 2.524 g/g at home to Dallas.
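    The worked example above, reproduced as code so each step is explicit: attack strength = team's scoring rate / league rate, defensive weakness = opponent's conceded rate / league rate, and the expectancy is the league rate times both ratios, times a home factor when the team is at home. The figures are the ones used in the post; `expected_goals` is a hypothetical name.

```python
def expected_goals(scored_rate, opp_conceded_rate, league_rate, home_factor=1.0):
    """Multiplicative goal expectancy, as in the worked example.

    home_factor: home teams' scoring rate / league rate (1.0 on neutral turf)
    """
    attack = scored_rate / league_rate         # 1.5 / 1.44333 = 1.03926
    defence = opp_conceded_rate / league_rate  # 2.133 / 1.44333 = 1.4778
    return league_rate * attack * defence * home_factor


league = 1.44333
neutral = expected_goals(1.5, 2.133, league)
home = expected_goals(1.5, 2.133, league, home_factor=1.6433 / league)
print(round(neutral, 3), round(home, 3))  # prints: 2.217 2.524
```

    The product simplifies to scored_rate x opp_conceded_rate / league_rate, which is why the result comes out in goals per game.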

    Another approach is to estimate a supremacy for the match-up (least squares using goal difference?).
    There is usually a fair relationship between supremacy & total goals: as the former increases, so does the latter, albeit slowly.

    Evaluate the supremacy, calculate the expected total goals that goes with that supremacy, & then you've got each team's individual goal expectancy.
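    The supremacy route splits a match's expected total goals T and expected supremacy S (home minus away) into the two expectancies: home = (T + S) / 2 and away = (T - S) / 2. The post doesn't give the supremacy-to-total relationship, so the linear `total_from_supremacy` below, with its base and slope, is purely hypothetical and would need fitting to historical results.

```python
def total_from_supremacy(supremacy, base=2.5, slope=0.3):
    """HYPOTHETICAL link: expected total goals rises slowly with |supremacy|."""
    return base + slope * abs(supremacy)


def split_expectancy(supremacy, total):
    """Individual goal expectancies from supremacy (home - away) and total."""
    home = (total + supremacy) / 2
    away = (total - supremacy) / 2
    return home, away


s = 0.8  # hypothetical supremacy, e.g. a least squares rating difference
print(split_expectancy(s, total_from_supremacy(s)))
```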

    I prefer the first approach.

    T
     
  17. numerista

    numerista New Member

    Mar 21, 2004
    Pretty sure you're describing the same procedure. Microbrew ... I'm pretty sure this has been done in the literature, in Chance Magazine about eight years ago.
     
  18. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    What units of time are you using? I would have thought days, but apparently not...
     
  19. AussieVamp2

    AussieVamp2 New Member

    Jul 8, 2000
    Melbourne, Australia
    Thanks tach (sorry for delay, been moving house!) :)
     
  20. AussieVamp2

    AussieVamp2 New Member

    Jul 8, 2000
    Melbourne, Australia
    Right, you have to reduce these, when looking at actual matches, whereas the actual ratings can have a bigger spread (before HGA)
     
  21. AussieVamp2

    AussieVamp2 New Member

    Jul 8, 2000
    Melbourne, Australia
    for least squares, that should probably get closer by the end of the season

    for example, for Germany this year :-

    have something roughly like this at the end of the season

    WERDER BREMEN 0.154
    LEVERKUSEN -1.012
    WOLFSBURG -1.036
    DORTMUND -1.147
    BOCHUM -1.24
    BAYERN MUNICH -1.379
    FREIBURG -1.381
    SCHALKE 04 -1.397
    HANNOVER -1.435
    HANSA ROSTOCK -1.436
    STUTTGART -1.502
    FC KOLN -1.63
    HERTHA BERLIN -1.707
    1860 MUNCHEN -1.716
    HAMBURGER SV -1.747
    MOENCHENGLADBACH -1.751
    FRANKFURT -1.807
    KAISERSLAUTERN -1.934
     
  22. AussieVamp2

    AussieVamp2 New Member

    Jul 8, 2000
    Melbourne, Australia
    Good stuff, the 0.84 exponent for this example you came up with from your work?
     
  23. AussieVamp2

    AussieVamp2 New Member

    Jul 8, 2000
    Melbourne, Australia

    You could try capping based on the standard deviations involved too...
     
  24. tachyon1

    tachyon1 Member

    Apr 23, 2004
    Hi AV2
    neat feedback.
    I've been meaning to get some use out of SDs for ages & have never gotten around to it.
    Interesting figures on the Bundesliga. Here are some LS figures from the EPL: the first set are from the immediately preceding 5 games only (to illustrate how unrepresentatively large the differences can get on a small sample), whilst the second set are for the season from the start to mid-February.

    Team         Last 5 games   Season to Feb

    Arsenal          2.94           2.32
    Chelsea          2.75           2.12
    AVilla           2.21           1.17
    B'ham            2.02           1.01
    Newcastle        1.92           1.39
    Spurs            1.82           0.92
    Bolton           1.45           0.89
    B'burn           1.38           0.94
    Lpool            1.30           1.47
    M'bro            1.17           0.94
    Fulham           1.08           1.15
    ManC             1.00           0.99
    Charlton         0.99           1.23
    P'mouth          0.96           0.67
    ManU             0.75           2.02
    Soton            0.72           1.10
    Leeds            0.35           0.00
    Wolves           0.21           0.01
    Everton          0.10           0.87
    Leicester        0.00           0.58
    Leicester.....0...........0.58


    I've tentatively thought about using standard scores to compare ratings where the spread & size of ratings vary.

    Using this method, Arsenal's 2.94 five-game rating is equivalent to a rating of 2.30 on the "season to Feb" scale, which, in this case anyway, compares well with their actual Feb rating of 2.32 & indicates that Arsenal's five-match form was fairly indicative of their probable season-long form.

    In short, comparing mid-season LS ratings with average season-long ratings for various completed years in that division via standard scores (which incorporate SDs) might work :).
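    The standard-score conversion can be sketched with the table's own figures: turn the 5-game rating into a z-score using that column's mean & standard deviation, then map it back onto the season column's scale. Using population standard deviations over all 20 teams gives roughly the 2.30 quoted for Arsenal; whether this is exactly the calculation used in the post is an assumption, and `rescale` is a hypothetical name.

```python
from statistics import mean, pstdev

five_game = [2.94, 2.75, 2.21, 2.02, 1.92, 1.82, 1.45, 1.38, 1.30, 1.17,
             1.08, 1.00, 0.99, 0.96, 0.75, 0.72, 0.35, 0.21, 0.10, 0.0]
season = [2.32, 2.12, 1.17, 1.01, 1.39, 0.92, 0.89, 0.94, 1.47, 0.94,
          1.15, 0.99, 1.23, 0.67, 2.02, 1.10, 0.0, 0.01, 0.87, 0.58]


def rescale(x, from_scale, to_scale):
    """Map x from one rating scale to another via its standard score (z)."""
    z = (x - mean(from_scale)) / pstdev(from_scale)
    return mean(to_scale) + z * pstdev(to_scale)


print(round(rescale(2.94, five_game, season), 2))  # Arsenal's 5-game rating
```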

    T.
     
  25. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    Not trying to give you a hard time, but something doesn't add up here. You claim to be using data that go back 2+ seasons, but the weighting constants you assign make any result older than 92 days (for 1/20) or 115 days (for 1/25) insignificant (<0.01). What gives?
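    NoSix's cutoffs can be reproduced by assuming the weights decay continuously as exp(-alpha x t) and asking when they fall below 0.01, i.e. t = ln(100)/alpha. The sketch also gives the discrete version, (1 - alpha)^k, that per-match smoothing would use; whether t counts days or matches is exactly what the thread leaves open. Both function names are hypothetical.

```python
from math import ceil, log


def continuous_cutoff(alpha, threshold=0.01):
    """t at which exp(-alpha * t) drops to the threshold."""
    return log(1 / threshold) / alpha


def discrete_cutoff(alpha, threshold=0.01):
    """Smallest whole k with (1 - alpha) ** k at or below the threshold."""
    return ceil(log(threshold) / log(1 - alpha))


for alpha in (1 / 20, 1 / 25):
    print(alpha, round(continuous_cutoff(alpha), 1), discrete_cutoff(alpha))
```

    Read as matches rather than days, 90 to 115 games of memory would be roughly 2.4 to 3 seasons of 38 games.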
     
