Question about use of Poisson distribution

Discussion in 'Statistics and Analysis' started by NoSix, Dec 30, 2003.

  1. tachyon1

    tachyon1 Member

    Apr 23, 2004
    Hi Nosix,

    Most EPL teams play a game on a Saturday & then play their next game the following Saturday, so why would I give (say) a 75% weighting to the game if it occurs 7 days hence rather than 96%?

    It's a well-known fact that most EPL footballers spend their time between games cutting their golf-course-sized manorial lawns in deepest Hertfordshire, giving their private secretary Spanish lessons, deliberately missing drugs tests, arranging their bail surety & purchasing Mini Coopers in various colours (yellow with a black roof, please). None of which has very much effect on their ability to play football.

    Therefore arranging the periods so that they contained exactly one game would seem sensible.

    However, this can cause problems, as games are frequently abandoned due to the legendary behaviour of alcohol-inflamed English football "hooligans" justifiably incensed by the referee wrongly awarding a throw-in to their hated local rivals.

    Hence a sensible compromise would seem to be to take one game as a period regardless of the actual timespan.

    If you can suggest a superior alternative then I'm all ears:).

    All the best T.
     
  2. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    I see. That would certainly simplify the calculation of the weights. Thanks for answering my questions.
     
  3. AussieVamp2

    AussieVamp2 New Member

    Jul 8, 2000
    Melbourne, Australia
    If you wanted to you could have a half week as a period, takes care of midweek games, and only messed up by the occasional case where end of season, or some other catchup reason a team had 3 games...


    or perhaps the english xmas period :)
     
  4. AussieVamp2

    AussieVamp2 New Member

    Jul 8, 2000
    Melbourne, Australia
    Good reference on a technique usable for that (or underdispersion, as the case may be) for in-season soccer models would be the most useful, of course ;-)
     
  5. AussieVamp2

    AussieVamp2 New Member

    Jul 8, 2000
    Melbourne, Australia
    -- Yeah, or same thing for Scottish lower divisions until recently, etc.

    Last season, using an ELO chess-rating-based soccer rating system & backing any away dog on the Asian handicap when the price on a public betting exchange exceeded your true estimate by 5% yielded 150+ bets & a 12% return on level stakes.

    -- Nice. Take it home faves, dogs, and away faves not as much joy? That was just one league was it?
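
    The selection rule quoted above (back a side only when the exchange price beats your own estimate by 5%) can be sketched roughly as follows. This is only an illustration: it reads "exceeds your true estimate by 5%" as "offered decimal odds are at least 1.05 times the fair odds implied by your probability", which is an assumption, and it treats the bet as a plain win/lose proposition, ignoring the pushes and half-wins that Asian handicap lines can produce. The function names are hypothetical.

```python
def fair_decimal_odds(win_probability: float) -> float:
    """Decimal odds at which a bet with this win probability breaks even."""
    if not 0.0 < win_probability < 1.0:
        raise ValueError("win_probability must be strictly between 0 and 1")
    return 1.0 / win_probability


def is_value_bet(win_probability: float, offered_odds: float,
                 margin: float = 0.05) -> bool:
    """True when the offered price beats the fair price by at least `margin`.

    Assumption: "beats by 5%" means offered >= fair * 1.05.
    """
    return offered_odds >= fair_decimal_odds(win_probability) * (1.0 + margin)


def expected_return_per_unit(win_probability: float, offered_odds: float) -> float:
    """Expected profit per unit staked, if the probability estimate is right."""
    return win_probability * offered_odds - 1.0


if __name__ == "__main__":
    # A rating says the away dog wins 40% of the time, so fair odds are 2.50.
    print(is_value_bet(0.40, 2.70))   # price clears the 5% margin
    print(is_value_bet(0.40, 2.60))   # price is better than fair, but not by 5%
    print(round(expected_return_per_unit(0.40, 2.70), 3))
```

    The margin is there as a buffer against errors in the rating itself, which is the same "margin for error" idea that comes up later in the thread.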
     
  6. AussieVamp2

    AussieVamp2 New Member

    Jul 8, 2000
    Melbourne, Australia

    Here's a question though :- Should you vary a team's ability (or change it as quickly) later in a season as earlier?
     
  7. AussieVamp2

    AussieVamp2 New Member

    Jul 8, 2000
    Melbourne, Australia
    Yeah, seems to be ok. Although it's more complicated to calculate, of course, and you mostly don't directly get the handicap probabilities (to win by 2 goals, etc.) that you were mentioning, either. The same in-game score-state dependency still remains as well, of course.
     
  8. AussieVamp2

    AussieVamp2 New Member

    Jul 8, 2000
    Melbourne, Australia
    Out of interest, have you ever tried something like an extra variable (another rating, I guess) and adjusted that up or down more quickly based on recent results, and combined that with a rating? Would only be a small improvement if any, I am sure, but would be interesting perhaps.
     
  9. tachyon1

    tachyon1 Member

    Apr 23, 2004
    Hi AV,
    in no particular order.

    Betting favs etc on the Asian handicaps doesn't cut it at present & turns in a small loss even at best price & even if you give yourself a fairly large margin for error for predicted price versus available price.

    The leagues looked at were all English, with the EPL & 1st division predominating.

    Profitability of away dogs is down to having to price up the chances of the least favoured team winning, which intuitively isn't very easy. Most exchange players retreat to the misplaced safety of the fav.

    Also most Asian prices take their lead from the usual home/draw/away odds so any price biases present in those can get carried thru.

    It also helps that no one can at present make a cast-iron case for a fav-longshot or a longshot-fav bias or maybe any bias at all in English odds setting. So the waters are very muddied.

    I'm reluctant to give too much extra credence to recent form because of the very low scoring in soccer. You could just be factoring in luck or perverse reffing. Bristol City, for example, followed 10 straight wins this season with one win in 8, but that kind of run is very unusual. Their true ability was probably somewhere between those two extremes.

    IMO a trend value approach would be the way to go. It might also be useful to deal with the seasonal goal trends that repeat year on year.

    Don't forget reversion to the mean, which is certainly going to need to be allowed for.

    I do like the LS approach for leagues or competitions where you've got to make some pretty quick evaluations because teams are dramatically different from last year's model (Aussie state leagues + only around 20 games per team per season) or where teams haven't met before on a regular basis (Champions League, again a limited number of games in an incomplete schedule).

    Also the lower down the talent scale you go, the larger the team news factor becomes, which probably accounts for the reluctance of books to price up the lower Scottish leagues completely.

    By contrast most major European domestic leagues are much less volatile on a season to season basis even with the added complication of relegated & promoted teams.

    Some great ideas for further discussion.

    T
     
  10. AussieVamp2

    AussieVamp2 New Member

    Jul 8, 2000
    Melbourne, Australia
    Betting favs etc on the Asian handicaps doesn't cut it at present & turns in a small loss even at best price & even if you give yourself a fairly large margin for error for predicted price versus available price.

    -- ok, so you are identifying a general pricing flaw with the help of a rating to some degree then it would seem

    Profitability of away dogs is down to having to price up the chances of the least favoured team winning, which intuitively isn't very easy. Most exchange players retreat to the misplaced safety of the fav.

    -- yeah, not much fun laying 8.00 prices either !


    Also most Asian prices take their lead from the usual home/draw/away odds so any price biases present in those can get carried thru.

    -- yes, although improved a little, I think


    It also helps that no one can at present make a cast-iron case for a fav-longshot or a longshot-fav bias or maybe any bias at all in English odds setting. So the waters are very muddied.


    -- still some, but not as much as previously (at least in the home favorite case) as far as I can tell

    I'm reluctant to give too much extra credence to recent form because of the very low scoring in soccer. You could just be factoring in luck or perverse reffing. Bristol City, for example, followed 10 straight wins this season with one win in 8, but that kind of run is very unusual. Their true ability was probably somewhere between those two extremes.

    IMO a trend value approach would be the way to go. It might also be useful to deal with the seasonal goal trends that repeat year on year.

    -- not sure I understand what you mean by trend value in this case?

    Don't forget reversion to the mean which is certainly going to need to be allowed for.

    --- Ok, what was that guy and the 'grand means' thing, guess I will remember shortly, is that what you mean?

    I do like the LS approach for leagues or competitions where you've got to make some pretty quick evaluations because teams are dramatically different from last year's model (Aussie state leagues + only around 20 games per team per season) or where teams haven't met before on a regular basis (Champions League, again a limited number of games in an incomplete schedule).

    -- Australian bookies not silly enough to do these, although Scandinavian equivalents abound.


    Also the lower down the talent scale you go, the larger the team news factor becomes, which probably accounts for the reluctance of books to price up the lower Scottish leagues completely.


    -- Yes, and more to the point, if you get it wrong you will get three 5-buck bets and one max bet from a pro, so not much chance of winning, given no one else is interested in something watched by 200 people, 3 dogs and a goat.

    By contrast most major European domestic leagues are much less volatile on a season to season basis even with the added complication of relegated & promoted teams.

    -- On how much team ratings change you mean?

    Some great ideas for further discussion.
     
  11. AussieVamp2

    AussieVamp2 New Member

    Jul 8, 2000
    Melbourne, Australia
    So, do you mean your look at values of players suggest that key 'spine' players, or at least first choice spine players are more important, relatively, in lower divisions?
     
  12. AussieVamp2

    AussieVamp2 New Member

    Jul 8, 2000
    Melbourne, Australia
    The question I of course forgot is why use an ELO rating rather than the Poisson rating you had been discussing: did it work better, or was it just for interest?
     
  13. tachyon1

    tachyon1 Member

    Apr 23, 2004
    Hi AV,

    re regression/reversion to the mean.
    Generally speaking, the great aren't as great as they seem & the really poor aren't quite as poor as they seem. Teams' numbers (ratings etc) tend to home in towards the mean for that league, as the extreme performances are likely to be the result of accumulated good or bad luck.

    In the absence of any fundamental change to the team's make-up, sides that are good in the first half of a season usually remain good in the second half, but by not quite as much. The same goes for poor sides.

    The baseball guys regress every stat that moves & you probably should also in soccer.
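
    As a rough illustration of what "regressing a stat" means here, the simplest version pulls an observed figure part of the way back towards the league mean. The sketch below is not taken from anyone's actual model in this thread: the function name and the `weight` value (the share of the deviation from the mean that is kept) are hypothetical, and in practice the weight would have to be estimated from data.

```python
def regress_to_mean(observed: float, league_mean: float, weight: float) -> float:
    """Shrink an observed team figure towards the league mean.

    `weight` is the fraction of the team's deviation from the mean that is
    believed to be real ability rather than luck (0 = all luck, 1 = all ability).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return league_mean + weight * (observed - league_mean)


if __name__ == "__main__":
    league_mean = 1.30   # hypothetical league average goals per game
    weight = 0.70        # hypothetical: keep 70% of the deviation
    # A side scoring 2.0 per game in the first half of the season...
    print(round(regress_to_mean(2.0, league_mean, weight), 2))
    # ...and a poor side scoring 0.8 per game.
    print(round(regress_to_mean(0.8, league_mean, weight), 2))
```

    Good sides stay above the mean and poor sides stay below it, just by less than their raw numbers suggest.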

    re team news.
    Mostly anecdotal, this one. I know guys who bet exclusively on minor leagues in Scotland & state leagues in Aus & they maintain that the presence or absence of certain players can greatly affect outcomes. National league players turning up in the state leagues in Aus being a case in point.

    Most English books will willingly give you 2.4 Campbelltown, 2.6 Adelaide R & 3.75 the draw all the way thru to the start of the new English season. Most wisely steer clear of the Tas leagues though.

    The ELO wasn't mine, I just crunched the results.

    More later.
    T.
     
  14. AussieVamp2

    AussieVamp2 New Member

    Jul 8, 2000
    Melbourne, Australia
    Hi AV,

    re regression/reversion to the mean.
    Generally speaking, the great aren't as great as they seem & the really poor aren't quite as poor as they seem. Teams' numbers (ratings etc) tend to home in towards the mean for that league, as the extreme performances are likely to be the result of accumulated good or bad luck.

    -- right - so other than smoothing are you specifically adjusting for this on top, so to speak?

    In the absence of any fundamental change to the team's make-up, sides that are good in the first half of a season usually remain good in the second half, but by not quite as much. The same goes for poor sides.

    The baseball guys regress every stat that moves & you probably should also in soccer.

    -- was thinking more of game to game - and now I recall what I was looking for : James-Stein estimation -- and part of the question of should you adjust teams as quickly later on in the season?


    re team news.
    Mostly anecdotal, this one. I know guys who bet exclusively on minor leagues in Scotland & state leagues in Aus & they maintain that the presence or absence of certain players can greatly affect outcomes. National league players turning up in the state leagues in Aus being a case in point.


    -- yeah, well in general be hard pressed to find out who is playing for who!

    Most English books will willingly give you 2.4 Campbelltown, 2.6 Adelaide R & 3.75 the draw all the way thru to the start of the new English season. Most wisely steer clear of the Tas leagues though.

    -- Have to have a look at the state leagues, interesting.

    The ELO wasn't mine,I just crunched the results.

    -- ok, fair enough.

    More later.

    -- Cool, look forward to it :)
    T.
     
  15. AussieVamp2

    AussieVamp2 New Member

    Jul 8, 2000
    Melbourne, Australia

    Wasn't soccerratings was it, out of interest?
     
  16. Zebby

    Zebby New Member

    May 15, 2005
    Running away screaming is all part of the mating ritual.

    Are you really interested in the 2.7182818284% of chicks that don't run away? They're 'e'asy.
     
  17. johnh00

    johnh00 Member

    Apr 25, 2001
    CT, USA
    Club:
    New England Revolution
    Nat'l Team:
    United States
    Thought I'd resurrect this old thread, as it makes for some good reading, plus I had a question related to it. ;)

    I've been working on a computer program for analyzing sports statistics. It can be used to analyze a lot of stats, but as part of the testing of my theories and my programming, I've been analyzing soccer players' stats and using them to produce some predictions for results. I am able to create a Poisson distribution on the predicted results for goals for/against and manually figure out the odds of wins/ties/losses based on this, but I wondered if there is any elegant way (read: a simple mathematical formula) to do this instead? If not, I'll just write a sub-routine to handle the problem, but I was feeling a bit lazy and figured someone may know of an easier way around this.

    - Lee
     
  18. numerista

    numerista New Member

    Mar 21, 2004
    Off the top of my head, I don't think there is a simple formula, although I'm feeling a bit lazy myself. :)
     
  19. johnh00

    johnh00 Member

    Apr 25, 2001
    CT, USA
    Club:
    New England Revolution
    Nat'l Team:
    United States
    Looking at it, I would think it's possible to come up with one, but I think I'll just create a function to process this out, instead. I could write the program in a couple of hours, but if I start trying to come up with a mathematical solution, it will probably take me days, and I'm not even sure it would work. :p
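
    For reference, a function of the kind described above might look something like the sketch below. It assumes the two teams' goal counts are independent Poisson variables (a simplification discussed elsewhere in this thread) and simply sums the joint probabilities over a grid of scorelines. The difference of two independent Poisson variables does have a named distribution (the Skellam distribution), but its probabilities involve Bessel functions, so direct summation is arguably the simpler route. The function names and the `max_goals` cut-off are illustrative choices, not anyone's actual program.

```python
from math import exp


def poisson_probs(mean: float, max_goals: int) -> list:
    """P(X = k) for k = 0..max_goals, built iteratively to avoid factorials."""
    probs = [exp(-mean)]
    for k in range(1, max_goals + 1):
        probs.append(probs[-1] * mean / k)
    return probs


def match_outcome_probs(mean_for: float, mean_against: float,
                        max_goals: int = 20) -> tuple:
    """Return (win, draw, loss) probabilities for independent Poisson scores.

    Scorelines above `max_goals` per team are ignored; for typical soccer
    goal expectancies the truncated probability mass is negligible.
    """
    p_for = poisson_probs(mean_for, max_goals)
    p_against = poisson_probs(mean_against, max_goals)
    win = draw = loss = 0.0
    for i, p_i in enumerate(p_for):
        for j, p_j in enumerate(p_against):
            joint = p_i * p_j
            if i > j:
                win += joint
            elif i == j:
                draw += joint
            else:
                loss += joint
    return win, draw, loss


if __name__ == "__main__":
    win, draw, loss = match_outcome_probs(1.6, 1.1)
    print(f"win {win:.3f}  draw {draw:.3f}  loss {loss:.3f}")
    # "Win%" counting a draw as half a win:
    print(f"win% {win + 0.5 * draw:.3f}")
```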
     
  20. voros

    voros Member

    Jun 7, 2002
    Parts Unknown
    Nat'l Team:
    United States
    John, I've found that the following formula is pretty close (as close as I can get so far):

    Win% = (wins + (draws*0.5))/ games
    Gf = Goals for
    Ga = Goals against
    C = 1st constant = .307158 (I have no idea how many decimals are needed to get it accurate)
    K = 2nd constant = .432928

    Poisson win% = (approximately) = ((Gf + C) ^ ((Gf+Ga)^K)) / (((Gf + C) ^ ((Gf+Ga)^K)) + ((Ga + C) ^ ((Gf+Ga)^K)))

    Break it out by individual parens and it will be easier to see what's going on.

    The formula takes into account not just a ratio of goals scored over total goals raised to an exponent, but also allows the exponent to float based on scoring environment.
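
    Written out as code, the approximation above might look like this minimal sketch. The function name is made up for illustration; the constants are the ones quoted in the post, passed as defaults so that refitted values can be substituted.

```python
def approx_poisson_win_pct(goals_for: float, goals_against: float,
                           c: float = 0.307158, k: float = 0.432928) -> float:
    """Approximate win% (draws counted as half a win) from goals for/against.

    The exponent floats with the scoring environment: it is the total
    number of goals raised to the power k.
    """
    exponent = (goals_for + goals_against) ** k
    strength_for = (goals_for + c) ** exponent
    strength_against = (goals_against + c) ** exponent
    return strength_for / (strength_for + strength_against)


if __name__ == "__main__":
    for gf, ga in [(1, 1), (1, 0), (2, 0), (2, 1)]:
        print(f"{gf}-{ga}: {100 * approx_poisson_win_pct(gf, ga):.1f}%")
```

    Equal scores always come out at exactly 50%, because the two "strength" terms are then identical whatever the constants are.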

    Sample scores for each compared to one another:
    Code:
    Score  Poiss   Math
    Ties   50.0%  50.0%
    1-0    81.6%  81.0%
    2-0    93.2%  93.8%
    2-1    71.2%  71.4%
    3-0    97.5%  97.9%
    3-1    84.1%  84.4%
    3-2    66.9%  67.3%
    
    And so forth

    Now one possible mistake I made is I best fit those constants counting each possible scoreline as a single data point. I compiled a total of around 50 different scorelines that occurred in international matches. You could argue that I should count 1-0 as many more datapoints than say 7-0 since 1-0 is a far more common scoreline. The end result of doing it that way would likely make the predictions for normal scores a bit more accurate and for the more lopsided scores less accurate.

    The good thing is I'd only have to change the constants, the formula itself should remain the same. Also the amount of accuracy gained would be minor I think.
     
  21. voros

    voros Member

    Jun 7, 2002
    Parts Unknown
    Nat'l Team:
    United States
    John a quick update. Weighting the best fit test to count 1-0 scores far more heavily than less common scores like 10-1, the two constants are:

    C = 1st Constant = .297464
    K = 2nd Constant = .41427

    The above sample would now look like

    Code:
    Score  Poiss   Math
    Ties   50.0%  50.0%
    1-0    81.6%  81.4%
    2-0    93.2%  93.8%
    2-1    71.2%  71.1%
    3-0    97.5%  97.8%
    3-1    84.1%  84.2%
    3-2    66.9%  66.9%
    The downside is that as the scores become more and more lopsided, the formula increasingly underrates the winning team (though the difference between 99.94% and 99.77% might not be that big of a deal for use in most systems).
     
  22. CaptainJack

    CaptainJack New Member

    Aug 2, 2006
    How do you calculate which team will score first if you have a decent average of how many goals the home team and away team will score, and in which 15-minute interval it will be scored?

    I have tried some methods on my own and compared the probabilities to what the sportsbooks have, but I am miles away from their valuation.

    Could someone please give me some help here?
     
  23. tachyon1

    tachyon1 Member

    Apr 23, 2004
    Hi CJ,

    The negative binomial distribution's a good place to start. You can run it in Excel or OpenOffice.
    Essentially you're looking at the probability of x number of failures before you get a success.

    Say a team has a goal expectancy of 1.2 goals/game. Divide that by 92 to get the probability of scoring in any minute of the game. You get 0.0130434.

    So the probability of scoring their first goal in the first 60 seconds is 0.0130434.

    The prob of scoring their first goal in the second 60 seconds is
    (1-0.0130434)*(0.0130434), in other words the prob of NOT scoring in the first minute multiplied by the prob of scoring in the second minute.

    and so on and so on.

    Excel's quicker. The formula's something like =NEGBINOMDIST(0;1;0.0130434) for the first minute, =NEGBINOMDIST(1;1;0.0130434) for the second, =NEGBINOMDIST(2;1;0.0130434) for the third etc. (the first argument is the number of scoreless minutes before the goal).

    You want the cumulative probability, not each individual minute's probability, so the last step is to total them up.

    For this example the cumulative total crosses 50% between 52 and 53 minutes.

    time    individual    cumulative
    52      0.0066772     0.494759
    53      0.0065901     0.501349

    So at that point you're more likely to have scored than not, and that's the time you take as your typical first goal time (strictly speaking it's the median, not the single most likely minute).
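
    The same calculation can be done outside a spreadsheet. The sketch below is one way to reproduce the figures above, assuming (as in the post) a constant per-minute scoring probability of goal expectancy divided by 92; with one required "success" the negative binomial reduces to a geometric distribution, which is what the loop computes. The function names are illustrative.

```python
def first_goal_table(goal_expectancy: float, minutes: int = 92) -> list:
    """Rows of (minute, P(first goal in that minute), cumulative probability).

    Assumes the same scoring probability in every minute.
    """
    p = goal_expectancy / minutes
    rows = []
    cumulative = 0.0
    for minute in range(1, minutes + 1):
        # No goal in the previous (minute - 1) minutes, then a goal.
        individual = (1.0 - p) ** (minute - 1) * p
        cumulative += individual
        rows.append((minute, individual, cumulative))
    return rows


def median_first_goal_minute(goal_expectancy: float, minutes: int = 92):
    """First minute at which the cumulative probability reaches 50%.

    Returns None if a goal is still less likely than not at full time.
    """
    for minute, _, cumulative in first_goal_table(goal_expectancy, minutes):
        if cumulative >= 0.5:
            return minute
    return None


if __name__ == "__main__":
    for minute, individual, cumulative in first_goal_table(1.2)[51:53]:
        print(minute, round(individual, 7), round(cumulative, 6))
    print(median_first_goal_minute(1.2))   # 53
    print(median_first_goal_minute(0.5))   # None: never reaches 50% in 92 minutes
```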

    That's the bare bones,but in reality it needs tweaking.

    Low goal expectancies quickly give 1st goal times in excess of 90, plus your chance of scoring a goal isn't constant. Scoring increases as the game progresses. So in this model, in high goal-scoring environments the first goal comes too quickly; in low ones, too late.

    If you address these problems you should get something like this.

    Goal expectancy    1st goal time
    0.5                72
    0.7                67
    0.8                64
    1.0                60
    1.2                56
    1.5                50
    1.7                47
    2.0                43
    2.2                41
    2.5                36
    2.8                33
    3.0                32
    3.5                29

    Here's a comparison I did between actual 1st goal times and those predicted from end-of-season goal averages for the EPL, 2004/2005 season iirc.

    Actual average first goal time in minutes is followed after the slash by the predicted first goal time using the above model.

    Team       Home scored   Home conceded   Away scored   Away conceded
    Chelsea    46/46         77/85           47/44         68/71
    Arsenal    30/33         63/60           55/46         71/62
    Manu       46/49         68/68           54/53         65/66
    Lpool      50/49         61/65           61/59         46/51
    Everton    58/55         58/65           61/59         50/49
    Bolton     52/54         56/60           46/53         54/55
    Mbro       53/51         59/60           57/56         54/52
    ManC       47/55         63/65           51/55         49/53
    Totten     51/45         55/57           70/70         59/60
    AVilla     53/53         57/61           54/60         35/43
    Charl      40/50         53/50           68/67         62/50
    Birming    53/55         69/66           60/63         46/48
    Fulham     56/50         56/53           63/57         42/45
    Newc       52/54         54/54           53/56         51/48
    Blackb     52/58         54/57           70/70         64/58
    Pmouth     44/48         39/53           58/67         45/46
    WBA        65/63         60/56           57/60         38/43

    One thing to note: the agreement between Chelsea's actual (77) and predicted (85) isn't great. That's because Chelsea conceded only 6 home goals (only 5 were a 1st goal, because Bolton scored twice), so the time of the first goal conceded in their many games without conceding is taken as 90, which is obviously artificial.

    T.
     
  24. tachyon1

    tachyon1 Member

    Apr 23, 2004
    Slightly more time so here's a few more little pointers.

    The major problem when teams don't have a high goal expectation is the goalless games. A negbino approach gives 1st goal times in excess of 90 minutes.

    Here's a workaround.

    Say a team only scores 0.5 goals per game. Stick that in a Poisson and you've got around a 60% chance of that team scoring zero goals. Over a 38-game EPL season that's about 23 games. If that's the case, those 23 games will, in the end-of-season stats, have a 1st goal time for that team of 90 mins (or 93, depending on how you're going to treat injury time).

    Assume next that the 19 goals they did score were scored in the remaining 15 games. That's 1.267 goals/game. Stick that figure into a negbino, further assuming that goals are equally likely in any minute (again untrue), and you get the time for the first goal at about 50 minutes.

    Therefore in total you have 23 games with first goal time at 90 and 15 games with first goal time at 50.

    Average those and you get an average 1st goal time of 74 minutes for a team with a goal expectancy of 0.5. That's about a minute later than teams of that profile actually score their 1st goal.
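
    The arithmetic of this workaround can be sketched as below. It follows the steps in the post, with a few assumptions filled in for the sake of a runnable example: the number of goalless games is rounded to a whole number, goalless games are assigned a first goal time of 90, and the "about 50 minutes" figure is taken as the minute where the constant-rate cumulative probability reaches 50% over a 92-minute game (the divisor used in the earlier post). The function name and return format are hypothetical.

```python
from math import ceil, exp, log


def average_first_goal_time(goal_expectancy: float, games: int = 38,
                            minutes: int = 92, blank_time: int = 90) -> dict:
    """Blend goalless games with a constant-rate model for the scoring games."""
    # Poisson probability of zero goals, scaled to a whole number of games.
    goalless_games = round(games * exp(-goal_expectancy))
    scoring_games = games - goalless_games
    if scoring_games == 0:
        raise ValueError("goal expectancy too low: no scoring games expected")

    # All the season's goals are assumed to fall in the scoring games.
    rate_when_scoring = goal_expectancy * games / scoring_games
    p = rate_when_scoring / minutes

    # Minute at which 1 - (1 - p)**n first reaches 50%.
    median_minute = min(ceil(log(0.5) / log(1.0 - p)), blank_time)

    average = (goalless_games * blank_time
               + scoring_games * median_minute) / games
    return {
        "goalless_games": goalless_games,
        "scoring_games": scoring_games,
        "median_minute": median_minute,
        "average_first_goal_time": average,
    }


if __name__ == "__main__":
    result = average_first_goal_time(0.5)
    print(result)   # 23 goalless games, 15 scoring games, average about 74
```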

    The next fudge is at the other extreme.

    Say teams score 2 goals/game. Over an EPL season goalless games account for only 5 games now, so that isn't the area where the major errors will occur.

    Now you need to concentrate on more accurately defining a team's goal expectancy for every one-minute section of a game.

    To do this you need to know that the goal expectancy for the remainder of a game is the initial goal expectancy multiplied by the (proportion of the game remaining) raised to 0.83.

    So to work out the goal expectancy for the first minute, work out the goal expectancy for the last 89 minutes (ignoring injury time) and subtract this from the initial expectancy.

    For an initial GE of 1.0 it'll be 0.00923 of a goal for the 1st minute.

    Then work the GE for just the second minute.
    (Work out GE for the last 88 minutes,add the GE you already calculated for the 1st minute and subtract from the initial whole game GE)

    And the 3rd and so on.

    You can't now just stick all these separate, but more realistic, individual GEs into a negbino because that demands a constant probability of success or failure. But you are on the road to improving your model for an elevated goal environment.
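
    The per-minute goal expectancies described above can be generated as in the sketch below, using the rule from the post (remaining GE = initial GE times the proportion of the game remaining, raised to 0.83) over 90 minutes. The last function goes one step beyond the post and is purely an assumption: it treats goals as a Poisson process with a time-varying rate, so that the chance of having scored by a given minute depends only on the GE used up so far. Function names are illustrative.

```python
from math import exp


def remaining_ge(initial_ge: float, minutes_played: int,
                 match_minutes: int = 90, power: float = 0.83) -> float:
    """Goal expectancy left after `minutes_played` minutes."""
    fraction_left = (match_minutes - minutes_played) / match_minutes
    return initial_ge * fraction_left ** power


def per_minute_ge(initial_ge: float, match_minutes: int = 90,
                  power: float = 0.83) -> list:
    """Goal expectancy for each individual minute, 1..match_minutes."""
    return [
        remaining_ge(initial_ge, m - 1, match_minutes, power)
        - remaining_ge(initial_ge, m, match_minutes, power)
        for m in range(1, match_minutes + 1)
    ]


def prob_scored_by(initial_ge: float, minute: int,
                   match_minutes: int = 90, power: float = 0.83) -> float:
    """P(at least one goal by `minute`), ASSUMING a time-varying Poisson process."""
    used_ge = initial_ge - remaining_ge(initial_ge, minute, match_minutes, power)
    return 1.0 - exp(-used_ge)


if __name__ == "__main__":
    ge = per_minute_ge(1.0)
    print(round(ge[0], 5))    # first minute, about 0.00923 as in the post
    print(round(ge[-1], 5))   # last minute is noticeably higher
    print(round(sum(ge), 6))  # the minutes add back up to the initial GE
```

    Because the per-minute values are differences of the "remaining" curve, they always sum back to the initial goal expectancy, which is a handy sanity check.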

    T
     
  25. arrplayr

    arrplayr New Member

    Sep 14, 2006
    Re: Halftime/Fulltime and Halftime results

    Hi,
    I am working on an application for a bookmaker and I need guidance on two things.
    1. How can I calculate (set) the odds for Halftime/Fulltime (1-1, 1-X, 1-2...) lines knowing only the Home/Draw/Away odds?
    2. The same question for the Halftime result.
    I have managed to find the formula for Double Chance lines but I am not able to find the formula for the above two lines.
    Here is a quick example of one of bookies odds (perhaps you can help me with these exact odds)
    1 = 3.35
    X = 3.10
    2 = 2.00

    Halftime: 1 = 4.00; X = 1.93; 2 = 2.60
    HalfTime/Fulltime: 1-1 = 6.15; 1-X = 13.0; 1-2 = 30; X-1 = 7.25; X-X = 4.85; X-2 = 4.75; 2-1 = 35.00; 2-X = 13.00; and 2-2 = 3.35.

    Even if there is no formula for this, I would very much appreciate it if you could just point me in the right direction. THANKS.
     
