Using Elo Ratings to Predict the Hex

Discussion in 'USA Men: News & Analysis' started by Maximum Optimal, Oct 20, 2012.

  1. Maximum Optimal

    Maximum Optimal Member

    Joined:
    Jul 10, 2001
    I'm going to look at the Hexes for the last 3 WCs to see how well the final standings correlated with the teams' Elo Ratings as of the end of the semifinal rounds.

    Let's start with qualifying for the 2010 WC. I give two numbers in parentheses. The first is the points the team gained in the Hex and the second is their Elo Rating as of the end of the semifinal rounds.

    2010
    US (20) (1791)
    Mex (19) (1814)
    Hond (16) (1706)
    CR (16) (1682)
    ES (8) (1471)
    T&T (6) (1568)

    Elo did pretty well here in distinguishing between the front-runners (US and Mexico), the middle teams (Honduras and Costa Rica) and the weaklings (ES and T&T). There was a mild upset in the form of the US finishing ahead of Mexico and a bit more of an upset from El Salvador finishing ahead of T&T.

    Moving on to the 2006 WC.

    2006
    Mex (22) (1844)
    US (22) (1804)
    CR (16) (1670)
    T&T (13) (1473)
    Guat (11) (1581)
    Pan (2) (1495)

    Here there was one upset: T&T's Cinderella run all the way to the WC. Otherwise, no significant surprises.

    2002
    CR (23) (1579)
    Mex (17) (1848)
    US (17) (1768)
    Hon (14) (1689)
    Jamaica (8) (1536)
    T&T (5) (1639)

    Here we get an even bigger surprise from Costa Rica. They outperformed their rating by a huge amount (with the result that their Elo Rating rose almost 200 points over the course of the Hex). Notable results for CR included road wins at Honduras, T&T, Jamaica and Mexico! And T&T was a significant underperformer.

    Let's now take a look at the ratings going into the 2014 Hex:

    2014
    Mex (1894)
    US (1744)
    Pan (1668)
    Hon (1664)
    CR (1648)
    Jam (1624)

    Based on these ratings, this could be a very competitive Hex. The #6 team has a significantly higher rating than in the prior three Hexes. The US's gap over the #6 team is 120 points. In the prior three Hexes the gap was 320, 331, and 232 (2010, 2006, and 2002). In this Hex our gap over the #4 and #5 teams is less than 100. In prior Hexes we enjoyed a bigger ratings edge over those teams. Finally, Mexico starts out this Hex with a seemingly much bigger gap over us and the rest of the competition than in the prior three Hexes.
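
    The gap figures can be re-derived from the ratings listed in this post. A minimal Python sketch, assuming "the #6 team" means the lowest-rated of the six entrants in each cycle (the names `RATINGS`, `us_gap_over_lowest` and `mexico_gap_over_us` are illustrative only):

```python
# Pre-Hex Elo ratings as listed in this post, keyed by World Cup cycle.
RATINGS = {
    2010: {"US": 1791, "Mex": 1814, "Hon": 1706, "CR": 1682, "ES": 1471, "T&T": 1568},
    2006: {"Mex": 1844, "US": 1804, "CR": 1670, "T&T": 1473, "Guat": 1581, "Pan": 1495},
    2002: {"CR": 1579, "Mex": 1848, "US": 1768, "Hon": 1689, "Jam": 1536, "T&T": 1639},
    2014: {"Mex": 1894, "US": 1744, "Pan": 1668, "Hon": 1664, "CR": 1648, "Jam": 1624},
}

def us_gap_over_lowest(cycle):
    """US rating minus the lowest rating in that cycle's Hex."""
    ratings = RATINGS[cycle]
    return ratings["US"] - min(ratings.values())

def mexico_gap_over_us(cycle):
    """Mexico's rating minus the US rating in that cycle's Hex."""
    ratings = RATINGS[cycle]
    return ratings["Mex"] - ratings["US"]

for cycle in RATINGS:
    print(cycle, us_gap_over_lowest(cycle), mexico_gap_over_us(cycle))
```

    The second column shows the US edge over the weakest entrant shrinking for 2014, and the third shows Mexico's edge over the US growing.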


  2. sidefootsitter

    sidefootsitter Member+

    Joined:
    Oct 14, 2004
    Mexico underwent their semi-regular coaching changes in the middle of the qualifying.

    Aguirre for Meza 2001, Aguirre for Svennis 2009. They did stick with Lavolpe in 2005.
  3. Maximum Optimal

    Maximum Optimal Member

    Joined:
    Jul 10, 2001
    A big reason for Costa Rica's great run in 2001 was a pair of very good and in-form forwards--Wanchope and Fonseca. I think if a team is to outperform, the most likely reason is going to be a hot goal scorer or two.

    The schedule can make a difference too. You want some of the weaker teams late when they might be out of it and demoralized. But with this Hex's fairly evenly matched group and four teams potentially moving forward, everyone might still be fighting to the end.
    EruditeHobo and 22SteveD repped this.
  4. Neuwerld

    Neuwerld Member

    Joined:
    Oct 15, 2007
    Location:
    California
    Club:
    San Jose Earthquakes
    Wow, those ELO ratings are exactly how I'd predict the hex if I had to, gaps and all. Panama and Honduras neck and neck, with Costa Rica and Jamaica slightly behind.

    44 points separating 3rd and 6th really shows how close it is though. It's tough to predict.


  5. EvanJ

    EvanJ Member

    Joined:
    Mar 30, 2004
    Location:
    Nassau County, NY
    Club:
    Manchester United FC
    Country:
    United States
    I entered the 18 data points from those three Hexagonals in my calculator and got a correlation of 0.706 and a regression equation of Hexagonal points = 0.035 * ELO points - 43.286. I did the same thing using the FIFA Rankings and got an almost identical correlation of 0.705 and a regression equation of Hexagonal points = 0.0334 * FIFA points - 6.581.
    When I made my predictions for the Semifinals I used a combination of FIFA and ESPN's SPI. For the Hexagonal I'm going to use those two and ELO.
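
    For anyone who wants to re-run the Elo side of this without a calculator, here is a minimal Python sketch. It assumes the 18 data points are the (Hex points, pre-Hex Elo) pairs from the first post; the function name `pearson_and_ols` is made up for this example, and the FIFA figures are not included because the thread does not list them.

```python
from math import sqrt

# (Hex points, pre-Hex Elo rating) pairs from the first post: 2010, 2006, 2002.
DATA = [
    (20, 1791), (19, 1814), (16, 1706), (16, 1682), (8, 1471), (6, 1568),
    (22, 1844), (22, 1804), (16, 1670), (13, 1473), (11, 1581), (2, 1495),
    (23, 1579), (17, 1848), (17, 1768), (14, 1689), (8, 1536), (5, 1639),
]

def pearson_and_ols(pairs):
    """Return (r, slope, intercept) for points = slope * elo + intercept."""
    n = len(pairs)
    mean_pts = sum(p for p, _ in pairs) / n
    mean_elo = sum(e for _, e in pairs) / n
    sxy = sum((e - mean_elo) * (p - mean_pts) for p, e in pairs)
    sxx = sum((e - mean_elo) ** 2 for _, e in pairs)
    syy = sum((p - mean_pts) ** 2 for p, _ in pairs)
    slope = sxy / sxx
    intercept = mean_pts - slope * mean_elo
    return sxy / sqrt(sxx * syy), slope, intercept

r, slope, intercept = pearson_and_ols(DATA)
print(f"r = {r:.3f}; Hexagonal points = {slope:.4f} * Elo {intercept:+.3f}")
```

    Run on those pairs, this should land close to the correlation and regression line quoted above.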
  6. Reccossu

    Reccossu Member

    Joined:
    Jan 31, 2005
    Location:
    Birmingham
    I assume the Elo can be used to Monte Carlo the hex results? Need a home/away adjustment probably. Also in Elo a draw is half a win, not 1/3, so that might complicate things.
  7. TrueCrew

    TrueCrew Member+

    Joined:
    Dec 22, 2003
    Location:
    Columbus, OH
    Club:
    Columbus Crew
    Country:
    United States
    What, did my post get deleted?

    I posted earlier that the FIFA rankings (once the November ones come out) came pretty close as well.

    Here are Edgar's predictions for November:
    1. Mexico, 14th, 984 pts
    2. USA, 27th, 776 pts
    3. Panama, T46th, 609 pts
    4. Jamaica, 50th, 586 pts
    5. Honduras, 56th, 572 pts
    6. Haiti, T57th, 553 pts
    7. Canada, 60th, 526 pts
    8. Costa Rica, 64th, 509 pts
    9. Guatemala, 76th, 467 pts
    10. Trinidad & Tobago, 79th, 446 pts
    11. El Salvador, 92nd, 402 pts
    T12. St. Kitts & Nevis, T104th, 327 pts
    T12. Dominican Republic, T104th, 327 pts
    14. Antigua & Barbuda, 120th, 281 pts
    15. Suriname, 124th, 265 pts
    16. Guyana, 126th, 264 pts
    17. Puerto Rico, 128th, 248 pts
    =======================

    FIFA got 5/6 correct, with Haiti being the only outlier, and they've been snapping up points as a result of Caribbean Cup qualifiers taking place (as have T&T and some others). The Caribbean nations will be in more of the correct order after the top teams have also played their Caribbean Cup matches, and more in line after the Copa Centroamericana as well.

    Further, if one looks at the groups as constituted, FIFA correctly picked who would advance out of every group.

    Group A: USA (27) and Jamaica (50) over Guatemala (76) and Antigua & Barbuda (120)
    Group B: Mexico (14) and Costa Rica (64) over El Salvador (92) and Guyana (126)
    Group C: Honduras (56) and Panama (46) over Canada (60) and Cuba (139)

    A strange thing happened this time around: the best six teams made the Hex. ELO had it nailed. FIFA even came close, but it is subject to more monthly swings than ELO, and the Caribbean Qualifiers happening at the same time messed things up a bit. But still, FIFA correctly got who would come out of each group.

    Point being, I don't think the results necessarily validate the rankings as much as the fact that we had six teams that were just a bit better than everyone else, and they all made it through. No upsets this time. Guatemala, El Salvador, and Canada made it close, and maybe if the groups were different (Canada in Group B), we'd have had a different result, but I doubt it.
  8. az2004

    az2004 Member

    Joined:
    Jun 5, 2012
    the nature of the schedule will be friendly when mexico actually clinches bid for brasil

    they may test a few players, opening the door for a draw or even a loss not predicted

    i think the last matchday panama, cr and honduras will be playing to go to brasil

    and usa will have clinched 4th but playing for an automatic
  9. dlokteff

    dlokteff Member+

    Joined:
    Jan 22, 2002
    Location:
    San Francisco, CA
    O.K. I tried to do this.

    The home/road is no problem, as the Elo win probability has an adjustment built in for this.

    Here are the USA probabilities right now:
    @ Mex = 0.192
    v. Mex = 0.429
    @ Pan = 0.466
    v. Pan = 0.734
    @ Hon = 0.471
    v. Hon = 0.738
    @ CR = 0.494
    v. CR = 0.756
    @ JAM = 0.529
    v. JAM = 0.780

    So we are actually underdogs in half the games, slight though that might be. This truly will be a Hex where the "tie on the road, win at home" mantra is the reality.
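
    These figures can be reproduced with the usual logistic Elo win-expectancy formula, 1 / (10^(-dr/400) + 1), where dr is the rating difference after a 100-point home bonus. The post does not spell the formula out, so treat the sketch below as an assumption that happens to match the numbers above when fed the pre-Hex ratings from the first post. The function and constant names are illustrative.

```python
HOME_ADVANTAGE = 100  # Elo points credited to the home side (assumed)

def win_expectancy(rating, opp_rating, at_home):
    """Expected score (win = 1, draw = 0.5, loss = 0) for the team with `rating`."""
    bonus = HOME_ADVANTAGE if at_home else -HOME_ADVANTAGE
    dr = rating - opp_rating + bonus
    return 1.0 / (10 ** (-dr / 400) + 1.0)

US = 1744
OPPONENTS = {"Mex": 1894, "Pan": 1668, "Hon": 1664, "CR": 1648, "Jam": 1624}

for name, opp in OPPONENTS.items():
    away = win_expectancy(US, opp, at_home=False)
    home = win_expectancy(US, opp, at_home=True)
    print(f"@ {name} = {away:.3f}    v. {name} = {home:.3f}")
```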

    The draw is a big issue, not so much the 0.5 vs. 1/3 issue, but the fact that the Elo prob. is an expected value. That is, a 0.75 prob. can derive from 0.75*(1, win) + 0.25*(0, loss) = 0.75, or 0.5*(1, win) + 0.5*(0.5, draw) = 0.75, or any combination in between.
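
    To see why the split matters for the standings, here is a small Python sketch. The three splits are made-up examples that all produce the same 0.75 Elo expected score, yet they are worth different amounts once a win pays 3 points and a draw pays 1.

```python
def expected_score(p_win, p_draw):
    """Elo-style expected score: a win counts 1, a draw 0.5, a loss 0."""
    return p_win + 0.5 * p_draw

def expected_points(p_win, p_draw):
    """League points per match: 3 for a win, 1 for a draw."""
    return 3 * p_win + p_draw

# Illustrative (win, draw) splits; the loss probability is whatever is left over.
SPLITS = {"no draws": (0.75, 0.0), "no losses": (0.5, 0.5), "mixed": (0.65, 0.2)}

for label, (p_win, p_draw) in SPLITS.items():
    score = expected_score(p_win, p_draw)
    points = expected_points(p_win, p_draw)
    print(f"{label}: expected score {score:.2f}, expected points {points:.2f}")
```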

    So, you have to make some assumption about the distribution of wins vs. draws that gets you to the corresponding Elo prob. (expected value). I played around with it, and the results shown here are based on a 3/4 win, 1/4 draw split when the Elo prob. is >.6666, and 1/2 and 1/2 when the prob. is between .3333 and .6666. Someone smarter than me might be able to do better (or someone with a better tool than just Excel), but to show this yields a fairly reasonable distribution, here is one 1,000-simulation run of the US results:

    W-D-L
    @ Mex = 70-227-703
    v. Mex = 241-379-380
    @ Pan = 324-295-381
    v. Pan = 611-229-160
    @ Hon = 315-317-368
    v. Hon = 633-216-151
    @ CR = 344-318-338
    v. CR = 636-208-156
    @ JAM = 356-345-299
    v. JAM = 671-221-108
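
    A whole-Hex Monte Carlo along these lines can be sketched in Python. This is not the Excel model described above: the draw rule here (a flat 25% draw share, shrunk where needed so no probability goes negative) is a simplifying assumption picked because it keeps each match's Elo expected value intact. The ratings come from the first post; the 100-point home bonus, the logistic formula and all names are assumptions for illustration, so the counts it produces will differ from the run shown above.

```python
import random

RATINGS = {"Mex": 1894, "US": 1744, "Pan": 1668, "Hon": 1664, "CR": 1648, "Jam": 1624}
HOME_ADVANTAGE = 100   # assumed Elo home bonus
DRAW_SHARE = 0.25      # assumed draw probability (illustrative, not from the post)

def win_expectancy(home, away):
    """Expected score of the home side (win = 1, draw = 0.5, loss = 0)."""
    dr = RATINGS[home] - RATINGS[away] + HOME_ADVANTAGE
    return 1.0 / (10 ** (-dr / 400) + 1.0)

def outcome_probs(we, draw_share=DRAW_SHARE):
    """Split an expected score into (win, draw, loss) without changing it."""
    p_draw = min(draw_share, 2 * we, 2 * (1 - we))
    p_win = we - p_draw / 2
    return p_win, p_draw, 1 - p_win - p_draw

def simulate_hex(rng):
    """Play one double round-robin and return the points per team."""
    points = dict.fromkeys(RATINGS, 0)
    for home in RATINGS:
        for away in RATINGS:
            if home == away:
                continue
            p_win, p_draw, _ = outcome_probs(win_expectancy(home, away))
            u = rng.random()
            if u < p_win:
                points[home] += 3
            elif u < p_win + p_draw:
                points[home] += 1
                points[away] += 1
            else:
                points[away] += 3
    return points

rng = random.Random(2012)  # fixed seed so a run can be repeated
runs = [simulate_hex(rng) for _ in range(1000)]
average = {team: sum(run[team] for run in runs) / len(runs) for team in RATINGS}
for team, pts in sorted(average.items(), key=lambda item: -item[1]):
    print(f"{team}: {pts:.1f} points on average")
```

    Swapping in a different `outcome_probs` is the place to try other win/draw assumptions.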

    Here is the resulting probability of finishing in the Top 3 (when teams were tied for 4th I gave them 50% chance to be in Top3, or if three teams tied for 2nd, they got 66.66% chance, etc.):

    Mexico = 98.3%;

    wow, they really are far out ahead.

    USA = 73.3%;

    would like this to be higher, but goes to show this is not a lock with 1 team ahead and 4 below not that far behind. Not a lock to qualify.

    Panama = 37.7%
    Honduras = 35.8%
    Costa Rica = 30.7%
    Jamaica = 24.1%

    A dogfight for the final spot.
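
    The tie handling described above (split credit when teams are level on points around the cut line) can be written as a small helper. A Python sketch, reading the "50%" case as two teams level for the last top-three place, which is an interpretation of the post; the function name and the two final tables are hypothetical.

```python
def top_n_credit(points, team, slots=3):
    """Share of a top-`slots` finish credited to `team`, splitting ties evenly."""
    mine = points[team]
    above = sum(1 for p in points.values() if p > mine)
    tied = sum(1 for p in points.values() if p == mine)  # includes `team` itself
    if above >= slots:
        return 0.0          # every qualifying place is taken by teams ahead
    if above + tied <= slots:
        return 1.0          # the whole tied group fits inside the top `slots`
    return (slots - above) / tied

# Hypothetical final tables, for illustration only.
two_level_for_third = {"Mex": 22, "US": 18, "Pan": 15, "Hon": 15, "CR": 10, "Jam": 8}
three_level_for_second = {"Mex": 22, "US": 16, "Pan": 16, "Hon": 16, "CR": 10, "Jam": 8}

print(top_n_credit(two_level_for_third, "Pan"))
print(top_n_credit(three_level_for_second, "US"))
```

    Averaging this credit over all simulated tables gives a top-three percentage per team.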

    Here's how the USA placed in the Hex (if they were tied it shows them in that place):
    1st = 134
    2nd = 419
    3rd = 205
    4th = 134
    5th = 68
    6th = 40.

    There's a few times there that the 4th place will be a tie and we might be out, but it's pretty close to a 90% chance then, that we at least make the play-off.
    Ghosting, Quaker, dcole and 8 others repped this.
  10. EvanJ

    EvanJ Member

    Joined:
    Mar 30, 2004
    Location:
    Nassau County, NY
    Club:
    Manchester United FC
    Country:
    United States
    Based on that our average probability for the ten games is .5589. We're only favorites in half but we have greater than a 0.7 for four out of five games we are favored in and we have less than a 0.4 chance for only one game. All four of our away games not at Mexico are relatively even with at least 295 wins, draws, and losses each from 1,000 simulations.
    The difference between home and away probabilities for the five teams ranges from 0.237 (vs. Mexico) to 0.268 (vs. Panama). Is there a way of comparing this to the actual difference for previous Hexagonals and/or all previous games by any country in the last four years?

    Edit: If we lose at Mexico and get 1 win, 2 draws, and 1 loss from our other four away games that will be 5 points. If we get 3 wins and 2 draws at home (with 1 of the draws being against Mexico and another from a game we are favored in), that will be 11 points at home and 16 total. 16 most likely gives a team third or fourth.
  11. az2004

    az2004 Member

    Joined:
    Jun 5, 2012
    how does the 4th place fit into these projections??

    usa home and away with new zealand, what's the odds for usa winning this assuming usa is 4th
  12. az2004

    az2004 Member

    Joined:
    Jun 5, 2012
    i picked game by game and have us with 16 or 17 points...

    17 should give usa a slot in brasil, and i think 16 is more than enough for 4th
  13. dlokteff

    dlokteff Member+

    Joined:
    Jan 22, 2002
    Location:
    San Francisco, CA
    I'm not totally sure what you are asking here, but there is always the issue that you have to assume some distribution of wins, draws, and losses that gets you to the Elo prob. when trying to predict a result. My version of 0-.3333 = loss, .3333-.6666 = draw, and .6666-1 = win might not really be very good. Also a Hex's worth of results (or even one team's over a whole 4 years) is a pretty small sample size.

    I'd just say that MO's original post (edit: late rep given) gives a pretty good indication of Elo's accuracy. It's got the order somewhere between 1/2 and 5/6 correct, which is not bad, and accuracy would improve with larger sample (of course a larger sample isn't reality, the Hex (and our WC hopes!) is simply 10 games.)

    Actually 16 points is kind of the magic number. Your scenario above (or others like it) is very reasonable. No surprise then that in my Sim, it is the most likely outcome for the US:

    0 points = 0 times
    1 = 0
    2 = 0
    3 = 0
    4 = 1 ......JK is definitely fired for that one !!!
    5 = 2
    6 = 4
    7 = 11
    8 = 19
    9 = 23
    10 = 49
    11 = 62
    12 = 67
    13 = 68
    14 = 102
    15 = 98
    16 = 112
    17 = 94
    18 = 77
    19 = 51
    20 = 73
    21 = 30
    22 = 24
    23 = 11
    24 = 11
    25 = 7
    26 = 1
    27 = 2
    28 = 1
    29 = 0 (Damn, never ran the table)
    30 = 0

    16 points is much better than you thought though. It actually guaranteed advancement (in this Sim, not sure if it's a mathematical certainty or not). On 16 points, the USA finished in the following places over the 112 outcomes (possibly tied):

    1st = 6 times
    2nd = 80 times (I checked and this was never a three way tie, thus we advance)
    3rd = 26 times (2 times we were tied, so possible 4th place, but virtual lock)
    Worse = NEVER

    15 points is also very safe, as is 14. 13 starts to get dicey. % of finishing in at least tie for 3rd, given points:

    9 or less points = 0%
    10 = 2%
    11 = 14.5%
    12 = 38.8%
    13 = 58.8%
    14 = 91.2%
    15 = 96.9%
    16 or more points = 100%
    Sachsen repped this.
  14. dlokteff

    dlokteff Member+

    Joined:
    Jan 22, 2002
    Location:
    San Francisco, CA
    I ran 1000 Sims v. NZ and I got the following:

    6 points = 25.7%
    4 points = 37.0%
    3 points = 17.6%
    2 points = 9.0%
    1 point = 8.6%
    0 points = 2.1%

    If you assume 50% chance via the GD in the 3 and 2 point scenarios, the USA advances 76% of the time it faces the Kiwis.

    So it's something close to this
    Finish Top 3 (73.3%) + Finish 4th and Beat NZ (15%*76%) =

    84.7% likelihood of Brazil.

    Of course it's even higher if we face New Caledonia!
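
    The arithmetic behind the 76% and 84.7% figures, as a short Python sketch. Every input is a number quoted in this thread; the 50% tiebreak share and the rounded 15% chance of finishing 4th are the post's own assumptions, and the variable names are illustrative.

```python
# Simulated share of each two-leg points total against New Zealand (from the post).
PLAYOFF_POINTS = {6: 0.257, 4: 0.370, 3: 0.176, 2: 0.090, 1: 0.086, 0: 0.021}
TIEBREAK_WIN = 0.5   # the post's assumption when the tie goes to goal difference

# 6 or 4 points always advance; 3 (a win each) and 2 (two draws) go to a tiebreak.
p_beat_nz = (PLAYOFF_POINTS[6] + PLAYOFF_POINTS[4]
             + TIEBREAK_WIN * (PLAYOFF_POINTS[3] + PLAYOFF_POINTS[2]))

P_TOP3 = 0.733     # top-three probability from the Hex simulation
P_FOURTH = 0.15    # rounded chance of finishing 4th, as used in the post

p_brazil = P_TOP3 + P_FOURTH * p_beat_nz
print(f"beat NZ: {p_beat_nz:.1%}; reach Brazil: {p_brazil:.1%}")
```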
    dcole repped this.
  15. az2004

    az2004 Member

    Joined:
    Jun 5, 2012
    this modelling gives me pretty good confidence for brasil

    jamaica beating us for top 4 is very remote

    it seems CR, HOND and PANAMA need to all beat usa

    and PANAMA depth will eventually haunt them, i see panama getting too many ties to be top 3
  16. EvanJ

    EvanJ Member

    Joined:
    Mar 30, 2004
    Location:
    Nassau County, NY
    Club:
    Manchester United FC
    Country:
    United States
    I was asking whether the home/away difference of about 0.25 against the same opponent is close to what actual results show, or whether the true value is significantly higher or lower than 0.25. For that matter, I wonder what the overall win, draw, and loss percentages are for the home team in qualifiers (if good teams play a greater percentage of their friendlies at home, then including friendlies would hurt the data if the home teams are much better).
    16 points is not a mathematical certainty for the top three. It takes 22 points to do that. There could hypothetically be a four way tie for first with 21 points if each of the top four got all 12 points against the bottom two and 9 out of 18 points (half) against the rest of the top four.
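
    The four-way tie on 21 can be built explicitly. In the Python sketch below the team names are placeholders, and "the home side always wins" among the top four is just one convenient way to give each of them 9 of 18 points against the others. It only shows that 21 points is not enough for a guarantee; it does not prove that 22 always is.

```python
from itertools import permutations

TOP_FOUR = ["A", "B", "C", "D"]   # placeholder team names
BOTTOM_TWO = ["E", "F"]
TEAMS = TOP_FOUR + BOTTOM_TWO

points = dict.fromkeys(TEAMS, 0)
for home, away in permutations(TEAMS, 2):   # each ordered pair is one match
    if home in TOP_FOUR and away in TOP_FOUR:
        points[home] += 3                    # home win: 3 wins, 3 losses apiece
    elif home in TOP_FOUR or away in TOP_FOUR:
        winner = home if home in TOP_FOUR else away
        points[winner] += 3                  # top-four side sweeps the bottom two
    else:
        points[home] += 1                    # bottom two draw with each other
        points[away] += 1

print(points)
```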
  17. chad

    chad Member+

    Joined:
    Jun 24, 1999
    Location:
    chicago
    Country:
    United States
  18. dlokteff

    dlokteff Member+

    Joined:
    Jan 22, 2002
    Location:
    San Francisco, CA
    I got ya.

    The Eloratings site makes it egregiously difficult to download data from, so I can't do it for a large sample. But I see what you are getting at, wondering about the home field advantage accuracy, so I did calculate the Expected Value from Elo, vs. the actual results for all USA WCQ since 1996.

    This is a sample of 74 matches.

    Elo's overall accuracy is stunning. Its expected prob. for the USA is 0.726. The actual results for the USA in these 74 qualifiers... 0.723!

    However, Elo's Home/Road adjustments don't hold that well, at least for the USA in WCQ's. Not surprisingly, it's been tougher than Elo thinks to get Road points, and the US has just been Nails at home.

    At Home: Elo Pred. = 0.815; Actual = 0.905
    On Road: Elo Pred. = 0.637; Actual = 0.541

    So these larger home field advantages essentially cancel each other out in the USA's case.
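
    A quick consistency check on those figures, as a Python sketch. It assumes the 74 qualifiers split evenly into 37 home and 37 road matches, which the post does not state; under that assumption the home and road numbers average back to the overall 0.726 and 0.723.

```python
N_HOME = N_ROAD = 37   # assumption: the 74 matches split evenly

PREDICTED = {"home": 0.815, "road": 0.637}
ACTUAL = {"home": 0.905, "road": 0.541}

def overall(split):
    """Match-weighted average of the home and road figures."""
    total = split["home"] * N_HOME + split["road"] * N_ROAD
    return total / (N_HOME + N_ROAD)

def home_road_gap(split):
    """Home figure minus road figure."""
    return split["home"] - split["road"]

print(f"overall: predicted {overall(PREDICTED):.3f}, actual {overall(ACTUAL):.3f}")
print(f"home-road gap: predicted {home_road_gap(PREDICTED):.3f}, "
      f"actual {home_road_gap(ACTUAL):.3f}")
```

    On these numbers the actual home-road gap comes out roughly twice the predicted one.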

    It would be interesting to see how Elo's home/road adjustment holds up for other teams in our region in qualifiers, vs. how it does in friendlies, or how CONCACAF differs from other regions with respect to the homefield. If I could download the Elo data in a useable format, I'd do it...
  19. dlokteff

    dlokteff Member+

    Joined:
    Jan 22, 2002
    Location:
    San Francisco, CA
    I was thinking the same thing when I was doing this.

    Says last activity, yesterday at 6:30 PM.

    But he's probably got better things to do.
  20. EvanJ

    EvanJ Member

    Joined:
    Mar 30, 2004
    Location:
    Nassau County, NY
    Club:
    Manchester United FC
    Country:
    United States
    That's a home-road gap of 0.178 for the ELO predictions which is much less than the gap of about 0.25 for the predictions for the five Hexagonal opponents. Why is that?
  21. dlokteff

    dlokteff Member+

    Joined:
    Jan 22, 2002
    Location:
    San Francisco, CA
    The differences are driven by the fact the Elo's win expectancy is a logistic function.

    The home team is given 100 points. As to why they chose 100, I don't know, but I'm sure it was based on "some" historical data. If two teams are evenly matched (exact same Elo rating), the home team gets 100 extra, and their win expectancy calculates to 0.64; the opposite then for the road, 0.36. Since the function is steepest around zero, -100 or 100 yields this significant difference.

    But say we are facing Barbados. We had approximately a 500 Elo point advantage when we faced them in a Home and Home in June of '08 with the "World Cup on the Line." So the function has us +600 or +400. Since we are far from zero, out on the flat part of the curve (approaching an asymptote), the differences aren't so great: 0.969 at home, 0.909 on the road.

    The closer you are to an opponent in rating, the bigger the home road splits. That's why the biggest difference for the US in this Hex as you pointed out above is Panama (our closest Elo competitor) and the least is Mexico (biggest disparity). I'm trying to convince myself that there is some logic to this. It might make sense against a side like Barbados (or A&B). The difference is so wide; Elo recognizes a Minnow, and a real minnow doesn't have much soccer tradition, and isn't going to enjoy much home field advantage (unless they stripe an illegal field on a Cricket oval :D). When the game is expected to be tight though, the home crowd is into it, throwing bags of piss and batteries???? But mostly, it's just math.:sleep:
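
    The shrinking home/road split can be checked numerically. A minimal Python sketch, assuming the logistic win expectancy 1 / (10^(-dr/400) + 1) and the 100-point home bonus discussed here; the rating edges are taken from figures quoted in this thread and the function names are illustrative.

```python
def win_expectancy(dr):
    """Elo expected score for rating difference `dr` (home bonus already applied)."""
    return 1.0 / (10 ** (-dr / 400) + 1.0)

def home_road_gap(rating_edge, home_bonus=100):
    """Expected score at home minus expected score on the road."""
    at_home = win_expectancy(rating_edge + home_bonus)
    on_road = win_expectancy(rating_edge - home_bonus)
    return at_home - on_road

# US rating edge over each opponent (negative means the opponent is higher rated).
EDGES = [("even match", 0), ("Panama", 76), ("Mexico", -150), ("Barbados 2008", 500)]

for label, edge in EDGES:
    print(f"{label}: home-road gap = {home_road_gap(edge):.3f}")
```

    The gap is widest for an even match and narrows as the rating edge grows in either direction.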

    In my historical data, the Elo difference between the US and its opponents is much greater than what we are facing in this Hex (echoing the original post). We were heavy favorites usually, thus the home road differences are minimized, since the function expects us to get points either way.

    Side note: While we were generally bigger favorites in those days, there were some interesting times. I certainly was surprised by this one: 3/16/1997: USA played Canada at home. Canada, CANADA :eek:, was the higher rated team 1626 to 1618. USA punked 'em 3-0 and the two programs went their separate ways.
    blacksun repped this.
  22. ImaPuppy

    ImaPuppy Member+

    Joined:
    Aug 10, 2009
    Location:
    Right behind you...
    Club:
    Houston Dynamo
    Country:
    American Samoa
    Just wanted to chime in to say that this is very thought-provoking work. Nice job MO!
  23. Maximum Optimal

    Maximum Optimal Member

    Joined:
    Jul 10, 2001
    One more interesting point to emphasize. On average during the last three Hexagonals, there has been one team that has significantly underperformed or overperformed. T&T in the 2006 cycle. CR in the 2002 cycle. T&T in the 2002 cycle. Two significant overperformances and one significant underperformance. In my view (and this is not backed up by any research) overperformances in these types of competitions will mainly tend to be due to having a hot goal scorer or two. I suppose a hot goalie could have a similar effect. Underperformances are more likely to be due to collective issues within a team--lack of team unity, clashes between groups of players, a coach that isn't right for the team. Something to think about.
  24. sidefootsitter

    sidefootsitter Member+

    Joined:
    Oct 14, 2004
    Anyone tried a prediction/calculation based on the upward or downward point movement over, say, the last year before the beginning of the competition?

    PS. As I recall, Voros is doing some non-soccer stat work for ESPN, which chose SPI as its model anyway. In this case, a gentlemanly thing to do is to sit this out.
  25. Suyuntuy

    Suyuntuy Member+

    Joined:
    Jul 16, 2007
    Location:
    Vancouver, Canada
    That's my general observation: the USA-MEX gap seems to be growing, and the USA-top UNCAF teams gap seems to be diminishing.

    A process that started right after the WC, in the Gold Cup already. Even if the results are still more or less the same, what we see on the field causes concern.

    It may not be significant, but this drop has coincided with Donovan's phasing-out.
