Elo Ratings

Discussion in 'Women's International' started by soccersubjectively, Aug 22, 2016.

  1. soccersubjectively

    soccersubjectively BigSoccer Supporter

    Jan 17, 2012
    Dallas
    Nat'l Team:
    United States
    kolabear and cpthomas repped this.
  2. kolabear

    kolabear Member+

    Nov 10, 2006
    los angeles
    Nat'l Team:
    United States
    The first thing which comes to my mind is, what is the scale used in the ratings -- that is, what is the expected win probability for a given rating difference between teams?

    For the FIFA ratings, I'm used to using approximately the Elo scale where
    100 point rating difference = .640 expected win percentage
    200 pt = .760
    300 pt = .849
    etc.
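kolabear's figures match the logistic curve used by most Elo variants, where a 400-point gap corresponds to 10-to-1 odds. A minimal Python sketch, assuming that standard 400-point scale (the function name is illustrative), reproduces the three values:

```python
def expected_score(rating_diff: float) -> float:
    """Expected score (win = 1, tie = 0.5) for the side that is
    rating_diff points higher, on the standard 400-point logistic curve."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

for diff in (100, 200, 300):
    print(f"{diff} pt = {expected_score(diff):.3f}")
# 100 pt = 0.640
# 200 pt = 0.760
# 300 pt = 0.849
```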
     
  3. soccersubjectively

    soccersubjectively BigSoccer Supporter

    Jan 17, 2012
    Dallas
    Nat'l Team:
    United States
    Okay I think this should be right now...
    [Attached image: screenshot]
     
    kolabear repped this.
  4. soccersubjectively

    soccersubjectively BigSoccer Supporter

    Jan 17, 2012
    Dallas
    Nat'l Team:
    United States
    That's counting ties as half wins.
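Counting ties as half wins means the actual score S is 1, 0.5, or 0, and the basic Elo step moves a rating by K times (S minus the expected score). A minimal sketch, assuming the standard 400-point logistic curve and an illustrative K of 20 (the friendly weight listed later in the thread); the function names are hypothetical:

```python
def actual_score(goals_for: int, goals_against: int) -> float:
    """1 for a win, 0.5 for a tie (half a win), 0 for a loss."""
    if goals_for > goals_against:
        return 1.0
    if goals_for == goals_against:
        return 0.5
    return 0.0

def expected_score(rating: float, opp_rating: float) -> float:
    return 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))

def elo_update(rating, opp_rating, goals_for, goals_against, k=20.0):
    """New rating = R + K * (S - E)."""
    s = actual_score(goals_for, goals_against)
    return rating + k * (s - expected_score(rating, opp_rating))

print(elo_update(1500, 1500, 1, 1))  # 1500.0: a tie between equals changes nothing
print(elo_update(1500, 1500, 2, 0))  # 1510.0: half of K for beating an equal
```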
     
  5. cpthomas

    cpthomas BigSoccer Supporter

    Portland Thorns
    United States
    Jan 10, 2008
    Portland, Oregon
    Nat'l Team:
    United States
    Good effort, soccer subjectively, especially inputting all the data.

    I'm wondering why you say FIFA does not use an Elo system for the women. Don't they use an Elo-based system with tweaks for game location, score differential, and level of competition (for example, distinguishing a friendly from a World Cup game)?
     
    kolabear repped this.
  6. soccersubjectively

    soccersubjectively BigSoccer Supporter

    Jan 17, 2012
    Dallas
    Nat'l Team:
    United States
    It's based on it, but it differs with a couple of tweaks here and there. I think ultimately it doesn't change MUCH, but there are a few differences I've noticed. (If I'm mistaken, please correct me.) In FIFA's rankings...

    - a team can lose points if they don't win by "enough"
    - a 6-0 win is no different than a 32-0 win
    - someone posted on the FIFA thread that a team starts at 1000 pts if their first game is after 2003. I dropped Equatorial Guinea's base rating to 600 and still had a higher rating in mine than in FIFA's. (EG's first game was after 2003.)
    - the K factor that weights games differs from the men's Elo ratings. Not by a ton, but there are minor changes: Elo's goes up in steps of 10 and FIFA's in steps of 15.
    - all tournament games other than qualifiers and continental championships are weighted as friendlies
    - friendly games are weighted more if it's a top-10 matchup
    - USA held the number one spot for 6+ years in FIFA's rankings. The max was under 3 years in mine.
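For contrast, the men's eloratings.net system does separate a 6-0 win from a 32-0 win: as described on that site, K is scaled up by goal difference with no upper cap. A sketch of that multiplier, assuming the published rule (x1.5 for a two-goal win, x1.75 for three, and 1.75 + (N-3)/8 for a margin N of four or more); the function name is hypothetical:

```python
def goal_diff_multiplier(goal_diff: int) -> float:
    """Factor applied to K for the winning margin (ties and 1-goal games: 1)."""
    n = abs(goal_diff)
    if n <= 1:
        return 1.0
    if n == 2:
        return 1.5
    if n == 3:
        return 1.75
    return 1.75 + (n - 3) / 8.0  # keeps growing: no cap on the margin

for n in (1, 2, 3, 6, 32):
    print(n, goal_diff_multiplier(n))
# 1 1.0
# 2 1.5
# 3 1.75
# 6 2.125
# 32 5.375
```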
     
    kolabear repped this.
  7. SiberianThunderT

    Sep 21, 2008
    DC
    Club:
    Saint Louis Athletica
    Nat'l Team:
    Spain
    #7 SiberianThunderT, Aug 22, 2016
    Last edited: Aug 22, 2016
    Always interesting to see different rankings! However, I think it's inaccurate to suggest FIFA's system isn't an Elo system. There is no such thing as a single "correct" Elo system because there are so many different weightings you can tweak, and there's nothing about FIFA's system that I think "disqualifies" it as an Elo-based system. The math still has exactly the same structure.

    Also...
    I don't think you have this correct, unless you've stated things in a confusing, roundabout way. Your comment about EG is also a bit concerning, since they shouldn't be ranked so darn close to Nigeria. You've explained what you think about FIFA's rankings, but you haven't stated anywhere what weighting you've been using. Are you just mirroring the men's ratings at http://www.eloratings.net/world.html ?

    =EDIT=
    Okay, it's buried in your documentation, but I see you are mirroring the eloratings.net site. I also see you're not factoring in home-field advantage (HFA), which I think is a glaring shortcoming. That's probably why you have EG so high among the CAF nations, since their only CAF wins were in tournaments they hosted. If you have every game that FIFA has in its records, I don't see why HFA is "hard to track down".
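Home-field advantage is usually handled by adding a fixed bonus to the home side's rating before computing the expected result; eloratings.net uses 100 points for the men. A sketch, assuming that 100-point bonus and an illustrative K of 20 (names are hypothetical). It shows why skipping HFA can flatter a frequent host: once the venue is priced in, a home win over an equal opponent earns fewer points.

```python
HOME_BONUS = 100.0  # assumed: eloratings.net's home bonus for the men's ratings

def expected_score(rating, opp_rating, venue="neutral", home_bonus=HOME_BONUS):
    """Expected score for `rating`; venue is 'home', 'away' or 'neutral'
    from this team's point of view."""
    diff = rating - opp_rating
    if venue == "home":
        diff += home_bonus
    elif venue == "away":
        diff -= home_bonus
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

K = 20.0  # illustrative weight
gain_home = K * (1.0 - expected_score(1500, 1500, "home"))
gain_neutral = K * (1.0 - expected_score(1500, 1500, "neutral"))
print(round(gain_home, 1), round(gain_neutral, 1))  # 7.2 10.0
```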

    The other issue is that you generally have the CAF nations ranked higher than FIFA does, which probably shouldn't be happening considering that, until 2015, CAF nations bled points to basically all other opponents. Then again, since CAF nations have only met other nations in major tournaments, and your major tournaments carry only 3x the weight of friendlies instead of the 4x used in FIFA's system, that might explain why CAF as a whole isn't as low as it probably should be: you've stopped them from bleeding points as much.

    ...Long rant aside, I still feel it's inaccurate for you to repeatedly refer to FIFA's ranking as if it's not an Elo system. You should probably change that throughout your FAQs, or at the very least stop referring to Elo as if it's a single system (it's a family of systems).
     
    kolabear and soccersubjectively repped this.
  8. cpthomas

    cpthomas BigSoccer Supporter

    Portland Thorns
    United States
    Jan 10, 2008
    Portland, Oregon
    Nat'l Team:
    United States
    One suggestion I've seen for a starting rating for a new team is to use the median rating. That starts the team out too high, if it's really a new team. FIFA says that brand new teams should expect to get a rating of 1000, so maybe that's what they use. Ordinarily, it's considered that a competitor must play 30 games in order for it to receive a reliable rating. (FIFA uses 5 games, for the team to show up in its ratings.)

    I'm curious about what you are using for a K factor (assuming you don't use different factors depending on the nature of the competition). I've done a lot of work with NCAA Division 1 women's soccer, including experimenting with an Elo system for rating the teams, with a database of about 27,000 games. I've tested a great number of K factors. For each test, I analyze how accurately the ratings match game results and how well the system does at rating teams from different playing pools (as in confederations' teams, probably, for international women's soccer) within a single system. I've found that a K factor of 70 performs the best. Some commenters have felt this is too high, but my work says it's the best-performing K factor. I also consider game locations in analyzing how the different K factors perform. For a K factor of 70, home field is worth 60 points.
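K-factor testing of this kind can be approximated by replaying the games in order for each candidate K, predicting each game from the ratings before it, and scoring the predictions. The correlator's exact measures aren't described here, so this sketch swaps in the Brier score (mean squared error of the predicted result); the toy schedule, the 60-point home bonus passed in the example, and all names are hypothetical:

```python
from collections import defaultdict

def expected_score(r_a, r_b, venue_a=0, home_bonus=0.0):
    """venue_a: +1 if team A is at home, -1 if away, 0 if neutral."""
    diff = r_a - r_b + venue_a * home_bonus
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def brier_for_k(games, k, start=1500.0, home_bonus=0.0):
    """Mean squared error of pre-game predictions for a given K.

    games: chronological (team_a, team_b, score_a, venue_a) tuples, where
    score_a is 1, 0.5 or 0 (ties as half wins). Each game is predicted
    from the ratings *before* it, then both ratings are updated."""
    ratings = defaultdict(lambda: start)
    total = 0.0
    for a, b, s_a, venue_a in games:
        e_a = expected_score(ratings[a], ratings[b], venue_a, home_bonus)
        total += (s_a - e_a) ** 2
        delta = k * (s_a - e_a)
        ratings[a] += delta
        ratings[b] -= delta
    return total / len(games)

def best_k(games, candidates, **kwargs):
    """Return the candidate K with the lowest Brier score."""
    return min(candidates, key=lambda k: brier_for_k(games, k, **kwargs))

# Hypothetical toy schedule; a meaningful test needs thousands of games.
games = [
    ("A", "B", 1, 1), ("B", "C", 1, 0), ("A", "C", 1, -1),
    ("C", "B", 0.5, 1), ("B", "A", 0, 0), ("C", "A", 0, 1),
]
print(best_k(games, [20, 40, 70, 100], home_bonus=60.0))
```

Scoring each game before updating on it keeps the test honest: a system is only credited for results it could have predicted in advance.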

    One of the things I've seen with rating systems (I've tested a bunch of them, some my own, some others, some the NCAA's, some Elo-based) is that the differences between them are not great. It's relatively easy to develop a proficient system, but it's very difficult to develop one that is significantly better than the others. For me, the most important areas of difference have to do with how well they are able to rate teams from different playing pools within a single system. This is a very difficult challenge when there is not a lot of "correspondence" among playing pools.

    I know how much time and effort goes into setting something like this up, so I really appreciate what you've done.
     
    kolabear and soccersubjectively repped this.
  9. soccersubjectively

    soccersubjectively BigSoccer Supporter

    Jan 17, 2012
    Dallas
    Nat'l Team:
    United States
    All good stuff, and I appreciate the feedback. I know I give a lengthy response, but I honestly do appreciate someone taking a close look at what I've done.



    - FIFA's rankings aren't comparable to the men's Elo ratings, given the way FIFA implements them. So I'd say you're right that it's the same structure, but they've obviously tweaked it in more than a couple of ways, which makes it impossible to compare the two. I think the fact that the USA didn't hold an eight-year reign when I compiled my ratings shows that. However, I'm not sure why you put "correct"/"disqualifies" in quotes, as I never said that. I actually wrote on my site...

    "While Elo ratings don't provide a vastly different ranking order of teams than FIFA's..." and "The FIFA women's rankings are actually based off of Elo, but are weighted differently."

    - EG is a really weird case and I'm not quite sure what to do with it. I started them at 1200 (around Wales/Portugal, neither of which has ever qualified for a WC) and they went up to 1400+. I then restarted them at 600, and it came out around a 30-point change after 40+ games. So either way, they went up on their own. After the 2011 World Cup, they dropped only 17 pts after playing Norway (lost by one), Australia (lost by one), and Brazil (lost by three). And yes, 95% of their games are against African teams, but I wouldn't say their 1- and 2-goal wins against them have rocketed them up unfairly. Not to mention that even when I started them at a base rating of 600, they still ended up significantly higher than FIFA has them.

    They do have the home-field advantage in the ACON, but they did very well in Olympic qualifying, beating Nigeria and losing by one goal to South Africa, who finished with a -3 GD against Brazil/Sweden/China. So even if I went back and gave them HFA, how much should we expect them to drop? Ten spots farther down is 170 points, which is a ton.
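The claim that a starting rating washes out can be checked directly: replay the same results from two different starts and watch the gap close, because the higher-rated version expects more and therefore gains less from each win. A sketch, assuming a fixed K of 30, opponents whose ratings don't move, and a made-up 40-game schedule (names are hypothetical):

```python
def expected_score(r, opp):
    return 1.0 / (1.0 + 10.0 ** ((opp - r) / 400.0))

def replay(start, schedule, k=30.0):
    """Replay one team's results from a given starting rating.
    schedule: (opponent_rating, score) pairs; opponents' ratings are held
    fixed (a simplifying assumption) to isolate the effect of the start."""
    r = start
    for opp, s in schedule:
        r += k * (s - expected_score(r, opp))
    return r

# Hypothetical 40-game run: alternate a win over a 1300 side
# and a loss to a 1700 side.
schedule = [(1300, 1.0), (1700, 0.0)] * 20
high, low = replay(1200, schedule), replay(600, schedule)
print(round(high), round(low), round(high - low))  # the 600-point gap shrinks
```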

    - Looking at another example, here is Nigeria's track record in big tournaments:

    competition / entering Elo / exiting Elo / change
    1991 WC / 1522 / 1444 / -78
    1995 WC / 1539 / 1494 / -45
    1999 WC / 1559 / 1624 / +65
    2000 OL / 1622 / 1604 / -18
    2003 WC / 1645 / 1591 / -54
    2004 OL / 1600 / 1621 / +21
    2007 WC / 1652 / 1645 / -7
    2008 OL / 1670 / 1639 / -31
    2011 WC / 1587 / 1618 / +31
    2015 WC / 1582 / 1570 / -12

    So in those first five big tournaments Nigeria does bleed points pretty hard, but they've held their own in the last five. I would think that if I lowered Nigeria's base rating, it would come out to about where they are now.
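The change column is simply the exiting rating minus the entering rating, so the first-five versus last-five comparison can be recomputed from the table's own entering and exiting numbers. A quick check in Python (the variable names are illustrative):

```python
# (competition, entering Elo, exiting Elo) from Nigeria's table
nigeria = [
    ("1991 WC", 1522, 1444), ("1995 WC", 1539, 1494), ("1999 WC", 1559, 1624),
    ("2000 OL", 1622, 1604), ("2003 WC", 1645, 1591), ("2004 OL", 1600, 1621),
    ("2007 WC", 1652, 1645), ("2008 OL", 1670, 1639), ("2011 WC", 1587, 1618),
    ("2015 WC", 1582, 1570),
]
changes = [exiting - entering for _, entering, exiting in nigeria]
first_five, last_five = sum(changes[:5]), sum(changes[5:])
print(changes)                # [-78, -45, 65, -18, -54, 21, -7, -31, 31, -12]
print(first_five, last_five)  # -130 2
```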

    - Weighting is the same as the men's, yes: 20 minimum to 60 max. I was trying to make them as congruent as possible. Here is Elo's weighting scale.

    - You're right about HFA, and that's something I'd love to add going forward. FIFA doesn't list the venue for every game on the pages I imported from (here), so it was a really tedious process. I'd say that's the main difference between mine and the men's ratings.

    - The men's system has an established way of doing it, and FIFA changed that when they crossed over to the women's, which is fine. They get to decide how they run their ship. But there's a reason FIFA doesn't call their ratings "Elo" or reference Elo even once in their explanation of how they get the rankings, when they are clearly using the same idea. If they wanted them to be called Elo, they would.
     
  10. soccersubjectively

    soccersubjectively BigSoccer Supporter

    Jan 17, 2012
    Dallas
    Nat'l Team:
    United States
    Thanks! Yeah it was a little bit of an effort to get to this point but I'm glad I got here.

    I think the base rating is probably more important for your college ratings (which are great, btw!), since college teams play so few games to settle on their true rating. International teams seem to blow past that really quickly. I will say that a 1000 base seems REALLY low to me. I have Slovenia at 992 right now, and they have a 25-9-64 (W-T-L) record, a 25% win percentage.

    I try to start each country at an even multiple of 100 that isn't too high or too low, based on its subsequent results. It's not ideal, but even in Equatorial Guinea's case (rated much higher in mine than in FIFA's) they go up almost 300 pts from the base rating. So it's not as if I gave them an unfair start when they are climbing the ladder on their own. In everything I did with these, I tried to stick to the men's system, whose base ratings range from 600 to 1800.

    Same with the K factor, just mirroring the men's.

    60 - World Cup / Olympics (same as FIFA, actually; the rest are different)
    50 - Continental championships (Gold Cup, Euros, etc.)
    40 - WCQ / Olympics qualifying
    30 - tournaments
    20 - friendlies

    I actually put Algarve and some other tourneys at 40, if they were well established.
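The weights above can be expressed as a lookup table feeding the usual K times (S minus E) step. A sketch, assuming plain Elo with no goal-difference or home adjustment; the category keys and the `established` flag for tournaments like Algarve are hypothetical labels. Beating an equal opponent is worth three times as much at a World Cup as in a friendly, the 3x ratio SiberianThunderT contrasts with FIFA's 4x.

```python
K_FACTORS = {  # hypothetical category keys; weights as listed in the thread
    "world_cup_or_olympics": 60,
    "continental_championship": 50,  # Gold Cup, Euros, etc.
    "wc_or_olympic_qualifier": 40,
    "tournament": 30,
    "friendly": 20,
}
ESTABLISHED_TOURNAMENT_K = 40  # e.g. Algarve

def expected_score(r, opp):
    return 1.0 / (1.0 + 10.0 ** ((opp - r) / 400.0))

def rating_change(r, opp, score, competition, established=False):
    """Points gained (or lost) for one game; score is 1, 0.5 or 0."""
    k = K_FACTORS[competition]
    if competition == "tournament" and established:
        k = ESTABLISHED_TOURNAMENT_K
    return k * (score - expected_score(r, opp))

print(rating_change(1500, 1500, 1, "world_cup_or_olympics"))  # 30.0
print(rating_change(1500, 1500, 1, "friendly"))               # 10.0
```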

    That's a great idea about testing the K factor. I hadn't thought about that and would really like to look into that. I think ultimately I still want to mirror the men's but I would like to know how accurate it is.

    Do you have any advice on figuring out "how well they are able to rate teams from different playing pools," besides the K factor test? I suppose I could test my ratings against FIFA's, but I doubt there would be much difference, tbh.
     
  11. cpthomas

    cpthomas BigSoccer Supporter

    Portland Thorns
    United States
    Jan 10, 2008
    Portland, Oregon
    Nat'l Team:
    United States
    #11 cpthomas, Aug 23, 2016
    Last edited: Aug 23, 2016
    Evaluating how systems handle playing pools is difficult, mostly from an "enough data" perspective. I have what I call my "correlator" that does it and also evaluates other aspects of systems (how well the ratings correlate with the actual results from which they're generated, for all teams and for the top 60 teams). The problem, for playing pools, is that a proper evaluation takes a ton of data -- I'll be happy after this year, when I'll have 30,000 games in my database. I'll have to read more of your material; it's possible I could set up the correlator to see how your system does (all I need are your ratings and your games data). I probably can't do this until after the 1st of the year, as I'm into the college soccer season now.

    It's possible that the playing pool problem has something to do with EG. It sounds like they've played some games outside their pool, but if their pool as a whole is relatively isolated, it might be the pool doesn't have enough correspondence with other pools. Here's how I've described the playing pool challenge in relation to college soccer, with slightly modified language to make it more relevant here:

    Assume that there are two 12-team confederations, each of which plays a full round robin among its members. Assume also that the game outcomes are such that in each confederation, the top team has gone 11-0, the second place team has gone 10-1, the third place team 9-2, the fourth place team 8-3, and so on all the way to the basement dweller who has gone 0-11. Assume also that teams play only their confederation games. In this scenario, the ratings for teams from the two confederations will match exactly. In other words, the confederation winners will have identical ratings, the second place teams will have identical ratings, and so on. This will be true no matter what the actual relative strengths of the two confederations. One confederation could consist of the top 12 teams in the world and the other confederation could consist of U8 recreational soccer teams, yet their ratings will be the same.

    This reveals a characteristic of any mathematical rating system, which is that it cannot differentiate the strengths of two different pools of teams unless there are crossover games between the two pools. Further, the reliability of the ratings' comparisons of the relative strengths of the two pools depends on the number of crossover games: the more crossover games there are, the more reliable the comparisons, and the fewer the crossover games, the less reliable the comparisons.
    If the African teams are not playing enough crossover games to mute the effect I've described, my correlator will show it if I have enough games data. Even with 6,000 games, I'd give it a try. The good news is that there aren't a lot of confederations.
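The two-confederation thought experiment is easy to reproduce. A sketch, assuming plain Elo with one K, no home advantage, and every team starting at 1500; team names are placeholders. Two isolated 12-team round robins with the same win/loss pattern end with exactly the same list of ratings, whatever the teams' real strength:

```python
from itertools import combinations

def expected_score(r, opp):
    return 1.0 / (1.0 + 10.0 ** ((opp - r) / 400.0))

def run_pool(teams, k=30.0, start=1500.0):
    """Single round robin in which the team listed earlier always wins,
    so the records run 11-0, 10-1, ..., 0-11 for 12 teams."""
    ratings = {t: start for t in teams}
    for a, b in combinations(teams, 2):  # a comes before b, so a wins
        delta = k * (1.0 - expected_score(ratings[a], ratings[b]))
        ratings[a] += delta
        ratings[b] -= delta
    return [ratings[t] for t in teams]

elite = run_pool([f"Elite{i}" for i in range(12)])
u8 = run_pool([f"U8rec{i}" for i in range(12)])
print(elite == u8)  # True: identical ratings, place for place
```

Only crossover games can pull the two lists apart, which is why the reliability of cross-pool comparisons depends on how many of them there are.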

    (By the way, my avatar is a chart that my correlator produces. It shows the relationship between playing pools' average ratings and the playing pools' performance in relation to their ratings. The left of the chart represents the performance of the playing pools with the highest average ratings; and the right represents the performance of the playing pools with the lowest average ratings. Even in the little avatar, you can see that the lines descend from left to right. What this means is that the higher rated pools tend to outperform their ratings and the lower rated pools tend to underperform. I'm pretty confident this would happen for national women's teams. Probably one of the reasons FIFA weights games based on the significance of the competition is to try to mute this effect.)
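One way to compute a pool's "performance in relation to its ratings" is to average, over crossover games only, the difference between actual and expected results. The correlator's formula isn't spelled out in the thread, so this is an assumed, simplified measure; the data structures and names are hypothetical. A positive value means the pool outperformed its ratings against other pools:

```python
from collections import defaultdict

def expected_score(r, opp):
    return 1.0 / (1.0 + 10.0 ** ((opp - r) / 400.0))

def pool_performance(games, ratings, pool_of):
    """games: (team_a, team_b, score_a) tuples; ratings: team -> rating;
    pool_of: team -> pool name. Same-pool games are skipped.
    Returns pool -> mean(actual - expected) over its crossover games."""
    residuals = defaultdict(list)
    for a, b, s_a in games:
        if pool_of[a] == pool_of[b]:
            continue
        e_a = expected_score(ratings[a], ratings[b])
        residuals[pool_of[a]].append(s_a - e_a)
        residuals[pool_of[b]].append(e_a - s_a)  # B's residual mirrors A's
    return {pool: sum(v) / len(v) for pool, v in residuals.items()}

ratings = {"A1": 1600, "A2": 1500, "B1": 1600}
pool_of = {"A1": "X", "A2": "X", "B1": "Y"}
games = [("A1", "B1", 1), ("A1", "A2", 1)]
print(pool_performance(games, ratings, pool_of))  # {'X': 0.5, 'Y': -0.5}
```

Plotting these averages against each pool's mean rating gives the kind of descending line described above if higher-rated pools tend to beat their ratings.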

    EDIT: I just looked at your spreadsheet. I'm wondering, since it doesn't include a list of game results that I can see -- do you have a table of all the women's game results? Hopefully including, at least, who won or lost or whether it was a tie, and who hosted the game (home, away, or neutral). I'd have to have that info to run my correlator.
     
    gricio61 repped this.
  12. pauley

    pauley Member

    Feb 11, 2015
    Yes, the starting rating really is a mystery :) The average of all teams in the FIFA rankings is about 1300 points.

    Though it's easy to see that not all teams started there, because after a handful of losses you can't end up in the 800s, or at 335 like Mauritius.
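Reasoning from the average rating leans on a property of plain Elo: each game moves the same number of points from one side to the other, so the average rating changes only when teams enter with a different starting value. A sketch, assuming symmetric updates with one K and random results (the teams and results are made up); the thread doesn't establish whether FIFA's women's formula is exactly zero-sum:

```python
import random

def expected_score(r, opp):
    return 1.0 / (1.0 + 10.0 ** ((opp - r) / 400.0))

def simulate(starts, n_games=1000, k=30.0, seed=1):
    """Play random games among the teams; results are random, not modeled."""
    rng = random.Random(seed)
    ratings = dict(starts)
    teams = list(ratings)
    for _ in range(n_games):
        a, b = rng.sample(teams, 2)
        s_a = rng.choice((1.0, 0.5, 0.0))
        delta = k * (s_a - expected_score(ratings[a], ratings[b]))
        ratings[a] += delta  # whatever A gains...
        ratings[b] -= delta  # ...B loses, so the total is unchanged
    return ratings

starts = {"Old1": 1500, "Old2": 1500, "Old3": 1500, "New": 1000}
end = simulate(starts)
mean_start = sum(starts.values()) / len(starts)
mean_end = sum(end.values()) / len(end)
print(mean_start, round(mean_end, 6))
```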
     
    soccersubjectively repped this.
  13. SiberianThunderT

    Sep 21, 2008
    DC
    Club:
    Saint Louis Athletica
    Nat'l Team:
    Spain
    #13 SiberianThunderT, Aug 23, 2016
    Last edited: Aug 23, 2016
    Well, I never said they "should" be comparable to the men's, although IMO they're similar enough. The point I was trying to make is that, throughout your FAQs, you repeatedly say something along the lines of "FIFA does X, while Elo does Y", implying both that FIFA doesn't use an Elo system and that there's a single Elo system instead of many. Please remember that the original Elo rating system was developed for chess and that any Elo-based system for football/soccer will, by default, include many modifications to account for things like draws, HFA, and margin of victory. So, with all those choices to make, you can't say one system is in the Elo family and another isn't if they use the same underlying math.

    There's more than one men's Elo rating system out there if you look. I recommend reading this 2013 analysis paper that compares the predictive power of multiple systems. By the way, it specifically states that the FIFA women's rankings are modeled on a version of the Elo structure, and that the FIFA Women's Rankings and the Eloratings.net systems are basically tied at the top in terms of predictive power, so they're honestly very comparable.

    Besides, who holds on to the top spot shouldn't be a measure of the system. When you consider that the USWNT won every Olympic gold and placed in the top three at every World Cup in that eight-year period, is it all that surprising they stayed at #1?

    I think you miss my point: I agree that where you start a team won't make a difference after they've played enough games. My point is that the method you're applying gives EQG a noticeable benefit in the point shifts when you don't include HFA, and that the number of points you're letting CAF lose seems low to me in general. (Also, you point out how well EQG did in Olympic qualifying, but that was a small number of matches, so it shouldn't have inflated their rating much in the first place; if it did, you've got something off-weight.)

    See, you're doing it again here, saying "Elo's rating scale" as if there's only one Elo to compare to, which isn't correct.

    I don't know how you're doing your importing, but for every game I clicked on under that link, the report lists where the game took place. I know you can also go to any country's home page to view their past games, and those DO list game location on the summary pages. (e.g. http://www.fifa.com/live-scores/teams/country=eqg/women/matches/index.html - btw, browse there and you'll see EQG's first matches were in 2001. I know you said earlier their first matches were after 2003, so that makes me think your games database isn't complete.)

    Just because they don't name it as such doesn't mean it isn't mathematically still an Elo system. See the 2013 paper I linked earlier. (And, to be fair, FIFA's documentation is sh*t. The actual equation they list is completely incorrect and would never produce the ratings they publish if you used it literally.)
     
  14. soccersubjectively

    soccersubjectively BigSoccer Supporter

    Jan 17, 2012
    Dallas
    Nat'l Team:
    United States
    Yeah idk what more to say. I'd like to add HFA going forward but it takes time. Outside of that I'm not sure why you are so defensive of FIFA rankings or can't stand EG being as high in the rankings as they are. I just made these as a fun comparison to the men's by using the same system as theirs. But I will check out that analysis paper. Seems very informative. Thanks!

    Makes sense. If you want to DM your email I can send you the document and you can respond when you have time. Tbh I really enjoy the college season as well so no rush on getting back with me.
     
  15. SiberianThunderT

    Sep 21, 2008
    DC
    Club:
    Saint Louis Athletica
    Nat'l Team:
    Spain
    Well, in terms of EQG and CAF in general, I haven't seen anything from them to suggest they should be highly ranked, so it's more that I'm skeptical you've got yours tuned correctly, since it doesn't jibe with perception (as opposed to defending the FIFA rankings in particular). I'm also a stickler for terming things correctly (and a math major), so it rubs me the wrong way to see someone argue that the FIFA women's ratings aren't an Elo system; even if the FIFA women's rankings performed as poorly as the men's do, I'd still be a bit ruffled about that miscategorization. Don't get me wrong, I appreciate your attempt to mirror the men's system, and as I said up front, it's certainly fun to see. The more rating systems we have out there, the better, since the aggregate is usually more accurate than any single system.
     
  16. soccersubjectively

    soccersubjectively BigSoccer Supporter

    Jan 17, 2012
    Dallas
    Nat'l Team:
    United States
    Yeah I appreciate your studiousness. If you'd like, I can send you the excel file and you can double check the math and make some more specific suggestions from there?
     
  17. SiberianThunderT

    Sep 21, 2008
    DC
    Club:
    Saint Louis Athletica
    Nat'l Team:
    Spain
    I'm neck-deep in turning my research into publications right now (I've been meaning to make an interactive FIFA rankings page myself for ages, hence my trying to back-track each nation's starting rating the other month), but I can certainly check numbers quickly!
     
