2012 NCAA - Massey/Elo ratings

Discussion in 'Women's College' started by kolabear, Dec 3, 2012.

  1. kolabear

    kolabear Member

    Joined:
    Nov 10, 2006
    Location:
    los angeles
    Country:
    United States
    I didn't even find where I was discussing this earlier - and if it's in the Tournament thread I don't want to interrupt a lively discussion about the college substitution rules, etc.

    The "predictions" for this tournament were further off than they've ever been since I started this a few years back. In the quarterfinals, the predictions went 3/4 but in the semifinals and final they went 0/3.

    Overall, for the entire tournament the average expected win percentage for the higher-rated team was .808, so the expected result over 63 games was about 51 points; the actual result was 44.5 points. The average deviation was 10.1%. Not horrible (small sample, etc.) but not impressive either. In a way I think that's alright, though, which maybe I'll talk about later. Beyond the small sample, this is sports: there are upsets and there is unpredictability. And finally, there's a limit to how predictive a rating system can be when that system is either used by the Committee to set the bracket or is being proposed for that use.
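
    For the curious, the arithmetic above can be sketched in a few lines. This assumes "average deviation" means the gap between expected and actual points spread over the 63 games; the exact method isn't stated, so treat it as a plausible reconstruction:

    ```python
    # Reconstruction of the "average deviation" figure, assuming it is
    # (expected points - actual points) / games. The post says "about 51"
    # expected, so the result lands near the quoted 10.1%.
    games = 63
    expected_points = 51.0   # expected points for the higher-rated teams
    actual_points = 44.5

    average_deviation = (expected_points - actual_points) / games
    print(f"{average_deviation:.1%}")  # prints 10.3%, near the quoted 10.1%
    ```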

    There are competing objectives for a rating system that will be used by the Committee and the ability to predict is only one of them. Technically, it's not a stated one at all.

    Looking over the range of games and rating differentials, the 12 games with rating differences between 100 and 200 stood out. With an average rating difference of 142 points, the expected win percentage was .725; the actual win percentage was below 50%, at .458, with the favored teams scoring only 5.5 points out of 12.

    These are the 12 games - are the ratings unreasonable?
    Favored team / rating / homefield factor / underdog / rating / differential / result

    Baylor 1845 0 Georgetown 1745 100 1
    Missouri 1765 60 Illinois 1715 110 0.5
    BYU 1985 60 North Carolina 1925 120 0
    Penn St 1950 60 Duke 1885 125 1
    Maryland 1825 0 Denver 1695 130 0
    Stanford 2055 0 North Carolina 1925 130 0
    North Carolina 1925 60 Baylor 1845 140 0.5
    Virginia Tech 1830 60 Georgetown 1745 145 0
    BYU 1985 60 Marquette 1885 160 0.5
    Virginia 1995 60 Duke 1885 170 0
    San Diego St 1945 60 California 1825 180 1
    Stanford 2055 60 UCLA 1920 195 1
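
    For reference, expected win percentages like the .725 above come from the usual Elo logistic curve. A sketch using the differentials from the table (the standard 400-point Elo scale is an assumption here; the converted Albyn Jones-ish scale may use a steeper curve, which is why this comes out a bit under .725):

    ```python
    # Expected score for the favored team under a standard Elo curve.
    # The 400-point scale is the conventional chess value, assumed here.
    def expected_score(diff, scale=400.0):
        return 1.0 / (1.0 + 10.0 ** (-diff / scale))

    # Differentials (homefield included) from the 12-game table above.
    diffs = [100, 110, 120, 125, 130, 130, 140, 145, 160, 170, 180, 195]

    avg_expected = sum(expected_score(d) for d in diffs) / len(diffs)
    actual = 5.5 / 12  # favored teams scored 5.5 points out of 12

    print(round(avg_expected, 3), round(actual, 3))  # roughly .69 vs .458
    ```

    Either way, the gap between expectation and the .458 actual result is what makes this group of games stand out.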


  2. Cliveworshipper

    Cliveworshipper Member+

    Joined:
    Dec 3, 2006
    How did you determine those ratings? They don't exist on the Massey rankings page that I can see.
  3. kolabear

    kolabear Member

    As I've said before, it's a bastardized, unscientific conversion of the Massey ratings into something like an Albyn Jones scale. The distribution of course looks pretty similar to what the Albyn Jones ratings used to look like; I set it up that way since it stayed fairly constant over the years I followed the Albyn Jones rating page. Obviously you have to take this with a heavy grain of salt, but it's generally worked pretty well. This tournament, less so. Perhaps partly because I simplified one part of the formula, though I don't think that was a major factor. More than anything, some years are simply bound to show more deviation, and this year had more upsets. North Carolina had a significant impact as well, and there's the possibility that by the end of the year they were a stronger team than their overall season showed, because several key players were away at the U20 World Cup. (An interesting ESPN article on Virginia's Morgan Brian quoted her saying it took a long time for her to get back to 100% after returning from the World Cup.)

    **
    add - it's working better in the Volleyball tournament. More like it usually does. Through 2 rounds, the average deviation is about 3%.
  4. kolabear

    kolabear Member

    :) Oh, it's probably a good idea on my part to point out that my converted ratings are based on the Massey ratings from a couple weeks ago, not the ones currently on the Massey website. In other words, I'm using the end-of-regular-season ratings, which makes sense for our purposes because those incorporate the games the Committee uses to select and seed the tournament.

    In converting to an Albyn Jones scale, I don't change the order of the teams.* The question is just how valid (or wildly invalid) the rating differentials between teams are once I convert the Massey numbers into an Albyn Jones-ish scale. In a crude way I think the numbers can be useful as a guide to the relative strength of teams, as an Elo-style rating system might calculate it.


    * (But they're different than they are now on the Massey website because he's recalculated the ratings while the tournament is in progress. Those are numbers that don't interest me and they probably don't interest you either. The numbers that are interesting are the ones at the end of the regular season -- corresponding to the final regular season RPI)


  5. Cliveworshipper

    Cliveworshipper Member+

    OK, but what is the math you are doing to make the conversion? It would be helpful to understand what you are doing to make Massey look like an Elo rating (which I'm not entirely sure is possible, since he doesn't reveal his method).


    Without understanding what you are doing, it just looks like magic: nice to look at and entertaining, but not always what it seems. It's bad enough that Massey doesn't reveal what he does, but when you add a second layer of obfuscation it makes it pretty difficult to discuss its merits (or lack thereof).

    It is just arm waving.
  6. kolabear

    kolabear Member

    Think of it as a good burrito. Some mysteries shouldn't be explained in too much detail.
    :)
  7. Cliveworshipper

    Cliveworshipper Member+

    Thanks, your Wizardness, I totally get it. I'm off to get that broomstick directly...

  8. kolabear

    kolabear Member

    Alright, alright! Here's basically what I did: I made the median about 1350 because that seemed to be about the median in Albyn Jones (Massey's median appears to be about 1.000 in the rating simply called "Rating", not the "Power" rating). The "bubble" was usually around 1675 or so in Albyn Jones (around 40th to 45th place). So I simply scaled the Massey numbers so that the bubble would land around 1675.

    I played with it a year or two ago and wound up settling on a coefficient of 1/2.15 -- I forget exactly how I came up with that. I probably simulated some ratings of some teams, "recalculating" or "audit-testing" their ratings by using their record and the ratings of their opponents. And dividing by 2.15 seemed to fit better than dividing by 2.25 or 2. It worked well enough so I stuck with it - except this year when I went with dividing by 2 for simplicity's sake, which wasn't a good idea.

    So, basically, the formula is: subtract the median (1.000) from a team's Massey rating; divide that number by 2.15*; multiply by 1,000 (that's the "add a few zeroes" step); and add the result to 1350, the Albyn Jones median. (*Except this time around I divided by 2, which doesn't work as well.)
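
    In code, the recipe above looks like the following sketch. The 1.70 "bubble" input is an assumption, just the Massey value the formula would map to the ~1675 bubble:

    ```python
    # Convert a Massey rating to the Albyn Jones-ish scale described above.
    # Massey's median is about 1.000; the target median on the AJ scale is 1350.
    def massey_to_aj(massey, divisor=2.15):
        return 1350 + (massey - 1.000) / divisor * 1000

    # A team at the Massey median stays at the AJ median:
    print(massey_to_aj(1.000))  # 1350.0
    # A bubble-ish team around 1.70 on Massey's scale lands near the AJ bubble:
    print(massey_to_aj(1.70))   # about 1675.6
    ```

    Switching the divisor to 2 (as done this year) stretches every differential by about 7%, which is one candidate for the larger deviations.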

    You didn't really want to know how that burrito was made, did you?
  9. Cliveworshipper

    Cliveworshipper Member+

    Kinda like sausage.

    Thanks
  10. kolabear

    kolabear Member

    I looked this over and I'll make these observations. I think the ratings were reasonable, but there may have been a few factors throwing off the "results", so to speak. First, notice that North Carolina is involved in three of the games with "missed predictions" (add another reason to root against the Tarheels!). It makes you wonder if the U20s added a layer of unpredictability or unreliability to the final ratings, as several key players were out for the first part of the season for the Tarheels and for other teams like Penn State.

    BYU is involved in two of the "missed predictions". Were they overrated by Massey? Perhaps somewhat, though not drastically; one of their key wins was an early-season win over a team that was, again, missing key players to the U20s: Penn State. And because their record was so good, so close to perfect, there's always a greater chance for error, volatility, or unreliability in their rating. A single fluke or half-fluke result has a greater impact on their rating. For instance, if they had tied Penn State, that might've dropped their rating around 30-40 points. That's a big swing for dropping half a point in their record.
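
    That 30-40 point swing for a hypothetical tie is just the standard Elo-style update, delta = K * (score - expected). A sketch with illustrative numbers (the K-factor and the expectancy are my assumptions, chosen to land in the range mentioned, not values from the post):

    ```python
    # Standard Elo-style rating update: delta = K * (actual score - expected score).
    def rating_change(k, score, expected):
        return k * (score - expected)

    # A heavy favorite (expected ~0.94) that only draws (score 0.5)
    # gives back a chunk of rating; with an assumed K of 80 that's about -35.
    delta = rating_change(k=80, score=0.5, expected=0.94)
    print(round(delta, 1))  # -35.2
    ```

    The closer a team's expectancy sits to 1.0, the more a single draw or loss moves it, which is the volatility point about near-perfect records.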

    They only lost in the tournament to North Carolina, the eventual national champions, so I wouldn't disparage BYU's season. More than anything, it was a case of North Carolina being -- in the end, from a predictive standpoint, underrated -- or perhaps it's better to simply say by the end they were better than their record because they did struggle (by North Carolina's standards) in the regular season.

    Which brings up an important point about ratings and using them to put together the bracket. On the one hand we expect a good ratings system to "predict" results well. On the other hand, you can see how that's not an absolute objective or test of a rating system. The rating system -- and this applies equally to the RPI in this case (yes, I can be fair about this!) -- can't be blamed for not correctly predicting North Carolina's improvement by the time of the tournament and assessing the effects of the U20 tournament on them, if that was indeed the cause of their earlier struggles.
  11. cpthomas

    cpthomas BigSoccer Supporter

    Joined:
    Jan 10, 2008
    Location:
    Portland, Oregon
    Country:
    United States
    While developing some variations of the RPI, I compared one of the variations (the "Iteration 5 RPI") to Massey for the 2012 season. In doing the comparison, one of the things I remembered is that Massey's system involves a statistical calculation for each team of the extent of home field advantage for that team. This struck me as potentially unreliable due to the limited data for each team. (In my calculations, I don't do a team-by-team calculation but rather do a calculation of an average advantage that I apply to all teams.)

    I've been in the process of seeing how well Massey's ratings correlate with game results as compared to the RPI. I'll write more about that later. In the course of doing that, however, I've taken a look at Massey's home field advantage numbers. (I'm not sure how he works these into his rating calculations, but I know that his actual ratings factor in where teams actually played their games.) Here are some of his Home Field Advantage numbers, the first group for the 5 teams with the least home field advantage and the second for the 5 with the most advantage:

    Least Home Field Advantage

    Samford -0.12
    Tennessee -0.06
    Miami OH -0.05
    Toledo -0.03
    Chattanooga -0.0

    Most Home Field Advantage

    Lehigh 0.59
    Utah 0.58
    UTEP 0.55
    Coastal Carolina 0.55
    Winthrop 0.54

    So question: Does anyone really believe that Samford, Tennessee, Miami OH, and Toledo have a home field disadvantage?
  12. orange crusader

    orange crusader Member

    Joined:
    May 2, 2011
    Club:
    Carolina Railhawks
    No, I don't. As you pointed out, I think the sample size is too small. Only 10 games a season on average, and maybe the home schedule is tougher in certain years.
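
    The small-sample point can be made concrete. If a team's per-game home margin has a standard deviation of roughly 1.5 goals (an assumed figure for illustration), then a homefield advantage estimated from about 10 home games carries a standard error comparable to the entire spread of Massey's numbers above:

    ```python
    import math

    # Standard error of a mean margin over n games: sd / sqrt(n).
    # The 1.5-goal per-game sd is an assumption, not a figure from the thread.
    sd_per_game = 1.5
    n_games = 10

    standard_error = sd_per_game / math.sqrt(n_games)
    print(round(standard_error, 2))  # 0.47
    ```

    An uncertainty of about +/-0.47 goals is nearly as wide as the gap between the "least" (-0.12) and "most" (0.59) advantage teams listed above, so negative per-team estimates like Samford's are well within noise.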
  13. kolabear

    kolabear Member

    I'm going to guess, and it's only a guess, that he doesn't factor this into the first column rating at all. If it factors into the 2nd column rating, the "Power" rating, which I've been less concerned with, I don't know how he does it. Possibly it only factors in when he does an individual game prediction between two teams, which could then use the individual homefield advantages.
  14. cpthomas

    cpthomas BigSoccer Supporter

    I should have put this in previously, but I was wondering whether Massey's regular ratings (not his "power ratings," which are predictive) factored in home field imbalances, so I asked him. Here is what I asked and his answer:

    Question

    "> A question about your ratings. Am I correct in believing that yours
    > take into account game locations -- so that, for example, a team with
    > a distinct favorable home field imbalance will have a lower rating
    > than if it had the same game results but no favorable imbalance? (The
    > RPI, as I assume you know, does not take game locations into account
    > for women's soccer, although it does for some sports.)"

    Answer

    "Yes, I do account for homefield. Winning at home is less impressive than winning on the road."
    This still is a little ambiguous, but I think it is saying that his "regular" rating takes game locations into account.
    More to follow ....
  15. kolabear

    kolabear Member

    Oh I didn't mean to imply there was no homefield factor in the 1st column ratings. I just didn't think it would incorporate individualized calculations for homefield advantage for the different teams... but I could be easily wrong.
