Division 1 Ratings, Bracketology, Scheduling, Etc.

Discussion in 'Women's College' started by cpthomas, Sep 10, 2019.

  1. cpthomas

    cpthomas BigSoccer Supporter

    Portland Thorns
    United States
    Jan 10, 2008
    Portland, Oregon
    Nat'l Team:
    United States
    I'm going to put my numbers stuff all on one thread this year. Here's what I have so far that people may want to go check out:

    Team Histories and Simulated 2019 Ranks - This is an Excel workbook (1) that has the basic data that went into the simulated ratings and ranks I've assigned to teams going into this year and (2) that is a scheduling resource for coaches to use for future years' scheduling. The workbook has a User Guide on its first page. It's available for downloading from the RPI for Division I Women's Soccer's NCAA Tournament: Scheduling Towards the Tournament page. It's an attachment at the bottom of that page. If you're really interested in teams' performance and trends over the last 12 years, I think you'll find it a great resource.

    2019 Conferences Scheduling Resource - This likewise is an Excel workbook, available as an attachment at the bottom of the linked page in the preceding paragraph. It shows how conferences have been trending, in various categories, over the period from 2013 to the present. It likewise has a User Guide as its first page. It's intended as a resource for conferences to use if they are interested in scheduling as a unit to improve their ratings and ranks. If you're wondering why your conference's teams' ranks and NCAA Tournament at large selection numbers have been improving or declining, it may provide some insight.

    2019 Simulated RPI Ranks 9.9.2019 - This is a blog article, with some background, on my simulated team RPI ranks using the actual results of games played through last weekend and simulated results for the remainder of the season.

    2019 Simulated Conference Standings and Tournaments 9.9.2019 - This is a blog article, with some background, on my simulated conference standings and tournament results using the actual results of games played through last weekend and simulated results for the remainder of the season. Since there had been no conference games as of last Sunday, these are the same as at the beginning of the season.

    2019 Simulated NCAA Tournament Bracket 9.9.2019 - This is a blog article, with some explanation, with my simulated NCAA Tournament bracket using the actual results of games played through last weekend and simulated results for the remainder of the season.

    Please weigh in with any questions or comments.
     
    HeadSpun, PlaySimple, Val1 and 2 others repped this.
  2. espola

    espola Member+

    Feb 12, 2006
    "Simulated" how?
     
  3. PlaySimple

    PlaySimple Member

    Sep 22, 2016
    Chicagoland
    Club:
    Manchester United FC
    I'm not trying to be a smart ass and don't want to answer for cp but here is one of the definitions of simulate:

    produce a computer model of

    In the work that cp did, a sentence using the word "simulated" would read as this:

    "future RPI ranks, conference standings, and the NCAA bracket were simulated by computer"

    Read the text that accompanies the numbers. It's explained there. These are predictions.
     
  4. PlaySimple

    PlaySimple Member

    Sep 22, 2016
    Chicagoland
    Club:
    Manchester United FC
    cp, in the simulated RPI rankings chart you have 2015 listed as the year in the last two columns.
     
  5. espola

    espola Member+

    Feb 12, 2006
    Clicked the tab "Open with google sheets". Got "file is too large"
     
  6. cpthomas

    cpthomas BigSoccer Supporter

    Portland Thorns
    United States
    Jan 10, 2008
    Portland, Oregon
    Nat'l Team:
    United States
    If you download the workbook, the User Guide explains how I do the simulated ratings, in great detail. If you don't want to download it, pm me with an email address and I'll send you a copy of the User Guide and you can read the explanation.
    Thanks for that piece of info. The files are large. You can download the spreadsheet, if you have Excel, but since it's large you might not want to do that.

    I'll do a detailed write-up of how I do the simulations and will post it here.

    PS - I just was able to open up the document from the website link, as a Google spreadsheet, so your not being able to open it may not be a website or Google spreadsheet limitation. But, apart from the User Guide, which looks fine, I find the Google spreadsheet not very usable anyway.
     
  7. cpthomas

    cpthomas BigSoccer Supporter

    Portland Thorns
    United States
    Jan 10, 2008
    Portland, Oregon
    Nat'l Team:
    United States
    The "2015 BPs NC" reference in the titles of those two columns indicates these are ratings and ranks using the NCAA's 2015 bonus and penalty award amounts, applied to non-conference games only. That's the formula the NCAA currently uses for DI women's soccer. (I run computations for each of the bonus and penalty regimes the Committee has used since 2007, as well as for a number of RPI variants I've created, and my column titles let me know which regime a column applies to. That's why you see column titles like the ones you identified.)
     
  8. espola

    espola Member+

    Feb 12, 2006
    I was trying to load the sheet to see what formulas were used in the simulations. All you need to do is to show a single cell of that with a description to make sense of the cell references buried in the formula.
     
  9. cpthomas

    cpthomas BigSoccer Supporter

    Portland Thorns
    United States
    Jan 10, 2008
    Portland, Oregon
    Nat'l Team:
    United States
    Espola asked how I do my simulations. Here’s an explanation.

    Background.

    Beginning with the 2007 season, I’ve maintained a data base of all regular season games that Division I women’s teams have played, including conference tournament games. The data base includes, but is not limited to:

    Game dates, opponents, locations, and scores

    Team win, loss, and tie and home, away, and neutral records

    Teams’ coaches and when each became his or her team’s head coach

    Using the data, for each season I compute each team’s RPI Elements 1, 2, and 3, Unadjusted RPI, and Adjusted RPI. I do this using the version of the RPI formula that the Women’s Soccer Committee currently uses (2015 bonus and penalty point regime, awarded for non-conference games only), applied retroactively to all seasons. I also do it using earlier versions of the formula the Committee used from 2007 to the present as well as other variations of the RPI I’ve developed. My games data base and RPI ratings match the NCAA’s exactly, due to a vetting process I use.

    I also have Kenneth Massey’s ratings and ranks for teams from 2007 to the present. I track these because I believe his ratings and ranks are the best available. I believe this because I have a computer-based system for seeing how well rating systems’ ratings correlate with the results of games during the season from which the ratings were derived. Massey’s have a good overall correlation rate and, more important, they do a good job of rating teams from different conferences in relation to each other. The RPI’s ratings have a problem in that area, on average discriminating against strong conferences’ teams and in favor of weak conferences’ teams. This RPI problem is a result of the way the RPI is designed.

    Assigning Pre-Season Simulated Ratings.

    I assign pre-season simulated ratings as part of preparing each season’s Team Histories and Simulated 20XX Ranks workbook. This workbook primarily is intended as a resource for coaches to use when working on future non-conference schedules. The workbook has a page for each Division I team. A team’s page includes a series of tables and charts. These include various pieces of information relevant to whether the team might or might not be a good opponent for another team to play.

    A team page’s tables include information on the team from 2007 to the present. Two of the pieces of information are the team’s RPI and Massey ratings over this period.

    A team page’s charts present some of the tables’ information in chart form. Most of the tables present the information chronologically, so the charts allow you to look at the team’s trends over time. In addition, the charts include trend lines.

    Once I initially have set up a team page, I look at the chart that shows the team’s RPI and Massey ratings chronologically over time. Since I believe Massey does the best job of showing teams’ actual strength, I focus on the Massey ratings. I look at two types of trend lines. First, I look at simple straight trend lines for 2007 to the present. Second, I look at Order 2 Polynomial trend lines. Based on my experience, these are the types of trend lines that are most likely to capture where a team might be headed next year. Straight trend lines work best for teams that appear to be headed in a single direction, i.e., getting better or worse on a relatively consistent basis. Order 2 Polynomial trend lines work best for teams that appear to have had a good or bad period but now appear to be returning to where they were during an earlier time. For each team, I make a decision on which type of trend line appears to be most consistent with the team’s rating history. I then choose that trend line as the one to use as the basis for a simulated rating for next year.
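
    For readers who want to see the trend-line idea outside of Excel, here is a minimal sketch in Python. It is an illustration only, not the workbook's formulas: the function names and the ratings are hypothetical, and it assumes the trend lines are ordinary least-squares fits (order 1 for a straight line, order 2 for the polynomial), projected one season ahead. Seasons are numbered 0, 1, 2, ... rather than by year to keep the arithmetic well behaved.

```python
def fit_polynomial(xs, ys, order):
    """Least-squares polynomial fit. Returns coefficients, lowest power first."""
    n = order + 1
    # Normal equations: a[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i)
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def project(xs, ys, order, next_x):
    """Value of the fitted trend line at next_x (for example, next season)."""
    coeffs = fit_polynomial(xs, ys, order)
    return sum(c * next_x ** power for power, c in enumerate(coeffs))


# Hypothetical rating history for one team, 12 seasons (0 = 2007 ... 11 = 2018)
seasons = list(range(12))
ratings = [1.20, 1.25, 1.31, 1.38, 1.41, 1.40, 1.36, 1.30, 1.27, 1.26, 1.28, 1.33]

straight = project(seasons, ratings, 1, 12)   # straight trend line
curved = project(seasons, ratings, 2, 12)     # order 2 polynomial trend line
print(f"straight: {straight:.3f}  order 2: {curved:.3f}")
```

    Which of the two projections to use for a given team remains a judgment call, as described above; the code only produces the candidate values.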

    For teams that have brought in a new coach between 2009 and four years from the present, the team page has another chart that shows the team’s ratings from a year before the current coach arrived to the present. This chart is to take into account that the team’s trend may have changed since the new coach came in. I require that the coach have been there for four years in order for me to provide this chart, since in my experience it takes about four years for a new coach’s work to show where the team is likely to be trending under his or her program. (Teams’ performances under new coaches with less time as coach tend to be relatively volatile and thus present the most difficulty for my simulations.) I look at straight trend lines and Order 2 Polynomial trend lines for these charts, too. And, I have the option of choosing one of these trend lines as the one to use as the basis for a simulated rating for next year.

    Once I’ve selected the type of trend line to use for a team, the computer provides the Massey rating the team will have next year if the trend continues. Ordinarily, this is the simulated rating I assign to the team. Sometimes, however, none of the trend lines seems to match well with where a team might be next year. In those cases, my default is to assign the team the same Massey rating as it had in the most recent year.

    I limit myself to the above possible choices for simulating a team’s rating because I want to minimize the human element in simulating where a team will be next year. Because of that, my system intentionally does not take into consideration detailed information such as outgoing players and incoming players. Other systems use those types of information and thus provide a different perspective on where teams might be next year.

    Once I have teams’ simulated Massey ratings for next year, I determine their simulated ranks. Using those ranks, I then assign RPI ranks to the teams. Then, for each rank, I determine what the average RPI rating has been for a team with that rank over the period from 2007 to the present, and I assign that RPI rating to the team. For teams that have had soccer but will be new to Division I next year, Massey rates all women’s college teams within a single system, so I look to see where those teams fall in Massey’s rankings relative to Division I teams and assign them a simulated rating accordingly. For teams that will have new soccer programs next year, I assign them the average rating that first year programs have had in the past.
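
    The rank-to-rating step can be sketched as follows. This is a hedged illustration with made-up team names and numbers, not the workbook's own calculation. It assumes a higher Massey rating means a better team, and that the historical average for a rank is a plain mean of the RPI ratings that rank has held in past seasons. All function names are hypothetical.

```python
def rank_teams(massey_ratings):
    """Return {team: rank}, where rank 1 is the highest simulated Massey rating."""
    ordered = sorted(massey_ratings, key=massey_ratings.get, reverse=True)
    return {team: position + 1 for position, team in enumerate(ordered)}


def average_rating_by_rank(past_seasons):
    """past_seasons: one list of RPI ratings per season. Returns {rank: mean rating}."""
    by_rank = {}
    for season in past_seasons:
        for rank, rating in enumerate(sorted(season, reverse=True), start=1):
            by_rank.setdefault(rank, []).append(rating)
    return {rank: sum(values) / len(values) for rank, values in by_rank.items()}


def simulated_rpi(massey_ratings, past_seasons):
    """Give each team the historical average RPI rating for its simulated rank."""
    ranks = rank_teams(massey_ratings)
    averages = average_rating_by_rank(past_seasons)
    return {team: averages[rank] for team, rank in ranks.items()}


# Hypothetical three-team example with two past seasons of RPI ratings
massey = {"Team A": 2.10, "Team B": 1.50, "Team C": 1.80}
history = [[0.70, 0.60, 0.50], [0.68, 0.62, 0.52]]
print(simulated_rpi(massey, history))
```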

    All of the above information, including which trended ratings I’ve selected for teams, is included in the workbook, so it’s possible for someone who has the workbook to see what decisions I’ve made – and whether he or she agrees with my decisions or not. This matches my approach with all of my work, which is to make it as transparent as possible so that those considering it can make their own decisions about how reliable it is or isn’t.

    My typical caution with work like this is that you have to take it with a grain of salt. Any attempts at predicting how a season will come out are going to turn out not to match how things actually come out.

    Applying the Simulated Ratings to the Season Schedule

    With the simulated RPI ratings in hand, I then determine how the season will come out if those ratings are correct.

    To do this, I start with a schedule of all the season’s games. For each game, my computer program starts with the simulated ratings of the two opponents. Through other work I’ve done, I’ve determined that on average, home field advantage is worth 0.0150 within the RPI rating system. My program uses that value to determine the location-adjusted rating difference between the two opponents.

    My program then has to decide whether the game will be won by the better rated team or whether it will be a tie. Again through other work I’ve done, I know what the win, tie, and loss likelihoods will be for the better rated team depending on the amount of its location-adjusted rating difference advantage. If the better rated team’s win likelihood is above 50%, my program identifies the game as being won by the better rated team. If the win likelihood is 50% or less, my program identifies the game as a tie. The rating difference that results in a tie, as a matter of interest, is 0.0150 or less. (It’s a fluke that this is the same as the value of home field advantage.) The computer goes through this process for every game.
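
    A minimal sketch of that per-game decision, for illustration only. The two 0.0150 values come from the description above; everything else is an assumption, including the function name, the choice to add the home field value to the home team's rating, and the treatment of neutral-site games as having no adjustment. The ratings are hypothetical.

```python
HOME_FIELD_ADVANTAGE = 0.0150  # value of home field in RPI rating terms
TIE_THRESHOLD = 0.0150         # adjusted differences this small or smaller are ties


def simulate_game(home, away, ratings, neutral=False):
    """Return the winning team's name, or "tie".

    Assumption: home field advantage is added to the home team's rating,
    and neutral-site games get no adjustment.
    """
    home_rating = ratings[home] + (0.0 if neutral else HOME_FIELD_ADVANTAGE)
    difference = home_rating - ratings[away]
    if abs(difference) <= TIE_THRESHOLD:
        return "tie"
    return home if difference > 0 else away


# Hypothetical simulated RPI ratings
ratings = {"Team X": 0.6000, "Team Y": 0.5900, "Team Z": 0.5500}
print(simulate_game("Team X", "Team Y", ratings))  # X at home
print(simulate_game("Team Y", "Team X", ratings))  # Y at home
print(simulate_game("Team Z", "Team X", ratings))  # Z at home
```

    Because every game is decided by a fixed threshold rather than by drawing from the win, tie, and loss likelihoods, the same ratings always produce the same season.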

    To get the full season’s set of games, I need to include conference tournaments. To do this, I first need to know teams’ conference regular season standings. My program tracks the results of conference regular season games and from these produces standings based on 3 points for a win and 1 point for a tie. If teams are tied in the standings, my program ranks the team with the better simulated rating as the higher ranked team for conference tournament seeding purposes. (This is a surrogate for conferences’ tie-break procedures, which are not consistent across all the conferences.) Once my program has all the conference standings set for seeding purposes, it produces all the conference tournament brackets and results. For conference tournament ties that go to Kicks from the Mark, my program assigns as the winner the team with the better location-adjusted simulated rating.
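
    The standings step can be sketched like this. It is a simplified illustration under stated assumptions: results are given as (home, away, outcome) tuples where the outcome is the winner's name or the string "tie" (so no team may be named "tie"), and the function name and sample data are hypothetical. The 3-1-0 points and the rating-based tie-break follow the description above.

```python
def conference_standings(results, ratings):
    """Order teams for seeding: points first, simulated rating as the tie-break.

    results: list of (home, away, outcome), outcome is a team name or "tie".
    """
    points = {}
    for home, away, outcome in results:
        points.setdefault(home, 0)
        points.setdefault(away, 0)
        if outcome == "tie":
            points[home] += 1
            points[away] += 1
        else:
            points[outcome] += 3
    # Higher points first; among equal points, higher simulated rating first
    return sorted(points, key=lambda team: (points[team], ratings[team]), reverse=True)


# Hypothetical round robin in which every team finishes with 3 points
results = [
    ("Team A", "Team B", "Team A"),
    ("Team B", "Team C", "Team B"),
    ("Team C", "Team A", "Team C"),
]
ratings = {"Team A": 0.60, "Team B": 0.55, "Team C": 0.58}
print(conference_standings(results, ratings))
```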

    After it has all of the season win-tie-loss results, including for conference tournament games, my program applies the RPI formula to the results and produces simulated RPI ratings and ranks.

    As we go through the season, once games are played, I substitute the actual results of played games for the simulated results my program had used previously. Thus as we go through the season, the simulated RPI ratings and ranks are based more on actual results and less on simulated results. Each week, I publish the then current simulated RPI ratings and ranks, which week by week get closer to what the final actual RPI ratings and ranks will be.

    As a matter of interest, looking at teams’ actual RPI ratings over the last 12 years – covering some 36,000 plus games – the team with the better location-adjusted rating has won games 72.6% of the time, tied 10.7%, and lost 16.7%. This is in the same realm as other rating systems, with the best correlators coming in around 73.3% wins. For the games played so far this season, representing about 30% of all the games teams will play, the simulated results have had the team with the better location-adjusted rating winning 70.5%, tying 10.2%, and losing 19.3%.

    Simulating the Bracket

    I’ll only give a general description here of how I simulate the NCAA Tournament bracket. For a detailed explanation, go to the RPI for Division I Women’s Soccer website and read the NCAA Tournament: Predicting the Bracket, At Large Selections and NCAA Tournament: Predicting the Bracket, Seeding pages.

    In summary, from the 2007 season to the present, the Women’s Soccer Committee’s decisions on at large selections and seeding for the NCAA Tournament have followed patterns. Teams that meet certain measurements always have gotten positive decisions from the Committee – always have gotten at large selections or always have gotten #1, #2, #3, or #4 seeds. And, teams that meet certain other measurements never have gotten positive decisions from the Committee – never have gotten at large selections or never have gotten #1, #2, #3, or #4 seeds. I’ve tracked 92 different types of measurements, 91 of which each has a “yes always” and “no never” component.

    I have a program that takes my simulated regular season, including conference tournaments, results and plugs them into the Women’s Soccer Committee’s patterns to see how many “yes” and “no” measurements each team meets in relation to at large selections and #1, #2, #3, and #4 seeds. I then look at how many “yes” and “no” measurements each team meets and, based on those numbers, assign seeds and at large selections. I report these weekly as the season progresses.

    Caution

    As I indicated earlier, you have to take all of these simulations with a grain of salt. What actually happens always is going to be different than the simulated results, although as the season progresses the simulated results will come closer and closer to what actually will happen.

    Notwithstanding my caution, however, there is a good chance that my simulations, looking at all teams, will come closer to what actually will happen than what almost any humans could predict on their own, acting without the benefit of sophisticated computer programs and analysis.
     
    Val1 repped this.
  10. cpthomas

    cpthomas BigSoccer Supporter

    Portland Thorns
    United States
    Jan 10, 2008
    Portland, Oregon
    Nat'l Team:
    United States
    If you want to see some cell references and if I'm correctly understanding what you want, if you let me know the workbook, page, and cell for which you want to see the formula, I can show what the cell formula looks like. (All of my workbooks are available for anyone who wants them, sort of an open source approach.)
     
  11. espola

    espola Member+

    Feb 12, 2006
    Thank you.
     
  12. Sam Miami

    Sam Miami New Member

    Bayern Munich
    Germany
    Sep 11, 2019
    My head is spinning, but this is brilliant stuff.

    How about bag the RPI and promotion/relegation regionally? 20 team subsets in each region with Final Fours for everyone (E-W-N-S). You play 19 games, top 2 up, bottom 2 down each year. 17th and 3rd play at one site for promotion or staying up. Final 4 has each regional top division champion. Imagine the fun. UNC/UVA/WVU for East, UCLA, USC, Stanford out West, Penn State/Wisconsin/Notre Dame in North and A&M/FSU/Texas for South.

    No more UNC blowouts of UNLV or UCLA cream puffing the schedule.

    I have always wanted to see this on the men’s side, why not the women?
     
  13. Soccerhunter

    Soccerhunter Member+

    Sep 12, 2009
    #13 Soccerhunter, Sep 14, 2019 at 3:27 AM
    Last edited: Sep 14, 2019 at 3:33 AM
    Always blows my mind when this happens. Last week Wake Forest was about 20 or so in adjusted rpi rank and then lost to UNC 4-0 on Thursday. As a result of this loss, they are now at the fifth spot on the adjusted rpi on the AWK list.

    In any case that is a heck of an improvement for a thumping loss. What would it have been if they had tied or beaten UNC?!?

    Addendum.... Yes I know the theory in principle, but it always amazes me!
     
  14. Soccerhunter

    Soccerhunter Member+

    Sep 12, 2009
    ..and by this I mean that I am aware that the upward factor was the momentum of two opponents that Wake had defeated earlier: Santa Clara with three recent wins (including over UCLA and Arizona), and Charlotte with three recent wins....and it didn't hurt that Stanford lost and Oklahoma had a bad week with both departing the top 5.
     
  15. cpthomas

    cpthomas BigSoccer Supporter

    Portland Thorns
    United States
    Jan 10, 2008
    Portland, Oregon
    Nat'l Team:
    United States
    I've published two reports for this week at the RPI and Bracketology blog, and will provide a link to one more detailed but not too large Excel workbook for those of you who want to see the details behind this week's simulated NCAA Tournament bracket. The two reports are:

    2019 Simulated RPI Ranks 9.16.2019 - These are my simulated team RPI ratings and ranks using the actual results of games played through Sunday, September 15, and simulated results for the remainder of the season including simulated conference tournaments. I don't have a simulated conference standings and tournaments report this week since they are the same as for last week.

    2019 Simulated NCAA Tournament Bracket 9.16.2019 - This is my simulated NCAA Tournament bracket using the actual results of games played through last weekend and simulated results for the remainder of the season including simulated conference tournaments.

    2019 Website Factor Workbook 9.16.2019 - This is a relatively small three-worksheet Excel workbook that has the details underlying my simulated NCAA Tournament bracket for this week. It's an exhibit at the bottom of the NCAA Tournament: Predicting the Bracket, Track Your Team page at the RPI for Division I Women's Soccer website. If you're interested in the details, then before going to the workbook, read the text of the linked page, as it explains how to use the workbook. (This workbook's cells show values only. I generate it from a much larger and quite complex workbook that shows the formulas for the underlying calculations. If anyone is interested in this "source" workbook, PM me and we'll see if we can figure out how I can get it, and some other linked workbooks, to you.)

    As always, feel free to ask questions or provide comments.

    ATTENTION: If my calendar of NCAA procedures for the season is right, the NCAA's first publication of RPI ranks and ratings for the season should occur after next weekend's games, on Monday or Tuesday.
     
  16. stubifier

    stubifier Member

    Real Salt Lake
    United States
    Jan 19, 2018
    All other aspects of your simulation being equal, if BYU were to run the table, would they be seeded?
     
  17. cpthomas

    cpthomas BigSoccer Supporter

    Portland Thorns
    United States
    Jan 10, 2008
    Portland, Oregon
    Nat'l Team:
    United States
    My regular season simulation has them going 17-1-1, with the tie to Kansas on Thursday and the loss to Santa Clara. If they were to win both games, the simulation has them moving to #17 in the final RPI rankings. Due to the way the simulation works, they'd actually probably be a little better than that. I haven't run that scenario through the bracket simulation program, but I'm pretty confident that running the table would get a team such as BYU a seed.

    The question would be, What seed? At #17, it would be a #3 or #4 seed, based on past history. The problem is, their schedule is soft. According to the regular season simulation, their opponents' winning percentage, which is about 40% of the effective weight of the RPI, will be 0.4918. This means that the average of their opponents' winning percentages will be below 0.500. That's definitely not good and is why as an undefeated team they still could only have a rank in the #15 area.
     
    stubifier repped this.
