Youth National Team Rankings Cont'd


voros
12 Jul 2006, 08:24 PM
The discussion on the US youth board here:

http://www.bigsoccer.com/forum/showthread.php?t=379969

was on the verge of becoming a technical one, so I decided to move that part of the discussion here.

Numerista's concern with my system (as I understand it) was that the strength of schedule adjustments arising from results in 2005 are affected by the results those teams had in 1999 and 2001. So a strong 2005 Mexican U17 team would register as a weaker strength of schedule factor than the side its opponents actually faced in 2005. And of course there is the inverse problem for teams who faced them in previous years, i.e. playing Mexico in 2005 was more difficult than playing them in 2001.

This is easy enough to solve. All you have to do is calculate separate ratings for each cycle and then average them up, or use a weighted average favoring more recent results. That way Mexico's 2003 quality has no effect on the strength of schedule factor for their opponents in 2005 (and vice-versa).
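A minimal sketch of the weighted-average idea above, assuming each cycle has already been rated on its own. The ratings and weights below are made-up illustrations, not values from the actual system.

```python
def blend_cycle_ratings(ratings_by_cycle, weights):
    """Weighted average of one team's separately computed per-cycle ratings.

    ratings_by_cycle: dict of cycle year -> rating from that cycle's games only
    weights: dict of cycle year -> non-negative weight (hypothetical values)
    """
    cycles = [c for c in ratings_by_cycle if weights.get(c, 0) > 0]
    total_weight = sum(weights[c] for c in cycles)
    if total_weight == 0:
        raise ValueError("no weighted cycles available")
    return sum(ratings_by_cycle[c] * weights[c] for c in cycles) / total_weight


# Illustrative numbers only: a team that was ordinary until 2005.
example_ratings = {1999: 80.0, 2001: 90.0, 2003: 100.0, 2005: 140.0}
# Assumed weights that favor more recent cycles.
example_weights = {1999: 1.0, 2001: 2.0, 2003: 3.0, 2005: 4.0}

blended = blend_cycle_ratings(example_ratings, example_weights)
```

Because each cycle is rated separately before blending, one cycle's results never feed into another cycle's strength of schedule; the only link between cycles is the final average.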

My counter-argument to this is that any set of rankings, even with a very large sample, has some assumed inherent error range. The rankings represent our best guess as to each team's strength, but we assume that this guess has some unreliability to it. If we do the ratings based on a single cycle, and the error around each team's rating is essentially random, numerista's suggestion still works. That is, if the information in those rankings is all we really have to go on, then despite the inherent inaccuracies, we've constructed things as best as we can.

The problem of course is that results from the 2003, 2001 and 1999 cycles do likely give us clues as to the direction and size of the misses in the 2005 cycle ratings. Here's a good recent example. It is very easy for me to use my Poisson win% formula to calculate win rankings for national teams (they are about as good as doing two separate rankings, though less informative). As such it is also easy for me to construct the rating system around the 64 World Cup matches that just took place. Here are the final rankings:

1. Italy - 1054
2. France - 1054
3. Switzerland - 857
4. Brazil - 847
5. Spain - 463
6. South Korea - 319
7. Ghana - 251
8. Ukraine - 216
9. Australia - 175
10. Czech Republic - 157
11. Croatia - 128
12. U.S.A. - 109
13. Germany - 108
14. Portugal - 96
15. Togo - 94
16. England - 87
17. Argentina - 86
18. Japan - 74
19. Tunisia - 52
20. Netherlands - 47
21. Saudi Arabia - 43
22. Ecuador - 37
23. Sweden - 36
24. Mexico - 34
25. Paraguay - 24
26. Ivory Coast - 22
27. Angola - 20
28. Iran - 11
29. Poland - 11
30. T & T - 10
31. Serbia and Montenegro - 8
32. Costa Rica - 6

As you can see, many of these ratings don't match the public perception of how well various teams did. The US is ahead of Germany, Ukraine is ahead of Argentina, etc. Why? The short answer is sample size. But why did the small sample size have this result?
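The Poisson win% formula mentioned before the table is not spelled out in this thread. As a rough illustration of the general idea only, here is one common way to turn two expected-goals figures into a win percentage, assuming each side's goals follow independent Poisson distributions and a draw counts as half a win. Both assumptions and all names are mine, not necessarily what the system uses.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number of goals is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probabilities(lam_a, lam_b, max_goals=15):
    """Return (win, draw, loss) probabilities for team A.

    Scores above max_goals are ignored, which drops a negligible amount of
    probability for realistic soccer scoring rates.
    """
    win = draw = loss = 0.0
    for goals_a in range(max_goals + 1):
        p_a = poisson_pmf(goals_a, lam_a)
        for goals_b in range(max_goals + 1):
            p = p_a * poisson_pmf(goals_b, lam_b)
            if goals_a > goals_b:
                win += p
            elif goals_a == goals_b:
                draw += p
            else:
                loss += p
    return win, draw, loss


def win_pct(lam_a, lam_b):
    """Win percentage for team A, counting a draw as half a win (an assumption)."""
    win, draw, _loss = outcome_probabilities(lam_a, lam_b)
    return win + 0.5 * draw
```

With only three to seven matches per team, a single extra goal moves these percentages a lot, which is the sample size problem in a nutshell.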

Because, to put it simply, we have additional information at our disposal with which to gauge team strength that goes beyond their World Cup results. We believe Argentina is a stronger side than Ukraine, based largely on information gleaned prior to the tournament. So to the extent that a team's World Cup results do not reflect its overall strength as a side, this in turn affects the ratings of the opponents it faced in the Cup. Mexico tying Argentina is a much better result than the Swiss tying the Ukraine, but the above ratings don't see it that way and so both of the former suffer from both of the latter.

IOW, while how well a team played in 2004 has no direct bearing on how they played in 2006, given our limited sample, it does give us hints as to the overall strength of various teams in 2006 that go beyond their 2006 results. Therefore, if there tends to be much consistency in the strength of national teams from year to year (there is) knowing what teams did in 2005, 2004, 2003, 2002 and 2001 does give us more information with which to interpret the results in 2006.

And this holds true for youth teams as well. If Spain or Argentina fails to make it out of the first round of a youth tournament, the teams that beat them probably deserve credit for beating a youth team that has consistently been excellent in those tournaments, rather than simply dismissing the Argentina team as being unusually subpar.

If in a youth cycle Spain rates a 90 and Australia rates a 120, we know those ratings have some error range compared to the actual strength of those teams. My contention is that the error in those ratings is not random, but rather that previous youth tournament results give us information about the extent and the direction to which those ratings 'miss'. If the errors are not random, and we assume Spain has a much better chance than Australia of having been better than its rating, and a much lower chance of being worse, then of course the ratings themselves should be changed to accommodate this info. Once the ratings for those teams change, everybody's rating changes.

And so through that prism, it makes sense to use results from 2003 and 2001 to gauge the results from 2005. Without large team strength independence from cycle to cycle or large samples per cycle (and we are without both), how we evaluate what happened in 2005 has much to do with what happened in previous cycles.

numerista
12 Jul 2006, 10:11 PM
Here is a very brief synopsis of the two modeling options you've laid out ...

Estimate 1: assume that past and future generations tell us nothing about the current generation's quality.

This has the advantage of being unbiased, but because each generation plays only a small number of games, it has high random variability.

Estimate 2: assume that there is no variability at all from generation to generation.

This has the advantage of minimizing random variability, but because it doesn't adjust for a particular generation's strength, it sometimes causes bias, e.g. in a regional tournament format.

---
To my mind, neither of these procedures is really satisfying. In the specific case we're discussing, I'd like to see ...
Estimate Ideal: assume that a team that qualifies for a world championship is better than usual by an appropriate amount.

For teams that qualify regularly, not much adjustment is needed, but for teams that usually have a harder time qualifying, it would be informative to adjust. (These adjustments have a broad effect because some regions are dominated by the same few teams, while others have a larger pool of contenders.) At this point, I haven't thought in detail about ways to pursue this ideal, but I'm pretty sure that it's feasible to do it without generating nearly as much random variability as Estimate 1.

voros
12 Jul 2006, 10:49 PM
Estimate 2: assume that there is no variability at all from generation to generation.
I'm not sure that, once the time weights are added, this necessarily does that. Certainly 2003 results don't count for as much as the 2005 results, and the 2001 and 1999 results even less so. With qualifiers added in, this should give every team at least some matches each cycle, to avoid the selection bias you speak of.

As I argued, there are clear relationships between the results in those cycles and those in 2005. If we are to estimate the true strength of Mexico's U17 squad, I think it's only reasonable to gauge that maybe they aren't as strong as their 2005 results suggest, and saying that based on their unremarkable results previously is a defensible action. Much stronger than before certainly, but then the system would suggest that anyway.

voros
12 Jul 2006, 11:00 PM
Also random variability from the rankings isn't the issue. It's the non-random variability that causes single cycle rankings to break down. If the variations were truly random, we could simply do it individually for four cycles and then average the ratings. But the reality is a win against Spain is likely a better win than one against Australia, often regardless of what the individual cycle ratings say. The fairly strong relationship between team quality from one cycle to the next suggests that if you could add 30 more results to each team, the ratings would pull much more in Spain's direction.

Now we could, I suppose, add shill results (fake results designed to accomplish something), where each cycle every team plays 'x' number of shill games, with scores designed to establish their traditional baseline of results, and then add that cycle's results to the shill games. The difficulties lie in deciding how many shill games to add and what precisely to make of the resulting rankings afterward.
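A minimal sketch of the fake-game padding described above, working on win% directly instead of on scores. The function name, the half-point-per-draw convention and every number here are illustrative assumptions; choosing the number of padding games is exactly the open question raised above.

```python
def pad_with_baseline_games(actual_points, actual_games, baseline_pct, padding_games):
    """Win% for a cycle after adding fake games played at the team's baseline rate.

    actual_points: points from real games (win = 1, draw = 0.5, loss = 0)
    actual_games: number of real games in the cycle
    baseline_pct: the team's traditional win% (an assumed input)
    padding_games: how many fake games to add (the 'x' above, a free choice)
    """
    total_games = actual_games + padding_games
    if total_games <= 0:
        raise ValueError("need at least one real or fake game")
    return (actual_points + baseline_pct * padding_games) / total_games


# A traditionally strong side (assumed 75% baseline) has a poor 4-game cycle.
raw_pct = pad_with_baseline_games(2.0, 4, 0.75, 0)      # real games only
padded_pct = pad_with_baseline_games(2.0, 4, 0.75, 4)   # plus 4 fake games
```

The more padding games added, the closer the cycle's win% sits to the baseline, so 'x' directly controls how much one cycle is allowed to move a team.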

However, I think you can certainly argue that just as strong results in 1999 - 2003 argue for better team quality in 2005 (all other things equal), so too do strong results in 2005 argue for better team quality in 1999 - 2003. If we accept that as being the case, the shill system doesn't offer that reverse feedback.

numerista
12 Jul 2006, 11:10 PM
I'm not sure that, once the time weights are added, this necessarily does that. Certainly 2003 results don't count for as much as the 2005 results, and the 2001 and 1999 results even less so. With qualifiers added in, this should give every team at least some matches each cycle, to avoid the selection bias you speak of.

A chief source of bias is that teams in the finals are not, in general, average versions of themselves. The reason that many (but not all) of them reach the finals is because at that time, their youth generation is better than usual. Including qualifiers may provide some indication of this relative to other teams within their confederation, but the between-confederation ratings end up skewed.

If we are to estimate the true strength of Mexico's U17 squad, I think it's only reasonable to gauge that maybe they aren't as strong as their 2005 results suggest, and saying that based on their unremarkable results previously is a defensible action. Much stronger than before certainly, but then the system would suggest that anyway.

Regardless of how strong Mexico is considered to be, Estimate 2 assigns an identical rating to all of Mexico's different U-17 teams*. That's clearly not accurate.

*This is what I mean when I say it assumes team strength is constant in time.

numerista
12 Jul 2006, 11:31 PM
Also random variability from the rankings isn't the issue. It's the non-random variability that causes single cycle rankings to break down. If the variations were truly random, we could simply do it individually for four cycles and then average the ratings. But the reality is a win against Spain is likely a better win than one against Australia, often regardless of what the individual cycle ratings say.

I'd still call this random variability. On average, individual cycle ratings will say that a win over Spain is better than a win over Australia ... sometimes it will be rated much better, but other times it will be rated worse.

Now we could, I suppose, add shill results (fake results designed to accomplish something), where each cycle every team plays 'x' number of shill games, with scores designed to establish their traditional baseline of results, and then add that cycle's results to the shill games. The difficulties lie in deciding how many shill games to add and what precisely to make of the resulting rankings afterward.

I agree that this seems difficult.

How do you incorporate home field advantage into your model? What I suspect is that you could also include a "better-than-usual advantage" parameter to each team in its final-round games. Given a reasonable estimate of how much team strength varies from one cycle to another, this wouldn't be too hard to produce.
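One rough way to size such a "better-than-usual" adjustment, as a sketch under strong simplifying assumptions: treat a team's strength in any one cycle as normally distributed around its long-run level, and say it qualifies exactly when that strength clears a fixed bar. The expected excess over the long-run level, given qualification, is then the standard truncated-normal mean. The model, the names and the numbers are all illustrative, not part of either rating system.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_boost_given_qualified(long_run_mean, cycle_sd, qualifying_bar):
    """E[strength | strength > bar] minus the long-run mean.

    Assumes cycle strength ~ Normal(long_run_mean, cycle_sd) and that a team
    qualifies exactly when its strength exceeds qualifying_bar.
    """
    z = (qualifying_bar - long_run_mean) / cycle_sd
    tail = 1.0 - normal_cdf(z)
    if tail <= 0.0:
        raise ValueError("qualification is numerically impossible under this model")
    return cycle_sd * normal_pdf(z) / tail


# Illustrative numbers: same bar and cycle-to-cycle spread, different long-run levels.
regular_qualifier_boost = expected_boost_given_qualified(120.0, 15.0, 100.0)
rare_qualifier_boost = expected_boost_given_qualified(85.0, 15.0, 100.0)
```

Under these assumptions the regular qualifier needs only a small adjustment, while the team that rarely qualifies is expected to be well above its usual level when it does, which matches the pattern described for teams that have a harder time qualifying.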

voros
12 Jul 2006, 11:40 PM
A chief source of bias is that teams in the finals are not, in general, average versions of themselves.
But adding the qualifiers should suss that out.

voros
13 Jul 2006, 12:12 AM
How do you incorporate home field advantage into your model?
I used to do it in a bunch of different ways. The way the system works now is by dragging the expected results toward the actual results by changing the ratings. The ratings converge when each team's projected win% for their total number of games equals their actual win% for those games. Home field is added when calculating the projected win% for each game. Adding more expected wins for teams who play lots of home games has the same effect as those teams facing lots of inferior competition.
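A sketch of how a fit like this can work, not the actual system. It assumes a Bradley-Terry style projection (rating divided by the sum of both ratings) in place of the Poisson win% formula, a multiplicative home factor, and draws worth half a win. Each pass multiplies a rating by actual points over projected points, so the ratings stop moving only when the two agree for every team. All names and the sample matches are hypothetical.

```python
def actual_points(matches):
    """Total points per team (win = 1, draw = 0.5, loss = 0)."""
    actual = {}
    for team_a, team_b, points_a, _a_is_home in matches:
        actual[team_a] = actual.get(team_a, 0.0) + points_a
        actual[team_b] = actual.get(team_b, 0.0) + (1.0 - points_a)
    return actual


def expected_points(matches, ratings, home_factor):
    """Projected points per team; home_factor inflates the home side's rating."""
    expected = {team: 0.0 for team in ratings}
    for team_a, team_b, _points_a, a_is_home in matches:
        strength_a = ratings[team_a] * (home_factor if a_is_home else 1.0)
        p_a = strength_a / (strength_a + ratings[team_b])
        expected[team_a] += p_a
        expected[team_b] += 1.0 - p_a
    return expected


def fit_ratings(matches, home_factor=1.0, max_iterations=10000, tolerance=1e-12):
    """matches: list of (team_a, team_b, points_a, a_is_home); False means neutral site."""
    actual = actual_points(matches)
    games = {}
    for team_a, team_b, _points_a, _a_is_home in matches:
        games[team_a] = games.get(team_a, 0) + 1
        games[team_b] = games.get(team_b, 0) + 1
    # A team with no points or a perfect record has no finite rating in this sketch.
    if any(actual[t] <= 0.0 or actual[t] >= games[t] for t in actual):
        raise ValueError("every team needs some points won and some points dropped")
    ratings = {team: 1.0 for team in actual}
    for _ in range(max_iterations):
        expected = expected_points(matches, ratings, home_factor)
        # Drag each rating toward the value that would explain its actual results.
        updated = {t: ratings[t] * actual[t] / expected[t] for t in ratings}
        # Only rating ratios matter, so rescale to keep the average at 1.
        scale = len(updated) / sum(updated.values())
        updated = {t: r * scale for t, r in updated.items()}
        shift = max(abs(updated[t] - ratings[t]) for t in ratings)
        ratings = updated
        if shift < tolerance:
            break
    return ratings


# Made-up double round robin; the first-listed team is at home in every match.
example_matches = [
    ("A", "B", 1.0, True), ("B", "A", 0.5, True),
    ("A", "C", 1.0, True), ("C", "A", 0.0, True),
    ("B", "C", 1.0, True), ("C", "B", 1.0, True),
]
fitted = fit_ratings(example_matches, home_factor=1.3)
```

At convergence, `expected_points(example_matches, fitted, 1.3)` matches `actual_points(example_matches)` for every team, which is the stopping condition described above.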

As to the other point, is the France team that drew South Korea the same one that beat Brazil? I mean, team quality varies not just by player, but from game to game. I think you're making an exception out of one type of variance in performance quality when you theoretically could do so for all sorts of other reasons. My point simply is that a difference in rosters is not the only place a line can be drawn as to what constitutes different teams.

numerista
13 Jul 2006, 10:18 AM
I think you're making an exception out of one type of variance in performance quality when you theoretically could do so for all sorts of other reasons.

Some noise averages out over time; other noise isn't relevant to the comparison at hand. In this case, however, we're not that fortunate. We'd like to compare the US youth with teams from other confederations, but most of its out-of-confederation opponents are stronger-than-average versions of themselves.

For instance, in the most recent U-17 World Cup, the US had Ivory Coast and North Korea in its group...

Ivory Coast U-17s in Qualifying, 99-05

99 - Swept by non-qualifier Cameroon 2-0 and 1-0
01 - Didn't enter
03 - Swept by non-qualifier Guinea 2-0 and 1-0
05 - Eliminated Senegal and Morocco in playoffs; finished ahead of Nigeria and Zimbabwe in group play; lost to Ghana in semis, but beat South Africa in third place game.

North Korea U-17s in Qualifying, 99-05
99 - Finished 4th in group play, behind Thailand, Iraq, and hosts Qatar
01 - Didn't enter
03 - Hosted semifinal round group with China and Mongolia; lost to China, who qualified.
05 - In group play, finished behind China and ahead of Thailand and hosts Japan; in semis, beat Iran on PKs; lost to China in final.

The 2005 editions of these teams do appear to have been stronger than previous ones, and that's the general tendency we expect. If no adjustment is made for it, it creates a bias.

tachyon1
13 Jul 2006, 10:36 AM
Whilst agreeing that you can glean some information about a current team's ability from previous tournaments, I'd argue that the connection is at its weakest when considering international teams.
An outstanding young player can carry a side at youth level, and mass retirements of both players and managers aren't unknown after senior tournaments. So tournament-on-tournament continuity can suffer.

You therefore need to extract the maximum amount of information from a limited number of matches played in tournament play.

The obvious situations to look at first are qualifying matches for the current tournament. This not only brings non-qualifiers into the loop but can also provide extra games between competing qualifiers. (Brazil, Argentina, Paraguay and Ecuador played each other in the CONMEBOL games, as did S&M and Spain in Euro group 7.)

Once at the tournament, goals for and against are equally obvious, but you also need to address the strength of schedule issue. The fixture list will be incomplete, but by the end of the tournament you will have a line of collateral form that includes every team. (Saudi didn't play Germany, but they did play Ukraine, who played Switzerland, who played France, who played Italy, who did play Germany.)
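The collateral-form chain above is just a path through the graph of who played whom. A minimal sketch that finds the shortest such chain with a breadth-first search, using only the pairings named in that example (the function name is hypothetical):

```python
from collections import deque


def chain_of_form(pairings, start, goal):
    """Shortest list of teams linking start to goal through games played, or None."""
    opponents = {}
    for team_a, team_b in pairings:
        opponents.setdefault(team_a, set()).add(team_b)
        opponents.setdefault(team_b, set()).add(team_a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Sorted only so that ties are broken the same way on every run.
        for next_team in sorted(opponents.get(path[-1], ())):
            if next_team not in seen:
                seen.add(next_team)
                queue.append(path + [next_team])
    return None


# Only the pairings mentioned in the example above, not the full fixture list.
pairings = [
    ("Saudi Arabia", "Ukraine"),
    ("Ukraine", "Switzerland"),
    ("Switzerland", "France"),
    ("France", "Italy"),
    ("Italy", "Germany"),
]
chain = chain_of_form(pairings, "Saudi Arabia", "Germany")
```

If every pair of teams returns a chain, the fixture list is connected and a single set of ratings can compare all of them; the longer the chain, the less direct the evidence behind a comparison.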

Next, don't neglect the natural advantages some teams have on a world stage. Home advantage is one, but more important, because it affects many more matches, is continental advantage. (The records of European teams in the Americas and Japan compared to the records of European sides 'at home' in Europe are most illuminating.)
When Ecuador played Germany, the hosts had a huge combined advantage over their visitors even before the respective abilities of each team were considered. Don't leave these venue-based advantages in the numbers.

Allow for situations that may not repeat. Sendings off are part of the game, but when you are trying to evaluate a team on perhaps only 3 games, an hour with only 10 men can distort matters greatly. So you may be justified in allowing for it.

Next, make use of extra time. With golden and silver goals rightly done away with, teams are more likely to play a more normal style in ET. So if a team wins in extra time, give them the appropriate credit for the win; if you don't, you're chucking away 30 minutes of extra information.

Equally, use the results of penalty shootouts; it's not a lottery. The better pre-game team wins more often than a lottery implies they should, so again credit them.

Lastly, do the derived ratings (as opposed to rankings) make sense? Historically you should have a pretty good idea of the usual spread of talent in a specific tournament, so do your final tournament ratings reflect this spread? Would you be happy to lay odds based on them? :-)

scaryice
13 Jul 2006, 07:27 PM
Mexico tying Argentina is a much better result than the Swiss tying the Ukraine, but the above ratings don't see it that way and so both of the former suffer from both of the latter.

Mexico did not draw with Argentina. Are you only counting results after 90 minutes?