Designing the optimal rating system

palynka · Nov 30, 2013

We all love to hate the FIFA rankings.

Although an inaccurate system certainly stimulates discussion, I believe a good rating system should ultimately be a good predictor of outcomes. So I propose that we design a system and judge it by its ability to forecast results.

ELO is obviously a first candidate. It's a good approach, but I believe it can be improved. Here are some examples of things that could be improved:

1) It assumes performance follows a specific distribution: the Extreme Value distribution.
The current ELO model assumes that teams' performances follow an Extreme Value distribution. From this, you get that the difference between A's and B's performances follows a Logistic distribution, which gives the probability that A's performance is better than B's as a logistic function of the rating difference. This is done for convenience, but can we do better if we relax that assumption?

2) It assumes that teams are equally regular.
The underlying model assumes team performances follow an Extreme Value distribution with the same scale parameter. So the location parameter changes from team to team, but the spread does not. Very roughly, this means that average performance differs between teams but all teams are assumed to be equally consistent.

3) It doesn't use information on goal difference. A win is a win, and winning by 1-0 or 7-0 will give you the same points. But shouldn't a 7-0 mean that we should update the ratings by more? It doesn't make sense to throw away this information.

4) ....
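To make points 1 and 3 concrete, here is a minimal sketch of a plain ELO update. The scale of 400 and K of 20 are assumed conventional values, and the function names are made up for illustration; this is not the exact formula of any particular published ranking.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Logistic win expectancy for A against B (scale=400 is an assumed convention)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def score_from_goals(goals_a, goals_b):
    """Collapse a scoreline to win/draw/loss, which is all plain ELO looks at."""
    if goals_a > goals_b:
        return 1.0
    if goals_a < goals_b:
        return 0.0
    return 0.5


def elo_update(rating_a, rating_b, goals_a, goals_b, k=20.0):
    """Return the new (rating_a, rating_b); k=20 is an assumed value."""
    delta = k * (score_from_goals(goals_a, goals_b) - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


narrow = elo_update(1500.0, 1500.0, 1, 0)
rout = elo_update(1500.0, 1500.0, 7, 0)
print(narrow, rout)  # both (1510.0, 1490.0): the margin is thrown away
```

The only place the scoreline enters is `score_from_goals`, so any information beyond win, draw or loss is lost before the update is computed.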

To begin with, I want to focus on point 3. Most pairwise comparison models (which are the theoretical underpinning of ELO) look at two elements (A and B) and try to compare them. But these models are made for binary results. They ask the question "which one is better?", and the answer can only be "A" or "B". They try to estimate the probability that A is better than B, so the rating difference is a measure of how likely it is that A is better than B, rather than of how much better or worse A is than B. These sound similar but are not the same.

For example, this:
http://en.wikipedia.org/wiki/World_Football_Elo_Ratings
uses the information on goal difference, but has to rely on ad hoc tables for scaling K appropriately. This is trying to fit a square peg in a round hole. We should be using this information to estimate how much goal difference matters, rather than assuming that a 2-0 win is scaled by 1.5 relative to a 1-0 win. The same goes for tournament matches vs friendly matches. If friendly matches are not good predictors, we should let the model estimate how relevant they are; we shouldn't just assume a scaling factor of 20 vs 60.

So I think we can do better by looking at latent variable models rather than at pairwise comparison models. The latent variable in this case is the team's strength. We do not observe the strength directly, but we observe the results, from which we can try to infer the true strength.

We can also use information indirectly, if we want to. For example, A, B, C and D are in a group. They all play each other once. Imagine they all start with the same rating, but A plays and wins all its games first (in alphabetical order). Winning against B, C and D will give A some points. Since A's rating rises with each win, ELO will give A more points for the win against B than for the win against D. If B then loses both its games against C and D, that will not change A's rating. But why not? B seems to be a worse team, yet we are fine with letting A take more points for a win against B? Why should we not use the information from B's games with C and D to update A's rating? Should the order in which they play matter so much if the games are all in quick succession?
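The group example can be run through a sequential ELO update to see the order effect. This is only a sketch: equal starting ratings of 1500, K of 20 and a scale of 400 are assumed values, and `play` is a made-up helper.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Logistic win expectancy for A against B (assumed scale of 400)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def play(ratings, winner, loser, k=20.0):
    """Apply one ELO update in place and return the points the winner gained."""
    gain = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += gain
    ratings[loser] -= gain
    return gain


ratings = {team: 1500.0 for team in "ABCD"}

# A plays first and beats B, C and D in that order.
gains_a = [play(ratings, "A", opponent) for opponent in "BCD"]
rating_a_after_own_games = ratings["A"]

# B then loses to C and to D; A is not involved, so A's rating cannot move.
play(ratings, "C", "B")
play(ratings, "D", "B")

print([round(g, 2) for g in gains_a])             # gains shrink game by game
print(ratings["A"] == rating_a_after_own_games)   # True
```

A earns the most for beating B only because that game came first, and nothing B does afterwards feeds back into A's rating.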

I believe we are throwing away information. Using different types of models, we can put more variables in there and let the data speak.
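As a very rough sketch of what a latent variable approach could look like, the snippet below treats each team's strength as an unknown number and fits all games at once so that strength differences match the observed goal differences (plain least squares, fitted by gradient descent). The scorelines are hypothetical, and the model is deliberately the simplest one possible, not a proposal for the final system.

```python
def fit_strengths(matches, teams, lr=0.05, steps=500):
    """Least-squares fit of goal_diff ~ strength[a] - strength[b] over all games jointly.

    matches: list of (team_a, team_b, goals_a, goals_b).
    Strengths start at zero and keep a zero mean, because every update
    adds to one team exactly what it subtracts from the other.
    """
    strength = {t: 0.0 for t in teams}
    for _ in range(steps):
        grad = {t: 0.0 for t in teams}
        for a, b, goals_a, goals_b in matches:
            residual = (goals_a - goals_b) - (strength[a] - strength[b])
            grad[a] -= 2.0 * residual
            grad[b] += 2.0 * residual
        for t in teams:
            strength[t] -= lr * grad[t]
    return strength


teams = ["A", "B", "C", "D"]
# Hypothetical scorelines for the group example: A wins all three, then B loses to C and D.
matches = [
    ("A", "B", 3, 0),
    ("A", "C", 1, 0),
    ("A", "D", 1, 0),
    ("C", "B", 2, 0),
    ("D", "B", 2, 0),
]

fit = fit_strengths(matches, teams)
fit_reversed = fit_strengths(list(reversed(matches)), teams)
print({t: round(s, 2) for t, s in fit.items()})
```

Two things differ from ELO here: the margin of victory enters the fit directly, and the order of the games does not matter because all results are used simultaneously. Match type, home advantage or a different error distribution could be added as extra terms and estimated from the data instead of being fixed by hand.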

Is anyone interested in trying to work out a method based on latent variable models? Could even be used for gambling purposes if we get it right...

palynka · Nov 30, 2013

Somehow this went from designing an optimal ranking system to predicting results. I forgot to add that once we have a predictive model, it's very easy to back out a ranking system, since we'd be estimating the latent variable (the team's strength). I should have been clearer on that.

soccersubjectively · Dec 2, 2013

This seems relevant: http://everybodysoccer.com/2013/12/02/international-rankings-royale-2/

palynka · Dec 2, 2013

soccersubjectively said:

This seems relevant: http://everybodysoccer.com/2013/12/02/international-rankings-royale-2/

Very relevant, indeed!

It shows that ELO is a poor predictor. I think I've exposed some of its design flaws above, and the data there shows that it isn't hard to improve on it.

There is a lot of promise here, people...

soccersubjectively · Dec 2, 2013

Yeah we're not at the point where computers can predict soccer well enough. I like kenpom.com a lot but basing rankings only on results is obviously not great.

JamesBH11 · Dec 10, 2013

palynka said:

Very relevant, indeed!

It shows that ELO is a poor predictor. I think I've exposed some of its design flaws above, and the data there shows that it isn't hard to improve on it.

There is a lot of promise here, people...

Obviously, the ELO and FIFA systems are both based on historical data that accumulates into points. They give a good hint, but no guarantee about the result of any single match, and they do not capture the teams' most recent form (which affects the result).

SiberianThunderT · Jan 9, 2014

I wonder, has anyone ever tried adapting the Glicko system to soccer? Seems like it would be an interesting experiment to attempt.

Dward1 · Feb 4, 2014

The best systems right now are almost surely using shot location data. I have a primitive system where I break down each team's shots into 6 zones and credit the team with a certain number of goals for each shot taken from a given zone. The team's goal % and shot-on-target % also factor in somewhat, as do the defensive versions of those stats. The general idea is that a lot of luck is involved in whether a ball goes on target or into the net, and goals are so rare that it is hard to build a good rating system on goals alone. Shots happen a lot more often, so they are a richer data set.
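A bare-bones sketch of the zone-credit step could look like this. The zone numbering and the goals-per-shot rates are invented placeholders, not measured values, and the adjustments for goal % and shot-on-target % are left out.

```python
# Hypothetical goals-per-shot credit for each of the 6 zones (placeholder numbers only).
ZONE_GOAL_RATE = {1: 0.30, 2: 0.15, 3: 0.08, 4: 0.05, 5: 0.03, 6: 0.02}


def expected_goals(shots_by_zone, zone_goal_rate=ZONE_GOAL_RATE):
    """Credit a team with goals based on where its shots were taken.

    shots_by_zone maps zone number -> number of shots from that zone.
    """
    return sum(zone_goal_rate[zone] * count for zone, count in shots_by_zone.items())


# Made-up shot counts for one team in one match.
team_shots = {1: 2, 2: 3, 3: 4, 6: 5}
opponent_shots = {3: 2, 5: 6}

credit_for = expected_goals(team_shots)
credit_against = expected_goals(opponent_shots)
print(round(credit_for - credit_against, 2))  # shot-based "goal difference" for the match
```

The resulting credited goal difference could then stand in for the actual scoreline as the input to whatever rating model is used.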