I keep track of international team strengths from time to time with a mathematical system I developed for use in a different sport. It works just as well (if not better) in soccer, and it did well enough to win rec.sport.soccer's Euro 2004 prediction contest. So I decided to simulate the World Cup using those rankings. Here are some results of the 10,000 sims I did:

To win the tournament:
France = 14.09%
Brazil = 14.06%
Germany = 11.50%
Netherlands = 10.78%
Spain = 8.89%
Argentina = 6.91%
Czech Republic = 6.02%
England = 4.77%
Italy = 4.68%
Portugal = 4.29%
Sweden = 3.78%
Mexico = 1.96%
U.S.A. = 1.68%
Croatia = 1.3%
Poland = 0.72%
Paraguay = 0.67%
Ukraine = 0.65%
Ivory Coast = 0.54%
Australia = 0.51%
Serbia and Montenegro = 0.47%
Switzerland = 0.44%
Tunisia = 0.37%
Japan = 0.3%
South Korea = 0.28%
Ecuador = 0.15%
Iran = 0.1%
Costa Rica = 0.04%
Ghana = 0.03%
Angola = 0.01%
Togo = 0.01%

The first thing that jumps out is France with a better chance to win than Brazil. Brazil _is_ ranked as the best team, but the system also says that France's toughest opponent in their group (Switzerland) is weaker than the weakest opponent in Brazil's group (Japan). The end result is that in the sim, France has about a 5% greater chance of advancing from the group stage than Brazil. Brazil then makes up most of that ground in the knockout stages. Don't weep too much for them, though: they still have a better chance of winning the whole thing (according to the sim) than they do of not advancing (the same holds for France and Germany).

Germany is rated as the 11th best team in the tournament, so you can see the advantage they have in hosting it. The U.S.A. is ranked as the 13th best team in the tournament and also has the 13th best chance of winning, at 1.68%. That is in the neighborhood of the 81 to 1 odds being offered at Bet365, though a little better. At 12 to 1, the system seems to think the best bet would be France.
Saudi Arabia and Trinidad and Tobago are the only two teams who didn't win any of the 10,000 simulations (the Saudis made the final once and lost). Costa Rica, Ghana, Angola and Togo all won so few that their chances are effectively zero as well. Iran is the longest shot of the rest, winning 10 of the 10,000 sims. I waver back and forth between thinking that the system underrates the favorites and thinking that it really doesn't. Anomalous results are rare enough that my samples are never quite big enough to gauge how accurate the longshot odds are. To truly know whether a 1% chance is accurate, you need thousands of results. More stuff to come in this thread...
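To put a rough number on the sample-size point above, here is a minimal sketch. It assumes every result is an independent trial and uses the normal approximation to the binomial; both are assumptions made for illustration, not part of the system described in this thread, and the function names are made up.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Standard error of an observed frequency when the true chance is p over n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)

def required_n(p: float, margin: float, z: float = 1.96) -> int:
    """Trials needed so a ~95% interval around p has half-width `margin` (normal approximation)."""
    return math.ceil(z * z * p * (1.0 - p) / (margin * margin))

# Noise in a 1% estimate taken from 10,000 simulation runs (about 0.1 percentage points).
print(standard_error(0.01, 10_000))
# Real-world results needed to pin a 1% chance down to within +/- 0.5 percentage points.
print(required_n(0.01, 0.005))
```

Under these assumptions the second figure comes out at roughly 1,500, which fits the "thousands of results" estimate, and a tighter margin pushes it up quickly.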
Re: World Cup Simulation Results For a comparison, can you apply your formula to the 2002 field and see how it compares with the actual results?
Re: World Cup Simulation Results Netherlands in 4th is a surprise - most Dutch fans, and seemingly the players themselves, believe that yes, we have some good young talent, but this is not our strongest team and we won't make a serious run this year. They sound like the English did entering the last World Cup: that it was a bit too late for the previous generation and too early for the next generation. Do you give the Netherlands a "neighbors Germany" boost? Probably not ... How about a European home field advantage in general? Do you assign a modest boost to all European teams? It might seem so, because I'd sure have Argentina at least even with Spain and the Netherlands, in my mind.
Re: World Cup Simulation Results Interesting post. Here are the odds of winning that I come up with based on the bettors. I take the best odds offered here http://www.oddschecker.com/betting/mode/o/card/worldcup-worldcupgroups/odds/10218x/sid/10062 and then reduce everyone's odds proportionately to reach approximately 100% (due to rounding, it's not exactly 100%).

Code:
Brazil        29.7
Germany       10.2
England       10.2
Argentina      9.0
Italy          8.1
France         5.8
Netherlands    5.5
Spain          5.1
Portugal       3.3
Czechia        2.7
Sweden         1.6
Mexico         1.6
Ukraine        1.1
Croatia        1.0
USA            0.8
Ivory Coast    0.8
Serbia         0.7
Switzerland    0.6
Poland         0.6
Australia      0.6
S. Korea       0.3
Paraguay       0.3
Japan          0.2
Ghana          0.2
Tunisia        0.2
Ecuador        0.2
Togo           0.1
Iran           0.1
Costa Rica     0.1
Angola         0.1
Saudi Arabia   0.1
Trinidad       0.0

Here's your list for comparison's sake. If you really trust your system, you'll want to put money on France, Netherlands, Spain, and Czechia. Or, short Brazil, England, and Italy. EDIT: Added Ukraine, which was inadvertently left off.
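The "reduce everyone's odds proportionately" step above can be sketched as follows. This is only an illustration: it assumes the quotes are fractional "N to 1" odds, the function names are made up, and the three-team market uses invented numbers rather than anything from the linked page.

```python
def implied_probability(odds_to_one: float) -> float:
    """Convert fractional odds of 'N to 1' into the bookmaker's implied probability."""
    return 1.0 / (odds_to_one + 1.0)

def normalize(odds: dict[str, float]) -> dict[str, float]:
    """Scale implied probabilities proportionately so they sum to 1 (removes the bookmaker's margin)."""
    raw = {team: implied_probability(o) for team, o in odds.items()}
    total = sum(raw.values())
    return {team: p / total for team, p in raw.items()}

# Hypothetical odds for a three-team market, NOT real quotes.
example_odds = {"Team A": 1.0, "Team B": 2.0, "Team C": 4.0}
for team, p in normalize(example_odds).items():
    print(f"{team}: {100 * p:.1f}%")
```

The raw implied probabilities in this example add up to a bit more than 100%, which is why the proportional scaling is needed before comparing them with simulated chances.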
Re: World Cup Simulation Results France (and Spain) have incredibly easy draws and should have no problem making the quarterfinals.
Re: World Cup Simulation Results No European boost, no Netherlands boost. Each is too hard to calculate and even harder to confirm. When the Dutch play the French in France, France's advantage tends to be the same as when the Russians or Australians or whoever play in France. There aren't enough neutral site games to come up with anything outside of that situation. The system is based strictly on results.

As for the Dutch, their qualifying campaign this time around was probably the best of any team in the world: 10 wins, 2 draws and no losses, with one of the draws coming on the last match day after they had already qualified. They beat the Czechs, whom the system rates as very strong, twice. They grabbed two more wins against Romania, whom the system ranks ahead of the U.S. Their goal differential was a stunning 27-3, and the Czechs and the Romanians didn't score on them in their four games. That kind of performance against the 6th and 12th ranked teams is quite impressive. France is rated as the best defensive team in the world.

The system's top 10 teams all qualified:
1. Brazil
2. France
3. Netherlands
4. Spain
5. Argentina
6. Czech Republic
7. Italy
8. England
9. Portugal
10. Sweden

There is a small hidden advantage for European teams in that the regional nature of the draw somewhat limits the number of European teams other European teams have to face in the group stage. So, for example, Switzerland winds up with an easier group of a seeded team, an Asian team and an African team, whereas the U.S. winds up with a seeded team, another European team and an African team.
Re: World Cup Simulation Results I trust my system, but I have no money at all. And I know enough about probability to know that you have to make lots of bets in sports gambling to come out ahead, even with a perfect system. You have to bet a lot of 8 to 1 shots in order to ensure coming out ahead betting 8 to 1 shots.
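As a rough illustration of the point above about needing many bets, here is a minimal sketch. It assumes independent unit-stake bets at 8 to 1 and a true win chance of 15%; that figure is made up for the example, and the function name is hypothetical.

```python
from math import comb

def prob_ahead(n_bets: int, p_win: float, odds_to_one: int = 8) -> float:
    """Chance of a net profit after n_bets unit stakes at 'odds_to_one to 1', each winning with p_win."""
    total = 0.0
    for wins in range(n_bets + 1):
        # Profit is odds_to_one * wins - (n_bets - wins),
        # which is positive exactly when (odds_to_one + 1) * wins > n_bets.
        if (odds_to_one + 1) * wins > n_bets:
            total += comb(n_bets, wins) * p_win**wins * (1.0 - p_win) ** (n_bets - wins)
    return total

# Assumed true win chance of 15% against a break-even chance of 1/9 (about 11.1%).
for n in (10, 100, 1000):
    print(n, round(prob_ahead(n, 0.15), 3))
```

Even with that assumed edge, a bettor is more likely behind than ahead after 10 bets; the chance of showing a profit only approaches certainty once the number of bets runs into the hundreds or thousands.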
Re: World Cup Simulation Results Please do keep it coming - very interesting to look at. A question. Do you have a mechanism that adjusts for key injuries? Rooney is the obvious example this year. England is a strong team without him but clearly not the same team. I ask (in part) because that is the sort of information someone betting would use to 'discount' England's chances. [I'm not a betting man myself.] Given all the teams that crashed out unexpectedly in 2002 (e.g. Argentina, France, Portugal), I think you have to focus first on who will make it out of their group. Even Brazil's chances have to be something less than 100% to move on. I wouldn't think much less, given their group, but a loss in Game 1 with all the finger-pointing to follow, might devolve into a downward spiral. [This will sound comical after they win three straight; funnier still if they win seven straight.] The Netherlands and Argentina both have strong young teams that should get stronger still as they move along. But there's that nagging issue of escaping group C. One of them might be going home early. I hope not, though. Anyway, someone's probably made this point elsewhere, but I'll make it here. I think Germany and Groups A & B have a distinct, if slight, overall advantage, especially over Groups G & H, because the former's seven games will be spread out over a longer period of time. Now this is addressed in part by the way the early knockouts are set up (e.g. A v. B), but if the Final is an A or B team vs. a G or H team, I think the 'compressed' schedule of the latter team could work against them. That's the sort of factor that would be tough to adjust for, even if you wanted to. Lastly, I saw part of the France v. Mexico friendly on Saturday. Now, it's unfair to judge on a single performance, let alone a friendly, but Zidane gave every evidence that he's lost it. Gone. Wooden, heavy legs, little energy, etc. 
Great French goal, however, and if the French coach lets the young players play, they could do some real damage.
Re: World Cup Simulation Results My question would concern the mathematical seeding formula. I would assume that it uses some sort of algorithm that uses:
1. results between teams;
2. results against common opponents, taking into account the relative strength of each; and
3. all other games, rating the relevance of each by the relative strength of the teams.
It seems to me that the more the teams overlap in terms of opponents, the more accuracy would be obtained. Hence, the relative success in the Euro 2004 prediction contest. Rating Germany vs. England vs. Italy involves a good degree of overlap - they've all played each other, and have played many teams that have played the others, oftentimes in games that have real meaning (Euro and WC qualies). But it seems that it would work less well for the World Cup since there is less overlap between the teams. How would you be able to judge the relative strength of, say, Mexico vs. Iran vs. Holland? How many common opponents have they had? Of course, your system may use a logic entirely different from this, in which case you can ignore this post.
Re: World Cup Simulation Results Again, this is too difficult to do in a mathematical system. It involves guesswork, and when I add guesswork to the system I pretty much defeat my whole purpose of doing it in the first place: an exercise in the applied use of statistics and probability, in a field I find interesting and enjoyable. The system originally was devised to come up with strength of opposition factors for college baseball teams in order to evaluate players for a future in the pros. Hitting 15 homers for Southern University and hitting 15 homers for Alabama are two different types of performances entirely, and I needed a system that could quantify those differences. If you want to use my numbers for whatever purpose you seek, you can go ahead and make mental adjustments for injuries, a possible slight home field advantage for European teams, or whatever. Again, the system is based solely on results; I don't "thumb the scales" at all.
Re: World Cup Simulation Results If you're using FIFA rankings to assess the relative strengths of teams, you might as well flip a coin.
Re: World Cup Simulation Results You are 100% correct, but it uses a "six degrees of Kevin Bacon" sort of thing to get the correct amount of overlap. So each point in the chain extends in many different directions, each to a separate point which extends in many different directions, and so on. The only teams where I'm really concerned about the size of the sample are the really bad ones. We know they're bad, but we really don't know how they stack up with one another. Who is the worst team in the world? Guam or American Samoa? I couldn't tell you. Friendlies _are_ used precisely because of the problem you mention, but they count less than competitive matches, and analysis shows that by themselves (with no other matches used) they still give a fairly accurate portrayal of team strengths. My opinion is that the intra-federation of rankings between teams in this system is its biggest advantage over something like ELO, which I believe overrates teams with weaker schedules.
Re: World Cup Simulation Results Don't be so sure; they look ready to bring it and have the added motivation (on several levels) of wanting to win it in Germany.
Re: World Cup Simulation Results Last question before signing off for the night. Could you run (or direct me to where I could find) all the 4-team group combinations/permutations/whatever, which would give some sense (in the abstract) of what a team's chances are with a given number of points? [9-6-3-0; 9-4-4-0; 9-3-3-3; 9-2-2-2; 6-6-6-0; 5-5-5-0; etc. There are a lot of them!]* How many of these scenarios come down to goal difference (or another tie-breaker)? Someone has already begun a 'historical' analysis (of 1998 & 2002 - 32 teams - 2 to advance). Presumably no teams have advanced with just two points; only one (Chile) with three; how many with four, five and six? All advancers (except Chile) must have won at least one game. My guess: W-D-L = 4 points sounds too low for confidence; W-D-D = 5 points sounds much better. *If you even told me how many combinations there are, that would be helpful. Thanks!
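The combinations asked about above can be enumerated by brute force: a 4-team group has 6 matches with 3 possible results each, so there are 3^6 = 729 result patterns. The sketch below is only an illustration (all names are made up). It collapses those patterns into distinct points tables and, for each points total, counts how often a team advances outright, goes out outright, or depends on a tie-breaker. These are counts of patterns, not probabilities; treating every pattern as equally likely would be an extra assumption.

```python
from collections import Counter
from itertools import combinations, product

TEAMS = range(4)
MATCHES = list(combinations(TEAMS, 2))  # the 6 fixtures of a 4-team round robin

def points_table(results):
    """Points per team for one pattern ('H' = first team wins, 'D' = draw, 'A' = second team wins)."""
    pts = [0, 0, 0, 0]
    for (first, second), result in zip(MATCHES, results):
        if result == "H":
            pts[first] += 3
        elif result == "A":
            pts[second] += 3
        else:
            pts[first] += 1
            pts[second] += 1
    return pts

all_results = list(product("HDA", repeat=len(MATCHES)))  # 3**6 result patterns
# Distinct final tables, ignoring which team holds which total (e.g. 9-6-3-0).
tables = Counter(tuple(sorted(points_table(r), reverse=True)) for r in all_results)

def fate_by_points():
    """For each points total, count team-outcomes by fate when two teams advance."""
    fate = {}
    for results in all_results:
        pts = points_table(results)
        for team in TEAMS:
            above = sum(p > pts[team] for p in pts)       # teams strictly ahead
            level = sum(p == pts[team] for p in pts) - 1  # other teams on the same points
            if above + level <= 1:
                kind = "advance"   # at most one rival is ahead or level
            elif above >= 2:
                kind = "out"       # two or more teams are strictly ahead
            else:
                kind = "tiebreak"  # goal difference or another tie-breaker decides
            fate.setdefault(pts[team], Counter())[kind] += 1
    return fate

print(len(all_results), "result patterns,", len(tables), "distinct points tables")
for total, counts in sorted(fate_by_points().items(), reverse=True):
    print(total, dict(counts))
```

Running it prints one line per points total, so the 4-point versus 5-point guess can be checked directly against the counts.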
Re: World Cup Simulation Results This sounds eerily like Boyd Nation's ISR. I had suggested, in another thread a while back, that someone devise a system similar to it in order to give us more accurate world rankings. Since compensatory mechanisms for regional bias are built into it, it seemed ideal. Since you obviously are the expert and not me, and assuming you know anything about the ISR, is this a similar predictor? ~Justin
Re: World Cup Simulation Results Interesting results. France and Spain especially seem to be more highly rated by your system than I've seen elsewhere, but baseball analyst Bill James once said that any good new analytical framework should give you 80% expected results and 20% surprising results.
Re: World Cup Simulation Results I think your system is intriguing and interesting; however, I have to disagree with it when France is rated as one of the top 5 teams. We're talking about the same French team that barely qualified, was out of the tournament at one point really late into qualifying, and lost to some big-time no-names... either Israel or Saudi Arabia, one of those two IIRC. France simply is not one of the teams, imo, that will do some damage in Germany. They qualified by a thread with those so-called "young guys", which led to a bunch of their previous stars coming out of retirement to save face for France. Without the guys that are coming out of retirement France would be nothing, so whoever made the comment about France letting their young guys play is wrong; those young guys almost cost them getting into the WC.
Re: World Cup Simulation Results Yes in the sense that it's iterative, no in the way it does the iterations and how it comes up with two ratings (offense and defense) for both teams. Boyd's data is invaluable, however, and his rankings and mine tend to generate similar results (as they should). Also, a player's college performance quality isn't the only factor in determining their future pro success. The way their performance is shaped (the individual component statistics) is a large factor as well, though this is a bit off topic.
Re: World Cup Simulation Results One of the ways the system is intriguing is the way the two-rating system works. A team with a relatively low offense and a strong defense will tend to be shaped in such a way that they wind up with a lot of draws, but have the ability to beat anybody if they can score a goal. France might not be as good a team against Switzerland as an Argentina, but I think they are a better team than Argentina against a Brazil. Scoring on France in meaningful matches can be very, very hard. France's qualifying group is a good example. Their goal differential of 14 to 2 was outstanding; the problem was that 14 goals in 10 matches was low and led to a whole bunch of draws (three 0-0 results and two 1-1 results).
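To illustrate why a low-offense, strong-defense profile produces so many draws, here is a minimal sketch. It assumes each side's goals follow independent Poisson distributions, which is a common textbook simplification and not necessarily how the system in this thread works. The expected-goal figures are hypothetical, and how they would be derived from the offense and defense ratings is not specified here.

```python
from math import exp, factorial

def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k goals when goals follow a Poisson distribution with the given mean."""
    return exp(-rate) * rate**k / factorial(k)

def outcome_probs(rate_a: float, rate_b: float, max_goals: int = 15):
    """Return (P(A wins), P(draw), P(B wins)) for independent Poisson goal counts, truncated at max_goals."""
    win_a = draw = win_b = 0.0
    for a in range(max_goals + 1):
        for b in range(max_goals + 1):
            p = poisson_pmf(a, rate_a) * poisson_pmf(b, rate_b)
            if a > b:
                win_a += p
            elif a == b:
                draw += p
            else:
                win_b += p
    return win_a, draw, win_b

# Hypothetical expected goals, not taken from the system in this thread.
print(outcome_probs(0.8, 0.4))  # low-scoring side with a tight defense
print(outcome_probs(2.0, 1.0))  # same 2:1 ratio of expected goals, more open game
```

In this toy model the low-scoring matchup draws roughly twice as often as the open one, even though the ratio of expected goals is identical, which matches the pattern described for France's qualifying group.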
Re: World Cup Simulation Results Nice work. Would it be easy to run simulations that assign teams to completely random groups? If so, this might be an interesting way to see the effect of seeding and the draw.
Re: World Cup Simulation Results so you're saying that 80% of the time the system should work everytime.