Statistical Rankings/Gold Cup Predictions

Discussion in 'CONCACAF' started by NoSix, Jul 5, 2013.

  1. slaminsams

    slaminsams Member+

    Mar 22, 2010
    So how many predictions did you get right in the group phase?
     
  2. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    11 out of 18. See post #19 for details.
     
    slaminsams repped this.
  3. EvanJ

    EvanJ Member+

    Manchester United
    United States
    Mar 30, 2004
    Club:
    Manchester United FC
    Nat'l Team:
    United States
The nine people in my prediction contest averaged 10 2/3 correct results, with six of them having at least 11 correct results, so a person who knows how good the teams are has a much better than 1.4% chance of having 11 correct results. I'm not calling your model bad, but I don't think the random-chance percentage is relevant.
     
    tab5g repped this.
  4. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    OK, so how does a person "know how good the teams are"?
     
  5. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
No one is claiming that the players in your contest have only a 1.4% chance of having 11 or more correct results. Imagine that prior to the start of the Gold Cup, I sat down with a list of the 18 group stage fixtures and rolled a die for each one: I predicted the first team to win if the die came up 1 or 2, the second team to win if it came up 3 or 4, and a draw if it came up 5 or 6. Choosing results randomly in that manner, there is a 100-1.4=98.6% probability that I would pick 10 or fewer match results correctly. The fact that six of your nine contestants did better than that indicates that they are skillful at picking match results, not just lucky. By the same token, the fact that my algorithm did better than that also indicates that it is "skillful" at picking match results. (Note that had my algorithm entered your contest, it would be kicking your butt right now - while you have one more correct result (12 vs 11), my algorithm has picked 5 exact scores correctly compared to your 0.)
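The 1.4% and 98.6% figures above are just the tail of a binomial distribution with n = 18 matches and p = 1/3 per match. A minimal sketch of that calculation (the function name is mine, not NoSix's):

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more
    correct picks out of n when each pick is right with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Probability of 11+ correct out of 18 when guessing uniformly (p = 1/3)
p_lucky = binom_tail(18, 11, 1/3)   # ~0.014, i.e. about 1.4%
```

So the die-roller picks 10 or fewer correctly about 98.6% of the time, matching the number quoted in the post.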
     
  6. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    Further breaking down the probability of draws after regulation:

    #4 PAN vs #16 CUB
    Probability of PAN winning in overtime: 8%
    Probability of CUB winning in overtime: 2%
    Probability of draw after overtime: 11%
    Overall probability of PAN winning: 82.5%
    Overall probability of CUB winning: 17.5%

    #1 MEX vs #8 TRI
    Probability of MEX winning in overtime: 8%
    Probability of TRI winning in overtime: 2%
    Probability of draw after overtime: 11%
    Overall probability of MEX winning: 81.5%
    Overall probability of TRI winning: 18.5%
     
  7. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    Further breaking down the probability of draws after regulation:

    #2 USA vs #11 SLV
    Probability of USA winning in overtime: 8%
    Probability of SLV winning in overtime: 1%
    Probability of draw after overtime: 9%
    Overall probability of USA winning: 87.5%
    Overall probability of SLV winning: 12.5%

    #5 HON vs #3 CRC
    Probability of HON winning in overtime: 5%
    Probability of CRC winning in overtime: 8%
    Probability of draw after overtime: 21%
    Overall probability of HON winning: 38.5%
    Overall probability of CRC winning: 61.5%
     
  8. EvanJ

    EvanJ Member+

    Manchester United
    United States
    Mar 30, 2004
    Club:
    Manchester United FC
    Nat'l Team:
    United States
    I know what you mean with rolling a die. I just think that when you brag about your model you should use the latter part of the paragraph (comparing it to people) and not the 1.4% part.
     
  9. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    Before looking forward to the semifinals, a quick summary of the predictions to date:

    Out of 22 matches, 14 results were predicted correctly (64%), including 6 exact scores predicted correctly (27%).

    The probability of predicting 14 or more out of 22 matches correctly by chance alone is only 0.3%.
     
  10. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    Updated rankings, based on match results through 2013/7/21:

    rank team w d l pf pp pct gd
    1 MEX 13.4 3.7 1.9 43.8 57. 0.769 1.55
    2 USA 13.6 3.0 2.4 43.8 57. 0.768 1.76
    3 PAN 11.4 4.4 3.2 38.6 57. 0.677 1.04
    4 CRC 11.2 4.8 3.1 38.2 57. 0.671 0.95
    5 HON 9.7 5.1 4.2 34.3 57. 0.602 0.64
    6 JAM 8.2 5.8 5.0 30.5 57. 0.534 0.33
    7 GLP 8.2 4.6 6.2 29.3 57. 0.514 0.27
    8 TRI 7.4 5.1 6.5 27.2 57. 0.477 0.11
    9 MQE 6.8 5.4 6.9 25.7 57. 0.450 0.00
    10 DOM 6.9 3.8 8.3 24.5 57. 0.430 -0.23
    11 GUA 6.4 4.9 7.6 24.2 57. 0.425 -0.16
    12 SLV 6.1 4.9 8.0 23.2 57. 0.406 -0.23
    13 CAN 5.5 6.1 7.4 22.6 57. 0.396 -0.24
    14 HAI 4.8 6.0 8.2 20.5 57. 0.359 -0.36
    15 GYF 5.3 3.7 10.0 19.7 57. 0.345 -0.69
    16 ATG 4.6 4.4 10.0 18.3 57. 0.321 -0.75
    17 CUB 4.2 4.5 10.2 17.2 57. 0.302 -0.75
    18 NCA 4.1 3.9 11.1 16.1 57. 0.282 -1.00
    19 GUY 3.2 3.8 12.0 13.4 57. 0.236 -1.21
    20 BLZ 2.6 4.9 11.5 12.6 57. 0.222 -1.02

USA's blowout win over SLV combined with MEX's squeaker over TRI leaves the two virtually tied at the top. The four Gold Cup semifinalists rank 1st-3rd and 5th.

    Biggest gainers are USA, up 23 pct points and 0.21 gd, and PAN, up 22 pct points and 0.14 gd.

    Biggest losers are CUB, down 25 pct points and 0.16 gd, and SLV, down 20 pct points and 0.11 gd.
     
  11. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    Semifinal Preview:

    #2 USA vs #5 HON
    Probability of USA win: 69%
    Probability of HON win: 8%
    Probability of draw after regulation: 23%
    Prediction: USA 1 HON 0 (21% probability)
    In the event of a draw after regulation:
    Probability of USA win after overtime: 9%
    Probability of HON win after overtime: 2%
    Probability of draw after overtime: 12%
    Overall probability of USA win: 84%
    Overall probability of HON win: 16%

    #1 MEX vs #3 PAN
    Probability of MEX win: 45%
    Probability of PAN win: 24%
    Probability of draw after regulation: 31%
    Prediction: MEX 1 PAN 0 (18% probability)
    In the event of a draw after regulation:
    Probability of MEX win after overtime: 8%
    Probability of PAN win after overtime: 5%
    Probability of draw after overtime: 18%
    Overall probability of MEX win: 62%
    Overall probability of PAN win: 38%​
     
  12. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    Through the semifinals, out of 24 matches, 15 results were predicted correctly (63%), including 6 exact scores predicted correctly (25%).

    The probability of predicting 15 or more out of 24 matches correctly by chance alone is only 0.3%.
     
  13. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    Updated rankings, based on match results through 2013/7/24:

    rank team w d l pf pp pct gd
    1 USA 13.8 2.9 2.3 44.3 57. 0.778 1.85
    2 MEX 13.0 3.8 2.1 42.9 57. 0.753 1.46
    3 PAN 11.8 4.2 3.0 39.5 57. 0.693 1.13
    4 CRC 11.1 4.7 3.1 38.2 57. 0.670 0.94
    5 HON 9.5 5.1 4.4 33.7 57. 0.591 0.60
    6 JAM 8.2 5.8 4.9 30.5 57. 0.536 0.33
    7 GLP 8.3 4.6 6.2 29.4 57. 0.515 0.27
    8 TRI 7.4 5.1 6.5 27.2 57. 0.478 0.11
    9 MQE 6.6 5.4 7.0 25.1 57. 0.441 -0.04
    10 DOM 6.9 3.8 8.2 24.6 57. 0.432 -0.23
    11 GUA 6.4 4.9 7.6 24.3 57. 0.426 -0.16
    12 SLV 6.1 4.9 8.0 23.2 57. 0.407 -0.23
    13 CAN 5.5 6.1 7.4 22.6 57. 0.397 -0.24
    14 HAI 4.8 6.0 8.2 20.5 57. 0.360 -0.37
    15 GYF 5.4 3.7 10.0 19.7 57. 0.346 -0.69
    16 ATG 4.7 4.4 9.9 18.4 57. 0.322 -0.75
    17 CUB 4.3 4.5 10.2 17.3 57. 0.304 -0.74
    18 NCA 4.0 3.9 11.1 15.9 57. 0.280 -1.01
    19 GUY 3.2 3.8 12.0 13.4 57. 0.236 -1.21
    20 BLZ 2.6 4.9 11.5 12.7 57. 0.222 -1.03

With their win over HON, USA take over the top spot from MEX for the first time in Jurgen Klinsmann's tenure. The Gold Cup final will be contested between #1 USA and #3 PAN.

Biggest gainer is PAN, up 16 pct points and 0.09 gd.

Biggest loser is MEX, down 16 pct points and 0.09 gd.
     
  14. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    Final Preview:

    #1 USA vs #3 PAN
    Probability of USA win after regulation: 63%
    Probability of PAN win after regulation: 14%
    Probability of draw after regulation: 23%
    Prediction: USA 1 PAN 0 (15% probability)
    In the event of a draw after regulation:
    Probability of USA win after overtime: 9%
    Probability of PAN win after overtime: 3%
    Probability of draw after overtime: 11%
    Overall probability of USA win: 77.5%
    Overall probability of PAN win: 22.5%
     
  15. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    More on Sunday's Gold Cup Final:

    Breakdown by goal difference after regulation:
    Probability of USA win by 4 goals: 4.4%
    Probability of USA win by 3 goals: 10.6%
    Probability of USA win by 2 goals: 19.6%
    Probability of USA win by 1 goal: 26.4%
    Probability of draw: 22.7%
    Probability of PAN win by 1 goal: 10.4%
    Probability of PAN win by 2 goals: 3.1%
    Probability of PAN win by 3 goals: 0.7%
    Probability of PAN win by 4 goals: 0.1%

    Top 10 most likely scores after regulation:
    USA PAN prob
    1 0 15.0%
    2 0 13.3%
    1 1 10.5%
    2 1 9.3%
    0 0 8.5%
    3 0 7.8%
    0 1 5.9%
    3 1 5.5%
    1 2 3.7%
    4 0 3.5%
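The thread never spells out how these score probabilities are generated; one common approach that produces tables like the one above is to model each side's goals as independent Poisson variables. A sketch under that assumption, where the goal expectations `lam_usa` and `lam_pan` are purely illustrative and are not NoSix's actual parameters:

```python
from math import exp, factorial

def poisson(lam, k):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

# Illustrative goal expectations -- NOT the model's real parameters
lam_usa, lam_pan = 1.6, 0.8

# Joint probability of each scoreline, assuming independent goal counts
grid = {(usa, pan): poisson(lam_usa, usa) * poisson(lam_pan, pan)
        for usa in range(7) for pan in range(7)}

top10 = sorted(grid.items(), key=lambda kv: kv[1], reverse=True)[:10]
for (usa, pan), p in top10:
    print(f"USA {usa} PAN {pan}: {p:.1%}")
```

With these made-up means, 1-0 comes out as the single most likely scoreline, in the same ballpark as the 15.0% quoted above; the actual model behind the thread's numbers may of course differ.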
     
  16. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    Out of 25 total Gold Cup matches, 16 results were predicted correctly (64%), including 7 exact scores predicted correctly (28%).

    The probability of predicting 16 or more out of 25 matches correctly by chance alone is only 0.2%.
     
  17. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    Updated rankings, based on match results through 2013/7/28:

    rank team w d l pf pp pct gd
    1 USA 13.8 2.9 2.3 44.2 57. 0.776 1.82
    2 MEX 13.1 3.8 2.1 43.0 57. 0.754 1.47
    3 PAN 11.5 4.4 3.1 38.9 57. 0.683 1.06
    4 CRC 11.2 4.8 3.1 38.2 57. 0.670 0.94
    5 HON 9.5 5.1 4.4 33.7 57. 0.591 0.59
    6 JAM 8.2 5.8 5.0 30.5 57. 0.536 0.33
    7 GLP 8.3 4.6 6.1 29.4 57. 0.517 0.28
    8 TRI 7.4 5.1 6.5 27.3 57. 0.478 0.11
    9 MQE 6.6 5.4 7.0 25.2 57. 0.442 -0.03
    10 DOM 6.9 3.8 8.2 24.6 57. 0.432 -0.22
    11 GUA 6.5 4.9 7.6 24.3 57. 0.427 -0.15
    12 SLV 6.1 4.9 8.0 23.3 57. 0.409 -0.22
    13 CAN 5.5 6.1 7.4 22.7 57. 0.397 -0.23
    14 HAI 4.9 6.0 8.2 20.6 57. 0.361 -0.36
    15 GYF 5.3 3.7 10.0 19.6 57. 0.344 -0.70
    16 ATG 4.7 4.4 9.9 18.5 57. 0.324 -0.73
    17 CUB 4.2 4.5 10.2 17.3 57. 0.303 -0.74
    18 NCA 4.0 3.9 11.1 16.0 57. 0.280 -1.00
    19 GUY 3.2 3.8 11.9 13.5 57. 0.237 -1.20
    20 BLZ 2.6 5.0 11.5 12.7 57. 0.223 -1.02
     
  18. tab5g

    tab5g Member+

    May 17, 2002
    #43 tab5g, Aug 20, 2013
    Last edited: Aug 20, 2013
    Do 1/3 of all soccer (or Gold Cup) matches end in draws?

    Wouldn't you need a die with many more sides than 6 to try to reasonably outline (project) the "results" of some series of matches based solely on chance?

"Chance" would be better defined if the expectation weren't that a draw is just as likely as either team winning a single game/event.

Not that I don't see the utility of your algorithm, but I do think there is real weight to the argument that the "baseline" comparison scale you have offered in this thread as "chance" is somewhat flawed (though perhaps not significantly).
     
  19. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    I already addressed this question (see post #11). I'm not assuming the probability of a draw is 1/3.
     
  20. tab5g

    tab5g Member+

    May 17, 2002
    You certainly look to be assuming that 1/3 probability for a draw when you write the following:

    I was just suggesting that a 10 or 12 (for examples) sided die (on which exactly 2 sides still represented "draw") would be more accurate for randomly selecting soccer games than would a six sided die.
     
  21. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
The point is, there are 3 outcomes (home win, away win, and draw), the probabilities for which always add to 1. Therefore, if you randomly select amongst the three, you will predict the correct outcome 1/3 of the time, irrespective of the distribution of probabilities. Assume, for example, that the probability of a home win is always 64%, the probability of an away win is 24%, and the probability of a draw is 12%. If you select randomly from those possible outcomes, the probability of correctly predicting the outcome is 1/3*64%+1/3*24%+1/3*12%=1/3*100%=33.33%.
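NoSix's claim is easy to check by simulation: if the *picks* are uniform over the three outcomes, accuracy converges to 1/3 no matter how the true outcomes are distributed. A quick sketch using the 64/24/12 example distribution from the post:

```python
import random
random.seed(1)

N = 200_000
# True outcomes follow the example distribution: 64% W / 24% L / 12% D
outcomes = random.choices(["W", "L", "D"], weights=[0.64, 0.24, 0.12], k=N)
# The predictor picks uniformly at random among the three outcomes
picks = random.choices(["W", "L", "D"], k=N)

accuracy = sum(o == g for o, g in zip(outcomes, picks)) / N
# accuracy comes out close to 1/3 regardless of the outcome weights
```

Changing the `weights` of the true outcomes leaves the accuracy at roughly 1/3, which is exactly the point of the post: uniform random picking hits 1/3 whatever the underlying distribution is.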
     
  22. tab5g

    tab5g Member+

    May 17, 2002
    #47 tab5g, Aug 21, 2013
    Last edited: Aug 21, 2013
    Is that equation accurate?

Why use the 1/3 multiplier on the left hand side? (Selecting randomly from the possible outcomes need not mean a consistent 1/3 rate for each of W, L, and D.)

    Should not the equation be (.64*.64)+(.24*.24)+(.12*.12)=48.16% to account for the random selection of more of the higher probability outcomes (and not just a 1/3 chance of each outcome W, L or D)?

    In the Gold Cup scenario -- which features a lot of neutral site matches especially -- if the baseline assumptions were using the 64% "stronger/high-ranked" team wins, 24% "weaker/lower-seeded" team wins and 12% draw probabilities, would not the odds of picking 11 or more games correctly out of 18 improve above the 1/3, 1/3 and 1/3 assumptions you referenced earlier in this thread?

    The assumptions (or the data put into the model) are important.

Your baseline (for comparison) was very limited and faulty, and could easily be improved (with respect to predicting soccer match outcomes).

Using a six-sided die and assigning 2 sides (1/3 probability) to a draw is not a great approach, when a 16-sided die (with 1-10 as Team1 Win, 11-14 as Team1 Lose, and 15-16 as Draw) would likely be the better assumption and approach (to more accurately account for the likely/historical number of draws in a competition like the Gold Cup). (And yes, "Team1" would have to be assigned/selected on some basis -- HFA for USA, and ranking, FIFA, Elo, or otherwise for neutral venue matches.)

Your algorithm and model are very solid, in large part because you put a huge set of data -- 5 years of past results -- into the model. If you only entered 1 or 2 years, would your model be as valid? Would using 8 or 10 years of past data make it any more or less valid?

    Your baseline comparison is very weak, in that it relies on a 6-sided die and the assumption that 1/3 of matches will be a draw.
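tab5g's 48.16% figure from earlier in this post corresponds to a guesser whose pick frequencies match the outcome distribution: the chance of a correct pick is then the sum of the squared probabilities, versus exactly 1/3 for uniform picking. A sketch with the 64/24/12 example numbers from the thread:

```python
probs = [0.64, 0.24, 0.12]  # example W / L / D outcome probabilities

# Uniform picking: each outcome is guessed 1/3 of the time
uniform_acc = sum(p / 3 for p in probs)   # exactly 1/3

# Distribution-matched picking: guess each outcome with its own probability
matched_acc = sum(p * p for p in probs)   # 0.4816, tab5g's figure

# Always picking the single most likely outcome does better still
best_acc = max(probs)                     # 0.64
```

This is the crux of the disagreement: NoSix's 1/3 baseline describes uniform picking, while tab5g is arguing for a baseline whose pick distribution already reflects how often each outcome actually occurs.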
     
  23. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    #48 NoSix, Aug 21, 2013
    Last edited: Aug 21, 2013
Again, for each match by definition (pW+pD+pL)=1, so if you choose one of the 3 outcomes randomly, in the long run you will choose 1/3 W's, 1/3 D's, and 1/3 L's, and your percentage of correct choices will be 1/3*(pW+pD+pL)=1/3*1=1/3. That is a fact.

Imagine you had a loaded coin, with pH=0.75 and pT=0.25. If you flip the coin 10000 times and choose one of the two outcomes randomly each time, then half the time you will choose heads, half the time you will choose tails, and your probability of choosing correctly is 1/2*0.75+1/2*0.25=1/2*1=1/2.
     
  24. tab5g

    tab5g Member+

    May 17, 2002
    "If you choose one of 3 outcomes randomly" is the point of contention.

    How randomly are you choosing those outcomes?

    Randomly (via 16-sided die for example) choosing 10/16 for a W, 4/16 for a L and 2/16 for a Draw is a better method of random selection (for soccer match outcomes), than is the pure 1/3 for each possible outcome (W, L, D).
     
  25. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    It is not better, it is exactly the same, because 1/3*10/16+1/3*4/16+1/3*2/16=1/3*16/16=1/3!!

    Take a look back at my edit to post 48.
     