Statistical Rankings/Gold Cup Predictions

Discussion in 'CONCACAF' started by NoSix, Jul 5, 2013.

  1. tab5g

    tab5g Member+

    May 17, 2002
    #51 tab5g, Aug 21, 2013
    Last edited: Aug 21, 2013
    You're missing my question.

    You wouldn't be rolling a six-sided die and assigning the 1/3 multiplier to each of the 10/16, the 4/16 and the 2/16 probabilities.

    You'd be rolling a 16-sided die and assigning a 5/8 multiplier to the 10/16 sides, a 1/4 multiplier to the 4/16 sides and a 1/8 multiplier to the 2/16 sides of the die.

    Rolling a 16-sided die means you are going to get sides 1 through 10 62.5% of the time (not 1/3 of the time).

    The equation, imo, should be (.625*.625)+(.25*.25)+(.125*.125)=46.875%
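A minimal sketch of the arithmetic above, assuming the pick and the match result are independent. The function name `hit_rate` is made up for illustration; it just sums (probability of picking an outcome) times (probability that outcome occurs).

```python
from fractions import Fraction


def hit_rate(pick_probs, outcome_probs):
    """Chance that an independent pick matches the outcome: sum of q_i * p_i."""
    return sum(q * p for q, p in zip(pick_probs, outcome_probs))


# Outcome probabilities from the post: favorite win, upset, draw.
outcomes = [Fraction(10, 16), Fraction(4, 16), Fraction(2, 16)]

# Picking with the 16-sided die means the pick probabilities
# equal the outcome probabilities.
matched = hit_rate(outcomes, outcomes)
print(matched, float(matched))  # 15/32 0.46875
```

Passing a different first argument (for example three values of 1/3) gives the hit rate for any other way of picking.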
     
  2. tab5g

    tab5g Member+

    May 17, 2002
    #52 tab5g, Aug 21, 2013
    Last edited: Aug 21, 2013
    No, with a loaded coin (with pH=0.75 and pT=0.25), I'd choose Heads 75% of the time and choose Tails 25% of the time. (I'd use an 8-sided die and assign sides 1-6 to Heads and assign sides 7-8 to Tails. I'd roll the die prior to each coin toss and then make my call from what the die indicates from each of 10000 rolls ahead of each of the 10000 coin tosses.)

    And my probability of choosing the outcome of the coin toss correctly (or the 10000 coin tosses collectively) would be (.75*.75)+(.25*.25)=62.5%


    (Actually, with a loaded coin, I'd just choose Heads every time, and be right 75% of the time.)

    And with a loaded set of likely soccer results (10/16 for a higher-seeded win, 4/16 for an "upset" and 2/16 for a draw), I'd pick the higher seeded win every time and have a 62.5 percent chance of being right. (And I'd get 11 right out of the 18 group stage matches in the case of the 2013 Gold Cup, no?)
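A rough sketch comparing the two strategies described in this post: picking in proportion to the probabilities versus always picking the most likely outcome. The function names are hypothetical, and the numbers are the ones quoted above.

```python
def pick_in_proportion(probs):
    """Hit rate when each outcome is picked as often as it occurs."""
    return sum(p * p for p in probs.values())


def always_pick_favorite(probs):
    """Hit rate when the single most likely outcome is picked every time."""
    return max(probs.values())


coin = {"H": 0.75, "T": 0.25}
soccer = {"favorite win": 10 / 16, "upset": 4 / 16, "draw": 2 / 16}

print(always_pick_favorite(coin), pick_in_proportion(coin))      # 0.75 0.625
print(always_pick_favorite(soccer), pick_in_proportion(soccer))  # 0.625 0.46875

# Expected number of correct picks over the 18 group-stage matches.
print(18 * always_pick_favorite(soccer))  # 11.25
```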

    Why pick "randomly" when the coin is loaded, or when the expected soccer results are loaded toward some likely outcome?

    (Now if the challenge is to correctly pick scorelines across a set of 18 matches, yes, I'd rely on the models of 5 years of total results to generate the most-likely scorelines for any individual GC matchup.)
     
  3. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    By definition picking randomly means picking from the alternatives with equal probability. If I pick randomly from 2 alternatives, I choose each 1/2 the time; if I pick randomly from 3 alternatives, I choose each 1/3 of the time, etc.

    If you are choosing outcomes based on the probability of occurrence, you are choosing based on the probability of occurrence, not randomly.

    The whole point of comparing results selected by probability of occurrence to results selected at random is to determine how unlikely it is to get a certain number of results correct by random chance.
     
  4. tab5g

    tab5g Member+

    May 17, 2002
    #54 tab5g, Aug 21, 2013
    Last edited: Aug 21, 2013
    But why stop at 3 alternatives?

    Assign 16 alternatives.

    (And randomly pick and assign 10 of those 16 as the Win for the higher-seeded/home team.)

    Is it possible to do both at the same time (i.e. to randomly pick based on probability of occurrence)?

    Is there a mathematical difference between picking "randomly" (10/16 for a win) and picking "truly randomly" (only 1/3 for a win)?

    But "random chance" can be more accurately assigned for a soccer match result (if the relative strengths/ranks of the two opponents are known/accepted) than simply by doing the truly random 1/3 W, 1/3 L, 1/3 D.
     
  5. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    #55 NoSix, Aug 21, 2013
    Last edited: Aug 21, 2013
    What do you think? You put 3 balls, identical in every way except one is labeled "W", one is labeled, "D", and one is labeled "L", into a bag. If you reach into the bag without looking and pull out a ball, what is the probability you get a W? a D? an L? You have just randomly selected a result. That is the easy part.

    Now comes the tricky part. I have a long (nearly infinitely long) list of match fixtures with, for each fixture, the probability of a home win in column 1, the probability of a draw in column 2, and the probability of an away win (home loss) in column 3. I reach into my bag with 3 identical balls, and randomly select a result, based on the ball I draw. Then I put the ball back in the bag and repeat the process for every fixture on my list. Now, what proportion of matches have I predicted correctly? If I have randomly selected the results, then I will have selected column 1 1/3 of the time, column 2 1/3 of the time, and column 3 1/3 of the time. Intuitively, it seems like the probabilities of winning, drawing, or losing should matter, but in fact they don't.
    If pW=pD=pL=1/3, then 1/3*1/3+1/3*1/3+1/3*1/3=1/9+1/9+1/9=3/9=1/3.
    If pW=4/7, pD=1/7, pL=2/7, then 1/3*4/7+1/3*1/7+1/3*2/7=4/21+1/21+2/21=7/21=1/3.
    The reason the probabilities don't matter is that they are constrained to sum to 1,
    and so 1/3*pW+1/3*pD+1/3*pL=1/3*(pW+pD+pL)=1/3*1=1/3.
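The identity above can be checked with exact fractions. This is only a sketch: `uniform_hit_rate` is a made-up name, and the third distribution is the 10/16, 2/16, 4/16 example from earlier in the thread.

```python
from fractions import Fraction


def uniform_hit_rate(outcome_probs):
    """Hit rate when each of the n outcomes is picked with probability 1/n."""
    n = len(outcome_probs)
    return sum(Fraction(1, n) * p for p in outcome_probs)


examples = [
    [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)],
    [Fraction(4, 7), Fraction(1, 7), Fraction(2, 7)],
    [Fraction(10, 16), Fraction(2, 16), Fraction(4, 16)],
]

for probs in examples:
    assert sum(probs) == 1  # the constraint that makes the result constant
    print(uniform_hit_rate(probs))  # 1/3 every time
```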
     
  6. tab5g

    tab5g Member+

    May 17, 2002
    #56 tab5g, Aug 21, 2013
    Last edited: Aug 21, 2013
    But for a baseline comparison, why just have 3 balls total to pick from? (It makes no sense to give yourself a 1 in 3 chance of picking a Draw, given what you know about the randomness of Draws popping up, or not, within a set of soccer match results.)

    Start with 7 balls total as available for selection: 4 Ws, 2 Ls and 1 D.

    Anyone could then "randomly select" from those 7 balls.

    The probabilities are still constrained to sum to 1, but the probability of accurately picking the winner is better, yes?
    (And the probability of more accurately picking the Ls and the Ds is also improved, since you aren't "over-picking" those each 1/3 of the time.)

    If pW=4/7, pD=1/7, pL=2/7, then (picking from 7 balls and not just 3):

    (1/7*4/7)+(1/7*4/7)+(1/7*4/7)+(1/7*4/7)+(1/7*1/7)+(1/7*2/7)+(1/7*2/7)=16/49+1/49+4/49=21/49=42.86%
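One way to sanity-check the 21/49 figure is a quick simulation of the 7-ball bag. This is a sketch under the assumption that the draw from the bag and the match result are independent; the function name, trial count and seed are arbitrary choices.

```python
import random


def simulate_bag(bag, outcome_probs, trials, seed=0):
    """Fraction of trials in which a ball drawn from the bag matches the result."""
    rng = random.Random(seed)
    outcomes = list(outcome_probs)
    weights = [outcome_probs[o] for o in outcomes]
    hits = 0
    for _ in range(trials):
        pick = rng.choice(bag)  # every ball is equally likely
        result = rng.choices(outcomes, weights=weights)[0]  # the "real" result
        hits += pick == result
    return hits / trials


bag = ["W"] * 4 + ["L"] * 2 + ["D"]
probs = {"W": 4 / 7, "D": 1 / 7, "L": 2 / 7}

# Should land close to 21/49, i.e. roughly 0.4286.
print(simulate_bag(bag, probs, trials=200_000))
```

Swapping the bag for `["W", "D", "L"]` should bring the simulated rate down to roughly 1/3.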


    This could also go back to your original 6-sided die example, and the better (or more accurately "random") approach would be to assign 1, 2 and 3 to Win, assign 4 and 5 to Loss and assign just side 6 to Draw. (As opposed to the 1&2 W, 3&4 L and 5&6 D scenario as you initially outlined.)
     
  7. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    You are free to make up any probability game you like, and your calculations for the game you have created are correct, but have you randomly selected amongst the outcomes of W, D, or L? The answer is no - instead you have selected an outcome in proportion to its probability. What are you trying to accomplish? Your original question, which I think you have long forgotten, was about the random variation inherent in predicting results over a 2-match, 10-match, or 25-match span. In order to say something intelligent about random variation, I need to choose the outcomes randomly. If I choose the outcomes according to their probability, then I am introducing variability due to the differences in probability.
     
    tab5g repped this.
  8. NoSix

    NoSix Member+

    Feb 18, 2002
    Phoenix
    My hypothesis is that past match results contain information that enables me to predict soccer matches better than if I had no such information. To predict a soccer match I first calculate the expected goals for each team based on the past results, then use the Poisson distribution to calculate the probabilities corresponding to those expected goal totals. Finally, I choose among the three results (H)W, D, (H)L by selecting the one with the highest probability. If I had no information to base a choice on, then by choosing randomly among the three alternatives I have a probability of 1/3 of choosing correctly. In that case, my random choice is equivalent to flipping a loaded coin for each match that has pH=1/3 (correct prediction) and pT=2/3 (incorrect prediction).
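A minimal sketch of the Poisson step, not necessarily the exact model used here. It assumes the two teams' goal counts are independent Poisson variables (the post does not say this explicitly), cuts the score grid off at 15 goals, and uses made-up expected-goal values of 1.8 and 0.9. All function names are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k goals for a team with the given expected goals."""
    return exp(-mean) * mean**k / factorial(k)


def result_probabilities(home_xg, away_xg, max_goals=15):
    """Home win / draw / home loss probabilities, assuming independent scoring."""
    home = [poisson_pmf(g, home_xg) for g in range(max_goals + 1)]
    away = [poisson_pmf(g, away_xg) for g in range(max_goals + 1)]
    win = draw = loss = 0.0
    for h, ph in enumerate(home):
        for a, pa in enumerate(away):
            if h > a:
                win += ph * pa
            elif h == a:
                draw += ph * pa
            else:
                loss += ph * pa
    return {"W": win, "D": draw, "L": loss}


# Hypothetical expected goals, for illustration only.
probs = result_probabilities(home_xg=1.8, away_xg=0.9)
pick = max(probs, key=probs.get)  # choose the most probable result
print(probs, pick)
```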

    First consider a sample size of 2 matches.

    By random chance, the probability of 0 correct predictions out of 2 matches is simply 2/3*2/3=4/9, the probability of 1 correct prediction out of 2 matches is 1/3*2/3+2/3*1/3=4/9, and the probability of 2 correct predictions out of 2 matches is 1/3*1/3=1/9 or about 11%. If I require, say, less than 5% probability of predicting n or more matches correctly by random chance in order to accept my hypothesis, then clearly a sample size of 2 is insufficient to make a determination.
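The hand calculation for 2 matches can be reproduced by listing every correct/incorrect sequence. This is a sketch with a made-up function name; brute-force enumeration is only practical for small sample sizes.

```python
from fractions import Fraction
from itertools import product


def correct_count_distribution(n_matches, p=Fraction(1, 3)):
    """Probability of each number of correct picks, by enumerating all sequences."""
    dist = {k: Fraction(0) for k in range(n_matches + 1)}
    for sequence in product([True, False], repeat=n_matches):
        prob = Fraction(1)
        for correct in sequence:
            prob *= p if correct else 1 - p
        dist[sum(sequence)] += prob
    return dist


for k, prob in correct_count_distribution(2).items():
    print(k, prob)
# 0 4/9
# 1 4/9
# 2 1/9
```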

    Next consider a sample size of 10 matches.

    Instead of calculating by hand, I will use the binom.dist function in excel to calculate the probabilities. For example binom.dist(0,10,1/3,false)=0.0173 is the probability of predicting exactly 0 matches out of 10 correctly, if the probability of predicting each match correctly is 1/3.

    The results are as follows:

    n prob cum
    0 0.01734 1.00000
    1 0.08671 0.98266
    2 0.19509 0.89595
    3 0.26012 0.70086
    4 0.22761 0.44074
    5 0.13656 0.21313
    6 0.05690 0.07656
    7 0.01626 0.01966
    8 0.00305 0.00340
    9 0.00034 0.00036
    10 0.00002 0.00002

    where cum is the cumulative probability of predicting n or more matches correctly (by random chance). With a sample size of 10, the probability of predicting 7 or more matches correctly by random chance is only 1.966%, so I would accept my hypothesis if my model were able to predict 7 or more matches correctly.
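For anyone without Excel, the same table can be rebuilt in Python. This is a sketch using only the standard library; `binom_pmf` plays the role of binom.dist(k, n, p, false), and both function names are invented here.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def tail_table(n, p):
    """Rows of (k, P(exactly k), P(k or more)) for k = 0..n."""
    pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
    rows = []
    tail = 0.0
    for k in range(n, -1, -1):  # accumulate from the top so tail = P(k or more)
        tail += pmf[k]
        rows.append((k, pmf[k], tail))
    return rows[::-1]


print("n prob cum")
for k, prob, cum in tail_table(10, 1 / 3):
    print(f"{k} {prob:.5f} {cum:.5f}")
```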

    Similarly, for 25 matches:

    n prob cum
    0 0.00004 1.00000
    1 0.00050 0.99996
    2 0.00297 0.99947
    3 0.01139 0.99650
    4 0.03131 0.98511
    5 0.06575 0.95380
    6 0.10959 0.88805
    7 0.14872 0.77846
    8 0.16732 0.62974
    9 0.15802 0.46242
    10 0.12642 0.30440
    11 0.08619 0.17799
    12 0.05028 0.09179
    13 0.02514 0.04151
    14 0.01077 0.01637
    15 0.00395 0.00560
    16 0.00123 0.00165
    17 0.00033 0.00042
    18 0.00007 0.00009
    19 0.00001 0.00002
    20 2.0E-06 2.3E-06
    21 2.4E-07 2.6E-07
    22 2.2E-08 2.3E-08
    23 1.4E-09 1.5E-09
    24 5.9E-11 6.0E-11
    25 1.2E-12 1.2E-12

    With a sample size of 25, the probability of predicting 13 or more matches correctly by random chance is 4.151%, so I would accept the hypothesis if my model were able to predict 13 or more matches correctly.
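The threshold for any sample size can be found by searching for the smallest n whose "n or more" probability drops below 5%. This is a sketch assuming the same 1/3 chance per match and a strict less-than-5% rule; the function names are hypothetical.

```python
from math import comb


def tail_prob(k, n, p):
    """Probability of k or more correct predictions out of n by chance."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def min_correct_needed(n, p=1 / 3, alpha=0.05):
    """Smallest k with P(k or more correct) < alpha, or None if no k qualifies."""
    for k in range(n + 1):
        if tail_prob(k, n, p) < alpha:
            return k
    return None


for n in (2, 10, 25):
    print(n, min_correct_needed(n))
# 2 None
# 10 7
# 25 13
```

The `None` for 2 matches reflects that even getting both right happens about 11% of the time by chance.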
     
  9. pichichi2010

    pichichi2010 Member+

    Oct 24, 2010
    In your nets
    Nat'l Team:
    United States
    Y'all are nerds!!!!!!!!!! :p
     
