View Full Version : Question about use of Poisson distribution
voros
04 Jan 2004, 09:15 PM
Originally posted by NoSix
Well, if you add up the time it takes to 1) rip your shirt off, 2) make love to the corner flag, 3) jump into the stands, 4) fall out of the stands, 5) get your shirt back on, and 6) get back onto your side of the pitch, my personal opinion as a non-statistician is that you would be safe sticking with 1-minute intervals. ;-)
I like your method. I agree it is a better way of making a result prediction based only on the information in my original post (plus the league averages, obviously). Of course, your method doesn't account for strength of schedule differences,
To do that correctly would require busting out one of those power ratings techniques like KRACH modified to do what you like.
Those are some pretty dense creatures (based on recursion, i.e. the computer making lots of guesses until it's right), and all that's going to affect is the accuracy of your estimates of a team's goal-scoring (and goal-preventing) ability. It shouldn't have any bearing on the methods used after that.
Anyway, the above formula can be reversed. Suppose the Quakes averaged .0083 goals per half minute, the league average was .008, and the average Quakes opponent during the year allowed .0087 (that's just a random number I picked)...
...this time we designate the Quakes' rate as 'd', the average opponent's rate as 'b', and the league average remains 'c'. Doing some algebra to solve for 'a', we get:
a = ((b*c*d)-(c*d))/((b*d)-(c*d)-b+(b*c))
or
a = ((.0087*.008*.0083)-(.008*.0083))/((.0087*.0083)-(.008*.0083)-.0087+(.0087*.008)) = .0076.
What this means is that a team that scores an average of .0083 goals per half minute against a team that allows .0087 per half minute in a league that averages .008 per half minute, will score .0076 against an "average" team.
IOW, you could use that formula to adjust for competition level if you'd like.
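For anyone who wants to check the arithmetic, here is a minimal Python sketch of the formula and its reversed form. The function names are made up for illustration; the letters a, b, c, d follow the post above, and the forward formula is the one described in words earlier in the thread.

```python
def matchup_rate(a, b, c):
    """Forward formula: rate `d` for a team that scores `a` against an
    average defense, facing a defense that allows `b`, in a league
    that averages `c` (all rates per half minute)."""
    num = a * b / c
    return num / (num + (1 - a) * (1 - b) / (1 - c))


def rate_vs_average(d, b, c):
    """Reversed formula: recover `a` (the rate against an average team)
    from the observed rate `d` and the average opponent's rate `b`."""
    return (b * c * d - c * d) / (b * d - c * d - b + b * c)


# Numbers from the post: d = .0083, b = .0087, c = .008
a = rate_vs_average(d=0.0083, b=0.0087, c=0.008)
print(round(a, 4))  # 0.0076

# Round trip: plugging `a` back into the forward formula returns d
print(round(matchup_rate(a, b=0.0087, c=0.008), 4))  # 0.0083
```

The round trip is a quick sanity check that the algebra for 'a' really is the inverse of the original formula.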
mpruitt
04 Jan 2004, 09:41 PM
As one of the people who really pushed to get this forum up, this is a proud moment. I knew there'd be a day when there'd be a discussion in a thread that would have me completely lost. This is just that. Kudos guys.
NoSix
04 Jan 2004, 09:59 PM
Originally posted by maxim-1
As one of the people who really pushed to get this forum up, this is a proud moment. I knew there'd be a day when there'd be a discussion in a thread that would have me completely lost. This is just that. Kudos guys.
Where did you get lost - was it at ripping the shirt off, or falling out of the stands... ;-)
NoSix
04 Jan 2004, 10:07 PM
Originally posted by voros
To do that correctly would require busting out one of those power ratings techniques like KRACH modified to do what you like.
Those are some pretty dense creatures (based on recursion, i.e. the computer making lots of guesses until it's right), and all that's going to affect is the accuracy of your estimates of a team's goal-scoring (and goal-preventing) ability. It shouldn't have any bearing on the methods used after that.
Well, as I indicated in my earlier post, the match-specific estimator approach accounts for strength of schedule and I was able to implement it in an Excel spreadsheet.
beineke
05 Jan 2004, 10:21 AM
Originally posted by voros
Here's an alternative:
Or, in words: the numerator is San Jose's rate times Dallas' rate, divided by the league rate. The denominator is the numerator plus (one minus San Jose's rate) times (one minus Dallas' rate), divided by (one minus the league rate):
In effect, you're proposing a multiplicative model instead of an additive one. That seems reasonable.
But it's worth pointing out that there is a simple way to plug the league-wide average L into the original additive model. Instead of the average of the team rates [(A + B)/ 2], you can use the sum of the rates minus the league average [A + B - L].
IMO, the additive model is easier to work with, but the multiplicative model is probably a little more realistic.
voros
05 Jan 2004, 10:34 PM
Originally posted by beineke
In effect, you're proposing a multiplicative model instead of an additive one. That seems reasonable.
But it's worth pointing out that there is a simple way to plug the league-wide average L into the original additive model. Instead of the average of the team rates [(A + B)/ 2], you can use the sum of the rates minus the league average [A + B - L].
Correct. The problem is that model breaks down rather sharply at extremes. (It's the model Strat-o-Matic uses for its table top baseball game).
For example, if a team that scored .003 goals per half-minute came up against a tough defense that only allowed .003 goals per half-minute in a league that averages .008, you get [.003 + .003 - .008] = -.002
And of course you can't score negative goals (though if you could, I'm sure Italy's National Team would attempt to win its games zero to negative one). :)
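A small Python sketch of that extreme case, comparing the additive model [A + B - L] with the multiplicative formula described in words earlier in the thread. The function names are invented for illustration; the numbers are the ones from the example above.

```python
def additive(a, b, league):
    """Additive model: sum of the two rates minus the league average."""
    return a + b - league


def multiplicative(a, b, league):
    """Multiplicative (Log5-style) model, per the formula in words above."""
    num = a * b / league
    return num / (num + (1 - a) * (1 - b) / (1 - league))


weak_attack, tough_defense, league = 0.003, 0.003, 0.008

# Additive goes negative at this extreme (about -0.002)
print(additive(weak_attack, tough_defense, league))

# Multiplicative stays positive (roughly 0.0011 per half minute)
print(multiplicative(weak_attack, tough_defense, league))
```

For rates strictly between 0 and 1, the multiplicative result is a ratio of positive terms, so it cannot leave the 0 to 1 range; the additive one has no such guard.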
Serie Zed
05 Jan 2004, 11:37 PM
Originally posted by maxim-1
As one of the people who really pushed to get this forum up, this is a proud moment. I knew there'd be a day when there'd be a discussion in a thread that would have me completely lost. This is just that. Kudos guys.
Amen.
I like this stuff (and can follow it with proper hand-holding), but it's threads like this one that make me grateful I don't have to do it for a living.
Since I enjoy eating and all.
voros
06 Jan 2004, 01:51 AM
Originally posted by Serie Zed
Amen.
I like this stuff (and can follow it with proper hand-holding), but it's threads like this one that make me grateful I don't have to do it for a living.
Since I enjoy eating and all.
If it helps any :) that formula I posted isn't just some random formula that shot out of my head. Guys like Bill James have been using it for years, and all it really is, is a modification (and extrapolation to serve our purposes) of Bayes' Theorem:
http://en.wikipedia.org/wiki/Bayes%27_Theorem
The idea is that the problem described in the first post is really a conditional probability problem. We can arrange it into things we know and things we want to find out.
Things we know:
Team A's goals scored per game against (the condition of) an average team.
Team B's goals allowed per game against an average team.
Goals scored (and allowed) per game by an average team.
What we want to know:
Team A's goals scored per game against Team B.
For these types of conditional probability problems, Bayes' Theorem is the Bible, Rosetta Stone and Magna Carta all rolled into one. If you don't start from there, you're asking for trouble.
So I'm not just making things up. In fact I had zero to do with the creation of any of this (I think Log5 is a James' creation and from there it was easy to extrapolate to other things and of course Bayes is from the 18th century), so don't blame me if it seems very technical and dense. :)
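One way to see the family resemblance, offered as a reading of the formula rather than as how James derived it: write a for Team A's rate against an average team, b for Team B's rate allowed against an average team, c for the league rate, and d for Team A's rate against Team B. The formula posted earlier can then be rearranged into a statement about odds, which has the same shape as the odds form of Bayes' Theorem (updated odds equal starting odds times a ratio).

```latex
d = \frac{\dfrac{ab}{c}}{\dfrac{ab}{c} + \dfrac{(1-a)(1-b)}{1-c}}
\quad\Longleftrightarrow\quad
\frac{d}{1-d} = \frac{\dfrac{a}{1-a}\cdot\dfrac{b}{1-b}}{\dfrac{c}{1-c}}
```

In other words, Team A's odds of scoring in a given interval get scaled by how Team B's defensive odds compare with the league's.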
mpruitt
06 Jan 2004, 12:46 PM
Originally posted by voros
so don't blame me if it seems very technical and dense. :)
No I don't blame you, I'm just covertly trying to suck you into being the Bill James of soccer.
Serie Zed
06 Jan 2004, 04:59 PM
Originally posted by voros
For these types of conditional probability problems, Bayes' Theorem is the Bible, Rosetta Stone and Magna Carta all rolled into one. If you don't start from there, you're asking for trouble.
Actually, intuitively I'd have been able to do something close (didn't read the formula carefully enough to see where it veered away from simple(ish) math).
Bible? Rosetta Stone? Magna Carta? I thought you numbers guys were doing well to READ. And there you go -- referencing the Humanities. ;-)
voros
07 Jan 2004, 02:20 AM
Originally posted by Serie Zed
Bible? Rosetta Stone? Magna Carta? I thought you numbers guys were doing well to READ. And there you go -- referencing the Humanities. ;-)
It helps reel in the chicks that don't immediately run screaming from us.
Which is about 2.17364579843% of them.
NoSix
07 Jan 2004, 02:46 AM
Originally posted by voros
It helps reel in the chicks that don't immediately run screaming from us.
Yes, just think of this thread as the BigSoccer equivalent of "Ode on a Grecian Urn"
voros
10 Jan 2004, 01:01 AM
Originally posted by NoSix
Yes, just think of this thread as the BigSoccer equivalent of "Ode on a Grecian Urn"
Ah yes, Keats:
"Beauty is truth, truth-- beauty.
That is all ye know on Earth and all ye need to know."
JohnR
12 Jan 2004, 01:58 PM
Originally posted by Serie Zed
Actually, intuitively I'd have been able to do something close (didn't read the formula carefully enough to see where it veered away from simple(ish) math).
I hear ya. I think the same thing when our Ph.D.s start genuflecting when you say "Bayes' Theorem."
But it's a neat trick -- because I periodically reference Bayes' Theorem, I'm the only general business person in the company whom the research department respects.
microbrew
12 Jan 2004, 05:08 PM
Originally posted by voros
It helps reel in the chicks that don't immediately run screaming from us.
Which is about 2.17364579843% of them.
Don't you mean 2.7182818284% of them?
Speaking of conditional probability, usually that tends to blow a fuse in those who are less mathematically inclined. The false positives example, especially.
And I do eat well, but sleeping...
Craig P
13 Jan 2004, 12:49 PM
Originally posted by NoSix
Anyway, back to using the Poisson distribution to predict match results...
I did one semi-neat thing over the weekend, which was to incorporate match-specific predictors which account for strength of schedule differences in the match history of the teams. I got the idea from one of the papers in Microbrew's reading material thread (in this forum). The authors proposed eight different predictors, none of which I really liked, so I made up my own. Basically, the predictor defines the interaction between the two teams, so the algorithm outputs one set (two values, one for each team) of Poisson parameters for each team in the league, home and away versus every other team in the league. The parameters predicted including strength of schedule adjustments are significantly different in some cases from those using the averages of home GF and away GA, and home GA and away GF.
It looks like you're doing something related to the Mease ranking that was originally devised for college football. IIRC, the Mease paper proposes a normal distribution, but that's also based on wins rather than goal scoring. Dunno if Mease is your reference... I don't recall what the title of that paper is.
Craig P
13 Jan 2004, 12:52 PM
Originally posted by voros
To do that correctly would require busting out one of those power ratings techniques like KRACH modified to do what you like.
There are actually a couple of score-based hockey rating systems. One is called CHODR, the other is called CCHP. At one time, one was additive and the other multiplicative, but I believe CHODR was reformulated at one point and I don't know if they're still fundamentally different.
NoSix
13 Jan 2004, 11:04 PM
Originally posted by Craig P
It looks like you're doing something related to the Mease ranking that was originally devised for college football. IIRC, the Mease paper proposes a normal distribution, but that's also based on wins rather than goal scoring. Dunno if Mease is your reference... I don't recall what the title of that paper is.
I am not familiar with Mease's paper (but I listed the reference I did use on page 2 of this thread if you are interested). The method I am using can very easily be extended to rank teams - you can use the average home and away strength-of-schedule adjusted coefficients for each team to calculate the predicted points each team would win if they played every other team in the ranking home and away once each. In that way each team's ranking has an intuitively appealing interpretation - it is the number of points that a team would be expected to win in the "final table" of a league with a balanced "round-robin" schedule, even if the data used to calculate the coefficients come from a knock-out tournament or league with an unbalanced schedule.
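A rough Python sketch of that ranking idea, not the actual spreadsheet: it assumes each side's goals are independent Poisson counts, 3 points for a win and 1 for a draw, and a made-up three-team league whose per-match rates (`rates`) are purely hypothetical stand-ins for the strength-of-schedule adjusted coefficients.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return exp(-lam) * lam ** k / factorial(k)


def outcome_probs(lam_home, lam_away, max_goals=15):
    """Home win / draw / away win probabilities, assuming independent
    Poisson scoring and truncating each side at max_goals."""
    win = draw = loss = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            if h > a:
                win += p
            elif h == a:
                draw += p
            else:
                loss += p
    return win, draw, loss


def expected_points(rates):
    """rates[(home, away)] = (home goal rate, away goal rate) per match.
    Returns expected points per team over the full home-and-away schedule."""
    points = {}
    for (home, away), (lam_home, lam_away) in rates.items():
        w, d, l = outcome_probs(lam_home, lam_away)
        points[home] = points.get(home, 0.0) + 3 * w + d
        points[away] = points.get(away, 0.0) + 3 * l + d
    return points


# Hypothetical Poisson parameters for a balanced three-team round robin
rates = {
    ("A", "B"): (1.6, 1.0), ("B", "A"): (1.3, 1.2),
    ("A", "C"): (1.9, 0.8), ("C", "A"): (1.0, 1.5),
    ("B", "C"): (1.5, 1.1), ("C", "B"): (1.1, 1.3),
}

table = expected_points(rates)
for team, pts in sorted(table.items(), key=lambda kv: -kv[1]):
    print(team, round(pts, 2))
```

Sorting by expected points gives the "final table" interpretation: each team's rating is the points it would be expected to take from a balanced schedule, wherever the underlying rates came from.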
microbrew
22 Jan 2004, 03:24 PM
I've dug out some old links.
"A simulation model for football championships" at
http://www.ub.rug.nl/eldoc/som/a/01A65/01a65.pdf
"[...] a simulation/probability model that identifies the team that is most likely to win a tournament. The model can also be used to answer other questions like ‘which team had a lucky draw?’
[...]"
In the link below, the author models the scoring of goals in the World Cup as a Poisson process, using a data set of 232 games.
Chu, S. (2003), "Motivating the Poisson Process Using Goals in Soccer." INFORMS Transactions on Education, Vol. 3, No. 2, http://ite.informs.org/Vol3No2/Chu/