View Full Version : Goals/90, Assists/90 and Points/90
JohnR
30 Dec 2003, 02:39 PM
Originally posted by Andy_B
This whole PK controversy could be easily avoided if FIFA would simply change the rules so that the player who drew the penalty has to take it, unless he is injured to the point of being subbed.
Andy
Personally, I like that proposal but I can't quite see the logic for it, except to settle arguments such as this one.
Anyway, to register my vote, I would heartily recommend removing PKs from the studies. If Hristo Stoichkov scores once every 3 games by pushing his teammates out of the way so that he can take the PKs, he scores at a .33 rate per 90 minutes (actually more, since he probably won't play the whole 90 minutes per game). But that doesn't make him an effective offensive player, because if he didn't exist some other guy would take and make those same PKs, plus would probably contribute something useful to the offense in the run of play.
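For reference, the per-90 rate used throughout this thread is just a count scaled to a 90-minute game. A minimal sketch in Python (the function name `per_90` and the 75-minutes-per-game figure are illustrative assumptions, not numbers from the thread):

```python
def per_90(count: float, minutes: float) -> float:
    """Return events per 90 minutes played."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return count * 90.0 / minutes


# One PK goal every three full games (3 * 90 = 270 minutes).
full_games = per_90(1, 3 * 90)

# The same single goal, assuming (hypothetically) only 75 minutes per game.
subbed_off = per_90(1, 3 * 75)

print(round(full_games, 3), round(subbed_off, 3))
```

The second figure comes out higher, which is the "actually more" point: fewer minutes for the same goals means a higher rate.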
NoSix
30 Dec 2003, 07:15 PM
Originally posted by JohnR
Anyway, to register my vote, I would heartily recommend removing PKs from the studies. If Hristo Stoichkov scores once every 3 games by pushing his teammates out of the way so that he can take the PKs, he scores at a .33 rate per 90 minutes (actually more, since he probably won't play the whole 90 minutes per game). But that doesn't make him an effective offensive player, because if he didn't exist some other guy would take and make those same PKs, plus would probably contribute something useful to the offense in the run of play.
You're certainly entitled to your opinion, but someone for whom Hristo is their all time favorite player could use similar reasoning to argue that PK's should be included.
In 901 minutes, Stoichkov scored at a rate of 0.499 goals per 90 minutes. Consider the alternatives:
Player Min G/90
Stewart 1923 0.047
Cerritos 1767 0.204
Curtis 1509 0.119
Quaranta 738 0.122
Eskandarian 728 0.371
Hudson chose Stewart and Cerritos, and got fired. Who knows, maybe if he had given Stoichkov and Eskandarian Stewart's and Cerritos's minutes, things would have been different?
ChrisE
23 Jan 2004, 03:48 AM
Originally posted by voros
Great stuff Chris, the next step is to "normalize" the stats to compare across years. Why:
A. MLS in 2003 decided to get a bit more picky in the way it hands out assists, so assists dropped this year.
B. Scoring was a good deal higher in the 1996-1999 period of MLS before the league started to develop decent defenses league wide.
If you do this, it makes Twellman's performances thus far look pretty damn stunning.
Well, I'm not sure if I agree with this suggestion, and I'm afraid the numbers I came out with may not be a whole lot more telling than the ones I've already used, but I tried this anyway.
If anybody has criticisms about this, or even better suggestions as to how it might be improved, I'd be happy to hear them.
To start off with some numbers:
Year - G (non-PK) - A - Minutes
1996 - 485 - 505 - 315981
1997 - 479 - 620 - 316037
1998 - 690 - 942 - 404066
1999 - 514 - 738 - 378670
2000 - 555 - 769 - 387215
2001 - 473 - 697 - 318993
2002 - 381 - 565 - 282540
2003 - 380 - 418 - 306758
Which gives totals of 3957 Non-PK goals, 5254 Assists, and 2710260 Minutes played (8 seasons). So we get an average goals/90 of .1314, and an average assists/goal of 1.328.
(I hope the PK debate doesn't arise again, I think we're going to have to agree to disagree for the time being, but right here I think it's particularly important to use non-PK goals because assists aren't handed out for PK's.)
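The league-wide totals and averages can be re-derived from the season table with a few lines of Python. This is only a sketch for checking the arithmetic; the variable names are mine, and the numbers are copied from the table in this post.

```python
# Season totals as posted: year -> (non-PK goals, assists, minutes).
SEASONS = {
    1996: (485, 505, 315981),
    1997: (479, 620, 316037),
    1998: (690, 942, 404066),
    1999: (514, 738, 378670),
    2000: (555, 769, 387215),
    2001: (473, 697, 318993),
    2002: (381, 565, 282540),
    2003: (380, 418, 306758),
}

total_goals = sum(g for g, _, _ in SEASONS.values())
total_assists = sum(a for _, a, _ in SEASONS.values())
total_minutes = sum(m for _, _, m in SEASONS.values())

# League-wide goals per 90 player-minutes and assists per goal.
league_g90 = total_goals * 90 / total_minutes
league_apg = total_assists / total_goals

print(total_goals, total_assists, total_minutes)
print(round(league_g90, 4), round(league_apg, 3))
```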
Now, for individual seasons, the numbers (G/90 and A/G, respectively) are:
1996 0.138 1.041
1997 0.136 1.294
1998 0.154 1.365
1999 0.122 1.436
2000 0.129 1.386
2001 0.133 1.474
2002 0.121 1.483
2003 0.111 1.100
And so the ratios of individual season G/90 and A/G to all time G/90 and A/G look like (respectively)
1996 1.051 0.784
1997 1.038 0.975
1998 1.170 1.028
1999 0.930 1.081
2000 0.982 1.044
2001 1.016 1.110
2002 0.924 1.117
2003 0.848 0.828
So, to normalize everything, as best as my underfed, statistically untrained brain knows how, I divided individual season goal and assist numbers by those two numbers (again, respectively). So, if Chris Henderson had 10 non-PK goals and 7 assists in 2002, I credited him with (10/.924=) 10.823 goals and (7/1.117=) 6.267 assists.
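The normalization described above can be sketched as follows. The helper names `season_factors` and `adjust` are hypothetical, and because the code uses unrounded ratios, the last decimal or two can differ slightly from the hand-rounded figures in the post (for example, about 10.83 rather than 10.823 adjusted goals).

```python
# Season totals as posted: year -> (non-PK goals, assists, minutes).
SEASONS = {
    1996: (485, 505, 315981),
    1997: (479, 620, 316037),
    1998: (690, 942, 404066),
    1999: (514, 738, 378670),
    2000: (555, 769, 387215),
    2001: (473, 697, 318993),
    2002: (381, 565, 282540),
    2003: (380, 418, 306758),
}


def season_factors(seasons):
    """Ratio of each season's G/90 and A/G to the all-time figures."""
    total_g = sum(g for g, _, _ in seasons.values())
    total_a = sum(a for _, a, _ in seasons.values())
    total_m = sum(m for _, _, m in seasons.values())
    league_g90 = total_g * 90 / total_m
    league_apg = total_a / total_g
    factors = {}
    for year, (g, a, m) in seasons.items():
        g_ratio = (g * 90 / m) / league_g90
        a_ratio = (a / g) / league_apg
        factors[year] = (g_ratio, a_ratio)
    return factors


def adjust(goals, assists, year, factors):
    """Divide raw goals and assists by that season's two ratios."""
    g_ratio, a_ratio = factors[year]
    return goals / g_ratio, assists / a_ratio


FACTORS = season_factors(SEASONS)
# The Chris Henderson example: 10 non-PK goals and 7 assists in 2002.
adj_goals, adj_assists = adjust(10, 7, 2002, FACTORS)
print(round(adj_goals, 2), round(adj_assists, 2))
```

Note that, as in the post, assists are divided only by the assists-per-goal ratio, not by the scoring ratio as well.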
So the top 15 in career (super-)adjusted G/90, minimum 2000 minutes, are:
John, Stern 0.821
Twellman, Taylor 0.821
Buddle, Edson 0.654
Diallo, Mamadou 0.633
Shannon, Musa 0.631
Ruiz, Carlos 0.615
Graziani, Ariel 0.542
Marino, Pete 0.527
Serna, Diego 0.520
Savarese, Giovanni 0.515
De Avila, Antonio 0.504
Razov, Ante 0.504
Lassiter, Roy 0.503
Pineda Chacon, Alex 0.498
Diaz Arce, Raul 0.481
And assists/90 are:
Valderrama, Carlos 0.657
Etcheverry, Marco 0.573
Paz, Adrian 0.535
Preki 0.526
Limpar, Anders 0.516
Williams, Andy 0.514
Stoitchkov, Hristo 0.513
Wynalda, Eric 0.502
Bishop, Ian 0.498
Guevara, Amado 0.480
Martinez, Antonio 0.465
Hermosillo, Carlos 0.463
Warzycha, Robert 0.461
Jara, Guillermo 0.454
Machon, Martin 0.453
and g+a/90:
Twellman, Taylor 1.034
John, Stern 0.957
Buddle, Edson 0.951
Stoitchkov, Hristo 0.918
De Avila, Antonio 0.918
Hermosillo, Carlos 0.903
Serna, Diego 0.890
Shannon, Musa 0.840
Wynalda, Eric 0.837
Cunningham, Jeff 0.836
Diallo, Mamadou 0.830
Pineda Chacon, Alex 0.792
Moore, Joe-Max 0.788
Donovan, Landon 0.781
Razov, Ante 0.781
and the 12 greatest goal-scoring [rate] seasons of MLS history are, minimum 600 minutes:
Diallo, Mamadou 2000 0.953
John, Stern 1998 0.887
Ruiz, Carlos 2002 0.861
Twellman, Taylor 2003 0.841
Pineda Chacon, Alex 2001 0.819
Harris, Wolde 1998 0.815
Molnar, Miklos 2000 0.813
Twellman, Taylor 2002 0.806
Lassiter, Roy 1996 0.796
De Avila, Antonio 1996 0.790
John, Stern 1999 0.760
Cunningham, Jeff 2002 0.745
and assists:
Wynalda, Eric 1997 1.100
Williams, Andy 1998 1.076
Etcheverry, Marco 1996 1.017
Valderrama, Carlos 1997 1.008
Valderrama, Carlos 1996 0.948
Valderrama, Carlos 2000 0.774
Dougherty, Paul 1998 0.750
Etcheverry, Marco 1999 0.749
Warzycha, Robert 1996 0.723
Martinez, Antonio 2001 0.722
Machon, Martin 2000 0.707
Moore, Joe-Max 1998 0.704
Now, Wynalda's 1997 and Williams's 1998 seasons only lasted about 900 minutes each, and Dougherty's was just over a thousand. Nevertheless, it looks like the dedicated playmaker is, for whatever reason, disappearing from the league. Maybe it's that the league hasn't been able to find replacements for Etcheverry and Valderrama, maybe it's that a player simply can't sit back and distribute anymore; I don't know (to be fair, 15 on the list was Preki 2003, 22 was Cancela).
On the subject of the career goals/90, Twellman and Stern John stand atop the field like colossuses. Twellman's dominance, combined with the fact that Noonan would have slotted in right behind Buddle (.644) with 350 more minutes, makes it hard not to wonder whether his scoring is a product of Nicol's system. I really look forward to seeing how they respond to playing together next year.
Buddle's numbers are almost as impressive as Twellman's, though, and he's obviously the superior athlete. I think there's no doubt he'll be a fixture on the Nats for years to come, once he gets the opportunity.
That's it for now.
beineke
23 Jan 2004, 11:03 AM
Originally posted by ChrisE
Wynalda, Eric 1997 1.100
Williams, Andy 1998 1.076
Etcheverry, Marco 1996 1.017
Valderrama, Carlos 1997 1.008
Valderrama, Carlos 1996 0.948
Valderrama, Carlos 2000 0.774
Dougherty, Paul 1998 0.750
Etcheverry, Marco 1999 0.749
Warzycha, Robert 1996 0.723
Martinez, Antonio 2001 0.722
Machon, Martin 2000 0.707
Moore, Joe-Max 1998 0.704
Now, Wynalda's 1997 and Williams's 1998 seasons only lasted about 900 minutes each, and Dougherty's was just over a thousand. Nevertheless, it looks like the dedicated playmaker is, for whatever reason, disappearing from the league.
Thanks for prepping all this stuff, Chris. It's a fascinating read.
How hard would it be to threshold this list at, say, 2000 minutes? I'm wondering if the big assist seasons were mostly in 96 and 97, back in the league's (effectively) pro-am days.
mpruitt
23 Jan 2004, 11:50 AM
Fascinating stuff, Chris. Really, excellent job. One thing that stands out to me is how well the actual numbers back up Carlos Valderrama's longtime, and well-deserved, reputation as a playmaker. Very impressive.
I found your comment about Steve Nicol's system interesting, though I'm not sure I can think of anything in particular about his system that would be the key. Maybe it's just Nicol's scouting ability? Perhaps a more direct style of play? I'd really have no idea.
ChrisE
23 Jan 2004, 02:27 PM
Originally posted by beineke
Thanks for prepping all this stuff, Chris. It's a fascinating read.
How hard would it be to threshold this list at, say, 2000 minutes? I'm wondering if the big assist seasons were mostly in 96 and 97, back in the league's (effectively) pro-am days.
Not hard at all (if anyone wants the file I'm using, or a full list, just ask), but I'm afraid it may bias things a little, and would exclude a lot of good candidates. Of the top 10 field players in minutes for each club in each season of MLS history (880 player-seasons in total), only slightly more than half, 491, played more than 2000 minutes. If you lower that bar to 1500 minutes, you include 787. Furthermore, you're going to be biasing this towards the early years simply because teams played more minutes back then. From 1996-1999, 267 player-seasons topped 2000 minutes, while from 2000-2003, only 224 did.
So, if I were to use 2000, you'd lose Valderrama's 1997 season (1740 minutes, 19 assists), and Etcheverry's 1999 season (1890 minutes, 17 assists). Since I don't think this is really what you want, I'll use 1500 minutes, but I'll do 2000 (or you can) if you really want.
Top 20:
Etcheverry, Marco 1.017
Valderrama, Carlos 1.008
Valderrama, Carlos 0.948
Valderrama, Carlos 0.774
Etcheverry, Marco 0.749
Warzycha, Robert 0.723
Machon, Martin 0.707
Moore, Joe-Max 0.704
Preki 0.690
Hermosillo, Carlos 0.687
Valderrama, Carlos 0.681
Williams, Andy 0.681
Preki 0.674
Lisi, Mark 0.653
Etcheverry, Marco 0.642
Wynalda, Eric 0.641
Jones, Cobi 0.640
Paz, Adrian 0.637
Mathis, Clint 0.624
Ralston, Steve 0.622
Of the top 100, using 1500 minutes as the cutoff point, they broke down by year like this:
1996 16
1997 17
1998 22
1999 12
2000 12
2001 11
2002 6
2003 4
For reference, number 25 was .596 A/90, 50 was .518, 100 was .394.
beineke
23 Jan 2004, 03:06 PM
Thanks for passing along the additional numbers. Do you have the years handy for those individual seasons?
Also, how does 70% of team minutes seem as a standard? I realize that's tough to meet, but at the opposite end of the spectrum, there are guys like Ralston and Chung who are consistently over 80%. If a player doesn't make the cut, his production could be normalized as if he had played exactly 70%. With a 1500 minute cutoff, there are guys on the list who barely played half a season. IMO, that leaves too much room for random fluctuations.
[I'm not suggesting that you revise this data; rather, I'm wondering if this could be a standard way of looking at things.]
ChrisE
23 Jan 2004, 08:26 PM
Originally posted by beineke
Thanks for passing along the additional numbers. Do you have the years handy for those individual seasons?
oops:
Etcheverry, Marco 1996
Valderrama, Carlos 1997
Valderrama, Carlos 1996
Valderrama, Carlos 2000
Etcheverry, Marco 1999
Warzycha, Robert 1996
Machon, Martin 2000
Moore, Joe-Max 1998
Preki 2003
Hermosillo, Carlos 1998
Valderrama, Carlos 1998
Williams, Andy 2002
Preki 1997
Lisi, Mark 2003
Etcheverry, Marco 1998
Wynalda, Eric 1996
Jones, Cobi 2002
Paz, Adrian 1996
Mathis, Clint 2000
Ralston, Steve 2002
Also, how does 70% of team minutes seem as a standard? I realize that's tough to meet, but at the opposite end of the spectrum, there are guys like Ralston and Chung who are consistently over 80%.
If a player doesn't make the cut, his production could be normalized as if he had played exactly 70%.
I still don't like this. For the Burn this year, exactly one player, Chad Deering, would have met the minutes played qualifications (about 1960). Players who were significant offensive contributors, but who didn't reach 70%, included:
Brad Davis, Taylor Twellman, Landon Donovan, Pat Noonan, Edson Buddle, Mark Lisi, Mike Magee, Jason Kreis (Beasley made it by 9 minutes, Ralph by 25).
I don't see any reason to ignore or discount Twellman or Noonan's scoring rate, simply because they didn't reach a certain plateau. I guess the only lists I've given here are top-scorers, but I think it's equally interesting that Kreis, the league's #2 all time goal scorer, is only #33 all-time in goal-scoring rate (just behind Rodrigo Faria and Junior Agogo).
With a 1500 minute cutoff, there are guys on the list who barely played half a season. IMO, that leaves too much room for random fluctuations.
Again, I'm not sure what you are trying to do here. Is there going to be a significant difference in random fluctuations between 1500 and 1960 minutes? Even if there is, is that much of a problem?
ChrisE
23 Jan 2004, 08:34 PM
Because all I've really done with the lists is rank people according to who is "best," I wanted to try something a little different. So here's a list of the league's 10 all-time leaders in goals-scored (including PK's), along with their (non-PK) goal-scoring rates:
Lassiter, Roy 106 0.503
Kreis, Jason 86 0.394
Diaz Arce, Raul 82 0.481
Preki Radoslav. 77 0.252
Razov, Ante 73 0.504
Moreno, Jaime 71 0.373
Cerritos, Ronald 63 0.384
Mcbride, Brian 62 0.332
Jones, Cobi 59 0.291
Hurtado, Eduardo 58 0.393
And assists:
Valderrama, Carlos 114 0.657
Preki Radoslav. 110 0.526
Etcheverry, Marco 101 0.573
Ralston, Steve 88 0.379
Cienfuegos, M. 80 0.408
Jones, Cobi 71 0.360
Henderson, Chris 66 0.317
Chung, Mark 65 0.296
Kreis, Jason 63 0.298
Warzycha, Robert 61 0.461
So, interestingly, no guys in the top 10 of the raw goals list make the top 10 of the adjusted goal-scoring rate list. The closest are Razov, Lassiter, and Diaz Arce coming in at 12, 13, and 15 respectively. Meanwhile, Kreis is 33, Preki is 66.
On the assists side, Valderrama is 1st on both lists, Preki is 4th in rate and 2nd in raw assists, and Etcheverry is 2nd in rate and 3rd in raw assists. Kreis, again, comes in at a terrible 62 in assist rate, with Chung right behind at 66.
Although I like Jason Kreis, I think it's pretty clear that he's not as much an exceptional player (his Nats history would support this) as a durable player who's racked up a lot of points. And there's nothing wrong with that; nevertheless, I think the Burn need to realize that even though they've got MLS's number 2 all-time scorer, they're not going to get anywhere near a cup with him as the focal point of the offense.
voros
24 Jan 2004, 03:37 AM
Originally posted by ChrisE
John, Stern 0.821
Twellman, Taylor 0.821
Buddle, Edson 0.654
Diallo, Mamadou 0.633
Shannon, Musa 0.631
Ruiz, Carlos 0.615
Graziani, Ariel 0.542
Marino, Pete 0.527
Serna, Diego 0.520
Savarese, Giovanni 0.515
De Avila, Antonio 0.504
Razov, Ante 0.504
Lassiter, Roy 0.503
Pineda Chacon, Alex 0.498
Diaz Arce, Raul 0.481
and g+a/90:
Twellman, Taylor 1.034
John, Stern 0.957
Buddle, Edson 0.951
Stoitchkov, Hristo 0.918
De Avila, Antonio 0.918
Hermosillo, Carlos 0.903
Serna, Diego 0.890
Shannon, Musa 0.840
Wynalda, Eric 0.837
Cunningham, Jeff 0.836
Diallo, Mamadou 0.830
Pineda Chacon, Alex 0.792
Moore, Joe-Max 0.788
Donovan, Landon 0.781
Razov, Ante 0.781
and the 12 greatest goal-scoring [rate] seasons of MLS history are, minimum 600 minutes:
Diallo, Mamadou 2000 0.953
John, Stern 1998 0.887
Ruiz, Carlos 2002 0.861
Twellman, Taylor 2003 0.841
Pineda Chacon, Alex 2001 0.819
Harris, Wolde 1998 0.815
Molnar, Miklos 2000 0.813
Twellman, Taylor 2002 0.806
Lassiter, Roy 1996 0.796
De Avila, Antonio 1996 0.790
John, Stern 1999 0.760
Cunningham, Jeff 2002 0.745
On the subject of the career goals/90, Twellman and Stern John stand atop the field like colossuses. Twellman's dominance, combined with the fact that Noonan would have slotted in right behind Buddle (.644) with 350 more minutes, makes it hard not to wonder whether his scoring is a product of Nicol's system. I really look forward to seeing how they respond to playing together next year.
Two things:
We need to remember that Twellman's leaving the Revolution lineup for the most part coincided with the signing of Jose Cancela, who really played well for the Revs. So it could very well be that Noonan and Brown had an advantage that Twellman didn't have. In the last game Twellman played, Cancela also played, and Twellman scored twice with Cancela starting both sequences. Anyone have any numbers on how many minutes Twellman played with Cancela on the field?
The other thing is (as Chris touched on) those comparing Twellman and Jason Kreis, as I have argued before, are not recognizing the sheer enormity of what Taylor Twellman has done as a 22 and 23 year old American in this league. He and Stern John sit atop in Goals. He and Stern John are the only two players to appear twice on the last list. And Twellman sits alone atop the goals and assists list. Interesting that he has fairly good assist numbers for a striker, considering it's a part of his game that's considered weak. Absolutely dominating statistical numbers.
That's _too_ much to write him out of the National Team picture as a "low skill poacher." Maybe everyone is right and he'll never succeed internationally, and maybe a lot of Twellman's success is due to "right place, right time" sort of things...
...but to deny a soon to be 24 year old with those kind of credentials a full opportunity to see what he's got I think is the bridge too far.
beineke
24 Jan 2004, 12:31 PM
Originally posted by ChrisE
Again, I'm not sure what you are trying to do here. Is there going to be a significant difference in random fluctuations between 1500 and 1960 minutes? Even if there is, is that much of a problem?
Several things:
(1) With a 2000 minute threshold, you pointed out a bias towards the earlier seasons when more minutes were played. That bias still exists with 1500 minutes, and may be even worse. So, can we at least agree that %-of-minutes is a better cut-off rule than #-of-minutes?
(2) If we want to make inferences about set-up guys, then it's awkward to find Carlos Hermosillo with an "exceptionally good" season (when he was only on the field 53% of the time). Given that he had three assists his other year in MLS, I think it's safe to conclude that the extremely high number was a fluke. And maybe 1500 vs. 2000 isn't a huge difference, but 1500 vs. 2700 is. Let's not shortchange the Ralstons and Valderramas who produce over an entire season, not half of one.
(3) A big problem with per-minute stats is that substitute minutes aren't comparable to starter minutes. A lot more goals are scored late in a game, and substitutes benefit from that. So we need to adjust for a guy like Jeff Cunningham who plays 1700 minutes, many as a sub. (Incidentally, this is the reason why Pete Marino is your #8 all-time goalscorer.)
(4) The below-threshold players you listed all missed the cut because they were out for large portions of the 2003 season, either to injuries or national team duty. As a result, it's harder to measure their true productivity. Anyway, if you change Donovan's minutes played from 1882 to 1950, his per-minute numbers drop by what, 3%?
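The "normalize as if he had played the threshold" idea can be sketched like this. The function name `per_90_with_floor` is hypothetical, and the goal count below is a placeholder, since the relative drop depends only on the two minute totals (1882 actual, 1950 threshold).

```python
def per_90_with_floor(count: float, minutes: float, floor_minutes: float) -> float:
    """Per-90 rate, treating the player as if he had played at least floor_minutes."""
    effective_minutes = max(minutes, floor_minutes)
    return count * 90.0 / effective_minutes


goals = 10  # placeholder; the relative drop does not depend on this value
raw_rate = per_90_with_floor(goals, 1882, 0)          # no floor applied
floored_rate = per_90_with_floor(goals, 1882, 1950)   # credited with 1950 minutes

relative_drop = 1 - floored_rate / raw_rate
print(f"{relative_drop:.1%}")
```

A player already above the threshold is unaffected, because `max` leaves his actual minutes in place.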
ChrisE
24 Jan 2004, 06:49 PM
Originally posted by beineke
Several things:
(1) With a 2000 minute threshold, you pointed out a bias towards the earlier seasons when more minutes were played. That bias still exists with 1500 minutes, and may be even worse. So, can we at least agree that %-of-minutes is a better cut-off rule than #-of-minutes?
I think that's pretty clear, yeah, it was just a more difficult criterion to use.
(2) If we want to make inferences about set-up guys, then it's awkward to find Carlos Hermosillo with an "exceptionally good" season (when he was only on the field 53% of the time). Given that he had three assists his other year in MLS, I think it's safe to conclude that the extremely high number was a fluke.
That's not really fair to Hermosillo (who I, admittedly, never saw play). In the 3-assist season (1280 minutes) he played even less than in his 12-assist season (1530 minutes). Maybe the 12 were a fluke, maybe the 3 were a fluke; I don't see why one is privileged over the other.
And maybe 1500 vs. 2000 isn't a huge difference, but 1500 vs. 2700 is. Let's not shortchange the Ralstons and Valderramas who produce over an entire season, not half of one.
Obviously, they're not being shortchanged; the total assists statistic still exists. I'd say that a/90 is useful as a complement to raw numbers like that.
(3) A big problem with per-minute stats is that substitute minutes aren't comparable to starter minutes. A lot more goals are scored late in a game, and substitutes benefit from that. So we need to adjust for a guy like Jeff Cunningham who plays 1700 minutes, many as a sub. (Incidentally, this is the reason why Pete Marino is your #8 all-time goalscorer.)
Good point. Though, for the record, Cunningham played (about) 1191 minutes as a starter this year, vs. 213 as a sub. Of course, that may be why he didn't score as much this year (3 goals as a starter vs. 2 as a sub).
(4) The below-threshold players you listed all missed the cut because they were out for large portions of the 2003 season, either to injuries or national team duty. As a result, it's harder to measure their true productivity. Anyway, if you change Donovan's minutes played from 1882 to 1950, his per-minute numbers drop by what, 3%?
Their true productivity is pretty clear. Brad Davis scored 4 non-pk goals and 5 assists. Donovan had 11 and 6. There's no reason to think that, had Davis not been hurt, or had Donovan not been away for Nats duty, their goals/90 or assists/90 would have been any different.
I think the statistic becomes a lot less useful if you mess with it any more than I already have, and start saying it's goals per 90 if a player had played some number of minutes that he didn't.
(Admittedly, Donovan's numbers aren't affected much, but 1. other players' are and 2. what reason is there to change them at all?)
beineke
24 Jan 2004, 07:37 PM
Originally posted by ChrisE
That's not really fair to Hermosillo (who I, admittedly, never saw play). In the 3-assist season (1280 minutes) he played even less than in his 12-assist season (1530 minutes). Maybe the 12 were a fluke, maybe the 3 were a fluke; I don't see why one is privileged over the other.
Let's put it another way ... Hermosillo played effectively one full season and got 15 assists. Ralston has had single seasons of 19, 18, and 17 assists.
By your rating system, Hermosillo in 1998 had a much better "season" than Ralston ever did. But if you divide Ralston's career into 1500 minute chunks, it's quite likely that you'd find that he had stretches where he was at least as good as Hermosillo.
what reason to change [the numbers] at all?
Same reason that baseball requires 502 plate appearances for a 162-game season, which in most situations is over 70% of the time ... if you want to evaluate the best season in a per-appearance way, you want to restrict attention to full seasons.
IIRC, players with under 502 PA's can still win the batting title if their adjusted numbers are good enough.
voros
24 Jan 2004, 08:36 PM
From baseball, one of the better ways to decide this issue, I thought, was to use standard deviations as the measure. Take a mythical "average" rate and then check to see how many standard deviations the player's rate is from "average."
The problem here is that you need a number that's always between 0 and 1 to do this, and so we have to come up with an appropriate opportunity statistic with which to measure goals against. We could use minutes, though I'd probably prefer minutes X 2.
Anyway, if n = opportunities, p = average goals per opportunity and q = 1-p, then the standard deviation would be:
(n*p*q)^(1/2)
Find the expected number of goals by multiplying opportunities by the average rate (n*p). Then subtract this number from the actual adjusted goals, divide by the standard deviation above, and you have your number of standard deviations from the mean. The more there are, the better the season.
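Here is a minimal sketch of that calculation, assuming two opportunities per minute (the "minutes X 2" suggestion) and the league-wide 0.1314 goals/90 from earlier in the thread. The function name `sd_above_average` and the example player line (15 adjusted goals in 1800 minutes) are made up for illustration.

```python
import math

LEAGUE_G90 = 0.1314       # all-time non-PK goals per 90, from earlier in the thread
OPPS_PER_MINUTE = 2       # assumption: "minutes X 2" as the opportunity count


def sd_above_average(adj_goals: float, minutes: float) -> float:
    """Standard deviations between a player's goals and the league-average expectation."""
    n = minutes * OPPS_PER_MINUTE                 # opportunities
    p = LEAGUE_G90 / (90 * OPPS_PER_MINUTE)       # average goals per opportunity
    q = 1 - p
    expected = n * p
    sd = math.sqrt(n * p * q)
    return (adj_goals - expected) / sd


# Hypothetical season: 15 adjusted goals in 1800 minutes.
z = sd_above_average(15, 1800)
print(round(z, 2))
```

Because the standard deviation grows with the square root of n, the same goals/90 over more minutes gives a larger number, so this measure rewards sustaining a rate over a full season.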
As far as the rule in baseball, I don't think it's in effect anymore, but the rule was that if a player was short of the plate appearance mark, he could still win the batting title if he would still have had the best batting average after adding one at bat to his total for each plate appearance he was short. The most famous case of this was in 1939 when Don Padgett was awarded the batting title over Hall of Famer Johnny Mize.
http://www.baseball-reference.com/p/padgedo01.shtml
beineke
25 Jan 2004, 10:54 AM
Originally posted by voros
Anyway, if n = opportunities, p = average goals per opportunity and q = 1-p, then the standard deviation would be:
(n*p*q)^(1/2)
I like this model, but why not make $n$ something closer to an actual number of opportunities, say, five per game? I choose five since players do occasionally score five goals in a game, and because a player like Ante Razov actually takes close to five shots per game.
(Incidentally, that would take us closer to a Poisson model for goalscoring.)
voros
25 Jan 2004, 08:42 PM
Originally posted by beineke
I like this model, but why not make $n$ something closer to an actual number of opportunities, say, five per game? I choose five since players do occasionally score five goals in a game, and because a player like Ante Razov actually takes close to five shots per game.
(Incidentally, that would take us closer to a Poisson model for goalscoring.)
Actually, I believe the Poisson model is an "infinite number of chances."
Anyway, the reason is that if a guy plays 10 minutes as a sub and scores a goal, he'd break the system, since he'd be over '1'.
I think 180 could reasonably be considered a number of opportunities as long as we realize that all players convert a very low number of the opportunities presented.
I use 180 by the logic that you can occasionally see box scores where 2 goals were scored in the same minute, but you never see three (unless it's in the 90th minute and they aren't differentiating between the 90th and the 94th). 90 would likely work fine.
beineke
26 Jan 2004, 09:48 AM
Originally posted by voros
Actually, I believe the poisson model is an "infinite number of chances."
Yep, and in fact 180 (or even 5) will give you a pretty close approximation to that, with std deviation $\sqrt{np}$, rather than $\sqrt{npq}$.
I mis-read your original post ... sorry about that.
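To see how close the two standard deviations are, here is a small sketch comparing the binomial form with the Poisson form for different numbers of opportunities per game. It assumes a league-average scorer (0.1314 goals/90) and a hypothetical 30 full games; the function name is illustrative only.

```python
import math

LEAGUE_G90 = 0.1314   # all-time non-PK goals per 90, from earlier in the thread
GAMES = 30            # assumption: 30 full 90-minute games


def binomial_and_poisson_sd(opps_per_game: int) -> tuple[float, float]:
    """Return (sqrt(npq), sqrt(np)) for a league-average scorer."""
    n = opps_per_game * GAMES
    p = LEAGUE_G90 / opps_per_game
    q = 1 - p
    return math.sqrt(n * p * q), math.sqrt(n * p)


for k in (5, 90, 180):
    binom_sd, poisson_sd = binomial_and_poisson_sd(k)
    print(k, round(binom_sd, 4), round(poisson_sd, 4), round(binom_sd / poisson_sd, 4))
```

The ratio of the two is just the square root of q, so it approaches 1 as the number of opportunities per game grows and p shrinks.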
NoSix
27 Jan 2004, 01:52 AM
Originally posted by ChrisE
Well, I'm not sure if I agree with this suggestion, and I'm afraid the numbers I came out with may not be a whole lot more telling than the ones I've already used, but I tried this anyway.
If anybody has criticisms about this, or even better suggestions as to how it might be improved, I'd be happy to hear them.
Now, for individual seasons, the numbers are:
1996 0.138 1.041
1997 0.136 1.294
1998 0.154 1.365
1999 0.122 1.436
2000 0.129 1.386
2001 0.133 1.474
2002 0.121 1.483
2003 0.111 1.100
And so the ratios of individual season G/90 and A/G to all time G/90 and A/G look like (respectively)
1996 1.051 0.784
1997 1.038 0.975
1998 1.170 1.028
1999 0.930 1.081
2000 0.982 1.044
2001 1.016 1.110
2002 0.924 1.117
2003 0.848 0.828
So, to normalize everything, as best as my underfed, statistically untrained brain knows how, I divided individual season goal and assist numbers by those two numbers (again, respectively). So, if Chris Henderson had 10 non-PK goals and 7 assists in 2002, I credited him with (10/.924=) 10.823 goals and (7/1.117=) 6.267 assists.
I almost hate to give this feedback, after all the hard work you've put into this, but I agree with your initial hesitation about this suggestion. The problem I see is that the total year-to-year variation in your data is the sum of deterministic factors (such as rule changes) and random variation. As a practical matter it is hard to separate out which is which. When you normalize in this way, you are effectively assuming all year-to-year variability is deterministic. If you don't normalize, you are effectively assuming all year-to-year variability is random. Given how few data points (seasons) you have to work with, I'm guessing an ANOVA test would have a hard time saying that any of these data points are significantly different from the others, so I would opt for the unnormalized data until such time as that is the case.
beineke
27 Jan 2004, 10:10 AM
Real quick analysis, so it'd be nice if someone verifies ...
A regression of goals/min on year does show a significant decrease in scoring over the years (p val 0.046).
A regression of assist/goal on year + year^2 also shows a significant change over the years (p val 0.009 on quadratic term).
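For anyone who wants to verify the first regression, here is a minimal sketch of an unweighted least-squares fit of season G/90 on year, using only the eight season totals posted earlier. This is an assumption about the setup and may not be exactly the model used above; it reports the slope and its t statistic rather than a p-value. For a two-sided test with 6 degrees of freedom, the 5% critical value is about 2.447.

```python
import math

# Season totals as posted earlier: year -> (non-PK goals, minutes).
SEASONS = {
    1996: (485, 315981),
    1997: (479, 316037),
    1998: (690, 404066),
    1999: (514, 378670),
    2000: (555, 387215),
    2001: (473, 318993),
    2002: (381, 282540),
    2003: (380, 306758),
}


def ols(xs, ys):
    """Simple linear regression; returns (slope, intercept, t statistic of slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, intercept, slope / se_slope


years = sorted(SEASONS)
g90 = [SEASONS[y][0] * 90 / SEASONS[y][1] for y in years]
slope, intercept, t_stat = ols(years, g90)
print(round(slope, 5), round(t_stat, 2))
```

Regressing goals per minute instead of goals per 90 only rescales the slope; the t statistic is unchanged.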
ChrisE
27 Jan 2004, 08:19 PM
Actually, at this point, it's just manipulating a couple of variables on a spreadsheet. It took longer to format those posts than it did to do the (very simple) normalizations.
Regardless, I'm glad someone finally said this. I would hope that, just because it looks like someone put a lot of work into something, that wouldn't make them immune to criticism. I have the same doubts about the normalizations as you do.
However, my problem isn't that the variation is caused by a combination of "deterministic factors and random variation." Instead, I think the problem is it's a result of a bunch of deterministic factors. Basically, when you normalize the goals/90 like I did, you're assuming that offense in MLS has remained static for the last 8 years, and defenses have been the only things that were improving. Obviously, I would hope that this isn't the case, and that offenses have been improving as well (so, in fact, Twellman, Buddle etc. deserve more credit than they get); unfortunately, I've got no idea how to separate these two, so I think it's pretty much impossible not to underestimate the more recent players' ability.
Of course, another problem is that Taylor Twellman 2003 and Ante Razov 2003 (or Jason Kreis of 2000, or whoever) played radically different roles on their teams, another kind of systematic variation that there's simply no way to control for. So, while these numbers may be interesting, they are nothing like a hard and fast guide to whether A is a better forward (or playmaker) than B.
I imagine I missed some (or all) of your subtler points about random versus deterministic variability; if you could provide a suggestion about how a model could take into account both I'd like to see it.
Originally posted by NoSix
I almost hate to give this feedback, after all the hard work you've put into this, but I agree with your initial hesitation about this suggestion. The problem I see is that the total year-to-year variation in your data is the sum of deterministic factors (such as rule changes) and random variation. As a practical matter it is hard to separate out which is which. When you normalize in this way, you are effectively assuming all year-to-year variability is deterministic. If you don't normalize, you are effectively assuming all year-to-year variability is random. Given how few data points (seasons) you have to work with, I'm guessing an ANOVA test would have a hard time saying that any of these data points are significantly different from the others, so I would opt for the unnormalized data until such time as that is the case.