As I posted earlier in another thread: if anyone is interested in helping set up a similar type of web site for MLS stats (a single place to get it all), let me know. I'd love to get involved. I have a limited mathematical background in this sort of thing, but I do have some web space.
When we started the forum that was always the ultimate goal in my mind. The issues were web space, design, and whether we'd reached some kind of critical mass on here in terms of content. A site like the one I envision would certainly be a great thing to have, but it'd be a bit of an undertaking. If you have some web space that can be used, maybe a good first step would be a better way to share our data files.
There was some talk a while back about how it'd be neat to try a hockey-like +/- stat for soccer, but it was determined that it doesn't really work because of the lack of substitutions, changes in formations, etc. Turns out the guys on this site have done it for basketball: http://www.82games.com/rolandratings0304.htm
I actually do have (almost correct) numbers from 2003, Maxim. The problem, of course, is exactly what you describe. If you just use raw plus/minus stats, you're taking team performance far too much into account. If you try to remove team performance, you end up punishing players for being on good teams (which makes absolutely no sense). If anybody's got any suggestions I'd love to hear them.
Have you tried comparing teammates against teammates? You could start with that, if you haven't already. Then see who tends to stand the farthest above his teammates. Not sure what the results would be like, though.
No, I hadn't. Here's Colorado, who had a team +/- per 90 of -.16. The first figure is the player's minutes played, the second is his +/- per 90 minutes, and the third is the team's +/- per 90 when the player was not playing.
Code:
                    Min  +/-per90  absent
Trembly, Seth      1112      0.24   -0.43
Spencer, John      2265      0.16   -1.60
Kotschau, R        2333      0.12   -1.64
Beckerman, Kyle    2124      0.08   -0.97
Mastroeni, P       1655     -0.05   -0.32
Fraser, Robin      2335     -0.08   -0.62
Borchers, Nat      2101     -0.13   -0.27
Garlick, Scott     2346     -0.15   -0.21
Henderson, C       2180     -0.17   -0.15
Kingsley, Zach      480     -0.19   -0.16
Crawford, Matt      750     -0.60    0.00
Roberts, Zizi       729     -0.99    0.13
Stewart, Jeff       608     -1.18    0.12
Carrieri, Chris    2458     -0.04   -1.15
Hart, Wes          2623     -0.03   -2.43
Chung, Mark        2639     -0.17    0.00
Vallow, Scott       380     -0.71   -0.08
Schmidt, Casey      363     -1.49    0.04
Powell, Darryl      335     -1.34    0.00
Rizo, Alberto       297     -1.52    0.00
Herdsman, S         198     -2.27    0.00
Cannon, Joe          45      4.00   -0.23
Blake, Alex          24     -3.75   -0.13
I separated out guys who either played fewer than 400 minutes or played within 400 minutes of the maximum. I think their numbers are probably less indicative, either because of small sample sizes or because you can't extract them from the team's performance as a whole. I'm sure Colorado fans would be incensed that Trembly's at the top; I have no idea how to interpret that. Maybe he caught some minutes as the team got hot, maybe he's actually quite good. As for the rest, I actually think it looks all right - I'm not sure if it's really hugely informative though.
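For anyone who wants to try the same thing, here is a minimal sketch of how the "playing" and "absent" per-90 figures could be computed. It assumes (my assumption, not Chris's actual setup) that each match has been broken into stretches where the lineup didn't change; the `Stint` record and the sample data are made up for illustration.

```python
from collections import namedtuple

# Hypothetical record: one stretch of play during which the lineup did not change.
Stint = namedtuple("Stint", "minutes goals_for goals_against on_field")

def per90(plus_minus, minutes):
    """Scale a raw +/- to a per-90-minute rate (0.0 if there are no minutes)."""
    return 90.0 * plus_minus / minutes if minutes else 0.0

def on_off_per90(stints, player):
    """Return (minutes played, +/- per 90 while playing, team +/- per 90 while absent)."""
    min_on = min_off = 0
    pm_on = pm_off = 0
    for s in stints:
        diff = s.goals_for - s.goals_against
        if player in s.on_field:
            min_on += s.minutes
            pm_on += diff
        else:
            min_off += s.minutes
            pm_off += diff
    return min_on, per90(pm_on, min_on), per90(pm_off, min_off)

# Made-up example data, not real match records.
stints = [
    Stint(60, 1, 0, {"A", "B"}),
    Stint(30, 0, 1, {"B", "C"}),
    Stint(90, 2, 2, {"A", "C"}),
]
print(on_off_per90(stints, "A"))
```

Here player "A" is on for 150 minutes at +1 and off for 30 minutes at -1, which shows how quickly a short "absent" stretch produces an extreme per-90 number.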
Well, clearly (and this is stating the obvious to an extent), the chart shows that Colorado did better with as many regular starters on the field as possible. The players with fewer than 1000 minutes tend to have the worst averages (although Trembly, at just over 1100 minutes, seems to be an odd exception, which is rather interesting). A comparison of these numbers team-by-team could show which teams were hardest hit by injuries and call-ups (assuming teams tend to have the best +/- figures for their starters). And I wonder what a list of the players who are the farthest above their team average would look like. The problem is, as you mentioned, that a player who has a lot of minutes is going to be close to his team average no matter what. The numbers could be more useful to show which players are "impact players" despite not playing a whole lot of minutes. I wonder who the other Tremblys of the league are.
Great stuff, Chris -- I'd suggest ordering things in terms of the +/- gap, in which case Trembly drops to 0.67, while Spencer and Kotschau surge to 1.76 and Beckerman to 1.05 ... of course, the "out" numbers for the last three guys are somewhat noisy. It'd also be interesting to see a "scrubs' " plus/minus, where you just pool all the numbers for players who had low minutes. This might give an indication as to how deep a team's bench is. (I'd bet a large sum of money that Carlos Ruiz led the league in +/- gap.)
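The gap figures quoted above are just the player's +/- per 90 minus the team's +/- per 90 while he was absent. A quick sketch using the four Colorado rows mentioned (the variable names are mine):

```python
# (player, +/- per 90 while playing, team +/- per 90 while absent),
# copied from the Colorado table earlier in the thread.
rows = [
    ("Trembly", 0.24, -0.43),
    ("Spencer", 0.16, -1.60),
    ("Kotschau", 0.12, -1.64),
    ("Beckerman", 0.08, -0.97),
]

# Gap = playing figure minus absent figure, then order from largest to smallest.
gaps = sorted(
    ((name, round(on - absent, 2)) for name, on, absent in rows),
    key=lambda item: item[1],
    reverse=True,
)

for name, gap in gaps:
    print(f"{name:<10} {gap:5.2f}")
```

This puts Spencer and Kotschau at 1.76, Beckerman at 1.05, and Trembly at 0.67, matching the numbers in the post.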
That's actually what I was going to do - I had a whole post typed up, became very discouraged that Dario Fabbro was second, and quit. Bad decision, dude. Ruiz's +/- per 90 is .12, the Galaxy's without him is -.561, putting him at .677 - right between Seth Trembly and Brian Maisonneuve. I hope that makes it extremely clear that these statistics are very limited. I decided to take your suggestion and see who was the best with 1000 or more minutes played and 1000 or more minutes missed, which is going to put a guy somewhere between 1000 and 1700 minutes - I think it might be these guys who can best be evaluated. ...I don't know what to say about the results - they're not great, but they're not terrible either. (Also, I've still got problems with traded players, so they're out.)
Top 20:
Code:
Cancela, Jose          1.029
Bocanegra, Carlos      0.926
Mapp, Justin           0.909
Marsch, Jesse          0.874
Ching, Brian           0.859
Magee, Mike            0.803
Quintanilla, Eliseo    0.753
Ekelund, Ronnie        0.747
Victorine, Sasha       0.694
Maisonneuve, Brian     0.678
Trembly, Seth          0.677
Garcia, Freddy         0.646
Gomez, Francisco       0.617
Martino, Kyle          0.549
Robinson, Eddie        0.534
Howard, Tim            0.520
Cunningham, Jeff       0.513
Perez, Orlando         0.417
Pope, Eddie            0.403
Wolyniec, John         0.383
Cancela's pretty great, and this list looks quite good until we get to Eliseo Quintanilla. I don't know what to say about that. At all. This list is packed with quality players, several of the 'seasoned veteran' variety, with few exceptions (Quintanilla, Trembly, Garcia, Perez).
Bottom 20:
Code:
Noonan, Pat           -0.224
Brown, C.J.           -0.303
Lagos, Manny          -0.331
Cienfuegos, Mauricio  -0.385
Stone, Jordan         -0.397
Bartolomeu, Edgar     -0.419
Gutierrez, Diego      -0.421
Alegria, Jose         -0.425
Johnson, Edward       -0.485
Russell, Ian          -0.491
Namoff, Bryan         -0.497
Thomas, Shavar        -0.636
West, Brian           -0.648
Lalas, Alexi          -0.683
Convey, Bobby         -0.734
McCarty, Chad         -1.043
Moore, Joe-Max        -1.101
Dunseth, Brian        -1.305
Cullen, Leo           -1.489
Williams, Andy        -1.926
Some big surprises here, plus a few guys who might have been expected. Andy Williams had a -10 plus/minus. The Fire had a +10. That's ridiculously bad, but I'm pretty confident my numbers are right. Other negative Fire were Faria (-7), Spiteri (-6), Selolwane (-4), Bolanos (-4), and Capano (-4). Williams (plus the last four) were hurt by a meaningless end-of-season game against the Crew, but Williams' numbers would have been damned bad anyway. Maybe there's a good reason he's never able to stick with clubs... Of the bottom 10, 4 were cut this offseason - Cullen, Dunseth, McCarty, Lalas. Only one of the top 20 even moved teams (Garcia), while 10 of the bottom 20 aren't with last year's teams. Convey is a mystery to me, as is Namoff. Noonan is a very big surprise, considering he started to come on about the same time as Cancela. 57 guys made the list, with 29 above 0 and 28 below. Average was about +.02, so it doesn't seem like this has much of an effect on minutes played (though it might this year). Since only 57 guys fit into these categories, and I had to eliminate some, I figured I might as well list the remaining 'mediocre' players:
Code:
Armstrong, Stephen     0.327
Zotinca, Alex          0.291
Mastroeni, Pablo       0.268
Buddle, Edson          0.258
Morrow, Steve          0.212
Kreis, Jason           0.164
Moreno, Alejandro      0.134
Gbandi, Chris          0.074
Pause, Logan           0.018
DiGiamarino, Joey      0.000
Oughton, Duncan        0.000
Vagenas, Peter         0.000
Rhine, Bobby          -0.017
Kante, Daouda         -0.098
Talley, Carey         -0.119
Walker, Jonny         -0.128
Simutenkov, Igor      -0.136
Thing is, the "+/- Gap" has a lot to do with a team's alternatives. Noonan being off usually meant that Twellman was on, and vice versa, so New England didn't lose much in the trade. Likewise, Jonny Walker is a good keeper, but he comes out on the negative side when compared with (mostly) Timmy Howard. Alexi Lalas may come out on the bad side because he was filling in for Califf. The alternatives situation is also reflected in the fact that quite a few "unreplaceable" players are near the top of the list ... personally, I think that while Ching, Cancela, Victorine, Ekelund, and Bocanegra were out, their teams didn't have anything close to a comparable replacement. Ruiz either, even though he's caused me to lose hypothetical $$$.
Again, great analysis Numerista. It still doesn't explain why Eliseo Quintanilla is so high, or what the hell is wrong with Andy Williams or Bobby Convey, but I'm confident there's a huge number of things going on in these numbers.
In a sort of pseudo-related thing, I took a look at team plus/minuses at five points in the season (every 6th game). Why? Well, mostly because I think some of the plus/minus is good luck, just slotting into a team at the right time. It's also just sorta interesting.
Code:
Club            6   12   18   24   30
Columbus        3   -2   -3   -1    0
Chicago         2    6    9   11   10
D.C. United    -2   -1    3    2    2
Metrostars      4    3    4    4    0
New England     0    2   -3   -1    8
Colorado      -10  -10   -4   -1   -5
Dallas         -4  -10  -19  -25  -29
Kansas City     2    5    6    2    4
Los Angeles    -3   -3    2    0    0
San Jose        5    3    6    9   10
And here are the differences between those checkpoints:
Code:
Club          0-6  6-12  12-18  18-24  24-30
Columbus        3    -5     -1      2      1
Chicago         2     4      3      2     -1
D.C. United    -2     1      4     -1      0
Metrostars      4    -1      1      0     -4
New England     0     2     -5      2      9
Colorado      -10     0      6      3     -4
Dallas         -4    -6     -9     -6     -4
Kansas City     2     3      1     -4      2
Los Angeles    -3     0      5     -2      0
San Jose        5    -2      3      3      1
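The second table can be derived from the first by differencing consecutive checkpoints (taking the season to start at 0). A small sketch, with three clubs copied from the table and a function name of my own choosing:

```python
# Cumulative team +/- after games 6, 12, 18, 24 and 30 (from the table above).
cumulative = {
    "Columbus": [3, -2, -3, -1, 0],
    "New England": [0, 2, -5 + 2, -1, 8],  # i.e. [0, 2, -3, -1, 8]
    "Dallas": [-4, -10, -19, -25, -29],
}

def stretch_changes(totals):
    """Change in +/- over each six-game stretch, starting from 0 before game 1."""
    return [later - earlier for earlier, later in zip([0] + totals, totals)]

for club, totals in cumulative.items():
    print(club, stretch_changes(totals))
```

New England's late run stands out this way: the last stretch alone accounts for +9.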
I still suspect that we're seeing some true information in this data, but some back-of-the-envelope calculations weren't too comforting. I made a few assumptions:
-- player is in for 1700 minutes and out for 1000 minutes (or vice versa)
-- team averages 1.5 gpg, both scored and allowed
-- goals arrive independently at random intervals ("Poisson")
-- player makes no impact at all
Based on these assumptions, I came up with a standard deviation of 0.655 for the +/- gap, which drops to 0.632 if a player plays exactly 50% of the time. For the 56 "medium minutes" players that ChrisE listed, the sample standard deviation is 0.647. So in this respect, our data looks exactly like what we'd expect from nothing but random noise. (As I said, I still think there's some signal in our data, but it's food for thought.) Would it be easy to pull the +/- numbers for the 2003 MLS Best 11? Also, do you know if it's still possible to look at boxscores from past seasons?
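One way to arrive at those two figures, as a sketch under the stated assumptions (this is a reconstruction, not necessarily the exact calculation used): with goals for and against each Poisson at 1.5 per 90 minutes, the raw +/- over m minutes has variance 2 × 1.5 × m/90, so the per-90 rate has variance 270/m, and the gap is the difference of two independent rates.

```python
import math

def gap_sd(minutes_on, minutes_off, goals_per_game=1.5):
    """Standard deviation of the +/- gap for a player who has no effect at all.

    Goals for and against are each assumed Poisson with rate goals_per_game
    per 90 minutes, so the variance of a per-90 +/- rate over m minutes is
    2 * goals_per_game * 90 / m.
    """
    var_on = 2 * goals_per_game * 90.0 / minutes_on
    var_off = 2 * goals_per_game * 90.0 / minutes_off
    return math.sqrt(var_on + var_off)

print(round(gap_sd(1700, 1000), 3))  # 1700 minutes in, 1000 out
print(round(gap_sd(1350, 1350), 3))  # exactly half of a 2700-minute season
```

The 2700-minute season (30 games of 90 minutes) used for the 50% case is my assumption; it is consistent with the 1700 + 1000 split in the post.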
Used to be, but with MLS redesigning their site I can't find them anymore. I maintain hope they'll be back eventually. These are last year's best eleven. The first number is the difference between their minutes played and minutes not played - higher numbers, in my opinion, mean the adjusted plus/minus stat is less reliable. Second is just a straight +/- per 90. Third is the adjusted +/- (the margin over the team's figure without them).
Code:
                   Min diff  +/-per90  +/- margin
Onstad, Pat            2230      0.54       2.145
Bocanegra, Carl         784      0.66       0.926
Pope, Eddie             630      0.16       0.403
Nelsen, Ryan           1806     -0.04      -0.748
Chung, Mark            2507     -0.17      -0.171
Armas, Chris           1750      0.56       1.252
Preki                  2559      0.17       0.924
Beasley, Dcus          1154      0.37       0.145
Donovan, Landon         974      0.33       0.037
Spencer, John          1759      0.16       1.760
Razov, Ante            1756      0.40       0.396
I tried getting a weighted average (or whatever it's called) by weighting each +/- margin by the inverse of the minutes difference. Got a cool .577. Taken individually, the numbers still don't necessarily make a lot of sense though.
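For reference, here is one reading of that weighted average: each margin gets weight 1/(minutes difference), and the total is divided by the sum of the weights. That normalisation is my assumption about what was done, but with the table's numbers it comes out at about .577.

```python
# (player, minutes difference, +/- margin), copied from the Best 11 table above.
best_eleven = [
    ("Onstad", 2230, 2.145), ("Bocanegra", 784, 0.926), ("Pope", 630, 0.403),
    ("Nelsen", 1806, -0.748), ("Chung", 2507, -0.171), ("Armas", 1750, 1.252),
    ("Preki", 2559, 0.924), ("Beasley", 1154, 0.145), ("Donovan", 974, 0.037),
    ("Spencer", 1759, 1.760), ("Razov", 1756, 0.396),
]

def inverse_minutes_weighted_mean(rows):
    """Average of the margins, weighting each by 1 / minutes difference."""
    weights = [1.0 / diff for _, diff, _ in rows]
    weighted_sum = sum(w * margin for w, (_, _, margin) in zip(weights, rows))
    return weighted_sum / sum(weights)

print(round(inverse_minutes_weighted_mean(best_eleven), 3))
```

The weighting leans on players like Pope and Bocanegra, whose played and missed minutes are closest to balanced, and plays down Onstad and Preki, who were almost never off the field.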
Hi guys, I've tried something similar with the EPL, namely recording the team's average goal difference, corrected for venue, both when a certain player played and again when he was absent. The original purpose was to try to identify players who were more influential to a team's success. It seemed to identify positions on the field that were vital to a team showing its best performances. Probably not surprisingly, these were the backbone of the side: the centre forward (usually the top scorer) through the central defenders to the goalkeeper. Equally unsurprisingly, it showed that on average a team was slightly less likely to score without its top striker (I'll look up the actual ballpark figures) and slightly more likely to concede without its first choice keeper. If you stuck these figures into a Poisson you could estimate the change in likely win/loss/draw figures without these key players. I've vaguely thought about running a least squares regression that treated teams with and without their key influential players as separate teams. This could address the strength of schedule issue, as certain top teams may systematically rest their top players against weaker sides. Another option is to look at pairings of players. A team may be only marginally affected by the loss of one main striker, but the loss of two may hugely reduce their chances. Strikers hunt in pairs, central defenders form effective partnerships, etc. The main problem with this is probably a very lopsided split in the with/without stats. T1
A few figures: loss of a team's leading striker costs that team on average around 3 tenths of a goal in the EPL. Loss of a first choice keeper costs slightly less. Looked at over a season, if a team were forced to play with their second choice striker and he didn't improve, it would probably be the difference between winning the league or coming second if you were Arsenal, or between fighting a relegation threat and mid table mediocrity if you were Southampton. If you're going to chart players this way, dealing with the home/away split will be vital. A at home to B compared to B at home to A causes a combined difference in supremacies of around 9 tenths of a goal in the EPL. So an imbalance of home and away games probably has more scope to skew figures than even strength of schedule. Depth of squad quality is also going to be a big issue. The only standout example of a non-striker or keeper having a comparable effect on a team's performance by virtue of his absence was Vieira at Arsenal a couple of seasons ago. T1
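To illustrate the "stick these figures into a Poisson" idea, here is a rough sketch that turns scoring rates into win/draw/loss probabilities and then knocks 0.3 goals off the team's rate. The 1.5 and 1.2 goals-per-game rates are made-up illustrations, and treating the two sides' goals as independent Poisson counts is a simplifying assumption.

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k goals when goals are Poisson with the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

def outcome_probs(rate_for, rate_against, max_goals=15):
    """Return (win, draw, loss) probabilities for independent Poisson scoring."""
    win = draw = loss = 0.0
    for scored in range(max_goals + 1):
        for conceded in range(max_goals + 1):
            p = poisson_pmf(scored, rate_for) * poisson_pmf(conceded, rate_against)
            if scored > conceded:
                win += p
            elif scored == conceded:
                draw += p
            else:
                loss += p
    return win, draw, loss

# Hypothetical rates: 1.5 scored / 1.2 conceded per game with the leading striker,
# and 0.3 fewer goals scored per game without him.
with_striker = outcome_probs(1.5, 1.2)
without_striker = outcome_probs(1.5 - 0.3, 1.2)
print("with:    win %.3f draw %.3f loss %.3f" % with_striker)
print("without: win %.3f draw %.3f loss %.3f" % without_striker)
```

Multiplying the change in these probabilities by the number of games missed gives a ballpark for the points at stake over a season.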
Yeah, this is a very important point (that I had totally ignored). Strength of opponent would be pretty tough for me to do, but this, quite possible. I doubt it would be so much of an issue in MLS - depth is probably a lot less variable here than in England (it's not something I really feel prepared to look at, either.)
How about a more general look, with dummy variable flags for each position for each team? Starting goalie 1 or 0, that sort of thing? Cluster injuries at a position may have some sort of non linear effect too I would guess?
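A bare-bones sketch of what those flags might look like, with made-up players and results. With a single flag, a least squares fit reduces to the difference in average goal difference with and without the first-choice player, which is what `single_flag_effect` computes; several flags at once would need a proper joint regression, which isn't shown here.

```python
# Hypothetical first-choice player at each position for one team.
first_choice = {"GK": "Keeper1", "ST": "Striker1"}

# Made-up matches: (players who appeared, team goal difference).
matches = [
    ({"Keeper1", "Striker1"}, 2),
    ({"Keeper1", "Striker2"}, 0),
    ({"Keeper2", "Striker1"}, 1),
    ({"Keeper2", "Striker2"}, -1),
]

positions = sorted(first_choice)  # fixed column order: ["GK", "ST"]

def flag_row(lineup):
    """One 0/1 flag per position: did the first-choice player appear?"""
    return [1 if first_choice[pos] in lineup else 0 for pos in positions]

rows = [(flag_row(lineup), goal_diff) for lineup, goal_diff in matches]

def single_flag_effect(rows, index):
    """Mean goal difference with the flag set minus the mean with it unset."""
    with_flag = [gd for flags, gd in rows if flags[index] == 1]
    without_flag = [gd for flags, gd in rows if flags[index] == 0]
    return sum(with_flag) / len(with_flag) - sum(without_flag) / len(without_flag)

for i, pos in enumerate(positions):
    print(pos, single_flag_effect(rows, i))
```

A flag for "two or more starters missing at the same position" could be added as an extra column to probe the cluster-injury effect.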
Brilliant idea, AV. I can't think of any reason why such an approach wouldn't work perfectly. It should prove really flexible as well and, most importantly, I haven't come across anyone who's already tried it. Next step should be to decide what type of data we need and where to get it. T
Well, initially you will need who played in each match. Later you could do it on a stats basis (goals, shots, etc.), I guess, but for a first look you could just use the players who played. The tricky thing, however, will probably be when formations change for away games, etc.: if a 4-4-2 becomes a 4-5-1, then the starting flag thing is a little more problematic. Also, you would be looking at it after the fact, so to speak, choosing who the starter is at the end of the season, which you may not always know at the start of the season (if you have midfielders that split time, for example). For the purpose of a general study that's probably not so important, but it is a little work to classify them like that once all the game data is obtained. That said, finding lists of players and how many games they played in is a fair bit easier than processing all the player data and all the fun name standardisation that entails.
On players though - and I was thinking about this to do with another sport, not soccer really - if you actually had stats, you could have a smoothed player ability rating from their stats as well! There again, though, you get into the importance of players in different roles...
In this case, I'm a beggar not a chooser, but I'd be really interested to see these numbers again for 2004. By checking how well things correlate with 2003, we could start to get a better grip on their usefulness. (note: 2004 Revs' numbers are posted on their forum.)
I've been recording the games, etc., but it's a hassle actually setting up the thing to figure out +/-, so I'm not going to bother trying till the end of the regular season.
Perhaps you can show me what the file looks like as .csv? It might be easiest for me to write a script.