Statistics and Analysis: NBA style

Discussion in 'Statistics and Analysis' started by microbrew, Apr 23, 2004.

  1. microbrew

    microbrew New Member

    Jun 29, 2002
    NJ
  2. profiled

    profiled Moderator
    Staff Member

    Feb 7, 2000
    slightly north of a mile high
    Club:
    Los Angeles Galaxy
    Like I posted earlier in another thread, if anyone is interested in helping set up a similar web site for MLS stats (a single place to get it all), let me know. I have a limited mathematical background in this sort of thing, but I do have some web space and would love to get involved.
     
  3. mpruitt

    mpruitt Member

    Feb 11, 2002
    E. Somerville
    Club:
    New England Revolution
    When we started the forum, that was always the ultimate goal in my mind. The issues were web space, design, and whether we'd reached some kind of critical mass on here in terms of content. It'd certainly be great to have a site like the one I envision, but it'd be a bit of an undertaking. If you have some web space that can be used, maybe a good first step would be a better way to share our data files.
     
  4. mpruitt

    mpruitt Member

    Feb 11, 2002
    E. Somerville
    Club:
    New England Revolution
    There was some talk a while back about how it'd be neat to try a hockey-like +/- stat for soccer, but it was determined that it doesn't really work because of the lack of substitutions, changes in formations, etc. Turns out the guys on this site have done it for basketball: http://www.82games.com/rolandratings0304.htm
     
  5. ChrisE

    ChrisE Member

    Jul 1, 2002
    Brooklyn
    Club:
    --other--
    Nat'l Team:
    American Samoa
    I actually do have (almost correct) numbers from 2003, Maxim. The problem, of course, is exactly what you describe. If you just use raw plus/minus stats, you're taking team performance far too much into account. If you try to remove team performance, you end up punishing players for being on good teams (which makes absolutely no sense). If anybody's got any suggestions, I'd love to hear them.
     
  6. Kevin in Louisiana

    Kevin in Louisiana New Member

    Feb 7, 2003
    Metairie, LA
    Have you tried comparing teammates against teammates? You could start with that, if you haven't already, and then see who tends to stand the farthest above his teammates. Not sure what the results would be like, though.
     
  7. ChrisE

    ChrisE Member

    Jul 1, 2002
    Brooklyn
    Club:
    --other--
    Nat'l Team:
    American Samoa
    No, I hadn't. Here's Colorado, who had a team +/- per 90 of -.16. The first figure is the player's minutes played, the second is his +/- per 90 minutes, and the third ("absent") is the team's +/- per 90 while he was not playing.

    Code:
    Player             Min  +/- per 90   Absent
    Trembly, Seth     1112        0.24    -0.43
    Spencer, John     2265        0.16    -1.60
    Kotschau, R       2333        0.12    -1.64
    Beckerman, Kyle   2124        0.08    -0.97
    Mastroeni, P      1655       -0.05    -0.32
    Fraser, Robin     2335       -0.08    -0.62
    Borchers, Nat     2101       -0.13    -0.27
    Garlick, Scott    2346       -0.15    -0.21
    Henderson, C      2180       -0.17    -0.15
    Kingsley, Zach     480       -0.19    -0.16
    Crawford, Matt     750       -0.60     0.00
    Roberts, Zizi      729       -0.99     0.13
    Stewart, Jeff      608       -1.18     0.12

    Carrieri, Chris   2458       -0.04    -1.15
    Hart, Wes         2623       -0.03    -2.43
    Chung, Mark       2639       -0.17     0.00

    Vallow, Scott      380       -0.71    -0.08
    Schmidt, Casey     363       -1.49     0.04
    Powell, Darryl     335       -1.34     0.00
    Rizo, Alberto      297       -1.52     0.00
    Herdsman, S        198       -2.27     0.00
    Cannon, Joe         45        4.00    -0.23
    Blake, Alex         24       -3.75    -0.13
    
    
    I separated out guys who either played fewer than 400 minutes or played within 400 minutes of the maximum. I think their numbers are probably less indicative, either because of small sample sizes or because you can't separate them from the team's performance as a whole. I'm sure Colorado fans would be incensed that Trembly's at the top, and I have no idea how to interpret that. Maybe he caught some minutes as the team got hot, maybe he's actually quite good. As for the rest, I actually think it looks all right, though I'm not sure it's hugely informative.
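    The three columns above can be rebuilt from raw totals. A minimal sketch, assuming you have each player's on-field minutes and on-field goal difference plus the team's season totals; the function names are hypothetical, and the 2700-minute season (30 games of 90 minutes) is an assumption, so the off-field figures may not match the posted column to the last digit. The second helper encodes the "separated out" rule from this post, with the maximum left as a parameter.

```python
def on_off_per90(player_min, player_gd, team_min, team_gd):
    """Return (+/- per 90 while on the field, team +/- per 90 while off)."""
    off_min = team_min - player_min
    on = player_gd * 90 / player_min
    # Whatever goal difference the team had that isn't the player's happened while he sat.
    off = (team_gd - player_gd) * 90 / off_min if off_min > 0 else float("nan")
    return on, off


def separated_out(player_min, max_min, threshold=400):
    """True for players split off from the main table: fewer than `threshold`
    minutes played, or within `threshold` minutes of the maximum."""
    return player_min < threshold or max_min - player_min <= threshold


# Crawford: 750 minutes at -0.60 per 90 is a -5 goal difference, which was
# Colorado's whole-season total, so the team broke even without him.
on, off = on_off_per90(player_min=750, player_gd=-5, team_min=2700, team_gd=-5)
print(round(on, 2), round(off, 2))  # prints: -0.6 0.0
```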
     
  8. Kevin in Louisiana

    Kevin in Louisiana New Member

    Feb 7, 2003
    Metairie, LA
    Well, clearly (and this is stating the obvious), the chart shows that Colorado did better with as many regular starters on the field as possible. The players with fewer than 1000 minutes tend to have the worst averages (although Trembly, at just over 1100 minutes, is an odd and rather interesting exception).

    A comparison of these numbers team-by-team could show which teams were hardest hit by injuries and call-ups (assuming teams tend to have the best +/- figures for their starters). And I wonder what a list of the players who are farthest above their team average would look like. The problem is, as you mentioned, that a player who has a lot of minutes is going to be close to his team average no matter what. The numbers could be more useful for showing which players are "impact players" despite not playing a whole lot of minutes. I wonder who the other Tremblys of the league are.
     
  9. numerista

    numerista New Member

    Mar 21, 2004
    Great stuff, Chris -- I'd suggest ordering things in terms of the +/- gap, in which case Trembly drops to 0.67, while Spencer and Kotschau surge to 1.76 and Beckerman to 1.05 ... of course, the "out" numbers for the last three guys are somewhat noisy.

    It'd also be interesting to see a "scrubs' " plus/minus, where you just pool all the numbers for players who had low minutes. This might give an indication as to how deep a team's bench is.
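    Both suggestions are simple arithmetic on ChrisE's Colorado table. A sketch, assuming the per-90 figures as posted (so results carry their two-decimal rounding); `pm_gap` and `pooled_per90` are hypothetical names, and the "under 400 minutes" scrub cut is an assumption. The pooled figure weights by player-minutes, so stretches where two scrubs were on together count once per player.

```python
# (name, minutes, +/- per 90 on, team +/- per 90 while absent), from the Colorado table
COLORADO = [
    ("Trembly, Seth", 1112, 0.24, -0.43),
    ("Spencer, John", 2265, 0.16, -1.60),
    ("Kotschau, R", 2333, 0.12, -1.64),
    ("Beckerman, Kyle", 2124, 0.08, -0.97),
    ("Vallow, Scott", 380, -0.71, -0.08),
    ("Schmidt, Casey", 363, -1.49, 0.04),
    ("Powell, Darryl", 335, -1.34, 0.00),
    ("Rizo, Alberto", 297, -1.52, 0.00),
    ("Herdsman, S", 198, -2.27, 0.00),
    ("Cannon, Joe", 45, 4.00, -0.23),
    ("Blake, Alex", 24, -3.75, -0.13),
]


def pm_gap(on90, off90):
    """+/- gap: how much better the team did per 90 with the player on than off."""
    return on90 - off90


def pooled_per90(rows):
    """Pool low-minute players into one 'scrubs' +/- per 90.

    Each player's goal difference is recovered as per90 * minutes / 90,
    then the totals are re-expressed per 90 of player-minutes.
    """
    total_min = sum(m for _, m, _, _ in rows)
    total_gd = sum(p * m / 90 for _, m, p, _ in rows)
    return total_gd * 90 / total_min


ranked = sorted(COLORADO, key=lambda r: pm_gap(r[2], r[3]), reverse=True)
for name, _, on90, off90 in ranked[:4]:
    print(f"{name:<16} {pm_gap(on90, off90):+.2f}")

scrubs = [r for r in COLORADO if r[1] < 400]
print(f"scrubs: {pooled_per90(scrubs):+.2f} per 90 over {sum(r[1] for r in scrubs)} player-minutes")
```

    Note that Cannon's 45 minutes float straight to the top of the gap ranking, which is exactly the small-sample problem ChrisE was filtering out.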

    (I'd bet a large sum of money that Carlos Ruiz led the league in +/- gap.)
     
  10. ChrisE

    ChrisE Member

    Jul 1, 2002
    Brooklyn
    Club:
    --other--
    Nat'l Team:
    American Samoa
    That's actually what I was going to do - I had a whole post typed up, became very discouraged that Dario Fabbro was second, and quit.

    Bad decision, dude. Ruiz's +/- per 90 is .12 and the Galaxy's without him is -.561, putting him at .677, right between Seth Trembly and Brian Maisonneuve. I hope that makes it extremely clear that these statistics are very limited.

    I decided to take your suggestion and see who was best among players with 1000 or more minutes played and 1000 or more minutes missed, which puts a player's minutes somewhere between 1000 and 1700. I think these might be the guys who can best be evaluated.

    ...I don't know what to say about the results. They're not great, but they're not terrible either. (Also, I've still got problems with traded players, so they're left out.)
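    The 1000/1000 cut is a filter applied before ranking by +/- gap. A sketch using a few rows from the Colorado table, assuming a 2700-minute season; `medium_minutes` is a hypothetical helper. With these inputs only Trembly and Mastroeni qualify, and their gaps (0.67 and 0.27) line up with the 0.677 and 0.268 in the lists, allowing for the table's rounding.

```python
SEASON_MIN = 2700  # assumption: 30 games x 90 minutes

# (name, minutes, on +/- per 90, off +/- per 90), copied from the Colorado table
ROWS = [
    ("Trembly, Seth", 1112, 0.24, -0.43),
    ("Spencer, John", 2265, 0.16, -1.60),
    ("Beckerman, Kyle", 2124, 0.08, -0.97),
    ("Mastroeni, P", 1655, -0.05, -0.32),
    ("Kingsley, Zach", 480, -0.19, -0.16),
    ("Crawford, Matt", 750, -0.60, 0.00),
]


def medium_minutes(rows, season_min=SEASON_MIN, floor=1000):
    """Keep players with at least `floor` minutes both played and missed,
    ranked by +/- gap (on minus off), best first."""
    keep = [
        (name, on90 - off90)
        for name, mins, on90, off90 in rows
        if mins >= floor and season_min - mins >= floor
    ]
    return sorted(keep, key=lambda r: r[1], reverse=True)


for name, gap in medium_minutes(ROWS):
    print(f"{name:<16} {gap:+.2f}")
```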

    Top 20:

    Code:
    Cancela, Jose		1.029
    Bocanegra, Carlos		0.926
    Mapp, Justin		0.909
    Marsch, Jesse		0.874
    Ching, Brian		0.859
    Magee, Mike		0.803
    Quintanilla, Eliseo		0.753
    Ekelund, Ronnie		0.747
    Victorine, Sasha		0.694
    Maisonneuve, Brian		0.678
    Trembly, Seth		0.677
    Garcia, Freddy		0.646
    Gomez, Francisco		0.617
    Martino, Kyle		0.549
    Robinson, Eddie		0.534
    Howard, Tim		0.520
    Cunningham, Jeff		0.513
    Perez, Orlando		0.417
    Pope, Eddie		0.403
    Wolyniec, John		0.383
    
    Cancela's pretty great, and this list looks quite good until we get to Eliseo Quintanilla. I don't know what to say about that. At all. With a few exceptions (Quintanilla, Trembly, Garcia, Perez), the list is packed with quality players, several of them of the 'seasoned veteran' variety.



    Bottom 20:

    Code:
    Noonan, Pat		-0.224
    Brown, C.J.		-0.303
    Lagos, Manny		-0.331
    Cienfuegos, Mauricio		-0.385
    Stone, Jordan		-0.397
    Bartolomeu, Edgar		-0.419
    Gutierrez, Diego		-0.421
    Alegria, Jose		-0.425
    Johnson, Edward		-0.485
    Russell, Ian		-0.491
    Namoff, Bryan		-0.497
    Thomas, Shavar		-0.636
    West, Brian		-0.648
    Lalas, Alexi		-0.683
    Convey, Bobby		-0.734
    McCarty, Chad		-1.043
    Moore, Joe-Max		-1.101
    Dunseth, Brian		-1.305
    Cullen, Leo		-1.489
    Williams, Andy		-1.926
    
    Some big surprises here, and a few guys who might have been expected. Andy Williams had a -10 plus/minus while the Fire as a whole had a +10. That's ridiculously bad, but I'm pretty confident my numbers are right. The other negative Fire players were Faria (-7), Spiteri (-6), Selolwane (-4), Bolanos (-4), and Capano (-4). Williams (plus the last four) were hurt by a meaningless end-of-season game against the Crew, but Williams' numbers would have been damned bad anyway. Maybe there's a good reason he's never able to stick with clubs...

    Of the bottom 10, 4 were cut this offseason: Cullen, Dunseth, McCarty, Lalas. Only one of the top 20 even moved teams (Garcia), while 10 of the bottom 20 aren't with last year's teams. Convey is a mystery to me, as is Namoff. Noonan is a very big surprise, considering he started to come on at about the same time as Cancela.

    57 guys made the list, with 29 above 0 and 28 at or below it. The average was about +.02, so it doesn't seem like this has much of an effect on minutes played (though it might this year). Since only 57 guys fit into these categories, and I had to eliminate some, I figured I might as well list the remaining 'mediocre' players:

    Code:
    Armstrong, Stephen		0.327
    Zotinca, Alex		0.291
    Mastroeni, Pablo		0.268
    Buddle, Edson		0.258
    Morrow, Steve		0.212
    Kreis, Jason		0.164
    Moreno, Alejandro		0.134
    Gbandi, Chris		0.074
    Pause, Logan		0.018
    DiGiamarino, Joey		0.000
    Oughton, Duncan		0.000
    Vagenas, Peter		0.000
    Rhine, Bobby		-0.017
    Kante, Daouda		-0.098
    Talley, Carey		-0.119
    Walker, Jonny		-0.128
    Simutenkov, Igor		-0.136
    
     
  11. numerista

    numerista New Member

    Mar 21, 2004
    Thing is, the "+/- gap" has a lot to do with a team's alternatives. Noonan being off usually meant that Twellman was on, and vice versa, so New England didn't lose much in the trade. Likewise, Jonny Walker is a good keeper, but he comes out on the negative side when compared with (mostly) Timmy Howard. Alexi Lalas may come out on the bad side because he was filling in for Califf.

    The alternatives situation is also reflected in the fact that quite a few "irreplaceable" players are near the top of the list. Personally, I think that when Ching, Cancela, Victorine, Ekelund, and Bocanegra were out, their teams didn't have anything close to a comparable replacement. The same goes for Ruiz, even though he's cost me hypothetical $$$. :)
     
  12. ChrisE

    ChrisE Member

    Jul 1, 2002
    Brooklyn
    Club:
    --other--
    Nat'l Team:
    American Samoa
    Again, great analysis, numerista. It still doesn't explain why Eliseo Quintanilla is so high, or what the hell is wrong with Andy Williams or Bobby Convey, but I'm confident there are a huge number of things going on in these numbers.
     
  13. ChrisE

    ChrisE Member

    Jul 1, 2002
    Brooklyn
    Club:
    --other--
    Nat'l Team:
    American Samoa
    In a sort of pseudo-related thing, I took a look at team plus/minuses at five points in the season (every 6th game). Why? Well, mostly because I think some of the plus/minus is good luck, just slotting into a team at the right time. It's also just sorta interesting.

    Code:
    Club               6    12    18    24    30
    Columbus           3    -2    -3    -1     0
    Chicago            2     6     9    11    10
    D.C. United       -2    -1     3     2     2
    Metrostars         4     3     4     4     0
    New England        0     2    -3    -1     8
    Colorado         -10   -10    -4    -1    -5
    Dallas            -4   -10   -19   -25   -29
    Kansas City        2     5     6     2     4
    Los Angeles       -3    -3     2     0     0
    San Jose           5     3     6     9    10
    
    And here are the changes over each six-game stretch:

    Code:
    Club             0-6  6-12 12-18 18-24 24-30
    Columbus           3    -5    -1     2     1
    Chicago            2     4     3     2    -1
    D.C. United       -2     1     4    -1     0
    Metrostars         4    -1     1     0    -4
    New England        0     2    -5     2     9
    Colorado         -10     0     6     3    -4
    Dallas            -4    -6    -9    -6    -4
    Kansas City        2     3     1    -4     2
    Los Angeles       -3     0     5    -2     0
    San Jose           5    -2     3     3     1
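    The second table is the first one differenced between checkpoints. A short sketch that rebuilds it from the cumulative numbers; `interval_changes` is a hypothetical name. A useful sanity check: once every team has played all 30 games, league-wide goal difference has to net to zero, and the game-30 column does (0 + 10 + 2 + 0 + 8 - 5 - 29 + 4 + 0 + 10 = 0).

```python
CHECKPOINTS = [6, 12, 18, 24, 30]

# Cumulative goal difference after each team's 6th, 12th, ... 30th game (from the table)
CUMULATIVE = {
    "Columbus": [3, -2, -3, -1, 0],
    "Chicago": [2, 6, 9, 11, 10],
    "D.C. United": [-2, -1, 3, 2, 2],
    "Metrostars": [4, 3, 4, 4, 0],
    "New England": [0, 2, -3, -1, 8],
    "Colorado": [-10, -10, -4, -1, -5],
    "Dallas": [-4, -10, -19, -25, -29],
    "Kansas City": [2, 5, 6, 2, 4],
    "Los Angeles": [-3, -3, 2, 0, 0],
    "San Jose": [5, 3, 6, 9, 10],
}


def interval_changes(series):
    """Goal difference over each stretch between checkpoints (starting from 0)."""
    return [b - a for a, b in zip([0] + series[:-1], series)]


labels = [f"{a}-{b}" for a, b in zip([0] + CHECKPOINTS[:-1], CHECKPOINTS)]
print(f"{'Club':<12}" + "".join(f"{lab:>7}" for lab in labels))
for club, series in CUMULATIVE.items():
    print(f"{club:<12}" + "".join(f"{d:>7}" for d in interval_changes(series)))

# Every goal is +1 for one team and -1 for the other, so season totals must cancel.
assert sum(series[-1] for series in CUMULATIVE.values()) == 0
```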
    
     
  14. numerista

    numerista New Member

    Mar 21, 2004
    I still suspect that we're seeing some true information in this data, but some back-of-the-envelope calculations weren't too comforting.

    I made a few assumptions:
    -- player is in for 1700 minutes and out for 1000 minutes (or vice versa)
    -- team averages 1.5 gpg, both scored and allowed
    -- goals arrive independently at random intervals ("Poisson")
    -- player makes no impact at all

    Based on these assumptions, I came up with a standard deviation of 0.655, which drops to 0.632 if a player plays exactly 50% of the time.

    For the 56 "medium minutes" players that ChrisE listed, the sample standard deviation is 0.647. So in this respect, our data looks exactly like what we'd expect from nothing but random noise. (As I said, I still think there's some signal in our data, but it's food for thought.)
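    The back-of-the-envelope figure follows from the Poisson assumptions listed above: over t minutes, goals for and goals against each have variance 1.5·t/90, so a per-90 goal difference has variance 2·1.5·90/t = 270/t, and the on/off gap adds the on and off variances. A sketch of that calculation, plus an optional simulation that draws goal times from exponential gaps as a cross-check; the function names and the simulation settings (trials, seed) are hypothetical choices.

```python
import math
import random
import statistics


def null_gap_sd(on_min, off_min, gpg=1.5):
    """SD of the +/- gap for a player with no impact, assuming independent
    Poisson goals at `gpg` per 90 for each side."""
    def var_per90(t):  # variance of a per-90 goal difference measured over t minutes
        return 2 * gpg * 90 / t

    return math.sqrt(var_per90(on_min) + var_per90(off_min))


def poisson_count(rate_per_min, minutes, rng):
    """Number of goals in `minutes` for a Poisson process (exponential gaps)."""
    n, t = 0, rng.expovariate(rate_per_min)
    while t <= minutes:
        n += 1
        t += rng.expovariate(rate_per_min)
    return n


def simulated_gap_sd(on_min, off_min, gpg=1.5, trials=5000, seed=2004):
    rng = random.Random(seed)
    rate = gpg / 90

    def per90_gd(t):
        return (poisson_count(rate, t, rng) - poisson_count(rate, t, rng)) * 90 / t

    gaps = [per90_gd(on_min) - per90_gd(off_min) for _ in range(trials)]
    return statistics.pstdev(gaps)


print(round(null_gap_sd(1700, 1000), 3))  # 0.655
print(round(null_gap_sd(1350, 1350), 3))  # 0.632
print(round(simulated_gap_sd(1700, 1000), 2))
```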

    Would it be easy to pull the +/- numbers for the 2003 MLS Best 11? Also, do you know if it's still possible to look at boxscores from past seasons?
     
  15. ChrisE

    ChrisE Member

    Jul 1, 2002
    Brooklyn
    Club:
    --other--
    Nat'l Team:
    American Samoa
    It used to be, but with MLS redesigning their site I can't find them anymore. I maintain hope they'll be back eventually.

    These are last year's Best 11. The first number is the difference between their minutes played and minutes not played; in my opinion, the higher it is, the less reliable the adjusted stat. The second is just a straight +/- per 90. The third is the adjusted +/- margin (on-field minus off-field +/- per 90).

    Code:
    Player           Min diff  +/- per 90  +/- margin
    Onstad, Pat          2230        0.54       2.145
    Bocanegra, Carl       784        0.66       0.926
    Pope, Eddie           630        0.16       0.403
    Nelsen, Ryan         1806       -0.04      -0.748
    Chung, Mark          2507       -0.17      -0.171
    Armas, Chris         1750        0.56       1.252
    Preki                2559        0.17       0.924
    Beasley, Dcus        1154        0.37       0.145
    Donovan, Landon       974        0.33       0.037
    Spencer, John        1759        0.16       1.760
    Razov, Ante          1756        0.40       0.396
    
    I tried a weighted average, weighting each +/- margin by the inverse of its minutes difference, and got a cool .577. Taken individually, though, the numbers still don't necessarily make a lot of sense.
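    The weighting described here is an inverse-weighted mean: each margin gets weight 1/(minutes difference), and the weighted sum is divided by the total weight. A sketch with the Best 11 numbers from the table; with that normalisation the result comes out at about .577, matching the post. `inverse_weighted_mean` is a hypothetical name.

```python
# (name, minutes difference, +/- margin) from the Best 11 table
BEST_11 = [
    ("Onstad, Pat", 2230, 2.145),
    ("Bocanegra, Carl", 784, 0.926),
    ("Pope, Eddie", 630, 0.403),
    ("Nelsen, Ryan", 1806, -0.748),
    ("Chung, Mark", 2507, -0.171),
    ("Armas, Chris", 1750, 1.252),
    ("Preki", 2559, 0.924),
    ("Beasley, Dcus", 1154, 0.145),
    ("Donovan, Landon", 974, 0.037),
    ("Spencer, John", 1759, 1.760),
    ("Razov, Ante", 1756, 0.396),
]


def inverse_weighted_mean(rows):
    """Weight each margin by 1 / minutes difference, so players whose on/off
    split is more balanced count for more."""
    weights = [1 / diff for _, diff, _ in rows]
    return sum(w * m for w, (_, _, m) in zip(weights, rows)) / sum(weights)


print(f"{inverse_weighted_mean(BEST_11):.3f}")
print(f"{sum(m for _, _, m in BEST_11) / len(BEST_11):.3f}")  # unweighted, for comparison
```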
     
  16. tachyon1

    tachyon1 Member

    Apr 23, 2004
    Hi guys,
    I've tried something similar with the EPL: recording the team's average goal difference, corrected for venue, both when a certain player played & again when he was absent.

    The original purpose was to try to identify players who were more influential to a team's success.

    It seemed to identify positions on the field that were vital to a team showing its best performances. Probably not surprisingly, these were the backbone of the side: the centre forward (usually the top scorer) through the central defenders to the goalkeeper.

    Equally unsurprisingly, it showed that on average a team was slightly less likely to score without its top striker (I'll look up the actual ballpark figures) & slightly more likely to concede without its first-choice keeper.

    If you stuck these figures into a Poisson model, you could estimate the change in likely win/loss/draw figures without these key players.
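    One way to "stick these figures into a Poisson" is to treat each side's goals as independent Poisson counts and sum the scoreline probabilities. A sketch; the 1.5 and 1.1 goal rates are made up for illustration, and knocking 0.3 off the attack borrows the rough striker figure tachyon1 gives in his next post. Independence between the two teams' goals is itself a simplifying assumption, and the 38-game season is the EPL's.

```python
import math


def poisson_pmf(k, mu):
    return math.exp(-mu) * mu ** k / math.factorial(k)


def win_draw_loss(mu_for, mu_against, max_goals=15):
    """Outcome probabilities when each side's goals are independent Poisson counts.
    Scorelines above `max_goals` are ignored (negligible at football scoring rates)."""
    win = draw = loss = 0.0
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            p = poisson_pmf(i, mu_for) * poisson_pmf(j, mu_against)
            if i > j:
                win += p
            elif i == j:
                draw += p
            else:
                loss += p
    return win, draw, loss


def expected_points(mu_for, mu_against):
    w, d, _ = win_draw_loss(mu_for, mu_against)
    return 3 * w + d


# Hypothetical rates: 1.5 scored / 1.1 conceded per game with the top striker,
# and 0.3 fewer scored without him.
for label, mu_for in [("with striker", 1.5), ("without", 1.2)]:
    w, d, l = win_draw_loss(mu_for, 1.1)
    season = 38 * expected_points(mu_for, 1.1)
    print(f"{label:<13} W {w:.2f}  D {d:.2f}  L {l:.2f}  pts over 38 games {season:.1f}")
```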

    I've vaguely thought about running a least squares regression that treated teams with & without their key influential players as separate teams. This could address the strength-of-schedule issue, as certain top teams may systematically rest their top players against weaker sides.
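    The "separate teams" regression can be sketched as an ordinary least squares team-rating fit: each match's goal difference is modelled as home advantage plus the home side's rating minus the away side's, and a club playing without its key player simply gets its own label. Everything here is hypothetical (team names, the tiny noise-free fixture list, `fit_ratings`); one team is pinned at 0 so the ratings are identifiable, and the small normal-equation solver stands in for a library routine such as `numpy.linalg.lstsq`.

```python
def solve_least_squares(X, y):
    """Minimise ||Xb - y|| via the normal equations, using Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def fit_ratings(matches, reference):
    """matches: (home, away, home goals - away goals). Returns (home_adv, ratings)."""
    teams = sorted({t for h, a, _ in matches for t in (h, a)} - {reference})
    col = {t: i + 1 for i, t in enumerate(teams)}  # column 0 is home advantage
    X, y = [], []
    for home, away, gd in matches:
        row = [1.0] + [0.0] * len(teams)
        if home != reference:
            row[col[home]] += 1.0
        if away != reference:
            row[col[away]] -= 1.0
        X.append(row)
        y.append(gd)
    b = solve_least_squares(X, y)
    return b[0], {reference: 0.0, **{t: b[col[t]] for t in teams}}


# Hypothetical noise-free data generated from known ratings, so the fit can be checked.
TRUE_HOME = 0.45
TRUE = {"A": 0.6, "A (no striker)": 0.3, "B": 0.0, "C": -0.4}
matches = [
    (h, a, TRUE_HOME + TRUE[h] - TRUE[a])
    for h in TRUE for a in TRUE
    if h != a and {h, a} != {"A", "A (no striker)"}  # a club can't play itself
]
home_adv, ratings = fit_ratings(matches, reference="B")
print(round(home_adv, 2), {t: round(r, 2) for t, r in ratings.items()})
```

    The gap between "A" and "A (no striker)" is then the schedule-adjusted cost of his absence.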

    Another option is to look at pairings of players. A team may be only marginally affected by the loss of one main striker, but the loss of two may hugely reduce their chances.

    Strikers hunt in pairs, central defenders form effective partnerships, etc.

    Main problem with this is probably a very lopsided split in the with/without stats.
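    Checking pairs is mostly bookkeeping: split a team's matches by which of the two players featured and compare average goal difference, keeping the match counts visible because some cells will be tiny. A sketch with made-up lineups; `pair_split` and the player names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def pair_split(matches, a, b):
    """Match count and average goal difference by which of players a and b featured."""
    label = {
        (True, True): "both",
        (True, False): f"{a} only",
        (False, True): f"{b} only",
        (False, False): "neither",
    }
    groups = defaultdict(list)
    for lineup, gd in matches:
        groups[label[(a in lineup, b in lineup)]].append(gd)
    return {k: (len(v), mean(v)) for k, v in groups.items()}


# Hypothetical season extract: (players who featured, goal difference)
MATCHES = [
    ({"Striker1", "Striker2", "Keeper"}, 2),
    ({"Striker1", "Striker2", "Keeper"}, 1),
    ({"Striker1", "Keeper"}, 1),
    ({"Striker2", "Keeper"}, 0),
    ({"Striker2", "Keeper"}, 1),
    ({"Keeper"}, -2),
]
for group, (n, avg) in pair_split(MATCHES, "Striker1", "Striker2").items():
    print(f"{group:<14} n={n}  avg GD {avg:+.2f}")
```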

    T1
     
  17. tachyon1

    tachyon1 Member

    Apr 23, 2004
    A few figures,

    Loss of a team's leading striker costs that team around 3 tenths of a goal per game on average in the EPL. Loss of a first-choice keeper costs slightly less.

    Looked at over a season, if a team were forced to play with their second-choice striker & he didn't improve, it would probably be the difference between winning the league and coming second if you were Arsenal, or between fighting a relegation threat and mid-table mediocrity if you were Southampton.

    If you're going to chart players this way, dealing with the home/away split will be vital.
    A at home to B compared to B at home to A causes a combined difference in supremacies of around 9 tenths of a goal in the EPL. So an imbalance of home and away games probably has more scope to skew figures than even strength of schedule.
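    One simple way to handle the home/away split is to shift each result by half the home/away swing before comparing with and without a player. A sketch, assuming the ~0.9-goal combined swing splits evenly into ±0.45 per game; the match records and the `with_without` helper are hypothetical. Here the player happened to feature only in home games, so his raw gap is inflated.

```python
from statistics import mean

HOME_EDGE = 0.9 / 2  # assumption: half of the ~0.9 combined home/away swing


def venue_corrected(gd, at_home, edge=HOME_EDGE):
    """Goal difference adjusted to a neutral venue."""
    return gd - edge if at_home else gd + edge


def with_without(matches, player, correct=True):
    """Mean goal difference with and without `player` (both groups must be non-empty)."""
    def value(m):
        return venue_corrected(m["gd"], m["home"]) if correct else m["gd"]

    on = [value(m) for m in matches if player in m["lineup"]]
    off = [value(m) for m in matches if player not in m["lineup"]]
    return mean(on), mean(off)


# Hypothetical: player X only featured at home.
MATCHES = [
    {"home": True, "gd": 1, "lineup": {"X", "Y"}},
    {"home": True, "gd": 1, "lineup": {"X", "Y"}},
    {"home": False, "gd": 0, "lineup": {"Y"}},
    {"home": False, "gd": -1, "lineup": {"Y"}},
]
raw_on, raw_off = with_without(MATCHES, "X", correct=False)
adj_on, adj_off = with_without(MATCHES, "X")
print(f"raw gap {raw_on - raw_off:+.2f}, venue-corrected gap {adj_on - adj_off:+.2f}")
```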

    Depth of squad quality is also going to be a big issue.

    The only standout example of a non-striker or keeper having a comparable effect on a team's performance through his absence was Vieira at Arsenal a couple of seasons ago.



    T1
     
  18. ChrisE

    ChrisE Member

    Jul 1, 2002
    Brooklyn
    Club:
    --other--
    Nat'l Team:
    American Samoa
    Yeah, this is a very important point (that I had totally ignored). Strength of opponent would be pretty tough for me to do, but this is quite possible.


    I doubt depth would be as much of an issue in MLS, since it's probably a lot less variable here than in England (it's not something I really feel prepared to look at, either).
     
  19. AussieVamp2

    AussieVamp2 New Member

    Jul 8, 2000
    Melbourne, Australia


    How about a more general look, with dummy variable flags for each position for each team? Starting goalie 1 or 0, that sort of thing?

    Clustered injuries at one position may have some sort of non-linear effect too, I would guess?
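    The dummy-variable version regresses goal difference on 0/1 flags for whether each regular starter played. A sketch under made-up assumptions: three flags (starting keeper and two starting centre-backs), plus an extra "both centre-backs out" flag so a clustered absence can cost more than the two single absences added together, which is one simple way to allow a non-linear effect. The data are noise-free and generated from known coefficients so the fit can be checked; all names and numbers are hypothetical.

```python
from itertools import product


def solve_least_squares(X, y):
    """Minimise ||Xb - y|| via the normal equations, using Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def design_row(gk, cb1, cb2):
    """Intercept, starter flags, and a flag that is 1 only when both centre-backs are out."""
    return [1.0, gk, cb1, cb2, (1 - cb1) * (1 - cb2)]


# Hypothetical 'true' effects used to generate example data:
# intercept, keeper, centre-back 1, centre-back 2, extra cost of losing both.
TRUE = [-0.5, 0.25, 0.2, 0.15, -0.3]
rows = [design_row(*flags) for flags in product([0, 1], repeat=3)]
gd = [sum(c * x for c, x in zip(TRUE, r)) for r in rows]

coef = solve_least_squares(rows, gd)
for name, c in zip(["intercept", "keeper", "cb1", "cb2", "both_cb_out"], coef):
    print(f"{name:<12} {c:+.2f}")
```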
     
  20. tachyon1

    tachyon1 Member

    Apr 23, 2004
    Brilliant idea, AV.
    I can't think of any reasons why such an approach wouldn't work perfectly. It should prove really flexible as well, & most importantly I haven't come across anyone who's already tried it.

    Next step should be to decide what type of data we need & where to get it.

    T
     
  21. AussieVamp2

    AussieVamp2 New Member

    Jul 8, 2000
    Melbourne, Australia
    Well, initially you will need who played in each match

    later, you could do it on a stats basis, goals, shots etc., I guess

    but first look you could just do players who played

    however, tricky thing probably will be when formations change for away games, etc.

    if a 4-4-2 becomes a 4-5-1 then the starting flag thing is a little more problematic

    also, you would be looking at it after the fact, so to speak, choosing who the starter is at the end of the season, which you may not always know at the start of the season, if you have midfielders that split time, for example

    although for the purpose of a general study that's probably not so important. it's a little work to classify them like that once all the game data is obtained, although finding lists of players and how many games they played in is a fair bit easier than processing all the player data and all the fun name standardisation that entails
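    The name standardisation can at least be partly automated. A sketch, assuming the thread's "Last, First" display format, that the last word of a "First Last" name is the surname (wrong for some multi-word surnames), and a hand-maintained alias table; the single alias entry is a hypothetical example built from the "Timmy Howard" / "Howard, Tim" spellings used in this thread.

```python
import unicodedata

# Hand-maintained nickname/variant table (hypothetical entry), keyed on lower-case "First Last".
ALIASES = {"timmy howard": "Tim Howard"}


def strip_accents(text):
    """Drop combining marks, e.g. 'José' -> 'Jose'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def standardise(raw):
    """Return a name as 'Last, First' with accents and extra spaces removed."""
    name = " ".join(strip_accents(raw).split())
    if "," in name:  # already 'Last, First': flip it so aliases can match
        last, first = (part.strip() for part in name.split(",", 1))
        name = f"{first} {last}"
    name = ALIASES.get(name.lower(), name)
    *first, last = name.split()
    return f"{last}, {' '.join(first)}" if first else last  # single names like 'Preki'


for raw in ["José Cancela", "Howard, Tim", "Timmy  Howard", "Preki"]:
    print(f"{raw!r:>16} -> {standardise(raw)}")
```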
     
  22. AussieVamp2

    AussieVamp2 New Member

    Jul 8, 2000
    Melbourne, Australia
    On players, though (and I was thinking about this in connection with another sport, not really soccer), if you actually had stats, you could build a smoothed player ability rating from them as well!

    There again, though, you get into the importance of players in different roles...
     
  23. numerista

    numerista New Member

    Mar 21, 2004
    In this case, I'm a beggar not a chooser, but I'd be really interested to see these numbers again for 2004. By checking how well things correlate with 2003, we could start to get a better grip on their usefulness.

    (note: 2004 Revs' numbers are posted on their forum.)
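    Checking how well the numbers hold up from one season to the next comes down to a correlation over the players who qualify in both years. A sketch computing the standard Pearson r by hand (Python 3.10+ also offers `statistics.correlation`); the player names and values are placeholders, not real data. If the stat were mostly noise, as the Poisson estimate earlier in the thread suggests it might be, the correlation should sit near zero.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def year_to_year(first, second):
    """Correlate a stat across seasons for players present in both dicts."""
    common = sorted(first.keys() & second.keys())
    if len(common) < 3:  # with two players r is always +/-1, which says nothing
        raise ValueError("need at least three players in both seasons")
    return pearson([first[p] for p in common], [second[p] for p in common])


# Placeholder values only: swap in the real 2003 and 2004 +/- gaps.
gap_2003 = {"Player A": 0.9, "Player B": 0.4, "Player C": -0.2, "Player D": -0.7}
gap_2004 = {"Player A": 0.3, "Player B": 0.1, "Player C": 0.2, "Player D": -0.4}
print(f"r = {year_to_year(gap_2003, gap_2004):.2f}")
```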
     
  24. ChrisE

    ChrisE Member

    Jul 1, 2002
    Brooklyn
    Club:
    --other--
    Nat'l Team:
    American Samoa
    I've been recording the games, etc., but it's a hassle actually setting up the thing to figure out +/-, so I'm not going to bother trying till the end of the regular season.
     
  25. numerista

    numerista New Member

    Mar 21, 2004
    Perhaps you can show me what the file looks like as .csv? It might be easiest for me to write a script.
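    A script along these lines might look like the following. The CSV layout is an assumption (one file of appearances with the minute each player came on and went off, one of goals with the scoring team and minute), as is the convention that a goal in minute m counts for players with on < m <= off; stoppage-time goals are assumed to be logged at minute 90 or earlier. Team codes, player names, and helper names are hypothetical.

```python
import csv
import io
from collections import defaultdict

APPEARANCES = """match,team,player,on,off
1,NE,Player X,0,90
1,NE,Player Y,0,60
1,NE,Player Z,60,90
"""
GOALS = """match,team,minute
1,NE,30
1,DC,75
"""


def read_csv(text):
    return list(csv.DictReader(io.StringIO(text)))


def plus_minus(appearances, goals):
    """Per-player minutes, goals for/against while on the field, and +/- per 90."""
    stats = defaultdict(lambda: {"min": 0, "gf": 0, "ga": 0})
    for app in appearances:
        on, off = int(app["on"]), int(app["off"])
        s = stats[app["player"]]
        s["min"] += off - on
        for g in goals:
            # Assumed convention: a goal in minute m counts if on < m <= off.
            if g["match"] == app["match"] and on < int(g["minute"]) <= off:
                s["gf" if g["team"] == app["team"] else "ga"] += 1
    return {
        player: {**s, "per90": (s["gf"] - s["ga"]) * 90 / s["min"]}
        for player, s in stats.items()
        if s["min"] > 0
    }


for player, s in plus_minus(read_csv(APPEARANCES), read_csv(GOALS)).items():
    print(f"{player:<10} {s['min']:>4} min  +{s['gf']} -{s['ga']}  {s['per90']:+.2f} per 90")
```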
     
