Inspired by some of ChrisE’s work on shooting stats, I got to wondering: if you want to score more goals, which is more important, shooting more or shooting well? To answer that question I performed a standard least-squares regression analysis on the all-time (’96-’02) MLS team shooting statistics. In the following table, S90 is shots per 90 minutes, S% is shooting percentage (shots on goal divided by shots), G90 is goals per 90 minutes, and PRED is the predicted goals per 90 minutes from the prediction formula calculated by the regression analysis.

TEAM   S90   S%     G90   PRED
CHI    14.5  0.460  1.81  1.77
COL    13.9  0.427  1.48  1.55
CLB    13.8  0.431  1.67  1.54
DAL    13.6  0.481  1.64  1.63
DC     14.1  0.501  1.79  1.80
KC     13.6  0.436  1.38  1.51
LA     14.4  0.478  1.82  1.80
MET    14.1  0.439  1.48  1.63
MIA    13.7  0.446  1.63  1.56
NE     13.0  0.468  1.42  1.46
SJ     13.1  0.485  1.53  1.53
TB     13.5  0.430  1.59  1.47

Between them, shots and shooting percentage account for 59% (the adjusted R-square) of the team-to-team variability in goals scored. (A while back Nutmeg suggested a “danger rating” that included fouls suffered. By including fouls suffered in the regression analysis I could account for 64% of the variability in goals scored, but with a p-value of 0.16 for that term, a statistician might be inclined to leave it out. Perhaps the other 36% is accounted for by passes, traps, and dribbles, but MLS is apparently either unable or unwilling to provide such data.) From the scaled estimates it may be observed that shots account for about 62% of that variability, while shooting percentage accounts for the remaining 38%. Hence, the answer to my question is that shooting more is more important than shooting well.
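For anyone who wants to check the arithmetic, here is a minimal sketch that refits G90 = a*S90 + b*S% + c by ordinary least squares on the twelve rows above, using only the Python standard library. It is not the software used for the original analysis; the function name `fit_two_predictors` is hypothetical, and because the table values are rounded, the results should agree with the printed output closely but not necessarily in every decimal place.

```python
# Sketch: ordinary least squares of G90 on S90 and S% for the '96-'02 team table,
# solved through the centered 2x2 normal equations (standard library only).
from statistics import mean

# (team, S90, S%, G90) copied from the '96-'02 table
DATA = [
    ("CHI", 14.5, 0.460, 1.81), ("COL", 13.9, 0.427, 1.48),
    ("CLB", 13.8, 0.431, 1.67), ("DAL", 13.6, 0.481, 1.64),
    ("DC", 14.1, 0.501, 1.79), ("KC", 13.6, 0.436, 1.38),
    ("LA", 14.4, 0.478, 1.82), ("MET", 14.1, 0.439, 1.48),
    ("MIA", 13.7, 0.446, 1.63), ("NE", 13.0, 0.468, 1.42),
    ("SJ", 13.1, 0.485, 1.53), ("TB", 13.5, 0.430, 1.59),
]


def fit_two_predictors(rows):
    """Return (a, b, c, R^2, scaled_a, scaled_b) for G90 = a*S90 + b*S% + c."""
    x1 = [r[1] for r in rows]
    x2 = [r[2] for r in rows]
    y = [r[3] for r in rows]
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    # Centered sums of squares and cross-products
    s11 = sum(u * u for u in d1)
    s22 = sum(u * u for u in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * v for u, v in zip(d1, dy))
    s2y = sum(u * v for u, v in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    a = (s1y * s22 - s2y * s12) / det
    b = (s2y * s11 - s1y * s12) / det
    c = my - a * m1 - b * m2
    ss_total = sum(v * v for v in dy)
    ss_model = a * s1y + b * s2y
    r2 = ss_model / ss_total
    # Scaled estimates: each coefficient times half its predictor's range
    scaled_a = a * (max(x1) - min(x1)) / 2
    scaled_b = b * (max(x2) - min(x2)) / 2
    return a, b, c, r2, scaled_a, scaled_b


a, b, c, r2, sa, sb = fit_two_predictors(DATA)
print(f"G90 ~ {a:.4f}*S90 + {b:.4f}*S% + {c:.4f}   R^2 = {r2:.4f}")
print(f"scaled: S90 {sa:.4f}, S% {sb:.4f}; S90 share = {sa / (sa + sb):.0%}")
```

The centered form is used because, with an intercept in the model, centering reduces the problem to a 2x2 system that can be solved in closed form.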
Just to show that these results are not a fluke of the particular data set chosen, I used the prediction formula calculated from the ’96-’02 data to predict the goals per 90 minutes for each team in 2003, with quite satisfactory results:

TEAM   S90   S%     G90   PRED
CHI    14.7  0.441  1.71  1.77
CLB    12.7  0.453  1.41  1.35
DC     10.6  0.447  1.21  0.87
MET    12.7  0.479  1.28  1.43
NE     14.1  0.457  1.76  1.68
COL    12.5  0.448  1.30  1.30
DAL    10.8  0.401  1.14  0.79
KC     13.2  0.438  1.54  1.42
LA     11.7  0.466  1.12  1.17
SJ     12.2  0.435  1.45  1.19

Finally, for the real stats geeks, details of the regression analysis results are as follows.

Summary of Fit
R-square is the portion of variation attributed to the model, between 0 and 1. Root Mean Square Error (RMSE) estimates the standard deviation of the residual.

RSquare                      0.666178
RSquare Adj                  0.591996
Root Mean Square Error       0.096385
Mean of Response             1.603333
Observations (or Sum Wgts)   12

Analysis of Variance
The test that the whole model fits better than a simple mean, i.e. testing that all the parameters are zero except the intercept.

Source     DF  Sum of Squares  Mean Square  F Ratio  Prob > F
Model       2  0.16685549      0.083428     8.9803   0.0072
Error       9  0.08361118      0.009290
C. Total   11  0.25046667

Scaled Estimates
Continuous factors centered by mean, scaled by range/2. (SOG% here is the S% column in the tables above.)

Term       Scaled Estimate  Std Error  t Ratio  Prob>|t|
Intercept  1.6033333        0.027824   57.62    <.0001
S90        0.1661535        0.047232    3.52    0.0065
SOG%       0.100731         0.042451    2.37    0.0417
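The derived entries in these tables follow from a few standard formulas. A short sketch that recomputes them from the printed sums of squares and estimates (the variable names are my own, not from any stats package; the p-values are left out because they need the F and t distributions, which the Python standard library does not provide):

```python
# Recompute Summary of Fit, ANOVA and t ratios from the printed sums of squares.
import math

n = 12   # observations (teams)
p = 2    # predictors: S90 and S% (SOG%)
ss_model = 0.16685549
ss_error = 0.08361118
ss_total = ss_model + ss_error          # C. Total

df_model, df_error, df_total = p, n - p - 1, n - 1
ms_model = ss_model / df_model
ms_error = ss_error / df_error          # estimated residual variance
f_ratio = ms_model / ms_error           # Model mean square over Error mean square

r2 = ss_model / ss_total
r2_adj = 1 - ms_error / (ss_total / df_total)
rmse = math.sqrt(ms_error)

# t Ratio = estimate / standard error, for each scaled estimate
estimates = {"S90": (0.1661535, 0.047232), "SOG%": (0.100731, 0.042451)}
t_ratios = {term: est / se for term, (est, se) in estimates.items()}

print(f"F = {f_ratio:.4f}, R^2 = {r2:.6f}, adj R^2 = {r2_adj:.6f}, RMSE = {rmse:.6f}")
print({term: round(t, 2) for term, t in t_ratios.items()})
```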
I'm really glad you did this, NoSix; very interesting. Unfortunately, I'm not nearly as good at interpreting this stuff as our other resident statistician, so all I've got is a couple of questions.

First, why did you choose to use all-time shooting statistics instead of individual seasons? It seems like individual seasons would provide a much better sample from which to draw a regression (though I don't really know this stuff).

Second, did you observe a relationship between shots/90 and shooting percentage? Intuitively, I'd expect that increasing the number of shots would decrease shooting percentage, as you took worse and worse shots (however, I get a correlation of .29 on your data, which I guess contradicts that claim, but also implies there might be other things at work). Wouldn't recognizing how one affects the other help answer your question even better?

Third, did you account for penalty kicks? I guess there's a good chance they just average out, but there's at least a chance they could affect things significantly.

Fourth, if you were a coach, how would you use this data? Tell your players to shoot at every opportunity? That seems a little simplistic, but how else does a team regulate its shots on goal vs. its shots?

Fifth, I get standard deviations for shots and SOG% of 1.3 and .02, respectively, which, if you divide by their means, gives you .088 and .046. So that implies (to my unstatistical self) that not only do shots account for more goals, they're also easier to change (I'd imagine this is a pretty wrong conclusion, however).

Sixth, could you explain the statistics at the bottom just a little bit more for folks like me (who don't attend MIT or Stanford)? Specifically the scaled estimates (and the ANOVA?)
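The correlation in the second question and the mean-scaled spreads in the fifth are quick to compute. This sketch does it for the twelve ’96-’02 rows in NoSix's table only (the helper names are hypothetical); ChrisE's .29, 1.3 and .02 may have come from a different cut of the data, which the thread doesn't say, so the outputs need not match his.

```python
# Means, sample standard deviations, coefficients of variation, and the
# Pearson correlation for S90 and S% over the twelve '96-'02 team rows.
from statistics import mean, stdev

S90 = [14.5, 13.9, 13.8, 13.6, 14.1, 13.6, 14.4, 14.1, 13.7, 13.0, 13.1, 13.5]
SPCT = [0.460, 0.427, 0.431, 0.481, 0.501, 0.436,
        0.478, 0.439, 0.446, 0.468, 0.485, 0.430]


def coefficient_of_variation(xs):
    """Sample standard deviation divided by the mean."""
    return stdev(xs) / mean(xs)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


print(f"S90: mean {mean(S90):.3f}, sd {stdev(S90):.3f}, "
      f"CV {coefficient_of_variation(S90):.3f}")
print(f"S%:  mean {mean(SPCT):.3f}, sd {stdev(SPCT):.3f}, "
      f"CV {coefficient_of_variation(SPCT):.3f}")
print(f"correlation(S90, S%) = {pearson_r(S90, SPCT):.3f}")
```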
Because I was too lazy to cut and paste individual seasons? No, seriously, analyzing the same data broken out by season would add season-to-season variability into the data, and my objective here was to understand something about the relationship between shots and goals with as few extraneous factors as possible.

As for whether I observed a relationship between S90 and S%: no. In fact, I put the interaction term S90*S% into the regression analysis and verified that it was nowhere near significant.

LOL, you are persistent! These numbers include penalty kicks, although the PK data are called out separately in the MLS stats. The number of PK goals is small compared to the total number of goals, so only a real masochist would go adjust the data. I couldn't help noticing that Tampa Bay, over the life of the franchise, converted only 11 of 22 penalties; yep, just 50%. Still think it's an automatic goal? Come to think of it, don't answer that.

I think the message I would give my players is that if you get a good chance to put a shot on goal, take it. I think Marcelo Balboa was spot on when he commented during the Poland game that Reyna and Convey should have taken a shot when the opportunity presented itself, instead of trying to make the extra pass in the box.

I can say I was struck by how little variation there was between teams with regard to shot quality. In the long run, a little less than 1 in 2 shots is on goal, and 1 out of 4 shots on goal goes in, and those ratios are remarkably consistent between teams.

Not sure I follow your argument here.

As the wording indicates, the ANOVA tests the fit of the model, i.e., that goals = a*S90 + b*S% + c, where a, b, and c are constants, against the alternative that goals = c, where c is a constant (equal to the average goals scored). Since the p-value of 0.0072 is less than 0.05, the model fits the data significantly better than the simple average does.
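The "1 in 2 on goal, 1 in 4 of those scored" rule of thumb can be checked team by team from the ’96-’02 table: shots on goal per 90 is S90 × S%, so goals per shot on goal is G90 / (S90 × S%). A minimal sketch (the function name is hypothetical):

```python
# Shot quality by team, '96-'02 table: fraction of shots on goal (S%) and
# fraction of shots on goal that became goals.
from statistics import mean

TEAMS = {  # team: (S90, S%, G90)
    "CHI": (14.5, 0.460, 1.81), "COL": (13.9, 0.427, 1.48),
    "CLB": (13.8, 0.431, 1.67), "DAL": (13.6, 0.481, 1.64),
    "DC": (14.1, 0.501, 1.79), "KC": (13.6, 0.436, 1.38),
    "LA": (14.4, 0.478, 1.82), "MET": (14.1, 0.439, 1.48),
    "MIA": (13.7, 0.446, 1.63), "NE": (13.0, 0.468, 1.42),
    "SJ": (13.1, 0.485, 1.53), "TB": (13.5, 0.430, 1.59),
}


def goals_per_shot_on_goal(s90, s_pct, g90):
    """Goals per 90 divided by shots on goal per 90 (S90 * S%)."""
    return g90 / (s90 * s_pct)


conversion = {team: goals_per_shot_on_goal(*row) for team, row in TEAMS.items()}
on_goal = {team: row[1] for team, row in TEAMS.items()}

for team in TEAMS:
    print(f"{team}: {on_goal[team]:.3f} of shots on goal, "
          f"{conversion[team]:.3f} of shots on goal scored")
print(f"average: {mean(on_goal.values()):.3f} on goal, "
      f"{mean(conversion.values()):.3f} scored")
```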
Since the variables S90 and S% have different magnitudes and ranges, you can't compare their relative significance just by looking at the magnitudes of the constants a and b that result from the analysis. The scaled estimates are the coefficients you get after linearly transforming each input variable to have a mean of 0 and a range of width 2 (centered by its mean and divided by half its range, so it spans roughly -1 to +1), so that their relative significance can be compared directly. In this case, 0.166/(0.166+0.101) = 62% for S90 and 0.101/(0.166+0.101) = 38% for S%.
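The transformation can be run in reverse to recover the raw prediction formula from the scaled estimates. This sketch assumes the convention stated in the output header ("centered by mean, scaled by range/2"), with means and ranges taken from the twelve ’96-’02 rows; the helper name `unscale` is hypothetical. It then reproduces the PRED column and the 62/38 split.

```python
# Convert scaled estimates back to G90 = a*S90 + b*S% + c and reproduce PRED.
from statistics import mean

ROWS = [  # (team, S90, S%, printed PRED) from the '96-'02 table
    ("CHI", 14.5, 0.460, 1.77), ("COL", 13.9, 0.427, 1.55),
    ("CLB", 13.8, 0.431, 1.54), ("DAL", 13.6, 0.481, 1.63),
    ("DC", 14.1, 0.501, 1.80), ("KC", 13.6, 0.436, 1.51),
    ("LA", 14.4, 0.478, 1.80), ("MET", 14.1, 0.439, 1.63),
    ("MIA", 13.7, 0.446, 1.56), ("NE", 13.0, 0.468, 1.46),
    ("SJ", 13.1, 0.485, 1.53), ("TB", 13.5, 0.430, 1.47),
]
SCALED = {"intercept": 1.6033333, "S90": 0.1661535, "S%": 0.100731}


def unscale(rows, scaled):
    """Map scaled estimates (x_s = (x - mean) / (range/2)) to raw a, b, c."""
    s90 = [r[1] for r in rows]
    spct = [r[2] for r in rows]
    a = scaled["S90"] / ((max(s90) - min(s90)) / 2)
    b = scaled["S%"] / ((max(spct) - min(spct)) / 2)
    # The scaled model is centered at the predictor means, so its intercept is the
    # prediction at (mean S90, mean S%); shift it back to raw units.
    c = scaled["intercept"] - a * mean(s90) - b * mean(spct)
    return a, b, c


a, b, c = unscale(ROWS, SCALED)
share_s90 = SCALED["S90"] / (SCALED["S90"] + SCALED["S%"])
for team, s90, spct, printed in ROWS:
    print(f"{team}: predicted {a * s90 + b * spct + c:.2f} (table: {printed:.2f})")
print(f"S90 share {share_s90:.0%}, S% share {1 - share_s90:.0%}")
```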
Could you get a copy of this to every player on DC United? Especially Eskandarian, Moreno, Stewart, Cerritos, or any other player that has underachieved in scoring goals.
A few comments; some are very skeptical and nitpicky. It's nothing personal, just the nature of the business.

1) There is very little to infer from the 62-38 split: the difference is nowhere near the significance threshold. All that you've shown is that both shots taken and SOG% are significant predictors of goalscoring.
2) I tend to agree with ChrisE that more interesting things could be said with season-by-season data, even though inference would be harder.
3) Calling the one predictor "shooting percentage" is a bit dubious, since that term usually implies Made/(Made + Missed). IMO, "shot quality" is even more misleading.
4) Nothing in this study strictly implies that it helps to be quick on the trigger. Taking more shots is also a result of being better at creating opportunities to shoot.
5) Looks like something might be wrong: why does PRED trend lower than G90 in the 2003 table?
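A rough way to gauge point 1 is to divide the difference between the two scaled estimates by an approximate standard error of that difference. This sketch makes a simplifying assumption: it ignores the covariance between the two estimates, which the output above does not report, so the result is only approximate. The critical value 2.262 is the two-sided 5% point of Student's t with the model's 9 error degrees of freedom.

```python
# Approximate test of whether the S90 scaled estimate exceeds the SOG% one.
# ASSUMPTION: covariance between the two estimates is ignored.
import math

est_s90, se_s90 = 0.1661535, 0.047232
est_sog, se_sog = 0.100731, 0.042451
T_CRIT_9DF = 2.262  # two-sided 5% critical value of Student's t, 9 df

diff = est_s90 - est_sog
se_diff = math.sqrt(se_s90 ** 2 + se_sog ** 2)
t_diff = diff / se_diff
print(f"difference {diff:.3f}, approx. SE {se_diff:.3f}, t = {t_diff:.2f}")
print("significant at 5%" if abs(t_diff) > T_CRIT_9DF else "not significant at 5%")
```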
Valid point: the difference is only 0.06, while the standard errors are 0.04-0.05.

To each his own. I can either wait for seven more seasons of data to draw a firmer conclusion, or follow my Bayesian tendencies and argue that if I have to make a decision today based on the available data, I'd recommend shooting more.

My recommendation was to shoot more, not more quickly.

As for PRED trending lower than G90, I suspect that may be because the '96-'02 data are actually in terms of games played (some of which would last more than 90 minutes), while the 2003 data are truly per 90 minutes.
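The trend in point 5 can be put into numbers by comparing the average of G90 - PRED on the fitted ’96-’02 data with the same average for 2003. For a least-squares fit with an intercept, in-sample residuals average to zero (up to rounding of the printed values), so any shift in the 2003 average is the out-of-sample bias. A minimal sketch using only the printed columns (helper names are hypothetical):

```python
# Mean residual (G90 - PRED) in-sample vs. on the 2003 data.
from statistics import mean

# team: (G90, PRED) as printed in the two tables
FIT_96_02 = {
    "CHI": (1.81, 1.77), "COL": (1.48, 1.55), "CLB": (1.67, 1.54),
    "DAL": (1.64, 1.63), "DC": (1.79, 1.80), "KC": (1.38, 1.51),
    "LA": (1.82, 1.80), "MET": (1.48, 1.63), "MIA": (1.63, 1.56),
    "NE": (1.42, 1.46), "SJ": (1.53, 1.53), "TB": (1.59, 1.47),
}
TEST_2003 = {
    "CHI": (1.71, 1.77), "CLB": (1.41, 1.35), "DC": (1.21, 0.87),
    "MET": (1.28, 1.43), "NE": (1.76, 1.68), "COL": (1.30, 1.30),
    "DAL": (1.14, 0.79), "KC": (1.54, 1.42), "LA": (1.12, 1.17),
    "SJ": (1.45, 1.19),
}


def mean_residual(table):
    """Average of actual minus predicted goals per 90."""
    return mean(g - p for g, p in table.values())


def count_underpredicted(table):
    """Number of teams that scored more than the formula predicted."""
    return sum(1 for g, p in table.values() if g > p)


for name, table in (("'96-'02 (fitted)", FIT_96_02), ("2003 (new data)", TEST_2003)):
    print(f"{name}: mean G90 - PRED = {mean_residual(table):+.3f}, "
          f"underpredicted {count_underpredicted(table)} of {len(table)}")
```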
You said, "I think Marcelo Balboa was spot on when he commented during the Poland game that Reyna and Convey should have taken a shot when the opportunity presented itself, instead of trying to make the extra pass in the box." I'm not saying Balboa was wrong, but that looks like a recommendation to shoot more quickly.
Actually, if you included season as a within-team factor in your analysis, or fit a hierarchical linear model (or another multi-level random-coefficient modeling technique), then having the data in season-by-season form would give you better estimates and would probably help explain more of the variance. Overall, awesome analysis, although I would echo numerista and be very cautious in saying whether number of shots or SOG% is the more important predictor; you've shown with this sample that they are both important predictors of goals. I'm also surprised that there is no interaction: you'd think that teams that shoot very little and have a very low SOG% would score far fewer goals than teams that shoot a lot and have a high SOG%. Maybe the non-significant interaction was due to low power (n = 12 is a pretty small sample), which having the data separated by season would also help with.
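To see what the interaction test looks like on the twelve-team data, here is a sketch that refits the ’96-’02 model with an S90*S% term and reports that term's t ratio on its error degrees of freedom. It is plain-Python OLS through the normal equations; the helpers `invert` and `ols` are hypothetical, and mean-centering the predictors before forming the product is an assumption, since the thread doesn't say how the original term was coded. With 12 teams and 4 parameters there are only 8 error degrees of freedom, which is why power is low.

```python
# Refit the '96-'02 model with an S90*S% interaction term (standard library only).
# ASSUMPTION: predictors are mean-centered before forming the product.
from statistics import mean

S90 = [14.5, 13.9, 13.8, 13.6, 14.1, 13.6, 14.4, 14.1, 13.7, 13.0, 13.1, 13.5]
SPCT = [0.460, 0.427, 0.431, 0.481, 0.501, 0.436,
        0.478, 0.439, 0.446, 0.468, 0.485, 0.430]
G90 = [1.81, 1.48, 1.67, 1.64, 1.79, 1.38, 1.82, 1.48, 1.63, 1.42, 1.53, 1.59]


def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix, with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(columns, y):
    """OLS of y on an intercept plus the given columns: (coefs, std errors, R^2)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    my = mean(y)
    sst = sum((yi - my) ** 2 for yi in y)
    mse = sse / (n - k)  # residual variance on n - k error df
    se = [(mse * inv[i][i]) ** 0.5 for i in range(k)]
    return beta, se, 1 - sse / sst


c1 = [v - mean(S90) for v in S90]
c2 = [v - mean(SPCT) for v in SPCT]
interaction = [u * v for u, v in zip(c1, c2)]

_, _, r2_main = ols([c1, c2], G90)
beta, se, r2_full = ols([c1, c2, interaction], G90)
t_interaction = beta[3] / se[3]
error_df = len(G90) - 4
print(f"R^2 without interaction {r2_main:.3f}, with {r2_full:.3f}")
print(f"interaction t ratio {t_interaction:.2f} on {error_df} error df")
```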
Whoa. I had totally forgotten that you were the guy accusing me of having some sort of hateful desire to steal Carlos Ruiz's rightfully earned credit. I still don't understand how so many people don't adhere to the clearly correct side of this argument, but I certainly don't want to open that discussion again. I'm sure you know my answer to your question. Thanks for the rest.
Thanks for the kind words. I will defer to your and Numerista's expertise and do a season by season analysis sometime if I can find the time (and if Chris doesn't beat me to it first!)
Hi guys, neat thread. This may interest you. If you do a similar thing with the English Premier League and try a regression of goals scored on attempts (headers as well as shots), you do find that a relationship exists between the two. If you further divide the data into home and away sides, you find that it takes slightly more goal attempts for the away side to score a goal than it does for the home team. Goal attempts by the away side look to be of poorer quality than those of the home team. The reasons are probably many and varied. A couple that spring to mind: away fans, being outnumbered around 10 to 1, will cheer any attempt from their team no matter how feeble, so away players are more willing to attempt a shot no matter how ambitious. Away teams are also more likely to be trailing (home advantage, sympathetic refereeing, etc.), so away attempts are often borne out of desperation. There's also a weaker relationship between goal attempts and corners gained. T1.
Thanks for the info, man, and welcome to the site. I was recently looking at foul correlations in MLS, so I happen to have the numbers you're talking about for the 2003 MLS season:

per 90   H goals  A goals  H shots  A shots  H %    A %
Totals   1.60     1.19     13.90    10.93    0.115  0.109

(H % and A % are goals per shot, e.g. 1.60/13.90 = 0.115.)
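Inverting the H % and A % columns gives shots per goal, which is the form T1's home/away observation takes ("it takes slightly more goal attempts for the away side to score"). A minimal sketch using only the 2003 totals above (the helper name is hypothetical):

```python
# Goals per shot and shots per goal for home and away sides, 2003 MLS per-90 totals.
HOME_GOALS, AWAY_GOALS = 1.60, 1.19
HOME_SHOTS, AWAY_SHOTS = 13.90, 10.93


def goals_per_shot(goals, shots):
    """Conversion rate: goals divided by shots."""
    return goals / shots


home_rate = goals_per_shot(HOME_GOALS, HOME_SHOTS)
away_rate = goals_per_shot(AWAY_GOALS, AWAY_SHOTS)
print(f"home: {home_rate:.3f} goals/shot, {1 / home_rate:.2f} shots/goal")
print(f"away: {away_rate:.3f} goals/shot, {1 / away_rate:.2f} shots/goal")
```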
Jesus Christ, we need some sort of math translation thread for this forum. Pretty heavy stuff you guys are getting into. Well done. However, I think the question we're dealing with in this thread is the most important one we can try to conquer here: what, and who, is most intrinsic to scoring goals?
Thanks for the reply, Chris and Maxim. The more countries' leagues you look at, the more you come to realise that soccer is essentially the same game. I've got the stats for the EPL 2001-2002 season on how goals were scored/conceded by each team. They're broken down by player position (attack, midfield, defence), by type of goal (shots, headers, open play, corners, inside the area, inside the six-yard box, outside the area, etc.), and by home/away games. I'll post them if I can sort them into a readable format. T1
I think away teams in a lot of cases play more defensive formations, which likely affects how many shots they get. And if they play one striker rather than two, each shot may be a little 'harder', since there's a touch more 'defense' around that player?