Linear Regression of MLS stats 2007-2008

Discussion in 'Statistics and Analysis' started by sokol, Jul 21, 2009.

  1. sokol

    sokol Member

    Aug 4, 2004
    I am being forced to switch from using SPSS to using the R programming language for data analysis this coming school year, and I thought I would get a feel for R by playing around with some MLS stats. I made a table of the basic team stats from 2007-2008 (the table that appears about halfway down the big list of year-end stats from each year).

    I thought it would be fun simply to look at how well those basic stats predict points at the end of a 30-game season using linear regression. Not surprisingly to any of you who understand soccer, the only thing that really predicted points was goals (goals scored, goals against and goal differential; goal differential was the strongest predictor of points). Shots and shots on goal had a little bit of predictive power, as did assists (although assists are collinear with goals, so they really just aren't useful). But things like fouls suffered and committed, cautions and corners had almost no power to predict points at the end of the season.

    So I decided to see if any of these stats could at least predict goals, goals against or goal differential in some way. Naturally, shots had some predictive power, and shots on goal was a little bit better. But still overall there was very little of any significance.

    One interesting thing I found was that corners had a negative coefficient in most models I tried (when predicting goals scored). That runs against my intuition (as well as the intuition of countless coaches and players) that getting corners is a sign that a team is doing well because it is attacking more frequently. As I said, the p-value for corners was not significant in most models; however, it was often pretty close, and it was significant in the model predicting goals using just shots and corners. This suggests that, for a given number of shots, getting a lot of corner kicks isn't really an indication of good attacking play but more an indication of poor finishing.

    Anyway, I plan to plug in some more data, such as shots against, goal intervals, records when scoring first, etc., which are easily available from MLSnet. I wish MLS aggregated possession stats, as that would be another interesting variable to play with. I was just curious whether anyone else has tried this with soccer stats before and, if so, whether they had similar results.
     
