Soccermetrics

Discussion in 'Statistics and Analysis' started by mellon002, Feb 18, 2004.

  1. mellon002

    mellon002 Member

    Jan 24, 2003
    Towson, MD
    Be forewarned, I can't fall asleep so that's why I'm able to do this.


    The tagline for this forum got me interested. I'm a big baseball fan, and for anyone who gets really into the game, stats are a huge part of it. So much of baseball is numbers, so why not try the same thing with soccer? I've been investigating what Sabermetrics means and it's very confusing, but I found one interesting formula they use to predict wins and losses.

    http://www.baseball1.com/bb-data/grabiner/manifesto.html

    It's an interesting theory so I figured I would try it with the numbers from last year.

    Chicago Fire: 53 GF and 43 GA - 15W 7L 8T

    53:43 (squared) = 2809:1849
    2809+1849=4658
    2809/4658=.603 win %

    Splitting the ties evenly between wins and losses:
    19+11=30
    19/30=.633 win %

    New England Revolution: 55 GF and 47 GA - 12W 9L 9T

    55:47 (squared) = 3025:2209
    3025+2209=5234
    3025/5234=.578 win %

    Splitting the ties evenly between wins and losses:
    16.5+13.5=30
    16.5/30=.550 win %

    NY/NJ MetroStars: 40 GF and 40 GA - 11W 10L 9T

    40:40 (squared) = 1600:1600
    1600+1600=3200
    1600/3200=.500 win %

    Splitting the ties evenly between wins and losses:
    15.5+14.5=30
    15.5/30=.517 win %
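
    The three calculations above can be written as a short Python sketch. The function names are made up for illustration, and the exponent of 2 is simply the one used in the arithmetic above, not a value tuned for soccer.

```python
def pythagorean_pct(goals_for, goals_against, exponent=2.0):
    """Projected win % from goals scored and goals allowed."""
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return gf / (gf + ga)


def actual_pct(wins, losses, ties):
    """Actual win % with each tie counted as half a win and half a loss."""
    games = wins + losses + ties
    return (wins + 0.5 * ties) / games


# 2003 figures quoted in the post: (GF, GA, W, L, T)
teams = {
    "Chicago Fire": (53, 43, 15, 7, 8),
    "New England Revolution": (55, 47, 12, 9, 9),
    "NY/NJ MetroStars": (40, 40, 11, 10, 9),
}

for name, (gf, ga, w, l, t) in teams.items():
    projected = pythagorean_pct(gf, ga)
    actual = actual_pct(w, l, t)
    print(f"{name}: projected {projected:.3f}, actual {actual:.3f}")
```

    Run as is, this reproduces the .603/.633, .578/.550 and .500/.517 figures worked out by hand above.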

    Here's the rest of the percentages for the league:
    Team    Proj.   Actual
    DC      .527    .483
    CLM     .500    .467

    SJ      .623    .616
    KC      .543    .517
    CO      .441    .483
    LA      .500    .450
    DAL     .230    .283

    Some of the percentages are extremely close while others are off a bit. Here's the interesting thing, though: if you can approximately predict a team's goals scored and goals against, you can predict fairly accurately where each team will land. Had you guessed right with the goals and played the percentages, you could have gotten 8 out of 10 teams right in the final standings.

    Predicting goals scored and allowed might not be as hard as you think. Just look for steady players and units.

    Teams with proven scorers such as Ruiz and Twellman, barring injury, should give steady numbers. We can look at their past trends and get a pretty good read on what their production could be in 2004. Example: Carlos Ruiz should be close to 20 goals again this year.

    DC's defense hasn't changed and has only gotten better if anything with the addition of Milton Reyes coming back from a knee injury. They should allow few goals again this year.


    I'm still reading into Sabermetrics and how it could be applied to soccer. If anyone knows anything about Sabermetrics and thinks something could work for Soccermetrics, please post here.

    I feel like a huge nerd.
     
  2. mpruitt

    mpruitt Member

    Feb 11, 2002
    E. Somerville
    Club:
    New England Revolution
    Welcome to the fold. I think someone may have already run the numbers that you have there, but you've certainly broken them down in a pretty clear way. Look around some of the other threads in here to see which one it might be. This whole forum basically started as an idea that advanced objective statistical analysis (sabermetrics) could be applied to soccer. The original thread which was a starter for this forum is here

    It's excellent that you've found this stuff as interesting as some of us on here already have. Every time someone buys into this idea it makes me really excited. If we ever manage to get a moderator on here, then that thread definitely should be a sticky or some kind of FAQ. It'd also be great if we could finally get this forum onto the main page, as it's a bit hidden now.

    The term Soccermetrics is something I believe beineke coined; "Stats and Analysis" just always seemed a little more straightforward.
     
  3. ChrisE

    ChrisE Member

    Jul 1, 2002
    Brooklyn
    Club:
    --other--
    Nat'l Team:
    American Samoa
    I pretty much agree with all this (more so than Maxim, at least), and have actually done a little bit of work on it, as have a number of other people, if you care to look around. Just now, trying things out and measuring the average error of the predictions, it looks like an exponent of 1.5 gives the lowest average error (around .02).
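
    One way to run that kind of exponent search, as a rough sketch only: try a grid of exponents and keep the one with the lowest mean absolute error. The function names and the 1.0 to 3.0 grid are assumptions, and the data is limited to the three teams whose goal totals appear in this thread, so it is not expected to reproduce the 1.5 or .02 figures, which presumably came from the full league data.

```python
def pythagorean_pct(goals_for, goals_against, exponent):
    """Projected win % for a given exponent."""
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return gf / (gf + ga)


def mean_abs_error(teams, exponent):
    """Average absolute gap between projected and actual win %."""
    errors = [
        abs(pythagorean_pct(gf, ga, exponent) - actual)
        for gf, ga, actual in teams
    ]
    return sum(errors) / len(errors)


def best_exponent(teams, candidates):
    """Candidate exponent with the lowest mean absolute error."""
    return min(candidates, key=lambda x: mean_abs_error(teams, x))


# (GF, GA, actual win % with ties split) for the three teams
# whose 2003 goal totals are quoted in this thread.
teams = [(53, 43, 0.633), (55, 47, 0.550), (40, 40, 0.517)]

# Assumed search grid: 1.0, 1.1, ..., 3.0
candidates = [round(1.0 + 0.1 * i, 1) for i in range(21)]

best = best_exponent(teams, candidates)
print(f"best exponent on this sample: {best}")
print(f"mean abs error at best: {mean_abs_error(teams, best):.3f}")
print(f"mean abs error at 2.0:  {mean_abs_error(teams, 2.0):.3f}")
```

    With only three teams the answer is mostly noise; the same functions would work unchanged on a full season of goal totals.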

    I've got to totally disagree with this part; it's much much harder to predict future performance than simply looking at a team as the sum of its parts. The Galaxy last year had a massively worse goal differential than they did in 2002, despite the fact that they lost pretty much no one. Meanwhile, despite losing several integral players, the Chicago Fire relied on several unpredictable newcomers (Damani Ralph, Justin Mapp, Andy Williams) to improve on 2002. No one expected them to be better last year.

    Furthermore, you've got huge problems arising with a team like D.C. that's under new management. That team won't play anything like they did in 2003, now that Etcheverry and Stoitchkov are gone, and Convey is running the midfield (if he even stays the whole season - who knows); there's simply no way (that I can see) to predict how their offense will perform.

    Even your examples reveal problems: Ruiz may have looked about as productive in 2003 as he was in 2002, but he got 7 of his 15 goals off penalties; I don't know if his problem was related to the Galaxy's midfield, or simply a year-long slump, but he clearly wasn't as good this year. Twellman, likewise, missed a lot of time to injuries - how do you predict something like that?
     
  4. mellon002

    mellon002 Member

    Jan 24, 2003
    Towson, MD
    Thanks maxim-1. I've actually been thinking about other stats that we could use. I think Sabermetrics invented the OPS (on-base percentage + slugging percentage). We should start creating some numbers as well.

    For a striker we could use a GPM (Goals per 90 minutes) which would be similar to an ERA in baseball. A GPM could help us to predict results in this way. If a player were to miss extended time due to national team call-ups for the World Cup or Olympics :( their numbers will obviously decrease. But if they have had steady production over the past few years, we can calculate how many minutes they will play and we should be able to accurately predict their production level for the season.
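
    A minimal sketch of that GPM idea in Python. The function names and the player numbers are hypothetical, chosen only to show the arithmetic, and the projection assumes the scoring rate simply carries over to the new season.

```python
def goals_per_90(goals, minutes_played):
    """Scoring rate expressed as goals per 90 minutes played."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return goals * 90 / minutes_played


def projected_goals(rate_per_90, expected_minutes):
    """Goals expected if the per-90 rate holds over expected_minutes."""
    return rate_per_90 * expected_minutes / 90


# Hypothetical striker: 15 goals in 2,700 minutes last season,
# expected to play only 1,800 minutes this season because of call-ups.
rate = goals_per_90(15, 2700)
print(f"goals per 90: {rate:.2f}")
print(f"projected goals: {projected_goals(rate, 1800):.1f}")
```

    Averaging the rate over several past seasons instead of one would be a natural next step for a player with steady production.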

    I'm sure somebody has probably thought of this. I didn't read through all of the thread yet that you posted, but I'm working on it. I did read that someone made the point that we must establish a relationship between the numbers and results. Well, the numbers I crunched last night have a pretty good relationship.

    The most I was off by was the Burn, at .053, while in contrast the 'Quakes were only off by .007. The average difference between actual and projected numbers was .033, which happens to be the same as the Crew's difference. So if you multiply the projected win % of the Crew (.500) by 30 games you get 15, which is 1 game off their actual total once ties are factored into the win column (14). So I guess this means that on average, we can come within 1 game of correctly guessing the win total after splitting ties. The more important part is that because the win % comes that close to being correct, we don't have to calculate that total. All we do is rank the win % and we can make predictions for the results!
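
    Those error figures can be checked with a few lines of Python. The projected and actual percentages are copied from the first post; the abbreviations CHI, NE and NY are shorthand introduced here for the three teams that were written out in full, and the ranking ignores conferences.

```python
# (projected, actual) win % for 2003, copied from the first post.
results = {
    "CHI": (0.603, 0.633),
    "NE": (0.578, 0.550),
    "NY": (0.500, 0.517),
    "DC": (0.527, 0.483),
    "CLM": (0.500, 0.467),
    "SJ": (0.623, 0.616),
    "KC": (0.543, 0.517),
    "CO": (0.441, 0.483),
    "LA": (0.500, 0.450),
    "DAL": (0.230, 0.283),
}
GAMES = 30  # games per team in the season

# Absolute gap between projected and actual win % for each team.
diffs = {team: abs(proj - actual) for team, (proj, actual) in results.items()}
mean_diff = sum(diffs.values()) / len(diffs)
worst = max(diffs, key=diffs.get)
closest = min(diffs, key=diffs.get)

print(f"mean difference: {mean_diff:.3f}")
print(f"in games over {GAMES}: {mean_diff * GAMES:.1f}")
print(f"largest miss: {worst} ({diffs[worst]:.3f})")
print(f"smallest miss: {closest} ({diffs[closest]:.3f})")

# Ranking by projected win % gives a predicted league-wide order of finish.
projected_order = sorted(results, key=lambda t: results[t][0], reverse=True)
print(projected_order)
```

    The mean difference comes out at .033, or about one game over a 30-game season, which matches the numbers above.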

    I'd say there is a definite relationship. We could even get a group going and make predictions for each team, average all the predictions and then crunch the numbers to have an official board prediction for the upcoming season. What do you think?
     
  5. mpruitt

    mpruitt Member

    Feb 11, 2002
    E. Somerville
    Club:
    New England Revolution
  6. mellon002

    mellon002 Member

    Jan 24, 2003
    Towson, MD
    Where do they get those numbers?
     
  7. mellon002

    mellon002 Member

    Jan 24, 2003
    Towson, MD
    Re: Re: Soccermetrics

    I completely agree, and to make a prediction you have to do your best to factor all those things in. But how does that make soccer any different than any other sport you try to predict? You just have to do your best. If we did a board prediction using the formula I used, we could have a group of people ranking teams' offenses, defenses, and individual players. The more brains, the closer we can come to making accurate predictions.
     
  8. mpruitt

    mpruitt Member

    Feb 11, 2002
    E. Somerville
    Club:
    New England Revolution
    All the statistics are from the stats site at MLSnet.com. A couple of us have turned them into more workable Excel files. I think that Chris has one of the more extensive ones. I can email you mine if you like, though the file I've been working with is all team data, not individual players.
     
  9. ChrisE

    ChrisE Member

    Jul 1, 2002
    Brooklyn
    Club:
    --other--
    Nat'l Team:
    American Samoa
    Re: Re: Re: Soccermetrics

    Well, it's very different than baseball because soccer is a team game, whereas baseball is (largely) the product of a lot of discrete events. You may see Rich Aurilia's numbers improve because he's batting behind Barry Bonds, but it's a lot easier to adjust for Bonds's contribution than it is to adjust for Jose Cancela's contribution (it's not like Bonds is in the batter's box with Aurilia). You'll notice that sabermetrics haven't been nearly as successful (for whatever reason) in hockey or basketball as they have in baseball.

    I'm not saying that I don't think it's worthwhile to try to predict future performance, of a team or an individual, with the stats we have - hell, that's why I'm here. I just think it's going to be much, much harder, and ultimately less conclusive, than you're giving it credit for.
     
  10. ChrisE

    ChrisE Member

    Jul 1, 2002
    Brooklyn
    Club:
    --other--
    Nat'l Team:
    American Samoa
    I had actually been thinking about doing something like this. There was an interesting thread on rec.sport.soccer about predicting the World Cup group results using a probabilistic model instead of just going 1,2,3,4 (I'd explain further, but it would be easier to just go to the link). I think that would be a much more informative and realistic way to approach this kind of prediction. I'd just be worried that we wouldn't get more than 4 or 5 responses.

    (Yeah, I just compiled the data from MLS's quite comprehensive statistics section. Kenntomasch has a lot of stuff on player ages and attendances, and I'm sure beineke and voros have some interesting things, I just don't know what.)
     
  11. mellon002

    mellon002 Member

    Jan 24, 2003
    Towson, MD
    Winshares

    http://www.baseballgraphs.com/winshares/

    This is an interesting concept although I think it would be hard to fit for soccer. Maybe we could do something to combine game-winning-goals with game-winning-assists or something like that to get Winshares for MLS.
     
  12. mpruitt

    mpruitt Member

    Feb 11, 2002
    E. Somerville
    Club:
    New England Revolution
    There certainly might be some work to be done by measuring how good a player is over a certain baseline. If you could figure out how to do the baselines, then you could judge how much better a person is with SOG, G, anything. That certainly would be an interesting way of looking at players. However, looking at game winning goals or game winning assists is pretty silly imo, because although the statistic as it is kept is interesting, a 'game winning goal' really has no more value than any other goal scored in a game. In soccer at least there may be a different level of luck involved, as opposed to a baseball player who may have hit a game winning bloop single. A soccer player has to put himself in the right position to score a goal rather than it simply being his turn in the lineup. Perhaps a discussion of game winning goals would be an interesting one. How are they recorded? If the final scoreline is 1-0 and the goal came in the 5th minute, is that a game winning goal? Certainly in soccer the gwg is probably more significant and a more regular occurrence than in just about any other sport due to the lack of goals scored, but that too might point to the random nature of it.
     
