Sabermetrics applying to Soccer

Discussion in 'Statistics and Analysis' started by mpruitt, Jul 30, 2003.

  1. bunkmedal

    bunkmedal Member

    Feb 12, 2010
    Club:
    Cardiff City FC
    I know this was posted a long time ago, but it's an interesting point. Most people believe strikers to be the most important/most valuable players on a team. However, as far as I'm concerned this is just a symptom of the fact that it's far easier to measure their contribution to the team than it is to measure that of a midfielder, or a defender.

    Indeed, the same thing has arguably occurred in baseball. The one area of baseball that still proves difficult to subject to statistical analysis is defence, and as a result great defensive players are often underrated compared with great pitchers (whose performance can be measured far more simply). Signing two great defensive outfielders can have just as big an effect on reducing runs as signing a great starting pitcher, but the market (and particularly fan/media perception) has consistently dictated the opposite. Some of the more influential sabermetric blogs such as USSMariner have repeatedly made this point.

    Strikers are the one type of player for which we have a good measurement, and in my opinion this has led to them being vastly overrated in terms of value and importance. They put the ball in the net and without them a goal cannot occur, but most goals depend on several contributions, and if you take any of these away then the goal wouldn't have happened either. We have few freely available statistics which measure great crosses, or brilliant through balls, or great tackles, or the ability of a player to dominate possession in midfield, but the fact that you can't measure something doesn't make it less valuable.
     
  2. ElChupacabra67

    ElChupacabra67 New Member

    Sep 30, 2009
    Georgia
    Club:
    FC Barcelona
    Nat'l Team:
    United States
    Word. I think it might be virtually impossible to quantify individual player performance outside of strikers and goalies. I think you can look at a team's overall defensive performance in various ways, but then you have to deal with the fact that the players at the back change, so it really becomes a measure of the organization's ability to acquire, coach, and sub in the right mixture game after game to keep the number of shots and shots on goal (SOGs) down. Look at Chelsea, for example. This season they only allow opponents about three SOGs per game on average. So a team knows they're only going to get a few good chances to score, and if they and their strikers aren't efficient, as measured by the percentage of shots that are SOGs and the percentage of SOGs that they typically turn into goals, then they are screwed.
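
    To make those two efficiency figures concrete, here is a minimal Python sketch. The function name and the sample numbers are made up for illustration; they are not real Chelsea or EPL figures.

```python
def shooting_efficiency(shots, shots_on_goal, goals):
    """Return (share of shots that are on goal, share of SOGs that become goals)."""
    if shots <= 0 or shots_on_goal <= 0:
        raise ValueError("shots and shots_on_goal must be positive")
    sog_rate = shots_on_goal / shots
    conversion_rate = goals / shots_on_goal
    return sog_rate, conversion_rate


# Hypothetical match line: 12 shots, 3 of them on goal, 1 goal.
sog_rate, conversion_rate = shooting_efficiency(shots=12, shots_on_goal=3, goals=1)
print(f"{sog_rate:.0%} of shots on goal, {conversion_rate:.0%} of SOGs converted")
```

    Against a defence that only allows about three SOGs a game, the second number is the one that decides whether a team scores at all.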
     
  3. EvanJ

    EvanJ Member+

    Mar 30, 2004
    Club:
    Manchester United FC
    Nat'l Team:
    United States
  4. ElChupacabra67

    ElChupacabra67 New Member

    Sep 30, 2009
    Georgia
    Club:
    FC Barcelona
    Nat'l Team:
    United States
    The problem with that index is that they don't explain how they arrive at it, so we have no way of evaluating it or using it. In my view it's just a marketing tool. If they would release the formula we would be in a better position to evaluate it.

    It would also be nice to have data on when a player creates a chance for another player but it doesn't result in a G, A or even a shot. Perhaps Actim has access to this info (I'm sure the clubs track it), but we have no idea. So we can't measure that part of a player's contribution.
     
  5. TimB4Last

    TimB4Last Member+

    May 5, 2006
    Dystopia
    Just because it's impossible we should stop trying?
     
  6. ElChupacabra67

    ElChupacabra67 New Member

    Sep 30, 2009
    Georgia
    Club:
    FC Barcelona
    Nat'l Team:
    United States
    I don't think so. I'm still trying, every day, to improve the metric I use and to find out more info. It all comes down to taking what stats are available (passes completed, clearances, etc.) and going from there.
     
  7. TimB4Last

    TimB4Last Member+

    May 5, 2006
    Dystopia
    Good luck! It seems to me that if you knew which stats to collect and how to interpret them, they could be very useful, even if they were not a perfect substitute for the trained eye.
     
  8. ElChupacabra67

    ElChupacabra67 New Member

    Sep 30, 2009
    Georgia
    Club:
    FC Barcelona
    Nat'l Team:
    United States
    Here's some of it.

    http://nordeckeluchador.blogspot.com/
     
  9. ElChupacabra67

    ElChupacabra67 New Member

    Sep 30, 2009
    Georgia
    Club:
    FC Barcelona
    Nat'l Team:
    United States
    I posted this in response to a question in another forum, but I think it goes better here. It's about the work I do for a blog. I've been cited for spamming before, even though the blog is a hobby and not commercial, but I won't "pimp" it here again. But in the interest of getting feedback, here's the post:

    I don't know which version of the EPL Rankings you saw, but here's how I did it.


    First, statistics are just tools. And all I'm doing, because I'm a geek, is taking what I learned from American baseball sabermetrics and linear weights and applying them to soccer. There's a thread here at Big Soccer that talks about this aspect of soccer analysis in more detail.

    Using Excel, I did a simple regression analysis of a whole bunch of different soccer statistics. Simple regression analysis is a way of determining how strong a correlation there is between two variables (say, the number of corner kicks a team takes and the number of goals it scores). You can also do more complex regression analysis using multiple variables, which I'm now doing with the entire metric. Anyway, I was trying to find a way to develop a model, or metric, that could measure a team's total production. By production I mean a measure of not only how many goals it scores, or games it wins, but how many opportunities to score it creates overall and per game. The assumption is that if the whole point is to create scoring chances, then we need to look for statistical evidence of all such chances and not just successful ones.
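
    For anyone who wants to try the same kind of simple regression outside Excel, here is a minimal sketch in plain Python. It is an ordinary least-squares fit of one variable against another, not the actual spreadsheet behind the rankings, and the corner and goal totals are invented for five hypothetical teams.

```python
from statistics import mean


def simple_regression(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # With a single predictor, R-squared is the squared correlation coefficient.
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Made-up season totals for five hypothetical teams (not real EPL data).
corners = [180, 200, 220, 240, 260]
goals = [40, 48, 50, 61, 66]

slope, intercept, r2 = simple_regression(corners, goals)
print(f"goals ~ {intercept:.1f} + {slope:.3f} * corners, R^2 = {r2:.2f}")
```

    The slope is the estimated number of extra goals per extra corner, and R-squared says how much of the variation in goals that one stat accounts for on its own.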

    For example, I was convinced that the number of corners a team earned was evidence of how much pressure it was putting on the other team. But you can't just count the number of corners and add it to the number of goals and wins. You have to find a common denominator, or variable, that a corner is "worth" and then compare it to any other statistic in terms of its "worth."

    I used simple regression analysis not only to figure out which statistics had the strongest correlations to team success as compared to other teams, but also to arrive at a common denominator for weighing every stat. One example of a stat I threw out was yellow and red cards. I had a hunch that the number of yellows and reds a team earned could tell us something about how good that team was. But the regression analysis proved my hunch wrong, at least insofar as statistical analysis goes. There is no numerical correlation between serious infractions and either wins, losses, or goals scored. My conclusion was that playing dirty or physical doesn't give us a way of distinguishing successful teams from less-successful ones. So we don't include it in the metric.

    Anyway, I finally decided to weight everything in terms of its relative worth in goals scored. In other words, if a team scores 2.4 goals per game in the EPL in a given season, how many wins does that usually produce? If it takes only 3.5 Shots on Goal per game, how many wins might it expect to earn? A really good example is Corner Kicks Earned. How many corners, on average, does a team have to create before it scores? One serious and knowledgeable fan I spoke to said it was 4. Another said 6. I looked at all the data from the 2008-09 EPL Season, and then I started looking for published research on soccer statistics by serious statisticians.

    What I discovered was that the actual number of goals scored per corner kick (according to the secondary research) was very, very low: around .03. In other words, only 3% of corner kicks lead directly to a goal. If, however, you hit an in-swinger that lands within a box about 6 by 10 meters large right at the PK spot, that number goes up dramatically to almost 40%. So we include corners in the metric, but we only count them as being "worth" about .03 goals.

    Shots on Goal, however, are different. For example, this season in the EPL up through last weekend's games, a SOG has resulted in a Goal 28% of the time. So, we count SOGs in the metric, but each SOG is worth .28 Goals. So one part of our metric is Goals plus Corners plus Shots on Goal. But we include several other statistics as well. And we count them because they are statistical evidence that the team as a whole has successfully put pressure on the opposing team's goal.
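
    Here is a rough Python sketch of just that part of the metric, using only the two weights quoted above (.28 per SOG, .03 per corner) plus 1.0 per goal, since everything is expressed in goals. The other stats in the full metric and their weights are not listed in this post, so they are left out, and the team line is hypothetical.

```python
# Weights in "goals per event". Only the values quoted in the post are used;
# the remaining stats of the full metric are deliberately omitted.
WEIGHTS = {"goals": 1.0, "shots_on_goal": 0.28, "corners": 0.03}


def weighted_production(stats, weights=WEIGHTS):
    """Sum each counted stat multiplied by its worth in goals."""
    return sum(weight * stats.get(name, 0) for name, weight in weights.items())


# Hypothetical single-game line for one team.
game = {"goals": 2, "shots_on_goal": 5, "corners": 10}
print(round(weighted_production(game), 2))
```

    Stats that are missing from a team's line simply count as zero, so the same function works when a league doesn't publish every category.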

    There are lots of things we can't measure, or could measure if the leagues gave us the data, but all the things listed at the website are included in the metric, including some measures which I developed on my own, like measuring the true "quality chances" a team creates as opposed to just Shots and Shots on Goal. After all, not every successful shot or SOG is as good as every other. And because we don't have the luxury of watching every single game and actually recording SHOTS, SOGS, and REALLY, REALLY GOOD SOGS, we tried to find a statistical way of giving a team credit for creating better chances than others.

    Further, we did even more analysis to see if weighting things in terms of average goals scored was useful for every statistical category. And as a result we had to adjust the model accordingly, in some cases just on a hunch and then rerun the regression to see if the model worked better or not. One example of this is how many goals an away win was worth compared to a home win. We found we were weighting the away wins slightly too much.

    Currently, the model tracks point production in the league with an R-squared of .92. That's really, really good. To quote:

    "R-Squared is a statistical term saying how good one term is at predicting another. If R-Squared is 1.0 then given the value of one term, you can perfectly predict the value of another term. If R-Squared is 0.0, then knowing one term does not help you know the other term at all. More generally, a higher value of R-Squared means that you can better predict one term from another."

    Our power rankings thus give a team's score in Production per Game, or the number of goals plus chances created by that team per game minus the goals plus chances it allows to be created against it. The Z-score in our ranking tells you how far apart each team's LM number is so you have a sense of what the numbers mean. The measure it uses is "standard deviations." So it essentially tells you how many standard deviations above or below the average the team is in the Luchametric score. So if Chelsea is at +2 in the Z-score and Hull is at -2, it means that Chelsea is four full standard deviations better than Hull: which is a buttload by any measure.
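
    A small Python sketch of the Z-score step follows. The per-game production numbers are invented, the team names are placeholders, and using the population standard deviation (`pstdev`) rather than the sample version is an assumption; the post does not say which one the rankings use.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Map each team to its distance from the league mean, in standard deviations."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("all scores are identical; Z-scores are undefined")
    return {team: (value - mu) / sigma for team, value in scores.items()}


# Hypothetical net production per game (created minus allowed) for four teams.
production = {"Team A": 1.4, "Team B": 0.5, "Team C": -0.2, "Team D": -1.1}
for team, z in z_scores(production).items():
    print(f"{team}: {z:+.2f}")
```

    The gap between two teams' Z-scores is the number of standard deviations separating them, which is how the +2 versus -2 comparison above works out to four.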

    Finally, our power rankings have had Chelsea at the top for the last few weeks, at least since we started back up after our hiatus. And their win over Man U suggests even further that we're not all that far off. Do the numbers tell us everything? No. Are they better than your own eyes? No, but they do help very much in taking a bunch of data and condensing it down so you can use it to add to what your eyes already tell you. For example, we argued last week that Liverpool had a very good chance to catch Spurs. After this weekend it looks even worse for Spurs than we predicted.
     
  10. RichardHKirkando

    RichardHKirkando New Member

    Aug 9, 2006
    Madison, WI
    Club:
    Fulham FC
    Nat'l Team:
    United States
    Not only that, but the fact that it's just an index means that it's even more meaningless to me. WTF is an Actim point? What does that mean? In my attempts to build some kind of "uberstat", I always try to put it into some tangible unit, whether that's goals, wins, or whatever.

    Opta actually does track something called "assist attempts", which I think is exactly what you're looking for. This stat is way more useful IMO than just assists, since the playmaker has very little to do with whether or not the player taking the shot scores or not. I usually use something I call "expected assists", which is basically the player's assist attempts multiplied by his team's goal/shot rate.
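
    As a quick illustration of that "expected assists" calculation, here is a minimal Python sketch. The function and argument names are hypothetical and the numbers are invented; the formula is just assist attempts multiplied by the team's goals-per-shot rate, as described above.

```python
def expected_assists(assist_attempts, team_goals, team_shots):
    """Assist attempts scaled by the team's goals-per-shot rate."""
    if team_shots <= 0:
        raise ValueError("team_shots must be positive")
    return assist_attempts * (team_goals / team_shots)


# Hypothetical playmaker: 50 assist attempts on a team with 60 goals from 500 shots.
print(round(expected_assists(50, team_goals=60, team_shots=500), 2))
```

    Because the conversion rate comes from the whole team, the playmaker gets the same credit for a good ball whether or not that particular shot went in.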
     
  11. ElChupacabra67

    ElChupacabra67 New Member

    Sep 30, 2009
    Georgia
    Club:
    FC Barcelona
    Nat'l Team:
    United States
    Nice! That does sound exactly like what I'm looking for. It allows us to give credit to a player when he creates a great through ball, or cross, or whatever, that doesn't lead to a G, SHT, SOGA, or A but certainly is an indication of his contribution to the club and his talent compared to other players.
     
  12. ENB Sports

    ENB Sports Member

    Feb 5, 2007
    Actim is calculated from very basic information that is taken from the same info you can get for example from soccernet gamecast.

    In their own words:
    The six calculations which generate the Actim Index are:
    Calculation 1 – Assesses a player's contribution to a winning team, based on points won by the team when he has appeared
    Calculation 2 – Assesses a player's performance in each game by allocating points for actions that positively contribute to a winning performance such as shots, tackles, clearances and saves. It also takes points away from players for negative actions such as yellow/red cards and shots off target
    Calculation 3 – Allocates points based on time on the pitch
    Calculation 4 – Allocates points for goal scorers
    Calculation 5 – Allocates points for assists
    Calculation 6 – Allocates points for clean sheets

    It comes from the old Opta score that Opta, an insurance company, through planetfootball and owned by SKY, was paid to calculate. The EPL got angry that an outside company was making a product off their league, so they created Football Data Co and Actim (basically a copy of what Opta was doing). Long story short, Planetfootball went bankrupt and Opta became an independent company called SportingStatz, which now concentrates on live scoring as opposed to statistics (although they do continue to record statistics).

    When I was offered a job by Stats Inc. in 2000 they gave me Opta's work (Stats was owned by News Digital, who are SKY in the UK and owned Opta's work at the time) and asked me to assess it, including the Opta Score. At the time I wrote an eight-page document explaining the issues with the Opta Score, and I didn't think it reflected much or was valuable.

    Truth is, I think the whole idea of a score or a player rating system is a waste of time. For example, we don't count passes or build-up play for basketball or ice hockey, other than the final output. The key to making statistics in soccer popular is making them basic so everybody can grasp them.

    In the work that I continue to produce, the most in-depth statistics I look at are minutes per game, goals per 90 minutes, assists per 90 minutes, shots per 90 minutes and fouls per 90 minutes. I then take the average for that league and compare each individual player to the league average.
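
    A minimal Python sketch of the per-90 idea is below. Expressing the comparison as a ratio to the league rate is an assumption (a difference would work just as well), and all the numbers are made up.

```python
def per_90(count, minutes):
    """Rate of an event per 90 minutes played."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return count * 90 / minutes


def relative_to_league(player_rate, league_rate):
    """Player rate as a multiple of the league rate (1.0 means league average)."""
    if league_rate <= 0:
        raise ValueError("league_rate must be positive")
    return player_rate / league_rate


# Hypothetical player: 12 goals in 2,160 minutes.
player_rate = per_90(12, 2160)
# Hypothetical league totals: 1,000 goals over 300,000 player-minutes.
league_rate = per_90(1000, 300000)
print(f"{player_rate:.2f} goals per 90, "
      f"{relative_to_league(player_rate, league_rate):.2f}x league average")
```

    The same two functions work for assists, shots and fouls; only the counts change.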

    In the next month or so I will be launching the site www.eplinfo.com which will include all my statistical work for EPL since 1992-1993. When I do I hope all you QA it and give me ideas how to make it better as I'm impressed with your enthusiasm for the subject.
     
  13. RichardHKirkando

    RichardHKirkando New Member

    Aug 9, 2006
    Madison, WI
    Club:
    Fulham FC
    Nat'l Team:
    United States
    I agree that this isn't worth much. Here are my issues with the above:

    1 - Wins and losses on an individual level are as bad, if not worse, than wins and losses for pitchers in baseball. Each individual player is 1/22 responsible for wins and losses. If we're going to use a team-specific number to evaluate individuals, it should be goals (scored and allowed).
    2 - This is all useful, but it doesn't really tell me anything without knowing how much is allocated for what.
    3 - Doesn't tell me anything about how good a player is, only how good his manager thinks he is compared to his backup.
    4 - Goal scorers do deserve credit, but that shouldn't be the only meaningful stat we're using to calculate this.
    5 - I don't think assists are a very good stat; they rely way too much on luck. Assist attempts are better, but even that isn't perfect (as ElChupacabra67 pointed out above).
    6 - Should a goalkeeper who keeps a clean sheet while facing 0 shots on target get more credit than one who makes 9 saves on 10 shots faced in a game? While I do feel that goalkeepers are more important than other individual players, it certainly isn't more than the sum of the other 10 men. Clean sheets tell you more about a team's outfield defence than their goalkeeper.

    That all said, I don't think an all-encompassing rating system necessarily has to be pointless or a waste of time either. Problem is that most of the ones that have been done so far are pretty useless. Arbitrarily adding up a bunch of random numbers that don't even mean anything on their own isn't the way to get it done.
     
  14. ENB Sports

    ENB Sports Member

    Feb 5, 2007
    There is an American named Carl Hammond, based out of California, with whom I had some conversations and who produced a rating system based on the Euro in 2004.

    He was a nice guy and had some interesting theories (he was a university stats professor by trade), although he really didn't take into account the amount of calculated information needed to create any true value in his work.

    One of his ideas was to break down each player by position and rate them totally differently, not only as defender, midfielder, or attacker but as Left Back, Left Winger, Second Striker and so on. He then recorded everything you could think of, and I think his goal was to find an average and base each player's rating off that.

    If I remember correctly, to do this he had an individual tabulator watching each player, because he took "off the ball" movement into account as well. His plan was to do something for the World Cup in 2006 and get some media sponsor to pay for it. I told him that would be fairly unrealistic because there isn't a huge need to pay for that type of information, and that's the last time I heard from him or his project.

    In developing my work, I looked at Bill James and what he did for baseball. Like me, he initially did the work as a hobby and on no budget, so the data he tabulated was very basic. As time progressed and money became available for his projects, he and sabermetrics as a whole have dug deeper, but for the most part their source data is quite basic. I actually work for a North American sports information company creating products from tabulated data, and you'd be surprised how basic the data is. For example, I tabulate more categories for my independent soccer data than they record for baseball.

    The fact is that sports statistics do not give the full story of what actually happened in a game for any sport, although for most North American sports, statistics have created a business of their own.

    The sad thing about soccer is that hardly anybody can tell you who has the most career goals, headers, free kick goals, assists, shots, yellow cards, red cards, and so on for any soccer league in the world. That is the reason I started my work, but I still haven't figured out whether it's because the information isn't available or because no one cares.
     
  15. ATM1963

    ATM1963 New Member

    May 31, 2010
    St Louis
    Club:
    San Jose Earthquakes
    Nat'l Team:
    United States
    I think a very important stat would be the percentage of time a team has possession of the ball inside its opponent's penalty/goal box.
     
  16. ATM1963

    ATM1963 New Member

    May 31, 2010
    St Louis
    Club:
    San Jose Earthquakes
    Nat'l Team:
    United States
    I wonder if anybody has done any stadium studies to determine some sort of ballpark effect, as has been done in baseball.

    I know different stadiums affect play in different ways: elevation affects how quickly players tire and how far the ball travels, and there are minor effects from the field itself, such as slanted pitches, wind, and whether games are played at night or during the day.

    While I am not sure whether such a study can be carried out effectively, is anybody aware of this being done?
     
  17. VIII

    VIII New Member

    May 28, 2004
    Re: Sabermetrics applying to Soccer

    MPruitt I think you'll find iSoccer.org and The National Standards Project very interesting.

    There are four generally accepted components to evaluating players: technical, tactical, mental, physical. My company (iSoccer.org, former Stanford soccer players with a mix of coaching and internet startup experience) spent the past 2 years testing and tweaking a technical assessment that quantifies technical ability for youth players and a technology platform to allow players and coaches to use the assessment and track their improvement.

    It is not specifically intended to be used for player identification, but who knows what will happen after years of compiling standardized data and tracking player development. The primary reasons we're doing this include motivating players to focus more on technical development and shifting the youth soccer focus (parents and coaches) away from winning/losing and scoring goals at young ages by providing tangible metrics around skill development. The iSoccer assessment doesn't assess their tactical awareness or mental toughness; it's not supposed to, and it's not a silver bullet. But it does do a good job of differentiating between the technical ability of youth players and it does motivate them to improve their scores by working on their technique at home. Make a game out of technical training, teach them to set goals, and players will work harder and have more fun getting better.

    In order to promote this we are giving away the assessment tools for free to players and coaches. We have also launched a research study called the "National Standards Project" to measure the current baseline of US Youth Soccer across the country at every age and competition level. We just announced partnerships with the NSCAA, US Club Soccer, the Urban Soccer Collaborative, and three US Youth Soccer State Associations with many more partnerships in the works.

    If you're involved with youth soccer as a player or coach, go here to help set the standard.

    http://www.TheStandardsProject.org
     
  18. Juan C. Gimeno

    Juan C. Gimeno New Member

    Jun 22, 2010
    Buenos Aires
    Club:
    --other--
    Nat'l Team:
    Argentina
    Re: Sabermetrics applying to Soccer

    Hi,

    Just to add another tip: when we are analyzing pro players (we work in Argentina, www.videosfutbol.com.ar) we analyze the technical, tactical, mental, and physical aspects, and the social one too.
    When you are scouting players it's important to know:
    - Family situation
    - Labor situation
    - Integration to the club
    - Integration to the city
    - Relation with the environment
    - Respect of hierarchy

    Just think about the French-gate in South Africa and how important integration and respect of hierarchy can be.
    Don't hesitate to contact me if you think I can help you.
    Regards
    Juan Carlos Gimeno
     
  19. TimB4Last

    TimB4Last Member+

    May 5, 2006
    Dystopia
    http://amaral.chem-eng.northwestern.edu/news/2010/jun/16/footballer-rating/

    http://www.plosone.org/article/info:doi/10.1371/journal.pone.0010937

    Authors: Jordi Duch, Joshua S. Waitzman, Luís A. Nunes Amaral
     
  22. el tigre 1

    el tigre 1 New Member

    Mar 11, 2011
    Club:
    Hull City AFC
    Hi everybody,
    Great posts. I am very interested in this area. I currently work for a club in England and would be willing to share some statistical information in the search for some relevant statistical indicators for the team and individuals. I have plenty of ideas on how to rate individual performances but am not an economist and unsure of things like regression analysis.
    Anybody interested?
     
  24. Jjerg

    Jjerg New Member

    Jun 21, 2011
    Club:
    Celtic FC
    I haven't read every one of the 300+ posts so if I repeat please excuse.
    Sabermetrics was developed to obtain value from an event that has yet to be defined as valuable (runs created) or to better value an event that is misdefined (AVG to OPS). This system is perfect for soccer because a positive event is not easily recognized. A stat like 'goals created' or 'positive moves' would be perfect for a soccer sabermetrician. The biggest obstacle is the lack of respect the stat gets in the football community. This is evident from the lack of importance the assist receives in most leagues.
    Another huge obstacle is the English attitude. The English dominate the sport in terms of broadcasting. The English public tend to believe the strong and passionate are the best in football. This is reflected in the dismissive attitude toward serious numeric evaluation by the general public.
    Last, the top amateur football analysts are not centralized. Most baseball guys were in the US when the statistical revolution started. Us soccer guys are all over the world. We all may have good ideas but it is difficult for us to gather and discuss and test our ideas in a serious manner.
    This discussion is a great one and I hope we start posting our theories and thoughts. Football is so far behind in statistical analysis. It's up to us to bring it into the 20th century, let alone the 21st.

    Jj

    By the way, I'm trying to find home and away tables for the 2004 Serie C in Brazil. If anyone can direct me towards a site I would forever appreciate it.
     
