Sabermetrics applying to Soccer

Discussion in 'Statistics and Analysis' started by mpruitt, Jul 30, 2003.

  1. Real Ray

    Real Ray Member

    May 1, 2000
    Cincinnati, OH
    Club:
    Real Madrid
    Nat'l Team:
    United States
    A "chance" (part of the vernacular, really, as in, "United need to take their chances...") is simply a play that results in, or by any reasonable measure should result in, an attempt on goal.

    It's really one of the fundamental points of the game, one you hear stressed in every match by players and coaches, which is why I think it serves as a basic starting point for a statistic, and one on which people could by and large come to a reasonable conclusion when reviewing a match.

    You look at a player like Zidane and see if there is a direct correlation between the number of touches he gets and the number of chances for Real Madrid. Is there a number? Is it related to where he gets the ball? To where other players play? Whether he pushes forward into the attack after his pass or hangs back? The players up top?

    And I would add it's not as subjective or random as you make it out to be. In fact it's no different than looking at an NFL offense: you can eliminate touchdowns or long gains that resulted from broken plays or slips by the opposing defense, and isolate the plays that worked due to the proper read or execution. The same in soccer. You can break down the tapes so that you eliminate chances or actual goals created by fluke plays or slips, and look to see what combinations or style of play provide you with the highest probability of success.

    An interesting case I would have loved to look at is David Ginola when he was at Spurs with George Graham. There is a funny story where after a match, Brian Clough noted that, "the young man from France crosses the ball beautifully." Graham replied, "but the trouble with Ginola is that he wants the ball all of the time," which caused Clough to come back with, "well, in that case, give him the bloody thing." Perhaps Graham was right that Ginola was a self-indulgent flair player. But what if his bias against this type of player was blinding him to what could be proven statistically? That Spurs actually had more chances the more touches Ginola got, and that, like some NFL backs, you have to give him 30+ carries to see the best results?
     
  2. beineke

    beineke New Member

    Sep 13, 2000
    Re: Sabermetrics applying to Soccer

    Originally posted by joe2
    In soccer, unlike baseball or even basketball, any given individual player is more dependent on the skills of his teammates.

    Perhaps, but the point guard only gets the assist if his teammate gets into position for the pass, catches it, and makes the shot. That's not too different from soccer. In football, if a quarterback has bad numbers, it could be his own fault, or it could be dropped passes, or poorly executed routes, or missed blocking assignments, or lousy play-calling. Even so, the stats are useful ... they just require a bit of interpretation.
     
  3. beineke

    beineke New Member

    Sep 13, 2000
    This sounds like exactly the kind of question that Maxim has been talking about: is "selfishness" bad?

    One possible way to approach it would be to pick a player like Justin Mapp or Eddie Lewis, and to track what happens whenever he receives the ball in the middle third of the field under loose defending. Some of the time, he'll take off on the dribble; other times, he'll make a simple pass. How many scoring chances result from either decision? How many goals? How many counterattacks for the opponent?

    If you have a one-in-three chance of beating a guy on the dribble, when is it worth taking him on? As of today, we have very little basis for answering that question.
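    One rough way to frame the "when is it worth taking him on?" question is as an expected-value comparison. The sketch below is only an illustration: the payoff values (in net goals per decision) are invented, and the function names are hypothetical. Only the one-in-three success rate comes from the post.

```python
def expected_value(p_success, value_if_success, value_if_failure):
    """Expected payoff of a two-outcome decision."""
    return p_success * value_if_success + (1 - p_success) * value_if_failure


def break_even_probability(value_if_success, value_if_failure, value_of_alternative):
    """Success rate at which the risky option is worth exactly as much as the safe one."""
    return (value_of_alternative - value_if_failure) / (value_if_success - value_if_failure)


# Invented payoffs, in net goals (goals for minus goals against) per decision.
BEAT_DEFENDER = 0.10   # assumed value of getting past the defender
LOSE_BALL = -0.02      # assumed cost of being dispossessed (counterattack risk)
SIMPLE_PASS = 0.015    # assumed value of the safe pass

dribble_value = expected_value(1 / 3, BEAT_DEFENDER, LOSE_BALL)
needed_rate = break_even_probability(BEAT_DEFENDER, LOSE_BALL, SIMPLE_PASS)

print(f"dribble: {dribble_value:.3f}, pass: {SIMPLE_PASS:.3f}, break-even rate: {needed_rate:.3f}")
```

    The real work would be estimating those three payoffs from tracked outcomes (chances, goals, opponent counterattacks), which is exactly the data the post says we lack.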
     
  4. microbrew

    microbrew New Member

    Jun 29, 2002
    NJ
    It just occurred to me: one of the strongest correlating factors for a team winning doesn't really apply to MLS. That is, in most leagues, money buys success. That increases the value of evaluating talent as well as coaching.
     
  5. beineke

    beineke New Member

    Sep 13, 2000
    In this respect, it's fairly similar to college sports, where coaches are kings.
     
  6. joe2

    joe2 New Member

    Oct 14, 2001
    Sabermetrics applying to Soccer

    Thanks for starting to clarify "chance" as a discrete event: a play that could reasonably result in an attempt on goal. This is an entirely offensive statistic, but that is okay as long as we recognize that fact. Now, when does a "chance" begin? If a keeper makes a wonderful kick past midfield to an attacking player, is that a point in his "chance" statistic? If a midfielder makes a short pass to an attacking player and runs into space but is not given a return pass because the attacking player misplays the ball, does the midfielder's run count toward his "chance" statistic? A good pass by the attacker would have led to a possible shot on goal. In fact, because soccer is made up of fluid action, every good pass should be considered a plus for the passer's "chance" statistic, because it could reasonably lead to a situation where a player could get a shot on goal. Dribbling the ball a few yards could open space leading to a "chance" if all the other players moved to space, etc. Even passing the ball back to a defender for the purpose of changing fields can certainly be considered a positive "chance" statistic, as the passer is moving the ball to an area that could reasonably lead to a better chance for a shot on goal. In this way soccer is considerably different from football, baseball, and basketball. Let me pass the ball to Michael Jordan and I will become statistically one of the best assist men in the league. That is why statistics in all sports are fun but basically non-predictive.
    You suggest that slips and broken plays be eliminated from a statistical analysis. Why? They are an integral part of the game. Slips often occur because a superior opponent has faked out a defender. Broken plays occur because a superior defender has busted through an offensive lineman. If a midfielder fakes out a defender and causes him to lose his balance, would that not be part of your positive "chance" statistic? Actually, to fake out an opponent in just about any sport is an integral part of success, isn't it? It is what makes plays work. Throwing a fastball when the batter is looking for a curve. Driving to the hoop then stopping for a fadeaway jumper. Faking a dive play into the line then throwing the bomb.
    There is also the problem of the difference in playing styles depending on the time and score of the match. A soccer team with a 2 goal lead will often let up on the attack and play to keep the ball and use time. Statistically they are losing opportunities for good scoring chances. On the other hand, a team that is behind often increases its scoring chances by attacking. While it may look statistically superior using "chance" as a primary category, it is opening itself up to counterattacks and playing poor defense. To use a football example: a few years ago the Buffalo Bills had awful defensive statistics but were a great defensive squad. Buffalo played a hurry-up offense, so their defense was on the field much more than average. In fact, the better American football teams often give up a lot of passing yards. Why? Because they have a lead and the opponent is trying to catch up by passing the ball.
    I think "chance" as you are discussing it in soccer is a very hazy term and needs to be much more clearly defined to be useful. I am not saying statistical analysis is impossible, but I still don't see any clear-cut, meaningful individual categories in soccer. Unless you develop statistics only for the entire team; maybe that would work better than trying to single out individual players for analysis. But we have that statistic already. It is called the final score!
     
  7. mpruitt

    mpruitt Member

    Feb 11, 2002
    E. Somerville
    Club:
    New England Revolution
    I think "chance" would still be a little bit too subjective. One piece of data gathering that could be useful would be to mark off the field in an imaginary grid, much the same way as you see in the Matchanalysis reports. That way you could get some sort of concrete number showing that when player X shoots from a given space on the pitch, he has a statistically better chance of scoring or has been statistically more accurate.
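    As a rough illustration of the grid idea, shots could be binned into zones and tallied per zone. Everything specific here is an assumption made for the sketch: the pitch dimensions, the 5x3 grid, the coordinate convention, the function names, and the sample shots for "player X" are all made up.

```python
from collections import defaultdict

PITCH_LENGTH = 100.0  # assumed, in yards (x runs toward the goal being attacked)
PITCH_WIDTH = 70.0    # assumed, in yards
COLS, ROWS = 5, 3     # assumed grid: 5 zones along the length, 3 across the width


def zone(x, y):
    """Map a pitch coordinate to a (col, row) grid cell."""
    col = min(int(x / PITCH_LENGTH * COLS), COLS - 1)
    row = min(int(y / PITCH_WIDTH * ROWS), ROWS - 1)
    return col, row


def shooting_by_zone(shots):
    """shots: iterable of (x, y, on_target, goal). Returns per-zone tallies."""
    tallies = defaultdict(lambda: {"shots": 0, "on_target": 0, "goals": 0})
    for x, y, on_target, goal in shots:
        cell = tallies[zone(x, y)]
        cell["shots"] += 1
        cell["on_target"] += int(on_target)
        cell["goals"] += int(goal)
    return dict(tallies)


# Made-up shots for "player X": (x, y, on_target, goal)
shots = [
    (92.0, 35.0, True, True),
    (88.0, 30.0, True, False),
    (85.0, 40.0, False, False),
    (70.0, 10.0, False, False),
]

table = shooting_by_zone(shots)
for cell, t in sorted(table.items()):
    print(cell, t, f"accuracy={t['on_target'] / t['shots']:.2f}")
```

    With enough shots per zone, the per-zone accuracy and conversion numbers are the "concrete number" the post is after; with only a handful, they are mostly noise.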
     
  8. mpruitt

    mpruitt Member

    Feb 11, 2002
    E. Somerville
    Club:
    New England Revolution
    Does anyone happen to know if the Michael Lewis of Moneyball is the same Michael Lewis that writes on soccer for the NY Post?
     
  9. TomEaton

    TomEaton Member

    Mar 5, 2000
    Champaign, IL
    For anybody who's thinking they might like to try to do some MLS statistics interpretation, just be prepared for frustration. Peter Hirdt's latest column on MLSnet asserted that there was an incredibly large correlation between--get this--goalkeeper catches/punches and winning. Yeah, I know, my first reaction to this was the same as yours: that just can't be true. But he did back up his assertions with some supporting data.

    I wanted to do some independent research on this to see if the correlation really was as strong as he contended. I don't think he falsified the data; I just think he might not have been analyzing the entire issue. His company, Elias Sports Bureau (remember them?), clearly has an interest in proving the catch/punch stat to be useful.

    Then I remembered that game-by-game catch/punch statistics aren't included in MLS game summaries; the only catch/punch stats available on MLSnet are season-ending and career numbers for each goalkeeper. Hirdt's contentions focused on single-game totals. So I e-mailed Hirdt, asking where I could get the numbers.

    He responded but told me that the numbers aren't published anywhere. He said if I told him what kind of research I wanted performed, he'd see if he could do it for me.

    See the problem? Basically, we have to take his word for it, because Elias is the only place that has access to the statistics. I was somewhat surprised he didn't offer to sell me the data, but he was probably afraid that then I'd release them publicly (which, in the absence of a contractual agreement not to, I would).

    The only alternative I can see would be to go through game tapes one by one, counting up catches/punches until you had enough to do some useful comparisons. But, first, that would be an enormous amount of work, and second, THAT'S WHAT YOUR STATISTICS PEOPLE ARE FOR. The data have already been collected. By Elias. And now they won't let anybody use them. What a waste.
     
  10. mpruitt

    mpruitt Member

    Feb 11, 2002
    E. Somerville
    Club:
    New England Revolution
    As I think I've said before, Bill James was quoted in Moneyball as expressing some of the exact same frustrations with Elias: that no statistics were ever given to anyone without $$$.

    Going through game by game to chart whatever would be laborious at best. My suggestion would be to first see if there's any other avenue for finding more statistical information. Has anyone else been tracking this stuff? Could MLS or individual teams provide it? Would college teams willingly post it or give it to your average soccer fan? The college route might be one place to inquire, as any worthwhile analysis, beyond gaining a greater knowledge of the game, might focus well on undiscovered players. Maybe you'd find after looking into it through statistical analysis that the NCAA isn't such a bad breeding ground for players.

    I had emailed the guy from Matchanalysis and he mentioned to me that the match reports, while relatively well received by teams, haven't been as useful a selling point as some of their video analysis. Maybe they'd even be willing to give out the stuff that they already have beyond what's on their website?
     
  11. kenntomasch

    kenntomasch Member+

    Sep 2, 1999
    Out West
    Club:
    FC Tampa Bay Rowdies
    Nat'l Team:
    United States
    The other thing you could do, Tom, is check MLSnet.com after a team's game, when the league has updated the stats. Every goalkeeper's catch/punch stats are there. If you know that Zach Thornton had 42 one week, and the next week he had 44, you know he had two in the previous game.

    Still laborious, but not as bad as the alternatives.
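    The subtraction described above is easy to automate once the weekly season totals have been copied down by hand. A minimal sketch, assuming one reading per game played; the function name and the first reading of 38 are made up, while the 42 and 44 are the example from the post.

```python
def per_game_from_season_totals(readings):
    """Difference consecutive season-to-date totals to recover per-game counts.

    readings: season totals noted after each of a keeper's games, oldest first.
    Returns one count per game after the first reading.
    """
    return [later - earlier for earlier, later in zip(readings, readings[1:])]


# Season-to-date catch/punch totals for one keeper, noted after three straight games.
readings = [38, 42, 44]
per_game = per_game_from_season_totals(readings)
print(per_game)  # the last entry is the "42 one week, 44 the next" case
```

    A negative difference would suggest the league corrected a total after the fact, so those games would need checking by hand.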
     
  12. joe2

    joe2 New Member

    Oct 14, 2001
     
  13. TomEaton

    TomEaton Member

    Mar 5, 2000
    Champaign, IL
    Good idea, Kenn. I didn't realize they updated the season statistics every week. I personally am not interested in trying to keep up with the stats every week, but maybe someone more dedicated could do it.

    Joe, I agree with you to some extent, and your proffered explanation of why catches/punches might be greater for the winning side is the first one that struck my mind as well. I also understand (as I think everybody who's bothered to follow this thread this far does) that correlation is not the same as causation. The point, though, is that we don't KNOW. We might be able to get some evidence one way or the other if we could, say, tally up the number of catches/punches after the winning team had taken the lead as compared to when the game was tied. If the ratio increased after the lead had been taken, that would lend support to your theory. If it didn't, then it would refute that idea. Without any evidence either way, all we're doing is guessing.
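    If someone did have catch/punch counts split by game state, the comparison could look something like the sketch below. The data layout, field names, and numbers are all hypothetical, and it compares rates per 90 minutes rather than a winner-to-loser ratio, because time spent tied and time spent leading are rarely equal.

```python
def rate_per_90(events, minutes):
    """Events per 90 minutes; NaN if no time was spent in that state."""
    return 90.0 * events / minutes if minutes else float("nan")


def compare_states(games):
    """Pool the eventual winner's keeper stats across games, by game state.

    Each game is a dict with catches/punches ('cp_*') and minutes ('min_*')
    recorded while the score was tied and while the winner was leading.
    """
    tied = rate_per_90(sum(g["cp_tied"] for g in games),
                       sum(g["min_tied"] for g in games))
    leading = rate_per_90(sum(g["cp_leading"] for g in games),
                          sum(g["min_leading"] for g in games))
    return tied, leading


# Made-up numbers for two games.
games = [
    {"cp_tied": 2, "min_tied": 60, "cp_leading": 3, "min_leading": 30},
    {"cp_tied": 1, "min_tied": 30, "cp_leading": 5, "min_leading": 60},
]

tied_rate, leading_rate = compare_states(games)
print(f"tied: {tied_rate:.1f} per 90, leading: {leading_rate:.1f} per 90")
```

    A clearly higher rate while leading would be consistent with Joe's explanation; similar rates would count against it.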

    By the way, I e-mailed Peter Hirdt again to ask him whether it was possible to buy the raw statistical data from Elias, and how much it would cost. He replied curtly that the raw statistics are neither for distribution nor for sale, which makes me wonder why they're bothering to collect them.
     
  14. mpruitt

    mpruitt Member

    Feb 11, 2002
    E. Somerville
    Club:
    New England Revolution
    Instead of worrying about how more statistical analysis wouldn't work with soccer, why not try a less traditional, deconstructionist view? There would probably be no way to accurately and objectively define what a 'chance' is, but that's too narrow. Right now, either way, people make statements that a certain person is good at taking and scoring on all of his chances, and that's completely subjective, entirely an opinion. The idea is to try to find ways we could objectify at least some of these things. If I were going to try to see how accurate a team's scoring opportunities are, I'd start with the team's goals-scored average and overall team possession in the first third, then go to individual players' possession vs. dispossession rates. I'd compare that with a team's defensive goals-against average overall, then their defending in the final third, individual players' defending, and their defending in particular segments of the field, then try to look at the specific segments of the field from which the other team has taken shots and scored. That's at least one way, very quickly in my mind, that you could begin to generate some kind of objective knowledge as to what kind of team is taking and scoring on enough of its chances.

    Furthermore, the idea of trying to quantify a chance is a bit silly anyways. That's like trying to quantify what baseball player is 'clutch.'
     
  15. microbrew

    microbrew New Member

    Jun 29, 2002
    NJ
    Perhaps a better way to think about this: ask a question, then crunch the math, then refine the question, and so on.

    In this case: what effect does a goalkeeper's ratio of punches and catches have on the outcome of the game?

    It's not enough just to prove correlation though- some kind of explanation is needed, along with stats backing up the explanation, and perhaps most importantly, a way of testing the explanation.

    A question I might have: What effect does a yellow card have on the carded player's ratio of attempted tackles to tackles won? I'll keep refining that question, and ask other questions until I build an answer to how a yellow card affects a player's play.
     
  16. kenntomasch

    kenntomasch Member+

    Sep 2, 1999
    Out West
    Club:
    FC Tampa Bay Rowdies
    Nat'l Team:
    United States
    That's the way to do it. Don't start with the numbers, start with the question. I always like to look at things people say (Like "A 2-0 lead is the most dangerous lead in soccer") and then see what the evidence is that would support that or tear it down (turns out you win outright 90% of the time when you go up 2-0, and at home you're almost a mortal lock).
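    For anyone who wants to check a claim like the 2-0 one against their own results, here is a minimal sketch. It assumes each game is written as its goals in scoring order ('H' for home, 'A' for away); that encoding, the function name, and the sample games are all made up for illustration, not real results.

```python
def two_nil_outcome(goals):
    """Return None if nobody ever led 2-0, else 'W', 'D' or 'L'
    from the point of view of the team that led 2-0.

    goals: string of 'H'/'A' characters in the order the goals were scored.
    """
    home = away = 0
    leader = None
    for scorer in goals:
        if scorer == "H":
            home += 1
        else:
            away += 1
        # A 2-0 scoreline can only occur right after the second goal.
        if leader is None and {home, away} == {0, 2}:
            leader = "H" if home == 2 else "A"
    if leader is None:
        return None
    if home == away:
        return "D"
    winner = "H" if home > away else "A"
    return "W" if winner == leader else "L"


# Made-up games, not real results.
games = ["HH", "HHA", "AAHH", "HAH", "AAHHH", "HHH"]
outcomes = [two_nil_outcome(g) for g in games]
led = [o for o in outcomes if o is not None]
win_share = led.count("W") / len(led)
print(outcomes, f"won after leading 2-0: {win_share:.0%}")
```

    Splitting `led` by whether the leader was home or away would give the home/away comparison mentioned above.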

    Is there any way to support the things that people say are true or just accept as true? That's what I look for.
     
  17. joe2

    joe2 New Member

    Oct 14, 2001
     
  18. joe2

    joe2 New Member

    Oct 14, 2001
    MAXIM...I am in some agreement with your idea. There are some things which can be "objectively" counted. I disagree that a person's perception is only his opinion, however. Opinions and analysis are (or can be) based on years of observation and experience. I can tell if a player is good at making space, taking shots, etc. and so can you, based on your experience in observing soccer. Now, if I was to try to evaluate the performance of a judo expert, that would be completely subjective since I have little or no experience with the sport. But you are right, at the present time we do not attempt to quantify that knowledge statistically. That does not mean the knowledge does not exist or those non-counted observations are not valid.

    I also agree with you that trying to quantify "chance" is like trying to quantify "clutch" in baseball. But that does not mean "clutch" is not important...perhaps more important than any other aspect of a player's performance.

    I am not arguing for the sake of argument but only because I have a great deal of trouble trying to figure out how many important aspects of a team's performance could be objectively observed and quantified (in soccer). That is why I think it is extremely important to be precise about terminology.

    As an aside, in the A-League statisticians will only allow one assist per goal, whereas in the past two assists could be awarded. Two assists more accurately reflect the importance of team play in setting up a goal. As a result of that stat change, many fine defenders and midfielders are no longer being credited with the great pass that led to a second touch that led to a goal. The assist and goal stats are easily recorded and identified but have, in fact, lost some of their meaning because of the change in recording that particular stat. I have no idea why they made the change.
     
  19. kenntomasch

    kenntomasch Member+

    Sep 2, 1999
    Out West
    Club:
    FC Tampa Bay Rowdies
    Nat'l Team:
    United States
    When I was in the A-League years ago they discontinued the two assists. In D3 even before that.
     
  20. voros

    voros Member

    Jun 7, 2002
    Parts Unknown
    Nat'l Team:
    United States
    I suppose I'm obligated to reply to this thread, though I'm not sure how I should reply.

    I'm guessing I'm someone who is the pivot point on this issue, so I'll give my bio and then my thoughts.

    My name is Voros McCracken and I'm one of those Sabermetrics guys out there (though I despise that term). I got a few pages in Moneyball, have written for Baseball Prospectus, Primer, etc., and currently consult for the Boston Red Sox along with Bill James. I basically started posting on Big Soccer because after being hired by the Red Sox, my hobby of posting on baseball message boards and such got severely restricted (for obvious reasons).

    The biggest hurdle to climb in baseball, and by connection soccer, is the perception that what guys like me do is about statistics. It is NOT about statistics. It is about using reliable methods and the information available to analyze the game of baseball (and soccer). The idea is to subject the sport to the same standards of inquiry as other endeavors like pharmaceutical research, psychology, and other disciplines where the scientific method is employed to learn about critical issues within the discipline...

    ...it just so happens that in baseball, one of the most obviously important and useful areas of exploration happens to be in the HUGE amount of various statistics compiled with regards to the game in the last 100 years. If I want to find out if a pitcher's hits per balls in play tends to correlate well from year to year, I can go back into this database and develop a study to give some ideas as to the answer. Unspeakably useful things are statistics in baseball, as they allow you to skirt around various subjective and unreliable methods of analysis.

    But that still doesn't mean the study itself is about statistics. Statistics are the tool, not the end result. The end result is learning things you previously didn't know.

    As far as applying this to soccer, here are some basics:

    1) There are some basics that initially need to be dealt with:
    a) Games played as measure of opportunity is inherently inferior to minutes played for the same purpose. Treating someone who comes in as a sub in the 83rd minute as having equal opportunity as someone who plays the full 90 is silly.
    b) Accumulating points is the object of every game. The discussion about what constitutes win percentage is moot, as that doesn't count in the standings, points do. So when we start to talk about qualitative stats and evaluating players, it needs to be done so on the basis of what the player does to achieve his team receiving as many points per game as possible.
    c) Some information is always preferable to no information. People sometimes make the mistake of assuming that if you can't know everything, then there's no point in knowing anything. Just because goals per 90 minutes is not all you can judge strikers by, that doesn't mean the statistic has no meaning or importance. Certainly that stat tells you something.
    d) Descriptive data tends to be better than qualitative data, and objective data tends to be better than subjective data. Not that subjective data is useless or wrong, only that it has the unfortunate tendency to switch scales based on who's doing the compiling.
    2) Measuring "chances" is a bad idea, as someone else mentioned. What the hell is a "chance"? And is there any possible way to get a consistent interpretation when using different people to make such distinctions?
    3) Along the same lines, simply compiling things like shots, touches, passes, goals (and derivatives like plus/minus), headers, saves, free kicks, penalty kicks and so forth has value even if many of those things turn out to have little value in player evaluation (which, from above, has to do with the relationship between the data and how it contributes to points earned by the team). While not completely objective, you can likely get fairly consistent interpretations across a spectrum of official scorers.
    4) Peer review has its place, even if it isn't of the structured type you see in scholarly journals and such. At one point this year I had compiled a long diatribe about what was wrong with one of Hirdt's analyses (I think it was the one on the importance of goals depending on when they were scored) but I eventually figured no one would care, so I didn't. The point is that Hirdt can't be trusted to be right on something and neither can I or anybody else. Their work needs to be redone, and flaws and holes in it need to be examined.
    5) The all-encompassing single statistic that rates players is fool's gold. It is in baseball and it would be in soccer. Statistics that tell us something about _how_ the player performs are much more valuable than ones which try and tell you _how well_. The former has all sorts of uses, including, but not limited to, the latter.
    6) Statistics that analyze team performance will be much easier in soccer, and of value, but I think much less valuable and interesting than looking at the individual players themselves.
    7) An understanding of the differences between results, statistics, performance, ability and potential ability is important. The plus/minus stat is a perfect example. It doesn't matter if a team has a much better goal differential when a particular player is on the field, if this difference has little to do with the individual performance and abilities of that player. We cannot _assume_ that it does, and would have to examine the issue further.
    8) Refine, rework and redo. Do this a lot.
    9) Finally, the reason why this sort of thing is helpful is the inherent fallibility of human reasoning. The human mind is an extraordinary thing, but it is its very strengths that often lead to the logical breakdowns we all suffer from. Information that is true because "everybody knows it" is not fact. It is speculation and needs to be evaluated as such. I cannot recommend more highly a book called "How We Know What Isn't So" by Thomas Gilovich which explores why "the only thing infinite is our capacity for self-deception."

    Long post, but I figured I'd make it.
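    Points 1a and 1b above translate directly into arithmetic. A small sketch with made-up players and records; the function names are hypothetical, and the three-points-for-a-win scale is an assumption that should be swapped for whatever the league in question actually awards.

```python
def per_90(count, minutes_played):
    """Rate of any counting stat per 90 minutes actually played."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return 90.0 * count / minutes_played


def points_per_game(wins, draws, losses, win_points=3, draw_points=1):
    """Standings points per game; the 3/1/0 scale is an assumed default."""
    games = wins + draws + losses
    return (win_points * wins + draw_points * draws) / games


# Two made-up strikers with the same 20 appearances but very different minutes.
starter = {"goals": 10, "games": 20, "minutes": 1800}
substitute = {"goals": 6, "games": 20, "minutes": 600}

for name, p in (("starter", starter), ("substitute", substitute)):
    print(name,
          f"goals/game={p['goals'] / p['games']:.2f}",
          f"goals/90={per_90(p['goals'], p['minutes']):.2f}")

print(f"points/game for a 10-5-5 team: {points_per_game(10, 5, 5):.2f}")
```

    Per game, the starter looks better; per 90 minutes, the substitute does. Neither number is the whole story, which is point 1c.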
     
  21. Real Ray

    Real Ray Member

    May 1, 2000
    Cincinnati, OH
    Club:
    Real Madrid
    Nat'l Team:
    United States
    Well, I guess it's my stat, so... :)

    Yes, I agree there is subjectivity re: what one views as a legitimate chance, but not to the degree that you could not get a consensus IMO, notwithstanding the occasional dispute that you see in other sports. As for this idea that you could not agree: I think if you polled a group of coaches, or asked them to each watch a match alone and mark on a sheet what they thought were the chances in a match, it would be pretty damn close. And if using voros' view that
    then one way or another, we are going to meet at this intersection called a "chance." You can argue about when it begins or the scoring of a particular match, but I don't see how you avoid it as a starting point for your basic stat, or at least coming to agreement on what such a play is.

    In terms of a "chance," here is a clip I pulled from the 1982 World Cup of Paolo Rossi: what anyone would call a "chance."
    http://www.geocities.com/castmind/rossi.html

    As far as joe's question as to when a chance begins, I would view it as the point when the action manifests itself into an attempt on goal. I would of course use the assist stat and as maxim noted, you would have to break the field down into thirds, which will clarify the "when" question better.

    To take Rossi's game further, you could then begin the process of breaking his match down with categories like:
    Inside the area
    Total chances:
    Goals:
    From Passes Outside the area:
    From Passes Inside The Area
    From Corner Kicks
    From Throw In
    From Individual Runs Into The Area
    Rebound
    Left Foot
    Right Foot
    Headers
    etc., etc,...


    Each of these and all other categories would then be linked to the specific player(s) involved in the chance. You would then work backwards towards the furthest identifiable point of inception for that specific chance: a throw-in, a goal kick, or an intercepted pass, say.

    So in the case of the clip I posted you could provide a basic scoring as follows:
    Brasil vs Italy 1982 WC
    Chance #9
    Player: Paolo Rossi (Italy)
    Inside the area
    From pass inside the area (Graziani)
    Shot: Left foot
    Result: Miss


    An example from his first score of the match would be
    Brasil vs Italy 1982 WC
    Chance #4
    Player: Paolo Rossi (Italy)
    Inside the area
    From pass outside the area (Cabrini)
    Shot: Header
    Result: Goal


    With Rossi's goal, you would also have a statistical trail that starts with Conte gaining possession at midfield, then his pass to Cabrini out on the left. This links these two players to Rossi's chance/goal and provides a larger context in which to place the chance. But the actual chance is Cabrini's center in to Rossi.

    So essentially what you're doing is going backwards from scoring chances, breaking down the play to its furthest point of identifiable inception, and then defining each action with some form of scoring notation.
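    The scoring notation above maps fairly naturally onto a small record type. A sketch only: the class and field names are invented here, and the two records simply restate the Chance #4 and Chance #9 entries from this post.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Chance:
    match: str
    number: int
    player: str
    team: str
    location: str      # e.g. "Inside the area"
    origin: str        # how the ball arrived, e.g. "Pass outside the area"
    provider: str      # player who supplied the final ball
    shot: str          # "Left foot", "Right foot", "Header", ...
    result: str        # "Goal", "Miss", ...
    buildup: list = field(default_factory=list)  # earlier players, back to the point of inception


chances = [
    Chance("Brasil vs Italy 1982 WC", 4, "Paolo Rossi", "Italy", "Inside the area",
           "Pass outside the area", "Cabrini", "Header", "Goal",
           buildup=["Conte", "Cabrini"]),
    Chance("Brasil vs Italy 1982 WC", 9, "Paolo Rossi", "Italy", "Inside the area",
           "Pass inside the area", "Graziani", "Left foot", "Miss"),
]

# Tally results for one player, and count each player once per chance he was involved in.
rossi_results = Counter(c.result for c in chances if c.player == "Paolo Rossi")
involvement = Counter(name for c in chances for name in {*c.buildup, c.provider, c.player})

print(dict(rossi_results))
print(dict(involvement))
```

    Once every chance in a match is logged this way, questions like "how many chances start with Conte winning the ball?" become simple counts.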
     
  22. Sachin

    Sachin New Member

    Jan 14, 2000
    La Norte
    Club:
    DC United
    So basically, you're invoking Schrodinger's Cat and the Heisenberg Uncertainty Principle to determine when a chance starts.

    Sachin
     
  23. mpruitt

    mpruitt Member

    Feb 11, 2002
    E. Somerville
    Club:
    New England Revolution
    For those who don't know, Voros McCracken is Nekcarccm Sorov spelled backwards. (BTW, what the heck was that line in the book about?) I had wanted to find out if there were any people who had worked on these types of things in baseball that also had an interest in soccer. It's awesome to know that there are.

    Also, for those who don't know, in Moneyball our fellow bigsoccer poster here was written up to quite some acclaim. And if he and his friends give the Red Sox the players to win a World Series, then I suppose the name of my first born will be Voros.
     
  24. Real Ray

    Real Ray Member

    May 1, 2000
    Cincinnati, OH
    Club:
    Real Madrid
    Nat'l Team:
    United States
    Well considering the definition
    I suppose I am.
     
  25. kenntomasch

    kenntomasch Member+

    Sep 2, 1999
    Out West
    Club:
    FC Tampa Bay Rowdies
    Nat'l Team:
    United States
    voros has it down. He's appointed the leader of the group. Can I get a second?
     
