Sabermetrics applying to Soccer

Discussion in 'Statistics and Analysis' started by mpruitt, Jul 30, 2003.

  1. JG

    JG Member+

    Jun 27, 1999
    Depends on your definition of quick.

    It probably does...I counted a goal or two that came from crosses after short corners.
     
  2. Real Ray

    Real Ray Member

    May 1, 2000
    Cincinnati, OH
    Club:
    Real Madrid
    Nat'l Team:
    United States
    Because unlike baseball, stats like "shots on goal" in soccer are too empty; they are poor in value compared to many of baseball's basic stats. IMO a sport like soccer needs its stats to be placed in a more defined context in order to provide a deeper understanding. I think there has to be an element of subjectivity in a stat like "shots on goal," as shots vary to a wide degree in soccer.

    What I've done to illustrate this is score the first 20 minutes of the US-Portugal match in WC 2002.
    http://www.geocities.com/castmind/usa02.html
    (It's a GeoCities site, so you may get the "try again" message.)

    I did not score passes, corners, saves, etc., but obviously, they would be included in a total breakdown of the game. I only scored what I describe as, "Attempts On Goal." These are broken down into two categories:

    1. "Chances": A play in the offensive third that results in, or by any reasonable measure should result in, a shot on goal.

    2. "Shot": Attempts on goal that fall outside the definition of a chance: an attempt to chip a keeper, or free kicks/shots deep in or beyond the offensive third, for instance.

    In the first 9:24 of the match, there were 6 Attempts On Goal: 2 were Chances (McBride, O'Brien) and 4 were Shots (Stewart, Pope, McBride, and Donovan).

    I also included McBride's goal, just to flesh out the stats, but did not score all the way through to the 36th min.

    My wish is for something like this to be available on CD-ROM for each team: "MLS Abstract: MetroStars 2003." Each game would have a page like the front page, with the lineup and a link to each player's game summary page, detailing all of his stats, including video clips. At the end would be the final numbers crunch, with stats like "Chance Conversion Rate," and some type of author/expert summary of the data.

    Not very likely of course, but I might do it just for the hell of it during the next WCQ. Tom is right though, it's pretty difficult and painstaking.
     
  3. TomEaton

    TomEaton Member

    Mar 5, 2000
    Champaign, IL
    I think you misunderstood me, Ray; what I meant was that studying something like the number of "chances" converted is going to be difficult because different observers will disagree as to what was a chance and what wasn't, whereas everyone agrees what a corner kick is. As you point out, even something like what constitutes a "shot" can be the subject of disagreement. I personally doubt that any definition of a "chance" will eliminate the subjectivity factor to a degree great enough to come to any reliable conclusions. That said, I invite people to try.
     
  4. NGV

    NGV Member+

    Sep 14, 1999
    Whether different people's observations would be close and how close they are is crucial, and that can (and should) be tested. So if it turns out your observers agree 90% of the time, an analysis based on subjective coding of plays as "chances" or "non-chances" may be useful. If they agree 20% of the time, it probably isn't worth the effort. In social sciences, this is called "intercoder reliability."

    That kind of study - quantitative study based on subjective "coding" of events or things - isn't uncommon in political science, for instance. An example would be ratings of countries' level of "democracy" or other political regime characteristics, which have become somewhat widely used. Obviously, we can't objectively prove that Malaysia or whoever scores a "4" on civil liberties. But if we ask ten different experts to rate the country on a five-point scale using the most clear and consistent criteria we can come up with, and nine of them say "4" while the tenth says "3," chances are our measurement isn't too bad, and using it to try to determine if a country's degree of protection of civil liberties correlates with other things (say, levels of economic growth) may be worthwhile.

    So, while these types of measures are inferior to ones where there's no real chance of disagreement among observers, they aren't inherently useless, and are sometimes a lot better than nothing. The question is, how much additional insight do we gain by using "chances" instead of "goals," and is it enough to compensate for the added subjectivity and the problems that go along with it?
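
    A minimal sketch of how that agreement could be checked, assuming two observers each label the same plays as "chance" or "shot". The function names and the sample labels are made up for illustration. Cohen's kappa is a standard statistic that corrects raw agreement for the matches two coders would get by luck; nobody in this thread has actually computed it.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of plays on which the two coders gave the same label."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two equal-length, non-empty label lists")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Agreement corrected for the agreement expected by luck alone."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum((freq_a[label] / n) * (freq_b[label] / n) for label in labels)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for ten plays (illustrative only, not real match data).
observer_1 = ["chance", "shot", "shot", "chance", "shot",
              "shot", "chance", "shot", "shot", "shot"]
observer_2 = ["chance", "shot", "chance", "chance", "shot",
              "shot", "shot", "shot", "shot", "shot"]

print(f"raw agreement: {percent_agreement(observer_1, observer_2):.2f}")
print(f"Cohen's kappa: {cohens_kappa(observer_1, observer_2):.2f}")
```

    Raw agreement can look high simply because most plays are obvious non-chances, which is why a luck-corrected number is worth reporting next to it.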
     
  5. Real Ray

    Real Ray Member

    May 1, 2000
    Cincinnati, OH
    Club:
    Real Madrid
    Nat'l Team:
    United States
    Yeah.

    The one point that Tom makes that I think you need is the raw collection of data: an index that the scorer(s) could have handy.

    For instance, of the 6 Attempts On Goal (there were 3 not 2 chances, my typo) the one I had the hardest time scoring was Stewart's free kick that created a rebound for Pope. If it was Beckham, I would have scored it a chance, knowing Beckham's quality. But not having seen enough of Stewart and not having any hard data to show percentages for free kicks, I went just on my own view of the play and scored it as a shot. But I could be wrong and the data might show that this play for Stewart has a high enough percentage to note it as a chance.

    It's a fun exercise though...well, if you enjoy this sort of thing :)
     
  6. NER_MCFC

    NER_MCFC Member

    May 23, 2001
    Cambridge, MA
    Club:
    New England Revolution
    Nat'l Team:
    United States
    One example of the difficulties presented by the relative lack of unequivocally definable discrete events in soccer is playing itself out in this (very interesting) discussion. A goal correlates to a run in baseball or a touchdown in football, a result of a number of discrete events, not a discrete event in itself. Similarly, a chance probably correlates to an at-bat or an offensive series, a collection of discrete events that might or might not result in a score.

    Any thoughts on what the irreducible events in soccer are? There are the obvious ones, corners, throws, goal kicks, free kicks (including IFKs and kick-offs, right?) and penalties. Would everything else be touches or fouls? Or would the obvious ones be sub-categories of touches and fouls? A shot would be a sub-category of touches (and other events); a chance would be made up of a collection of events, and a goal would be the result of some collections of events.
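
    One way to make that hierarchy concrete is to write it down as a data structure. This is only a sketch: the type names, fields, and sample players are all assumptions, and it sidesteps the sub-category question by listing restarts next to touches and fouls as sibling event types.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class EventType(Enum):
    TOUCH = "touch"
    FOUL = "foul"
    CORNER = "corner"
    THROW_IN = "throw-in"
    GOAL_KICK = "goal kick"
    FREE_KICK = "free kick"
    PENALTY = "penalty"


@dataclass
class Event:
    minute: int
    team: str
    player: str
    kind: EventType
    is_shot: bool = False   # a shot is treated as a special kind of touch
    scored: bool = False    # True only on the touch that puts the ball in


@dataclass
class Sequence:
    """A run of consecutive events by one team, e.g. a candidate 'chance'."""
    events: List[Event] = field(default_factory=list)

    def has_shot(self) -> bool:
        return any(e.is_shot for e in self.events)

    def is_goal(self) -> bool:
        return any(e.scored for e in self.events)


# Hypothetical sequence: corner, flick-on, finished shot (placeholder players).
seq = Sequence([
    Event(4, "USA", "Player A", EventType.CORNER),
    Event(4, "USA", "Player B", EventType.TOUCH),
    Event(4, "USA", "Player C", EventType.TOUCH, is_shot=True, scored=True),
])
print(seq.has_shot(), seq.is_goal())
```

    With events stored this way, a "chance" or a "goal" is just a property of a sequence, which matches the idea that neither is a discrete event in itself.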
     
  7. NGV

    NGV Member+

    Sep 14, 1999
    The main problem is that human perceptions and memory are unreliable and conditioned by prior biases. In other words, I think we tend to notice and remember things that fit our expectations, and overlook or forget things that don't match those expectations. So, if I'm convinced that Tony Sanneh sucks, I'll notice when he makes a bad pass, and be less likely to remember the good passes he made. If I've accepted the idea that a two-goal lead is "dangerous," I'll remember the few times when a team got complacent while up 2-0 and gave up the lead, and subconsciously disregard the great majority of games in which it didn't happen.

    That's why keeping as objective and complete as possible a record of what happens during games is useful, even if it doesn't lead to clear conclusions about how to evaluate individual players or win games. It's always good to have unbiased information against which we can measure our subjective perceptions, to see if they match.
     
  8. kenntomasch

    kenntomasch Member+

    Sep 2, 1999
    Out West
    Club:
    FC Tampa Bay Rowdies
    Nat'l Team:
    United States
    Re: Re: Sabermetrics applying to Soccer

    And the human mind seeks to draw a connection between discrete events, whether they are actually connected or not.

    I think it's easier to do big-picture things than individual player things sometimes (or, at least the "discrete events" analysis we've been talking about). Like "how often does a team up 2-0 blow the lead?" You can get a fairly reliable answer to that one just by checking because those are quantifiable and reliable numbers (the study I did showed that teams that took a 2-0 lead won outright 90% of the time, which, to me, makes a 2-0 lead almost a "lock", not "the most dangerous lead in soccer").
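
    A rough sketch of that kind of check, assuming each game is available as the list of goals in the order they were scored. The data format, the function name, and the sample games are all hypothetical; this is not the study mentioned above.

```python
def record_after_2_0(games):
    """Tally final results for the team that went 2-0 up in each game.

    Each game is a list of "H"/"A" strings naming the scoring side of
    every goal in order. Returns (wins, draws, losses) from the point of
    view of the side that led 2-0; games that never reached 2-0 or 0-2
    are skipped.
    """
    wins = draws = losses = 0
    for goals in games:
        home = away = 0
        leader = None
        for scorer in goals:
            if scorer == "H":
                home += 1
            else:
                away += 1
            if leader is None and (home, away) in ((2, 0), (0, 2)):
                leader = "H" if home == 2 else "A"
        if leader is None:
            continue
        lead_goals, other_goals = (home, away) if leader == "H" else (away, home)
        if lead_goals > other_goals:
            wins += 1
        elif lead_goals == other_goals:
            draws += 1
        else:
            losses += 1
    return wins, draws, losses


# Made-up goal sequences, for illustration only.
games = [
    ["H", "H"],                  # home leads 2-0 and wins 2-0
    ["A", "A", "H"],             # away leads 0-2 and wins 1-2
    ["H", "H", "A", "A"],        # home leads 2-0, game ends 2-2
    ["H", "A", "H"],             # never 2-0, skipped
    ["A", "A", "H", "H", "H"],   # away leads 0-2 and loses 3-2
]
print(record_after_2_0(games))
```

    The same loop works for 1-0 or 3-0 leads by changing the score pair it looks for.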

    Or "would you rather host the first or second leg of a two-legged, aggregate goals tie?" A limited study I did of the Champions League, UEFA Cup and A-League one year showed that a team's chances of advancing to the next round were almost identical whether they hosted the first or second leg of the tie (I've heard someone did a much larger study over a longer time, strictly in European competition, that showed some advantage to hosting the second leg). That's useful information.

    That corner kick information, that's useful. If we knew that 3% figure held up over time, would anyone change their corner kick strategy, knowing that it's a low-percentage play?

    There are questions, and there are answers. I think whoever said it's about the quest for answers to questions, not necessarily the numbers themselves, was spot-on.
     
  9. beineke

    beineke New Member

    Sep 13, 2000
    That video work is slick ...

    For a different perspective on tracking, here is the work that I did on the US-Portugal game. It took less than an hour, and there's gobs of information there. Karl Keller tried to organize a BigSoccer group to chart the entire game, but most people flaked. Still, I'm willing to do more charting, given a reliable group of people to share the work.

    http://www-stat.stanford.edu/~beineke/portwrite.xls

    One thing I just noticed is that midway through this segment, McBride seems to hit the wall. He wins his first three headers (and had been winning headers all game up to this point). Then he loses four straight and fails to control a ball on the ground ... that's why Portugal was able to mount the steady pressure that led to Agoos's own goal.
     
  10. NER_MCFC

    NER_MCFC Member

    May 23, 2001
    Cambridge, MA
    Club:
    New England Revolution
    Nat'l Team:
    United States
    Did you also look at 1-0 and 3-0 games? I have always thought the 'most dangerous lead' opinion was rooted in the fact that goals are relatively rare in soccer, but not so rare that a 1-0 lead seems safe. I would certainly expect a tendency for a team to relax with a 2-0 lead where they wouldn't at 1-0.

    I remember that there was an item in Soccer America recently saying, in effect, that a team that scores a goal will have a winning percentage (counting ties as .500) of somewhere around .660, with the first goal of the game producing a winning %age of over .700. I guess the question should really be, how does a 2-0 lead compare with 1-0 and 3-0? I would expect them to fall along a continuum of 1-0, 2-0 and 3-0, but I wouldn't be surprised if the spacing wasn't even. Given the rarity of goals in soccer, taking a 2-0 lead and not winning should be dramatically rarer than after 1-0 and somewhat less rare than after 3-0. Is it?
     
  11. joe2

    joe2 New Member

    Oct 14, 2001
    NGV....You are saying (only better) what I have been trying to point out. We can quantify info in soccer. But we have to recognize the problems in our data collection. A good place to start is, as you suggest, around the problem of "chances" and "goals". I think it would be possible to get a reasonable amount of agreement as to what a chance is in the attacking third. And it would be valuable info. We have all seen players who contribute to scoring opportunities without showing up on the stat sheet. This would be a way to remedy that lack of information. What we need next is an operational definition of what a "chance" is. If we can do that we could probably work backward to other definitions of events leading to chances. Anyone want to give us a possible definition of a "chance" that could be observed and agreed on by most people? Here is one which you can pick apart and refine:
    "Chance": any movement with the ball which leads to an opportunity to score a goal within two touches.
    "Opportunity to score": any touch which, if properly executed, results in a shot on goal. These are off the top of my head so feel free to criticize and refine.
     
  12. kenntomasch

    kenntomasch Member+

    Sep 2, 1999
    Out West
    Club:
    FC Tampa Bay Rowdies
    Nat'l Team:
    United States
    It's fairly easy to get the value of a 1-0 lead. MLS keeps stats on "Record when scoring first", which, by definition, means taking a 1-0 lead.


    RECORD WHEN SCORING FIRST GOAL
    TEAM                       W   L   T    PCT
    San Jose Earthquakes       6   0   0  1.000
    MetroStars                 5   0   1   .917
    Columbus Crew              5   0   3   .813
    Chicago Fire               7   2   1   .750
    New England Revolution     5   1   3   .722
    Los Angeles Galaxy         3   0   4   .714
    Colorado Rapids            5   2   1   .688
    Kansas City Wizards        6   3   4   .615
    D.C. United                3   2   1   .583
    Dallas Burn                2   2   1   .500
    MLS TOTAL                 47  12  19   .724


    Teams that scored the first goal went on to win outright 47 of 78 times (.603) - obviously there are some 0-0 games, in which nobody scores first.
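
    The PCT column counts a tie as half a win, which can be checked directly from the W/L/T numbers. A small sketch: the function name `win_pct` is made up here, and the records are the ones in the table.

```python
def win_pct(wins, losses, ties):
    """Winning percentage with each tie counted as half a win."""
    games = wins + losses + ties
    if games == 0:
        raise ValueError("no games played")
    return (wins + 0.5 * ties) / games


# (W, L, T) when scoring first, copied from the table.
records = {
    "San Jose Earthquakes": (6, 0, 0),
    "MetroStars": (5, 0, 1),
    "Columbus Crew": (5, 0, 3),
    "Chicago Fire": (7, 2, 1),
    "New England Revolution": (5, 1, 3),
    "Los Angeles Galaxy": (3, 0, 4),
    "Colorado Rapids": (5, 2, 1),
    "Kansas City Wizards": (6, 3, 4),
    "D.C. United": (3, 2, 1),
    "Dallas Burn": (2, 2, 1),
}

for team, (w, l, t) in records.items():
    print(f"{team:<24}{w:>4}{l:>4}{t:>4}  {win_pct(w, l, t):.3f}")

total_w = sum(w for w, _, _ in records.values())
total_l = sum(l for _, l, _ in records.values())
total_t = sum(t for _, _, t in records.values())
total_games = total_w + total_l + total_t

print(f"MLS total PCT:  {win_pct(total_w, total_l, total_t):.3f}")
print(f"outright wins:  {total_w / total_games:.3f}")
```

    Keeping the two numbers side by side makes the gap visible: the tie-adjusted figure is well above the share of games actually won.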

    We all know that goals are precious in soccer, and it stands to reason that if you get one, your chances of winning increase. But you've got a much better chance of coming back from 0-1 down than 0-2 down, statistically. I haven't looked at 3-0, but I'd be stunned if more than 1 in 100 came back from 3 goals down.

    As for the letdown factor, well, maybe it is, maybe it isn't. You'd think that after all these years of people saying there's a tendency to let up with a 2-0 lead, that everyone would know that and not let up.

    Again, trying to draw a connection between discrete events. Maybe you didn't blow a 2-0 lead because you let up, maybe the other team outplayed you. Maybe there was luck involved. Maybe many things.
     
  13. beineke

    beineke New Member

    Sep 13, 2000
    Thanks, JG. Here's a trend that we might watch for the rest of the season:

    After 5 of the first 9 goals were scored by the visiting team, home field advantage seems to have taken hold. Home teams have scored 10 of the last 14 corner kick goals. Two of those four away goals were scored against the Metros while Pope and Jolley were both out.
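
    A back-of-the-envelope check on how surprising 10 of 14 is, assuming purely for illustration that each corner kick goal is an independent coin flip between home and away. The 50/50 baseline and the function name are assumptions, not something measured from the league data.

```python
from math import comb


def binom_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


print(f"P(at least 10 of 14 | p=0.5) = {binom_tail(10, 14):.3f}")
```

    Under that baseline the tail probability is roughly 9%, so the streak alone is weak evidence. Picking the split point after looking at the data makes it weaker still.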

    Does familiarity with the home conditions make a difference?
     
  14. kenntomasch

    kenntomasch Member+

    Sep 2, 1999
    Out West
    Club:
    FC Tampa Bay Rowdies
    Nat'l Team:
    United States
    I wonder.

    I'd love to break that down in terms of the guy who took the corner kick. Does Ante Razov (who takes most of the Fire's CK's) have a better chance of dropping one in where a teammate can score on it than someone else does? What's a guy's ratio of corner kicks to assists on CK goals? Is there a difference between players, from year-to-year?

    Who scores on corner kicks? Is it taller players, like a Jim Curtin, or crafty players, like a John Spencer? This would be interesting information to have.
     
  15. monster

    monster Member

    Oct 19, 1999
    Hanover, PA
    Club:
    DC United
    Nat'l Team:
    United States
    Great, just what soccer needs - stat geeks. :p Can we get you guys your own forum?

    Your statistically challenged moderator :D
     
  16. kenntomasch

    kenntomasch Member+

    Sep 2, 1999
    Out West
    Club:
    FC Tampa Bay Rowdies
    Nat'l Team:
    United States
    I was about to say you're not quite geeky enough to be posting here. :)

    Hey, if someone wants to give us our own forum, we'd be all for that. Unless someone wants to volunteer to host it in his premium member forum.

    I have a bunch of studies and numbers that are partially done that I've never put anywhere because I couldn't find a good place.
     
  17. superdave

    superdave Member+

    Jul 14, 1999
    VB, VA
    Club:
    DC United
    Nat'l Team:
    United States
    I can't believe it took me so long to join this thread.

    1. Project Scoresheet. About 15 years ago, some people tried to get people in every ballpark to commit to scoring every game. With the Shootout, and 2 games nationally broadcast every Saturday, and a 10 team league, we could do this.

    2. I'm interested in how tactics in MLS are evolving. A question I put to Fox Populi (which didn't make it) was asking whether Harkes thought MLS tactics were moving away from the AM-DM dichotomy (probably best exemplified by the Richie Williams/Marco Etcheverry central midfield tandem) and moving toward a shared responsibility (New England before their recent acquisitions, for example.)

    A. It wouldn't be hard, I would think, to track all of the central midfield pairings (eliminating the Fire and other teams with a 3 man backline), and compare the touches for them in each third of the field. Now, that wouldn't allow us to compare Now to Then, but it would be interesting to see which teams use two way mids and which don't.

    B. I'm also interested in the differences between 3 man and 4 man backlines. I would think that the marking assignments are easier in a 3 man backline...two man markers, and a free defender to pick up runners. I would think, tho, that a 3 man backline would draw fewer offsides. That free defender needs to lay back. But it would be interesting (and fairly easy) to check.

    A weakness of the 3-5-2 is that there's only one flank player on each side. Which invites crosses. So (and this would take some work on definitions) does the header winning percentage of the defenders in a 3 man backline have a bigger impact on goals allowed than a 4 man backline?

    I'm sure you guys can come up with other issues.

    We could pick an issue (or more) to track over a season, or half season, and write it up.

    In 3-6-1. ;)
     
  18. beineke

    beineke New Member

    Sep 13, 2000
    It's definitely the big guys, although that's partly because the little guys aren't usually stationed in the box.

    The most surprising thing I've seen in the data so far is that Chicago has allowed five goals from corners, more than anyone else in the league. And it's not due to the narrow field in Naperville -- all five were on the road. What's wrong with Thornton and all those tall players?

    There is also a hint that defense may be a more important factor than offense. The standard deviation of goals allowed is 1.6, as opposed to 1.3 for goals scored. In 12 games with Pope and/or Jolley, the Metros allowed only 1 goal from a corner. In 5 games without them, they allowed 3.

    I'd like to go into even more detail, but I've got to run...
     
  19. mpruitt

    mpruitt Member

    Feb 11, 2002
    E. Somerville
    Club:
    New England Revolution
    This is great that so many people have been interested in this thread. Beyond the fact that there really should be some kind of Society of Soccer Researchers out there that doesn't exist as of yet, I think that we should ask to have a Stats & Analysis forum added. This thread has been great, but if we are ever to talk about anything truly substantive we should probably do it in a slightly more structured way.

    Keep this stuff coming guys, unfortunately some of this analysis is a little bit over my head, cause I'm a moron. But the whole point is trying to think differently and learn more about the game, isn't it?
     
  20. JG

    JG Member+

    Jun 27, 1999
    After reviewing the videos, it's possible that some of the goals on my list shouldn't count.

    Brad Davis on 5/17--play starts from a short corner, but Dallas makes 4 passes before Davis scores from 25 yards...didn't look like a set play, unlike the other short corner goals I counted.

    Jeff Cunningham on 5/17--play starts from a free kick--MLS match report misidentifies it as a corner.

    John Spencer on 6/14--initial shot is blocked...Spencer collects ball, passes to Chung on the wing, and then heads in Chung's cross.

    Damani Ralph(PK) on 7/19--Revs can't clear corner, but Fire knock the ball around a bit before Curtin draws the pk...the original ck doesn't even show up on the highlight clip.

    The Cunningham one obviously shouldn't count...the others are subjective.

    Of the 19 remaining goals, only 5 are headers.
     
  21. kenntomasch

    kenntomasch Member+

    Sep 2, 1999
    Out West
    Club:
    FC Tampa Bay Rowdies
    Nat'l Team:
    United States
  22. beineke

    beineke New Member

    Sep 13, 2000
    If we're asking what to do on a corner kick, I think that the other four are important cases to include. Based on your descriptions, it does sound as if the plays evolved from a corner.

    My impression is that the great majority of corner kicks are attempts to cross the ball for a header on goal. But even with this small amount of data, that doesn't seem to be a very productive approach. Almost certainly, the success rate is higher for other options ... short corners, or crosses to knockdowns, dummies, flick-ons, or volleys.

    Thanks for doing the follow-up on this.
     
  23. mpruitt

    mpruitt Member

    Feb 11, 2002
    E. Somerville
    Club:
    New England Revolution
    Since I haven't really posted anything substantive in terms of a hypothesis, here's one to throw out there.

    I believe that the biggest deficiency in terms of the stat tracking of soccer games is the Percentage of Possession statistic. As has been mentioned before, possession does not always correlate to gaining points, or goals. I think that a far more accurate way would be to track some kind of team turnovers. Who really cares if a team is trying to bunker down, holding the ball and passing it along its back four? I know that this might be a little bit more difficult to track than, say, a turnover in basketball, but on a team level I think that it could give a lot of insight.

    Perhaps it would be even more difficult and pointless to track for an individual player because people taking on guys one on one is rather rare, but I would be willing to bet that teams, no matter their style, would have a better won/loss/tie record if they had a Plus Dispossession record than a Minus one.
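
    A sketch of how that bet could be tallied once someone has counted, for each match, how often a team took the ball away and how often it gave the ball away. The function names, the tuple layout, and the sample numbers are all hypothetical.

```python
def dispossession_diff(won, lost):
    """Times a team took the ball away minus times it gave the ball away."""
    return won - lost


def record_by_sign(matches):
    """Split a team's W/L/T record by the sign of its differential.

    `matches` is a list of (balls_won, balls_lost, result) tuples, with
    result one of "W", "L", "T". Matches with a zero differential are
    grouped under "even".
    """
    table = {key: {"W": 0, "L": 0, "T": 0} for key in ("plus", "minus", "even")}
    for won, lost, result in matches:
        diff = dispossession_diff(won, lost)
        key = "plus" if diff > 0 else "minus" if diff < 0 else "even"
        table[key][result] += 1
    return table


# Made-up tallies for one team, not real match data.
sample = [(61, 55, "W"), (48, 57, "L"), (52, 52, "T"), (66, 50, "W"), (45, 49, "T")]
print(record_by_sign(sample))
```

    The hard part is the counting, not the arithmetic: what counts as a turnover would need the same kind of agreed definition as a "chance".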

    Given the flukey, breakneck pace of soccer, obviously one wrong move can change a game. But no one's really keeping statistics as to how many wrong moves teams are making. Statistically speaking, I'd bet that some of the best teams in the world would have this ratio be very high, rather than 10 David Regises on the field.

    Edit: PS, nice job by Tom really forcing the issue with Peter Hirdt.
     
  24. beineke

    beineke New Member

    Sep 13, 2000
    One more corner kick goal this week ... Magee sailed it over the pack, and Jolley went backdoor to send a volley into the net.

    Yet again, a corner goal is not a header, and yet again, Chicago is the victim.
     
  25. JG

    JG Member+

    Jun 27, 1999
    You're probably right about that...I tested on the last five years of Premiership results and the mean error in "point percentage" using a Poisson model was 3.1%, which is better than I could get using variants of Pythagorean and goal ratio formulas. It seems to do a better job with the teams at the top of the league.
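
    A minimal sketch of what a Poisson estimate of this kind might look like. The details are assumptions and not necessarily the model tested here: goals scored and goals allowed are treated as independent Poisson variables with the team's per-game averages, "point percentage" is taken as expected points over available points with 3 for a win and 1 for a draw, and the Pythagorean exponent of 2 is only a placeholder.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k goals when goals follow Poisson(mean)."""
    return exp(-mean) * mean**k / factorial(k)


def outcome_probs(goals_for, goals_against, max_goals=15):
    """(P(win), P(draw), P(loss)) from per-game scoring rates.

    Treats goals scored and goals allowed as independent Poisson
    variables and truncates each at `max_goals`.
    """
    p_for = [poisson_pmf(k, goals_for) for k in range(max_goals + 1)]
    p_against = [poisson_pmf(k, goals_against) for k in range(max_goals + 1)]
    win = draw = loss = 0.0
    for f, pf in enumerate(p_for):
        for a, pa in enumerate(p_against):
            if f > a:
                win += pf * pa
            elif f == a:
                draw += pf * pa
            else:
                loss += pf * pa
    return win, draw, loss


def point_percentage(goals_for, goals_against, win_pts=3, draw_pts=1):
    """Expected share of available points (assumed 3 for a win, 1 for a draw)."""
    win, draw, _ = outcome_probs(goals_for, goals_against)
    return (win_pts * win + draw_pts * draw) / win_pts


def pythagorean(goals_for, goals_against, exponent=2.0):
    """Pythagorean-style estimate; the exponent here is a placeholder."""
    return goals_for**exponent / (goals_for**exponent + goals_against**exponent)


# Hypothetical team scoring 1.8 and allowing 1.0 goals per game.
print(f"Poisson:     {point_percentage(1.8, 1.0):.3f}")
print(f"Pythagorean: {pythagorean(1.8, 1.0):.3f}")
```

    One practical difference between the two: the Poisson version gives draws an explicit probability, while a Pythagorean formula has no separate term for them.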

    I stumbled across a pop-scienceish probability book from the UK today...they had a couple pages about soccer...talked about using Poisson distributions to predict results as above, also mentioned a study by a British newspaper showing that the team scoring first in a Premiership match won 69% of the time, plus a Dutch study about how teams perform playing 10 vs. 11, which could presumably be used to determine when a defender is better off committing a red-card foul than allowing a sure goal.
     
