View Full Version : Sabermetrics applying to Soccer
Originally posted by kenntomasch
Wow, you could tell that in a quick scan, how the goals were scored?
Depends on your definition of quick.
That's less than 3% (does that 847 include short corners, where they don't really make an attempt to have the ball drop directly on someone's head to get it on net? I'm guessing it does).
It probably does...I counted a goal or two that came from crosses after short corners.
Real Ray
06 Aug 2003, 11:55 PM
....As Joe pointed out, why should we concern ourselves with the difficulty of defining something nebulous like a "chance" when we don't even know comparatively simple things like how often goals are scored off of corner kicks?
Because unlike baseball, stats like "shots on goal" in soccer are too empty; they are poor in value compared to many of baseball's basic stats. IMO a sport like soccer needs its stats to be placed in a more defined context in order to provide a deeper understanding. I think there has to be an element of subjectivity in a stat like "shots on goal," as shots vary to a wide degree in soccer.
What I've done to illustrate this, is scored the first 20 minutes of the US-Port match in WC 2002.
http://www.geocities.com/castmind/usa02.html
(It's a Geocities site, so you may get the "try again" message.)
I did not score passes, corners, saves, etc., but obviously, they would be included in a total breakdown of the game. I only scored what I describe as, "Attempts On Goal." These are broken down into two categories:
1. "Chances": A play in the offensive third that results in, or by any reasonable measure should result in, a shot on goal.
2. "Shot": Attempts on goal that fall outside the definition of a chance: an attempt to chip a keeper, or free kicks/shots deep in or beyond the offensive third, for instance.
In the first 9:24 of the match, there were 6 Attempts On Goal. 2 were Chances (McBride, O'Brien); 4 were Shots (Stewart, Pope, McBride, and Donovan).
I also included McBride's goal, just to flesh out the stats, but did not score all the way through to the 36th min.
My wish is for something like this to be available on cd-rom for each team: "MLS Abstract: MetroStars 2003." Each game would have a page like the front page, with the lineup and a link to each player's game summary page, detailing all of his stats, including video clips. At the end would be the final numbers crunch, with stats like "Chance Conversion Rate," and some type of author/expert summary of the data.
Not very likely of course, but I might do it just for the hell of it during the next WCQ. Tom is right though, it's pretty difficult and painstaking.
TomEaton
07 Aug 2003, 02:13 AM
I think you misunderstood me, Ray; what I meant was that studying something like the number of "chances" converted is going to be difficult because different observers will disagree as to what was a chance and what wasn't, whereas everyone agrees what a corner kick is. As you point out, even something like what constitutes a "shot" can be the subject of disagreement. I personally doubt that any definition of a "chance" will eliminate the subjectivity factor to a degree great enough to come to any reliable conclusions. That said, I invite people to try.
Originally posted by Real Ray
This idea that you could not agree-I think if you polled a group of coaches or asked them to each watch a match alone and mark on a sheet what they thought were the chances in a match, it would be pretty damn close.
Whether different people's observations would be close and how close they are is crucial, and that can (and should) be tested. So if it turns out your observers agree 90% of the time, an analysis based on subjective coding of plays as "chances" or "non-chances" may be useful. If they agree 20% of the time, it probably isn't worth the effort. In social sciences, this is called "intercoder reliability."
That kind of study - quantitative study based on subjective "coding" of events or things - isn't uncommon in political science, for instance. An example would be ratings of countries' level of "democracy" or other political regime characteristics, which have become somewhat widely used. Obviously, we can't objectively prove that Malaysia or whoever scores a "4" on civil liberties. But if we ask ten different experts to rate the country on a five point scale using the most clear and consistent criteria we can come up with, and nine of them say "4" while the tenth says "3," chances are our measurement isn't too bad, and using it to try to determine if a country's degree of protection of civil liberties correlates with other things (say, levels of economic growth) may be worthwhile.
So, while these types of measures are inferior to ones where there's no real chance of disagreement among observers, they aren't inherently useless, and are sometimes a lot better than nothing. The question is, how much additional insight do we gain by using "chances" instead of "goals," and is it enough to compensate for the added subjectivity and the problems that go along with it?
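A rough sketch of how that kind of check could be run, for anyone who wants to try it: have two observers label the same plays, then compute their raw agreement rate and Cohen's kappa, which discounts the agreement you would expect from chance alone. The two lists of labels below are made up for illustration, and the function names are hypothetical; nothing here comes from an actual charted match.

```python
from collections import Counter

def percent_agreement(coder_a, coder_b):
    """Share of plays on which the two coders gave the same label."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two equal-length, non-empty lists of labels")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(coder_a)
    p_o = percent_agreement(coder_a, coder_b)
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    # Expected agreement if each coder labeled at random with their own label frequencies.
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

# Made-up codings of ten plays by two observers (illustration only).
observer_1 = ["chance", "shot", "chance", "shot", "shot",
              "chance", "shot", "shot", "chance", "shot"]
observer_2 = ["chance", "shot", "shot", "shot", "shot",
              "chance", "shot", "chance", "chance", "shot"]

print(f"agreement: {percent_agreement(observer_1, observer_2):.2f}")
print(f"kappa:     {cohens_kappa(observer_1, observer_2):.2f}")
```

Kappa is only one of several reliability measures, and with more than two observers something else would be needed; the point is just that "would coaches agree?" is a question the data can answer.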
Real Ray
07 Aug 2003, 06:46 AM
Yeah.
The one point that Tom makes that I think you need, is the raw collection of data-an index that the scorer(s) could have handy.
For instance, of the 6 Attempts On Goal (there were 3 not 2 chances, my typo) the one I had the hardest time scoring was Stewart's free kick that created a rebound for Pope. If it was Beckham, I would have scored it a chance, knowing Beckham's quality. But not having seen enough of Stewart and not having any hard data to show percentages for free kicks, I went just on my own view of the play and scored it as a shot. But I could be wrong and the data might show that this play for Stewart has a high enough percentage to note it as a chance.
It's a fun exercise though...well, if you enjoy this sort of thing :)
NER_MCFC
07 Aug 2003, 09:08 AM
One example of the difficulties presented by the relative lack of unequivocally definable discrete events in soccer is playing itself out in this (very interesting) discussion. A goal correlates to a run in baseball or a touchdown in football, a result of a number of discrete events, not a discrete event in itself. Similarly, a chance probably correlates to an at-bat or an offensive series, a collection of discrete events that might or might not result in a score.
Any thoughts on what the irreducible events in soccer are? There are the obvious ones, corners, throws, goal kicks, free kicks (including IFKs and kick-offs, right?) and penalties. Would everything else be touches or fouls? Or would the obvious ones be sub-categories of touches and fouls? A shot would be a sub-category of touches (and other events); a chance would be made up of a collection of events, and a goal would be the result of some collections of events.
Originally posted by voros
The problem with intuitive thinking, is that despite being unthinkably valuable, it has its pitfalls. This is why we think more babies are born during full moons, eggs can be stood on end during the equinox, and that more domestic violence occurs on Super Bowl sunday than any other day of the year.
The main problem is that human perceptions and memory are unreliable and conditioned by prior biases. In other words, I think we tend to notice and remember things that fit our expectations, and overlook or forget things that don't match those expectations. So, if I'm convinced that Tony Sanneh sucks, I'll notice when he makes a bad pass, and be less likely to remember the good passes he made. If I've accepted the idea that a two goal lead is "dangerous," I'll remember the few times when a team got complacent while up 2-0 and gave up the lead, and subconsciously disregard the great majority of games in which it didn't happen.
That's why keeping as objective and complete as possible a record of what happens during games is useful, even if it doesn't lead to clear conclusions about how to evaluate individual players or win games. It's always good to have unbiased information against which we can measure our subjective perceptions, to see if they match.
kenntomasch
07 Aug 2003, 10:34 AM
Originally posted by NGV
The main problem is that human perceptions and memory are unreliable and conditioned by prior biases.
And the human mind seeks to draw a connection between discrete events, whether they are actually connected or not.
I think it's easier to do big-picture things than individual player things sometimes (or, at least the "discrete events" analysis we've been talking about). Like "how often does a team up 2-0 blow the lead?" You can get a fairly reliable answer to that one just by checking because those are quantifiable and reliable numbers (the study I did showed that teams that took a 2-0 lead won outright 90% of the time, which, to me, makes a 2-0 lead almost a "lock", not "the most dangerous lead in soccer").
Or "would you rather host the first or second leg of a two-legged, aggregate goals tie?" A limited study I did of the Champions League, UEFA Cup and A-League one year showed that a team's chances of advancing to the next round were almost identical whether they hosted the first or second leg of the tie (I've heard someone did a much larger study over a longer time, strictly in European competition, that showed some advantage to hosting the second leg). That's useful information.
That corner kick information, that's useful. If we knew that 3% figure held up over time, would anyone change their corner kick strategy, knowing that it's a low-percentage play?
There are questions, and there are answers. I think whoever said it's about the quest for answers to questions, not necessarily the numbers themselves, was spot-on.
beineke
07 Aug 2003, 10:59 AM
Originally posted by Real Ray
What I've done to illustrate this, is scored the first 20 minutes of the US-Port match in WC 2002.
http://www.geocities.com/castmind/usa02.html
That video work is slick ...
For a different perspective on tracking, here is the work that I did on the US-Portugal game. It took less than an hour, and there's gobs of information there. Karl Keller tried to organize a BigSoccer group to chart the entire game, but most people flaked. Still, I'm willing to do more charting, given a reliable group of people to share the work.
http://www-stat.stanford.edu/~beineke/portwrite.xls
One thing I just noticed is that midway through this segment, McBride seems to hit the wall. He wins his first three headers (and had been winning headers all game up to this point). Then he loses four straight and fails to control a ball on the ground ... that's why Portugal was able to mount the steady pressure that led to Agoos's own goal.
NER_MCFC
07 Aug 2003, 11:00 AM
Originally posted by kenntomasch
the study I did showed that teams that took a 2-0 lead won outright 90% of the time, which, to me, makes a 2-0 lead almost a "lock", not "the most dangerous lead in soccer"
Did you also look at 1-0 and 3-0 games? I have always thought the 'most dangerous lead' opinion was rooted in the fact that goals are relatively rare in soccer, but not so rare that a 1-0 lead seems safe. I would certainly expect a tendency for a team to relax with a 2-0 lead where they wouldn't at 1-0.
I remember that there was an item in Soccer America recently saying, in effect, that a team that scores a goal will have a winning percentage (counting ties as .500) of somewhere around .660, with the first goal of the game producing a winning %age of over .700. I guess the question should really be, how does a 2-0 lead compare with 1-0 and 3-0. I would expect them to fall along a continuum of 1-0, 2-0 and 3-0, but I wouldn't be surprised if the spacing wasn't even. Given the rarity of goals in soccer, taking a 2-0 lead and not winning should be dramatically rarer than after 1-0 and somewhat less rare than after 3-0. Is it?
joe2
07 Aug 2003, 11:09 AM
Originally posted by NGV
Whether different people's observations would be close and how close they are is crucial, and that can (and should) be tested. So if it turns out your observers agree 90% of the time, an analysis based on subjective coding of plays as "chances" or "non-chances" may be useful. If they agree 20% of the time, it probably isn't worth the effort. In social sciences, this is called "intercoder reliability."
That kind of study - quantitative study based on subjective "coding" of events or things - isn't uncommon in political science, for instance. An example would be ratings of countries' level of "democracy" or other political regime characteristics, which have become somewhat widely used. Obviously, we can't objectively prove that Malaysia or whoever scores a "4" on civil liberties. But if we ask ten different experts to rate the country on a five point scale using the most clear and consistent criteria we can come up with, and nine of them say "4" while the tenth says "3," chances are our measurement isn't too bad, and using it to try to determine if a country's degree of protection of civil liberties correlates with other things (say, levels of economic growth) may be worthwhile.
So, while these types of measures are inferior to ones where there's no real chance of disagreement among observers, they aren't inherently useless, and are sometimes a lot better than nothing. The question is, how much additional insight do we gain by using "chances" instead of "goals," and is it enough to compensate for the added subjectivity and the problems that go along with it?
NGV....You are saying (only better) what I have been trying to point out. We can quantify info in soccer. But we have to recognize the problems in our data collection. A good place to start is, as you suggest, around the problem of "chances" and "goals". I think it would be possible to get a reasonable amount of agreement as to what a chance is in the attacking third. And it would be valuable info. We have all seen players who contribute to scoring opportunities without showing up on the stat sheet. This would be a way to remedy that lack of information. What we need next is an operational definition of what a "chance" is. If we can do that we could probably work backward to other definitions of events leading to chances. Anyone want to give us a possible definition of a "chance" that could be observed and agreed on by most people? Here is one which you can pick apart and refine:
"Chance" any movement with the ball which leads to an opportunity to score a goal within two touches.
"Opportunity to score" any touch which if properly executed results in a shot on goal. These are off the top of my head so feel free to criticize and refine.
kenntomasch
07 Aug 2003, 11:26 AM
It's fairly easy to get the value of a 1-0 lead. MLS keeps stats on "Record when scoring first", which, by definition, means taking a 1-0 lead.
RECORD WHEN SCORING FIRST GOAL
TEAM                       W   L   T    PCT
San Jose Earthquakes       6   0   0  1.000
MetroStars                 5   0   1   .917
Columbus Crew              5   0   3   .813
Chicago Fire               7   2   1   .750
New England Revolution     5   1   3   .722
Los Angeles Galaxy         3   0   4   .714
Colorado Rapids            5   2   1   .688
Kansas City Wizards        6   3   4   .615
D.C. United                3   2   1   .583
Dallas Burn                2   2   1   .500
MLS TOTAL                 47  12  19   .724
Teams that scored the first goal went on to win outright 47 of 78 times (.602) - obviously there are some 0-0 games, in which nobody scores first.
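For anyone who wants to check the PCT column or extend it to other situations (2-0 leads, say), here is a small sketch of the arithmetic. It assumes PCT counts a tie as half a win, which is what the numbers in the table work out to; the function name and the dictionary layout are just illustrative choices. Decimal is used so that exact halves like .8125 round up the way the table does.

```python
from decimal import Decimal, ROUND_HALF_UP

# (wins, losses, ties) when scoring first, copied from the table above.
records = {
    "San Jose Earthquakes": (6, 0, 0),
    "MetroStars": (5, 0, 1),
    "Columbus Crew": (5, 0, 3),
    "Chicago Fire": (7, 2, 1),
    "New England Revolution": (5, 1, 3),
    "Los Angeles Galaxy": (3, 0, 4),
    "Colorado Rapids": (5, 2, 1),
    "Kansas City Wizards": (6, 3, 4),
    "D.C. United": (3, 2, 1),
    "Dallas Burn": (2, 2, 1),
}

def points_pct(wins, losses, ties):
    """Winning percentage with a tie counted as half a win, to three places."""
    games = wins + losses + ties
    if games == 0:
        raise ValueError("no games played")
    pct = Decimal(2 * wins + ties) / Decimal(2 * games)
    return pct.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)

# Column sums give the league-wide record when scoring first.
total = tuple(sum(column) for column in zip(*records.values()))

for team, record in records.items():
    print(f"{team:<24}{points_pct(*record)}")
print(f"{'MLS TOTAL':<24}{points_pct(*total)}")

wins, games = total[0], sum(total)
print(f"outright wins: {wins} of {games} ({wins / games:.0%})")
```

The same function would work for a "record when leading 2-0" table if someone compiles the W-L-T counts.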
We all know that goals are precious in soccer, and it stands to reason that if you get one, your chances of winning increase. But you've got a much better chance of coming back from 0-1 down than 0-2 down, statistically. I haven't looked at 3-0, but I'd be stunned if more than 1 in 100 came back from 3 goals down.
As for the letdown factor, well, maybe it is, maybe it isn't. You'd think that after all these years of people saying there's a tendency to let up with a 2-0 lead, that everyone would know that and not let up.
Again, trying to draw a connection between discrete events. Maybe you didn't blow a 2-0 lead because you let up, maybe the other team outplayed you. Maybe there was luck involved. Maybe many things.
beineke
07 Aug 2003, 11:48 AM
Originally posted by JG
I kept a list of them, but not terribly detailed.
7/26 San Jose 1st goal
7/19 Chicago 3rd goal
7/19 San Jose 1st goal
7/16 Dallas 2nd goal
7/12 New England 1st goal
7/4 Colorado 1st goal
7/2 San Jose 1st goal
7/2 San Jose 3rd goal
6/21 Kansas City 1st goal
6/18 Los Angeles 1st goal
6/14 Colorado 1st goal
6/7 DC United 1st goal
5/31 Dallas 2nd goal
5/31 San Jose 1st goal
5/24 Columbus 2nd goal
5/24 Dallas 1st goal
5/17 Columbus 2nd goal
5/17 Los Angeles 1st goal
5/17 Dallas 1st goal
5/17 New England 2nd goal
4/26 Kansas City 2nd goal
4/26 Metrostars 1st goal
4/19 Kansas City 1st goal
Thanks, JG. Here's a trend that we might watch for the rest of the season:
After 5 of the first 9 goals were scored by the visiting team, home field advantage seems to have taken hold. Home teams have scored 10 of the last 14 corner kick goals. Two of those four away goals were scored against the Metros while Pope and Jolley were both out.
Does familiarity with the home conditions make a difference?
kenntomasch
07 Aug 2003, 12:03 PM
I wonder.
I'd love to break that down in terms of the guy who took the corner kick. Does Ante Razov (who takes most of the Fire's CK's) have a better chance of dropping one in where a teammate can score on it than someone else does? What's a guy's ratio of corner kicks to assists on CK goals? Is there a difference between players, from year-to-year?
Who scores on corner kicks? Is it taller players, like a Jim Curtin, or crafty players, like a John Spencer? This would be interesting information to have.
monster
07 Aug 2003, 12:14 PM
Great, just what soccer needs - stat geeks. :p Can we get you guys your own forum?
Your statistically challenged moderator :D
kenntomasch
07 Aug 2003, 12:19 PM
I was about to say you're not quite geeky enough to be posting here. :)
Hey, if someone wants to give us our own forum, we'd be all for that. Unless someone wants to volunteer to host it in his premium member forum.
I have a bunch of studies and numbers that are partially done that I've never put anywhere because I couldn't find a good place.
superdave
07 Aug 2003, 12:23 PM
I can't believe it took me so long to join this thread.
1. Project Scoresheet. About 15 years ago, some people tried to get people in every ballpark to commit to scoring every game. With the Shootout, and 2 games nationally broadcast every Saturday, and a 10 team league, we could do this.
2. I'm interested in how tactics in MLS are evolving. A question I put to Fox Populi (which didn't make it) was asking whether Harkes thought MLS tactics were moving away from the AM-DM dichotomy (probably best exemplified by the Richie Williams/Marco Etcheverry central midfield tandem) and moving toward a shared responsibility (New England before their recent acquisitions, for example.)
A. It wouldn't be hard, I would think, to track all of the central midfield pairings (eliminating the Fire and other teams with a 3 man backline), and compare the touches for them in each third of the field. Now, that wouldn't allow us to compare Now to Then, but it would be interesting to see which teams use two way mids and which don't.
B. I'm also interested in the differences between 3 man and 4 man backlines. I would think that the marking assignments are easier in a 3 man backline...two man markers, and a free defender to pick up runners. I would think, tho, that a 3 man backline would draw fewer offsides. That free defender needs to lay back. But it would be interesting (and fairly easy) to check.
A weakness of the 3-5-2 is that there's only one flank player on each side. Which invites crosses. So (and this would take some work on definitions) does the header winning percentage of the defenders in a 3 man backline have a bigger impact on goals allowed than a 4 man backline?
I'm sure you guys can come up with other issues.
We could pick an issue (or more) to track over a season, or half season, and write it up.
In 3-6-1. ;)
beineke
07 Aug 2003, 12:25 PM
Originally posted by kenntomasch
Who scores on corner kicks?
It's definitely the big guys, although that's partly because the little guys aren't usually stationed in the box.
The most surprising thing I've seen in the data so far is that Chicago has allowed five goals from corners, more than anyone else in the league. And it's not due to the narrow field in Naperville -- all five were on the road. What's wrong with Thornton and all those tall players?
There is also a hint that defense may be a more important factor than offense. The standard deviation of goals allowed is 1.6, as opposed to 1.3 for goals scored. In 12 games with Pope and/or Jolley, the Metros allowed only 1 goal from a corner. In 5 games without them, they allowed 3.
I'd like to go into even more detail, but I've got to run...
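One way to sanity-check the Pope/Jolley split quoted above (1 corner goal allowed in 12 games with them, 3 in 5 games without) is sketched below. The test is an assumption on my part, not something from the post: it supposes corner goals arrive at the same per-game rate in every match, so that, given 4 goals in total, the number falling in the 5 games without them follows a Binomial(4, 5/17) distribution. The function name is hypothetical.

```python
from math import comb

# Figures quoted in the post above.
games_with, goals_with = 12, 1        # Pope and/or Jolley playing
games_without, goals_without = 5, 3   # both out

rate_with = goals_with / games_with
rate_without = goals_without / games_without

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Under the equal-rate assumption, each goal lands in a "without" game
# with probability equal to that group's share of all games played.
total_goals = goals_with + goals_without
share_without = games_without / (games_with + games_without)
p_value = binom_tail(goals_without, total_goals, share_without)

print(f"corner goals allowed per game, with:    {rate_with:.2f}")
print(f"corner goals allowed per game, without: {rate_without:.2f}")
print(f"chance of 3+ of the 4 goals landing in those 5 games: {p_value:.3f}")
```

With only four goals in the sample, a single extra goal either way would move the result a lot, so this is a hint to keep watching rather than a conclusion.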
mpruitt
07 Aug 2003, 03:43 PM
It's great that so many people have been interested in this thread. Beyond the fact that there really should be some kind of Society of Soccer Researchers out there that doesn't exist as of yet, I think that we should ask to have a Stats & Analysis forum added. This thread has been great, but if we are ever to talk about anything truly substantive we should probably do it in a slightly more structured way.
Keep this stuff coming guys, unfortunately some of this analysis is a little bit over my head, cause I'm a moron. But the whole point is trying to think differently and learn more about the game, isn't it?
After reviewing the videos, it's possible that some of the goals on my list shouldn't count.
Brad Davis on 5/17--play starts from a short corner, but Dallas makes 4 passes before Davis scores from 25 yards...didn't look like a set play, unlike the other short corner goals I counted.
Jeff Cunningham on 5/17--play starts from a free kick--MLS match report misidentifies it as a corner.
John Spencer on 6/14--initial shot is blocked...Spencer collects ball, passes to Chung on the wing, and then heads in Chung's cross.
Damani Ralph(PK) on 7/19--Revs can't clear corner, but Fire knock the ball around a bit before Curtin draws the pk...the original ck doesn't even show up on the highlight clip.
The Cunningham one obviously shouldn't count...the others are subjective.
Of the 19 remaining goals, only 5 are headers.