Sabermetrics applying to Soccer

kenntomasch
04 Aug 2003, 07:05 AM
The other thing you could do, Tom, is check MLSnet.com after a team's game, when the league has updated the stats. Every goalkeeper's catch/punch stats are there. If you know that Zach Thornton had 42 one week, and the next week he had 44, you know he had two in the previous game.

Still laborious, but not as bad as the alternatives.
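A minimal sketch of the bookkeeping Kenn describes, assuming you copy down a keeper's season catch/punch total after each weekly update. The function name is hypothetical. One caveat: if a team plays more than once between updates, a difference covers all of those games combined.

```python
def counts_between_updates(season_totals: list[int]) -> list[int]:
    """Turn successive cumulative season totals into per-interval counts.

    Totals of 42 one week and 44 the next mean 2 catches/punches in between.
    If the team played more than once between updates, an interval covers
    all of those games combined.
    """
    counts = []
    for earlier, later in zip(season_totals, season_totals[1:]):
        if later < earlier:
            # A falling total means a transcription error or a retroactive correction.
            raise ValueError(f"season total fell from {earlier} to {later}")
        counts.append(later - earlier)
    return counts


# Thornton's totals from the example: 42 one week, 44 the next.
print(counts_between_updates([42, 44]))  # [2]
```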

joe2
04 Aug 2003, 08:12 AM
Originally posted by TomEaton
For anybody who's thinking they might like to try to do some MLS statistics interpretation, just be prepared for frustration. Peter Hirdt's latest column on MLSnet asserted that there was an incredibly large correlation between--get this--goalkeeper catches/punches and winning. Yeah, I know, my first reaction to this was the same as yours: that just can't be true. But he did back up his assertions with some supporting data.

Actually, that statistic makes sense to me. The team that has fallen behind tends to push up and try to create more scoring opportunities. It will also take lower quality shots (more easily deflected or caught) in desperation to create a scoring chance. The team that is ahead tends to lay back and play to protect the lead.

This illustrates the problem with statistical data. You can find correlations with all kinds of things, but that does not imply causality. If that statistic accurately reflected the cause of winning (the keeper making more catches and saves), then the obvious plan of attack for a statistically "intelligent" coach would be to allow the opponent more shots on goal, so the keeper could make more saves. Of course, that would be silly. Look at baseball as an example.

The problem remains: what events in soccer can be accurately recorded, and do those events have predictive value in a correlation? In a team sport as complex and fluid as soccer, I doubt it. No one has yet given a precise definition of when a "chance" begins....

TomEaton
04 Aug 2003, 04:19 PM
Good idea, Kenn. I didn't realize they updated the season statistics every week. I personally am not interested in trying to keep up with the stats every week, but maybe someone more dedicated could do it.

Joe, I agree with you to some extent, and your proffered explanation of why catches/punches might be greater for the winning side is the first one that struck my mind as well. I also understand (as I think everybody who's bothered to follow this thread this far does) that correlation is not the same as causation. The point, though, is that we don't KNOW. We might be able to get some evidence one way or the other if we could, say, tally up the number of catches/punches after the winning team had taken the lead as compared to when the game was tied. If the ratio increased after the lead had been taken, that would lend support to your theory. If it didn't, then it would refute that idea. Without any evidence either way, all we're doing is guessing.
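One way to run the comparison Tom proposes, sketched under assumptions: only decided matches are included, and each is recorded as the minutes in which each side scored plus the minutes of the winning keeper's catches/punches. That record format and all names below are hypothetical, not anything the league publishes. Because a team can spend very different amounts of time tied and ahead, the sketch compares rates per 90 minutes in each score state rather than raw counts.

```python
from dataclasses import dataclass


@dataclass
class Match:
    # Hypothetical record format; minutes run from 1 to `length`.
    length: int                # minutes played, e.g. 90
    winner_goals: list[int]    # minutes in which the eventual winner scored
    loser_goals: list[int]     # minutes in which the other side scored
    keeper_actions: list[int]  # minutes of the winning keeper's catches/punches


def winner_state(match: Match, minute: int) -> str:
    """Score state for the eventual winner during `minute`, counting only earlier goals."""
    diff = sum(g < minute for g in match.winner_goals) - sum(g < minute for g in match.loser_goals)
    if diff > 0:
        return "leading"
    if diff < 0:
        return "trailing"
    return "tied"


def keeper_rates_by_state(matches: list[Match]) -> dict[str, float]:
    """Winning keeper's catches/punches per 90 minutes spent in each score state."""
    actions = {"leading": 0, "tied": 0, "trailing": 0}
    minutes = {"leading": 0, "tied": 0, "trailing": 0}
    for match in matches:
        for minute in range(1, match.length + 1):
            minutes[winner_state(match, minute)] += 1
        for minute in match.keeper_actions:
            actions[winner_state(match, minute)] += 1
    # Skip states the sample never visited, to avoid dividing by zero.
    return {s: 90 * actions[s] / minutes[s] for s in actions if minutes[s] > 0}
```

If, across many matches, the "leading" rate is consistently higher than the "tied" rate, that would lend support to Joe's explanation; if not, it would count against it.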

By the way, I e-mailed Peter Hirdt again to ask him whether it was possible to buy the raw statistical data from Elias, and how much it would cost. He replied curtly that the raw statistics are neither for distribution nor for sale, which makes me wonder why they're bothering to collect them.

mpruitt
04 Aug 2003, 04:48 PM
Originally posted by joe2
The problem remains: what events in soccer can be accurately recorded, and do those events have predictive value in a correlation? In a team sport as complex and fluid as soccer, I doubt it. No one has yet given a precise definition of when a "chance" begins....

Instead of worrying about how more statistical analysis wouldn't work with soccer, why not try a less traditional, deconstructionist view? There would probably be no way to accurately and objectively define what a 'chance' is, but that's too narrow a focus. Right now, either way, people make statements that a certain player is good at taking and scoring on all of his chances, and that's completely subjective, entirely an opinion. The idea is to try to find ways to objectify at least some of these things. If I were going to try to see how accurate a team's scoring opportunities are, I'd first start with the team's goals-scored average and overall team possession in the first third, then go to individual players' possession vs. dispossession rates. I'd compare that with the team's goals-against average: overall, then their defending in the final third, individual players' defending, and their defending in particular segments of the field. Then I'd try to look at the specific segments of the field from which shots were taken when the other team has scored. That's at least one way, off the top of my head, that you could begin to generate some kind of objective knowledge about what kind of team is taking and scoring on enough of its chances.

Furthermore, the idea of trying to quantify a chance is a bit silly anyways. That's like trying to quantify what baseball player is 'clutch.'

microbrew
04 Aug 2003, 06:14 PM
Perhaps a better way to think about this: ask a question, then crunch the math, then refine the question, and so on.

In this case: what effect does a goalkeeper's ratio of punches to catches have on the outcome of the game?

It's not enough just to prove correlation, though; some kind of explanation is needed, along with stats backing up the explanation, and, perhaps most importantly, a way of testing the explanation.

A question I might have: What effect does a yellow card have on the carded player's ratio of attempted tackles to tackles won? I'll keep refining that question, and ask other questions until I build an answer to how a yellow card affects a player's play.
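A first pass at microbrew's yellow-card question might look like the sketch below. It assumes a hypothetical per-player list of (minute, won) tackle attempts from one match, which no source in the thread provides. One match is a tiny sample, so in practice the before/after splits would be pooled over many bookings before drawing any conclusion.

```python
def tackle_win_rates(tackles: list[tuple[int, bool]], card_minute: int):
    """Split one player's tackle attempts around the minute he was booked.

    `tackles` holds hypothetical (minute, won) pairs. Attempts in the card
    minute itself are dropped, since their order relative to the card is unknown.
    Returns (rate_before, rate_after): tackles won divided by tackles attempted,
    or None where no tackles were attempted.
    """
    def win_rate(subset):
        return sum(won for _, won in subset) / len(subset) if subset else None

    before = [t for t in tackles if t[0] < card_minute]
    after = [t for t in tackles if t[0] > card_minute]
    return win_rate(before), win_rate(after)
```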

kenntomasch
04 Aug 2003, 06:20 PM
That's the way to do it. Don't start with the numbers, start with the question. I always like to look at things people say (Like "A 2-0 lead is the most dangerous lead in soccer") and then see what the evidence is that would support that or tear it down (turns out you win outright 90% of the time when you go up 2-0, and at home you're almost a mortal lock).

Is there any way to support the things that people say are true or just accept as true? That's what I look for.
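Kenn's check on "a 2-0 lead is the most dangerous lead" can be sketched as follows, assuming each match is stored as a list of (minute, side) goal events with side "home" or "away" (a hypothetical format). It finds the first moment either side leads 2-0 and records how that side finished. The 90% figure above is Kenn's own; the sketch shows only the tallying method, not his data.

```python
from collections import Counter


def two_nil_outcomes(matches: list[list[tuple[int, str]]]) -> Counter:
    """Tally (leading side, final result) for every match in which a side went 2-0 up."""
    tally = Counter()
    for goals in matches:
        home = away = 0
        leader = None
        # Minute order; goals in the same minute are ordered by side name (arbitrary tie-break).
        for _, side in sorted(goals):
            if side == "home":
                home += 1
            else:
                away += 1
            if leader is None and (home, away) in {(2, 0), (0, 2)}:
                leader = "home" if home == 2 else "away"
        if leader is None:
            continue  # nobody ever led 2-0
        mine, theirs = (home, away) if leader == "home" else (away, home)
        result = "win" if mine > theirs else "draw" if mine == theirs else "loss"
        tally[(leader, result)] += 1
    return tally


def win_share(tally: Counter, side: str) -> float:
    """Fraction of 2-0 leads that `side` ("home" or "away") held on to win."""
    total = sum(n for (s, _), n in tally.items() if s == side)
    return tally[(side, "win")] / total if total else float("nan")
```

Splitting by home and away keeps the home-field effect Kenn mentions visible instead of averaging it away.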

joe2
04 Aug 2003, 06:22 PM
Originally posted by TomEaton
Joe, I agree with you to some extent, and your proffered explanation of why catches/punches might be greater for the winning side is the first one that struck my mind as well. I also understand (as I think everybody who's bothered to follow this thread this far does) that correlation is not the same as causation. The point, though, is that we don't KNOW. We might be able to get some evidence one way or the other if we could, say, tally up the number of catches/punches after the winning team had taken the lead as compared to when the game was tied. If the ratio increased after the lead had been taken, that would lend support to your theory. If it didn't, then it would refute that idea. Without any evidence either way, all we're doing is guessing.


I tend to agree with this idea, Tom. You have identified data which can probably be reliably collected and evaluated, and I think you pose the question in a testable format.

joe2
04 Aug 2003, 06:44 PM
Originally posted by maxim-1
Right now either way, people make statements that a certain person is good at taking and scoring on all of his chances and that's completely subjective, absolutely, entirely an opinion. The idea is to try to find ways where we could objectify at least some of these things....

Furthermore, the idea of trying to quantify a chance is a bit silly anyways. That's like trying to quantify what baseball player is 'clutch.'

MAXIM...I am in some agreement with your idea. There are some things which can be "objectively" counted. I disagree that a person's perception is only his opinion, however. Opinions and analysis are (or can be) based on years of observation and experience. I can tell if a player is good at making space, taking shots, etc. and so can you, based on your experience in observing soccer. Now, if I was to try to evaluate the performance of a judo expert, that would be completely subjective since I have little or no experience with the sport. But you are right, at the present time we do not attempt to quantify that knowledge statistically. That does not mean the knowledge does not exist or those non-counted observations are not valid.

I also agree with you that trying to quantify "chance" is like trying to quantify "clutch" in baseball. But that does not mean "clutch" is not important...perhaps more important than any other aspect of a player's performance.

I am not arguing for the sake of argument but only because I have a great deal of trouble trying to figure out how many important aspects of a team's performance could be objectively observed and quantified (in soccer). That is why I think it is extremely important to be precise about terminology.

As an aside, in the A-League statisticians will only allow one assist per goal, whereas in the past two assists could be awarded. Two assists more accurately reflect the importance of team play in setting up a goal. As a result of that stat change, many fine defenders and midfielders are no longer being credited with the great pass that led to the second touch that led to a goal. The assist and goal stats are easily recorded and identified but have, in fact, lost some of their meaning because of the change in how that particular stat is recorded. I have no idea why they made the change.

kenntomasch
04 Aug 2003, 06:59 PM
When I was in the A-League years ago they discontinued the two assists. In D3 even before that.

voros
04 Aug 2003, 07:53 PM
I suppose I'm obligated to reply to this thread, though I'm not sure how I should reply.

I'm guessing I'm someone who is the pivot point on this issue, so I'll give my bio and then my thoughts.

My name is Voros McCracken and I'm one of those Sabermetrics guys out there (though I despise that term). I got a few pages in Moneyball, have written for Baseball Prospectus, Primer, etc., and currently consult for the Boston Red Sox along with Bill James. I basically started posting on Big Soccer because after being hired by the Red Sox, my hobby of posting on baseball message boards and such got severely restricted (for obvious reasons).

The biggest hurdle to clear in baseball, and by connection soccer, is the perception that what guys like me do is about statistics. It is NOT about statistics. It is about using reliable methods and the information available to analyze the game of baseball (and soccer). The idea is to subject the sport to the same standards of inquiry as other endeavors like pharmaceutical research, psychology, and other disciplines where the scientific method is employed to learn about critical issues within the discipline...

...it just so happens that in baseball, one of the most obviously important and useful areas of exploration happens to be in the HUGE amount of various statistics compiled with regards to the game in the last 100 years. If I want to find out if a pitcher's hits per balls in play tends to correlate well from year to year, I can go back into this database and develop a study to give some ideas as to the answer. Unspeakably useful things are statistics in baseball, as they allow you to skirt around various subjective and unreliable methods of analysis.

But that still doesn't mean the study itself is about statistics. Statistics are the tool, not the end result. The end result is learning things you previously didn't know.
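As an illustration of the kind of study Voros mentions, here is a sketch that assumes a hypothetical dict of player to {season: value}. It uses the common batting-line formula for hits per ball in play, (H - HR) / (AB - K - HR + SF), pairs each season with the next one for the same player, and computes a plain Pearson correlation. All function names are illustrative.

```python
from math import sqrt


def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Hits per ball in play: (H - HR) / (AB - K - HR + SF)."""
    return (h - hr) / (ab - k - hr + sf)


def year_to_year_pairs(by_player: dict[str, dict[int, float]]) -> list[tuple[float, float]]:
    """Pair each player's value in season Y with his value in season Y + 1."""
    pairs = []
    for seasons in by_player.values():
        for year, value in seasons.items():
            if year + 1 in seasons:
                pairs.append((value, seasons[year + 1]))
    return pairs


def pearson(pairs: list[tuple[float, float]]) -> float:
    """Pearson correlation coefficient of the paired values."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined when one side has no variance")
    return sxy / sqrt(sxx * syy)
```

A correlation near 1 would suggest the stat reflects something persistent about the player; one near 0 would suggest it mostly doesn't carry over from season to season.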

As far as applying this to soccer, here are some basics:

1) There are some basics that initially need to be dealt with:
a) Games played as measure of opportunity is inherently inferior to minutes played for the same purpose. Treating someone who comes in as a sub in the 83rd minute as having equal opportunity as someone who plays the full 90 is silly.
b) Accumulating points is the object of every game. The discussion about what constitutes win percentage is moot, as that doesn't count in the standings, points do. So when we start to talk about qualitative stats and evaluating players, it needs to be done so on the basis of what the player does to achieve his team receiving as many points per game as possible.
c) Some information is always preferable to no information. People sometimes make the mistake of assuming that if you can't know everything, then there's no point in knowing anything. Just because goals per 90 minutes is not all you can judge strikers by, that doesn't mean the statistic has no meaning or importance. Certainly that stat tells you something.
d) Descriptive data tends to be better than qualitative data, and objective data tends to be better than subjective data. Not that subjective data is useless or wrong, only that it has the unfortunate tendency to switch scales based on who's doing the compiling.
2) Measuring "chances" is a bad idea, as someone else mentioned. What the hell is a "chance"? And is there any possible way to get a consistent interpretation when using different people to make such distinctions?
3) Along the same lines, simply compiling things like shots, touches, passes, goals (and derivatives like plus/minus), headers, saves, free kicks, penalty kicks and so forth has value, even if many of those things turn out to have little value in player evaluation (which, from above, has to do with the relationship between the data and how it contributes to points earned by the team). While not completely objective, you can likely get fairly consistent interpretations across a spectrum of official scorers.
4) Peer review has its place, even if it isn't of the structured type you see in scholarly journals and such. At one point this year I had compiled a long diatribe about what was wrong with one of Hirdt's analyses (I think it was the one on the importance of goals depending on when they were scored) but I eventually figured no one would care, so I didn't. The point is that Hirdt can't be trusted to be right on something and neither can I or anybody else. Their work needs to be redone, and flaws and holes in it need to be examined.
5) The all-encompassing single statistic that rates players is fool's gold. It is in baseball and it would be in soccer. Statistics that tell us something about _how_ the player performs are much more valuable than ones which try and tell you _how well_. The former has all sorts of uses, including, but not limited to, the latter.
6) Statistics that analyze team performance will be much easier in soccer, and of value, but I think much less valuable and interesting than looking at the individual players themselves.
7) An understanding of the differences between results, statistics, performance, ability and potential ability are all important. The plus/minus stat is a perfect example. It doesn't matter if a team has a much better goal differential when a particular player is on the field, if this difference has little to do with the individual performance and abilities of that player. We cannot _assume_ that it does, and would have to examine the issue further.
8) Refine, rework and redo. Do this a lot.
9) Finally, the reason why this sort of thing is helpful is the inherent fallibility of human reasoning. The human mind is an extraordinary thing, but it is its very strengths that often lead to the logical breakdowns we all suffer from. Information that is true because "everybody knows it" is not fact. It is speculation and needs to be evaluated as such. I cannot recommend more highly a book called "How We Know What Isn't So" by Thomas Gilovich which explores why "the only thing infinite is our capacity for self-deception."

Long post, but I figured I'd make it.
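Points 1a and 1b translate directly into arithmetic. The sketch below assumes the usual scoring of three points for a win, one for a draw and none for a loss, and uses hypothetical helper names.

```python
POINTS = {"W": 3, "D": 1, "L": 0}  # assumed standard scoring: win 3, draw 1, loss 0


def per_90(count: float, minutes: int) -> float:
    """Scale a raw count to a rate per 90 minutes played (point 1a)."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return 90 * count / minutes


def points_per_game(results: str) -> float:
    """Average league points from a string of results such as "WDLW" (point 1b)."""
    if not results:
        raise ValueError("no results")
    return sum(POINTS[r] for r in results) / len(results)


# A striker with 10 goals in 1,800 minutes scores 0.5 per 90.
print(per_90(10, 1800))  # 0.5
# Over very few minutes (a sub entering in the 83rd minute, say), one goal
# inflates the per-90 rate enormously, so small samples deserve caution.
print(points_per_game("WDLW"))  # 1.75
```

The same `per_90` helper could scale an on-field goal differential (plus/minus), though point 7 explains why that number cannot simply be credited to the player.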

Real Ray
04 Aug 2003, 09:06 PM
Well, I guess it's my stat, so... :)

Yes, I agree there is subjectivity regarding what one views as a legitimate chance, but not to the degree that you could not get a consensus, IMO, notwithstanding the occasional dispute that you see in other sports. As for the idea that you could not agree: I think if you polled a group of coaches, or asked each of them to watch a match alone and mark on a sheet what they thought were the chances in it, the results would be pretty damn close. And if we use voros' view that "Accumulating points is the object of every game. The discussion about what constitutes win percentage is moot, as that doesn't count in the standings, points do. So when we start to talk about qualitative stats and evaluating players, it needs to be done so on the basis of what the player does to achieve his team receiving as many points per game as possible," then one way or another we are going to meet at this intersection called a "chance." You can argue about when it begins, or about the scoring of a particular match, but I don't see how you avoid it as a starting point for your basic stat, or at least coming to agreement on what such a play is.
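Ray's claim that independent coaches would mark nearly the same chances is testable. One standard tool, not mentioned in the thread and swapped in here, is Cohen's kappa: have two observers label the same list of attacking moves as a chance or not, then compare their agreement with the agreement random labeling would produce (1 means perfect agreement, 0 means no better than random). The coaches' sheets below are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Agreement between two raters beyond what random labeling would give.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected
    comes from each rater's own label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa undefined: both raters used one identical label throughout")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical sheets from two coaches: True = "that was a chance".
coach_1 = [True, True, False, False, True, False]
coach_2 = [True, True, False, True, True, False]
print(round(cohens_kappa(coach_1, coach_2), 3))  # 0.667
```

The two sheets agree on five of six plays, but because both coaches mark chances often, part of that agreement is expected by luck, which is why kappa comes out lower than 5/6.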

In terms of a "chance," here is a clip I pulled from the 1982 World Cup of Paolo Rossi: what anyone would call a "chance."
http://www.geocities.com/castmind/rossi.html

As far as joe's question as to when a chance begins, I would view it as the point when the action manifests itself into an attempt on goal. I would of course use the assist stat and as maxim noted, you would have to break the field down into thirds, which will clarify the "when" question better.

To take Rossi's game further, you could then begin the process of breaking his match down with categories like:
Inside the area
Total chances:
Goals:
From Passes Outside the area:
From Passes Inside The Area
From Corner Kicks
From Throw In
From Individual Runs Into The Area
Rebound
Left Foot
Right Foot
Headers
etc., etc,...

Each of these and all other categories would then be linked to the specific player(s) involved in the chance. You would then work backwards toward the furthest identifiable point of inception for that specific chance: a throw-in, a goal kick, or an intercepted pass, say.

So in the case of the clip I posted, you could provide a basic scoring as follows:
Brasil vs Italy 1982 WC
Chance #9
Player: Paolo Rossi (Italy)
Inside the area
From pass inside the area (Graziani)
Shot: Left foot
Result: Miss

An example of his first goal of the match would be:
Brasil vs Italy 1982 WC
Chance #4
Player: Paolo Rossi (Italy)
Inside the area
From pass outside the area (Cabrini)
Shot: Header
Result: Goal

With Rossi's goal, you would also have a statistical trail that starts with Conte gaining possession at midfield, then his pass to Cabrini out on the left. This links these two players to Rossi's chance/goal and provides a larger context in which to place the chance. But the actual chance is Cabrini's center into Rossi.

So essentially what you're doing is going backwards from scoring chances, breaking down the play to its furthest point of identifiable inception, and then defining each action with some form of scoring notation.
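Ray's scoring notation maps naturally onto a record type. The sketch below is hypothetical (class and field names are illustrative, not an existing standard) and encodes his Chance #4 example, including the trail back to Conte gaining possession at midfield, so that every player linked to the chance can be listed in order from inception to shot.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Action:
    player: str
    description: str


@dataclass
class Chance:
    """One scored chance, following the notation in Ray's examples."""
    match: str
    number: int
    player: str
    zone: str              # e.g. "inside the area"
    source: str            # e.g. "pass outside the area", "corner kick", "rebound"
    passer: Optional[str]  # player who supplied the final ball, if any
    shot: str              # "left foot", "right foot" or "header"
    result: str            # e.g. "goal" or "miss"
    build_up: list[Action] = field(default_factory=list)  # earliest action first

    def players_involved(self) -> list[str]:
        """Everyone linked to the chance, from the point of inception to the shooter."""
        names = [a.player for a in self.build_up]
        if self.passer:
            names.append(self.passer)
        names.append(self.player)
        return list(dict.fromkeys(names))  # drop repeats, keep first-appearance order


rossi_goal = Chance(
    match="Brasil vs Italy 1982 WC", number=4, player="Paolo Rossi",
    zone="inside the area", source="pass outside the area", passer="Cabrini",
    shot="header", result="goal",
    build_up=[Action("Conte", "gains possession at midfield"),
              Action("Conte", "passes to Cabrini out on the left")],
)
print(rossi_goal.players_involved())  # ['Conte', 'Cabrini', 'Paolo Rossi']
```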

Sachin
04 Aug 2003, 09:14 PM
Originally posted by Real Ray

So essentially what you're doing is going backwards from scoring chances, breaking down the play to its furthest point of identifiable inception, and then defining each action with some form of scoring notation.

So basically, you're invoking Schrodinger's Cat and the Heisenberg Uncertainty Principle to determine when a chance starts.

Sachin

mpruitt
04 Aug 2003, 09:26 PM
For those who don't know, Voros McCracken is Nekcarccm Sorov spelled backwards. (BTW, what the heck was that line in the book about?) I had wanted to find out if there were any people who had worked on these types of things in baseball who also had an interest in soccer. It's awesome to know that there are.

Also, for those who don't know, in Moneyball our fellow BigSoccer poster here was written up to quite some acclaim. And if he and his friends give the Red Sox the players to win a World Series, then I suppose the name of my firstborn will be Voros.

Real Ray
04 Aug 2003, 09:32 PM
Originally posted by Sachin
So basically, you're invoking Schrodinger's Cat and the Heisenberg Uncertainty Principle to determine when a chance starts.
Sachin
Well considering the definition
one cannot simultaneously know both the position and the momentum of a given object to arbitrary precision
I suppose I am.

kenntomasch
04 Aug 2003, 11:22 PM
voros has it down. He's appointed the leader of the group. Can I get a second?

voros
04 Aug 2003, 11:27 PM
Originally posted by kenntomasch
voros has it down. He's appointed the leader of the group. Can I get a second?
Umm, declined?

It was more of a "I've been through the wars sort of thing" than a volunteer effort. :)

kenntomasch
04 Aug 2003, 11:33 PM
Doesn't matter. You're drafted. :)

voros
04 Aug 2003, 11:42 PM
Originally posted by kenntomasch
Doesn't matter. You're drafted. :)
Then, to avoid the draft, I guess I'll have to start posting to the Canada forum. :)

us#1by2006
04 Aug 2003, 11:47 PM
Voros,

Thanks for providing your professional insight to us. I appreciate what you were willing to offer us.

joe2
04 Aug 2003, 11:55 PM
Voros...interesting post. I would like to respond to what I see as conceptual weaknesses....
Point 1 b...The goal of a soccer team may not always be to accumulate as many points per game as possible. Near the end of a season very often getting 1 point for a tie is just as good as 3 for a win. But that is a minor point as in general you are correct.
Point 1 C, while it seems obvious, is only correct to the extent that the information is reliable. In other words, bad information is in fact worse than no information, because it may lead to faulty strategies and faulty conclusions.
1 D....descriptive data is always better than qualitative data ?...I don't see any basis for this statement. In fact, I don't see how you can make a distinction. Each piece of data collected always has a descriptive as well as qualitative component to it.
1 D...objective data tends to be better than subjective data?... Again, this depends on the purpose of the data collection. Certainly subjective and objective data, to the extent they can be separated, are useful, each in their own way. Neither is "better" than the other, or rather the "betterness" of each is dependent on the context in which each is collected and used.
2. ...If you cannot attempt to define a "chance" in some way then any statistical analysis of soccer would be a nice set of numbers but with very little practical application. "Chances" or opportunities to score are the essence of the game. It is akin to saying we can't know what an At Bat is in baseball so we will ignore it.
3. and 4. I do agree that these discrete events have some descriptive value. And all of us should always be open to criticism and refinement of our work.
I also tend to agree with points 5,6,7, and 8.
Point 9. What you say is basically true. But it is also true of statistical interpretation. The fact that information is collected and assigned a numerical value does not in itself make it better than intuitive information. That is the great fallacy we see in the numbers game. (IQ scores are a good example.) The fallibility of the human mind, which I agree with you about, extends to the creation of mathematical models, which are, after all, creations of the human mind. In other words, just assigning numerical values to events does not, in itself, make the information more meaningful, accurate or useful. Statistics can, after the fact, identify trends. But so can an expert human observer (a great coach, for instance).
A Final Point....personally I find statistical info of all kinds interesting, but not useful as a predictor of individual future events. Kind of like nice, fun toys.