
Q: Is there anything in soccer similar to what Bill James has done in b'ball?


jri
15 Apr 2004, 12:43 PM
I've never heard of much "on field" statistical collection or analysis, outside of goals/assists/keeper saves/shots, etc.

Is there any company or service that digs deeper (statistically) into player performances (giveaways/steals, bad balls/good balls; I'm making this up, but you get the picture)?

ChrisE
15 Apr 2004, 03:05 PM
What is the stuff that Bill James does, jri?

I'm sure that stuff like that exists, and if you page through this relatively small forum you'll doubtless find some, but for the most part, we don't have access to it.

afgrijselijkheid
15 Apr 2004, 03:26 PM
the EPL fantasy game they used to have on was a great source of stats a couple years ago - they used everything possible in their scoring system - the last year they ran the game, it had been scaled back considerably

the yahoo fantasy EPL game keeps a decent amount of stats

kenntomasch
17 Apr 2004, 07:35 AM
Bill James is not about the stats. He's about the objective search for knowledge first and foremost. The stats are just one tool.

Real Ray
17 Apr 2004, 08:43 AM
What is the stuff that Bill James does, jri?

I'm sure that stuff like that exists, and if you page through this relatively small forum you'll doubtless find some, but for the most part, we don't have access to it.

Most of the stat guys in the forum know about OPTA, but as Kenn hinted, what James does is much more than that. Doing it for soccer would be hard, but not impossible. To understand this better, take a look at some of the points from his 1988 "Baseball Abstract":
1. Minor league batting statistics will predict major league batting performance with essentially the same reliability as previous major league statistics.

2. Talent in baseball is not normally distributed. It is a pyramid. For every player who is 10 percent above the average player, there are probably twenty players who are 10 percent below average.

3. What a player hits in one ballpark may be radically different from what he would hit in another.

4. Ballplayers, as a group, reach their peak value much earlier and decline much more rapidly than people believe.

5. Players taken in the June draft coming out of college (or with at least two years of college) perform dramatically better than players drafted out of high school.

6. The chance of getting a good player with a high draft pick is substantial enough that it is clearly a disastrous strategy to give up a first round draft choice to sign a mediocre free agent. (see note #1)

7. A power pitcher has a dramatically higher expectation for future wins than does a finesse pitcher of the same age and ability.

8.Single season won-lost records have almost no value as an indicator of a pitcher's contribution to a team.

9. The largest variable determining how many runs a team will score is how many times they get their leadoff man on base.

10. A great deal of what is perceived as being pitching is in fact defense.

True shortage of talent almost never occurs at the left end of the defensive spectrum. (see note #2)

11. Rightward shifts along the defensive spectrum almost never work. (see note #2)

12. Our idea of what makes a team good on artificial turf is not supported by any research.

13. When a team improves sharply one season they will almost always decline in the next.

14. The platoon differential is real and virtually universal.

These are the ideas/facts that you would have to draw from soccer data, which I think shows you how hard it would be, considering the nature of soccer vs baseball.

kenntomasch
17 Apr 2004, 01:15 PM
The tack I take, at least, is to take things that people say are just true "because they say they're true" and see if there is evidence to support it (the 1981 Abstract was the first one I ever saw, and I have all the rest of them, plus the historical books). Like when people say "the 2-0 lead is the most dangerous lead in soccer," so I went back and checked, and, in MLS at least, if you go up 2-0, you're very, very likely to win outright (especially at home, you're almost a lock) and very, very unlikely to even blow the lead. (Note to self: I need to re-do that study with last year's numbers.)
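For anyone who wants to try that kind of check themselves, here is a minimal sketch of the counting involved. It is not the study described above: the input format (one string per match, with 'H' or 'A' for each goal in the order scored) and the function name are made up for illustration, and the sample matches are invented, not MLS results.

```python
def two_nil_outcomes(matches):
    """Tally results for the team that went up 2-0.

    matches: list of strings, one per match, with 'H' (home goal) or
    'A' (away goal) in the order the goals were scored. This format is
    an assumption for the sketch, not a real data feed.
    """
    tally = {"win": 0, "draw": 0, "loss": 0}
    for goals in matches:
        # A 2-0 lead only happens when one team scores the first two goals.
        if len(goals) < 2 or goals[0] != goals[1]:
            continue
        leader = goals[0]
        leader_goals = goals.count(leader)
        other_goals = len(goals) - leader_goals
        if leader_goals > other_goals:
            tally["win"] += 1
        elif leader_goals == other_goals:
            tally["draw"] += 1
        else:
            tally["loss"] += 1
    return tally


# Invented sample: three matches where someone led 2-0, plus two that never got there.
sample = ["HH", "HHAA", "AAH", "HA", ""]
print(two_nil_outcomes(sample))
```

Splitting the tally by whether the leader was home or away would be the obvious next step, since the home/road difference is part of the claim.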

Or when people get excited about corner kicks, and yet, based on the numbers in last year's MLS Playoff Guide, goals are scored on only between 2 and 3% of corners. That means 97-98% of the time, it's not a goal, and so not (in my book) particularly dangerous. Yet we've been conditioned to think of corners as "dangerous scoring chances."

There are things we can't know, and things we can know. What really is the advantage in hosting the second leg of a two-legged, aggregate-goal series (there isn't much of one that I've found, as long as you don't go to series OT or PK's, and maybe that was the whole point)? These things we can find out.

But unless we're going to chart everything second-by-second (logistically difficult), we can't know things like how many times a certain defender gets beaten by his man, what his tackle percentage is, how often his passes are on the mark, things like that.

Soccer is more subjective. Player game ratings, where a coach or former coach just looks at the game and assigns an arbitrary (but, obviously, somewhat educated) number to a player's performance, are like the products of skating judges.

Start with the question, not the statistics. You can shake statistics and see what falls out of them (I've done it), but it's much better, I think, to ask a question or take something someone posits as "a well-known fact" and look to see if there is evidence to support it.

In soccer, you just have to look harder, or accept that the lack of available information will sometimes leave your question unanswered.

numerista
17 Apr 2004, 04:11 PM
These are the ideas/facts that you would have to draw from soccer data, which I think shows you how hard it would be, considering the nature of soccer vs baseball.

Thanks for posting this list, Ray -- oddly enough, it drives me towards exactly the opposite conclusion. Commenting on a few of these...

1. Minor league batting statistics will predict major league batting performance with essentially the same reliability as previous major league statistics.

Comparing data for lower-division and higher-division goalscoring is straightforward.

2. Talent in baseball is not normally distributed. It is a pyramid. For every player who is 10 percent above the average player, there are probably twenty players who are 10 percent below average.

This is as much a theoretical insight as anything, relying on the bell curve. I've used it to explain, for instance, why reuniting Germany did not have a major impact on their talent pool, and why generally, countries with a few million people have about as much talent as demographically similar countries with many millions. In such cases, systemic differences (e.g. training, nutrition) are vastly more important.

3. What a player hits in one ballpark may be radically different from what he would hit in another.

Research has shown, for instance, that Stern John scored at a much higher rate on a smaller field. I suspect we'll see the same thing with Damani Ralph. It's tougher to track defenses down to the attributes of individuals, but it is possible.

Another related thing that I've found is that home field advantage can be quite variable from one team to the next.

4. Ballplayers, as a group, reach their peak value much earlier and decline much more rapidly than people believe.

I've concluded something similar with soccer data in multiple ways ... hopefully, I'll get around to presenting new results sometime soon.

5. Players taken in the June draft coming out of college (or with at least two years of college) perform dramatically better than players drafted out of high school.

6. The chance of getting a good player with a high draft pick is substantial enough that it is clearly a disastrous strategy to give up a first round draft choice to sign a mediocre free agent. (see note #1)


... not too hard to explore these kinds of effects. I've done some work documenting how the Galaxy's long run of success was built upon a superior draft record. Their dropoff in 03 followed a run of three years (01, 02, 03) where other teams did better in the draft.

A question that interests me now is whether teams are being smart by paying so much attention to P-40's. Last year's draft was stocked, but it's still hard to fathom how Pat Noonan slipped to #9, Damani Ralph to #17, and how Nat Borchers and David Testo went undrafted.

7. A power pitcher has a dramatically higher expectation for future wins than does a finesse pitcher of the same age and ability.

For goalscorers, we have ways to quantify these things (and empirical evidence that speedy forwards tend to peak early) ... sog % seems to be a good proxy for power. I expect that a player's weight will be an interesting predictor to explore, as well as fouls suffered. (Suffering few fouls seems to be a good predictor of longevity.)

8.Single season won-lost records have almost no value as an indicator of a pitcher's contribution to a team.

Something similar might be our studies of goalkeepers' save percentages ... from MLS data, I remember finding that a good save percentage makes very little practical difference from a mediocre one.

9. The largest variable determining how many runs a team will score is how many times they get their leadoff man on base.


This one obviously gets better, the more measurements that exist. It'd be interesting to see the effects of "first foul," "first shot," and "first goal."

10. A great deal of what is perceived as being pitching is in fact defense.


We've found that a great deal of what is perceived as goalkeeping is in fact defense (e.g. by predicting save percentages from the offside trap).

11. Rightward shifts along the defensive spectrum almost never work.

This kind of thing is definitely amenable to study in soccer.

12. Our idea of what makes a team good on artificial turf is not supported by any research.

This looks do-able, too.

13. When a team improves sharply one season they will almost always decline in the next.

Regression to the mean, standard stuff.
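A toy simulation makes the effect easy to see. This is only a sketch under assumed numbers: every team gets a fixed "true strength", each season's result is that strength plus independent luck, and the size of the jump that counts as "improving sharply" is arbitrary. None of it is fitted to real league data.

```python
import random


def decline_share_after_jump(n_teams=20000, jump=1.5, seed=7):
    """Share of sharply improved teams that do worse the following season.

    Each team has a fixed true strength; each season's result is that
    strength plus independent luck. All scales are illustrative.
    """
    rng = random.Random(seed)
    jumped = declined = 0
    for _ in range(n_teams):
        talent = rng.gauss(0.0, 1.0)  # fixed true strength
        # Three consecutive seasons: same talent, fresh luck each year.
        s0, s1, s2 = (talent + rng.gauss(0.0, 1.0) for _ in range(3))
        if s1 - s0 > jump:  # the team "improved sharply" in season 1
            jumped += 1
            if s2 < s1:  # and then fell back in season 2
                declined += 1
    return declined / jumped


print(round(decline_share_after_jump(), 2))
```

Nothing about the teams changes between seasons in this model, so any decline after a big jump comes purely from the good luck not repeating.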

14. The platoon differential is real and virtually universal.

Can't think of an analogue, so I'll mention that promotion/relegation, an idea that seems to be about putting all the strong teams together, actually creates more imbalanced groupings ... another kind of interesting finding.

By and large, these kinds of things are easy for us. Where we have trouble is in fine-grained quantitative analysis of a player's performance. On that matter, we're up the creek until there's better data gathering.

kenntomasch
17 Apr 2004, 04:41 PM
Comparing data for lower-division and higher-division goalscoring is straightforward.

I don't know that it's as straightforward as comparing minor and major-league hitting performance. The former is more of an expression of individual ability, with Bill James' translation system attempting to remove the factors that tend to unrealistically enhance or detract from the expression of that ability.

I don't think we know yet how to adjust for the contextual factors of lower-division soccer (the fields, the teammates serving you, the bus trips, the hacks you may play against, coaching system, many, many things).

And, as I always like to say, goals are one of the few stats we have, but they really don't give an adequate expression of the true value of most of the eleven players on the field at a given moment.

I'd like to see the data, if you have some - I know Stern John scored well in the A-League and in MLS, but I don't know if you can take someone's limited A-League, PSL, or PDL statistics (if you're talking about soccer in the US, which most of the time is what we do) and do a "translation" a la James' work and tell how well a soccer player will perform in MLS. If, that is, indeed, your goal, and, in the context of this discussion, it would seem as though something analogous to James' work is what we're on about.

If you're talking about England or other countries, there may, in fact, be a correlation between how well someone scores in the lower divisions and how well they score in the Premiership. I would imagine the differences between the external factors in the A-League and those in MLS make for a much larger gap than between Nationwide Division I and the Premiership.


This is as much a theoretical insight as anything, relying on the bell curve. I've used it to explain, for instance, why reuniting Germany did not have a major impact on their talent pool, and why generally, countries with a few million people have about as much talent as demographically similar countries with many millions. In such cases, systemic differences (e.g. training, nutrition) are vastly more important.

You mean, a team from a country the size of Trinidad might be able to lay a 5-2 whipping on a team from a country the size of the United States? Could you please tell that to the people who lose their minds when the CONCACAF Champions Cup is played? ;)


Research has shown, for instance, that Stern John scored at a much higher rate on a smaller field. I suspect we'll see the same thing with Damani Ralph.

I haven't seen that research. I'd like to. I started to do something on the relation between field size and overall scoring, but I didn't find that there was a huge difference in the size of fields, when you looked at all ten in MLS. San Jose, sure. Cardinal Stadium was narrow. Ohio Stadium was narrow. But there are factors as far as surface and the ability of the team in question and its opponent that have to factor into it as well. I am not sure that all other things were equal in that type of analysis (I think whichever of the Hirdt boys it was that did Analyze This last year did something like this, but I'm not sold on how they tried to apply concepts from other sports to soccer - not just yet, anyway).


Another related thing that I've found is that home field advantage can be quite variable from one team to the next.

Indeed, regardless of sport. While overall, historically, home teams have won 55% of Major League Baseball games, the data I studied for the 2002 MLB season showed that home-field advantage (the difference between a team's home won-lost record and its road won-lost record) varied from +21 games (in the case of Colorado) to -9 games (in the case of Boston). It's not a straight across-the-board advantage.

The 2001 MLS data (which is all I can find on my hard drive quickly while I'm watching Freddy sit the bench again, knowing it's going to inspire another round of "Freddy must start" stories next week) shows DC United had an HFA of .440 (counting ties as half-win and half-loss) while Colorado was only .077 and Tampa Bay was even worse at .060. So, yes, it's variable, but I am not sure you can take credit for the discovery. ;)
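For reference, one way to compute a figure like that is sketched below. It assumes HFA here means home winning percentage minus road winning percentage, with a tie counted as half a win, which is my reading of the description above rather than a confirmed formula. The function names and the records in the example are made up, not the 2001 MLS numbers.

```python
def points_pct(wins, losses, ties):
    """Winning percentage with each tie counted as half a win, half a loss."""
    games = wins + losses + ties
    return (wins + 0.5 * ties) / games


def home_field_advantage(home, road):
    """Home percentage minus road percentage; each record is (W, L, T)."""
    return points_pct(*home) - points_pct(*road)


# Invented records: 9-3-4 at home, 4-8-4 on the road.
hfa = home_field_advantage(home=(9, 3, 4), road=(4, 8, 4))
print(f"{hfa:.3f}")
```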


I've concluded something similar with soccer data in multiple ways ... hopefully, I'll get around to presenting new results sometime soon.

Please do. The search for objective knowledge is what this forum is all about.



Something similar might be our studies of goalkeepers' save percentages ... from MLS data, I remember finding that a good save percentage makes very little practical difference from a mediocre one.

While I have a hard time comparing the impact a pitcher has on a baseball game with the impact a goalkeeper has on a soccer game, I'm in agreement that save percentage should make little practical difference. At the end of the day, it's about how many times your team puts it in the net versus how many times they put it in yours.


This one obviously gets better, the more measurements that exist. It'd be interesting to see the effects of "first foul," "first shot," and "first goal."

I'd be amazed if first foul and first shot had any correlation at all to victory. First goal does, apparently, if for no other reason than that goals, which by their very nature are very scarce, are each more important than runs in baseball. When there are very few goals scored, whoever gets the first one is in a very good position. Whoever gets the first two is in a very, very good position.
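The scarcity argument can be checked with a toy model. The sketch below assumes two evenly matched teams that score at a constant rate over 90 minutes, and the goals-per-team figures are illustrative rather than taken from any league. It estimates how often the team that scores first goes on to win outright, among games with at least one goal.

```python
import random


def first_goal_win_share(goals_per_team, n_games=20000, seed=1):
    """Share of games (with at least one goal) won by the first scorer."""
    rng = random.Random(seed)
    rate = goals_per_team / 90.0  # goals per minute for each team
    wins = games_with_goal = 0
    for _ in range(n_games):
        score = [0, 0]
        first = None
        t = rng.expovariate(2 * rate)  # minute of the next goal by either team
        while t < 90.0:
            team = rng.randrange(2)  # evenly matched: either side equally likely
            score[team] += 1
            if first is None:
                first = team
            t += rng.expovariate(2 * rate)
        if first is not None:
            games_with_goal += 1
            if score[first] > score[1 - first]:
                wins += 1
    return wins / games_with_goal


low_scoring = first_goal_win_share(1.3)   # soccer-like scoring rate (assumed)
high_scoring = first_goal_win_share(5.0)  # a much higher-scoring game (assumed)
print(round(low_scoring, 2), round(high_scoring, 2))
```

In this model the first goal carries no psychological effect at all, so whatever edge shows up comes purely from how few goals there are to overturn it.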


We've found that a great deal of what is perceived as goalkeeping is in fact defense (e.g. by predicting save percentages from the offside trap).

Or by having four defenders in front of you versus three, or by having good ones in front of you versus having stiffs.


By and large, these kinds of things are easy for us. Where we have trouble is in fine-grained quantitative analysis of a player's performance. On that matter, we're up the creek until there's better data gathering.

Agreed. Big picture stuff has data available. Getting it to the atomic level of individual player performance is harder to do.

numerista
17 Apr 2004, 05:46 PM
I don't think we know yet how to adjust for the contextual factors of lower-division soccer (the fields, the teammates serving you, the bus trips, the hacks you may play against, coaching system, many, many things).

But as I recall, the surprising thing about James's result is that a very crude adjustment system was enough.

If you're talking about soccer in the US, which most of the time is what we do

No, I'm thinking of a hierarchy like England, where the levels of play are reasonably stable and there is plenty of player movement between divisions.

do a "translation" a la James' work and tell how well a [Division One] soccer player will perform in the [Premiership].

One point here: the goal is to do as good a job of predicting for a lower-division player as we would if the player had already been playing in the Premiership. (If we could, this would debunk the common preference for players who are "proven" at a particular level.)


I haven't seen that research. I'd like to. I started to do something on the relation between field size and overall scoring, but I didn't find that there was a huge difference in the size of fields, when you looked at all ten in MLS. San Jose, sure. Cardinal Stadium was narrow. Ohio Stadium was narrow. But there are factors as far as surface and the ability of the team in question and its opponent that have to factor into it as well. I am not sure that all other things were equal in that type of analysis (I think whichever of the Hirdt boys it was that did Analyze This last year did something like this, but I'm not sold on how they tried to apply concepts from other sports to soccer - not just yet, anyway).

I think I'm referring to the same Peter Hirdt column.


Indeed, regardless of sport. While overall, historically, home teams have won 55% of Major League Baseball games, the data I studied for the 2002 MLB season showed that home-field advantage (the difference between a team's home won-lost record and its road won-lost record) varied from +21 games (in the case of Colorado) to -9 games (in the case of Boston). It's not a straight across-the-board advantage.

Believe it or not, Kenn, that's the kind of range you'd expect to see if every team had the same home field advantage.
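A rough back-of-the-envelope version of that claim, as a sketch only and not the calculation behind the post: treat every game as an independent coin flip, give every team the same 55% chance of winning at home and 45% on the road, assume 81 home and 81 road games, and measure home field advantage as home wins minus road wins. All of those modelling choices are assumptions.

```python
import math


def hfa_spread(home_games=81, road_games=81, p_home=0.55, p_road=0.45):
    """Mean and standard deviation of (home wins - road wins) for one team.

    Assumes independent games with fixed win probabilities (binomial model).
    """
    mean = home_games * p_home - road_games * p_road
    variance = (home_games * p_home * (1 - p_home)
                + road_games * p_road * (1 - p_road))
    return mean, math.sqrt(variance)


mean, sd = hfa_spread()
low, high = mean - 2 * sd, mean + 2 * sd
print(f"mean {mean:.1f}, sd {sd:.1f}, two-sd band {low:.1f} to {high:.1f}")
```

Under these assumptions the band runs from roughly -4.6 to +20.8 games, and with 30 teams the most extreme ones would be expected to land a little outside it, so a spread like -9 to +21 is not obviously evidence that teams differ in their true home advantage.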

kenntomasch
18 Apr 2004, 12:47 AM
But as I recall, the surprising thing about James's result is that a very crude adjustment system was enough.

Enough for baseball. I'm not sure we have enough data for soccer. Maybe we do. I'd be interested to see it before I dismiss it, though.


Believe it or not, Kenn, that's the kind of range you'd expect to see if every team had the same home field advantage.

Stay around here long enough and you'll see that higher math is not really my strong suit. Basic stuff, yeah, I can handle. But when you start getting into anything beyond Standard Deviations, I invoke my "It was my understanding there would be no math" response.

numerista
19 Apr 2004, 11:57 AM
But when you start getting into anything beyond Standard Deviations...

You're gonna hate me for pointing this out, but this subject is standard deviations :D

kenntomasch
19 Apr 2004, 12:08 PM
Hence "beyond Standard deviations."

Gareth
25 Apr 2004, 10:33 AM
Do a Google search for Carl Hammond Powerstats. It's a very interesting statistical analysis system that has been developed and used on several levels in the US by an accounting professor who has been a soccer nut for 30-something years.

Auxodium
27 Apr 2004, 03:56 AM
i guess you could keep stats on: tackles, passes, headers, throw-ins, free kicks, fouls committed, cards, goals, missed, own goals, appearances, caps, stuff like that. I guess you can't do anything more.


now cricket is a stat man's heaven! The WHOLE GAME is a bunch of stats with the players added! :D