Sabermetrics applying to Soccer


kenntomasch
01 Aug 2003, 12:38 PM
I've found that, whatever a particular team sport calls its "points," points scored and points allowed track fairly well with W-L percentage.

Which, if you think about it, makes sense. In a reasonably sized data sample, if you're giving up more "points" (or runs, or goals, or whatever) than you score, chances are you're going to lose more games than you win, and vice versa.

I ran the numbers once and I don't remember what they showed. But draws have a tendency to throw things off, as does the 3-1-0 thing.

Edited to add the Pythagorean expectations for MLS in 2002:


Team          GF  GA   Exp   W   L  T   Pct   Diff
Chicago       43  38  .561  11  13  4  .464  +9.7
Colorado      43  48  .445  13  11  4  .536  -9.0
Columbus      44  43  .511  11  12  5  .482  +2.9
Dallas        44  43  .511  12   9  7  .554  -4.2
DC United     31  40  .375   9  14  5  .411  -3.5
Kansas City   37  45  .403   9  10  9  .482  -7.9
Los Angeles   44  33  .640  16   9  3  .625  +1.5
MetroStars    41  47  .432  11  15  2  .429  +0.4
New England   49  49  .500  12  14  2  .464  +3.6
San Jose      45  35  .623  14  11  3  .554  +7.0

(Exp = GF^2 / (GF^2 + GA^2); Pct = (W + T/2) / games played; Diff = Exp - Pct, in percentage points)


As you can see, it comes within 4.2 percentage points of the actual W-L-T percentage for six of the ten teams. It was way off on Chicago and Colorado and, to a lesser extent, Kansas City and San Jose.

I don't know if that's because goals in soccer are much less plentiful than runs in baseball, so the odd goal, or one at an inopportune time, has a greater effect than a single run does in baseball. Maybe the effect is magnified, making the correlation harder to track.

But I know that if you rank teams by points or W-L-T percentage or whatever, and you put their straight goal differential at the end, usually the positives are at the top and the negatives are at the bottom.
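A minimal Python sketch that recomputes the table above from the raw GF, GA and W-L-T columns, assuming (as the numbers indicate) an exponent of 2 for the expectation and ties counted as half a win in the actual percentage. The names `MLS_2002`, `pythagorean`, `wlt_pct` and `table` are made up for illustration.

```python
"""Recompute the 2002 MLS Pythagorean table (a sketch)."""

# team: (goals for, goals against, wins, losses, ties), copied from the table
MLS_2002 = {
    "Chicago": (43, 38, 11, 13, 4),
    "Colorado": (43, 48, 13, 11, 4),
    "Columbus": (44, 43, 11, 12, 5),
    "Dallas": (44, 43, 12, 9, 7),
    "DC United": (31, 40, 9, 14, 5),
    "Kansas City": (37, 45, 9, 10, 9),
    "Los Angeles": (44, 33, 16, 9, 3),
    "MetroStars": (41, 47, 11, 15, 2),
    "New England": (49, 49, 12, 14, 2),
    "San Jose": (45, 35, 14, 11, 3),
}


def pythagorean(gf, ga, exponent=2.0):
    """Expected winning percentage: GF^x / (GF^x + GA^x)."""
    return gf**exponent / (gf**exponent + ga**exponent)


def wlt_pct(wins, losses, ties):
    """Actual percentage, counting each tie as half a win."""
    return (wins + ties / 2) / (wins + losses + ties)


def table(data):
    """Yield (team, expected, actual, difference in percentage points)."""
    for team, (gf, ga, wins, losses, ties) in data.items():
        exp = pythagorean(gf, ga)
        pct = wlt_pct(wins, losses, ties)
        yield team, exp, pct, 100 * (exp - pct)


if __name__ == "__main__":
    for team, exp, pct, diff in table(MLS_2002):
        print(f"{team:<12} {exp:.3f} {pct:.3f} {diff:+5.1f}")
```

Passing a different `exponent` to `pythagorean` is the natural knob to turn if squaring turns out not to fit soccer well.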

beineke
01 Aug 2003, 02:13 PM
Originally posted by kenntomasch
I don't know if that's because goals in soccer are much less plentiful than runs in baseball, so the odd goal, or one at an inopportune time, has a greater effect than a single run does in baseball. Maybe the effect is magnified, making the correlation harder to track.


This is true, but remember also that the baseball season is 162 games. That allows a lot more time for things to even out.

By the way, if you asked a hypothetical stats grad student about this, he might consider a Poisson model instead of the simpler Pythagorean Formula. Then he might spend a few minutes scratching out formulas on a sheet of paper and conclude that the Pythagorean Formula is sort of like assuming that every game is decided by two runs.

From there, he might think ... soccer is usually decided by only one goal. Under that assumption, you get a different equation, and it's really simple:

$$\frac{\text{Goals scored}}{\text{Goals scored} + \text{Goals allowed}}$$

If this turns out to be an improvement for soccer, a hypothetical stats grad student would be very pleased. :)
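One way to make the "decided by two runs" remark concrete (a reading of it, not necessarily the derivation beineke had in mind): assume every goal is an independent event that goes your way with probability p = GF / (GF + GA), and that a game is settled once one side leads by n goals. The classic gambler's-ruin result then gives a win probability of p^n / (p^n + (1 - p)^n). With n = 2 this is exactly the squared Pythagorean formula, because p / (1 - p) = GF / GA; with n = 1 it collapses to the simple ratio. The sketch checks the closed form against a seeded simulation; the function names are hypothetical.

```python
"""Pythagorean vs. simple ratio as a 'first to lead by n goals' model (sketch)."""
import random


def win_prob_closed_form(gf, ga, margin):
    """P(a +/-1 random walk reaches +margin before -margin).

    Assumes each goal is ours with probability p = gf / (gf + ga),
    independently, and that the game ends once one side leads by
    `margin` goals (the gambler's-ruin setup).
    """
    p = gf / (gf + ga)
    q = 1.0 - p
    return p**margin / (p**margin + q**margin)


def win_prob_simulated(gf, ga, margin, trials=100_000, seed=1):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    p = gf / (gf + ga)
    wins = 0
    for _ in range(trials):
        lead = 0
        while abs(lead) < margin:
            lead += 1 if rng.random() < p else -1
        if lead > 0:
            wins += 1
    return wins / trials


if __name__ == "__main__":
    gf, ga = 43, 38  # Chicago's 2002 totals from the table
    # margin=2 equals gf**2 / (gf**2 + ga**2), the Pythagorean formula
    print(win_prob_closed_form(gf, ga, 2), win_prob_simulated(gf, ga, 2))
    # margin=1 equals gf / (gf + ga), the simple ratio
    print(win_prob_closed_form(gf, ga, 1), win_prob_simulated(gf, ga, 1))
```

Under this reading, "soccer is usually decided by one goal" just means setting the margin to 1.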

Flyer Fan
01 Aug 2003, 02:28 PM
Originally posted by kenntomasch
I ran the numbers once and I don't remember what they showed. But draws have a tendency to throw things off, as does the 3-1-0 thing.

And that's where I got most "confused." I'm never sure how to account for draws. Does one consider a winning percentage or a non-losing percentage? To me, draws should count as losses in a winning percentage, since you didn't win. However, I think it's more common to count a draw as half a win and half a loss, right?

kenntomasch
01 Aug 2003, 02:33 PM
Right.

But points-wise, a win is worth three times as much as a draw under 3-1-0.

I agree, it does leave it open to interpretation about what percentage to use.
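A small sketch of the conventions being discussed, applied to one W-L-T record (the function names are hypothetical). It also makes one relationship visible: under 2-1-0, a team's share of available points is identical to the half-a-win percentage, while under 3-1-0 a draw counts for only a third of a win.

```python
"""Ways to turn a W-L-T record into a single percentage (sketch)."""


def win_pct(wins, losses, draws):
    """Draws count as non-wins (the stricter view in the thread)."""
    return wins / (wins + losses + draws)


def half_win_pct(wins, losses, draws):
    """Draws count as half a win and half a loss (the usual convention)."""
    return (wins + 0.5 * draws) / (wins + losses + draws)


def non_losing_pct(wins, losses, draws):
    """Share of games not lost."""
    return (wins + draws) / (wins + losses + draws)


def points_pct(wins, losses, draws, win_pts=3, draw_pts=1):
    """Share of available points under a win-draw-loss scheme (3-1-0 default)."""
    played = wins + losses + draws
    return (win_pts * wins + draw_pts * draws) / (win_pts * played)


if __name__ == "__main__":
    record = (9, 10, 9)  # Kansas City's 2002 W-L-T from the table
    for f in (win_pct, half_win_pct, non_losing_pct, points_pct):
        print(f"{f.__name__:<16} {f(*record):.3f}")
    # Under 2-1-0 the points share equals the half-a-win percentage.
    print(f"{'points_pct 2-1-0':<16} {points_pct(*record, win_pts=2):.3f}")
```

A draw-heavy team like Kansas City shows how far apart the four numbers can drift.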

microbrew
01 Aug 2003, 02:39 PM
There are quite a few academic papers on sports, including soccer. Here are some links:

Good intro article, covering players' market and competitive balance:
http://www.iesbs.com/pdf/sports_economics.pdf
In particular, section 3.5 mentions MLS and has a brief history of single entity ownership.

This paper, "It's Fourth Down and What Does the Bellman Equation Say? A Dynamic-Programming Analysis of Football Strategy" by David Romer, was in the news a few months ago.
http://emlab.berkeley.edu/users/dromer/papers/nber9024.pdf

"The Sport League's Dilemma: Competitive Balance versus Incentives to Win" by Frederic Palomino and Luca Rigotti, which concludes that "Under demand maximization, a performance-based reward scheme (used by European sport leagues) may be optimal. Under joint profit maximization, full revenue sharing (used by many US leagues) is always optimal."
http://repositories.cdlib.org/iber/econ/E00-292/

Maybe I should start a thread on this. I had an earlier post with more links, but I'll need to search for it; it's possible it was wiped. Anyway, you can find more by looking at the bibliographies of these papers.

microbrew
01 Aug 2003, 03:19 PM
Originally posted by beineke
From there, he might think ... soccer is usually decided by only one goal. Under that assumption, you get a different equation, and it's really simple:

$$\frac{\text{Goals scored}}{\text{Goals scored} + \text{Goals allowed}}$$

If this turns out to be an improvement for soccer, a hypothetical stats grad student would be very pleased. :)

I'll have to refresh my memory on what processes Poisson distributions model best, but...

That formula probably could be improved upon. Some things I can think of, immediately, are:
1) somehow collapse runaway scores
2) take a closer look at goals in overtime wins/losses, as teams behave differently in overtime
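One way to "collapse runaway scores" (point 1), sketched under an assumption of my own rather than anything proposed in the thread: trim each game's margin to a cap before summing goals. This needs per-game scores, which the thread doesn't include, so the season below is made up and `capped_goal_ratio` is a hypothetical name.

```python
"""Goal ratio with runaway scores trimmed (hypothetical sketch)."""


def capped_goal_ratio(games, cap=3):
    """GF / (GF + GA) after limiting each game's margin to `cap` goals.

    `games` is an iterable of (goals_for, goals_against) per match. In a
    rout, the winner's total is reduced so the margin is exactly `cap`;
    the loser's goals are left alone.
    """
    gf_total = ga_total = 0
    for gf, ga in games:
        if gf - ga > cap:
            gf = ga + cap
        elif ga - gf > cap:
            ga = gf + cap
        gf_total += gf
        ga_total += ga
    return gf_total / (gf_total + ga_total)


if __name__ == "__main__":
    # Made-up season: one 7-0 rout plus three close games.
    season = [(7, 0), (1, 2), (0, 1), (2, 2)]
    print(capped_goal_ratio(season, cap=99))  # effectively uncapped: 10 / 15
    print(capped_goal_ratio(season, cap=3))   # 7-0 counts as 3-0: 6 / 11
```

Where to set `cap` is a judgment call; it could be chosen the same way as an exponent, by whichever value best predicts results.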

In any case, say a stat like this is useful. How do you find players who maximize the ratio of Goals Scored to Goals Allowed?

beineke
01 Aug 2003, 03:45 PM
Originally posted by microbrew
That formula probably could be improved upon. Some things I can think of, immediately, are:
1) somehow collapse runaway scores
2) take a closer look at goals in overtime wins/losses, as teams behave differently in overtime

You could certainly do things with a more sophisticated model, but the beauty of the Pythagorean Formula is that it requires only a minimal amount of time and information to compute ... in that respect, the ratio of Goals Scored to Total Goals is even quicker.

Incidentally, I ran the numbers for both the Pythagorean and the simple ratio. Mean absolute error:

Pythagorean: 5.0 %
Simple Ratio: 3.6 %
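A sketch of this comparison on the ten teams in kenntomasch's 2002 table, assuming "mean error" means mean absolute error in percentage points, with ties counted as half a win. On these ten rows it gives about 5.0 for the Pythagorean and about 3.9 for the simple ratio; the post doesn't spell out the exact inputs or error measure behind its 3.6, so that small gap may come down to such details.

```python
"""Mean absolute error of two goal-based predictors on 2002 MLS (sketch)."""

# (GF, GA, W, L, T) for the ten teams in kenntomasch's 2002 table
TEAMS = [
    (43, 38, 11, 13, 4), (43, 48, 13, 11, 4), (44, 43, 11, 12, 5),
    (44, 43, 12, 9, 7), (31, 40, 9, 14, 5), (37, 45, 9, 10, 9),
    (44, 33, 16, 9, 3), (41, 47, 11, 15, 2), (49, 49, 12, 14, 2),
    (45, 35, 14, 11, 3),
]


def pythagorean(gf, ga):
    """Squared Pythagorean expectation."""
    return gf**2 / (gf**2 + ga**2)


def simple_ratio(gf, ga):
    """Goals scored over total goals."""
    return gf / (gf + ga)


def mean_abs_error(predict, teams=TEAMS):
    """Average |predicted - actual| in percentage points.

    The actual percentage counts each tie as half a win.
    """
    total = 0.0
    for gf, ga, wins, losses, ties in teams:
        actual = (wins + ties / 2) / (wins + losses + ties)
        total += abs(predict(gf, ga) - actual)
    return 100 * total / len(teams)


if __name__ == "__main__":
    print(f"Pythagorean:  {mean_abs_error(pythagorean):.1f}")
    print(f"Simple ratio: {mean_abs_error(simple_ratio):.1f}")
```

Ten team-seasons is a small sample, so the ranking of the two formulas deserves a check on other seasons before much is read into it.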

kenntomasch
01 Aug 2003, 03:59 PM
I'd have to check when I get home, but I'm almost positive James explained in an early Abstract why he squared everything. For some reason, I want to say it was because the error was reduced that way.

As you showed, that's the opposite of what happens in MLS, which makes sense for the reasons you gave.

Speaking of academic papers, don't you have one about a study showing that teams don't score significantly more goals or have fewer draws when a league uses a 3-1-0 system instead of a 2-1-0 system?

JG
01 Aug 2003, 04:15 PM
Originally posted by kenntomasch
I'd have to check when I get home, but I'm almost positive James explained in an early Abstract why he squared everything. For some reason, I want to say it was because the error was reduced that way.


Probably. IIRC empirical study has shown that the best exponent for baseball is 1.83. Something like 16.1 works well for the NBA.
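In the spirit of the 1.83 figure, the exponent can be fitted to data instead of assumed. A minimal sketch, assuming a plain grid search that minimizes mean absolute error (ties counted as half a win); `rows` is a list of (GF, GA, W, L, T) tuples, for example the ten 2002 MLS teams in kenntomasch's table, and the function names are hypothetical. With only ten team-seasons the fitted value will be noisy.

```python
"""Fit a Pythagorean-style exponent by grid search (sketch)."""


def wlt_pct(wins, losses, ties):
    """Actual percentage, counting each tie as half a win."""
    return (wins + ties / 2) / (wins + losses + ties)


def mean_abs_error(rows, exponent):
    """Mean |GF^x / (GF^x + GA^x) - actual| over (GF, GA, W, L, T) rows."""
    total = 0.0
    for gf, ga, wins, losses, ties in rows:
        predicted = gf**exponent / (gf**exponent + ga**exponent)
        total += abs(predicted - wlt_pct(wins, losses, ties))
    return total / len(rows)


def best_exponent(rows, candidates=None):
    """Return the candidate exponent with the smallest mean absolute error.

    By default tries 0.50, 0.51, ..., 3.00 (exponent 1 is the simple ratio).
    """
    if candidates is None:
        candidates = [k / 100 for k in range(50, 301)]
    return min(candidates, key=lambda x: mean_abs_error(rows, x))
```

Comparing `mean_abs_error(rows, best_exponent(rows))` with the errors at 1.0 and 2.0 shows how much is gained by fitting at all.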

There was an interesting article on rec.sport.soccer a few years ago in which the author analyzed the value of goals in a Serie A season based on their context (e.g., what's the value of a goal that puts your team ahead by two goals in the 35th minute?) and came up with a "value-weighted" top-scorers list that had some large differences from the "raw" top-scorers list.

http://www.rsssf.com/miscellaneous/paserman-howmuchgoals.html

Presumably that study could be expanded to other leagues and seasons to get more accurate point values for goals in each situation, and to see which players consistently score important goals. I can't remember if the author ever did more work on the subject; I'll check Google.

NER_MCFC
01 Aug 2003, 04:28 PM
I'm both a baseball fan and something of a numbers geek, and I am inclined to agree with everyone who mentioned the issue about the rarity of goals in soccer. Their rarity means that no particular series of events that actually leads to a goal will do so very often. The relative lack of discrete events is also a problem. This is to say nothing about the lack of unanimity on definitions of events (Was that a bad pass or a bad bounce? Was that a cross or a shot?).

It seems to me that there is plenty of potential for statistics in analyzing individual performance (the Simon Elliott example) or particular situations (like the corner-kick analysis that was mentioned). But unless someone finds events or patterns that consistently correlate with scoring or allowing goals, I'm at a loss to see how you could apply the kind of analysis that baseball, especially, allows to overall team and season performance.

microbrew
01 Aug 2003, 04:31 PM
Here's my post with a link to a paper analyzing the three-point victory and the Golden Goal.

http://www.bigsoccer.com/forum/showthread.php?s=&postid=414316#post414316

While rereading the paper, I came across this quote:
"Most importantly, the following conditions hold for virtually every soccer fan: (i) (s)he has spent a fair amount of leisure time thinking about the effects and suitability of rule changes, (ii) (s)he has come up with a strong ad-hoc opinion about it, and (iii) (s)he believes that economic modelling cannot add anything to this debate."

beineke
01 Aug 2003, 05:06 PM
Originally posted by JG
There was an interesting article on rec.sport.soccer a few years ago in which the author analyzed the value of goals in a Serie A season based on their context (e.g., what's the value of a goal that puts your team ahead by two goals in the 35th minute?) and came up with a "value-weighted" top-scorers list that had some large differences from the "raw" top-scorers list.


Although this is an interesting idea in principle, it's very muddy statistically.

Without getting too technical, the ability to score "clutch" goals relies on having some ability to score goals to begin with, so "non-clutch" goals are still a useful indicator of scoring ability. We shouldn't downweight them very much just because they weren't tactically all that important.

The above is true as long as every opponent is playing respectable defense ... and in Serie A, that's what you expect. A more promising approach would be to do this kind of study for the US national team, adjusting for strength of opponent.

Is Joe-Max Moore one of our all-time great scorers? He has 24 goals, but he got 4 in a 7-0 friendly against El Salvador, 2 in a 7-0 rout against Barbados, and 2 more in an 8-1 win against the Cayman Islands. A few of his other goals have been scored on penalty kicks. It'd be very interesting to see his adjusted scoring totals.

beineke
01 Aug 2003, 05:31 PM
Originally posted by NER_MCFC
Unless someone finds events or patterns that consistently correlate with scoring or allowing goals I'm at a loss to see how you could use the kind of analysis that baseball, especially, allows of overall team and season performances.

In sports like baseball and American football, measurement is assisted by defining intermediate goals. Getting on base leads to runs, and yardage leads to touchdowns. So we get a foothold by measuring on base percentage and yardage.

In soccer, we can also define intermediate goals ... winning possession, maintaining possession from defensive third to middle third, the middle third to attacking third, and the attacking third into scoring position.

Along the way, it's possible to tabulate the tackles that Armas wins, the passes that Reyna receives and completes, and the on-target crosses that Eddie Lewis delivers. These numbers will be imperfect and subjective, but they still capture a chunk of what's happening on the field.
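A sketch of how those intermediate stages could be turned into rates, much as on-base percentage measures one step on the way to runs. The stage names follow the post; the counts are invented, and it assumes each possession is counted once at every stage it reaches (a charting convention of my own, not something specified in the thread).

```python
"""Stage-by-stage conversion rates for possessions (sketch, made-up counts)."""

# Intermediate goals from the post, in order; the last stage is an actual goal.
STAGES = [
    "possession won",
    "reached middle third",
    "reached attacking third",
    "reached scoring position",
    "goal",
]


def stage_conversion(counts):
    """Fraction of possessions at each stage that advance to the next one.

    Assumes `counts[stage]` is the number of possessions that reached
    that stage; stages with a zero or missing count are skipped.
    """
    rates = {}
    for here, nxt in zip(STAGES, STAGES[1:]):
        if counts.get(here):
            rates[f"{here} -> {nxt}"] = counts.get(nxt, 0) / counts[here]
    return rates


if __name__ == "__main__":
    # Invented tallies for one team over one match.
    counts = {
        "possession won": 120,
        "reached middle third": 84,
        "reached attacking third": 42,
        "reached scoring position": 12,
        "goal": 2,
    }
    for step, rate in stage_conversion(counts).items():
        print(f"{step:<52} {rate:.2f}")
```

Individual credit (Armas's tackles, Lewis's crosses) would then be a matter of attributing the events that move a possession from one stage to the next.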

kenntomasch
01 Aug 2003, 05:33 PM
It's just not as easy to quantify those things as it is in baseball or football.

I mean, it can be done (OPTA does a bunch of it) but it's labor-intensive.

JG
01 Aug 2003, 05:44 PM
Originally posted by beineke

Without getting too technical, the ability to score "clutch" goals relies on having some ability to score goals to begin with, so "non-clutch" goals are still a useful indicator of scoring ability. We shouldn't downweight them very much just because they weren't tactically all that important.

The idea as I see it wouldn't necessarily be to measure scoring ability, but to see whether certain players have a knack for scoring important goals, and also to construct a scoring table that gives a better indication of how important a player's goals were to their team.

OTOH, it would probably be tricky. The system would probably favor guys whose teams play a lot of close games (presumably teams near the middle of the table) while hurting guys on teams that play more lopsided games (presumably the teams at the top and bottom of the league).

I think that just the score/time/result probability data would be interesting too...in which situations is the benefit of scoring a goal greater than the cost of allowing a goal?

joe2
01 Aug 2003, 06:12 PM
I have done a thorough statistical study of basketball, baseball, American football and soccer. I have found a strong correlation in all sports between scoring the most points and winning. I hope all coaches and statisticians take heed of these findings.

NGV
01 Aug 2003, 06:22 PM
Originally posted by JG
The idea as I see it wouldn't necessarily be to measure scoring ability, but to see whether certain players have a knack for scoring important goals, and also to construct a scoring table that gives a better indication of how important a player's goals were to their team.

Even if there's really no such thing as an ability to score important goals, though, some players will still end up at the top of an important goal table based on pure luck. So, for it to be believable, you'd have to show that the same players tend to display this "knack" consistently from season to season.

In baseball, as far as I know (which admittedly isn't very far), attempts to show a consistent ability to hit "in the clutch" have pretty much come up empty. I'd suspect the same would be true for soccer.
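NGV's point can be illustrated with a small null-model simulation, under assumptions of my own: every player has the same chance `p_winner` that any given goal is a game-winner (so there is no clutch skill at all), goals are treated as independent, and each player scores the same number of goals in two simulated seasons. Correlating each player's game-winner surplus (GW minus expected) across the two seasons should then give a value near zero, even though some player still tops the list each year; a real, repeatable knack would show up as a clearly positive correlation. The player count, goal range and `p_winner` are arbitrary.

```python
"""Is 'clutch' scoring repeatable? A null-model simulation (sketch)."""
import random
from statistics import mean


def simulate_game_winners(goals, p_winner, rng):
    """Game-winners per player if every goal has the same chance p_winner."""
    return [sum(rng.random() < p_winner for _ in range(g)) for g in goals]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def clutch_repeatability(n_players=200, p_winner=0.3, seed=0):
    """Correlate game-winner surplus across two simulated seasons.

    Players keep the same goal totals in both seasons and have no clutch
    skill, so any season-to-season correlation is pure noise.
    """
    rng = random.Random(seed)
    goals = [rng.randint(5, 25) for _ in range(n_players)]
    surplus = []
    for _ in range(2):
        gw = simulate_game_winners(goals, p_winner, rng)
        surplus.append([w - g * p_winner for w, g in zip(gw, goals)])
    return pearson(surplus[0], surplus[1])


if __name__ == "__main__":
    print(f"season-to-season correlation: {clutch_repeatability():+.3f}")
```

Running the same correlation on real multi-season data and comparing it with this baseline is the kind of consistency check the post describes.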

skipshady
01 Aug 2003, 06:41 PM
I'm not familiar with the field of statistics at all, but I'm wondering if you could generate a soccer equivalent of hockey's plus/minus stat: how many goals/shots/scoring chances a team creates when a certain player is on the field as opposed to when he's off, and how many it allows.

This may be a good measure of, for example, a good defensive midfielder who allows his central midfield partner to make forward runs, or a wingback who makes intelligent runs to stretch the defense.
In either case, the player contributes to the attack without touching the ball.
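A minimal sketch of the on/off idea for goals, assuming match data can be split into "stints" during which one player's on-field status doesn't change (the `Stint` type and the stints below are hypothetical). Normalizing to 90 minutes keeps on and off numbers comparable when a player rarely sits, though with few off minutes the off figure will be very noisy.

```python
"""On/off goal difference per 90 minutes, hockey plus/minus style (sketch)."""
from dataclasses import dataclass


@dataclass
class Stint:
    """A stretch of play with a fixed on/off status for one player."""
    minutes: float
    goals_for: int
    goals_against: int
    player_on: bool


def on_off_per90(stints):
    """Team goal difference per 90 minutes with the player on and off.

    Returns None for a side with no minutes (e.g. a player who never sat).
    """
    minutes = {True: 0.0, False: 0.0}
    goal_diff = {True: 0, False: 0}
    for s in stints:
        minutes[s.player_on] += s.minutes
        goal_diff[s.player_on] += s.goals_for - s.goals_against
    return {
        "on": 90 * goal_diff[True] / minutes[True] if minutes[True] else None,
        "off": 90 * goal_diff[False] / minutes[False] if minutes[False] else None,
    }


if __name__ == "__main__":
    # Made-up stints: subbed off after 70' in game 1, out injured for game 3.
    stints = [
        Stint(70, 1, 0, True), Stint(20, 0, 1, False),
        Stint(90, 2, 2, True),
        Stint(90, 0, 1, False),
    ]
    print(on_off_per90(stints))
```

The same structure works for shots or scoring chances by swapping in those counts for goals.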

beineke
01 Aug 2003, 06:51 PM
Originally posted by JG
The idea as I see it wouldn't necessarily be to measure scoring ability, but to see whether certain players have a knack for scoring important goals


That's a nice idea, but that study is too elaborate to be interpretable ... here's a cleaner starting point.

MLS 2002
Goals leaders

Player       G   GW   Expected GW
Ruiz        24    9   7.63
Twellman    23    5   5.63
Cunningham  16    3   4.00
Graziani    14    6   4.35
Razov       14    3   3.25
Kreis       12    5   3.55
Diallo      12    4   3.20
Faria       12    2   3.22
Carrieri    11    5   3.32
Chung       11    3   3.32
Henderson   11    2   3.32

For each of the top goalscorers, we have his total goals, his game-winners, and his expected number of game-winners under a simple null model (player goals * team wins/team goals).

There doesn't seem to be much in the way of trends here, though we can also look at other seasons.
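To put a rough number on "not much in the way of trends," one can ask how likely each player's game-winner total would be by chance. This sketch adds a simplifying assumption that isn't in the post: each of a player's goals is independently a game-winner with probability expected GW / goals, so GW is binomial. Small tail probabilities would flag players worth a closer look, though among eleven players one or two modest values are expected from luck alone.

```python
"""How surprising is each game-winner total? A binomial sketch."""
from math import comb

# (player, goals, game-winners, expected game-winners) from the MLS 2002 list
LEADERS = [
    ("Ruiz", 24, 9, 7.63),
    ("Twellman", 23, 5, 5.63),
    ("Cunningham", 16, 3, 4.00),
    ("Graziani", 14, 6, 4.35),
    ("Razov", 14, 3, 3.25),
    ("Kreis", 12, 5, 3.55),
    ("Diallo", 12, 4, 3.20),
    ("Faria", 12, 2, 3.22),
    ("Carrieri", 11, 5, 3.32),
    ("Chung", 11, 3, 3.32),
    ("Henderson", 11, 2, 3.32),
]


def p_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    for name, goals, gw, expected in LEADERS:
        # Assumption: each goal is independently a game-winner with
        # probability expected / goals, so GW ~ Binomial(goals, that rate).
        tail = p_at_least(gw, goals, expected / goals)
        print(f"{name:<11} GW {gw}  expected {expected:.2f}  P(>= {gw}) = {tail:.2f}")
```

Treating goals as independent ignores that a team can only have one game-winner per win, so these probabilities are rough.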

Originally posted by JG
I think that just the score/time/result probability data would be interesting too...in which situations is the benefit of scoring a goal greater than the cost of allowing a goal?

Agreed. I once saw a fascinating hockey paper about when to pull the goalie ... I think it was in Chance, but I don't know how to find it now.

beineke
01 Aug 2003, 06:58 PM
Originally posted by skipshady
I'm not familiar with the field of statistics at all, but I'm wondering if you could generate a soccer equivalent of hockey's plus/minus stat: how many goals/shots/scoring chances a team creates when a certain player is on the field as opposed to when he's off, and how many it allows.


This is most useful in hockey because all players spend a lot of time both on and off the ice.

In soccer, it's less useful because most players are on the field all the time. Only injuries provide a good way to compare team X with or without a certain player.

Derek Fisher once led the NBA in plus/minus per minute. That's because he was always on the court alongside Shaq while facing the opposing teams' backups.