Goals For, Goals Against, and the Table


Stan Collins
26 Sep 2005, 12:24 PM
As another thread went around on 'rewarding offensive play', I decided to do something I've been meaning to do for a while: find out how well a team's finish in the regular season correlates with its offense (measured by Goals For) and its defense (measured by Goals Against). (I did the playoffs a while back.)

I took every team in MLS's 9-year history, and compared their points rank, their GF rank, and their GA rank. When two or more teams were tied in any category, I averaged their rank. (Like two teams tied for second each got 2.5, or 3 teams tied for second each got 3. This made the data more fine-grained, and in at least one case increased the correlation value.) I did this only within the season and within the division, as this makes for more apples-to-apples comparisons; that is, the differences in schedules, offensiveness of the league that year, etc. are mostly ironed out this way.
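The tie-averaging rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated (two teams tied for second each get 2.5, three teams tied for second each get 3); the function name `average_ranks` and its `higher_is_better` flag are made up for the example and are not from the original spreadsheet.

```python
def average_ranks(values, higher_is_better=True):
    """Return 1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` over every team tied with the team at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = ((pos + 1) + (end + 1)) / 2  # mean of positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


# 2004 East from the table: Crew, United, Metro, Rev, Fire
points = [49, 42, 40, 33, 33]
goals_against = [32, 42, 49, 43, 44]

print(average_ranks(points))                                 # [1.0, 2.0, 3.0, 4.5, 4.5]
print(average_ranks(goals_against, higher_is_better=False))  # [1.0, 2.0, 5.0, 3.0, 4.0]
```

Goals Against is ranked with `higher_is_better=False`, since the fewest goals allowed gets rank 1.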

Here was my data set:

Year Team Pts PRnk GF GFRnk GA GARnk

2004
Crew 49 1 40 4 32 1
United 42 2 43 2 42 2
Metro 40 3 47 1 49 5
Rev 33 4.5 42 3 43 3
Fire 33 4.5 36 5 44 4

Wizards 49 1 38 3 30 1
Galaxy 43 2 42 1 40 4
Rapids 41 3 29 5 32 2
Quakes 38 4 41 2 35 3
Burn 36 5 34 4 45 5


2003
Fire 53 1 53 2 43 3
Rev 45 2 55 1 47 5
Metro 42 3 40 4 40 2
United 39 4 38 5 36 1
Crew 38 5 44 3 44 4

Quakes 51 1 45 2 35 1.5
Wizards 42 2 48 1 44 3
Rapids 40 3 40 3 45 4
Galaxy 36 4 35 4.5 35 1.5
Burn 23 5 35 4.5 64 5


2002
Rev 38 1.5 49 1 49 5
Crew 38 1.5 44 2 43 3
Fire 37 3 43 3 38 1
Metro 35 4 41 4 47 4
United 32 5 31 5 40 2

Galaxy 51 1 44 2.5 33 1
Quakes 45 2 45 1 35 2
Burn 43 3.5 44 2.5 43 3
Rapids 43 3.5 43 4 48 5
Wizards 36 5 37 5 45 4


2001
Fusion 53 1 57 1 36 2
Metro 42 2 38 4 35 1
Rev 27 3 35 3 52 4
United 26 4 42 2 50 3

Fire 53 1 50 1 30 1
Crew 45 2 49 2 36 2
Burn 35 3 48 3 47 3
Mutiny 14 4 32 4 68 4

Galaxy 47 1 52 1 36 2
Quakes 45 2 47 2 29 1
Wizards 36 3 33 4 53 4
Rapids 23 4 36 3 47 3


2000
Metro 54 1 64 1 56 2.5
Rev 45 2 47 3 49 1
Fusion 41 3 54 2 56 2.5
United 30 4 44 4 63 4

Fire 57 1 67 1 51 2
Mutiny 52 2 62 2 50 1
Burn 46 3 54 3 54 3
Crew 38 4 48 4 58 4

Wizards 57 1 47 1.5 29 1
Galaxy 50 2 47 1.5 37 2
Rapids 43 3 43 3 59 4
Quakes 29 4 35 4 50 3


1999 (Last year of Shootout)
United 57 1 65 1 43 2
Crew 45 2 48 3 39 1
Mutiny 32 3 51 2 50 3
Fusion 29 4 42 4 59 5
Rev 26 5 38 5 53 4
Metro 15 6 32 6 64 6

Galaxy 54 1 49 3 29 1
Burn 51 2 54 1 35 2
Fire 48 3.5 51 2 36 3
Rapids 48 3.5 38 5 39 4
Clash 37 5 48 4 49 5
Wizards 20 6 33 6 53 6


1998
United 58 1 74 1 48 1
Crew 45 2 67 2 56 2
Metro 39 3 54 3 63 4
Fusion 35 4 46 5.5 68 6
Mutiny 34 5 46 5.5 57 3
Rev 29 6 53 4 66 5

Galaxy 68 1 85 1 44 1
Fire 56 2 62 2.5 45 2
Rapids 44 3 62 2.5 69 6
Burn 37 4 43 6 59 4
Clash 33 5 48 4 60 5
Wizards 32 6 45 5 50 3


1997
United 55 1 70 1 53 3
Mutiny 45 2 55 2 60 5
Crew 39 3 42 4 41 1
Rev 37 4 40 5 53 3
Metro 35 5 43 3 53 3

Wizards 49 1 57 1 51 3
Galaxy 44 2 55 3 44 1
Burn 42 3 55 3 49 2
Rapids 38 4 50 5 59 4.5
Clash 30 5 55 3 59 4.5


1996
Mutiny 58 1 66 1 51 2
United 46 2 62 2 56 3.5
Metro 39 3 45 4 47 1
Crew 37 4 59 3 60 5
Rev 33 5 43 5 56 3.5


Galaxy 49 1 59 2 49 2
Burn 41 2.5 50 3.5 48 1
Wizards 41 2.5 61 1 63 5
Clash 39 4 50 3.5 50 3
Rapids 29 5 44 5 59 4


And here were my results:
Goals For: http://home.mindspring.com/~stancollins/sitebuildercontent/sitebuilderpictures/mlsdata_18166_image001.gif
y = 0.7609x+0.7174
R-squared = 0.572

Goals Against: http://home.mindspring.com/~stancollins/sitebuildercontent/sitebuilderpictures/mlsdata_16720_image001.gif
y = 0.5841x + 1.2476
R-squared = 0.3388
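For anyone who wants to redo the fits outside Excel, here is a minimal least-squares sketch in plain Python. It assumes x is the GF (or GA) rank and y is the points rank, which the post does not state explicitly, and it uses only the ten 2004 rows from the table, so it will not reproduce the 98-team figures above.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope*x + intercept, plus R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# 2004 only (East then West), copied from the table above.
points_rank = [1, 2, 3, 4.5, 4.5, 1, 2, 3, 4, 5]
gf_rank = [4, 2, 1, 3, 5, 3, 1, 5, 2, 4]
ga_rank = [1, 2, 5, 3, 4, 1, 4, 2, 3, 5]

for label, xs in (("GF", gf_rank), ("GA", ga_rank)):
    slope, intercept, r2 = fit_line(xs, points_rank)
    print(f"{label}: y = {slope:.4f}x + {intercept:.4f}, R-squared = {r2:.4f}")
```

Feeding in all 98 rows instead of the 2004 subset would be the way to check the posted coefficients.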

Now, I plan to eventually publish these results on the News and Analysis forum, along with interpretation (including a comparison to the playoffs, once I find the thread where I did that). I'd like to check here first for any obvious data entry errors, as well as receiving any comments you guys have on interpretation.

It appears to me that this means the MLS regular season rewards offense over defense. As someone who favors deliberately rewarding offense over defense, this struck me as interesting. It seems to be common knowledge that, as Luis Bueno put it in Sunday's Press-Enterprise, "Goals are always at a premium, and more often than not, clubs that can prevent them are better off than those that constantly fill the nets." (http://www.pe.com/sports/soccer/stories/PE_Sports_Local_D_galaxy_feature_25.303b16b.html) The claim may be true for MLS Cup or the US Open Cup, the evidence he cited (subject of a different thread), but it seems untrue for the regular season.

It seems that in the regular season, if I'm doing my analysis right, scoring more goals (relative to the competition) is more regularly rewarded, and rewarded to a greater extent, than allowing fewer of them. My first suspect for why this would be is that the 3/1/0 points system is working; teams are generally 'playing to win' rather than playing 'not to lose', and the teams that score more get those crucial 3 points, leaving the 0-0 and 1-1 drawing teams behind (high-scoring draws and losses are comparatively rare).

Some of the interesting questions to pursue are: how does this compare with European 'unbalanced' leagues? How does it compare with leagues that used the old 2/1/0 points system? How does it compare with leagues (like the USL) that have used even more overtly offense-oriented scoring systems? And lastly, how does it compare with the MLS playoffs (a comparison I intend to do at a later point)?

Stan Collins
26 Sep 2005, 12:31 PM
Btw, I'd complain again about not being able to post pictures to the very forum where they're probably the most useful, but in this case the pictures don't help much. I don't know how to make 'weighted' (ie bigger) dots where a data point occurs more often (if that can be done at all with Excel), so it looks like a bunch of dots scattered across the screen, barely different from random, until you apply the best-fit line.

numerista
26 Sep 2005, 02:01 PM
Hey Stan,

Not much time, so quick replies ...

1. to get weighted dots effect, add small amounts of random noise to each data point.

2. does excel let you regress onto both variables at once?

3. is the difference in regression coefficients significant?

4. is there a theoretical reason to believe offense might be more important than defense? Are your results consistent with that? (JG has explored this w/ Poisson model.)

5. any thoughts on the direction of the causal arrow? is it offense => good, or is it good => opponent forced to gamble, leading to more insurance goals? not sure this is resolvable.
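Point 1 (jittering) is easy to sketch. The snippet below is only an illustration: the `jitter` helper, the noise scale of 0.15, and the fixed seed are all arbitrary choices for the example, and the sample pairs are made up to show repeated rank combinations rather than taken from the table.

```python
import random


def jitter(pairs, scale=0.15, seed=0):
    """Shift each (x, y) point by a small uniform random amount for plotting only."""
    rng = random.Random(seed)  # fixed seed so the plot is reproducible
    return [(x + rng.uniform(-scale, scale), y + rng.uniform(-scale, scale))
            for x, y in pairs]


# Hypothetical (GF rank, points rank) pairs with repeats, as happens with rank data.
pairs = [(1, 1), (1, 1), (1, 1), (2, 2), (2, 3)]

for x, y in jitter(pairs):
    print(f"{x:.3f}, {y:.3f}")
```

The jittered coordinates are meant for the scatter plot only; the regression itself should still be run on the unjittered ranks.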

Stan Collins
26 Sep 2005, 02:30 PM
1. to get weighted dots effect, add small amounts of random noise to each data point.

Yeah, might be worth doing. Especially if I post a picture to N&A.

2. does excel let you regress onto both variables at once?

I don't think so, or I don't know how to make it work.

3. is the difference in regression coefficients significant?

Without doing the math, you would think it is, since there are 98 data points. (Upon completion of this season, there will be 110. It will be interesting to see if that makes the R-squareds larger or smaller.)

4. is there a theoretical reason to believe offense might be more important than defense? Are your results consistent with that? (JG has explored this w/ Poisson model.)

The 3-1-0 is the easy target candidate. I would think looking at old English (or some other league) results under the 2-1-0 versus the 3-1-0 might give some clue as to how much difference that makes.

Other reasons? Dunno. Open to suggestions.

5. any thoughts on the direction of the causal arrow? is it offense => good, or is it good => opponent forced to gamble, leading to more insurance goals? not sure this is resolvable.

Insurance goals should weaken the correlation, shouldn't they, since they 'don't count' for the purposes of your table rank? (Like if you scored 100 goals on only 10 wins because all your wins were 10-0, you would be first in GF rank, but not in table rank.)

Of course, maybe the argument is that teams that lose a lot tend to lose by more than teams that win a lot tend to win by (thus adding more garbage-goal 'noise' to the GA stat than to the GF). Even so, I didn't track total goals allowed, only rank within the division, so you only get a noise factor when a team that was better in GA or GF finishes worse in the table, or vice versa. For that, you need an unusual kind of team: one that wins more often but, when they lose, they lose badly (or the other way around).

scaryice
26 Sep 2005, 07:14 PM
Looking at my Excel files, I agree with you: goals for seem more important. Of the top ten MLS team seasons (out of 98 overall), 7 are in the top ten in goals scored per game. Only 2 are in the top ten in goals allowed.

numerista
27 Sep 2005, 01:10 PM
Without doing the math, you would think it is, since there are 98 data points. (Upon completion of this season, there will be 110. It will be interesting to see if that makes the R-squareds larger or smaller).


I tend to suspect that the difference in coefficients won't be significant. Excel ought to be able to give you the std. error of a regression coefficient. (This won't apply perfectly to rank regression, but should be good enough.)
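A rough sketch of that check in Python, for readers without Excel handy. The slope's standard error is the residual standard error divided by the square root of the sum of squared x deviations; note that the residual standard error is a different quantity from the slope's standard error, and Excel's regression summary uses the label "Standard Error" for both. The `rough_z` comparison is a crude assumption of this example, not something proposed in the thread: it treats the GF and GA slope estimates as independent, which they are not, since both fits share the same points ranks.

```python
import math


def slope_with_se(xs, ys):
    """Return (slope, slope standard error, residual standard error) for a simple fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    residual_se = math.sqrt(sse / (n - 2))    # n - 2: two fitted parameters
    slope_se = residual_se / math.sqrt(sxx)
    return slope, slope_se, residual_se


def rough_z(b1, se1, b2, se2):
    """Crude z-score for b1 - b2, assuming (incorrectly) independent estimates."""
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)


# 2004 rows only, copied from the table; x = GF or GA rank, y = points rank (assumed).
points_rank = [1, 2, 3, 4.5, 4.5, 1, 2, 3, 4, 5]
gf_rank = [4, 2, 1, 3, 5, 3, 1, 5, 2, 4]
ga_rank = [1, 2, 5, 3, 4, 1, 4, 2, 3, 5]

b_gf, se_gf, _ = slope_with_se(gf_rank, points_rank)
b_ga, se_ga, _ = slope_with_se(ga_rank, points_rank)
print(f"GF slope {b_gf:.3f} (SE {se_gf:.3f}), GA slope {b_ga:.3f} (SE {se_ga:.3f})")
print(f"rough z for the difference: {rough_z(b_gf, se_gf, b_ga, se_ga):.2f}")
```

A |z| well below 2 would be consistent with the suspicion that the difference is not significant, with the caveat about non-independence above.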


You only get a noise factor when a team that was better in GA or GF finishes worse in the table, or vice versa. For that, you need an unusual kind of team: one that wins more often but when they lose, they lose badly (or the other way around).

Not necessarily ... if a team loses low-scoring games, then it could be good in GA but bad in the table. An example of this is the 2000 Quakes, a team that knew it was bad, and did a lot of bunkering in order to avoid getting blown out.

Stan Collins
27 Sep 2005, 03:45 PM
It didn't end up making that much difference for the Quakes, though. They ranked 4th on points, 4th on offense, and 3rd on defense.

And is that really noise, or does it just prove the point that the key factor to placement in the table is being able to score some goals yourself?

-----

As another element in the analysis, here are the number of 'exact matches' (where the standings in one category mirror the table) for offense as opposed to defense:

GF: 4 (out of 20 conferences/divisions, or 20%)
GA: 2 (out of 20, or 10%)
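Counting 'exact matches' can be sketched as below. This assumes an exact match means every team's GF (or GA) rank equals its points rank within the division, which is one reading of the description above; the helper name `is_exact_match` is made up, and only two divisions from the table are included, so the totals will not equal the 4 and 2 reported for all 20.

```python
def is_exact_match(points_rank, other_rank):
    """True when every team has the same rank in both lists."""
    return all(p == o for p, o in zip(points_rank, other_rank))


# Rank columns copied from the table above (PRnk, GFRnk, GARnk).
divisions = {
    "2001 Central": {"pts": [1, 2, 3, 4], "gf": [1, 2, 3, 4], "ga": [1, 2, 3, 4]},
    "2004 East": {"pts": [1, 2, 3, 4.5, 4.5], "gf": [4, 2, 1, 3, 5], "ga": [1, 2, 5, 3, 4]},
}

gf_matches = sum(is_exact_match(d["pts"], d["gf"]) for d in divisions.values())
ga_matches = sum(is_exact_match(d["pts"], d["ga"]) for d in divisions.values())
print(f"GF exact matches: {gf_matches} of {len(divisions)}")
print(f"GA exact matches: {ga_matches} of {len(divisions)}")
```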

That said, you appear to be correct that the difference is not statistically significant. The Standard Errors are:
GF: .966
GA: 1.201

That's mostly down to the number of observations. The extra 12 may help a little, but since we're dealing with the square root here, probably not enough. We'll see in three weeks.

However, while there may not be statistical significance, it doesn't look like that difference is random.
----

Also, I added a 'total' equation, which is the mean of the GF and GA rankings, and got the numbers for that:
y = 1.0305x - .00914
R-squared = .6862
Standard Error: .665

Now that only tells you what any idiot could have figured out, that scoring more and allowing fewer goals over a season gets you a better spot in the table. But I thought the coefficients might be interesting in themselves.

numerista
28 Sep 2005, 08:02 AM
It didn't end up making that much difference for the Quakes, though. They ranked 4th on points, 4th on offense, and 3rd on defense.

The 2000 Quakes? I believe that they finished 12th on points and 12th on offense, but a strong-looking 4th on defense.

And is that really noise, or does it just prove the point that the key factor to placement in the table is being able to score some goals yourself?

It isn't only a question of what that Quakes team was able to do. It's also a question of what they chose to do. Because they were overmatched, they adopted a heavily defensive playing style.

Stan Collins
28 Sep 2005, 02:48 PM
The 2000 Quakes? I believe that they finished 12th on points and 12th on offense, but a strong-looking 4th on defense.

Ah, well, the confusion there is that I only compared within divisions, which for the most part helps iron out schedule variations. Therefore they were 4th in their 4-team division on offense, 3rd on defense, and 4th overall.

It isn't only a question of what that Quakes team was able to do. It's also a question of what they chose to do. Because they were overmatched, they adopted a heavily defensive playing style.

Yes, but there would be a question as to whether this caused them to give up fewer goals than a team slightly better than they, who because of that played a slightly less defensive style.

And I think there would be a question as to whether the phenomenon repeats itself very often. Most truly overmatched teams tend to give up rather a lot of goals even despite playing a defensive style. I doubt the remainder is enough to skew the thing very much.

Serie Zed
28 Sep 2005, 08:40 PM
You can regress on up to 16 variables at a time in Excel (IIRC).

Just define your Y variable as the wins column, then choose both the GF and GA columns for your X variables.
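The same two-predictor regression can be sketched outside Excel. The example below solves the least-squares equations for y = a + b1*x1 + b2*x2 directly in plain Python; it assumes y is the points rank (as in the original post, rather than a wins column), uses only the ten 2004 rows from the table, and the function name `fit_two_predictors` is made up for the illustration.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = a + b1*x1 + b2*x2 via the centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (t - my) for u, t in zip(x1, y))
    s2y = sum((v - m2) * (t - my) for v, t in zip(x2, y))
    det = s11 * s22 - s12 * s12  # zero only if x1 and x2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2


# 2004 rows only (East then West), copied from the table.
points_rank = [1, 2, 3, 4.5, 4.5, 1, 2, 3, 4, 5]
gf_rank = [4, 2, 1, 3, 5, 3, 1, 5, 2, 4]
ga_rank = [1, 2, 5, 3, 4, 1, 4, 2, 3, 5]

a, b_gf, b_ga = fit_two_predictors(gf_rank, ga_rank, points_rank)
print(f"points rank = {a:.3f} + {b_gf:.3f}*GF rank + {b_ga:.3f}*GA rank")
```

Fitting both ranks at once gives each coefficient with the other held fixed, which is a more direct comparison of offense against defense than two separate one-variable fits.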