
First or Last 15 Minutes Most Important?



the101er
05 Sep 2003, 09:42 AM
Based on an article that ran in FourFourTwo this month, I decided to investigate which teams scored goals, when, in the EPL. The original article claimed that the last 5 minutes of any match weren't important even though the data showed that several teams would have changed position in the standings if games had been played only through the 85th minute.

I suppose I started with the hypothesis: if the last 5 minutes aren't important, what about the last 10 or the last 15? I then thought, what about the idea that "getting the first goal is the most important"? So I also looked at how teams scored in the first 15 minutes of games.

I was surprised by what I found. I am going to attempt to copy my study here, and let the critics have at it. This is in no way a conclusive study and I have suggested areas of further investigation.

I am continuing by looking back at the 2001/2002 season and at some interesting correlations that appear to be in the data but that I haven't verified. Perhaps someone with more knowledge of statistics can help me determine whether two sets of data can be shown to be related.

Here are the results of a study of goals scored in the first 15 minutes and last 15 minutes of all EPL games in 2002/2003, with comments following.

Scoring          0-15 Min    75-90 Min    Total

Man Utd. 7 17 74
Opposition 7 6 34
Ratio 1.00 2.83 2.18
Variance -54.05% 30.18%
Goal Diff. 0.00 11.00 40.00
% of Total GD 0.00% 27.50%

Arsenal 18 14 85
Opposition 3 10 42
Ratio 6.00 1.40 2.02
Variance 196.47% -30.82%
Goal Diff. 15.00 4.00 43.00
% of Total GD 34.88% 9.30%

Newcastle 5 17 63
Opposition 3 2 48
Ratio 1.67 8.50 1.31
Variance 26.98% 547.62%
Goal Diff. 2.00 15.00 15.00
% of Total GD 13.33% 100.00%

Chelsea 7 19 68
Opposition 2 10 38
Ratio 3.50 1.90 1.79
Variance 95.59% 6.18%
Goal Diff. 5.00 9.00 30.00
% of Total GD 16.67% 30.00%

Liverpool 6 13 61
Opposition 6 15 41
Ratio 1.00 0.87 1.49
Variance -32.79% -41.75%
Goal Diff. 0.00 (2.00) 20.00
% of Total GD 0.00% -10.00%

Blackburn 8 8 52
Opposition 4 17 43
Ratio 2.00 0.47 1.21
Variance 65.38% -61.09%
Goal Diff. 4.00 (9.00) 9.00
% of Total GD 44.44% -100.00%

Everton 4 9 48
Opposition 8 14 49
Ratio 0.50 0.64 0.98
Variance -48.96% -34.38%
Goal Diff. (4.00) (5.00) (1.00)
% of Total GD 400.00% 500.00%

Southampton 4 10 43
Opposition 7 18 46
Ratio 0.57 0.56 0.93
Variance -38.87% -40.57%
Goal Diff. (3.00) (8.00) (3.00)
% of Total GD 100.00% 266.67%


Man City 4 19 48
Opposition 9 11 49
Ratio 0.44 1.73 0.98
Variance -54.63% 76.33%
Goal Diff. (5.00) 8.00 (1.00)
% of Total GD 500.00% -800.00%

Tottenham 9 13 51
Opposition 6 13 62
Ratio 1.50 1.00 0.82
Variance 82.35% 21.57%
Goal Diff. 3.00 0.00 (11.00)
% of Total GD -27.27% 0.00%

Middlesboro 1 9 48
Opposition 4 9 44
Ratio 0.25 1.00 1.09
Variance -77.08% -8.33%
Goal Diff. (3.00) 0.00 4.00
% of Total GD -75.00% 0.00%

Charlton 6 10 45
Opposition 6 15 56
Ratio 1.00 0.67 0.80
Variance 24.44% -17.04%
Goal Diff. 0.00 (5.00) (11.00)
% of Total GD 0.00% 45.45%

Birmingham 2 14 41
Opposition 6 13 49
Ratio 0.33 1.08 0.84
Variance -60.16% 28.71%
Goal Diff. (4.00) 1.00 (8.00)
% of Total GD 50.00% -12.50%

Fulham 6 9 41
Opposition 4 8 50
Ratio 1.50 1.13 0.82
Variance 82.93% 37.20%
Goal Diff. 2.00 1.00 (9.00)
% of Total GD -22.22% -11.11%

Leeds 12 16 58
Opposition 4 13 57
Ratio 3.00 1.23 1.02
Variance 194.83% 20.95%
Goal Diff. 8.00 3.00 1.00
% of Total GD 800.00% 300.00%

Aston Villa 6 11 42
Opposition 5 15 47
Ratio 1.20 0.73 0.89
Variance 34.29% -17.94%
Goal Diff. 1.00 (4.00) (5.00)
% of Total GD -20.00% 80.00%




Bolton 6 15 41
Opposition 6 12 51
Ratio 1.00 1.25 0.80
Variance 24.39% 55.49%
Goal Diff. 0.00 3.00 (10.00)
% of Total GD 0.00% -30.00%

West Ham 1 10 42
Opposition 9 11 59
Ratio 0.11 0.91 0.71
Variance -84.39% 27.71%
Goal Diff. (8.00) (1.00) (17.00)
% of Total GD 47.06% 5.88%

West Brom A. 2 5 29
Opposition 7 16 65
Ratio 0.29 0.31 0.45
Variance -35.96% -29.96%
Goal Diff. (5.00) (11.00) (36.00)
% of Total GD 13.89% 30.56%

Sunderland 3 5 21
Opposition 6 17 65
Ratio 0.50 0.29 0.32
Variance 54.76% -8.96%
Goal Diff. (3.00) (12.00) (44.00)
% of Total GD 6.82% 27.27%
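For anyone wanting to recheck the derived rows, here is a minimal Python sketch of how they appear to be calculated. This is inferred from the numbers, not stated in the post: "Variance" looks like the period ratio relative to the season ratio, minus one, and "% of Total GD" looks like the period goal difference divided by the season goal difference. The function name is made up for illustration.

```python
def period_metrics(gf, ga, gf_total, ga_total):
    """Recompute the derived rows for one team in one time period.

    Assumed definitions (inferred from the table, not confirmed by the author):
      ratio     = goals for / goals against in the period
      variance  = period ratio relative to the season ratio, minus 1
      goal_diff = goals for - goals against in the period
      share     = period goal difference / season goal difference
    Breaks down if ga is 0 or the season goal difference is 0.
    """
    ratio = gf / ga
    season_ratio = gf_total / ga_total
    variance = ratio / season_ratio - 1
    goal_diff = gf - ga
    share = goal_diff / (gf_total - ga_total)
    return ratio, variance, goal_diff, share


# Man Utd, minutes 75-90: 17 scored, 6 allowed; season 74 scored, 34 allowed
ratio, variance, gd, share = period_metrics(17, 6, 74, 34)
print(f"{ratio:.2f} {variance:.2%} {gd} {share:.2%}")  # prints: 2.83 30.18% 11 27.50%
```

Note that "% of Total GD" becomes very large when the season goal difference is close to zero (Everton, Man City, Leeds), so those rows are hard to compare with the rest.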


COMMENTS: Some numbers jump out and deserve further investigation. First, there are the performance improvements by teams like Man United, Arsenal and Newcastle in specific parts of the game. In soccer, we traditionally discuss tactics in terms of formations such as 4-4-2 or 4-3-3. Perhaps a new nomenclature needs to be devised to discuss time-variable strategies.

It would appear, for example, that Arsenal and, less successfully, Chelsea work hard to get the first goal of the match and then sit back. Not surprising, perhaps, from two continental coaches. The more robust, traditionally British sides at Newcastle and Man United dominate the last 15 minutes of matches. Again, not surprising given the expected coaching predilections of Sirs Alex Ferguson and Bobby Robson.

This type of analysis, though, shouldn’t be overused. It is still the teams with the most goals scored and fewest allowed that finish at the top of the table. What this analysis might show is a coaching philosophy that is being matched by on-field performance. That is, goals are predictably being scored by the best teams at certain points in the game, whether because of superior talent, coaching or coaching philosophy.

One would expect superior talent to show throughout the whole match. So, if successful teams are scoring significantly more goals during certain time periods, this would indicate a coaching philosophy encouraging more aggressive play during those periods of the match. The fact that the best teams are able to “impose their will” on the opposition shows that the coaching philosophy is being correctly interpreted by the players.

Although goals scored at certain key points in the match can’t replace total goals as an indicator of success, a team that can predictably improve its scoring during certain time segments appears to increase its overall goals scored. That is, there appears to be a verifiable correlation between scoring goals during key time frames and scoring more goals overall.

Another area of study: how do teams of similar ability perform against each other? Without doing a formal study, from the 2002-03 data it appeared that teams of close to equal ability (i.e., within +/- 2 places of each other in the final standings) have more difficulty creating critical early or late goals.

One question that is not answered by this data is the age-old one: is it better to score first or last? Of the top 4 teams, the philosophical split appears to be even: Wenger and Ranieri push for early goals, while Robson and Ferguson wear down the opposition. Perhaps part of Liverpool’s less successful campaign can be attributed to Houllier’s inability to decisively do either.

Further study: The next step is to break down more results to see if these trends can be confirmed in larger data sets and to study more subtle nuances of strategy. Perhaps there is no failure at Liverpool, but a designed strategy to take advantage of fatigue at the end of the first half, or to play more aggressively at the beginning of the second.

It can be argued that certain late goals should be dropped as insignificant, for example, Newcastle scoring to make the score 2-6 in a loss to Manchester United. But this misses three key factors. First, teams level on points in the final standings are separated by goal difference, so every goal is significant. Second, if goals are meaningless to the result of the game, then they are equally meaningless to both teams, so there is likely little bias towards the winning or losing team. And third, every field player wants to score goals, and goalkeepers, regardless of the scoreline, don’t want to allow any.

beineke
05 Sep 2003, 11:14 AM
Great stuff ... I'm impressed by how striking some of these results are (e.g. Newcastle).

One suggestion ... it'd be nice for your table to show goals scored from mins 16-75. That's useful for looking at a team like Everton...

mins        1-15   16-75   76-90   Total
scored         4      35       9      48
allowed        8      27      14      49

This way, it's easier to pick out the fact that they've done better in the middle game than at either end.
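The middle column doesn't need a new count; it falls out of the numbers already posted. A tiny Python sketch (the function name is just for illustration):

```python
def three_way_split(first15, last15, total):
    """Return (first 15, middle of game, last 15) goal counts from the posted totals."""
    return first15, total - first15 - last15, last15


everton_scored = three_way_split(4, 9, 48)    # (4, 35, 9)
everton_allowed = three_way_split(8, 14, 49)  # (8, 27, 14)
print(everton_scored, everton_allowed)
```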

microbrew
05 Sep 2003, 11:53 AM
Originally posted by the101er
The original article claimed that the last 5 minutes of any match weren't important even though the data showed that several teams would have changed position in the standings if games had been played only through the 85th minute.


The last fifteen (or five) minutes seems arbitrary to me. A quick check might be to take any other time period, say the 20th to 35th minute, see what effect it has, and compare that to the first or last few minutes.

Perhaps it would be more efficient and more intuitive if you plotted goals vs. time, then compared the plots of different teams to each other and to a plot for the entire league.
We could repeat this for different years and see how consistent the plots are.

If you point me to the raw data, I could play with it in Matlab or Perl. Maybe I could see how much it deviates from being iid, though I don't know how to do that in a mathematically rigorous way.

the101er
05 Sep 2003, 01:08 PM
Yes, Newcastle got 100% of their goal difference in the last 15 minutes, so that seems significant. And teams played Man United even in the first 15 minutes, so it will be interesting to see where Man United is dominating games.

Here is where I'm getting the results.

http://www.soccerbot.com/fa/tables/ukprem03.htm

Watch out, as they sometimes switch the way they report own goals. This may have led to some errors in my numbers that I haven't rechecked.

I looked at 2001/2002 for Liverpool, and they did much better in the first and last 15 minutes, and finished second in the league.

Again, I don't know anything about statistics yet, so I don't want to start coming to conclusions (even though I am) without getting some sort of correlation factor, or whatever it's called.

beineke
05 Sep 2003, 02:21 PM
Originally posted by microbrew
If you point me to the raw data, I could play with it in Matlab or Perl. Maybe I could see how much it deviates from being iid, though I don't know how to do that in a mathematically rigorous way.

If you're interested in doing this, here's what I'd suggest ... we've got a 2 x 3 table for each team. The rows are GF and GA, the columns are minutes 1-15, 16-75, and 76-90.

For Everton, this is
4 35 9
8 27 14

The question of interest is whether the ratio GF/GA depends on what stretch of the game we're in. There is a standard statistical test for this ... the chi-square. Go here and type in the numbers for each team.

http://www.ubmail.ubalt.edu/~harsham/Business-stat/otherapplets/Catego.htm

Make note of the p-values. Most will not be significant, but it would be interesting to examine all of them together. If only chance were at work, they should be spread uniformly between 0 and 1. Instead, you'll probably see a bunch that are 0.25 or below.

Incidentally, for Everton the correct p-value is 0.179.
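For anyone who would rather script this than use the applet, here is a minimal Python sketch of the chi-square test on one team's 2 x 3 table. The function name is made up for illustration. The p-value shortcut used here, exp(-x/2), is only valid because a 2 x 3 table has exactly 2 degrees of freedom; a general version would need a proper chi-square distribution function from a statistics library.

```python
import math


def chi_square_2x3(goals_for, goals_against):
    """Chi-square test of independence for a 2 x 3 table (GF/GA by time segment)."""
    rows = [goals_for, goals_against]
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            # Expected count if GF/GA did not depend on the time segment
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected

    # Degrees of freedom = (2 - 1) * (3 - 1) = 2, and with 2 degrees of
    # freedom the upper-tail probability is exactly exp(-stat / 2).
    p_value = math.exp(-stat / 2)
    return stat, p_value


# Everton: minutes 1-15, 16-75, 76-90
stat, p = chi_square_2x3([4, 35, 9], [8, 27, 14])
print(f"chi-square = {stat:.2f}, p = {p:.3f}")  # prints: chi-square = 3.44, p = 0.179
```

Some expected counts in these tables are small (a handful of goals per cell), so the p-values should be read as rough guides.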

superdave
05 Sep 2003, 03:27 PM
Great stuff.

Did anyone check out Blackburn's numbers? As a fan, I had a sense they were kicking away points at the end of matches. This chart backs that up.

Also, the differences are probably not just a matter of tactics, but also of managers putting a different emphasis on stamina when signing starting players.

IASocFan
05 Sep 2003, 03:59 PM
The last 15 minutes is where a team's depth comes into play. Bringing on 3 subs of the same talent as the starters lifts a team. With injuries, which are unavoidable, having no depth causes problems in the last 15 minutes.

the101er
05 Sep 2003, 07:46 PM
Okay.

I have been pounding my head against the chi square test all day. Fortunately, I'm my own boss and I can sort of justify this. Hey, I'm finally learning statistics.

Anyway, if you take the predicted goal difference for the team and compare it with the actual goal difference, you can get a moderate rejection of the null hypothesis for Man United in the 2002-2003 season. The question is: was Man United able to dominate certain time periods of the game? It's like asking whether a six-sided die is weighted. If you throw it 60 times and compare your results to the expected results, the chi-square test will tell you whether the number of 6's you are rolling is statistically significant or not, since even a fair die will rarely land on each number exactly 10 times.

That is: over the whole season Man United's goal difference was +40. So, in any one-sixth of the game, totalled over all games, they should be +6.67 (6.67 x 6 = 40).
We are using six 15-minute segments of the game.

So, going from minute 0 to minute 90, taking actual goal difference and comparing it to predicted goal difference:
Minute:      0-14   15-29   30-44   45-59   60-74   75-90
Predicted:    6.7     6.7     6.7     6.7     6.7     6.7
Actual:        -1      11       6       7       6      11

These numbers give a moderate rejection of the null hypothesis, indicating that Man United somehow had the dice weighted. There is one significant trough and two significant peaks. In the first 15 minutes, teams played Man United even. Then United picked them apart in the next 15 minutes. Finally, United picked up the scraps at the end of games.
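The arithmetic behind that statement can be reproduced with a short Python sketch. It only recomputes the statistic from the numbers above, treating the six goal differences as if they were counts; whether the chi-square test is appropriate for differences at all is a separate question. The function name is made up for illustration, and the two critical values in the comments are the standard table values for 5 degrees of freedom.

```python
def equal_share_chi_square(observed):
    """Goodness-of-fit statistic against an equal share in every segment."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


actual_gd = [-1, 11, 6, 7, 6, 11]          # Man Utd goal difference by segment
stat = equal_share_chi_square(actual_gd)   # expected value is 40 / 6 per segment
print(f"chi-square = {stat:.1f} on {len(actual_gd) - 1} degrees of freedom")
# prints: chi-square = 14.6 on 5 degrees of freedom
# Standard critical values for 5 degrees of freedom: 11.07 (5% level), 15.09 (1% level)
```

A statistic of 14.6 sits between the 5% and 1% critical values, which matches the "moderate rejection" described above.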

I may have miscalculated or misused the Chi-squared test, as I am still learning about all of this. I would certainly be glad to have someone refute what I have written here.

One thing is obvious: scoring goals at any time in the game is a good thing. I can't get any rejection of the null hypothesis when comparing goals scored at certain times of the match with won/loss record. But, again, this might just be due to the fact that I am a lousy statistician.

beineke
05 Sep 2003, 08:53 PM
Originally posted by the101er
I may have miscalculated or misused the Chi-squared test, as I am still learning about all of this. I would certainly be glad to have someone refute what I have written here.


OK, you've stated the question correctly, but there is one important problem with your analysis.

Need count data

This is a biggie. The test you've used is appropriate when you're totalling things up. When you're taking a difference (in your case, Goals For minus Goals Against), this test no longer works. Unfortunately, I don't have time to suggest a fix right now.

Just one other note ... when I looked at Blackburn's late-game numbers, I got slightly different results from yours. I think you might have mis-tabulated an own goal, and (IIRC) there was something else off, too.

Fraid I've gotta run...

Karl K
05 Sep 2003, 11:53 PM
I think you need to take a step back even further.

First, you need to look at the distribution of the timing of goals scored over the ENTIRE League. In statistical terms, what is the distribution of the population? Are those distributions random? Do they follow a Poisson distribution?

Once you do that, then you can look at the scoring of individual teams -- your sample -- to see how much they deviate from the league as a population. Are the deviations statistically significant?

If the answer to that question is "no" then your topic, as they say in statistics, is "uninteresting." What appears to be a proclivity on the part of team A to score late is really just an apparent aberration -- they are in fact, in a statistical sense, more or less average.

But if the deviations ARE statistically significant, you can then run some correlation analyses. Do teams that score more often (in a statistical sense) in the early or late minutes have better goal differentials?

And so on.

This may require multi-season data.

the101er
06 Sep 2003, 11:37 AM
I am beginning to see the problems we are facing as non-statisticians/soccer coaches. We see numbers, they make sense (or we will them to make sense) and we base our judgements on erroneous assumptions.

Too busy right now to do more calculations, but I am back on it ASAP.

Thanks for the advice.

beineke
06 Sep 2003, 12:54 PM
Let me offer a distinction here, because I think that two questions are getting blurred ...

Question 1: Are goals scored at a constant rate?

Answer No. See, for instance, the bottom of page 5 of this document. Relatively few goals are scored early, more goals are scored late in the game.

MLS all-time totals by 15-minute slice...

474 541 631 625 669 872

http://www.mlsnet.com/statistics/pdf/AT_league.pdf

Is this interesting?

Mildly. It's interesting that the increase has a fairly steady trend, not just a spike around the end.
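As a rough check on Question 1, here is a Python sketch that converts the MLS totals to shares and computes a chi-square statistic against a constant scoring rate. It assumes the six slices are of equal length, which may not be exactly true if the last slice includes stoppage time. The variable names are made up for illustration.

```python
mls_goals = [474, 541, 631, 625, 669, 872]  # all-time MLS goals by 15-minute slice

total = sum(mls_goals)
shares = [g / total for g in mls_goals]

# Under a constant scoring rate, each slice would get one sixth of the goals
expected = total / len(mls_goals)
stat = sum((g - expected) ** 2 / expected for g in mls_goals)

print(" ".join(f"{s:.1%}" for s in shares))
print(f"chi-square = {stat:.1f} on {len(mls_goals) - 1} degrees of freedom")
```

The statistic comes out far above the standard 1% critical value of 15.09 for 5 degrees of freedom, so a constant rate is not a good description of this data.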

Question 2: Do different teams have characteristic in-game trends? That is, are some teams strong starters, while others are strong finishers?

Answer Most definitely. (Karl, forgive me for not elaborating, but this data does show that there is clear variation from team to team.)

Is this interesting?

Downright fascinating. What the heck is going on? Can we point to anything that distinguishes fast starters like Leeds and Arsenal from, say, a slow starter like ManU? What kind of late-game subs were fast finishers like ManU, Newcastle, and Chelsea using? Were they scoring a lot of insurance goals, or did they also score meaningful ones?

I'm afraid I don't know these teams well enough to speculate.

Karl K
06 Sep 2003, 07:13 PM
Originally posted by beineke
Let me offer a distinction here, because I think that two questions are getting blurred ...

Question 1: Are goals scored at a constant rate?

Answer No. See, for instance, the bottom of page 5 of this document. Relatively few goals are scored early, more goals are scored late in the game.

MLS all-time totals by 15-minute slice...

474 541 631 625 669 872

http://www.mlsnet.com/statistics/pdf/AT_league.pdf

Is this interesting?

Mildly. It's interesting that the increase has a fairly steady trend, not just a spike around the end.

Question 2: Do different teams have characteristic in-game trends? That is, are some teams strong starters, while others are strong finishers?

Answer Most definitely. (Karl, forgive me for not elaborating, but this data does show that there is clear variation from team to team.)



B., I think the key question remains: does any given team deviate in a statistically significant way from the league as a whole?

Maybe a simple correlation analysis is all you need to do to determine that. You may be right, in that some teams are strong starters, or strong finishers, relative to the league as a whole.

I guess I am skeptical, but here's where a rigorous methodology will refute or confirm my skepticism.

microbrew
07 Sep 2003, 01:54 AM
Nice PDF file.

I plotted the trends for each MLS team, and also compared them to the league aggregate by converting everything to percentages. From just looking over the plots: nothing jumps out at me, no one team really sticks out.

An exercise would be to break it down year by year, and then coach by coach.

Anyway, more data is needed- too few teams, too few goals, too few years. It would be interesting though, to look at other leagues.

beineke
07 Sep 2003, 10:56 AM
Originally posted by Karl Keller

I guess I am skeptical, but here's where a rigorous methodology will refute or confirm my skepticism.

Karl, I used sound methodology. I'm just not interested in describing it in detail.

If you look at the chi-square statistics I described earlier, Newcastle's yields a p-value of 0.005 (.0032 using a more precise method). That's at least ten times smaller than the 0.05 ordinarily needed for statistical significance. The problem is that you also have to adjust for multiple comparisons. Multiple comparisons is a complex issue, and not really appropriate to this forum.

the101er
08 Sep 2003, 11:58 AM
Statisticians talk about "interesting" data. This, as far as I can tell, usually refers to data that rejects the null hypothesis.

But in layman's terms, I think whatever we discover here is going to be interesting. We just need to be sure all of the numbers are right and the statistical work is sound.

Suppose the null hypothesis is: all teams (good or bad) tend to score goals at the same times during games. Then, if we can't find any team's data that rejects the null hypothesis, we can say that better teams just score more goals than worse teams and it doesn't matter when in the match. This, I think, would be counter-intuitive to many soccer pundits, who tend to see either an early goal or a late goal as "critical".

I am busy correcting the first numbers I posted which, I apologize, had some errors. I think this is mostly due to the haphazard nature of reporting own goals at the website I am using. And, this time, I am counting all goals and splitting goals into 6 time segments of the game.

As I understand it, I should be able to take all of the data for the whole season and use this as my expected results and run a chi-squared analysis for each team against the expected results. It would be nice if someone had the same type of numbers for the Premiership, that were reported for MLS.

beineke
08 Sep 2003, 12:36 PM
Originally posted by the101er
As I understand it, I should be able to take all of the data for the whole season and use this as my expected results and run a chi-squared analysis for each team against the expected results.

Not sure if this is what you're saying, but you can run a chi-square for all 20 teams at once by putting all those numbers into a 20 x 6 table.
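Here is a rough Python sketch of the all-teams-at-once idea. Since the six-segment counts haven't been posted, it uses a 20 x 3 table (first 15, middle, last 15) of goals scored taken from the101er's first table, which the author has said contains some errors, so treat the output as a demonstration of the method only. The function name is made up for illustration.

```python
# (first 15 min, last 15 min, season total) goals scored, from the first table
GOALS = {
    "Man Utd": (7, 17, 74), "Arsenal": (18, 14, 85), "Newcastle": (5, 17, 63),
    "Chelsea": (7, 19, 68), "Liverpool": (6, 13, 61), "Blackburn": (8, 8, 52),
    "Everton": (4, 9, 48), "Southampton": (4, 10, 43), "Man City": (4, 19, 48),
    "Tottenham": (9, 13, 51), "Middlesboro": (1, 9, 48), "Charlton": (6, 10, 45),
    "Birmingham": (2, 14, 41), "Fulham": (6, 9, 41), "Leeds": (12, 16, 58),
    "Aston Villa": (6, 11, 42), "Bolton": (6, 15, 41), "West Ham": (1, 10, 42),
    "West Brom A.": (2, 5, 29), "Sunderland": (3, 5, 21),
}


def chi_square_table(rows):
    """Chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand_total = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df


# One row per team: goals scored early, in the middle of the game, and late
table = [(early, total - early - late, late) for early, late, total in GOALS.values()]
stat, df = chi_square_table(table)
print(f"chi-square = {stat:.1f} on {df} degrees of freedom")
```

Several teams have very few early goals, so some expected counts are small and the statistic should be read with caution. A single test like this also avoids the multiple-comparisons issue that comes with running 20 separate tests.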

microbrew
08 Sep 2003, 12:55 PM
BTW, we can say the last 15 minutes are the most important, because that's where the most goals are scored...

beineke
08 Sep 2003, 01:03 PM
Originally posted by microbrew
BTW, we can say the last 15 minutes are the most important, because that's where the most goals are scored...

... only if these goals have an impact on the final result.

Karl K
09 Sep 2003, 11:59 AM
Originally posted by the101er
Statisticians talk about "interesting" data. This, as far as I can tell, usually refers to data that rejects the null hypothesis.

But in layman's terms, I think whatever we discover here is going to be interesting. We just need to be sure all of the numbers are right and the statistical work is sound.

Suppose the null hypothesis is: all teams (good or bad) tend to score goals at the same times during games. Then, if we can't find any team's data that rejects the null hypothesis, we can say that better teams just score more goals than worse teams and it doesn't matter when in the match. This, I think, would be counter-intuitive to many soccer pundits, who tend to see either an early goal or a late goal as "critical".

I am busy correcting the first numbers I posted which, I apologize, had some errors. I think this is mostly due to the haphazard nature of reporting own goals at the website I am using. And, this time, I am counting all goals and splitting goals into 6 time segments of the game.

As I understand it, I should be able to take all of the data for the whole season and use this as my expected results and run a chi-squared analysis for each team against the expected results. It would be nice if someone had the same type of numbers for the Premiership, that were reported for MLS.

Well said.

Look forward to seeing your results.