
View Full Version : First or Last 15 Minutes Most Important?



sljohn
06 Apr 2004, 01:50 AM
Great stuff in this thread.

Re: the last post. It's hard for me to get too worked up about a quintic trend across six data points. That sounds like over-fitting to me.

I recently spoke with a statistician who used functional data analysis (FDA) to map the dynamics of eBay auction bids. As part of that work he did a cluster analysis and found two distinct patterns of auction bid dynamics.

I think it may be possible to do something similar with goal-scoring times to see if there are indeed different teams that stand out as having unique patterns.

Does anyone have a data set with times of goals scored and times of goals conceded by team? The optimal data set would provide a history for all games across an entire season. I'm hoping to study FDA with that prof in the May/June timeframe and would love to have a data set like that to play with.

Hmmm... MLS has a game summary for each game online, doesn't it? Perhaps I can whip up a script to grab the info off of there...

ur_land
06 Apr 2004, 01:53 AM
A couple other points on these analyses....

I did all the analyses on the natural-log transformed goals scored, but I reported the untransformed means.

Also, I just did a quick analysis on the '02 and '03 data to see if the linear trend interacts with points (a proxy for team quality), and the interaction is not significant in either year. So there is no evidence that this linear trend is more true for good teams than bad teams.

ur_land
06 Apr 2004, 01:58 AM
It's hard for me to get too worked up about a quintic trend across six data points. That sounds like over-fitting to me.


I agree that the quartic and quintic trends are pretty meaningless--that's pretty much what I said. I think the interesting thing is the consistent linear trend across the data sets.

As for the overfitting argument, I think you're coming at this from the perspective of a traditional regression analysis. Instead, think of this as an ANOVA, just using the regression/contrast code framework (because, in a deep sense, multiple regression and analysis of variance are mathematically equivalent).
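To make the ANOVA-by-contrasts point concrete, here is a minimal sketch. It assumes six equally spaced time segments and uses made-up segment means (not the real data from these analyses); the function names are just for illustration. It builds orthogonal polynomial contrasts and checks that the five contrast sums of squares (linear through quintic) add up exactly to the between-segment sum of squares, so the quintic term is simply the last slice of a complete decomposition.

```python
def orthogonal_poly_contrasts(k):
    """Orthonormal polynomial contrasts (linear, quadratic, ...) for k equally spaced levels."""
    xs = [i - (k - 1) / 2 for i in range(k)]      # centred level scores
    basis = [[1.0 / k ** 0.5] * k]                # normalised constant vector
    contrasts = []
    for degree in range(1, k):
        v = [x ** degree for x in xs]
        # Gram-Schmidt: remove everything already explained by lower-order terms
        for b in basis:
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = sum(vi * vi for vi in v) ** 0.5
        v = [vi / norm for vi in v]
        basis.append(v)
        contrasts.append(v)
    return contrasts


def contrast_sums_of_squares(means):
    """Sum of squares captured by each polynomial contrast (per unit of cell size)."""
    return [sum(c * m for c, m in zip(con, means)) ** 2
            for con in orthogonal_poly_contrasts(len(means))]


# Hypothetical goals per game in six 15-minute segments (illustration only)
segment_means = [0.40, 0.38, 0.42, 0.45, 0.47, 0.54]

ss = contrast_sums_of_squares(segment_means)
grand_mean = sum(segment_means) / len(segment_means)
between_ss = sum((m - grand_mean) ** 2 for m in segment_means)

for name, value in zip(["linear", "quadratic", "cubic", "quartic", "quintic"], ss):
    print(f"{name:9s} share of between-segment SS: {value / between_ss:.3f}")
```

With six segments there are only five degrees of freedom between them, so the five contrasts always account for all of the between-segment variation; the interesting question is how much of it lands in the linear term.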

numerista
06 Apr 2004, 01:20 PM
Thanks for posting these results. I'm not too familiar with the particular kind of contrasts you're testing, but you ought to be able to plot the fitted curve whose significance you're testing. That would make it easy to interpret the "quintic" effect in the EPL.

tachyon1
24 Apr 2004, 05:53 PM
Hi guys,
Apologies if I've missed it, but how are you treating the 31-45 & 76-90 minute segments?
In the EPL (& I presume in the US), injury time, substitution time & the time taken to restart after a goal get added onto the end of each half. Thus the 45th min can last 2 or more minutes & the 90th min can be even longer. Few stats sites record what the ref actually adds on. But the consequence, in the EPL at least, is that there are definite & artificial peaks in goal scoring at the ends of each half.

T1

tachyon1
26 Apr 2004, 05:53 AM
Some stats.

Average goals per game for each 5 minute segment for the EPL. Taken over 5 seasons from the late 90's & early 00's.

0-5 mins.......0.15
6-10 mins.....0.12
11-15 mins....0.12
16-20 mins....0.12
21-25 mins....0.13
26-30 mins....0.12
31-35 mins....0.13
36-40 mins....0.13
41-45 mins....0.17
46-50 mins....0.14
51-55 mins....0.15
56-60 mins....0.15
61-65 mins....0.16
66-70 mins....0.15
71-75 mins....0.16
76-80 mins....0.15
81-85 mins....0.17
86-90 mins....0.23

A gradual increase as the game progresses. But it also clearly illustrates the problem in taking the 41-45 & 86-90 segments at face value. The 0-5 min figure's slightly weird.....at least you know not to miss the kick off.
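For anyone who wants to regroup these figures, a quick sketch that totals the 5-minute values by half and by 15-minute block. The only assumption is that the 11-15 mins figure is taken as 0.12; the variable names are just for illustration.

```python
# Goals per game in each 5-minute segment, in the order listed above.
# Assumption: the 11-15 mins figure is taken as 0.12.
goals = [0.15, 0.12, 0.12, 0.12, 0.13, 0.12, 0.13, 0.13, 0.17,   # first half
         0.14, 0.15, 0.15, 0.16, 0.15, 0.16, 0.15, 0.17, 0.23]   # second half

first_half = sum(goals[:9])
second_half = sum(goals[9:])
total = first_half + second_half
second_half_share = second_half / total

# Regroup into six 15-minute blocks (0-15, 16-30, ..., 76-90)
blocks = [sum(goals[i:i + 3]) for i in range(0, len(goals), 3)]

print(f"first half : {first_half:.2f} goals per game")
print(f"second half: {second_half:.2f} goals per game")
print(f"second-half share of goals: {second_half_share:.1%}")
print("15-minute blocks:", [round(b, 2) for b in blocks])
```

On these numbers the first half comes to about 1.19 goals per game and the second half to about 1.46, so roughly 55% of goals arrive after the break, and the 31-45 and 76-90 blocks both include whatever stoppage time was played.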

As regards helping you decide which team is more likely to win, it's probably more important to look at which team is more likely to get the first goal. In the EPL, teams that score first go on to win around 70% of the time, draw 20% & only lose 10%.

You can derive a reasonable model of how likely a team is to score the first goal from an accurate assessment of the expected goal supremacies between opponents.
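One possible way to sketch such a model (not necessarily the one meant here) is to assume each side scores as an independent Poisson process at a constant rate. Under that assumption, with a total goal expectancy and a supremacy figure, each team's chance of scoring first is its share of the total rate times the chance that anyone scores at all. The function name and the example numbers below are hypothetical.

```python
import math


def first_goal_probabilities(total_goals, supremacy):
    """Chance that team A scores first, team B scores first, or the game ends 0-0.

    Assumes independent Poisson scoring at constant rates (an assumption of this
    sketch, it ignores the rise in scoring rate through the match).
    total_goals: expected goals for both teams combined.
    supremacy:   expected goals for A minus expected goals for B.
    """
    if total_goals <= 0 or abs(supremacy) > total_goals:
        raise ValueError("need total_goals > 0 and |supremacy| <= total_goals")
    rate_a = (total_goals + supremacy) / 2
    rate_b = (total_goals - supremacy) / 2
    p_any_goal = 1 - math.exp(-total_goals)
    p_a_first = rate_a / total_goals * p_any_goal
    p_b_first = rate_b / total_goals * p_any_goal
    return p_a_first, p_b_first, 1 - p_any_goal


# Hypothetical match: 2.65 total goals expected, A favoured by half a goal
p_a, p_b, p_none = first_goal_probabilities(2.65, 0.5)
print(f"A scores first: {p_a:.3f}, B scores first: {p_b:.3f}, no goals: {p_none:.3f}")
```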

T1

mpruitt
26 Apr 2004, 06:05 PM
That is extremely weird. Might it be that there's a problem with looking at number of minutes in soccer because of time added on? I wish to heck they'd start including that in the total time played.

tachyon1
27 Apr 2004, 05:31 AM
Hi Maxim,

Injury time is certainly the cause of the two spikes at the ends of each half, especially now refs have a more formalised way of adding on injury time. They're also supposed to allow 30 seconds per substitute & (& I may be wrong on this one) 30 secs per goal.

Maybe someone has been recording the number of minutes the fourth official holds up at the end of each half,but even then that only indicates a minimum that will be added.
Also it's noticeable that quite a few refs are allowing attacking moves to finish before calling time.

Just a gut feeling, but I reckon that the 45th min lasts for between two and three minutes, whilst the 90th min goes on for between three and four mins. Even working in 15 minute segments, these sorts of figures can add around 20% to these particular timescales.
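The arithmetic behind that can be sketched as follows. This assumes the final "minute" of a segment really lasts the stated number of minutes and that scoring is spread evenly over the time actually played; the function names are made up for the example.

```python
def segment_inflation(nominal_minutes, last_minute_length):
    """Fractional increase in playing time when the final minute runs long."""
    extra = last_minute_length - 1          # minutes played beyond the nominal one
    return extra / nominal_minutes


def adjusted_goals(raw_goals, nominal_minutes, last_minute_length):
    """Scale a segment's goals back to its nominal length (even-scoring assumption)."""
    actual_minutes = nominal_minutes + (last_minute_length - 1)
    return raw_goals * nominal_minutes / actual_minutes


for length in (2, 3, 4):
    print(f"last minute lasts {length} min: 15-minute segment is "
          f"{segment_inflation(15, length):.1%} longer")
```

So the 20% figure corresponds to the top of the range (a 90th minute that runs for four minutes); a three-minute final minute adds about 13% to a 15-minute segment.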

It's certainly considered a significant enough market for the spread betting firms to occasionally offer prices based on 1st half injury time multiplied by 2nd half injury time.

Top class rugby has started stopping the clock during stoppages so that the half ends when the clock reads 40 minutes (they only play 80 mins in rugby).

As mentioned in another thread, A*(B^0.84) gives a reasonable estimate for the goal expectancy for the remainder of the match, where A is the initial goal expectancy & B is the proportion of the game remaining.

If you let B=0.5 (which, ignoring injury time :-), is half the match) you get the answer that around 56% of the goals will be scored in the second half. And that fits pretty well with what happens.
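A quick sketch of that formula in code, for anyone who wants to try other values of B. The function name and the 2.65 example figure are hypothetical; the 0.84 exponent is the one quoted above.

```python
def remaining_goal_expectancy(initial_expectancy, proportion_remaining, exponent=0.84):
    """A * (B ^ 0.84): expected goals still to come with a proportion B of the game left."""
    if not 0 <= proportion_remaining <= 1:
        raise ValueError("proportion_remaining must be between 0 and 1")
    return initial_expectancy * proportion_remaining ** exponent


# Share of goals expected after half time (B = 0.5), independent of A
second_half_share = remaining_goal_expectancy(1.0, 0.5)
print(f"share of goals expected in the second half: {second_half_share:.1%}")

# Example with a hypothetical pre-match expectancy of 2.65 goals
for b in (1.0, 0.75, 0.5, 0.25):
    print(f"B = {b:.2f}: {remaining_goal_expectancy(2.65, b):.2f} goals still expected")
```

Because the exponent is below 1, the expectancy falls more slowly than the time remaining, which is just another way of saying goals come a little faster as the match goes on.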

T1

superdave
04 May 2004, 09:43 AM
That is extremely weird. Might it be that there's a problem with looking at number of minutes in soccer because of time added on? I wish to heck they'd start including that in the total time played.
I thought there was a decision, maybe 2 years ago, to have a goal scored in the 2nd minute of first-half stoppage time recorded as 47+, while a goal scored in the 2nd minute of the second half would be 47. I don't notice that in match reports. If they did that, we could just throw out all 46+ and 93+ goals.