Sabermetrics applying to Soccer

kenntomasch
26 Aug 2003, 08:05 PM
I'm in the process of cleaning up around here, so I'll have to search a bit for the full numbers.

But here's the data from the 2000 season:

There were 81 games in which a team took a 2-0 lead over its opponent. The team that took the lead won 74 games, lost three, and tied 4. That's 91% outright wins, and a .938 W-L-T Percentage (counting a tie as half a win, half a loss).

Teams that took a 2-0 lead at home went 50-0-3. That's right, they didn't lose a game.

Nine times a team blew a 2-0 lead. Twice they won anyway, three times they lost (twice in overtime), and four times they tied (obviously all in overtime).

39 times (just under half), the two goals were the only ones the team scored in the game.

21 times they went on to score one more goal.
14 times they ended up scoring four goals.
5 times they ended up scoring five goals.
2 times they scored a total of six goals.

The 81 teams allowed a total of 70 goals in the game, or less than one a game.

37 times the opposition stayed at zero goals (and obviously lost).
27 times the opposition managed a goal (and obviously lost).

11 times the opposition scored two goals (going 0-7-4)
3 times the opposition scored three goals (getting an overtime win and losing the other two games)
3 times the opposition scored four goals (winning two of the three, including one in overtime).

That's the 2000 data. I have the others around here somewhere.
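
The tallies above can be cross-checked with a few lines of Python. This is only a sketch built from the counts quoted in this post; the variable names are illustrative and not part of the original bookkeeping.

```python
# 2000 season, teams that took a 2-0 lead.
# {final goals scored by the leading team: number of games}
scored = {2: 39, 3: 21, 4: 14, 5: 5, 6: 2}
# {goals conceded by the leading team: number of games}
conceded = {0: 37, 1: 27, 2: 11, 3: 3, 4: 3}

games = sum(conceded.values())
goals_allowed = sum(goals * n for goals, n in conceded.items())

# Both breakdowns should cover the same set of games.
assert sum(scored.values()) == games

print(games)                            # number of 2-0 leads
print(goals_allowed)                    # total goals allowed
print(round(goals_allowed / games, 2))  # goals allowed per game
```

Both breakdowns add up to the same 81 games, and the goals-allowed total matches the 70 quoted above.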

Chicago76
26 Aug 2003, 08:43 PM
I've come into this discussion way too late, but I'd like to add a few things (Okay, a lot of things).

First, I think a "chance" is something that can be quantified in soccer. Maybe not perfectly, but with a high degree of success. Other sports have subjective statistics--the error in baseball comes to mind. How many times have we wondered whether a play would be scored a hit or an error on a difficult fielding play? To illustrate: I was at a Sox game a few weeks back. A player hit a line drive to right. The ball bounced oddly off the wall and the fielder reached down at the base of the wall and bobbled it. It delayed his throw to the relay man by about 0.5 to 1 sec. The batter reached third standing up, barely trotting the last third of the way to third. The relay man didn't even attempt to throw him out. It was scored a double with the batter reaching third on an error. Never mind that even unbobbled, with a perfect throw and relay, there was maybe less than a 5% chance of getting the guy.

The error is a stat that supposes a typical major league player at position X should be able to make a play to either get the batter out or hold him to X base, given the circumstances of that play (bad hop, deflection off of pitcher, etc.).

Zidane (or Heydude) plays a ball into the area where an attacker has a view of the goal and we should be able to consider this a chance. Likewise, if the same ball were played to an attacker at the 18 and he was closely marked or had 7 or 8 bodies between him and the goal, this would not be a chance. There is some degree of subjectivity, but I think most people would agree that chances could be quantified.

We could define it this way: if a player has a "relatively" (subjective, I know) clear view of the goal within X yards (20-25 yards, say), this should be considered a chance. The standard should be the same for everyone. Beckham with a view from 30 yards does not constitute a chance, because your typical professional from that range could not be expected to convert.

Another point I'd like to bring up is that subjective stats in soccer, due to the fluid nature of the game, may explain a story much better than objective ones. Even a sport with discrete plays like football could use more of a subjective basis for some statistics. Peyton Manning once had a year where 6 balls were tipped by receivers and intercepted. Maybe tipped isn't such a good word. He didn't overthrow them. Instead they bounced off players' helmets, pads, and arms. Yet QB rating and interception stats wrongly attribute a WR's mistake to him.

Going back to the chance concept, charting plays and chances, particularly in the offensive third of the field for attackers would have some merit, I believe.

Charting could gather:
1-Possession/Dispossession
2-Creation of corners: can an attacker force a defender to concede a corner?
3-Beating a player vs. losing possession
4-Pass completion
5-Chance creation
6-Chance conversion-off a volley, left or right footed, header, etc.

With enough history, we could see player tendencies and relative value in various scenarios. For instance--in a numbers situation, an attacking mid has a choice: beating a player and getting a chance on goal, or passing the ball to a striker. How likely is the A-mid to beat a player, get a chance after doing so, or successfully convert?
How likely is the striker to retain possession, or successfully convert the chance depending on the type of service (volley, time to touch and shoot, direct shot, header, etc.)? With each successive pass in a numbers situation, how much is the probability reduced that a team will score (as defenders recover)?

Ultimately, knowing the probability of successfully executing (getting a "chance" and then converting) either scenario A or B would prove useful in evaluating the players involved.
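
As a rough sketch of how charted data could feed that comparison, each scenario can be treated as a chain of steps whose success rates are multiplied together. Every number below is a made-up placeholder rather than charted data, `p_goal` is a hypothetical helper, and the multiplication assumes the steps are independent, which real play may not satisfy.

```python
# All probabilities below are invented placeholders, not charted data.

def p_goal(*steps):
    """Multiply the success probabilities of each step in a sequence.

    Assumes the steps are independent of one another.
    """
    p = 1.0
    for step in steps:
        p *= step
    return p

# Scenario A: the attacking mid takes on the defender himself.
scenario_a = p_goal(0.40,   # beats the defender
                    0.60,   # gets a chance after beating him
                    0.15)   # converts the chance

# Scenario B: he passes to the striker instead.
scenario_b = p_goal(0.80,   # pass is completed
                    0.50,   # striker retains possession and gets a shot
                    0.20)   # striker converts from this type of service

print(f"A: {scenario_a:.3f}  B: {scenario_b:.3f}")
```

With enough charted history, the placeholder rates would be replaced by each player's observed rates in that situation.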

beineke
26 Aug 2003, 09:36 PM
Originally posted by microbrew

As for streakiness: unless there is some underlying cause, a player being "hot" or "cold" is meaningless to me.

Thanks for digging out that tennis paper. From a first glance at it, in-match streakiness is almost negligible. I'll try to get a better look at it soon.

That is also an interesting perspective about looking for an underlying cause for streakiness. It fits nicely with Dave's point about how streaks in basketball might be explained by the fact that it's easier to score points in transition.

beineke
26 Aug 2003, 09:45 PM
Originally posted by voros
Don't remember who did it. I think either Keith Woolner or David Grabiner. The problem with Sabermetric studies is that it's not like psychology or physics or something where there's journals this stuff appears in. It's scattered everywhere.


Not too surprisingly, given that maxim can't even get us a forum on BigSoccer. ;-)

Thanks for the pointer to Woolner ... I happened to meet him a few years ago, so I may drop him a note.

microbrew
26 Aug 2003, 11:19 PM
Originally posted by beineke
Thanks for digging out that tennis paper. From a first glance at it, in-match streakiness is almost negligible. I'll try to get a better look at it soon.


I found the paper looking for the phrase "independent individually distributed sport" in Google.

This thread is getting unwieldy. And is there anyone else here with some kind of math background?

My own background is only undergrad courses in electrical engineering- all targeted towards signal processing and telecommunications, with one class in microeconomics with calculus (helpful in reading those papers).

mpruitt
27 Aug 2003, 01:47 AM
Originally posted by microbrew

This thread is getting unwieldy. And is there anyone else here with some kind of math background?


Hate to keep harping on this, and this is probably the last time I'm going to do it, but yes I agree. We should have a forum for this stuff but no one seems to want or feel the need to post about it.

Petition For Stats and Analyisis Forum: Soccermetrics and Useless Info (http://www.bigsoccer.com/forum/showthread.php?s=&threadid=64954)

p.s. yeah beineke, I am trying, aren't I? I'm beginning to either feel like or feel for pc4th and his Best Of BigSoccer and/or Newbie forum idea.

p.p.s. Great post by Chicago76, all of it very well put and thought out. A very good comparison to the error statistic in baseball. However, I'm sure that most sabermetricians would tell you that they see the error statistic as inherently flawed, hence the creation of the DIPS statistic. There's an informative section in Moneyball that touches on it.

However, given that soccer lacks ANY real substantive statistics, is it more valuable to have reasonably educated subjective ones or none at all? At the start of this thread someone challenged people to define a "chance." Maybe a chance couldn't be summed up well in one sentence like an error, ~"A player not making a play that does not require extraordinary effort." But maybe a soccer chance could be relatively accurately, albeit subjectively, quantified by a checklist of 5 criteria that could be agreed upon.

beineke
27 Aug 2003, 12:33 PM
Originally posted by microbrew
And is there anyone else here with some kind of math background?


Despite the whole Poisson discussion, I'm not sure this is too relevant. For instance, Moneyball mentions that Bill James was an English major.

Taking Kenn as another example (to see if I can embarrass the guy ;)), he's done an impressive and thorough analysis of US soccer attendance, without using any math more complicated than a median.

kenntomasch
27 Aug 2003, 12:40 PM
Medians I can do. Means I can do. Those are the biggies. Comparing apples to apples in those things, I try to do.

I have some sort of grasp of Standard Deviations, but I'm not sure their use is practical in those attendance things, but I have thrown them in there in the past just because I could.

My eyes glaze over at the other stuff about distributions and regressions and things like that. And I was a Telecommunications major. That's why. :)

Thank God for Excel, though.

This off-season, I'll revamp that whole thing with a whole lot of analysis I've always meant to get to, but have never gotten around to.

kenntomasch
27 Aug 2003, 05:30 PM
More on the 2-0 lead thing (I'm finding the data in bits and pieces):

In 2001, there were 68 games in which a team took a 2-0 lead. They won 60 of those games outright (88.2%), lost 4, and tied 4 (a .912 W-L-T%). There were 8 blown leads.

I have to go back and check how I counted shootouts in 1999, but the one piece of paper I just found shows a 64-1-5 record in 70 games for teams that took a 2-0 lead in 1999. I can't recall off the top of my head if I was counting shootout wins as wins or not, but considering I included ties, I probably didn't. I probably counted all the 2-0 leads that ended up going to shootout as draws.

In which case, it was 91.4% outright wins, and a .950 W-L-T%.

So for those three years (1999-2001) it's 219 times a team went up 2-0, and 198 times they won outright (90.4%), 8 times they lost, and 13 times they tied (a .934 W-L-T%). That's what I would call not quite a lock, but a pretty good chance you're going to win.
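
For anyone who wants to redo the arithmetic, here is a minimal sketch of the W-L-T percentage (a tie counted as half a win, half a loss) applied to the three seasons quoted in these posts. The function name `wlt_pct` is made up for illustration, and the 1999 record assumes shootout games were counted as ties, as guessed above.

```python
def wlt_pct(wins, losses, ties):
    """Return (outright win share, W-L-T percentage), counting a tie as half a win."""
    games = wins + losses + ties
    return wins / games, (wins + 0.5 * ties) / games

# (wins, losses, ties) for teams that took a 2-0 lead, as quoted in the posts.
seasons = {1999: (64, 1, 5), 2000: (74, 3, 4), 2001: (60, 4, 4)}

for year, record in seasons.items():
    win_share, pct = wlt_pct(*record)
    print(year, f"{win_share:.1%}", f"{pct:.3f}")

# Column-wise sums give the combined 1999-2001 record.
totals = tuple(sum(col) for col in zip(*seasons.values()))
win_share, pct = wlt_pct(*totals)
print("1999-2001", totals, f"{win_share:.1%}", f"{pct:.3f}")
```

The combined record works out to 198-8-13, the same totals as in the post.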

I am almost certain I did the 2002 season and have it somewhere. And I know the Fire came back from 0-2 down to beat Kansas City on the road earlier this season, but I'll wait to do 2003 until the season is over.

the101er
28 Aug 2003, 05:19 PM
First: this is great. I just read Moneyball and had the same thoughts that are being expressed here. Then I remembered the "Direct Play" era in English soccer, so I think we need to be cautious when applying statistics.

The head of coaching at the English FA determined sometime in the late 80's or early 90's that most goals were scored off of less than 4 passes. So, he developed the direct play system. I even have one of the videos.

Also, I think the game will evolve and change. In baseball, if too many players start focusing on getting on base via walks, then the statistics will start to impact the game. That is, the measuring system is interfering with the system being measured. So, that is another thing to watch out for.

But with those difficulties discussed, I think this is a fascinating area of study.

microbrew
28 Aug 2003, 06:04 PM
Originally posted by kenntomasch
Medians I can do. Means I can do. Those are the biggies. Comparing apples to apples in those things, I try to do.

I have some sort of grasp of Standard Deviations, but I'm not sure their use is practical in those attendance things, but I have thrown them in there in the past just because I could.



Standard deviation can be thought of as a measurement of how spread out the attendances are. So, an attendance that's within one standard deviation would be considered normal. And attendance outside of one standard deviation would be considered unusual.

A large standard deviation usually is a sign of some kind of skewing by one or several data points. It could also just mean the attendances vary a lot. You'd have to look at the data, and there are better ways of calculating skewness.

So if I saw a large standard deviation, I'd want to take a closer look.
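
A small sketch of that "closer look", using Python's standard `statistics` module. The attendance figures are invented for illustration (one big crowd among otherwise similar games), and "more than one standard deviation from the mean" is just the rule of thumb described above.

```python
import statistics

# Hypothetical attendances for one team's home games (not real data).
attendances = [14200, 15100, 13800, 16050, 14900, 31500, 15300, 14650]

mean = statistics.mean(attendances)
median = statistics.median(attendances)
sd = statistics.stdev(attendances)  # sample standard deviation

# Flag games more than one standard deviation away from the mean.
unusual = [a for a in attendances if abs(a - mean) > sd]

print(f"mean={mean:.0f} median={median:.0f} sd={sd:.0f}")
print("unusual:", unusual)
```

Here a single large crowd pulls the mean well above the median and inflates the standard deviation, which is the kind of skew a large standard deviation can hint at.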

kenntomasch
28 Aug 2003, 06:10 PM
Yes, and that's just about the extent of my understanding of it. I can do the math, too. I'm just more comfortable with the simpler math.

voros
28 Aug 2003, 06:32 PM
Originally posted by kenntomasch
Yes, and that's just about the extent of my understanding of it. I can do the math, too. I'm just more comfortable with the simpler math.
Multiple linear regression analysis is a must. For a lot of things, it gets you at least 50% of where you need to be. Can be an excellent starting point when you're just trying to get a general idea of what is what. Unfortunately, without functions on spreadsheets, you have to know some complex linear algebra to do it. Fortunately, Excel includes the function as an add-in, so...

You can overcook and data-mine with it, so you still need to use solid methods, but it's a basic tool with fairly substantial power.

As an example, you could enter the size of the city, the percentage of Spanish-speaking population in the city, and the team's pts. per game as variables, run them against the team's attendance numbers, and use your data to see which, if any, of those appears to have a significant correlation with attendance. You'd also get a linear equation to estimate attendance given those variables.
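
For the curious, here is a bare-bones sketch of what a regression tool does under the hood: ordinary least squares solved through the normal equations. It is written in plain Python to stay self-contained; `solve` and `fit_linear` are hypothetical helper names, every team figure is invented, and the sketch only fits the coefficients. It does not compute the significance tests a spreadsheet tool would report.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(rows, y):
    """Ordinary least squares; returns [intercept, b1, b2, ...]."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


# Invented teams: (metro population in millions, % Spanish-speaking, pts per game)
teams = [(9.5, 18.0, 1.6), (3.0, 6.0, 1.2), (16.0, 35.0, 1.4),
         (5.2, 25.0, 1.9), (2.1, 9.0, 1.0), (7.0, 12.0, 1.5)]
attendance = [15500, 12100, 21000, 17800, 11000, 14300]  # also invented

coef = fit_linear(teams, attendance)
# The fitted linear equation, applied back to each team.
predicted = [coef[0] + sum(c * v for c, v in zip(coef[1:], t)) for t in teams]
print([round(c, 1) for c in coef])
print([round(p) for p in predicted])
```

With only six invented teams and three variables the fit means nothing; the point is just the shape of the calculation, which is also why the warning about overcooking and data-mining applies.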

kenntomasch
28 Aug 2003, 06:37 PM
http://www.bragg.army.mil/reupxviiiabn/images/confused.jpg

the101er
28 Aug 2003, 06:58 PM
Soccer coaches do try to be as logical as possible. For example, Anson Dorrance claims to measure everything measurable. He keeps score in all scrimmages, sprints, fitness tests and tries to measure his players as objectively as possible.

Is there any way to get hold of fitness data on players?

Is there a predictability of some level of success based on ball juggling?

What about the fact that only a handful of players have played on a national side from U17 through full international? Surely, that says something about burn out.

What about the age old debate over what is more valuable: technique or fitness? Fitness can very easily be tested. Then we would just need to measure the success of players. Define success.

Also, don't assume that baseball is that much easier to measure than soccer. As mentioned in "Moneyball" a lot of the objective statistics in baseball are based on subjective human judgement.

To sum up, we need to consider factors to measure outside of just game performance.

microbrew
28 Aug 2003, 08:25 PM
I haven't said much about evaluating individual players, but my position is to gather data, analyze it, and see what can be teased from it.

Originally posted by the101er
Soccer coaches do try to be as logical as possible. For example, Anson Dorrance claims to measure everything measurable. He keeps score in all scrimmages, sprints, fitness tests and tries to measure his players as objectively as possible.

Soccer coaches, logical? :-)

That's just gathering data. Making sense of that data is another story. I guess this is called data mining. Gather enough data, and use a computer to tease out some interesting info.


Is there any way to get hold of fitness data on players?

Is there a predictability of some level of success based on ball juggling?

Is there a place for NBA draft workout or NFL combine type of data? Yes, but even in those sports those stats are over-rated.

As for ball juggling correlating with soccer skill (skill is something we're very, very interested in measuring), I don't think there is a very strong correlation, or so my gut instinct tells me. It's a pretty trivial skill compared to dribbling, passing accurately or tackling.

What about the fact that only a handful of players have played on a national side from U17 through full international? Surely, that says something about burn out.

Probably says more about scouting of teen players and the development curve for most players. At U17 or even U20, the guys are just not physically or mentally mature.

What about the age old debate over what is more valuable: technique or fitness? Fitness can very easily be tested. Then we would just need to measure the success of players. Define success.

Determining the value of fitness can be done more easily: take the best paid players in the world (assuming efficient markets theory, the best players in the world are the best paid) and measure their fitness. I'd guess this would give a range of fitness such that a player who falls outside it likely won't be a world class player.


Also, don't assume that baseball is that much easier to measure than soccer. As mentioned in "Moneyball" a lot of the objective statistics in baseball are based on subjective human judgement.

I don't know. On base percentage, batting average, runs scored, etc. aren't very subjective. The advantage of baseball is that you have a lot of well-defined discrete events.

I suppose you could do the same thing for soccer. Take a player and track him or her throughout the game. Take note of touches, time of possession, passes completed, giveaways, giveaways under pressure, takeaways, shots on goal, assists, etc. And then break down the stats by location, say inside the opponent's penalty box.
Who knows if these stats are useful? That's not the point.

To sum up, we need to consider factors to measure outside of just game performance.

You know, the NFL administers psychological tests. Some kind of personality test, IIRC.

mpruitt
29 Aug 2003, 12:29 AM
Originally posted by the101er

The head of coaching at the English FA determined sometime in the late 80's or early 90's that most goals were scored off of less than 4 passes. So, he developed the direct play system. I even have one of the videos.


Don't necessarily knock that, because the point wouldn't be to say, oh look how stupid this guy is using statistics to do something stupid. The point would be to say, "was he right?" Let's find some numbers to either back this guy up or find out if he's wrong. If he's right, then why's he right? What could be the reasons, and is that statistically important or just a nature of the game?

This type of stuff is golden, http://www.matchanalysis.com/cgi-local/tmr.pl?team=168&game=2043&pg=1&nohd=&system=1&sysct=1 unfortunately most of it just isn't out there. Maybe things like online Matchtrackers could be valuable, but really an easier way would be just to go out and do it by hand. However, as we mentioned in the beginning of this thread, that would be laborious at best.

kenntomasch
29 Aug 2003, 12:34 AM
Originally posted by the101er
What about the fact that only a handful of players have played on a national side from U17 through full international? Surely, that says something about burn out.

I think it might also say something about the pyramid theory - that there are only a very, very few truly elite players capable of playing at the full international level. Slightly more are capable of playing at the U23 level. Slightly more at the U17 level.

A certain number are MLS quality. Slightly more are A-League quality. More are PSL quality. And so on. Down to our rec league, where there's no quality. :)

mpruitt
29 Aug 2003, 12:44 AM
To add to my point before: I think I'm going to test out some of the matchtrackers sometime soon and compare them to a taped game, just for curiosity's sake, to see how accurate they are. Obviously in baseball a good matchtracker could be almost perfect, because it's tracking discrete events, but I'm curious to see how accurately these online tracking devices actually track the game. Having some rough idea could be an interesting key to doing statistical analysis.

voros
29 Aug 2003, 11:37 AM
Originally posted by maxim-1
Don't necessarily knock that, because the point wouldn't be to say, oh look how stupid this guy is using statistics to do something stupid. The point would be to say, "was he right?" Let's find some numbers to either back this guy up or find out if he's wrong. If he's right, then why's he right? What could be the reasons, and is that statistically important or just a nature of the game?

Exactly. When I first heard about this, I didn't say, "WOW! Genius." I began asking:

1. What was the percentage of possessions that contained X number of passes? For example, he might have found that 80% of goals were scored off of less than 4 passes. Of course if 90% of possessions were less than 4 passes, that would tell us the _opposite_ of the conclusion he drew. Namely, that stringing together 4 passes or more made you _more_ likely to score, not less. IOW, if 40% of the traffic fatalities in this country involved drunk drivers, that would mean that more fatal accidents occur with only sober drivers than with drunk drivers. Does that mean driving sober is more dangerous than driving drunk? Of course not.

2. Why the number 4? What is the special significance of drawing the line at 4 passes? If he went through the numbers and then looked and drew the line at 4 because that's where it crossed the 50% mark, he would need to go back and verify this info with an independent set of data. You want to avoid the multiple endpoints problem.

3. Clearly a certain amount of game theory is applicable here. If you drastically change the way the game is played, you cannot necessarily conclude that you'll get similar results to your data set played under different conditions.

4. Goal _scoring_ is only half the equation. Even if a strategy involving relatively few passes per possession did score a few more goals, it would be unhelpful if the same strategy also resulted in your conceding just as many or more goals.

See, the statistic doesn't mean much in and of itself until you cover everything it _could_ mean other than the conclusion you favor.
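
To put numbers on point 1, here is a small sketch using the 80% and 90% shares from that example. The totals (1,000 goals, 100,000 possessions) are arbitrary assumptions that cancel out of the final ratio, and `goals_per_possession` is a made-up helper name.

```python
def goals_per_possession(share_of_goals, share_of_possessions,
                         total_goals=1000, total_possessions=100_000):
    """Scoring rate for a group of possessions, given its share of all goals and possessions."""
    return (share_of_goals * total_goals) / (share_of_possessions * total_possessions)

# Shares from point 1; the totals are arbitrary and cancel out of the ratio.
short = goals_per_possession(0.80, 0.90)   # fewer than 4 passes
long_ = goals_per_possession(0.20, 0.10)   # 4 passes or more

print(f"short: {short:.4f} goals per possession")
print(f"long:  {long_:.4f} goals per possession")
print(f"long possessions score {long_ / short:.2f}x as often")
```

Under those assumed shares, the longer possessions score at more than twice the rate of the short ones, even though most goals come from short possessions.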