Mom's Basement - Stats and Analysis Thread

Discussion in 'Arsenal' started by bandwagongooner, Jul 21, 2016.

  1. bandwagongooner

    bandwagongooner Member+

    Dec 9, 2006
    Club:
    Arsenal FC
    Nat'l Team:
    United States
    http://www.espnfc.com/german-bundes...packing-helps-us-understand-effective-passing

    This is a great article about a very interesting counting stat. It's called "packing" and it measures how many opposing players are taken out of the game vertically. Basically, do you get the ball behind players, either by dribbling or passing?
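
To make the counting concrete, here is a minimal sketch of how a packing-style count could work for a single pass or dribble. It is a simplification and not necessarily how the stat in the article is actually computed: it assumes the attack runs toward larger x, and it treats an opponent as bypassed if he was goal-side of the ball before the action and is behind it afterwards. The function name and all positions are made up for illustration.

```python
def opponents_bypassed(ball_x_before, ball_x_after, opponent_xs):
    """Count opponents who were goal-side of the ball before the action
    and are behind the ball after it (attack runs toward larger x)."""
    return sum(1 for x in opponent_xs if ball_x_before < x <= ball_x_after)

# Hypothetical opponent positions (metres from the attacking team's own goal line)
opponents = [30.0, 42.0, 48.0, 55.0, 70.0, 88.0]

forward_pass = opponents_bypassed(40.0, 60.0, opponents)   # passes the players at 42, 48, 55
sideways_pass = opponents_bypassed(40.0, 41.0, opponents)  # passes nobody
print(forward_pass, sideways_pass)
```

Summing this count over every pass and dribble a player makes (and, for receivers, over every ball received) would give the kind of per-player totals the article ranks.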

    Not surprisingly, the world class players (Kroos, Boateng) were at the top of the board for moving the ball past players. Happily, Xhaka was 5th in the BL last year, which indicates he may be able to add some speed and edge to the attack. On the other side, Ozil was excellent at receiving passes that took out defenders, which was taken as an indication of his ability to find space. Griezmann was excellent at both passing defenders with pass/dribble and receiving balls that had taken out defenders.

    On a team by team basis the raw "packing" numbers correlate with winning at 0.6, which is better than possession or passes made. It's by no means perfect, but I think it's the first number that starts to quantify skill that we know exists but hasn't been measured yet. It also explains why someone like Mertesacker is better than people think, which is that he never gets taken out of the play. Imo, it explains why Ramsey is overrated as a CM because he gets dribbled by all the time. Turns out that just standing in the right place is more important than making tackles.

    Finally, and least surprisingly, England sucked at bypassing defenders.

    __________________

    For the thread, anything that's data or analysis, particularly if it's relevant to Arsenal, is welcomed.
     
    crazy150 and thebigman repped this.
  2. thebigman

    thebigman Member+

    May 25, 2006
    Birmingham
    Club:
    Arsenal FC
    Nat'l Team:
    England
    Cool stat

    Actually useful compared to some of these numbers people spout when they have barely seen a player
     
  3. bandwagongooner

    bandwagongooner Member+

    Dec 9, 2006
    Club:
    Arsenal FC
    Nat'l Team:
    United States
    It's really annoying when stats are used to end discussions rather than as a springboard to ask new questions.
     
    thebigman repped this.
  4. crazy150

    crazy150 Member+

    Aug 27, 2006
    North Cuba
    Very nice read. A good example of how stats don't provide all the answers but can be useful.

    A more detailed explanation http://bundesligafanatic.com/impect-packing-the-future-of-football-analytics-is-here/

    This kind of thing confirms what we know about players like Özil. Edit: it also explains why England's selection bias toward kick-hard, run-fast, get-stuck-in players fails them time and time again.

    Was reading a discussion on Twitter...this from a united fan...couldn't say it better:

     
    thebigman repped this.
  5. crazy150

    crazy150 Member+

    Aug 27, 2006
    North Cuba
    Bump. @NorthBank we will discuss stats here.

    It’s from a few years ago, but here is a pretty skeptical article about xG. He’s not saying it’s useless, just that it’s not gospel.

    https://deadspin.com/why-soccers-most-popular-advanced-stat-kind-of-sucks-1685563075


    This is from Statsbomb a few years ago, based on data from European leagues over the 12/13 season.
    [Attached image: upload_2018-10-9_12-46-38.jpeg]
    It’s interesting that most of the French and German teams are to the right of the expectation (i.e. more goals than expected). Germany had two teams with 30% more goals than expected. I bet you can guess which two. I also bet you can guess that “Spanish” team far to the right as well ;) Meanwhile England had almost the entire league below expectations.

    The distribution is very wide, which means there is plenty of room for efficient operators to exist without it being down to “luck”. While our current ratio of 1.8 seems an outlier, it’s not too far off the 1.4 that Barca had that year.
     
    maskito repped this.
  6. NorthBank

    NorthBank Member+

    Arsenal; NYRB
    United States
    Mar 29, 2006
    Connecticut
    Club:
    Arsenal FC
    Nat'l Team:
    United States
    #6 NorthBank, Oct 9, 2018
    Last edited: Oct 12, 2018
    Thanks @crazy150 for digging up this old stats thread! Not sure if the guys will follow you over here though.

    That graph is interesting but it's 6 years old. Isn't there something like that which is more current, like from last season??
     
  7. thebigman

    thebigman Member+

    May 25, 2006
    Birmingham
    Club:
    Arsenal FC
    Nat'l Team:
    England
    I remember that packing

    That is actually a more useful stat for me as it truly shows how useful a player is at something specific

    xG is very subjective in some ways to me, and players may have an impact on games outside of it
     
  8. crazy150

    crazy150 Member+

    Aug 27, 2006
    North Cuba
    I can’t find one. Ever since xG caught on, people are way less transparent with their models.

    Coming from a science background, the first thing I’d do is try to isolate what my model might overlook that could account for the skew among the leagues and the outliers. That is, why are the German and French teams more efficient? Why is Barca so superior, or why do they appear so? Then I would update the model to try to account for this.

    This is what actually happens, too. Here is an xG model which accounts for player and team histories. That is, it modifies the xG value based on the shot taker and the defense. So using their model, Lacazette (a good shooter, even mentioned in the article) would be awarded a higher xG for his shot from range against Fulham than Schürrle’s shot against us.

    http://www.scisports.com/news/2016/expected-goals-model-2-0

    Of course it’s a moving target. Imagine you were building an expected home run model in 1998. Before then the leader in home runs for a season usually fell between 40 and 60. Your model would be fitted to this data. How far would the Barry Bonds and McGwire years differ from your model? Is that just luck or had the game changed?
     
  9. crazy150

    crazy150 Member+

    Aug 27, 2006
    North Cuba
    @The Jitty Slitter I think you might find this interesting. It’s a revisit of the “hot hand fallacy”. Basically, they show that the analysis in the research which led to debunking the hot hand in basketball was “flawed” and that the data from the same experiment show hot hands at work: 11% more likely to make a three pointer when on a streak in the sample data. Probably not as huge a difference as players/coaches think, but not insignificant.

    http://theconversation.com/momentum...ot-hand-with-the-mathematics-of-streaks-74786

    Or if you’re feeling really nerdy, the full paper
    https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2627354
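
As I understand the argument in the linked pieces, the "flaw" is a selection bias in how streaks are counted within a finite sequence of shots. A rough sketch, assuming a shooter with no hot hand at all (every shot an independent 50/50): for each possible sequence, take the share of hits among shots that immediately follow a hit, then average that share across sequences. Intuition says the answer should be 1/2, but it comes out lower. The function name is made up for illustration.

```python
from fractions import Fraction
from itertools import product

def mean_proportion_after_hit(n_shots):
    """Average, over all equally likely hit/miss sequences of length n_shots,
    of the within-sequence share of hits among shots that follow a hit.
    Sequences with no shot following a hit are skipped (share undefined)."""
    shares = []
    for seq in product("HM", repeat=n_shots):
        followers = [seq[i + 1] for i in range(n_shots - 1) if seq[i] == "H"]
        if followers:
            shares.append(Fraction(followers.count("H"), len(followers)))
    return sum(shares, Fraction(0)) / len(shares)

bias_3 = mean_proportion_after_hit(3)
bias_4 = mean_proportion_after_hit(4)
print(bias_3, bias_4)  # 5/12 17/42
```

So a player who shoots exactly as well after a make as after a miss would still look "cold" after makes under this kind of averaging, which means a measured rate equal to the baseline can actually hide a real hot hand.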
     
  10. NorthBank

    NorthBank Member+

    Arsenal; NYRB
    United States
    Mar 29, 2006
    Connecticut
    Club:
    Arsenal FC
    Nat'l Team:
    United States
    On today's Arsecast there's a moderately interesting chat about xG, with Tim from 7amkickoff. I learned the derivation of "statsbomb" which was like duh.

    But when he said that he uses a certain xG model (Understat?) amongst the several/many xG formulas out there, it reminded me of something I've been wondering about...

    It seems to me that having various disparate xG models is not a good thing from several points of view, so how might this resolve?

    It got me wondering if there's any precedent, especially for a more subjective stat that has become mainstream, accepted. The one that came to mind is Possession.

    I recently learned that Possession is currently derived from the number of passes (I guess Opta pioneered this model). But previously it was based on an actual time-clock model, which is what I had assumed it still was now. Not.
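
A small sketch of how the two approaches can disagree for the same match. It only follows the description above (share of passes versus share of time on the ball), so treat the formulas as assumptions and not any provider's exact method; the match numbers are invented.

```python
def possession_by_passes(passes_a, passes_b):
    """Team A's possession share as its fraction of all passes."""
    return passes_a / (passes_a + passes_b)

def possession_by_clock(seconds_a, seconds_b):
    """Team A's possession share as its fraction of time on the ball."""
    return seconds_a / (seconds_a + seconds_b)

# Made-up match: team A plays many quick short passes, team B fewer, slower ones.
share_passes = possession_by_passes(passes_a=600, passes_b=300)
share_clock = possession_by_clock(seconds_a=1800, seconds_b=1400)
print(f"{share_passes:.1%} by passes, {share_clock:.1%} by clock")
```

A side that strings together lots of short passes gets flattered by the pass-count version relative to the clock version, even though nothing about the match itself has changed.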

    So how did that happen that Opta's model won out, and is now apparently the accepted method? And what, by analogy, is likely to happen with the xG stat? Assuming that a standard, universal xG stat is what the world wants.
     
  11. mebeSajid

    mebeSajid Member+

    Feb 16, 2009
    Atlanta, GA
    Club:
    Arsenal FC
    Why is this not a good thing? The whole point behind xG is measuring chance quality - different models will weight things differently. Assuming sufficiently robust underlying data (shot locations, header/volley/something else data), it's possible to validate how good a model is:

    Does it match up with goals scored over a sufficiently large sample (e.g. an entire league for a season)?

    How good is the "fit" (i.e. how large are variances in the model)?

    Does the model underweight or overweight certain types of chances (e.g. do people other than Messi and Podolski score way more than what the model says?).
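
Those three checks can be sketched in a few lines. This is only an illustration under assumptions: the shot list is invented, the Brier score stands in for "fit", and the calibration table (mean predicted xG versus actual scoring rate per xG band) stands in for the over/underweighting question. The function name is hypothetical.

```python
def validate_xg(shots, n_bins=2):
    """shots: list of (predicted_xg, scored) pairs, scored being 0 or 1."""
    total_xg = sum(xg for xg, _ in shots)
    total_goals = sum(goal for _, goal in shots)
    # Brier score: mean squared gap between prediction and outcome (lower is better)
    brier = sum((xg - goal) ** 2 for xg, goal in shots) / len(shots)
    # Calibration: within each xG band, compare mean prediction with actual scoring rate
    bins = {}
    for xg, goal in shots:
        index = min(int(xg * n_bins), n_bins - 1)
        bins.setdefault(index, []).append((xg, goal))
    calibration = {
        index: (sum(x for x, _ in rows) / len(rows), sum(g for _, g in rows) / len(rows))
        for index, rows in sorted(bins.items())
    }
    return total_xg, total_goals, brier, calibration

# Invented shots: (model's xG, 1 if scored else 0)
toy_shots = [(0.1, 0), (0.1, 0), (0.2, 1), (0.3, 0), (0.6, 1), (0.8, 1), (0.7, 0), (0.2, 0)]
total_xg, total_goals, brier, calibration = validate_xg(toy_shots)
print(total_xg, total_goals, brier, calibration)
```

With a real season of shots you would want many more bands and separate tables per shot type (headers, volleys, and so on) to see where a given model runs hot or cold.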

    Your use of the term "subjective" is wrong. There's always going to be some subjectivity in measuring things. In footie, scorers determine goals vs own goals, refs determine penalties and fouls. In the NFL, yard counts are somewhat arbitrary. Virtually all of baseball is about whether one guy decides whether a pitch is a ball or a strike.

    What you mean to say is "is there a precedent for a derived metric based on an algorithm that can include a designer's personal preferences?" The best example I can think of is baseball, which is full of such stats: VORP, fielding models (where data is also limited and you can get pretty big variances), WAR, Win Shares. Then there's this crazy PER stat in basketball that people finally poked holes in, and people started using Real Plus-Minus instead. Teams use these models to do all kinds of stuff: Arsenal a couple of years back completely reworked their attack and defense to prioritize shot location. Arsenal also probably had some sort of passing model to measure passer quality, and this led to Arsenal signing Xhaka (and to a lesser extent Elneny and Mustafi).

    I like xG as a metric for only a handful of things: aggregate team performance over a set of games (as shorthand for whether a team is good or not), performance of forwards (do they shoot enough?, do they shoot from good locations?), and GK performance (is this GK a sieve? is he DDG?). Better metrics exist for evaluating most other positions.
     
  12. crazy150

    crazy150 Member+

    Aug 27, 2006
    North Cuba
    The models that I’ve seen do vary quite a bit and yes that is a concern. Also the lack of transparency and variance reporting is an issue. You can’t have a meaningful discussion if people just shop for a model that supports their arguments.

    The other thing that people fail to realize is that the models change. The builders are constantly updating their models with new data as well as tweaking how they “bin” the shots (i.e. how they break down shots into components for averaging).

    So while you may have a few outliers this season, the builder will go back and tweak the model to incorporate the new data, bin differently, add elements (weather, month, etc) and proclaim “hey look, the total goals is within 1% of my xG and my r2 is 0.999, my model is great”. This is fine of course, but it’s self-correcting in that you can always say your model is accurate, yet accuracy based on historic data says nothing about predictive power. Just read climatology papers from the last three decades.
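
A toy sketch of that point, with invented shot data and a deliberately crude model that bins shots only by distance (real models use far more inputs). Because each bin's xG is just that bin's historical conversion rate, total xG matches total goals on the data the model was built from by construction. The honest test is a season it has never seen. Function names are hypothetical.

```python
def fit_binned_model(shots, bin_width=10.0):
    """shots: list of (distance_m, scored). Returns {bin_index: conversion rate}."""
    counts = {}
    for distance, goal in shots:
        index = int(distance // bin_width)
        taken, scored = counts.get(index, (0, 0))
        counts[index] = (taken + 1, scored + goal)
    return {index: scored / taken for index, (taken, scored) in counts.items()}

def total_xg(model, shots, bin_width=10.0):
    # Bins never seen during fitting get 0.0 here, a simplification.
    return sum(model.get(int(d // bin_width), 0.0) for d, _ in shots)

# Invented shot logs for two seasons: (distance in metres, 1 if scored else 0)
season_1 = [(5, 1), (6, 0), (8, 1), (12, 0), (15, 1), (18, 0), (19, 0), (25, 0), (28, 0), (22, 0)]
season_2 = [(4, 1), (7, 1), (9, 1), (11, 1), (14, 0), (16, 1), (21, 0), (24, 1), (27, 0), (29, 0)]

model = fit_binned_model(season_1)
in_sample = (total_xg(model, season_1), sum(g for _, g in season_1))
out_of_sample = (total_xg(model, season_2), sum(g for _, g in season_2))
print(in_sample, out_of_sample)
```

Here the fitted season lines up almost perfectly (about 3 xG against 3 goals), while the unseen season comes out at 2.75 xG against 6 goals. Refitting on both seasons would make the totals agree again, which is exactly why an in-sample match proves little.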

    Of course the game is also changing. Coaches are looking for efficiencies. So a shot that today is a high xG chance, tomorrow may not be and vice versa as teams seek advantages and to plug gaps. Coaches will find poor finishers and tell them to shoot less, and vice versa, based on these models. Training methods to increase conversion of the lower xG shots will be implemented. Defensive tactics to deal with high xG shots will be developed, etc.

    As an example, look at the three pointer in basketball. When it was novel the average percent conversion was like 25% but within a decade or so it was more like 35%. I’d guess the xG value of those shots has changed a lot over the years as players/coaches have responded.
     
    NorthBank repped this.
  13. NorthBank

    NorthBank Member+

    Arsenal; NYRB
    United States
    Mar 29, 2006
    Connecticut
    Club:
    Arsenal FC
    Nat'l Team:
    United States
    Universality and standardization are often good things. And in particular, with something like sports stats, you benefit greatly by comparing apples to apples, over time... sometimes going back several years.

    If the underlying model/formula for a stat (xG, possession, etc) is not a universal standard, or it's changing/evolving constantly, then your ability to make these comparisons is compromised, not to mention any decisions that you might make based on those analyses.

    Take the possession stat and the 2 models that are discussed in this article I came across recently:
    https://slate.com/culture/2014/06/s...ory-of-the-games-most-controversial-stat.html

    The variance between the Opta vs Deltatre models can be quite large, e.g. 64% vs 57% for the 2014 Spain-Netherlands example she cites.

    What I didn't mention before, is that I'm looking at this from the POV of the outsider: the fan, the journalist, the coach scouting an opponent. I'm not talking about any given team's proprietary data or stats database. Those might vary in which models/formulas they're based on, and that might be warranted decisions for each team.

    Saying I'm just plain "wrong" is a bit simplistic don't you think? Also I didn't say "subjective". I very carefully wrote "more subjective". I.e. xG is more subjective than goals scored. That's just objectively true. ;)

    Honestly I'm not really sure what you're saying here, but I can only clarify that what I meant was: Is there a precedent for a stat which had variations based on different algorithms, but which then coalesced towards one standard algorithm? And I postulated that Possession might be such a stat. I can't say I know this to be true, because I just don't follow this stuff at all closely, but that is the impression I've been getting about Possession.
     
    crazy150 repped this.
  14. mebeSajid

    mebeSajid Member+

    Feb 16, 2009
    Atlanta, GA
    Club:
    Arsenal FC
    Interesting tweet

    [Embedded tweet unavailable, id 1075811990277369857]
     
  15. thebigman

    thebigman Member+

    May 25, 2006
    Birmingham
    Club:
    Arsenal FC
    Nat'l Team:
    England
    Stats combined with eye test surely has been the best way?

    Systems, and emotions in the swing of a game combined with data

    Isn’t that obvious? Ps why is this Knutson guy so revered?
     
  16. NorthBank

    NorthBank Member+

    Arsenal; NYRB
    United States
    Mar 29, 2006
    Connecticut
    Club:
    Arsenal FC
    Nat'l Team:
    United States
    What's the GK model?
     
