Professional football's main objective is always to put the ball into the net. As simple as that!

Defenders: tackles won, defensive covers, interceptions, duels won, headers won, clearances, yellow cards, red cards, positioning, concentration for 96 minutes, composure, anticipation, decisions, strong personality, leadership, a winning mentality, etc.

Midfielders: assists, pre-assists, goals, passing accuracy, vertical passing skills, diagonal passing skills, vision, shots-on-target ratio (%), yellow cards, red cards, passing skills, anticipation, concentration, constructing playmaking moments, off-the-ball movement, decisions, composure, behaviour in big matches, etc.

Attackers: goals, assists, pre-assists, shots-on-target ratio (%), accuracy, etc.

Well, there are also total footballers: they are great at defending, at setting up the game ("playmaking players") and at attacking, always all at the same time. Pelé, Cruyff, Franz Beckenbauer, Di Stefano, Ruud Gullit, Lothar Matthaus.

Completing dribbles is not the main objective of professional football. It is not like skateboarding or BMX at the ESPN X Games, where the skateboarder performs maneuvers of a higher degree of difficulty and so adds points: in that championship or circuit, the greater the degree of difficulty of the maneuver, the more points it is worth. In professional football this is not the main objective. A player does not enter the field to execute dribbles of greater difficulty, "with higher technique and accuracy", and then earn points for them. Honestly, that usually doesn't win anything. Completed dribbles followed by shots off target are worthless in professional football. Freestyle football championships are made for dribbles. But in professional football that never, never, never was the main target of the game!
I think you come from a place of peace and genuine curiosity, so I'll be as courteous as I can be in my vexed state. I cannot say for sure why this discussion is aggravating me so much, but I would liken it to finding out something potentially elucidating, and the rest of society saying no just to preserve their established hierarchy, even if they know better deep down inside. It is a disgusting path of intellectual dishonesty and selfish preservation. There is nothing wrong with tallying actions taken on the pitch. I appreciate it beyond words when companies with vaster resources chronicle all the actions I missed out on due to being a casual spectator, or a human with very limited memory. I learned so much from the data-sets and patterns. I also like individual passion projects where we all attempt to calibrate actions just for the love of the game. Even if one is blatantly biased to make the recipient of the analysis look better, I critique the agenda behind it but still love the work (the annoyance is just due to the extra steps of re-calibration required to properly digest the numbers). However, this is a premise that, for me, mathematically speaking, is just a better initial premise from the get-go. I cannot prove it, nor can I articulate it adequately in words, but it is the immediate response I got inside. The more I read, the more promise I see in the initial premise. I do not really care for the execution of it, and I find the nit-picking discussions of that nature very dull and circular: how do they fully negate the premise? Did somebody literally find zero solutions for the errors after 10,000 years of supercomputer calculations, or is it just a Lionel Messi fanboy not liking the table with the name Thomas Muller on top of the list as opposed to Lionel Messi? As an explanation of that particular outcome: it was in part due to the quadratic terms producing an inverted-U-shaped curve that peaks at the age of 28 or so.
It is why all the top names on that list were around the age of 28, and why Cristiano Ronaldo was nowhere to be seen: he was that much older than the rest. However, nobody bothered to check what set of names the peak ratings (as in the ratings at age 28 for all players) would produce. Just the most tribal behaviour, full of annoying comments with zero thought, and a maximization of the unproductive conversational cesspool. The degree to which some of the posters here want to shut down the premise as useless and even counter-productive stems not from intellectual curiosity, or even from deep knowledge of all the potential outcomes that result from this premise. For me, it stems from a very bullshit human behaviour: protecting what they feel is of more value than seeking the truth from an intellectual-curiosity standpoint. In this case, the preservation of on-the-ball action tallies and of whoever they may benefit. They mind-******** themselves into thinking that what they do is not only intellectually correct, but morally superior. It is like talking to a fundamentalist religious fanatic, and I want no part of it. The conversation has zero productive outlet, and it is a dead-end game from an earnest standpoint. I'll just throw in this one segment as food for thought. If the top-down rating (Plus-Minus model) has more predictive value than the tallies of actions, and far greater mathematical integrity than any model suggested by Fantasy Premier League, like the VAEP, isn't the premise worth keeping, rather than abandoning it because it might insult the memory of Diego Maradona en route to greater statistical enlightenment? That's what I think anyways.
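The age effect described in this post can be illustrated with a rough sketch. It assumes the rating model contains a plain quadratic age term that is zero at a peak age of 28 and negative on either side; the `CURVATURE` value and the 0.30 example quality are hypothetical and not taken from the papers. It only shows why two players of equal underlying quality would get different current ratings, and how a peak-adjusted rating would remove that gap.

```python
PEAK_AGE = 28.0     # assumed peak age, from the post's "28 or so"
CURVATURE = 0.002   # hypothetical rating lost per (years from peak) squared


def age_effect(age):
    """Quadratic age penalty: zero at the peak, negative on either side."""
    return -CURVATURE * (age - PEAK_AGE) ** 2


def peak_rating(current_rating, age):
    """Strip the age penalty to estimate the rating the player would have at PEAK_AGE."""
    return current_rating - age_effect(age)


# Two hypothetical players with the same underlying quality (0.30),
# observed at different ages.
at_peak = 0.30 + age_effect(28)   # no penalty at the peak age
veteran = 0.30 + age_effect(34)   # penalised for being six years past the peak

print(round(at_peak, 3), round(veteran, 3))
print(round(peak_rating(veteran, 34), 3))
```

Ranking by current rating would place the veteran well below the 28-year-old, while ranking by peak-adjusted rating would put them level, which is the check the post says nobody made.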
The arrogance and moral superiority with which you write is one of the most insane and delusional things I’ve seen on the internet in years. You speak as if you were some kind of God of objectivity, when all you're doing is projecting. Who do you think you are to accuse others of intellectual dishonesty? Are you really contributing something superior or closer to the truth? Are you that sure of it? Calm down a bit. This model that you defend is not superior or more objective than anything or anyone. From my limited intellectual capacity, I have two fundamental problems, two basic issues. In fact, I find it incredible how those who work with these models are more concerned with "adjustments" or "polishing" certain aspects and don’t even stop to think about the basics, the fundamental problems that the general idea might have. If you're truly interested in critique because you're a God of objectivity, I suppose you'll take into account what I'm saying, or maybe what I'm saying seems very stupid to you? My First problem: When I see them talking about +1, +2, -1, -2, I already realize it's a mistake. It's like when they tell me that Lewandowski is a better scorer than Harry Kane because he has a higher goal average. It's basic and simplistic thinking. If player A has a goal average of 1, he's better than player B, who has a goal average of 0.7. That's the basic reasoning used by Plus-Minus from the start: if a team has a goal difference of +2.5, it's better than a team with a goal difference of +1.6, for example. If a player has a goal difference of +1.5, he's better than a player with a goal difference of +0.8. Maybe you think I’m very stupid, but I can assure you that this is a basic conceptual mistake, and I've learned it by analyzing players' goal contribution percentages in their teams. Just because a player has a higher 'raw number' doesn’t automatically mean he's better, as you're not analyzing the particular contexts of each team. 
Here’s my contribution, I think one way to correct this is to stop thinking that goal difference is +1 or -1, and think of it in terms of percentage, that is, (goals for) / (goals for + goals against). If you do that, the model’s results will be somewhat different and you’ll probably have some players from smaller teams. Even so, I don't mean to say that using goal difference in % will be absolute perfection, but I do think it could be a bit closer to being correct. Because that’s another thing, has no one thought that the players with the best scores always played on the best teams? Because obviously, the goal difference will be greater in those teams. What happens if the best player in the world, who objectively influences his team a lot, plays for a mid-table team? I can assure you that your Plus-Minus model will never rank that player highly. My second critique: This thing doesn’t take into account the difference in the level of the opponents. It seems so obvious and absurd that it hardly needs much explanation. It's totally random and inherently unfair. A player misses an easy game where his team wins, and that takes points away from the player. I have no hope that it’s possible to correct something like this. The best thing would be to simply take the matches the player played in and the team's goal difference, measured in percentage, during the minutes the player was on the field, compared to the minutes the player was on the bench, always counting only the matches the player participated in. Even so, this also has many problems. Do you think you can specifically address these two criticisms I have of the model, or are you going to start talking about something else? Ultimately, we are facing an impossible problem; we must always keep in mind that expecting statistics to represent reality exactly as it is, is a mistake in itself. At most, we can play with certain patterns and draw interesting conclusions. 
Pretending that you have moral and intellectual superiority over others is also a mistake.
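The goal-share idea from this post, (goals for) / (goals for + goals against), compared between a player's minutes on the field and on the bench, can be sketched roughly as below. The function names and all the numbers are made up for illustration, and treating a goalless spell as a neutral 0.5 is an assumption of the sketch, not part of the post.

```python
def goal_share(goals_for, goals_against):
    """Goals for as a fraction of all goals scored while the spell lasted."""
    total = goals_for + goals_against
    if total == 0:
        return 0.5  # assumption: a goalless spell counts as neutral
    return goals_for / total


def on_off_summary(on_gf, on_ga, off_gf, off_ga):
    """Compare the team with the player on the field against the same team without him."""
    share_on = goal_share(on_gf, on_ga)
    share_off = goal_share(off_gf, off_ga)
    return {
        "diff_on": on_gf - on_ga,            # raw plus-minus while on the field
        "share_on": share_on,
        "share_off": share_off,
        "share_gain": share_on - share_off,  # change in goal share with the player
    }


# Hypothetical player on a dominant team: the team is just as good without him.
top_team_player = on_off_summary(on_gf=60, on_ga=20, off_gf=15, off_ga=5)
# Hypothetical player on a mid-table team: the team collapses without him.
mid_team_player = on_off_summary(on_gf=30, on_ga=30, off_gf=5, off_ga=15)

print(top_team_player)
print(mid_team_player)
```

With these invented numbers the raw goal difference favours the top-team player (+40 against 0), while the on/off change in goal share favours the mid-table player (0.25 against 0.0). It does nothing about opponent strength or small samples, the problems the post itself admits remain.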
Since the very beginning I have been arguing that it's better to include assists and remove key passes, because key passes that don't turn into assists are most likely bad or meaningless. I used the xA per key pass to demonstrate this. I think we could either count goals and assists, or count shots on target and key passes. But in the latter case we should value shots on target more than key passes.
Evaluating all the players in the database. So a win against a third-division team doesn't count for as much, if anything very little... maybe it can even have a negative impact, like if you won 1-0 or 2-0 against a very weak team that you were expected to beat by more? I mean, check the method.
I have no idea who Trachta likes and prefers, and I don't agree with everything he says, for example his last post. The flaw of the model is more fundamental than just the use of absolute numbers instead of relative ones. Percentages do not solve the fundamental problem I am speaking of. It is a dead end even with percentages. Look, for what it's worth, I have a master's degree in chemistry and I currently work in an analytical laboratory with highly sensitive instruments such as GC-MS, HPLC and UV-Vis spectrophotometers, where accuracy is essential. The idea of accurately measuring the measurand of interest and determining the uncertainty of measurements is at the very core of my job. Signal-to-noise ratio is a technical term and very much applies to any kind of measurement. This is the reason why the model doesn't work, and I said that. I am not sure what about my comments is circular? I mean, I could explain it further if it is unclear, but I engaged with the topic at hand, explaining exactly why. The reason why I brought up Cristiano twice in the last two responses is pretty simple: 1. The point of communication is to say something in a way that is understandable to the other side, so of course I will use familiar phrases and examples to explain my points. I will continue to do so because there is no other way to communicate, and this is not something that should really trigger you. It is weird that it does, and that you can't have a discussion in which particular names in football come up. That is totally on you. 2. Also, I see you are getting a lot of reps on your posts from the "Ronaldo tribe." Rest assured, this is in no way a validation that you are making particularly valid, ground-breaking points. These guys jump at any nonsense that even in the slightest way puts Cristiano in a better light than Messi or shakes the status quo of football algorithms, which you have done with your recent posts.
In two weeks, you might say something a bit critical of Ronaldo and they will jump on you like scavengers. One day you are on their side, the next you are their worst enemy. If I trigger you by discussing the actual points of conversation while STILL being open to in-depth discussion, you are not going to like it here on the forum. So another reason why I use familiar examples is that I am appealing to a broader reading public, not just you.
Here is an idea. Instead of counting all of the team's goals for and against while a particular player is on the pitch, one could count only the team's goals for and against that involve the player of interest. If the player is on the pitch and his team scores, but he is not involved, he doesn't get +1. Likewise, if his team concedes while he is on the pitch, but he is not involved, he doesn't get -1. Only in certain cases would he get +1 or -1. This way the noise gets reduced much more. A player is not rewarded for his team scoring if he is not involved in it, even if he is on the pitch, nor penalized for conceding if he wasn't at fault. Then combine that with percentages as Trachta recommends, and you might get to something that is reasonable. Now the question is what criteria need to be set for determining a player's involvement in goals for and against. In this way my previous example of Cristiano and Arbeloa playing the same 38 games would yield a different +/- score for each, because their involvement is different. It is still far away from being anything of value, but this is already something much more reasonable... though it is also something quite different from the classical +/- model.
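The involvement idea above can be sketched as follows. This is only an illustration: the `Goal` record, the `involved` field and the two player labels are hypothetical, and the sketch says nothing about how involvement should actually be decided, which the post leaves as the open question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    scored_by_team: bool   # True = goal for the team, False = goal against
    on_pitch: frozenset    # team players on the pitch when the goal happened
    involved: frozenset    # players credited (goal for) or at fault (goal against)


def classic_plus_minus(goals, player):
    """+1 / -1 for every goal that happens while the player is on the pitch."""
    return sum(
        (1 if g.scored_by_team else -1)
        for g in goals
        if player in g.on_pitch
    )


def involvement_plus_minus(goals, player):
    """+1 / -1 only when the player is on the pitch AND involved in the goal."""
    return sum(
        (1 if g.scored_by_team else -1)
        for g in goals
        if player in g.on_pitch and player in g.involved
    )


# Two hypothetical team-mates who are on the pitch for exactly the same goals.
both = frozenset({"forward", "fullback"})
goals = [
    Goal(True, both, frozenset({"forward"})),
    Goal(True, both, frozenset({"forward"})),
    Goal(True, both, frozenset({"forward", "fullback"})),
    Goal(False, both, frozenset({"fullback"})),
]

for name in ("forward", "fullback"):
    print(name, classic_plus_minus(goals, name), involvement_plus_minus(goals, name))
```

Both players get the same classic score (+2) because they shared the pitch for every goal, while the involvement version separates them (+3 and 0). Everything then hinges on who fills in `involved`, so the subjectivity moves into that field rather than disappearing.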
I think the author is in the process of doing what I wish I could do if I was smart. Thanks for introducing him to me. The dude who wrote the paper is very smart, and I don't think I'm the guy to expose the statistical deficiencies of his mathematical approach in full detail, but I will say he leaves paper-trails that are easy to follow for novices of the topic like me. The paper you personally linked did not explicitly explain the mathematical concepts the author explained elsewhere, in another paper. The formulas are doing my head in, but they do seem to be quite mathematically sound, from what little research I've done. (PDF) Modelling the financial contribution of soccer players to their clubs (researchgate.net) He includes the following passage (found on the 5th page): "A player’s rating depends on all other players involved in each segment: when the opposition has lower ratings, the players on a team must consistently obtain positive scores to maintain a difference in rating. If a team is consistently obtaining worse scores when a particular player is included, that player will be assigned a lower rating than the team mates. Players that appear in different leagues or divisions help to calibrate the rating levels in those competitions, to form an opinion on the difference in the average level of player quality." 1) Strength of schedule is accounted for, and rewarded or punished to a degree that is deemed fit by the author (through sound mathematical principles, of course, not on a whim). The data-set of the earlier seasons was used for the prediction model, which was tested on the 2013/2014 season to see if the numbers had predictive power, which they had, so the resulting ratings were used for the following formula. 2) Sample size is key, since a consistent pattern across multiple years is required to make the numbers make sense.
(PDF) Offensive and Defensive Plus–Minus Player Ratings for Soccer (researchgate.net) 3) This more recent paper uses the data of 26,619 unique players, who played in 38,126 matches across a time-frame of approximately 9 seasons or so. Since the quadratic terms create an inverted-U-shaped curve that seems to peak at age 28 or so, with quite harsh ratings for inexperienced or more mature players, I think the best way to compare players (who are all at differing stages of their respective careers) is to take the peak estimated rating (usually reached around the age of 28) rather than the current rating. The problem is that none of this dude's papers publishes a peak ratings list, other than a very few select examples (most notably Bayern Munich players), but I did notice he has a YouTube channel where he outlines the top 100 players with their estimated peak rating (with a new, improved model used since January 2019). Below is the list of players by peak estimated rating (as in, let's pretend everyone is fixed at their optimal age, as opposed to being at varying stages of their careers). The details will be wrong in places (it's not an agenda for certain players, I genuinely make lots of mistakes) because I manually wrote down the details from his YouTube video linked here. I purposefully picked this date (December 2019), because the more recent ones had older legends drop out entirely, making me unable to get their peak estimated rating. Of course, as the model is ever-changing with the influx of new data, the most up-to-date peak estimated rating values will be different. Best football players in the world (youtube.com)
1. Cristiano Ronaldo: 0.370
2. Lionel Messi: 0.366
3. David Silva: 0.344
3. Luis Suarez: 0.326
4. Thomas Muller: 0.325
5. Robert Lewandowski: 0.325
6. Neymar: 0.309
7. Manuel Neuer: 0.307
8. Mario Mandzukic: 0.305
9. Kyle Walker: 0.301
10. Sadio Mane: 0.298
4) I know the very mention of the above list will trigger some.
Take your complaints elsewhere, I am not here to debate the finality of this list, just what the current specifications of the formula resulted in saying. My gut instinct does not agree with all of the names, nor do I intellectually believe that the mathematical equation is complete, and needs no fixes. But reading the thought process behind the mathematical formula, and the amount of data-set processed for this particular conclusion, was for me. Prior to this list, I actually was very sure N'Golo Kante would be rated really high by this formula, but he didn't even break the 0.2 barrier, even at his peak. I'm going to need some time to think as for the reasons. 5) I don't even understand what the hell is going on, and haven't finished reading all the minor details mentioned in the texts, nevermind having a great idea about all the details the author forgot about and what can be improved. I have no idea how some of you can claim to have understood all of the paper in like a 30 second read, and come up with the conclusion that this is a statistical dead-end. 6) I'll go back to reading this dude's work and respond to those who want to debate, not be annoying. It is better than responding to people who want to annoy me for taking an interest.
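The quoted passage about a player's rating depending on everyone else in each segment can be illustrated with a toy regularized plus-minus fit. To be clear, this is not the author's model: it is a bare ridge regression solved by plain gradient descent, with made-up segments, made-up player labels and an arbitrary penalty `lam`, and it leaves out age, home advantage and everything else the papers include.

```python
def fit_ratings(segments, players, lam=1.0, lr=0.01, steps=5000):
    """Fit one rating per player so that, for each segment,
    sum(home ratings) - sum(away ratings) approximates the goal difference.
    The lam * rating**2 penalty shrinks ratings towards zero (ridge)."""
    ratings = {p: 0.0 for p in players}
    for _ in range(steps):
        # Gradient of the penalty term.
        grad = {p: 2.0 * lam * ratings[p] for p in players}
        # Gradient of the squared prediction error, segment by segment.
        for home, away, goal_diff in segments:
            predicted = sum(ratings[p] for p in home) - sum(ratings[p] for p in away)
            err = predicted - goal_diff
            for p in home:
                grad[p] += 2.0 * err
            for p in away:
                grad[p] -= 2.0 * err
        for p in players:
            ratings[p] -= lr * grad[p]
    return ratings


# Hypothetical segments: (home players, away players, home goal difference).
segments = [
    (("star", "mate"), ("opp1", "opp2"), 2),
    (("sub", "mate"), ("opp1", "opp2"), 0),
    (("star", "sub"), ("opp1", "opp2"), 2),
]
players = ["star", "mate", "sub", "opp1", "opp2"]

ratings = fit_ratings(segments, players)
for name in players:
    print(name, round(ratings[name], 3))
```

In this toy data the team is +2 whenever "star" plays and level when he does not, so the fit rates him above "mate" and "sub", who come out equal. The two opponents always appear together, so the model has no way to tell them apart and gives them identical ratings, which is the kind of limitation that only more varied line-ups and a larger sample can ease.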
I think I should stick to easier topics next time, because the stress load is ********ing insane. This was a very difficult topic for me to digest, and I think I got really annoyed at Trachta10, because I was fascinated enough to read some of the papers, but found the topic challenging. I also believed I was being mislead by time-consuming, but ultimately fruitless exchanges with Trachta10, who I liked a lot for his previous work, but found him utterly insufferable in this debate. It is why I have ignored him. I do think he engaged in the conversation with a lot of agenda, and intellectual dishonesty. Maybe it didn't warrant such heavy insults, but I would have struggled to break the loop otherwise. I'll try disengaging in a more graceful manner next time, frustrated or not. I don't think I'm that smart (I have mentioned many times that I was taking my sweet ass time understanding even the basics of the papers, nevermind the more mathematically complex ideas, due to my own limitations in capacity). I do not think people here are dumb. However, for this particular topic, I just don't think some paid as much attention to the topic as me, but then used argumentative behaviour to push what they felt was right anyhow, without any intellectual curiosity, and just snide remarks as to why this topic was so simple, and easy to understand why it was mathematically doomed no matter what. With zero interest in the thought process behind the formulation of the paper. All fan-bases can be insufferable. People are insufferable in general, including me. Lionel Messi fans currently have more free-reign over their obnoxious behaviour, because history has written Lionel Messi as the clear victor, and you can often get away with saying the most stupid shit imagineable, and Lionel Messi's greatness and popularity will cover for it. If there's an agenda from my end, it's my annoyance at that phenomenon, not Lionel Messi as a player. 
I think he is a freak of nature, who can literally walk to victory in a game that demands non-stop movement. If one day I fight Cristiano Ronaldo fans, I hope it doesn't sway how I form my thoughts too much, by the number of internet likes I receive from fighting them. I hope I try to stay true to what I think is correct. I am biased too. I just think I'm less biased in this particular topic than some.
I don't even know the proper full results (YouTube only shows so much, and there is a pay-wall for the site that explains why the information is such a bitch to extract to begin with), the statistical significance of the results, the inherent biases, and the potential blind-spots of the approach. However, for the kind of longevity-focused and team-performance-centric viewpoint I tend to have, the premise seems killer, and the mathematical execution of is better than anything I can come up with. I'll try to digest the results, not just the top 10, because I don't really think this model is the thing that'll help decide the outcome of Lionel Messi versus Cristiano Ronaldo. Rather, my focus is on the lower end of the hierarchy, and if the model is more accurate at rating the non-world class players that the Ballon d'Or forgets about. The 90% of the professional player base. That's more my focus, because I think WhoScored has somewhat warped our entire perceptions as to thinking playing in the style of Lionel Messi, is a virtue worth chasing in pragmatic terms also. In my opinion, players like Adel Taarabt and Hatem Ben Arfa should not be rated highly in pragmatic terms, especially given how much tactical leniency they receive from the teams. They are better at showing people what they can do with the ball, than figuring out how to best help the team via those abilities. I want to find their numbers via this model. That's my gripe. You can debate Lionel Messi versus Cristiano Ronaldo elsewhere. It is not that interesting to me.
So you are not going to respond to me? Also, if you are responding to me than either tag me or quote me else I might not read it. I haven't read all of your words and it is silly to expect I do unless you tag me in some way and directly address me. I see now that you were saying some things I have missed. Just quickly started reading through the paper and this immediately caught my eyes: "While the proposed valuation system does look at players’ value as a function of their performance, it does not consider club performances and direct player contributions to such. No definitions of relevant player contributions to results are offered, the Opta Index being assumed as a sufficient measure of player quality instead." I'll let chatGPT respond, because if I say something, it will be "circular reasoning." "This passage critiques a player valuation system in football analysis by pointing out certain limitations in how it assesses player performance. Here's a breakdown of the key points: "While the proposed valuation system does look at players' value as a function of their performance": The valuation system being discussed is designed to assess players based on how they perform on the field. The analysis seems to focus on individual player performance metrics to derive a player's value. "it does not consider club performances and direct player contributions to such": The critique here is that the system fails to account for how the player's performance impacts the overall performance of their team (the club). It suggests that there’s a gap between individual assessments and how those performances contribute to team success. In football, the value of a player is often determined by their influence on the team’s outcomes, such as winning matches or securing trophies. 
"No definitions of relevant player contributions to results are offered": This means that the valuation system doesn’t explicitly define which player actions or behaviors are important in determining match outcomes. For instance, it doesn’t specify whether assists, defensive actions, or pressing are critical to evaluate player effectiveness. "the Opta Index being assumed as a sufficient measure of player quality instead": The Opta Index (a widely used performance rating system) is taken as a given in the valuation model. The authors are pointing out that instead of creating its own way of evaluating which contributions matter for team results, the model relies entirely on this existing metric, which could be too simplistic. The Opta Index may overlook context like how a player's actions contribute to their team's overall success. Summary: The passage argues that the valuation system assesses players based on their individual performance metrics (like those from the Opta Index), but it doesn’t account for how these performances translate to team success or define specific contributions that directly influence match outcomes. As a result, the model might lack depth in understanding the true impact of a player on the team’s overall performance."
That is just your opinion that there are players with a style like Messi's and that WhoScored or any other football algorithm is biased towards it. If the WhoScored algorithm favors players like Messi, then why is only Messi so far ahead of everyone else, even ahead of those with his specific style? Or does Messi have a style so different from anyone else's in football? You don't have to answer, but just to make it super clear: it is YOU who keeps bringing this up and implying that Messi's playing style is treated as something worth chasing in a pragmatic sense. I haven't seen anyone argue that. And btw, these algorithms likely underwent machine-learning training on an incomprehensible amount of data, and the ratings are a result of that. At least SofaScore suggests that they have done that. They likely covered many types of data, distilled the ones that are the best predictors of results, and use those at the core of their evaluations. It is not a few people behind the scenes pondering whether a through ball should be rated 0.21 or 0.24 and trying to favor any one player.
That's interesting, but do you know that this considers matches where the player didn't even play a minute? So, are you talking about a method that only takes into account the player's playing time? That would be something completely different because what this aims to do is measure how well the team performed while the player was or wasn't playing, even in matches where they didn't play.
I am not sure I follow... I gave an idea that would give 3 different sets of data. For example, when the team scores, it can happen in 3 scenarios: 1. the player is on the pitch and was involved (+1 point); 2. the player is on the pitch and wasn't involved (0 points); 3. the player wasn't on the pitch (0 points). The same goes for goals conceded. That would give 3 sets of data that can then be manipulated and compared however you like. The only obstacle then would be to think of a clever set of criteria for involvement. But I see some holes in this as well. To be clear, the bottom-up approach to evaluation is for me by far the best: a combination of data and subjective evaluation.
Sure, but if the player doesn't play a match and his team wins without him, would that be a negative value for the player? Because that's what the Plus-Minus model in football is about...right?
I wrote in detail so you understand my perspective (as in the motivations for my interest, and the reason why I believe in the general direction of the formula, not its final output as of now), and lessen the needless insults. I'll get to your other points with time, I am not ChatGTP that pops out answers in 10 seconds. Although I must say, I much rather prefer reading the papers itself than trying to convince you otherwise. 1) The mathematical premise seems sound, or at least superior to the models used by WhoScored (although I can only guess the nature of it, I have no idea what precise model they use). The computing speed and power of artificial intelligence is vast, but it cannot escape being misdirected. Give it the wrong initial premise (even if it has been deemed by society to be the best model humanity has to offer), and it will take the wrong course of action, or take way too long to correct itself. This has been proven in other fields of artificial intelligence already, including deep-learning programs. I would much rather give whatever learning algorithms this premise over the bottom-up tallying of actions, although I have learned a lot from these sets of data also. It is not that they are useless and should be stopped altogether, to stop the blatant bias for or against Lionel Messi, it is to start a new pathway of analysis that seems to cover a lot of the blind-spots of the most frequently used models of analysis today. If WhoScored uses deep-learning algorithms, why not use it for other models also? Why are you taking 5 seconds to conclude it is a dead-end, and then spending 5 minutes trying to think whether Lionel Messi is best represented by this model, especially versus Cristiano Ronaldo. I already told people I truly don't give a shit about who comes out on top. You should talk to their respective PR managers, and the influencers on media about this battle of legacies. 
2) Accountability for success If you increase the sample size, which I have emphasized many times over, the need for accountability decreases in my opinion. Either that, or you have found the luckiest ************************ in the entire world who magically allows his team to win non-stop for a decade from his mere presence alone, despite just sitting in the middle of the pitch masturbating to the attractive women in the audience. 3) Lionel Messi I do not care much for the top 10 list, and whether Lionel Messi tops Cristiano Ronaldo. It was posted with that number due to lack of available data for the peak estimated ratings, and because I do not have time to check all 100 players in detail The topic of "Lionel Messi versus player X" is a topic that you have processed inside your head about 10,000 times more than me. I truly do not give a shit, although I do have vague theories of my own. The reason why I said I wish to focus more on the lower end of the spectrum is because I care about the mathematical model being more accurate for the 26,619 unique players it processes the data for, than making sure all the subtle nuances between Lionel Messi and Cristiano Ronaldo are covered, and brute forcing those differences to the rest of the mortals who may impact the game in completely different and huge variety of methods and functions. I think WhoScored cares more about the ordering of the first two or three pages, than it does about the rest. Maybe you don't think so. As somebody who has expertise in measurement and data-processing, wouldn't you agree that the model that focuses trying to capture the all the subtle nuances and previously uncaptured genius of these magnificent on-the-ball magicians and playmakers, will have huge ramifications when that exact approach and formula is then forced onto the rest of the players? Why is that somehow a slight versus Lionel Messi? Why do you always revert the topic back to Lionel Messi no matter how hard I try? 
So I care more about the general, universal applicability of the model, and on that front I suspect the Plus-Minus may overtake the more bottom-up WhoScored-esque models, if given sufficient time and man-power.
4) Being annoying or condescending
I am sorry if I came across this way, but I do think I put in more time trying to be intellectually ready for this topic. The amount of clarity and zero hesitation shown when approaching such a complex mathematical concept (for me at least), and the amount of foresight required to come to such conclusions, made me feel like I was being conned. I still think it comes from a place of trying to be a better admirer of Lionel Messi, rather than trying to discuss the intricacies of this formula, what it can do, what it fails to do, and what can be done to improve it. I will try to respond with more substance of my own, but you have to let me read, instead of constantly engaging in acts that can be interpreted from my end as misdirection and agenda-driven behaviour. This is the best I can offer without having understood one iota more about the topic than at the moment when I last posted, due to other responsibilities. It's taking me a long time to adequately understand what I think is a fascinating statistical piece (in terms of concept, not execution, I cannot emphasize this enough), so let me post about it without being met by a constant nagging presence that wants to negate the very premise of it, due to ideological concerns, not intellectual ones.
I'm strongly struck by how the discussion moves towards more complex topics without ever stopping to question the foundations or whether the underlying idea even makes any sense. I'm not going to talk about the math, but rather the idea behind it. Correct me if I'm wrong, but this Plus-Minus model generally aims to measure how a player makes their team play better, meaning the team has a certain performance when the player is on the pitch, and another when the player doesn't play. To measure this performance, the model uses the 'team's goal difference.' I already see this as a problem; it's very simplistic to assume that the team's performance aligns with that goal difference. But anyway, let's pretend this makes sense... Could we say that the general idea is to measure the positive influence a player has on the team? If this is true, my problem is that this influence must be relative to the team. And the interesting thing is that my logic tells me it's easier to have a big influence on a weaker team, since there is less competition between players within the team. Imagine if you take Messi out of Barcelona and put him in Eibar: I believe that as an individual within Eibar, he would have much more influence than he does in a team like Barcelona. In other words, I think there's no doubt that the same player in a weaker team would be more influential than if we put him in a stronger team. However, the Plus-Minus model only measures goal difference, and it's impossible for a player, even if they were the reincarnation of both Pelé and Maradona together, to have a large goal difference in a weak team. So I suspect this model is really not measuring the player's actual influence. If we're talking about super teams like Real Madrid, Barcelona, Bayern Munich, or similar, these teams are made up of the best players in the world, and in general, they will almost always win. Meaning, if you take Messi out of Barcelona, they will still win games.
If you take Cristiano Ronaldo out of Real Madrid, they will still win, and the same goes for Bayern Munich without Thomas Muller. So, I find it hard to believe that any player from those teams could be so influential as to be considered the most influential player in the world. Logically, I would think that the players who have the most influence on their teams would be from not-so-strong teams, precisely because there's less competition within the team. I'm speculating because I don't have certainty, but if I had to bet, I'd say it's impossible that Messi, Cristiano, or Muller are the players who have the most real impact on their teams. That impact should be inversely proportional to the level of their teammates (more or less). So, the fact that the Plus-Minus model ranks players from the best teams in the world makes me doubt whether it's really measuring the player's influence on their team.
For those that care, and were legitimately concerned with the mathematical flaws of the model. Those of you who just like being experts about every single tiny detail of certain players, keep doing what you do, but don’t ever belittle the mathematical integrity of really smart people like the author of this paper after about 2 seconds' worth of gut reaction and throw-at-the-wall-and-see-what-sticks arguments, while doing your absolute best to sabotage everything that is being done from my end of the table to learn more. For every insult I made about the work of Trachta10, I was probably in the wrong, but think of this: what did the author of this paper ever do to warrant such immediate ridicule of his mathematical work? Even the premise? Even the mathematical integrity? Really? Nothing good can come out of this? You sure you didn't start the insults? The mathematical solutions given by the author of the paper basically seem to be two-fold in nature. I am a novice, so correct me if I’m wrong, but this seems to be the theoretical foundation. I will sound like an asshole, but at least I tried to do some justice to the work done here.
1) Tikhonov regularization
This is basically the mathematical solution for the problems caused by interwoven independent variables producing huge swings in common linear regressions (basically most variations of the potential problems that people like me would ask about, like players who play with strong teammates, which in turn could influence the plus-minus formula). Common linear regression is basically a model that is way more intuitive (it is the go-to statistical model done by the brain, because it is so straightforward to understand, and because of its simplistic nature it also causes a lot of statistical errors). Within the Tikhonov regularization equation, λ is the regularization factor, ranging from 0 to infinity, with which you can basically correct for all the interwoven independent variables messing up the formula.
The closer you get to zero, the more the noisy data overpowers the formula. The closer you get to infinity, the more the noisy data gets crushed by the formula, at the cost of making the patterns too oversimplified. So there is a fine mathematical balance, where you either let the real data take over the statistical integrity, or you account for the interwoven nature of the independent variables mathematically, protecting the integrity of the formula at the cost of overpowering the raw data. I couldn’t find the precise λ value used in the papers, but the fact of the matter is that you can just adjust it. Closer to zero, or closer to infinity. The range of options available for your statistical analysis is literally the entire non-negative real number spectrum. To say all potential variations of the aforementioned Tikhonov analysis are doomed from the get-go, off a sample size of one, seems really weird to me, especially because the nature of statistical analysis is genuinely not that intuitive. Unless you have a deeper understanding of the mathematical equations, and have gone through the numbers, I have no idea how any of you managed to intuit an immediate mathematical dead-end from this one specific usage of a really complex mathematical formula. At best, you have found mis-executions that you disliked. That's how I feel anyways.
2) Sample size
Basically the model gets progressively more sound the larger your sample size, because the reliance on strong regularization diminishes with each added data-set. The model isn’t perfect for a week-to-week overview, and you shouldn’t need such heavy reliance on statistics for week-to-week analysis anyways, because that’s where our eyes and brain beat the current mathematical models. Our brains start to fail heavily once the data-sets accumulate into the thousands, and we simply cannot track all the data (due to time constraints) and our memories start to fail.
It is why humans love to measure players by moments and their peaks, because our brain is wired that way. We are not supercomputers capable of processing that much data in a consistent and logically sound manner. Every nuance and every crude statistic you formulate to describe a specific player does nothing to help you extrapolate the findings universally to every type of player in existence. How is that a more sound mathematical premise compared to this? How far down the list does your particular preferred method of choosing who is superior start to break down and look like a complete mess? 78th best player? 290th? 25,049th? How well does a model that is mostly aimed at describing all the well-known playmaking, dribbling, on-the-ball highlight-reel machines do in that capacity? What is wrong with saying that is a real bias that is partly overcome through this formula?
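The λ trade-off described above can be sketched in a few lines. This is only a minimal illustration, not the author's code: the toy design matrix (rows as stretches of play, with +1 / -1 / 0 for a player being on one side, on the other side, or off the pitch), the made-up `true_beta` ratings, the noise level and the function name `ridge_plus_minus` are all assumptions for the example. It uses the standard closed-form Tikhonov (ridge) solution, beta = (XᵀX + λI)⁻¹ Xᵀy.

```python
import numpy as np


def ridge_plus_minus(X, y, lam):
    """Closed-form Tikhonov (ridge) estimate: (X^T X + lam * I)^-1 X^T y."""
    n_players = X.shape[1]
    A = X.T @ X + lam * np.eye(n_players)
    return np.linalg.solve(A, X.T @ y)


# Hypothetical toy data (not from the paper).
rng = np.random.default_rng(0)
n_stints, n_players = 200, 6

# +1: player on side A, -1: player on side B, 0: not on the pitch.
X = rng.choice([-1.0, 0.0, 1.0], size=(n_stints, n_players))
# Players 0 and 1 always appear together, so the data cannot separate them.
X[:, 1] = X[:, 0]

# Made-up "true" ratings and a noisy goal difference per stint.
true_beta = np.array([0.5, 0.1, 0.0, -0.2, 0.3, 0.0])
y = X @ true_beta + rng.normal(scale=1.0, size=n_stints)

# Sweep lambda from "almost no regularization" to "almost all regularization".
lams = (1e-6, 1.0, 100.0, 1e6)
betas = {lam: ridge_plus_minus(X, y, lam) for lam in lams}

for lam in lams:
    print(f"lambda={lam:g}  ratings={np.round(betas[lam], 3)}")
```

In this sketch the two inseparable players end up with the same rating, because the penalty splits their combined effect evenly, and every rating shrinks toward zero as λ grows. Which λ is appropriate for real match data is a separate question that this toy example does not answer.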
This isn’t so much about knowing math or about being smart or not, it’s more about how one can forget to take certain factors into account. The problem is that they are venturing into more complex territories, but they never stopped to think about the foundations, to question the validity of the premises. Most people have deeply ingrained in their minds that 'if one player scores more goals than another player, then that player is better.' We automatically assume that a larger 'raw number' equates to something better, but football is a sport where the context of each team is very different and determines that number. You could have a very good player on a weak team, and their numbers won’t be as high, but if you analyze their particular context, you might conclude they’re not doing badly at all. Football, in particular, is an extremely collective sport. The Plus-Minus model is supposed to measure a player's influence on their team. Beyond the fact that trying to measure this through goal difference is somewhat simplistic, my fundamental issue is that it assumes, for example, that a player playing for Eibar has the same numerical possibilities as one playing for Bayern Munich. It’s literally the same thing that happens when someone says that Puskás is a better goal scorer than Ronaldo, or things like that. Clearly, goal difference will always be greater for the more powerful teams, though of course, what’s being measured is the goal difference with and without the player. Still, it favors those who play on stronger teams or those that score more goals. What’s interesting is that if we’re supposed to be finding out which players positively impact their teams the most, this doesn’t make sense on its own. It always has to be an impact relative to the team. Even so, I don’t see this Plus-Minus model as something entirely dead. I think it can be improved if we analyze each player’s context more thoroughly. 
That’s why I suggested using percentages, because I believe this way we could benefit players from weaker teams. In general, I’m not convinced by the idea behind the model. I still think it could be something very disconnected from reality. Even so, I never expect any statistic to be perfect, so I do think it’s at least something interesting.
My understanding is that the classical +/- model (like the one in basketball) is purely the team's goal difference while the player is on the pitch. One can measure the team's goal difference when he is off the pitch and compare this data afterwards, but that is a separate thing. If I were to do it, which I won't because it is literally pointless, I would measure these data in 3 separate sets as I described above, without trying to combine them in any way, and then analyze and interpret that afterwards. Yeah, you hit the point there. Many "experts" are stuck in their own ways, just looking for ways to use fancy tools to analyze data without ever questioning what they are measuring. It is difficult to explain how ridiculous "experts" can be in their given field without someone actually seeing it for themselves. It is like using a thermometer accurate down to 0.0001°C in a wide open room with wind. Experts can be quite dumb and made short-sighted by their expertise. It is quite common. No matter how accurate and advanced the statistical tools you use, if you have nonsensical data, your results will be nonsense. As you said, goal difference does not even align with team performance, let alone with the performance of a single player amongst 21 others. The model definitely assumes that goal difference is a correct, consistent indicator of the team's performance, which is simply not true. This is already an almost insurmountable gap to close, and we haven't even touched on how that reflects the performance of a single player. This is due to the many variables that influence goal difference in football and their complex, chaotic interplay. In other words, there is a huge element of randomness in football, which is why it is so exciting to watch in the first place. Furthermore, you've touched on 3 additional key points:
1. Alongside the player who is analyzed, there are 10 other teammates on the pitch who have their own "random" (unpredictable) performances, which will influence the outcome of the match (the goal difference) just as much as his own performance. The model doesn't differentiate between them. That is the Cristiano and Arbeloa example of playing the same 38 games and coming up with the same +/- score. The model sees all players from the team as equal, meaning that for each goal for and against, it evaluates them as having the same influence. This is fundamentally wrong. If a team scores a goal, credit is not spread out equally to all players of the team. Some are more worthy of credit than others. The classical +/- model offers no solution in that regard (which is why I've thrown out the idea of "involvement" to somewhat move around that flaw). In reality, if someone performs 8/10 in a given game, his performance is not better if his teammates perform worse than 8/10, nor is his performance worse if his teammates perform better than 8/10. But whether his teammates perform worse or better will be reflected in the goal difference, which is what is measured, even though the player in question hasn't performed any differently. The model simply has no mechanism for differentiating such scenarios (unlike the bottom-up approach). Although I don't think that Messi and Cristiano would necessarily be underrepresented by the model.
2. The model doesn't account for the level of performance of the opposition's players. Just as teammates are equally capable of influencing the goal difference, performances of the opposition can counter that to an equal extent. The goal difference is a result of interplay between 22 players on the pitch who are all equally capable of influencing the game. Trying to discern how one player of the 22 on the pitch performed based on such a rough metric as the goal difference is impossible. It is really like using a kitchen scale to measure the mass of one crystal of sugar.
But maybe if you take 1000 measurements with the kitchen scale? No. It is not a problem of sample size. It is a problem of resolution.
3. As I already touched upon above, alongside there being 21 other players on the pitch capable of influencing the game to the same extent as the player who is being analyzed, football is extremely unpredictable and random by itself. Things such as tactical set-ups, deflections and referee mistakes all largely influence the goal difference and therefore the results of the model. The hope is that with large sample sizes you can cancel out this noise, but that is not the case, for many reasons. This noise is massive and random when looked at from the perspective of the player who is analyzed. The player has no control over so many of the variables that influence the goal difference, like the performance of his teammates and opposition and a massive amount of luck, yet the model judges him based on these events on the pitch. If the noise were consistent, which it is not, you could cancel it out, but it is a dead-end situation. Random number minus random number is not equal to zero. It is still a random number, with a very small likelihood of that number being zero. Some of these points are detailed, but the overall point is that the noise being measured with the brute goal difference is massive and random, to the point that it is not feasible to account for in reality.
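The "random number minus random number" point can be written out. This is a rough sketch under assumptions that do not come from the paper: suppose the goal difference observed in match i is the player's true contribution s plus noise ε_i from everything else, and suppose the ε_i are independent, have zero mean and share a common variance σ². Whether those assumptions hold in football is exactly what is being disputed here.

```latex
\[
  d_i = s + \varepsilon_i, \qquad
  \operatorname{Var}(\varepsilon_i - \varepsilon_j) = 2\sigma^2 \neq 0 \quad (i \neq j),
\]
\[
  \bar d = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
  \mathbb{E}[\bar d] = s, \qquad
  \operatorname{Var}(\bar d) = \frac{\sigma^2}{n}.
\]
```

So a single difference of two noisy results is indeed still random, with twice the variance. Averaging over n matches only shrinks the noise to the extent that it really is independent and centred on zero. If the noise is tied to the player's teammates or opponents, the 1/n shrinkage does not follow from this argument.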
I am not sure if we are arriving at the same conclusions after reading the same paper. In fact, I think you are drawing conclusions from an entirely different source altogether. Whatever statistical analysis you are doing right now does not align with the contents I've read in terms of the mathematics and formulas used by the author. If you mean a Plus-Minus model can go horribly wrong if executed by someone who has no clue about the mathematical work involved, yes, I think the threshold for proper execution is quite high, as is the potential theoretical ceiling. If it is productive debate you truly seek, can you actually address the contents of the paper instead of just repeating what you think? Or you can continue via your own metrics, not assume mathematical faults when you didn't even bother going through the formulas, and we can just let each other be. You are following the same path of willful misrepresentation, without even having the courtesy to check the original source.
If I ask for one thing, it is this: don't assume mathematical faults from a dude who has been doing this as his job without even reading his papers. It is his realm of expertise. It is quite insulting and intellectually dishonest, and what's more, you do it willingly just to prioritize the legacy of Lionel Messi, on the off-chance this might derail whatever progress you made through your own works. This was why I was so critical. I already addressed the issue of multicollinearity, how the author specifically guards against it with his statistical models, and how the Tikhonov regularization equation literally offers an infinite choice of lambda values to specifically battle this issue. That means no matter how complex and massive the number of independent variables, the author has the power to brute-force his way out by modifying the regression coefficients. It comes at a price, but the need for a suffocatingly large lambda value decreases with increased sample size. It is why I said the solution seemed to be two-pronged, and you literally bypassed my entire argument with a mental exercise using a 38-game sample size and the most redundant, mathematically flawed model, one that the author doesn't even use. The premise was fine. The mathematical work was as good as I've seen in any footballing journal. I think there's further work to be done. I am disgusted by how quickly people who don't even read the papers make assumptions about the supposedly flawed nature of the mathematics based on their gut reaction to the names on the list. I'm losing my sanity trying to argue against clear misconceptions about the paper, when it is the same inane points over and over again, and what's worse is that you can just read the papers.
You have absolutely no idea what you are talking about, as you've admitted yourself. The level of complexity of an 11v11 team sport and the level of randomness in a low-scoring sport are so far beyond the feasibility of anything remotely similar to that model that it is a non-starter. These experts of yours are deluded, believing their own bs. If they knew what they were talking about or were doing something of quality, they wouldn't be making countless publications on random topics to scrape a living; they would be rich beyond their imaginations from all the betting money they won in football. The fact that you openly say that you are "new" to statistics, don't quite understand the paper and what the author is saying, and yet STILL are a huge proponent of the paper and his words is honestly funny. You don't see the irony in your words and attitude at all? Okay, please, continue reading the paper and wasting many hours of your time thinking about it before coming to the same conclusion you could have reached by having a polite discussion with me or someone else. I have been overly patient with you at this point, and you keep crossing the line of respect for no reason other than your own shit in your head...