Comments and forum for a non-linear regression on the Crew (SAS)

Discussion in 'Statistics and Analysis' started by taylor, Mar 1, 2004.

  1. taylor

    taylor Member+

    Jun 9, 2000
    Fav team: FC CARL ZEISS JENA
    Club:
    --other--
    Nat'l Team:
    Germany
    Hey everyone. I'm semi new to this forum. I normally hang out over on the DC & YA Abroad forums, but Maxim-1 told me this board might be a source for some help or peer review for my econometrics project.

    The project will be an estimation of the demand for attendance for the Columbus Crew from 1996 to 2003. The model will be estimated in SAS and Minitab as a non-linear model. I'll be using all the bells and whistles in the programs (well, at least the ones that I know of, e.g. proc model, autoregress, binary, transformation, etc.). I'm planning to use around six variables (price, stadium, weather, short-term winning record, long-term winning record, and net average goals, using the above-mentioned techniques) but am open to more variables (e.g. overfitting it) or expanding the scope to several teams or MLS as a whole.

    If any of you have any suggestions or comments, I would appreciate it. Especially if anyone can provide some new raw data, I would be forever in their cyber debt and would buy them a beer or two if they ever come to Berlin, DC, or Bloomington. Finally, as a caveat, if you write and I don't immediately respond, please don't take offense. Life as a grad student can be very draining (e.g. I haven't slept in 30 hours).
    Cheers,
    Taylor
     
  2. mpruitt

    mpruitt Member

    Feb 11, 2002
    E. Somerville
    Club:
    New England Revolution
    Gosh, all of that seems way over my head at least, and certainly a very ambitious project. One that probably should've been done quite some time ago. In terms of getting at data, I wonder what kind of stuff Kenn has. Unfortunately he's been kind of MIA. I'm sure you know his site and the numbers he has gathered; that may be of at least some assistance to you in terms of days, times, double headers, etc. Other than that, in terms of correlating it with weather or winning percentage, obviously the game reports are all there on MLSnet.com, and the pricing information wouldn't be hard to find on the teams' websites. Unfortunately gathering all of that would be pretty labor intensive, but definitely doable. One question I have is whether game reports from MLSnet.com include weather and temperature. I know you see that kind of thing in game reports on Soccer Times; at least then it'd be all there for the taking, provided you'd want to take it.

    Edit: just went back and answered some of my own question. This of course is the list of links with game reports, which do include weather and temperature, also the team's record at the time. So there's that. I'm curious though are you trying to look for how tickets are distributed throughout the stadium by price or what?
     
  3. taylor

    taylor Member+

    Jun 9, 2000
    Fav team: FC CARL ZEISS JENA
    Club:
    --other--
    Nat'l Team:
    Germany
    Sorry, I forgot to mention that I have collected all the data for the Crew for this time period. If anyone has the data for other teams or is familiar with regressions involving sports, it would be great to discuss here on the board.
    Thanks again.
     
  4. taylor

    taylor Member+

    Jun 9, 2000
    Fav team: FC CARL ZEISS JENA
    Club:
    --other--
    Nat'l Team:
    Germany
    Well, you already hit one of the nails on the head. The problem is that there is (probably) going to be some really high measurement error for attendance because
    A) It is unclear how many actually pay full price or receive some sort of MLS price cut
    B) I don't have receipts for different categories of actual sales for different seating. Also, if you have ever been to Crew Stadium (I have), price segregation (sorry, I am forgetting the official term for separating product right now) doesn't work because one can easily walk down to the closest seating without any trouble.

    Any ideas as to how to minimize these problems????
    I also forgot that I took July 4th data and double-header data, although interestingly enough the Crew had the fewest double headers, from what I observed.
     
  5. mpruitt

    mpruitt Member

    Feb 11, 2002
    E. Somerville
    Club:
    New England Revolution
    I'm not sure that you'd be able to get your hands on that kind of information without getting it directly from the club, which would probably be a little bit dicey. You could always try, but breaking it down to price points seems like it might be a little unobtainable, to me at least.
     
  6. mellon002

    mellon002 Member

    Jan 24, 2003
    Towson, MD
    DC is expecting a sell-out crowd for opening day because of Freddy, and judging by the preseason hype (the DC-KC game sold out) I'd say you can take it to the bank that when DC comes to town you will have a sell-out.
     
  7. mpruitt

    mpruitt Member

    Feb 11, 2002
    E. Somerville
    Club:
    New England Revolution
    Well, whether you meant to suggest it or not, that might be another variable for taylor to take into consideration: team opponents.
     
  8. mellon002

    mellon002 Member

    Jan 24, 2003
    Towson, MD
  9. microbrew

    microbrew New Member

    Jun 29, 2002
    NJ
    Other things that may affect attendance:

    Time of day and day of the week. Games at the wrong time may conflict with local kid and rec league games. Weekday games tend to have a much lower attendance.

    I thought about various promotions, but the only one that might significantly affect attendance is the 'buck-a-brats', or so I've heard.

    Is there any effect due to soccer games at the same time as NFL or college football games?
     
  10. ChrisE

    ChrisE Member

    Jul 1, 2002
    Brooklyn
    Club:
    --other--
    Nat'l Team:
    American Samoa
    I don't know anything for sure about this regarding MLS, but I'm pretty sure the Sounders fans have recorded a pretty significant difference in attendance on days when the Mariners are playing in town.

    (And I'd be shocked if Columbus didn't see a lot of the same thing with Ohio State.)
     
  11. taylor

    taylor Member+

    Jun 9, 2000
    Fav team: FC CARL ZEISS JENA
    Club:
    --other--
    Nat'l Team:
    Germany
    Columbus doesn't have NFL football so I didn't record that data.

    I also forgot to include some other variables which I have recently recorded. There are three (after a ton of net mining) TV variables. I now have the games for when the Crew were aired on local, ESPN, and Spanish-language stations in Columbus.

    As I said, I am still interested in collecting data, if anyone has some relevant stuff. Or (and more importantly for me now) if any of you know how to deal with regressing sports demand curves (SAS and Minitab format), I would appreciate any insights. I realize, of course, that MLS is in a way still a virgin subject in American sports, so anyone with info on other sports is welcome too.
     
  12. taylor

    taylor Member+

    Jun 9, 2000
    Fav team: FC CARL ZEISS JENA
    Club:
    --other--
    Nat'l Team:
    Germany
    HELP!
    Ok, I am in a bit of trouble. I am required to run a two stage least squares model and don't know what my second equation can be! As a reminder, I have the data collected from 1996 to 2001.

    my first equation is

    model one: fans(attendance)=end(binary, whether they won the previous game or not) opp(binary, whether opp team was good) price(price per yearly average) entv(binary, whether game was televised in English) pop(population of Columbus) spantv(binary, whether game was televised in Spanish) newstad(binary, new stadium or not).

    However, I need to have a second model, and none of these are proper instrumental variables for a second equation! Does anyone know what I can do, or have other variables that I can collect that would relate?

    For those that haven't done this madness, I am in desperate need of a variable (or variables) that relates to the above-mentioned variables (hierarchical or simultaneous).


    If any of you have ideas, I would greatly appreciate them
     
  13. mellon002

    mellon002 Member

    Jan 24, 2003
    Towson, MD
    I'd PM ChrisE if you don't hear anything on this thread.
     
  14. ChrisE

    ChrisE Member

    Jul 1, 2002
    Brooklyn
    Club:
    --other--
    Nat'l Team:
    American Samoa
    What timing!

    Unfortunately, I don't really understand what you're trying to do, taylor. Try numerista and, failing him, voros.
     
  15. Guinho

    Guinho Member+

    San Jose Earthquakes, bless their hearts
    Estonia
    May 27, 2001
    San Francisco, CA
    Club:
    San Jose Earthquakes
    Nat'l Team:
    United States
    Well, it seems you are going into areas of nonlin regression I am less familiar with, but if I follow you you are looking for something else these variables could explain. The only thing that comes to mind for me is winning percentage if you are up for having one of your models be logistic or some cousin. If you could get info on the sales of different seats by price, you might try explaining expensive seats and cheap seats. If you are willing to limit yourself to televised matches only, perhaps TV share is a good second model to fit?

    I'd also make some practical suggestions:

    1) I liked some of your initial ideas for variables earlier, like weather, etc. What happened to those?

    2) the population of Columbus is not likely to have varied much and is going to be insanely autocorrelated in time. In fact, this whole data set is going to have temporal autocorrelation issues that will influence any hypothesis testing you do. Make sure you account for this. (It's been a while since I've dealt with autocorrelated data, so I'm too rusty to make more specific recommendations.) Other than that I can't imagine you have much variation in this variable, so you are including a variable without much potential explanatory power, which means you are burning degrees of freedom for nothing.

    3) I'm not sure, but since you have a bunch of those changing in time more or less concurrently (esp. the two televised variables, newstad, pop, and price), I'm guessing that in addition to autocorrelation in the data set, you may find high levels of multicollinearity as well. Overall, by the time you correct for this, you may be looking at a greatly reduced number of degrees of freedom (depending on how you go about dealing with these issues).

    4) Since you have roughly 7 x 15 = 105 games, and 7 explanatory variables, by the time you include interactions and non linear terms, account for multicollinearity and autocorrelation you may find that the model explains squat. One of the sad realities of such work.
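
    A quick way to get a feel for the multicollinearity worry in point 3 is to look at pairwise correlations between the regressors. The sketch below is plain Python with made-up season-level numbers (they are NOT the Crew data), and the names `pearson`, `newstad`, `price` and `pop` are just for illustration. With only two regressors the variance inflation factor reduces to 1 / (1 - r^2), which is what the last column shows.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical values, one per season. Not the real data set.
newstad = [0, 0, 0, 1, 1, 1, 1, 1]                      # new-stadium dummy
price = [10.0, 10.5, 11.0, 13.0, 13.5, 14.0, 14.5, 15.0]  # average ticket price
pop = [690, 695, 700, 706, 711, 716, 722, 728]          # population (thousands)

pairs = {
    "newstad vs price": (newstad, price),
    "newstad vs pop": (newstad, pop),
    "price vs pop": (price, pop),
}

for label, (a, b) in pairs.items():
    r = pearson(a, b)
    # Two-regressor VIF; with more regressors a full auxiliary regression is needed.
    vif = 1.0 / (1.0 - r * r)
    flag = "worth a closer look" if abs(r) > 0.8 else "probably fine"
    print(f"{label:<18} r = {r:6.3f}   VIF = {vif:8.1f}   {flag}")
```

    The 0.8 cutoff is only a rule of thumb, and pairwise correlations can miss collinearity that involves three or more variables at once.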


    Anyway, those are some of my thoughts. I wouldn't mind being wrong about some of these, but be sure to let me know if I am!

    G.
     
  16. numerista

    numerista New Member

    Mar 21, 2004
    Afraid I don't have the cycles to get too far into this, but here's the kind of thing Taylor wants to do.

    Suppose he wants to know the effect of weather on the number of people in attendance, but attendance numbers aren't so good for this purpose. This is because some people buy tickets but don't show up when weather is bad.

    In order to correct this bias, he needs some variable that is related to weather and attendance but that doesn't have this unwanted correlation ... for example, parking revenues, concession revenues, or even somebody's anecdotal description of how full the stadium was.
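
    For anyone who hasn't seen two-stage least squares before, here is a bare-bones sketch of the textbook version with one problem regressor and one instrument, in plain Python. Everything in it is an assumption for illustration: `z` stands for the instrument, `x` for the regressor that is contaminated by an unobserved factor `u`, and the numbers are constructed so the true slope is 2. It is not a claim about how SAS does it or about what taylor's second equation should be.

```python
def ols(x, y):
    """One-regressor OLS with an intercept; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def two_stage_least_squares(z, x, y):
    """Stage 1: regress x on z. Stage 2: regress y on the fitted x."""
    a0, a1 = ols(z, x)
    x_hat = [a0 + a1 * zi for zi in z]
    return ols(x_hat, y)


# Constructed example: u is an unobserved factor that moves both x and y,
# while z is uncorrelated with u by construction.
z = [1, 2, 3, 4, 1, 2, 3, 4]
u = [1, 1, 1, 1, -1, -1, -1, -1]
x = [zi + ui for zi, ui in zip(z, u)]
y = [1 + 2 * xi + 3 * ui for xi, ui in zip(x, u)]  # true slope on x is 2

_, naive_slope = ols(x, y)
_, iv_slope = two_stage_least_squares(z, x, y)

print(f"plain OLS slope: {naive_slope:.3f}")  # pulled away from 2 by u
print(f"2SLS slope:      {iv_slope:.3f}")     # recovers the true slope
```

    The point of the toy data is only that the plain regression picks up the effect of `u` while the two-stage version does not. Standard errors from the second stage need a correction that this sketch leaves out.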
     
  17. taylor

    taylor Member+

    Jun 9, 2000
    Fav team: FC CARL ZEISS JENA
    Club:
    --other--
    Nat'l Team:
    Germany
    Hey everyone, I am quite busy with a Cost Benefit paper, so please excuse my recent reticence. Guinho, you're absolutely correct on the multico and autoco (I didn't post that info), but I believe there are some things I can do to get around that (e.g. transformation). I don't believe the problem is as bad as you may think (i.e. the Durbin-Watson statistic was around 1.5, etc.). As for the second equation, I think I am going to try a hierarchical supply side. The problem, again, is lack of info. Developing a cost curve can be a bitch with MLS.
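
    For readers who haven't met the Durbin-Watson statistic: it is the sum of squared differences between consecutive residuals divided by the sum of squared residuals, so values near 2 suggest little first-order autocorrelation and values toward 0 suggest positive autocorrelation. A minimal Python sketch follows. The residuals are made up, and the function names are illustrative rather than anything from SAS.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    den = sum(e * e for e in residuals)
    return num / den


def implied_rho(dw):
    """Rough first-order autocorrelation implied by DW, using DW ~ 2 * (1 - rho)."""
    return 1.0 - dw / 2.0


# Hypothetical residuals in game order. Not the real model output.
resid = [1200, 900, 400, -300, -800, -500, 100, 700, 1000, 300, -600, -900]

dw = durbin_watson(resid)
print(f"DW for the made-up residuals: {dw:.2f}")
print(f"rho implied by a DW of 1.5:   {implied_rho(1.5):.2f}")
```

    By that approximation, a statistic of 1.5 corresponds to a lag-one correlation of roughly 0.25 in the residuals.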

    oh and as far as price discrimination for tickets, I am going to blissfully live in ignorance, as my prof will too :). It is just too hard to account for that given my very limited amount of info on ticket sales.

    As for the other variables, I got lazy and thought I wouldn't need weather. But as you correctly said, I need some non-binary variables. I however won't be able to do much work on this until the weekend so excuse me in advance for not responding sooner.

    Now off to the bloody bursar for a good bleeding...
    Thanks again for all your help.
    Cheers,
    "Taylor the Endebted"

    P.S. I think I can also use some Interchangeable (wc?) Variables for some problematic variables.
     
  18. profiled

    profiled Moderator
    Staff Member

    Feb 7, 2000
    slightly north of a mile high
    Club:
    Los Angeles Galaxy
    I don't know an incredible amount about this sort of thing, but thought I'd offer up a suggestion.

    Have you actually got in contact with someone from the Crew/MLS and asked them if they could provide any of the data you require? You never know, they might surprise you and be very helpful.

    A friend of mine did his master's thesis on something involving sports (can't quite remember what, but it wasn't financial or math based), and got in contact with the Galaxy. They set him up to meet with the team and some of the players, and were overall very helpful, so it can't hurt to ask.
     
  19. Guinho

    Guinho Member+

    San Jose Earthquakes, bless their hearts
    Estonia
    May 27, 2001
    San Francisco, CA
    Club:
    San Jose Earthquakes
    Nat'l Team:
    United States

    Particularly if you share the results of the analysis with them. It's actually the sort of thing they'd have to pay someone a lot of money to do for them otherwise probably, and who knows, what you uncover might just be useful to them.

    G.
     
  20. mpruitt

    mpruitt Member

    Feb 11, 2002
    E. Somerville
    Club:
    New England Revolution
    I just want to thank Huss for putting the Stats and Analysis page and threads on Bigsoccer Live. I get a weird kind of pleasure from seeing a conversation about non-linear regressions on the front page.
     
  21. taylor

    taylor Member+

    Jun 9, 2000
    Fav team: FC CARL ZEISS JENA
    Club:
    --other--
    Nat'l Team:
    Germany
    A couple of years ago, a guy from my undergrad worked there and was able to help me out (that is why I am using the Crew over DC). Now that he is gone, I don't have any inside person.

    After your post I decided to try. I was forwarded to a different person (Mr. Wuerth) and left a message. I will try calling back after class. But, I am skeptical of this guy disclosing info because he is the PR director. I, therefore, doubt he is familiar with regressions, but it is worth a shot.

    Guinho, I am skeptical that they would understand how important and expensive a tool this could be for them, if they had to pay for it.

    I will post later today about my new contact.
    Cheers,
    Taylor
     
  22. taylor

    taylor Member+

    Jun 9, 2000
    Fav team: FC CARL ZEISS JENA
    Club:
    --other--
    Nat'l Team:
    Germany
    I left a couple of messages. He never called me back. I will try again tomorrow (Thursday).
     
  23. Auxodium

    Auxodium New Member

    Apr 11, 2003
    Perth, Australia
    and what will this achieve? better marketing? :eek:
     
  24. taylor

    taylor Member+

    Jun 9, 2000
    Fav team: FC CARL ZEISS JENA
    Club:
    --other--
    Nat'l Team:
    Germany
    Well, I have had quite some success. I was able to take out the autoco by using a Yule-Walker function, and the multico is within workable parameters... (barely on a couple). Below are the estimates for the Crew. For those who don't know how to read this, I will offer a quick synopsis.
    First, however, let me offer the caveat that I am operating with a 7% level of significance.
    The first thing I would like to say is that the model is obviously underfit, so the results should be interpreted with a high degree of flexibility.

    Now to the juicy stuff. The variables below explain about 35% of the variation in attendance (Total R-Square 0.3540).
    The most obvious significant variables are price and stadium. E.g. a $1 price increase is associated with a drop of about 2,296 in attendance. The new stadium resulted in about 6,459 additional fans per game. A surprising variable is games televised in Spanish (about 1,696 fewer people show up when the game is on Spanish TV). The quality of the opponent does not play a significant role in this model. That is to say, a strong opponent does not draw more than any other team. Population is also not found to be significant. Finally, winning matters. If the Crew won the previous game, about 3,146 more people attended the current game.

    So there you go folks. As I said before, the numbers should be viewed as estimations. That is all they are. Since I do not have all the information available to me, one MUST interpret the values with some salt. But they are sure as hell more scientific than anything else I have seen, IMHO. If you are interested in the details or if you would be willing to collect a lot of data for your team to see the results, send me an email.
    Cheers,
    Taylor


    The AUTOREG Procedure

    Yule-Walker Estimates

    SSE                  936985352    DFE                     84
    MSE                   11154588    Root MSE              3340
    SBC                 1804.49284    AIC             1781.69944
    Regress R-Square        0.2732    Total R-Square      0.3540
    Durbin-Watson           1.7457


                                    Standard             Approx
    Variable        DF   Estimate      Error  t Value  Pr > |t|

    Intercept        1     -38411      42814    -0.90    0.3722
    won_last_game    1       3146   853.9676     3.68    0.0004
    price            1      -2296   694.2424    -3.31    0.0014
    population       1    77.3345    47.3620     1.63    0.1062
    opponent         1   700.1484   719.6596     0.97    0.3334
    engtv            1   -28.3462   925.6405    -0.03    0.9756
    spantv           1      -1696   922.1362    -1.84    0.0694
    newsta           1       6459       2212     2.92    0.0045
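
    For anyone who wants to sanity-check the output above: each t value is just the estimate divided by its standard error, and a variable counts as significant here when its p-value is below the 7% level mentioned earlier. The Python sketch below copies the estimates, standard errors and p-values from the table (the first variable is written as `won_last_game`, assuming that is what the label in the output means). The p-values are taken from the SAS output, not recomputed.

```python
ALPHA = 0.07  # significance level stated in the post

# (variable, estimate, standard error, approx p-value) copied from the output
rows = [
    ("Intercept", -38411, 42814, 0.3722),
    ("won_last_game", 3146, 853.9676, 0.0004),
    ("price", -2296, 694.2424, 0.0014),
    ("population", 77.3345, 47.3620, 0.1062),
    ("opponent", 700.1484, 719.6596, 0.3334),
    ("engtv", -28.3462, 925.6405, 0.9756),
    ("spantv", -1696, 922.1362, 0.0694),
    ("newsta", 6459, 2212, 0.0045),
]


def t_value(estimate, std_error):
    """t statistic for a single coefficient."""
    return estimate / std_error


for name, est, se, p in rows:
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"{name:<14} t = {t_value(est, se):6.2f}   p = {p:.4f}   {verdict}")

significant = [name for name, _, _, p in rows if p < ALPHA]
print("Significant at 7%:", ", ".join(significant))
```

    Note that spantv only clears the bar because the cutoff is 7%. At the more usual 5% level it would not count as significant.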
     
  25. mpruitt

    mpruitt Member

    Feb 11, 2002
    E. Somerville
    Club:
    New England Revolution
    Yeah, I'd have no idea how to interpret these results, but how did you finally come by your success? Were you able to contact someone with the Crew? Were they helpful?
     
