Sports Analytics

Sports analytics has become one of the hottest fields, due in part to success stories like those described in “Moneyball.” Organizations of all types continuously look for ways to gain an edge over their competitors, and athletic teams are no exception. In fact, the urgency to manage and leverage data may be even greater in athletics, because winning is so critical.

In this exercise, you will act as the data analytics director for a mid-market Major League Baseball team that has limited funds to acquire free-agent talent this coming season. Last year, your team was about average in wins, batting average, on-base percentage, earned run average (ERA), and errors.

Team ownership wants the team to reach the postseason, which means the goal is to finish 1st or 2nd in the division next year. Assuming that every key player from last year’s team returns and stays healthy, which type of player should the team pursue: one who hits well and/or gets on base, one who pitches well (i.e., has a low ERA), or one who commits few (if any) errors?

The background for this project can be found at http://www.seanlahman.com/baseball-archive/statistics/. Mr. Lahman has assembled one of the most comprehensive sources of baseball data and has made it available to researchers for free. We will be using the 2017-18 data, specifically the Teams.csv file. In this assignment, you are looking for the statistics that matter most for winning: team batting average, on-base percentage, team ERA, errors, or some combination of these. It is important that your analyses be properly grounded so that the conclusions you reach are valid.
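The data preparation this assignment calls for (keeping only the 2017-18 seasons, computing batting average and on-base percentage, and flagging top-two division finishes) can be sketched outside SAS to make the definitions concrete. The following is a minimal Python illustration, not the SAS program the assignment requires. It assumes the Teams.csv header uses the Lahman column names yearID, teamID, Rank (assumed to be the finishing position within the division), W, H, AB, BB, ERA, and E; the helper names load_teams, prepare, and control_totals are hypothetical.

```python
import csv
import io

# Assumed Lahman Teams.csv column names; adjust if the actual header differs.
SEASONS = {2017, 2018}


def load_teams(csv_text):
    """Parse Teams.csv text into a list of dicts, one per team-season."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def prepare(rows):
    """Keep the 2017-18 seasons and derive ba, obp, and top_two."""
    out = []
    for r in rows:
        year = int(r["yearID"])
        if year not in SEASONS:
            continue  # skip other seasons before parsing any other field
        hits, at_bats, walks = int(r["H"]), int(r["AB"]), int(r["BB"])
        out.append({
            "year": year,
            "team": r["teamID"],
            "wins": int(r["W"]),
            "ba": hits / at_bats,             # batting average = H / AB
            "obp": (hits + walks) / at_bats,  # assignment's simplified OBP
            "era": float(r["ERA"]),
            "e": int(r["E"]),
            # Rank is assumed to be the finishing position within the division.
            "top_two": 1 if int(r["Rank"]) <= 2 else 0,
        })
    return out


def control_totals(rows):
    """Row count and total wins, to reconcile against the source file."""
    return len(rows), sum(r["wins"] for r in rows)
```

Reading the file text (for example with `open(...).read()`) and passing it to `load_teams` and then `prepare` yields one record per team-season. Because MLB had 30 teams in each of those seasons, the filtered data should contain 60 team-seasons; comparing that count and the total of W against the source file is one simple control-total check.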

Your job…

First, you need to write a SAS program that reads in and analyzes the baseball team data. Here are some recommendations:

Read in the “Teams.csv” file, which is in the Assign5 folder.
/courses/d031ece5ba27fe300/MIS370/Assign5/Teams.csv
Make sure that your control totals balance back to those in the Teams.csv file.
You will only be analyzing team data from the 2017-18 seasons.
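You will need to calculate batting average (ba) = hits / at-bats and on-base percentage (OBP) = (hits + walks) / at-bats. This OBP formula simplifies the official statistic, which is (hits + walks + hit-by-pitches) / (at-bats + walks + hit-by-pitches + sacrifice flies).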
Create a dummy variable (top_two) that equals 1 for teams that finished 1st or 2nd in their divisions and 0 for teams that finished 3rd, 4th, or 5th.
Perform a t-test, by top_two, for each of your key variables (wins, ba, OBP, era, e)
Be sure that your top_two variable is formatted meaningfully.
Provide two boxplots across top_two: one for a variable that shows a significant difference and one for a variable that does not.
Be sure that your top_two variable is formatted meaningfully.
Run a correlation analysis of your key variables (wins, ba, obp, era, e)
Run a multivariate regression in which wins is the dependent variable and ba, obp, era, and e are the independent variables. Be sure to include an analysis of multicollinearity and a residual plot against one of your independent variables.
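In SAS, these steps would typically use PROC TTEST (which reports both pooled and Satterthwaite t-tests), PROC SGPLOT with a VBOX statement for the boxplots, PROC CORR for the correlation matrix, and PROC REG with the VIF option on the MODEL statement for multicollinearity. As a rough cross-check on what those procedures compute, the Python sketch below implements two of the underlying formulas: the Welch (Satterthwaite) t statistic with its degrees of freedom, and the variance inflation factor VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the other predictors. The function names welch_t, ols_r2, and vif are hypothetical; p-values are omitted because they need a t-distribution function outside the standard library, and the small normal-equations solver is meant for a handful of predictors, not as a production regression routine.

```python
import math
import statistics


def welch_t(group1, group0):
    """Welch (unequal-variance) t statistic and Satterthwaite df."""
    m1, m0 = statistics.mean(group1), statistics.mean(group0)
    v1, v0 = statistics.variance(group1), statistics.variance(group0)
    n1, n0 = len(group1), len(group0)
    se2 = v1 / n1 + v0 / n0  # squared standard error of the mean difference
    t = (m1 - m0) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v0 / n0) ** 2 / (n0 - 1))
    return t, df


def ols_r2(y, xs):
    """R^2 from an OLS fit of y on the columns in xs (intercept added)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in xs] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y]; singular systems are not handled.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ybar = statistics.mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def vif(predictors):
    """VIF for each named predictor: 1 / (1 - R^2 of it on the others)."""
    out = {}
    for name, col in predictors.items():
        others = [c for other, c in predictors.items() if other != name]
        out[name] = 1 / (1 - ols_r2(col, others))
    return out
```

For example, `welch_t` applied to the wins of top_two = 1 teams and top_two = 0 teams compares mean wins between the groups, and `vif` applied to a dict holding the ba, obp, era, and e columns gives one VIF per predictor (large values, often read against a rule-of-thumb cutoff of 5 or 10, signal multicollinearity). Because the simplified obp includes hits, ba and obp are likely to be strongly correlated, so their VIFs deserve a close look. Pairwise Pearson correlations for a correlation matrix are available from `statistics.correlation` (Python 3.10+).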
Then, prepare a business report for the executive team that explains your activities and findings. Your writing must be clear and concise, yet accurately and adequately explain the situation you addressed. Please use the following format:

Executive Summary
Background
Describe the goal of your research.
Describe the dataset you’re using, and how the data was obtained.
Describe the methods by which you’re analyzing the data.
Results
Describe your results in written form. There will be a separate section for each analysis (three in total).
Use proper English, but be direct. Avoid first- and second-person language like the plague.
Provide descriptive statistics of the variables you’re analyzing. Summarize important findings.
Provide the results of your t-tests for the variables you’re analyzing. Summarize important findings.
Your two boxplots should go in this section with SYTB: one for a variable with a meaningful difference and one for a variable without a meaningful difference.
Provide the correlation matrix for the variables you’re analyzing. Summarize important findings.
Describe the results of your multivariate regression analyses. Summarize important findings.
Conclusion and Next Steps
Briefly summarize your results.
Describe the practical implications of your results.
Describe what can be done to further leverage this data.
