successfully complete this project include the following.
1) Select a topic that involves two quantitative variables,
2) Locate or collect data,
3) Analyze data statistically,
4) Draw conclusions,
5) Document the results in a proper technical report.
1) Select a topic.
Find a topic that interests you and that involves bivariate data. It can be related to your educational
discipline, your personal interests, a current event, etc.
2) Locate or collect the data.
a) Identify the explanatory variable (x) and the response variable (y).
b) You can use existing data found on-line or in literature, or you can collect the data yourself.
Many on-line sites are available with free access to databases. Also, you can use newspaper or
magazine articles, almanacs, or other reference books. However, it is not acceptable to simply
pull data sets directly from a statistics textbook. In your search, you may find that researchers
have already analyzed the data that you are interested in. This is fine, as you will still need to
work through the complete statistical analysis. The authors may or may not have included the
raw data in the article. If the data is not included, the authors may reference where the data
can be found. This is an ideal opportunity to utilize the library and librarians. Also, keep in mind
you may use two different sources for your two variables. For example, you may find national
GDP data from one source, and life expectancy data from another source. Please use care to
match the two variables appropriately, such as by year.
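As a minimal sketch of matching two sources by year, the snippet below keeps only the years that appear in both series and reports the years that were dropped. The variable names and all numbers are made up for illustration; they are not real GDP or life expectancy figures.

```python
# Hypothetical yearly series from two different sources (made-up numbers).
gdp_by_year = {2018: 10.0, 2019: 10.5, 2020: 9.8, 2021: 10.9}
life_exp_by_year = {2019: 78.1, 2020: 77.4, 2021: 77.9, 2022: 78.3}


def pair_by_key(x_by_key, y_by_key):
    """Return (key, x, y) tuples for keys present in both sources, sorted by key."""
    shared = sorted(set(x_by_key) & set(y_by_key))
    return [(k, x_by_key[k], y_by_key[k]) for k in shared]


paired = pair_by_key(gdp_by_year, life_exp_by_year)
for year, gdp, life_exp in paired:
    print(year, gdp, life_exp)

# Report what was dropped so the mismatch is documented rather than hidden.
unmatched = sorted(set(gdp_by_year) ^ set(life_exp_by_year))
print("Years present in only one source:", unmatched)
```

Only the matched pairs should go into the analysis; the unmatched years are worth noting in the report.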
If you plan to collect the data yourself, prepare a data collection plan. Keep in mind what we
have learned in class regarding data collection methods. However, in the interest of time and
resources, you may want to limit your sampling to our class. Note that only appropriate data
may be collected from your classmates – nothing will be allowed that infringes upon students’
privacy or investigates issues that anyone might find embarrassing or hurtful.
c) Identify whether the data you are using is from an experimental or observational study.
3) Statistical Analysis of Data
a) Construct a scatterplot of the bivariate data. Interpret.
b) Calculate and interpret the correlation coefficient.
c) Compute the equation for the least-squares linear regression line, using Excel, Minitab, or any
other statistical software. State the equation and interpret the coefficients.
d) Compute and interpret the coefficient of determination, R².
e) Perform residual analysis on the regression model.
i. Plot the residuals against the explanatory variable and interpret the plot.
ii. Check for constant error variance (homoscedasticity).
iii. Identify any possible outliers in the data. Are they influential, or do they have high leverage?
f) Since we often build regression models to use for prediction, provide at least one example of
prediction. If possible, compare the predicted value to the actual value.
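Steps b) through f) above can be sketched in a few lines of plain Python, which may help when checking the output of Excel, Minitab, or other software. This is an illustration under stated assumptions, not a required method: the data are made up, the 4/n leverage cutoff is one common rule of thumb, and the new x value is hypothetical. The scatterplot and residual plot themselves are easier to draw in your chosen software.

```python
import math

# Made-up (x, y) pairs for illustration only; substitute your own data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the means.
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# b) Correlation coefficient r.
r = sxy / math.sqrt(sxx * syy)

# c) Least-squares line: y_hat = intercept + slope * x.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# d) Coefficient of determination (equals r squared in simple linear regression).
r_squared = r ** 2

# e) Residuals (observed minus fitted) and leverage of each point.
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# A common rule of thumb (an assumption here, not a course requirement):
# flag points whose leverage exceeds 4 / n in simple linear regression.
high_leverage = [xi for xi, h in zip(x, leverage) if h > 4 / n]

# f) Prediction at a hypothetical new x value inside the observed range.
x_new = 3.5
y_new = intercept + slope * x_new

print(f"r = {r:.3f}, R^2 = {r_squared:.3f}")
print(f"y_hat = {intercept:.3f} + {slope:.3f} x")
for xi, ei, h in zip(x, residuals, leverage):
    print(f"x = {xi:.1f}  residual = {ei:+.3f}  leverage = {h:.3f}")
print("high-leverage x values:", high_leverage)
print(f"prediction at x = {x_new}: {y_new:.3f}")
```

Plotting the residuals against x should show no obvious pattern if a straight line is an adequate model; a curve or a funnel shape suggests nonlinearity or non-constant variance.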
4) Draw Conclusions
a) Provide an overall statement on the appropriateness and validity of using your regression
equation for prediction. Keep in mind that concluding an explanatory variable is not a good
predictor of a response variable is not a failure; that is valuable knowledge as well.
b) Discuss how the analysis results, coefficients, and residual diagnostics led you to your final
conclusions regarding the usefulness and adequacy of your model.
c) Comment on the ‘extrapolation’ capability of your model. What range of values do you consider
acceptable for use with your model? Why do you consider that range acceptable?
d) Comment on how you might improve or change your regression analysis if you were to repeat it.
Suggest any areas for future study.
e) Provide any final comments relating to the project – likes, dislikes, difficulties encountered,
software issues, surprises.
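For item c) above, one simple way to reason about extrapolation is to compare each value you want to predict for against the range of x values actually observed. The sketch below assumes a fitted line is already available; the function name, the slope and intercept, and the data are all hypothetical.

```python
def predict_with_range_check(x_new, x_observed, slope, intercept):
    """Return (prediction, within_range) for a fitted line y_hat = intercept + slope * x.

    within_range is False when x_new lies outside the observed x values,
    meaning the prediction is an extrapolation.
    """
    lowest, highest = min(x_observed), max(x_observed)
    within_range = lowest <= x_new <= highest
    return intercept + slope * x_new, within_range


# Made-up observed x values and hypothetical fitted coefficients.
x_observed = [1.0, 2.0, 3.0, 4.0, 5.0]
slope, intercept = 0.6, 2.2

for x_new in (3.5, 8.0):
    y_hat, ok = predict_with_range_check(x_new, x_observed, slope, intercept)
    note = "within observed range" if ok else "extrapolation, interpret with caution"
    print(f"x = {x_new}: y_hat = {y_hat:.2f} ({note})")
```

Staying inside the observed range is a conservative default; whether a modest step beyond it is defensible depends on the subject matter and should be argued in your report.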