Data Analysis Exercise 4 (DAX4)

Do an analysis on the same dataset in Excel and SAS, and then report on our analysis. The data
that we will use for this analysis is called SASHELP.CARS, and the analysis we will do is an ANOVA. I have explored the
dataset and have developed a research question: Is there a statistically significant difference in the weight of sedans
based on their origin? To find the answer to this question, I am going to analyze the dataset and perform an ANOVA
comparing the sample means of the weight variable by continent of origin (USA, Asia, or Europe).
An ANOVA is a set of statistical methods used mainly to compare the means of two or more samples. Estimates of
variance are the key intermediate statistics calculated, hence the reference to variance in the title ANOVA.1
In testing the null hypothesis for a One-Way ANOVA, we generally follow these steps:

  1. State the plain language research question: Is the average weight of sedans in this dataset statistically similar
    regardless of origin?
  2. State the hypotheses:
    • Null hypothesis – HO: µAsia = µEurope = µUSA
    • Alternative hypothesis – HA: Not all means are equal.
  3. State the criteria for rejecting HO:
    • α = 0.05
  4. Test the assumptions for the One-Way ANOVA:
    • Assumption of Independence (Research Design)
    • Assumption of Normality (Shapiro-Wilk Pr<W, and examination of graphs and charts)
    • Assumption of Homogeneity of Variance (Levene’s Test of Homogeneity)
  5. Compute the Test Statistic:
    • Calculate the F ratio
  6. Decide whether to retain or reject your null hypothesis:
    • If p > α, then retain the null hypothesis
    • If p < α, then reject the null hypothesis, and accept the alternative hypothesis
  7. Examine post tests for which categories are different.
  8. Download “Business Memo Template DAX4.docx” from the DAX4 Assignment.
  9. Save the downloaded file as a Word file and name it “LastName DAX4” without the quotation marks. The file
    extension will be added automatically. For example, my file would be “Bohler DAX4.docx”. Again, ignore the
    quotation marks in this part of the instructions.
  10. Edit the date, from, and to fields of the memo to have the correct information.
  11. For the subject area, change the title to something like “Examination of Sedan Weight by Origin” or something
    similar.

1 http://www.statsdirect.com/help/analysis_of_variance/anova.htm
Data Analysis Exercise 4 (DAX4)
ANOVA SAS and Excel
2

  1. Note that paragraph 1 has been completed for you. In general, the statistical questions would not likely be
    included in a business memo but are in this one to show what you are testing.
    • Null hypothesis – HO: µAsia = µEurope = µUSA
    • Alternative hypothesis – HA: Not all means are equal.
    Formatting/Filtering/Exploring the DATA and Testing Assumptions
    Creating Filtered Dataset Video
    Summary Statistics With Classification Video
  2. In paragraph 2, Discuss the steps you took to format/filter the data and test assumptions. The following steps
    will help you fill out paragraph 2.
  3. Open SAS.
  4. Select Task-Data-Filter Data and use the following settings for saving your dataset. (YOU WILL HAVE TO USE THE
    LIBRARY NAME THAT YOU CREATED SO YOUR DATA SET NAME WILL BE DIFFERENT!!!!)
  5. Run the filter data and go to your library to make sure that the dataset was created.
    Data Analysis Exercise 4 (DAX4)
    ANOVA SAS and Excel
    3
  6. Run the summary statistics for weight with origin as a classification variable. Select the statistic options as
    previously run in DAX 2 and select all the options for plots. What are the values for “Origin”? What is the N for
    each “Origin” and are they approximately even? What is the N, Minimum, Mean, Maximum, and Standard
    Deviation of Weight for the dataset? Are there any missing values? Add any appropriate charts and tables to
    your Business Memo.
  7. Run the summary statistics for weight without the classification.
  8. Use the output tables from both sets of statistics to create Table 1 (Similar to below w/o the watermark).
    Origin N N Miss Mean Std Dev Minimum Maximum Range Skewness Kurtosis
    Asia 94 0 3,161.37 584.29 2,035 4,802 2,767 0.23 (0.43)
    Europe 78 0 3,613.77 482.45 2,524 5,399 2,875 0.95 2.81
    USA 90 0 3,461.24 566.60 2,348 4,605 2,257 (0.02) (0.83)
    Total 262 0 3,399.06 579.00 2,035 5,399 3,364 0.11 0.07
    Data Analysis Exercise 4 (DAX4)
    ANOVA SAS and Excel
    4
  9. To perform an ANOVA, we need to make a few assumptions:
    a. Independence: The observations are random and independent samples from the population.
    b. Normality: Each group sample is drawn from a normally distributed population (errors are normally
    distributed).
    c. Homogeneity of variance: The variances of the distributions in the populations are equal.
  10. Test for Independence: This is a methodological concern and is determined by the set up for the study. For this
    assignment, please consider that the assumption of independence has been met in the research study design.
    You should still remark that there are several records in a group and if the group sizes are approximately the
    same.
  11. Test for Normality:
    a. Is Skewness more than ± 0.8? If yes, that is a concern.
    b. Is Kurtosis more than ± 3.0? If yes, that is a concern.
    c. Do the histograms appear normal (more or less)? If they do not approximate the normal distribution
    curve, that could be an issue that indicates that the normality assumption for an ANOVA is invalid.
    d. Look at the boxplots. Are there any extreme outliers?
    e. If your data does not violate these rules of thumb, you can accept the ANOVA assumptions have been
    satisfied for this analysis.
    Additional test for normality
    (Optional for this exercise)
    Open the “t Test” task, and on the DATA tab, select the SASHELP.CARS dataset, One-sample test, and select
    “Weight” as the “Analysis variable”. Click on the Filter icon and the Type = “Sedan” filter by typing it into the text
    box. On the OPTIONS tab, make sure “Tails:” is “Two-tailed test” and the “Tests for normality” box is checked. Run
    the tasks. Scroll down the input to you find the “Tests for Normality” table and review the “p Value” for the
    “Shapiro-Wilk” statistic. If this value is greater than alpha (α = 0.05) then, accept the assumption that the data came
    from a normally distributed population. If the Shapiro-Wilk p Value is less than α, then you need to consider if your
    assumption of normality is valid.
    In our example, the Shapiro-Wilk p Value is 0.0113 which is less than 0.05, which would mean that we should reject
    the null hypothesis that the population that Weight came from was normally distributed. However, Shapiro-Wilk is
    very conservative, and ANOVA is robust against the assumption of normality given a large enough data set and the
    group sizes are approximately equal. For our example: Asia n=94, Europe n=78, and USA n=90. That means in our
    case, we can consider our population as nearly normal in distribution and proceed with our analysis.
    If there no other concerns from the examination of the distribution of the data, then you can consider the
    assumption of normality to have been met.
    Data Analysis Exercise 4 (DAX4)
    ANOVA SAS and Excel
    5
  12. Test for homogeneity of variance: We will examine this assumption by running the ANOVA task, alternatively
    you can run the “Distribution Analysis” task, but it is more involved and most of the information we will need
    can be found in the OUTPUT from the ANOVA Diagnostic Plots. LEVENE’s Test. This will be included in the
    Interpreting ANOVA Results Video.
    Running an ANOVA in SAS
    Running an ANOVA in SAS
    Interpreting the ANOVA OUTPUT
  13. Select the “One-Way ANOVA” task.
  14. Perform an ANOVA on your filtered dataset.
  15. Select data using looking at “Weight” as the Dependent variable and the “Origin” variable as providing the
    categories.
  16. Run the task (F3). Is the mean Weight statistically different for the different values of “Origin” in the filtered
    (Sedan) data set? Report your findings in the Business Memo.
  17. Download the results to use for the MEMO
    Running an ANOVA in Excel (Video)
  18. Open the Seda data file in Excel.
  19. Prepare the data for analysis in Excel (Excel requires the data be in columns), by putting the Origin and Weight
    data into a separate sheet. Rename the sheet “ANOVA Data”. You will have 3 columns with each origin as the
    header and the weights for that origin underneath.
    Data Analysis Exercise 4 (DAX4)
    ANOVA SAS and Excel
    6
  20. Select the single factor ANOVA in data analysis and select all the records from the 3 columns for your data
    range.
  21. Run the analysis and save the output as you will need it for the memo.
  22. Write and upload your memo using the structure and the questions provided in the memo template.