# Data Analysis Exercise 4 (DAX4)

Do an analysis on the same dataset in Excel and SAS, and then report on our analysis. The data
that we will use for this analysis is called SASHELP.CARS, and the analysis we will do is an ANOVA. I have explored the
dataset and have developed a research question: Is there a statistically significant difference in the weight of sedans
based on their origin? To find the answer to this question, I am going to analyze the dataset and perform an ANOVA
comparing the sample means of the weight variable by continent of origin (USA, Asia, or Europe).
An ANOVA is a set of statistical methods used mainly to compare the means of two or more samples. Estimates of
variance are the key intermediate statistics calculated, hence the reference to variance in the title ANOVA.1
In testing the null hypothesis for a One-Way ANOVA, we generally follow these steps:

1. State the plain language research question: Is the average weight of sedans in this dataset statistically similar
regardless of origin?
2. State the hypotheses:
• Null hypothesis – HO: µAsia = µEurope = µUSA
• Alternative hypothesis – HA: Not all means are equal.
3. State the criteria for rejecting HO:
• α = 0.05
4. Test the assumptions for the One-Way ANOVA:
• Assumption of Independence (Research Design)
• Assumption of Normality (Shapiro-Wilk Pr<W, and examination of graphs and charts)
• Assumption of Homogeneity of Variance (Levene’s Test of Homogeneity)
5. Compute the Test Statistic:
• Calculate the F ratio
6. Decide whether to retain or reject your null hypothesis:
• If p > α, then retain the null hypothesis
• If p < α, then reject the null hypothesis, and accept the alternative hypothesis
7. Examine post tests for which categories are different.
9. Save the downloaded file as a Word file and name it “LastName DAX4” without the quotation marks. The file
extension will be added automatically. For example, my file would be “Bohler DAX4.docx”. Again, ignore the
quotation marks in this part of the instructions.
10. Edit the date, from, and to fields of the memo to have the correct information.
11. For the subject area, change the title to something like “Examination of Sedan Weight by Origin” or something
similar.

1 http://www.statsdirect.com/help/analysis_of_variance/anova.htm
Data Analysis Exercise 4 (DAX4)
ANOVA SAS and Excel
2

1. Note that paragraph 1 has been completed for you. In general, the statistical questions would not likely be
included in a business memo but are in this one to show what you are testing.
• Null hypothesis – HO: µAsia = µEurope = µUSA
• Alternative hypothesis – HA: Not all means are equal.
Formatting/Filtering/Exploring the DATA and Testing Assumptions
Creating Filtered Dataset Video
Summary Statistics With Classification Video
2. In paragraph 2, Discuss the steps you took to format/filter the data and test assumptions. The following steps
3. Open SAS.
4. Select Task-Data-Filter Data and use the following settings for saving your dataset. (YOU WILL HAVE TO USE THE
LIBRARY NAME THAT YOU CREATED SO YOUR DATA SET NAME WILL BE DIFFERENT!!!!)
5. Run the filter data and go to your library to make sure that the dataset was created.
Data Analysis Exercise 4 (DAX4)
ANOVA SAS and Excel
3
6. Run the summary statistics for weight with origin as a classification variable. Select the statistic options as
previously run in DAX 2 and select all the options for plots. What are the values for “Origin”? What is the N for
each “Origin” and are they approximately even? What is the N, Minimum, Mean, Maximum, and Standard
Deviation of Weight for the dataset? Are there any missing values? Add any appropriate charts and tables to
7. Run the summary statistics for weight without the classification.
8. Use the output tables from both sets of statistics to create Table 1 (Similar to below w/o the watermark).
Origin N N Miss Mean Std Dev Minimum Maximum Range Skewness Kurtosis
Asia 94 0 3,161.37 584.29 2,035 4,802 2,767 0.23 (0.43)
Europe 78 0 3,613.77 482.45 2,524 5,399 2,875 0.95 2.81
USA 90 0 3,461.24 566.60 2,348 4,605 2,257 (0.02) (0.83)
Total 262 0 3,399.06 579.00 2,035 5,399 3,364 0.11 0.07
Data Analysis Exercise 4 (DAX4)
ANOVA SAS and Excel
4
9. To perform an ANOVA, we need to make a few assumptions:
a. Independence: The observations are random and independent samples from the population.
b. Normality: Each group sample is drawn from a normally distributed population (errors are normally
distributed).
c. Homogeneity of variance: The variances of the distributions in the populations are equal.
10. Test for Independence: This is a methodological concern and is determined by the set up for the study. For this
assignment, please consider that the assumption of independence has been met in the research study design.
You should still remark that there are several records in a group and if the group sizes are approximately the
same.
11. Test for Normality:
a. Is Skewness more than ± 0.8? If yes, that is a concern.
b. Is Kurtosis more than ± 3.0? If yes, that is a concern.
c. Do the histograms appear normal (more or less)? If they do not approximate the normal distribution
curve, that could be an issue that indicates that the normality assumption for an ANOVA is invalid.
d. Look at the boxplots. Are there any extreme outliers?
e. If your data does not violate these rules of thumb, you can accept the ANOVA assumptions have been
satisfied for this analysis.
(Optional for this exercise)
Open the “t Test” task, and on the DATA tab, select the SASHELP.CARS dataset, One-sample test, and select
“Weight” as the “Analysis variable”. Click on the Filter icon and the Type = “Sedan” filter by typing it into the text
box. On the OPTIONS tab, make sure “Tails:” is “Two-tailed test” and the “Tests for normality” box is checked. Run
the tasks. Scroll down the input to you find the “Tests for Normality” table and review the “p Value” for the
“Shapiro-Wilk” statistic. If this value is greater than alpha (α = 0.05) then, accept the assumption that the data came
from a normally distributed population. If the Shapiro-Wilk p Value is less than α, then you need to consider if your
assumption of normality is valid.
In our example, the Shapiro-Wilk p Value is 0.0113 which is less than 0.05, which would mean that we should reject
the null hypothesis that the population that Weight came from was normally distributed. However, Shapiro-Wilk is
very conservative, and ANOVA is robust against the assumption of normality given a large enough data set and the
group sizes are approximately equal. For our example: Asia n=94, Europe n=78, and USA n=90. That means in our
case, we can consider our population as nearly normal in distribution and proceed with our analysis.
If there no other concerns from the examination of the distribution of the data, then you can consider the
assumption of normality to have been met.
Data Analysis Exercise 4 (DAX4)
ANOVA SAS and Excel
5
12. Test for homogeneity of variance: We will examine this assumption by running the ANOVA task, alternatively
you can run the “Distribution Analysis” task, but it is more involved and most of the information we will need
can be found in the OUTPUT from the ANOVA Diagnostic Plots. LEVENE’s Test. This will be included in the
Interpreting ANOVA Results Video.
Running an ANOVA in SAS
Running an ANOVA in SAS
Interpreting the ANOVA OUTPUT
13. Select the “One-Way ANOVA” task.
14. Perform an ANOVA on your filtered dataset.
15. Select data using looking at “Weight” as the Dependent variable and the “Origin” variable as providing the
categories.
16. Run the task (F3). Is the mean Weight statistically different for the different values of “Origin” in the filtered 