Do an analysis on the same dataset in Excel and SAS, and then report on our analysis. The data

that we will use for this analysis is called SASHELP.CARS, and the analysis we will do is an ANOVA. I have explored the

dataset and have developed a research question: Is there a statistically significant difference in the weight of sedans

based on their origin? To find the answer to this question, I am going to analyze the dataset and perform an ANOVA

comparing the sample means of the weight variable by continent of origin (USA, Asia, or Europe).

An ANOVA is a set of statistical methods used mainly to compare the means of two or more samples. Estimates of

variance are the key intermediate statistics calculated, hence the reference to variance in the title ANOVA.1

In testing the null hypothesis for a One-Way ANOVA, we generally follow these steps:

- State the plain language research question: Is the average weight of sedans in this dataset statistically similar

regardless of origin? - State the hypotheses:

• Null hypothesis – HO: µAsia = µEurope = µUSA

• Alternative hypothesis – HA: Not all means are equal. - State the criteria for rejecting HO:

• α = 0.05 - Test the assumptions for the One-Way ANOVA:

• Assumption of Independence (Research Design)

• Assumption of Normality (Shapiro-Wilk Pr<W, and examination of graphs and charts)

• Assumption of Homogeneity of Variance (Levene’s Test of Homogeneity) - Compute the Test Statistic:

• Calculate the F ratio - Decide whether to retain or reject your null hypothesis:

• If p > α, then retain the null hypothesis

• If p < α, then reject the null hypothesis, and accept the alternative hypothesis - Examine post tests for which categories are different.
- Download “Business Memo Template DAX4.docx” from the DAX4 Assignment.
- Save the downloaded file as a Word file and name it “LastName DAX4” without the quotation marks. The file

extension will be added automatically. For example, my file would be “Bohler DAX4.docx”. Again, ignore the

quotation marks in this part of the instructions. - Edit the date, from, and to fields of the memo to have the correct information.
- For the subject area, change the title to something like “Examination of Sedan Weight by Origin” or something

similar.

1 http://www.statsdirect.com/help/analysis_of_variance/anova.htm

Data Analysis Exercise 4 (DAX4)

ANOVA SAS and Excel

2

- Note that paragraph 1 has been completed for you. In general, the statistical questions would not likely be

included in a business memo but are in this one to show what you are testing.

• Null hypothesis – HO: µAsia = µEurope = µUSA

• Alternative hypothesis – HA: Not all means are equal.

Formatting/Filtering/Exploring the DATA and Testing Assumptions

Creating Filtered Dataset Video

Summary Statistics With Classification Video - In paragraph 2, Discuss the steps you took to format/filter the data and test assumptions. The following steps

will help you fill out paragraph 2. - Open SAS.
- Select Task-Data-Filter Data and use the following settings for saving your dataset. (YOU WILL HAVE TO USE THE

LIBRARY NAME THAT YOU CREATED SO YOUR DATA SET NAME WILL BE DIFFERENT!!!!) - Run the filter data and go to your library to make sure that the dataset was created.

Data Analysis Exercise 4 (DAX4)

ANOVA SAS and Excel

3 - Run the summary statistics for weight with origin as a classification variable. Select the statistic options as

previously run in DAX 2 and select all the options for plots. What are the values for “Origin”? What is the N for

each “Origin” and are they approximately even? What is the N, Minimum, Mean, Maximum, and Standard

Deviation of Weight for the dataset? Are there any missing values? Add any appropriate charts and tables to

your Business Memo. - Run the summary statistics for weight without the classification.
- Use the output tables from both sets of statistics to create Table 1 (Similar to below w/o the watermark).

Origin N N Miss Mean Std Dev Minimum Maximum Range Skewness Kurtosis

Asia 94 0 3,161.37 584.29 2,035 4,802 2,767 0.23 (0.43)

Europe 78 0 3,613.77 482.45 2,524 5,399 2,875 0.95 2.81

USA 90 0 3,461.24 566.60 2,348 4,605 2,257 (0.02) (0.83)

Total 262 0 3,399.06 579.00 2,035 5,399 3,364 0.11 0.07

Data Analysis Exercise 4 (DAX4)

ANOVA SAS and Excel

4 - To perform an ANOVA, we need to make a few assumptions:

a. Independence: The observations are random and independent samples from the population.

b. Normality: Each group sample is drawn from a normally distributed population (errors are normally

distributed).

c. Homogeneity of variance: The variances of the distributions in the populations are equal. - Test for Independence: This is a methodological concern and is determined by the set up for the study. For this

assignment, please consider that the assumption of independence has been met in the research study design.

You should still remark that there are several records in a group and if the group sizes are approximately the

same. - Test for Normality:

a. Is Skewness more than ± 0.8? If yes, that is a concern.

b. Is Kurtosis more than ± 3.0? If yes, that is a concern.

c. Do the histograms appear normal (more or less)? If they do not approximate the normal distribution

curve, that could be an issue that indicates that the normality assumption for an ANOVA is invalid.

d. Look at the boxplots. Are there any extreme outliers?

e. If your data does not violate these rules of thumb, you can accept the ANOVA assumptions have been

satisfied for this analysis.

Additional test for normality

(Optional for this exercise)

Open the “t Test” task, and on the DATA tab, select the SASHELP.CARS dataset, One-sample test, and select

“Weight” as the “Analysis variable”. Click on the Filter icon and the Type = “Sedan” filter by typing it into the text

box. On the OPTIONS tab, make sure “Tails:” is “Two-tailed test” and the “Tests for normality” box is checked. Run

the tasks. Scroll down the input to you find the “Tests for Normality” table and review the “p Value” for the

“Shapiro-Wilk” statistic. If this value is greater than alpha (α = 0.05) then, accept the assumption that the data came

from a normally distributed population. If the Shapiro-Wilk p Value is less than α, then you need to consider if your

assumption of normality is valid.

In our example, the Shapiro-Wilk p Value is 0.0113 which is less than 0.05, which would mean that we should reject

the null hypothesis that the population that Weight came from was normally distributed. However, Shapiro-Wilk is

very conservative, and ANOVA is robust against the assumption of normality given a large enough data set and the

group sizes are approximately equal. For our example: Asia n=94, Europe n=78, and USA n=90. That means in our

case, we can consider our population as nearly normal in distribution and proceed with our analysis.

If there no other concerns from the examination of the distribution of the data, then you can consider the

assumption of normality to have been met.

Data Analysis Exercise 4 (DAX4)

ANOVA SAS and Excel

5 - Test for homogeneity of variance: We will examine this assumption by running the ANOVA task, alternatively

you can run the “Distribution Analysis” task, but it is more involved and most of the information we will need

can be found in the OUTPUT from the ANOVA Diagnostic Plots. LEVENE’s Test. This will be included in the

Interpreting ANOVA Results Video.

Running an ANOVA in SAS

Running an ANOVA in SAS

Interpreting the ANOVA OUTPUT - Select the “One-Way ANOVA” task.
- Perform an ANOVA on your filtered dataset.
- Select data using looking at “Weight” as the Dependent variable and the “Origin” variable as providing the

categories. - Run the task (F3). Is the mean Weight statistically different for the different values of “Origin” in the filtered

(Sedan) data set? Report your findings in the Business Memo. - Download the results to use for the MEMO

Running an ANOVA in Excel (Video) - Open the Seda data file in Excel.
- Prepare the data for analysis in Excel (Excel requires the data be in columns), by putting the Origin and Weight

data into a separate sheet. Rename the sheet “ANOVA Data”. You will have 3 columns with each origin as the

header and the weights for that origin underneath.

Data Analysis Exercise 4 (DAX4)

ANOVA SAS and Excel

6 - Select the single factor ANOVA in data analysis and select all the records from the 3 columns for your data

range. - Run the analysis and save the output as you will need it for the memo.
- Write and upload your memo using the structure and the questions provided in the memo template.