Worksheet Correlation and Regression
Janssen et al. (2007) studied the relationships between a variety of abiotic factors and benthic invertebrate abundance at sites on beaches along the Dutch coast. One
of these abiotic factors was the height of the site relative to the average sea level of the area (NAP). Positive values of NAP indicate sites that are
higher than the average sea level, whereas negative values indicate sites that are below the average sea level.
The data are in the file sle251dutch.csv and the relevant variables are the response variable, richness (richness of invertebrate species), and the predictor variable,
NAP (relative height of the site in relationship to the average sea level of the area).
Format of sle251dutch.csv data file
Site   NAP      richness
1       0.045   11
2      -1.036   10
3      -1.336   13
4       0.616   11
5      -0.684   10
..     ..       ..
Site       The number of the site where the samples were collected
NAP        Relative height of the site in relationship to the average sea level of the area
richness   Richness of invertebrate species
a) Janssen et al. (2007) were interested in modeling the linear relationship between invertebrate richness (response) and the height of the site relative to
the average sea level (predictor). List the following:
The biological inference of interest
The biological null hypothesis derived from above
The statistical null hypothesis (H0) derived from above
b) Draw a scatterplot of richness against NAP. Draw boxplots for each variable as well. Is there any evidence of skewness in the distributions or of nonlinearity?
To create a scatterplot in R:
Select the x-variable (NAP) and the y-variable (richness)
Check Marginal boxplots and Least-squares line
Uncheck Smooth line and Show spread
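For readers working at the R console rather than through the menus, the plotting steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the data frame name dutch is made up for illustration, the data are only the five rows shown in the format table (so the plots will not match the full data set), and the commented read.csv call assumes the file is comma-separated with a header row. Base R has no single call for marginal boxplots, so separate boxplots are drawn instead.

```r
# Illustrative data: only the five rows shown in the worksheet.
# For the full data set, something like this should work instead (assumption):
#   dutch <- read.csv("sle251dutch.csv")
dutch <- data.frame(
  Site     = 1:5,
  NAP      = c(0.045, -1.036, -1.336, 0.616, -0.684),
  richness = c(11, 10, 13, 11, 10)
)

# Scatterplot with NAP on the x-axis and richness on the y-axis
plot(richness ~ NAP, data = dutch,
     xlab = "NAP", ylab = "Species richness")

# Add the least-squares line to the scatterplot
abline(lm(richness ~ NAP, data = dutch))

# Separate boxplots of each variable to check for skewness
boxplot(dutch$NAP, main = "NAP")
boxplot(dutch$richness, main = "richness")
```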
c) Fit the regression model richness = intercept + slope × NAP.
To fit a linear regression and create an ANOVA table in R:
You can enter a name for the results object (Enter name for model:), but it's simplest to just use the name that R provides.
Select richness from the Response variable list.
Select NAP from the Explanatory variables list.
Select Partial, ignoring marginality (“Type III”).
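The menu steps above correspond approximately to the console commands below. This is a sketch, not the exact code the menus generate: the names dutch and model are assumptions, and the data are only the five rows shown in the worksheet. It uses base R's anova() rather than a Type III routine. With a single predictor, the F test for NAP is the same either way.

```r
# Illustrative data: only the five rows shown in the worksheet.
# For the full data set (assumed comma-separated with a header row):
#   dutch <- read.csv("sle251dutch.csv")
dutch <- data.frame(
  Site     = 1:5,
  NAP      = c(0.045, -1.036, -1.336, 0.616, -0.684),
  richness = c(11, 10, 13, 11, 10)
)

# Fit richness = intercept + slope * NAP by least squares
model <- lm(richness ~ NAP, data = dutch)

# Coefficient table: estimates, standard errors, t statistics, P-values
print(summary(model))

# ANOVA table: SS, df, MS and F ratio for NAP and the residuals
print(anova(model))
```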
Examine the regression output and identify and interpret the following:
Intercept
Value (Estimate in the R output):
Slope of regression line (NAP)
Value (Estimate in the R output):
t statistic for main H0 (regression slope equals zero)
P-value for main H0 (regression slope equals zero)
r² value (Multiple R-squared)
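Each quantity in the list above can also be pulled out of the fitted model by name, which helps when checking what was read off the printed output. The sketch below assumes the hypothetical names dutch and model and again uses only the five rows shown in the worksheet, so the numbers it produces are not the answers for the full data set.

```r
# Illustrative data: only the five rows shown in the worksheet.
dutch <- data.frame(
  Site     = 1:5,
  NAP      = c(0.045, -1.036, -1.336, 0.616, -0.684),
  richness = c(11, 10, 13, 11, 10)
)
model <- lm(richness ~ NAP, data = dutch)

# Coefficient matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(model))

intercept <- coefs["(Intercept)", "Estimate"]  # intercept estimate
slope     <- coefs["NAP", "Estimate"]          # slope estimate
t_slope   <- coefs["NAP", "t value"]           # t statistic for H0: slope = 0
p_slope   <- coefs["NAP", "Pr(>|t|)"]          # P-value for H0: slope = 0
r2        <- summary(model)$r.squared          # Multiple R-squared

print(c(intercept = intercept, slope = slope,
        t = t_slope, p = p_slope, r2 = r2))
```

In a simple linear regression, r² equals the squared correlation between the two variables, and the squared t statistic for the slope equals the F ratio in the ANOVA table.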
d) Complete the following ANOVA table from the regression analysis
Source of variation   SS   df   MS   F ratio
Regression (NAP)
Residual
Note: To get the MS values from the output, divide each SS value by its df.
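The arithmetic in the note can be checked against R's own ANOVA table. The sketch below is illustrative only: the names dutch, model and tab are assumptions, and the data are just the five rows shown in the worksheet. It recomputes MS as SS divided by df and the F ratio as the regression MS divided by the residual MS.

```r
# Illustrative data: only the five rows shown in the worksheet.
dutch <- data.frame(
  Site     = 1:5,
  NAP      = c(0.045, -1.036, -1.336, 0.616, -0.684),
  richness = c(11, 10, 13, 11, 10)
)
model <- lm(richness ~ NAP, data = dutch)

# ANOVA table: row 1 is NAP (regression), row 2 is Residuals
tab <- anova(model)

# MS = SS / df, computed for both rows at once
ms <- tab[["Sum Sq"]] / tab[["Df"]]

# F ratio = MS regression / MS residual
f_ratio <- ms[1] / ms[2]

print(tab)
print(ms)
print(f_ratio)
```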
e) What conclusions would you draw from the regression analysis (statistical and biological)?
f) What invertebrate richness would you predict for a new site with an NAP of -2? Simply plug -2 into your regression equation and calculate predicted richness.
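The plug-in calculation can be done by hand from the two coefficients or with predict(). Both routes give the same value. This is a sketch with assumed names (dutch, model, new_site) fitted to only the five rows shown in the worksheet, so its prediction is not the answer for the full data set.

```r
# Illustrative data: only the five rows shown in the worksheet.
dutch <- data.frame(
  Site     = 1:5,
  NAP      = c(0.045, -1.036, -1.336, 0.616, -0.684),
  richness = c(11, 10, 13, 11, 10)
)
model <- lm(richness ~ NAP, data = dutch)

# By hand: predicted richness = intercept + slope * NAP
by_hand <- coef(model)[["(Intercept)"]] + coef(model)[["NAP"]] * (-2)

# With predict(): the new data frame must use the predictor's name, NAP
new_site   <- data.frame(NAP = -2)
by_predict <- predict(model, newdata = new_site)

print(by_hand)
print(by_predict)
```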