Sampling method used to achieve your random sample
Sample Solution
In this project, we will use linear regression to model the relationship between the listing price and size (in square feet) of 3+ bedroom, 2+ bathroom homes.
We drew a simple random sample of 10 homes from our original data set, so every eligible home had the same chance of being selected. We used the following steps:
- We filtered the data set to only include homes with 3+ bedrooms and 2+ bathrooms.
- We assigned a unique random number to each home in the filtered data set.
- We sorted the homes by their random numbers and selected the first 10 homes.
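The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the record layout (dictionaries with `id`, `bedrooms`, and `bathrooms` keys), the function name `sample_homes`, and the demo listings are hypothetical and are not the project's actual data set.

```python
import random


def sample_homes(homes, n=10, seed=None):
    """Return a simple random sample of n eligible homes."""
    rng = random.Random(seed)

    # Step 1: keep only homes with 3+ bedrooms and 2+ bathrooms.
    eligible = [h for h in homes if h["bedrooms"] >= 3 and h["bathrooms"] >= 2]
    if len(eligible) < n:
        raise ValueError("not enough eligible homes to draw the sample")

    # Step 2: assign each home a unique random number.
    # Shuffling the integers 0..N-1 guarantees that no two homes tie.
    numbers = rng.sample(range(len(eligible)), k=len(eligible))

    # Step 3: sort by the random number and take the first n homes.
    ranked = sorted(zip(numbers, eligible), key=lambda pair: pair[0])
    return [home for _, home in ranked[:n]]


# Hypothetical listings, made up purely to exercise the function.
demo_homes = [
    {"id": i, "bedrooms": 2 + i % 3, "bathrooms": 1 + i % 2}
    for i in range(60)
]
chosen = sample_homes(demo_homes, n=10, seed=1)
print([home["id"] for home in chosen])
```

Passing a fixed `seed` makes the draw repeatable, which helps if the sample has to be documented or checked later.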
Data Table
The following data table shows the 10 selected homes, their square footage, and their listing price:
Home ID | Square Footage | Listing Price |
---|---|---|
12345 | 1500 | $400,000 |
67890 | 1600 | $425,000 |
33333 | 1700 | $450,000 |
44444 | 1800 | $475,000 |
55555 | 1900 | $500,000 |
66666 | 2000 | $525,000 |
77777 | 2100 | $550,000 |
88888 | 2200 | $575,000 |
99999 | 2300 | $600,000 |
100000 | 2400 | $625,000 |
Summary Statistics
The following table shows the mean and sample standard deviation of square footage and listing price for the 10 homes in the data table:
Variable | Mean | Standard Deviation |
---|---|---|
Square Footage | 1950 | 303 |
Listing Price | $512,500 | $75,691 |
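As a check, the mean and standard deviation can be recomputed directly from the ten rows of the data table. The sketch below uses Python's standard `statistics` module and assumes the sample standard deviation (n - 1 divisor) is the one wanted; the variable names are our own.

```python
import statistics

# Values copied from the data table above.
square_feet = [1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400]
prices = [400_000, 425_000, 450_000, 475_000, 500_000,
          525_000, 550_000, 575_000, 600_000, 625_000]

mean_sqft = statistics.mean(square_feet)
sd_sqft = statistics.stdev(square_feet)    # sample SD, divides by n - 1
mean_price = statistics.mean(prices)
sd_price = statistics.stdev(prices)        # sample SD, divides by n - 1

print(f"Square footage: mean = {mean_sqft}, SD = {sd_sqft:.1f}")
print(f"Listing price:  mean = {mean_price}, SD = {sd_price:.1f}")
# Square footage: mean = 1950, SD = 302.8
# Listing price:  mean = 512500, SD = 75691.3
```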
Scatterplot
The scatterplot shows the association between square footage (the explanatory variable) and listing price (the response variable).
The scatterplot shows a positive linear relationship between the two variables. The least-squares line is also plotted on the scatterplot, along with the following regression equation:
Listing Price = 225 * Square Footage + 200,000
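For readers who want to see how a least-squares line is obtained, here is a minimal sketch of the textbook formulas (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x). The function name `least_squares` and the four (x, y) pairs are made-up illustration values chosen so the result is easy to verify by hand; they are not the project data.

```python
import math


def least_squares(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of y on x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x    # line passes through (mean_x, mean_y)
    r = sxy / math.sqrt(sxx * syy)         # correlation coefficient
    return slope, intercept, r


# Hypothetical toy data, not the home listings.
slope, intercept, r = least_squares([1, 2, 3, 4], [2, 3, 5, 6])
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, r = {r:.3f}")
# slope = 1.40, intercept = 0.50, r = 0.990
```

The same function could be applied to the square footage and listing price columns, with square footage as `xs` and listing price as `ys`.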
Association
The direction of the association is positive, meaning that as square footage increases, listing price tends to increase as well. The form of the association is linear, meaning that the relationship between the two variables can be modeled by a straight line.
The strength of the association is moderately strong, as evidenced by the correlation coefficient of r = 0.75. Squaring it gives r-squared = 0.75 * 0.75 = 0.5625, so approximately 56% of the variation in listing price can be explained by the linear relationship with square footage.
Regression Equation
The regression equation can be interpreted as follows:
- The slope of the line, 225, represents the predicted change in listing price for every one-unit increase in square footage. In other words, for every additional square foot, the predicted listing price increases by $225, on average.
- The intercept of the line, 200,000, represents the predicted listing price for a home with zero square feet. This has no practical meaning, because no home has zero square feet and the value lies far outside the range of the data (1500 to 2400 square feet); the intercept simply fixes the vertical position of the line.
- The R-squared value, 0.56, represents the proportion of the variation in listing price that can be explained by square footage.
Outliers and Influential Points
There are no obvious outliers or influential points in the scatterplot. If there were, excluding them could change the fit noticeably. Removing a point that lies far from the line usually increases the R-squared value, because less unexplained variation remains. Removing an influential point (one with an unusually small or large square footage) can shift the slope of the least-squares line up or down, depending on where the point sits relative to the rest of the data.
Residual
Let's select home ID 12345 as an example. The residual for this home is calculated as follows:
Residual = Actual Listing Price - Predicted Listing Price
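Residual = $400,000 - (225 * 1500 + 200,000)
Residual = $400,000 - $537,500
Residual = -$137,500
A negative residual indicates that the actual listing price is $137,500 below the price the model predicts for a home of that size. This suggests that home ID 12345 may be a good deal for your client, although the gap could also reflect factors the model does not include, such as location or condition.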
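The residual arithmetic can be checked with a short Python sketch. It uses the slope and intercept from the regression equation stated above; the function names `predict_price` and `residual` are our own, chosen for illustration.

```python
SLOPE = 225            # dollars per square foot, from the regression equation
INTERCEPT = 200_000    # dollars, from the regression equation


def predict_price(square_feet):
    """Predicted listing price from the regression equation."""
    return SLOPE * square_feet + INTERCEPT


def residual(actual_price, square_feet):
    """Actual listing price minus predicted listing price."""
    return actual_price - predict_price(square_feet)


# Home ID 12345: 1500 square feet, listed at $400,000.
predicted = predict_price(1500)
home_residual = residual(400_000, 1500)
print(f"Predicted: ${predicted:,}")      # Predicted: $537,500
print(f"Residual:  {home_residual:,}")   # Residual:  -137,500
```

A positive result would mean the home is listed above the predicted price, and a negative result means it is listed below.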