Collecting Data for a group of students in a statistics
Full Answer Section
Plug in the values:
p(x) = e^(-7 + (0.06 * 50) + (1 * 3.5)) / (1 + e^(-7 + (0.06 * 50) + (1 * 3.5))) = e^(-0.5) / (1 + e^(-0.5)) ≈ 0.378
The student has an estimated probability of about 37.8% of getting an A with these values.
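As a quick numerical check, the same probability can be computed in R. This is a minimal sketch: the variable names (b0, b1, b2, hours, gpa) are illustrative and not part of the original exercise, and plogis() is base R's logistic function, e^z / (1 + e^z).

```r
# Coefficients given in the problem statement
b0 <- -7     # intercept
b1 <- 0.06   # coefficient for hours studied
b2 <- 1      # coefficient for undergrad GPA

hours <- 50
gpa   <- 3.5

# Linear predictor (log-odds), then the corresponding probability
log_odds <- b0 + b1 * hours + b2 * gpa   # -7 + 3 + 3.5 = -0.5
p_a <- plogis(log_odds)                  # e^z / (1 + e^z)
print(round(p_a, 3))                     # 0.378
```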
(2) Hours Needed for 50% Chance of A with GPA 3.4
We can use the log-odds (logit) form of the model and solve for X₁ (hours studied):
log(p(x) / (1 - p(x))) = β₀ + β₁X₁ + β₂X₂
log(0.5 / 0.5) = 0 = -7 + 0.06X₁ + (1 * 3.4) // p(x) = 0.5 for a 50% chance of an A
0.06X₁ = 3.6
X₁ = 60 hours
A student with a GPA of 3.4 would need to study approximately 60 hours to have a 50% chance of getting an A.
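The required hours can also be checked numerically by inverting the log-odds equation. The sketch below assumes the same coefficients as in part (1); the names target_p and hours_needed are illustrative. qlogis() is base R's logit function, log(p / (1 - p)).

```r
b0 <- -7; b1 <- 0.06; b2 <- 1   # coefficients from the problem statement
gpa      <- 3.4
target_p <- 0.5

# Logit of the target probability; equals 0 when p = 0.5
target_log_odds <- qlogis(target_p)

# Solve target_log_odds = b0 + b1 * hours + b2 * gpa for hours
hours_needed <- (target_log_odds - b0 - b2 * gpa) / b1
print(round(hours_needed, 1))   # 60

# Sanity check: plugging the answer back in should recover the target probability
p_check <- plogis(b0 + b1 * hours_needed + b2 * gpa)
print(round(p_check, 3))        # 0.5
```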
2. Weekly Data Analysis
(Note: R code snippets are included for reference, but results may vary slightly depending on software version or environment.)
(3) Summary and Scatterplots
library(ISLR)
summary(Weekly)
pairs(Weekly)
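- Summary: This provides basic statistics (minimum, quartiles, median, mean, maximum) for each numeric variable and level counts for the factor Direction.
- Scatterplots: These visualize pairwise relationships between variables.
The summary output does not report correlations. To quantify them, compute cor(Weekly[, names(Weekly) != "Direction"]), which drops the non-numeric Direction column. In the scatterplot matrix, look in particular at the panel for Year and Volume.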
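One way to put numbers on the pairwise relationships is a correlation matrix over the numeric columns. The sketch below uses a small made-up data frame so that it runs without the ISLR package; numeric_cor is a hypothetical helper name, and the toy values are not taken from Weekly.

```r
# Hypothetical helper: correlation matrix over numeric columns only
# (cor() raises an error if a factor column such as Direction is included)
numeric_cor <- function(df) {
  num_cols <- vapply(df, is.numeric, logical(1))
  cor(df[, num_cols, drop = FALSE])
}

# Toy stand-in for Weekly (made-up values, for illustration only)
toy <- data.frame(
  Year      = c(1990, 1991, 1992, 1993, 1994, 1995),
  Volume    = c(0.15, 0.16, 0.21, 0.25, 0.28, 0.35),
  Lag1      = c(0.8, -0.3, 1.2, -2.1, 0.4, 0.1),
  Direction = factor(c("Up", "Down", "Up", "Down", "Up", "Up"))
)

cor_mat <- numeric_cor(toy)
print(round(cor_mat, 2))
```

With the real data loaded via library(ISLR), the same helper could be called as numeric_cor(Weekly).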
(4) Logistic Regression with Lags and Volume
model <- glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Volume, data = Weekly, family = binomial) # family = binomial requests logistic regression
summary(model)
Examine the p-values in the summary output. Statistically significant predictors will have low p-values (typically < 0.05).
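The p-values can also be extracted programmatically instead of being read off the printed summary. The sketch below fits a logistic regression to simulated data (not Weekly) so that it runs on its own; the variables x1, x2, and y are made up for illustration, and only x1 is constructed to influence the response.

```r
set.seed(1)

# Simulated data: y depends on x1 but not on x2
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(1.5 * x1))

fit <- glm(y ~ x1 + x2, family = binomial)

# Coefficient table: Estimate, Std. Error, z value, Pr(>|z|)
coef_table <- coef(summary(fit))
p_values   <- coef_table[, "Pr(>|z|)"]

# Names of terms whose p-value falls below the conventional 0.05 level
significant <- names(p_values)[p_values < 0.05]
print(round(p_values, 4))
print(significant)
```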
(5) Confusion Matrix and Evaluation
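predictions <- predict(model, type = "response") # P(Direction == "Up"), since "Up" is the second factor level
cut_off <- 0.5 # Threshold for classifying up/down
# Fixed factor levels keep the table 2 x 2 even if only one class is predicted
predicted_dir <- factor(ifelse(predictions > cut_off, "Up", "Down"), levels = c("Down", "Up"))
cm <- table(Actual = Weekly$Direction, Predicted = predicted_dir)
accuracy <- sum(diag(cm)) / sum(cm)
# Treat "Up" as the positive class
precision <- cm["Up", "Up"] / sum(cm[, "Up"]) # true positives / predicted positives
recall <- cm["Up", "Up"] / sum(cm["Up", ]) # true positives / actual positives
print(cm)
cat("Accuracy:", accuracy, "\n")
cat("Precision:", precision, "\n")
cat("Recall:", recall, "\n")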
- Confusion Matrix: This shows how well the model classified up/down movements (actual vs. predicted).
- Accuracy: Proportion of correctly predicted observations.
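- Precision: Of the observations predicted positive, the proportion that are actually positive (true positives / predicted positives).
- Recall: Of the observations that are actually positive, the proportion the model predicted as positive (true positives / actual positives).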
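To make these definitions concrete, the sketch below computes all three metrics from vectors of actual and predicted labels. The helper name classification_metrics and the label vectors are made up for illustration; they are not results from the Weekly data.

```r
# Hypothetical helper: accuracy, precision, and recall for a chosen positive class
classification_metrics <- function(actual, predicted, positive = "Up") {
  tp <- sum(predicted == positive & actual == positive)  # true positives
  fp <- sum(predicted == positive & actual != positive)  # false positives
  fn <- sum(predicted != positive & actual == positive)  # false negatives
  c(
    accuracy  = mean(predicted == actual),
    precision = tp / (tp + fp),
    recall    = tp / (tp + fn)
  )
}

# Made-up labels for illustration only
lv        <- c("Down", "Up")
actual    <- factor(c("Up", "Up", "Up",   "Down", "Down", "Up", "Down", "Up"), levels = lv)
predicted <- factor(c("Up", "Up", "Down", "Up",   "Down", "Up", "Up",   "Up"), levels = lv)

print(table(Actual = actual, Predicted = predicted))
metrics <- classification_metrics(actual, predicted)
print(round(metrics, 3))   # accuracy 0.625, precision 0.667, recall 0.8
```

Here there are 4 true positives, 2 false positives, 1 false negative, and 1 true negative, so precision is 4/6 and recall is 4/5.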
(6) Logistic Regression with Lag 2 for Held-Out Data (2010)
train_data <- Weekly[Weekly$Year < 2010, ]
test_data <- Weekly[Weekly$Year == 2010, ]
model_lag2 <- glm(Direction ~ Lag2, data = train_data, family = binomial) # family = binomial requests logistic regression
predictions_lag2 <- predict(model_lag2, newdata = test_data, type = "response")
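# Fixed factor levels keep the table 2 x 2 even if only one class is predicted
predicted_lag2 <- factor(ifelse(predictions_lag2 > cut_off, "Up", "Down"), levels = c("Down", "Up"))
cm_lag2 <- table(Actual = test_data$Direction, Predicted = predicted_lag2)
accuracy_lag2 <- sum(diag(cm_lag2)) / sum(cm_lag2)
precision_lag2 <- cm_lag2["Up", "Up"] / sum(cm_lag2[, "Up"]) # "Up" as the positive class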
Sample Solution
1. Logistic Regression Calculations
(1) Probability of an A with 50 Hours Studied and 3.5 GPA
We can estimate the probability (p(x)) using the logistic regression formula:
p(x) = e^(β₀ + β₁X₁ + β₂X₂) / (1 + e^(β₀ + β₁X₁ + β₂X₂))
where:
- β₀ = -7 (estimated intercept)
- β₁ = 0.06 (estimated coefficient for hours studied)
- β₂ = 1 (estimated coefficient for undergrad GPA)
- X₁ = 50 (hours studied)
- X₂ = 3.5 (undergrad GPA)