1) (20 points) Apply linear regression on both “diabetes” and “advertising” datasets and write a short paragraph about your findings.
2) (20 points) What is the linear regression model for each case?
Logistic Regression
1) (30 points) Use “bank” dataset and apply the logistic regression technique using the WEKA tool in 3 different settings, including:
a. 5 fold-cross validation.
b. 10 fold-cross validation.
c. 80% training.
Write a short paragraph about your findings and compare the results.
2) (30 points) Remove 3 features randomly from the dataset and repeat the same procedure (part 1) again.
Sample Answer
Linear Regression Analysis
1. Analysis of Findings
For the "diabetes" and "advertising" datasets, linear regression is used to model the relationship between a dependent variable and one or more independent variables.
Diabetes Dataset: In this dataset, linear regression would likely be applied to predict a quantitative measure of disease progression based on factors like age, sex, body mass index, and blood pressure. The findings would likely reveal that certain variables (e.g., BMI) have a stronger positive or negative correlation with disease progression than others. The model's coefficients would indicate the magnitude and direction of these relationships. For example, a positive coefficient for BMI would suggest that as BMI increases, so does disease progression. The model's R-squared value would tell us what percentage of the variance in disease progression is explained by our input variables.