Computer science

Part A: (7 marks)

Introduction:
K-Nearest Neighbor (KNN) is a supervised learning algorithm in which a new query instance is classified according to the majority category among its K nearest neighbors. The purpose of the algorithm is to classify a new object based on its attributes and a set of training samples. In other words, KNN uses the classification of the neighborhood as the prediction value for the new query instance.

The following data classify power-saving lights by their economic feasibility as Preserver or Wasteful.
We consider two factors for classification:
X1: Lighting Duration
X2: Power Consumption
We use k = 2 nearest neighbors.
Table 1 presents six training samples. Using the KNN algorithm, classify the last sample, for which X1 = 10 and X2 = 500, as Preserver or Wasteful.

X1: Lighting Duration (Hours) | X2: Power Consumption (Watts) | Y: Classification
6 | 900 | Wasteful
2 | 150 | Wasteful
5 | 600 | Wasteful
3 | 80 | Preserver
4 | 200 | Wasteful
2 | 60 | Preserver
10 | 500 | ?
Table 1: Training data

1) Calculate the Euclidean distance between the query instance and each training sample. Insert the values in Table 2 and show your calculations in detail. (1.5 marks, 0.25 for each value)

X1: Lighting Duration (Hours) | X2: Power Consumption (Watts) | Euclidean distance to the query instance (10, 500)
6 | 900 |
2 | 150 |
5 | 600 |
3 | 80 |
4 | 200 |
2 | 60 |
Table 2: Euclidean distance between the query instance and all the training samples
……………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………
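
The distance calculation asked for in question 1 can be checked with a short script. This is a minimal sketch, assuming the standard two-dimensional Euclidean distance and unscaled features; the names `samples`, `query` and `euclidean` are illustrative and do not come from the exercise.

```python
import math

# Training samples from Table 1 as (X1, X2) pairs.
samples = [(6, 900), (2, 150), (5, 600), (3, 80), (4, 200), (2, 60)]
# Query instance given in the exercise.
query = (10, 500)


def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


distances = [euclidean(s, query) for s in samples]

# Print each sample next to its distance, rounded to two decimals.
for s, d in zip(samples, distances):
    print(s, round(d, 2))
```

Because X2 is measured in watts and spans a much larger range than X1, the X2 difference dominates every distance in this data set.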

2) Sort the distances and determine the nearest neighbors based on the k-th minimum distance. Insert the values in Table 3. (3 marks, 0.25 for each value)

X1: Lighting Duration (Hours) | X2: Power Consumption (Watts) | Euclidean distance to the query instance (10, 500) | Rank of minimum distance | Is it included in the 2-nearest neighbors?
6 | 900 | | |
2 | 150 | | |
5 | 600 | | |
3 | 80 | | |
4 | 200 | | |
2 | 60 | | |
Table 3: Selection of the 2-nearest neighbors
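
The ranking step in question 2 can be sketched as follows. This is an illustrative check rather than a required method: it assumes Python 3.8 or later for `math.dist`, and the names `order` and `rank` are hypothetical helpers introduced here.

```python
import math

# (X1, X2) pairs from Table 1; query instance and k as given in the exercise.
samples = [(6, 900), (2, 150), (5, 600), (3, 80), (4, 200), (2, 60)]
query, k = (10, 500), 2

# Euclidean distance from each sample to the query instance.
distances = [math.dist(s, query) for s in samples]

# Indices of the samples ordered from nearest to farthest.
order = sorted(range(len(samples)), key=lambda i: distances[i])

# rank[i] is the 1-based position of sample i in that ordering.
rank = {i: position + 1 for position, i in enumerate(order)}

# A sample belongs to the k nearest neighbors when its rank is at most k.
for i, s in enumerate(samples):
    included = "yes" if rank[i] <= k else "no"
    print(s, round(distances[i], 2), rank[i], included)
```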

3) Gather the category Y of the nearest neighbors. Insert the values in Table 4 and justify your response. (1.5 marks, 0.25 for each value)

Answer:
X1: Lighting Duration (Hours) | X2: Power Consumption (Watts) | Euclidean distance to the query instance (10, 500) | Rank of minimum distance | Is it included in the 2-nearest neighbors? | Y = category of nearest neighbor
6 | 900 | | | |
2 | 150 | | | |
5 | 600 | | | |
3 | 80 | | | |
4 | 200 | | | |
2 | 60 | | | |
Table 4: Categories of the 2-nearest neighbors
…………………………………………………………………………………………………………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………………………………………………………………………………………………………

4) Use a simple majority of the categories of the nearest neighbors as the prediction value of the query instance. (2 marks)

…………………………………………………………………………………………………………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………………………………………………………………………………………………………
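
The majority vote in question 4 can be sketched end to end as below. This is a minimal illustration under the assumptions already stated in the exercise (Euclidean distance, k = 2, data from Table 1); the variable names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

# Training data from Table 1: ((X1, X2), Y).
training = [
    ((6, 900), "Wasteful"),
    ((2, 150), "Wasteful"),
    ((5, 600), "Wasteful"),
    ((3, 80), "Preserver"),
    ((4, 200), "Wasteful"),
    ((2, 60), "Preserver"),
]
query, k = (10, 500), 2

# Keep the k training rows closest to the query instance.
nearest = sorted(training, key=lambda row: math.dist(row[0], query))[:k]

# Count the categories among those neighbors and take the most frequent one.
votes = Counter(label for _, label in nearest)
prediction, count = votes.most_common(1)[0]

print(nearest)
print(prediction)
```

With an even k such as 2, the two neighbors could in general disagree and produce a tie. This sketch does not define a tie-breaking rule beyond the ordering that `Counter.most_common` happens to return, so a tie would need an explicit rule, for example choosing an odd k.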