hw3 110590049

tags: data

2023 Educational Data Mining and Applications HW3.pdf

8.11

8.12

The 10 samples, with each sample's probability also serving as a candidate decision threshold:
| Tuple | Class | Probability |
|-------|-------|-------------|
| 1     | P     | 0.95        |
| 2     | N     | 0.85        |
| 3     | P     | 0.78        |
| 4     | P     | 0.66        |
| 5     | N     | 0.60        |
| 6     | P     | 0.55        |
| 7     | N     | 0.53        |
| 8     | N     | 0.52        |
| 9     | N     | 0.51        |
| 10    | P     | 0.40        |
| Threshold | TP | FP | TN | FN | FPR | TPR |
|-----------|----|----|----|----|-----|-----|
| 0.40      | 5  | 5  | 0  | 0  | 1.0 | 1.0 |
| 0.51      | 4  | 5  | 0  | 1  | 1.0 | 0.8 |
| 0.52      | 4  | 4  | 1  | 1  | 0.8 | 0.8 |
| 0.53      | 4  | 3  | 2  | 1  | 0.6 | 0.8 |
| 0.55      | 4  | 2  | 3  | 1  | 0.4 | 0.8 |
| 0.60      | 3  | 2  | 3  | 2  | 0.4 | 0.6 |
| 0.66      | 3  | 1  | 4  | 2  | 0.2 | 0.6 |
| 0.78      | 2  | 1  | 4  | 3  | 0.2 | 0.4 |
| 0.85      | 1  | 1  | 4  | 4  | 0.2 | 0.2 |
| 0.95      | 1  | 0  | 5  | 4  | 0.0 | 0.2 |
| 1.00      | 0  | 0  | 5  | 5  | 0.0 | 0.0 |

ROC curve: plot of the (FPR, TPR) pairs from the table above, from (0.0, 0.0) up to (1.0, 1.0).
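As a sanity check, the table can be recomputed with a few lines of Python: treat each distinct probability value (plus 1.0) as a decision threshold, predict P whenever a sample's probability is at least the threshold, and count TP/FP/TN/FN. This is only a verification sketch, not part of the required answer.

```python
# Samples from the table above, sorted by decreasing probability
labels = ["P", "N", "P", "P", "N", "P", "N", "N", "N", "P"]
probs = [0.95, 0.85, 0.78, 0.66, 0.60, 0.55, 0.53, 0.52, 0.51, 0.40]

num_p = labels.count("P")  # actual positives
num_n = labels.count("N")  # actual negatives

# Each distinct probability (plus 1.0) serves as a threshold:
# predict P when prob >= threshold.
for t in sorted(set(probs)) + [1.0]:
    tp = sum(1 for l, p in zip(labels, probs) if l == "P" and p >= t)
    fp = sum(1 for l, p in zip(labels, probs) if l == "N" and p >= t)
    fn, tn = num_p - tp, num_n - fp
    print(f"{t:.2f}  TP={tp} FP={fp} TN={tn} FN={fn}  "
          f"FPR={fp / num_n:.1f} TPR={tp / num_p:.1f}")
```

The printed (FPR, TPR) pairs are exactly the points of the ROC curve.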

8.16

Rebalance the training dataset by oversampling the fraudulent (minority) cases or undersampling the non-fraudulent (majority) cases.
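A minimal sketch of random oversampling, assuming the data sits in plain Python lists; the helper name `oversample` is illustrative, and random undersampling would instead drop majority-class samples:

```python
import random

def oversample(X, y, minority_label, seed=42):
    # Duplicate randomly chosen minority-class samples until both classes
    # are the same size (assumes the minority class really is smaller).
    # Undersampling is the mirror image: randomly drop majority samples.
    rng = random.Random(seed)
    minority = [(x, l) for x, l in zip(X, y) if l == minority_label]
    majority = [(x, l) for x, l in zip(X, y) if l != minority_label]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    combined = majority + minority + extra
    rng.shuffle(combined)
    return [x for x, _ in combined], [l for _, l in combined]
```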

Alternatively, apply threshold-moving: shift the decision threshold away from the default 0.5 so the classifier is no longer biased toward the majority (non-fraudulent) class, reducing the chance that fraudulent cases are misclassified.
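A minimal sketch of threshold-moving, assuming the classifier outputs a probability of fraud; the 0.2 cutoff is an illustrative value, not one given by the exercise:

```python
def classify(prob_fraud, threshold=0.2):
    # The default rule would use 0.5; lowering the threshold makes the
    # rare "fraud" class easier to predict, trading more false alarms
    # for fewer missed fraudulent cases.
    return "fraud" if prob_fraud >= threshold else "non-fraud"

print(classify(0.35))  # "fraud" under the moved threshold, "non-fraud" under 0.5
```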

9.4

|              | Eager Classification | Lazy Classification |
|--------------|----------------------|---------------------|
| Advantage    | Better interpretability; better efficiency at classification time | Robust to noise |
| Disadvantage | Needs re-training when new data arrives | Vulnerable to irrelevant features; limited interpretability |

9.5

def distance(a, b):
    # Manhattan (L1) distance between two feature vectors
    return sum(abs(x - y) for x, y in zip(a, b))

def KNN(input_data, k, dataset, answer):
    # Distance from the query point to every training sample, with its label
    distances = []
    for data, label in zip(dataset, answer):
        distances.append({
            "distance": distance(input_data, data),
            "answer": label,
        })
    # Keep the k nearest neighbors
    nearest = sorted(distances, key=lambda x: x["distance"])[:k]
    # Count the class labels among the k neighbors
    counter = {label: 0 for label in answer}
    for x in nearest:
        counter[x["answer"]] += 1
    # Each class's share of the k neighbors is its predicted probability
    return {label: count / k for label, count in counter.items()}

data = [[1, 2, 3], [0, -1, 0], [1, 4, 4], [1, 3, 4]]
answer = ["a", "a", "b", "b"]
input_data = [0, 0, 0]
k = 3
print(KNN(input_data, k, data, answer))  # {'a': 0.666..., 'b': 0.333...}