hw3 110590049
tags: data
2023 Educational Data Mining and Applications HW3.pdf
8.11
8.12
[Table of samples and threshold values, and the resulting ROC curve plot, not recoverable from the extracted text.]
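As a general illustration of how each threshold produces one point of the ROC curve, here is a minimal sketch; the probability scores and class labels below are made up for the example and are not the tuples from exercise 8.12.

```python
# Sweep the decision threshold down over the sorted probability scores;
# each step classifies one more tuple as positive and yields one (FPR, TPR) point.
scores = [(0.9, "P"), (0.8, "P"), (0.7, "N"), (0.6, "P"), (0.5, "N")]  # made-up data

pos = sum(1 for _, label in scores if label == "P")
neg = sum(1 for _, label in scores if label == "N")

tp = fp = 0
roc_points = [(0.0, 0.0)]  # (FPR, TPR), starting at the origin
for prob, label in sorted(scores, key=lambda s: s[0], reverse=True):
    if label == "P":
        tp += 1
    else:
        fp += 1
    roc_points.append((fp / neg, tp / pos))

print(roc_points)  # plot FPR (x-axis) against TPR (y-axis) to get the ROC curve
```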
8.16
Change the training dataset to a balanced one, either by oversampling the fraudulent cases or by undersampling the nonfraudulent cases.
Alternatively, use threshold-moving: shift the decision threshold so that rare fraudulent cases are not misclassified simply because the majority class dominates (see the sketch below).
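A minimal sketch of these three ideas, using a made-up list of records with a `fraud` flag and a hypothetical `predict` helper; it is an illustration under those assumptions, not a prescribed implementation.

```python
import random

random.seed(0)
# Made-up transaction records: roughly 2% are fraudulent (the rare class).
data = [{"id": i, "fraud": (i % 50 == 0)} for i in range(1000)]

fraud = [d for d in data if d["fraud"]]
legit = [d for d in data if not d["fraud"]]

# Oversampling: replicate fraudulent (minority) cases until the classes balance.
oversampled = legit + random.choices(fraud, k=len(legit))

# Undersampling: keep only as many nonfraudulent (majority) cases as fraud cases.
undersampled = fraud + random.sample(legit, k=len(fraud))

# Threshold-moving: keep the data as-is, but lower the probability cutoff for
# predicting "fraud" so rare cases are not swamped by the majority class.
def predict(prob_fraud, threshold=0.2):  # 0.2 instead of the usual 0.5
    return "fraud" if prob_fraud >= threshold else "legit"

print(len(oversampled), len(undersampled), predict(0.3))
```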
9.4
| | Eager Classification | Lazy Classification |
|---|---|---|
| Advantage | Better interpretability; better efficiency | Robust to noise |
| Disadvantage | Needs retraining when new data arrive | Vulnerable to irrelevant features; limited interpretability |
9.5
```python
def distance(a, b):
    # Manhattan (L1) distance between two attribute vectors.
    return sum(abs(a[i] - b[i]) for i in range(len(a)))

def KNN(input_data, k, dataset, answer):
    # Distance from the query point to every training tuple, kept with its label.
    distances = []
    for data, label in zip(dataset, answer):
        distances.append({"distance": distance(input_data, data), "answer": label})
    # Keep the k nearest neighbors.
    nearest = sorted(distances, key=lambda x: x["distance"])[:k]
    # Count how many of the k neighbors belong to each class.
    counter = {label: 0 for label in answer}
    for x in nearest:
        counter[x["answer"]] += 1
    # Fraction of neighbors per class, returned as the predicted class distribution.
    return {label: counter[label] / k for label in counter}

data = [[1, 2, 3], [0, -1, 0], [1, 4, 4], [1, 3, 4]]
answer = ["a", "a", "b", "b"]
input_data = [0, 0, 0]
k = 3
print(KNN(input_data, k, data, answer))
```
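With the sample data above, the script prints {'a': 0.666..., 'b': 0.333...}: the three nearest neighbors of [0, 0, 0] have Manhattan distances 1, 6, and 8, and two of them are labeled "a".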