hw2 110590049


2023 Educational Data Mining and Applications HW2.pdf

6.3

a

  1. Since L is a frequent itemset, Support(L) ≥ minimum support threshold.
  2. Since S ⊆ L, any transaction that contains all items of L also contains all items of S.
  3. Therefore every transaction counted toward Support(L) is also counted toward Support(S), so Support(S) ≥ Support(L).
  4. Hence Support(S) ≥ Support(L) ≥ minimum support threshold.
  5. This proves that S is also a frequent itemset (a quick check of this property is sketched below).
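A minimal sketch of that check, assuming a made-up toy database and itemsets (none of the names or data below come from the exercise):

```python
# Toy check of the property proved above: if S ⊆ L, then Support(S) ≥ Support(L).
# The database D and the itemsets are made-up examples for illustration only.
D = [
    {"a", "b", "c"},
    {"a", "b"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]

def support(itemset, db):
    """Fraction of transactions in db that contain every item of itemset."""
    return sum(itemset <= t for t in db) / len(db)

L = {"a", "b", "c"}
for S in [{"a"}, {"b", "c"}, {"a", "b", "c"}]:   # subsets of L
    assert S <= L
    assert support(S, D) >= support(L, D)        # holds by steps 2-3 of the proof
    print(sorted(S), support(S, D), ">=", support(L, D))
```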

b

  1. S’ ⊆ S.
  2. Support(S) = |{T ∈ D | S ⊆ T}|, the number of transactions T in D that contain S (i.e. the transactions that are supersets of S).
  3. Every transaction that contains S also contains S’, so {T ∈ D | S ⊆ T} ⊆ {T ∈ D | S’ ⊆ T}.
  4. Comparing the sizes of these two sets gives Support(S) ≤ Support(S’) (the full chain is written out below).
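The same argument as a single chain, using the support-count definition from step 2:

```latex
S' \subseteq S
\;\Longrightarrow\;
\{T \in D \mid S \subseteq T\} \subseteq \{T \in D \mid S' \subseteq T\}
\;\Longrightarrow\;
\mathrm{Support}(S) = \bigl|\{T \in D \mid S \subseteq T\}\bigr|
\;\le\; \bigl|\{T \in D \mid S' \subseteq T\}\bigr| = \mathrm{Support}(S').
```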

c

  1. Confidence(X => Y) = Support(X ∪ Y) / Support(X).
  2. Confidence(s => (l - s)) = Support(s ∪ (l - s)) / Support(s) = Support(l) / Support(s).
  3. Confidence(s’ => (l - s’)) = Support(s’ ∪ (l - s’)) / Support(s’) = Support(l) / Support(s’).
  4. The numerators are equal, since s ∪ (l - s) = s’ ∪ (l - s’) = l.
  5. By part (b), s’ ⊆ s implies Support(s’) ≥ Support(s), so the rule from s’ has the larger (or equal) denominator.
  6. Therefore Confidence(s’ => (l - s’)) ≤ Confidence(s => (l - s)) (written out as one inequality below).
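The comparison in one line (Support(s') ≥ Support(s) is part (b)):

```latex
\mathrm{Confidence}\bigl(s' \Rightarrow (l - s')\bigr)
= \frac{\mathrm{Support}(l)}{\mathrm{Support}(s')}
\;\le\;
\frac{\mathrm{Support}(l)}{\mathrm{Support}(s)}
= \mathrm{Confidence}\bigl(s \Rightarrow (l - s)\bigr).
```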

d

  1. If I is frequent in D, its support count satisfies count_D(I) ≥ minsup × |D|, where minsup is the (relative) minimum support threshold.
  2. Suppose, for contradiction, that I is not frequent in any partition P_i. Then count_{P_i}(I) < minsup × |P_i| for every partition P_i.
  3. Summing over all partitions gives count_D(I) = Σ_i count_{P_i}(I) < minsup × Σ_i |P_i| = minsup × |D|, contradicting step 1 (the summation is written out below).
  4. Therefore any itemset that is frequent in the original database D must be frequent in at least one partition of D.
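Step 3 written out, with count_{P_i} denoting the support count of I inside partition P_i (notation assumed here):

```latex
\mathrm{count}_D(I)
= \sum_{i} \mathrm{count}_{P_i}(I)
< \sum_{i} \mathrm{minsup} \cdot |P_i|
= \mathrm{minsup} \cdot \sum_{i} |P_i|
= \mathrm{minsup} \cdot |D|,
```

which contradicts count_D(I) ≥ minsup × |D| from step 1.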

6.6

min_support = 0.6
min_confidence = 0.8

| ID | Items |
| --- | --- |
| T100 | {M, O, N, K, E, Y} |
| T200 | {D, O, N, K, E, Y} |
| T300 | {M, A, K, E} |
| T400 | {M, U, C, K, Y} |
| T500 | {C, O, O, K, I, E} |

a

Apriori algorithm

1-Itemsets

| Item | Support |
| --- | --- |
| K | 1.0 |
| E | 0.8 |
| Y | 0.6 |
| M | 0.6 |
| O | 0.6 |
| C | 0.4 |
| N | 0.4 |
| D | 0.2 |
| A | 0.2 |
| U | 0.2 |
| I | 0.2 |

2-Itemsets (candidates from the frequent 1-itemsets K, E, Y, M, O)

| Itemset | Support |
| --- | --- |
| {K,E} | 0.8 |
| {K,Y} | 0.6 |
| {K,M} | 0.6 |
| {K,O} | 0.6 |
| {E,Y} | 0.4 |
| {E,M} | 0.4 |
| {E,O} | 0.6 |
| {Y,M} | 0.4 |
| {Y,O} | 0.4 |
| {M,O} | 0.2 |

3-Itemsets (and the single 4-itemset candidate)

| Itemset | Support |
| --- | --- |
| {K,E,Y} | 0.4 |
| {K,E,M} | 0.4 |
| {K,E,O} | 0.6 |
| {K,Y,M} | 0.4 |
| {K,Y,O} | 0.4 |
| {K,M,O} | 0.2 |
| {K,M,E,O} | 0.2 |

With min_support = 0.6, the frequent itemsets are K, E, Y, M, O, {K,E}, {K,Y}, {K,M}, {K,O}, {E,O}, and {K,E,O}.
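A brute-force sketch (exhaustive support counting, not the Apriori candidate pruning itself) to double-check the support values in the tables above:

```python
from itertools import combinations

# The five transactions from the exercise (T500 lists O twice; as a set it is {C, O, K, I, E}).
transactions = [
    {"M", "O", "N", "K", "E", "Y"},
    {"D", "O", "N", "K", "E", "Y"},
    {"M", "A", "K", "E"},
    {"M", "U", "C", "K", "Y"},
    {"C", "O", "O", "K", "I", "E"},
]
min_support = 0.6

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

items = sorted(set().union(*transactions))
frequent = []
for k in (1, 2, 3, 4):
    for combo in combinations(items, k):
        s = support(set(combo))
        if s >= min_support:
            frequent.append((set(combo), s))

for itemset, s in frequent:
    print(itemset, s)
# Expected: K, E, Y, M, O, {K,E}, {K,Y}, {K,M}, {K,O}, {E,O}, {K,E,O}
```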

FP-growth algorithm

| Item | Conditional pattern base | Conditional FP-tree (support ≥ 0.6) | Frequent patterns generated |
| --- | --- | --- | --- |
| E | {K:4} | {K:4} | {K,E}:4 |
| M | {K,E:2}, {K:1} | {K:3} | {K,M}:3 |
| O | {K,E,M:1}, {K,E:2} | {K:3, E:3} | {K,O}:3, {E,O}:3, {K,E,O}:3 |
| Y | {K,E,M,O:1}, {K,E,O:1}, {K,M:1} | {K:3} | {K,Y}:3 |
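A small sketch that reproduces the conditional-pattern-base column directly from the frequency-ordered transactions (equivalent to reading the prefix paths off the FP-tree here, since identical prefixes are simply merged with summed counts):

```python
from collections import Counter

# Transactions from the exercise, items reordered by descending frequency
# (K:5, E:4, M:3, O:3, Y:3); infrequent items are already dropped.
ordered = [
    ["K", "E", "M", "O", "Y"],   # T100
    ["K", "E", "O", "Y"],        # T200
    ["K", "E", "M"],             # T300
    ["K", "M", "Y"],             # T400
    ["K", "E", "O"],             # T500
]

# For each item, collect the prefix of items preceding it in every transaction;
# aggregating identical prefixes reproduces the conditional pattern bases above.
for item in ["E", "M", "O", "Y"]:
    bases = Counter()
    for t in ordered:
        if item in t:
            prefix = tuple(t[: t.index(item)])
            if prefix:
                bases[prefix] += 1
    print(item, dict(bases))
# e.g. Y -> {('K','E','M','O'): 1, ('K','E','O'): 1, ('K','M'): 1}
```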

conclusion

FP-growth scans the database only twice (once to count item frequencies and once to build the FP-tree) and then mines the tree without generating candidates, whereas Apriori rescans the database for every candidate-generation pass. So FP-growth is more efficient than Apriori here.

b

| Strong rule | Support | Confidence |
| --- | --- | --- |
| {O, K} ⇒ E | 0.6 | 1.0 |
| {O, E} ⇒ K | 0.6 | 1.0 |
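A quick confidence check for the candidate rules from the frequent 3-itemset {K, E, O} (same brute-force support counting as in part (a)):

```python
# Confidence check for rules derived from the frequent itemset {K, E, O}.
transactions = [
    {"M", "O", "N", "K", "E", "Y"},
    {"D", "O", "N", "K", "E", "Y"},
    {"M", "A", "K", "E"},
    {"M", "U", "C", "K", "Y"},
    {"C", "O", "K", "I", "E"},
]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

for antecedent, consequent in [({"O", "K"}, {"E"}),
                               ({"O", "E"}, {"K"}),
                               ({"K", "E"}, {"O"})]:
    conf = support(antecedent | consequent) / support(antecedent)
    print(antecedent, "=>", consequent,
          "support:", support(antecedent | consequent), "confidence:", round(conf, 2))
# {O,K} => E and {O,E} => K reach confidence 1.0; {K,E} => O only reaches 0.75 < 0.8.
```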

6.14

| | hot dogs | ¬ hot dogs | Total |
| --- | --- | --- | --- |
| hamburgers | 2000 | 500 | 2500 |
| ¬ hamburgers | 1000 | 1500 | 2500 |
| Total | 3000 | 2000 | 5000 |
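A minimal sketch computing the standard measures for the rule hot dogs ⇒ hamburgers from this contingency table (the support/confidence thresholds come from the exercise PDF and are not restated here, so no strong/weak verdict is asserted):

```python
# Contingency table counts from 6.14.
both = 2000          # hot dogs and hamburgers
hotdogs_only = 1000  # hot dogs, no hamburgers
hamburgers_only = 500
neither = 1500
total = 5000

support = both / total                                   # P(hot dogs ∧ hamburgers) = 0.40
confidence = both / (both + hotdogs_only)                # P(hamburgers | hot dogs) ≈ 0.667
lift = confidence / ((both + hamburgers_only) / total)   # ≈ 1.33 (> 1: positively correlated)

print(f"support={support:.2f}, confidence={confidence:.3f}, lift={lift:.2f}")
```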

a

b

c