hw2 110590049
2023 Educational Data Mining and Applications HW2.pdf
6.3
a
- Support(L) ≥ minimum support threshold, because L is frequent.
- Since S ⊆ L, any transaction that contains all items of L also contains all items of S.
- Therefore every transaction counted toward Support(L) is also counted toward Support(S), so Support(S) ≥ Support(L).
- Hence Support(S) ≥ minimum support threshold.
- This proves that S is also a frequent itemset.
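A tiny hypothetical illustration (the transactions T1–T3 are invented, not from the assignment data): if L = {A, B, C} and S = {A, B}, then every transaction containing L also contains S, so

$$
\{T \in D \mid L \subseteq T\} = \{T_1, T_3\} \subseteq \{T_1, T_2, T_3\} = \{T \in D \mid S \subseteq T\}
\;\Rightarrow\; \mathrm{support}(L) = 2 \le 3 = \mathrm{support}(S).
$$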
b
- S’ ⊆ S, where S’ is any nonempty subset of S.
- Support(S) = |{T ∈ D | S ⊆ T}|, the number of transactions T in D that contain S.
- Since S’ ⊆ S, every transaction that contains S also contains S’, so {T ∈ D | S ⊆ T} ⊆ {T ∈ D | S’ ⊆ T}.
- Therefore Support(S) ≤ Support(S’).
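A minimal Python check of this monotonicity on a made-up toy database (the database and the `support_count` helper are my own, purely for illustration):

```python
from itertools import combinations

# Toy database, invented for illustration only.
D = [{"A", "B", "C"}, {"A", "B"}, {"B", "C"}, {"A", "B", "C", "D"}]

def support_count(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

S = {"A", "B", "C"}
# Every nonempty subset of S has support at least as great as support(S).
for r in range(1, len(S) + 1):
    for subset in combinations(S, r):
        assert support_count(set(subset), D) >= support_count(S, D)
print("monotonicity holds on the toy database")
```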
c
- Confidence(X => Y) = Support(X ∪ Y) / Support(X).
- Confidence(s => (l − s)) = Support(s ∪ (l − s)) / Support(s) = Support(l) / Support(s).
- Confidence(s’ => (l − s’)) = Support(s’ ∪ (l − s’)) / Support(s’) = Support(l) / Support(s’).
- Both rules share the same numerator Support(l), since s ∪ (l − s) = s’ ∪ (l − s’) = l.
- Since s’ ⊆ s, part (b) gives Support(s’) ≥ Support(s), so the rule with antecedent s’ has a larger (or equal) denominator.
- Therefore Confidence(s’ => (l − s’)) ≤ Confidence(s => (l − s)).
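A hypothetical numeric check (the counts 3, 4, 5 are invented, not taken from the assignment data):

$$
\mathrm{support}(l)=3,\quad \mathrm{support}(s)=4,\quad \mathrm{support}(s')=5
\;\Rightarrow\;
\mathrm{conf}(s' \Rightarrow l-s') = \tfrac{3}{5} \le \tfrac{3}{4} = \mathrm{conf}(s \Rightarrow l-s).
$$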
d
- If an itemset I is frequent in D, then count_D(I) ≥ minsup × |D|, where minsup is the minimum support (fraction) threshold.
- Suppose, for contradiction, that I is not frequent in any partition Pi. Then count_Pi(I) < minsup × |Pi| for every partition Pi.
- Summing over all partitions gives count_D(I) = Σi count_Pi(I) < minsup × Σi |Pi| = minsup × |D|, which contradicts the assumption that I is frequent in D.
- Therefore any itemset that is frequent in the original database D must be frequent in at least one partition of D.
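A hypothetical numeric illustration (the numbers are invented): take minsup = 10% and two equal partitions of 50 transactions each. If I is infrequent in both partitions, then

$$
\mathrm{count}_{P_1}(I) < 5,\quad \mathrm{count}_{P_2}(I) < 5
\;\Rightarrow\;
\mathrm{count}_D(I) = \mathrm{count}_{P_1}(I) + \mathrm{count}_{P_2}(I) < 10 = 0.1 \times 100,
$$

so I cannot be frequent in D.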
6.6
min_support=0.6
min_conf=0.8
| ID | Items |
|---|---|
| T100 | {M, O, N, K, E, Y} |
| T200 | {D, O, N, K, E, Y} |
| T300 | {M, A, K, E} |
| T400 | {M, U, C, K, Y} |
| T500 | {C, O, O, K, I, E} |
a
Apriori algorithm
With min_support = 0.6, an itemset must appear in at least 3 of the 5 transactions.

| Frequent 1-itemsets | Frequent 2-itemsets | Frequent 3-itemsets |
|---|---|---|
| {K}: 5 | {K, E}: 4 | {K, E, O}: 3 |
| {E}: 4 | {K, M}: 3 | |
| {M}: 3 | {K, O}: 3 | |
| {O}: 3 | {K, Y}: 3 | |
| {Y}: 3 | {E, O}: 3 | |
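A minimal brute-force Apriori-style sketch in Python (my own illustration; the `count` helper and the overall structure are assumptions, not part of the assignment) that reproduces the counts in the table above:

```python
transactions = [
    {"M", "O", "N", "K", "E", "Y"},
    {"D", "O", "N", "K", "E", "Y"},
    {"M", "A", "K", "E"},
    {"M", "U", "C", "K", "Y"},
    {"C", "O", "K", "I", "E"},
]
min_count = 3  # min_support 0.6 of 5 transactions

def count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

# Level 1: frequent single items.
items = sorted({i for t in transactions for i in t})
frequent = [{frozenset([i]): count({i}) for i in items if count({i}) >= min_count}]

# Levels k >= 2: join frequent (k-1)-itemsets, keep candidates meeting min_count.
k = 2
while frequent[-1]:
    prev = list(frequent[-1])
    candidates = {a | b for a in prev for b in prev if len(a | b) == k}
    level = {c: count(set(c)) for c in candidates if count(set(c)) >= min_count}
    if not level:
        break
    frequent.append(level)
    k += 1

for level in frequent:
    print({tuple(sorted(s)): c for s, c in level.items()})
```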
FP-growth algorithm
Items are ordered by descending support count (K:5, E:4, M:3, O:3, Y:3) when building the FP-tree; each row below mines the conditional pattern base of one suffix item.

| item | conditional pattern base | conditional FP-tree | frequent patterns generated |
|---|---|---|---|
| E | {K: 4} | ⟨K: 4⟩ | {K, E}: 4 |
| M | {K, E: 2}, {K: 1} | ⟨K: 3⟩ | {K, M}: 3 |
| O | {K, E, M: 1}, {K, E: 2} | ⟨K: 3, E: 3⟩ | {K, O}: 3, {E, O}: 3, {K, E, O}: 3 |
| Y | {K, E, M, O: 1}, {K, E, O: 1}, {K, M: 1} | ⟨K: 3⟩ | {K, Y}: 3 |
conclusion
FP-growth scans the database only twice to build the FP-tree and then mines the tree without generating candidate itemsets, whereas Apriori repeatedly scans the database for each candidate level. So FP-growth is more efficient than Apriori here.
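As an optional cross-check (my own addition, assuming the third-party mlxtend and pandas packages are available), the same frequent itemsets can be recomputed with both algorithms:

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, fpgrowth

dataset = [
    ["M", "O", "N", "K", "E", "Y"],
    ["D", "O", "N", "K", "E", "Y"],
    ["M", "A", "K", "E"],
    ["M", "U", "C", "K", "Y"],
    ["C", "O", "K", "I", "E"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(dataset).transform(dataset), columns=te.columns_)

# Both algorithms should return the same frequent itemsets at min_support = 0.6.
print(apriori(df, min_support=0.6, use_colnames=True))
print(fpgrowth(df, min_support=0.6, use_colnames=True))
```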
b
From the only frequent 3-itemset {K, E, O} with min_conf = 0.8, the rule {K, E} ⇒ {O} has confidence 3/4 = 0.75 and is rejected, leaving two strong rules:

| strong association rule | support | confidence |
|---|---|---|
| {K, O} ⇒ {E} | 0.6 | 1.0 |
| {E, O} ⇒ {K} | 0.6 | 1.0 |
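A short sketch (my own; the `support` helper is hypothetical) enumerating the candidate rules with two-item antecedents from {K, E, O} and their confidences:

```python
from itertools import combinations

transactions = [
    {"M", "O", "N", "K", "E", "Y"},
    {"D", "O", "N", "K", "E", "Y"},
    {"M", "A", "K", "E"},
    {"M", "U", "C", "K", "Y"},
    {"C", "O", "K", "I", "E"},
]

def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

l = {"K", "E", "O"}
for lhs in combinations(l, 2):          # rules of the form {a, b} => {c}
    lhs = set(lhs)
    rhs = l - lhs
    conf = support(l) / support(lhs)
    strong = support(l) >= 0.6 and conf >= 0.8
    print(f"{sorted(lhs)} => {sorted(rhs)}: support={support(l):.1f}, "
          f"confidence={conf:.2f}, strong={strong}")
```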
6.14
| | hot dogs | !(hot dogs) | total |
|---|---|---|---|
| hamburgers | 2000 | 500 | 2500 |
| !(hamburgers) | 1000 | 1500 | 2500 |
| total | 3000 | 2000 | 5000 |
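A small sketch (my own addition) computing lift and the chi-square statistic from the contingency table above; these are the standard measures for checking whether the two purchases are correlated:

```python
# Observed counts from the contingency table above.
n = 5000
hot_and_ham = 2000
hot = 3000          # column total for hot dogs
ham = 2500          # row total for hamburgers

# Lift: P(hot dogs and hamburgers) / (P(hot dogs) * P(hamburgers)).
lift = (hot_and_ham / n) / ((hot / n) * (ham / n))
print(f"lift = {lift:.3f}")   # lift > 1 suggests positive correlation

# Chi-square over the four cells (expected = row total * column total / n).
observed = [[2000, 500], [1000, 1500]]
row_totals = [2500, 2500]
col_totals = [3000, 2000]
chi2 = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2)
    for j in range(2)
)
print(f"chi-square = {chi2:.1f}")
```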
a
b
c