hw2 110590049


2023 Educational Data Mining and Applications HW2.pdf

6.3

a

  1. Since L is a frequent itemset, Support(L) ≥ minimum support threshold.
  2. Since S ⊆ L, any transaction that contains all items of L also contains all items of S.
  3. Therefore every transaction counted toward Support(L) is also counted toward Support(S), so Support(S) ≥ Support(L).
  4. Hence Support(S) ≥ Support(L) ≥ minimum support threshold.
  5. This proves that S is also a frequent itemset (a quick check of this property is sketched below).
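A minimal sketch of that check, assuming a made-up toy database and itemsets (none of the names or data below come from the exercise):

```python
# Toy check of the property proved above: if S ⊆ L, then Support(S) ≥ Support(L).
# The database D and the itemsets are made-up examples for illustration only.
D = [
    {"a", "b", "c"},
    {"a", "b"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]

def support(itemset, db):
    """Fraction of transactions in db that contain every item of itemset."""
    return sum(itemset <= t for t in db) / len(db)

L = {"a", "b", "c"}
for S in [{"a"}, {"b", "c"}, {"a", "b", "c"}]:   # subsets of L
    assert S <= L
    assert support(S, D) >= support(L, D)        # holds by steps 2-3 of the proof
    print(sorted(S), support(S, D), ">=", support(L, D))
```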

b

  1. S’ ⊆ S.
  2. Support(S) = |{T ∈ D | S ⊆ T}|, the number of transactions T in D that contain S (i.e. the transactions that are supersets of S).
  3. Every transaction that contains S also contains S’, so {T ∈ D | S ⊆ T} ⊆ {T ∈ D | S’ ⊆ T}.
  4. Comparing the sizes of these two sets gives Support(S) ≤ Support(S’) (the full chain is written out below).
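The same argument as a single chain, using the support-count definition from step 2:

```latex
S' \subseteq S
\;\Longrightarrow\;
\{T \in D \mid S \subseteq T\} \subseteq \{T \in D \mid S' \subseteq T\}
\;\Longrightarrow\;
\mathrm{Support}(S) = \bigl|\{T \in D \mid S \subseteq T\}\bigr|
\;\le\; \bigl|\{T \in D \mid S' \subseteq T\}\bigr| = \mathrm{Support}(S').
```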

c

  1. Confidence(X => Y) = Support(X ∪ Y) / Support(X).
  2. Confidence(s => (l - s)) = Support(s ∪ (l - s)) / Support(s) = Support(l) / Support(s).
  3. Confidence(s’ => (l - s’)) = Support(s’ ∪ (l - s’)) / Support(s’) = Support(l) / Support(s’).
  4. The numerators are equal, since s ∪ (l - s) = s’ ∪ (l - s’) = l.
  5. By part (b), s’ ⊆ s implies Support(s’) ≥ Support(s), so the rule from s’ has the larger (or equal) denominator.
  6. Therefore Confidence(s’ => (l - s’)) ≤ Confidence(s => (l - s)) (written out as one inequality below).
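The comparison in one line (Support(s') ≥ Support(s) is part (b)):

```latex
\mathrm{Confidence}\bigl(s' \Rightarrow (l - s')\bigr)
= \frac{\mathrm{Support}(l)}{\mathrm{Support}(s')}
\;\le\;
\frac{\mathrm{Support}(l)}{\mathrm{Support}(s)}
= \mathrm{Confidence}\bigl(s \Rightarrow (l - s)\bigr).
```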

d

  1. If I is frequent in D, its support count satisfies count_D(I) ≥ minsup × |D|, where minsup is the (relative) minimum support threshold.
  2. Suppose, for contradiction, that I is not frequent in any partition P_i. Then count_{P_i}(I) < minsup × |P_i| for every partition P_i.
  3. Summing over all partitions gives count_D(I) = Σ_i count_{P_i}(I) < minsup × Σ_i |P_i| = minsup × |D|, contradicting step 1 (the summation is written out below).
  4. Therefore any itemset that is frequent in the original database D must be frequent in at least one partition of D.
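Step 3 written out, with count_{P_i} denoting the support count of I inside partition P_i (notation assumed here):

```latex
\mathrm{count}_D(I)
= \sum_{i} \mathrm{count}_{P_i}(I)
< \sum_{i} \mathrm{minsup} \cdot |P_i|
= \mathrm{minsup} \cdot \sum_{i} |P_i|
= \mathrm{minsup} \cdot |D|,
```

which contradicts count_D(I) ≥ minsup × |D| from step 1.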

6.6

min_support = 0.6
min_confidence = 0.8

| ID | Items |
| --- | --- |
| T100 | {M, O, N, K, E, Y} |
| T200 | {D, O, N, K, E, Y} |
| T300 | {M, A, K, E} |
| T400 | {M, U, C, K, Y} |
| T500 | {C, O, O, K, I, E} |

a

Apriori algorithm

1-Itemsets

| Item | Support |
| --- | --- |
| K | 1.0 |
| E | 0.8 |
| Y | 0.6 |
| M | 0.6 |
| O | 0.6 |
| C | 0.4 |
| N | 0.4 |
| D | 0.2 |
| A | 0.2 |
| U | 0.2 |
| I | 0.2 |

2-Itemsets (candidates from the frequent 1-itemsets K, E, Y, M, O)

| Itemset | Support |
| --- | --- |
| {K,E} | 0.8 |
| {K,Y} | 0.6 |
| {K,M} | 0.6 |
| {K,O} | 0.6 |
| {E,Y} | 0.4 |
| {E,M} | 0.4 |
| {E,O} | 0.6 |
| {Y,M} | 0.4 |
| {Y,O} | 0.4 |
| {M,O} | 0.2 |

3-Itemsets (and the single 4-itemset candidate)

| Itemset | Support |
| --- | --- |
| {K,E,Y} | 0.4 |
| {K,E,M} | 0.4 |
| {K,E,O} | 0.6 |
| {K,Y,M} | 0.4 |
| {K,Y,O} | 0.4 |
| {K,M,O} | 0.2 |
| {K,M,E,O} | 0.2 |

With min_support = 0.6, the frequent itemsets are K, E, Y, M, O, {K,E}, {K,Y}, {K,M}, {K,O}, {E,O}, and {K,E,O}.
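A brute-force sketch (exhaustive support counting, not the Apriori candidate pruning itself) to double-check the support values in the tables above:

```python
from itertools import combinations

# The five transactions from the exercise (T500 lists O twice; as a set it is {C, O, K, I, E}).
transactions = [
    {"M", "O", "N", "K", "E", "Y"},
    {"D", "O", "N", "K", "E", "Y"},
    {"M", "A", "K", "E"},
    {"M", "U", "C", "K", "Y"},
    {"C", "O", "O", "K", "I", "E"},
]
min_support = 0.6

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

items = sorted(set().union(*transactions))
frequent = []
for k in (1, 2, 3, 4):
    for combo in combinations(items, k):
        s = support(set(combo))
        if s >= min_support:
            frequent.append((set(combo), s))

for itemset, s in frequent:
    print(itemset, s)
# Expected: K, E, Y, M, O, {K,E}, {K,Y}, {K,M}, {K,O}, {E,O}, {K,E,O}
```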

FP-growth algorithm

| Item | Conditional pattern base | Conditional FP-tree (support ≥ 0.6) | Frequent patterns generated |
| --- | --- | --- | --- |
| E | {K:4} | {K:4} | {K,E}:4 |
| M | {K,E:2}, {K:1} | {K:3} | {K,M}:3 |
| O | {K,E,M:1}, {K,E:2} | {K:3, E:3} | {K,O}:3, {E,O}:3, {K,E,O}:3 |
| Y | {K,E,M,O:1}, {K,E,O:1}, {K,M:1} | {K:3} | {K,Y}:3 |
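A small sketch that reproduces the conditional-pattern-base column directly from the frequency-ordered transactions (equivalent to reading the prefix paths off the FP-tree here, since identical prefixes are simply merged with summed counts):

```python
from collections import Counter

# Transactions from the exercise, items reordered by descending frequency
# (K:5, E:4, M:3, O:3, Y:3); infrequent items are already dropped.
ordered = [
    ["K", "E", "M", "O", "Y"],   # T100
    ["K", "E", "O", "Y"],        # T200
    ["K", "E", "M"],             # T300
    ["K", "M", "Y"],             # T400
    ["K", "E", "O"],             # T500
]

# For each item, collect the prefix of items preceding it in every transaction;
# aggregating identical prefixes reproduces the conditional pattern bases above.
for item in ["E", "M", "O", "Y"]:
    bases = Counter()
    for t in ordered:
        if item in t:
            prefix = tuple(t[: t.index(item)])
            if prefix:
                bases[prefix] += 1
    print(item, dict(bases))
# e.g. Y -> {('K','E','M','O'): 1, ('K','E','O'): 1, ('K','M'): 1}
```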

conclusion

FP-growth scans the database only twice (once to count item frequencies and once to build the FP-tree) and then mines the tree without generating candidates, whereas Apriori rescans the database for every candidate-generation pass. So FP-growth is more efficient than Apriori here.

b

| Strong rule | Support | Confidence |
| --- | --- | --- |
| {O, K} ⇒ E | 0.6 | 1.0 |
| {O, E} ⇒ K | 0.6 | 1.0 |
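A quick confidence check for the candidate rules from the frequent 3-itemset {K, E, O} (same brute-force support counting as in part (a)):

```python
# Confidence check for rules derived from the frequent itemset {K, E, O}.
transactions = [
    {"M", "O", "N", "K", "E", "Y"},
    {"D", "O", "N", "K", "E", "Y"},
    {"M", "A", "K", "E"},
    {"M", "U", "C", "K", "Y"},
    {"C", "O", "K", "I", "E"},
]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

for antecedent, consequent in [({"O", "K"}, {"E"}),
                               ({"O", "E"}, {"K"}),
                               ({"K", "E"}, {"O"})]:
    conf = support(antecedent | consequent) / support(antecedent)
    print(antecedent, "=>", consequent,
          "support:", support(antecedent | consequent), "confidence:", round(conf, 2))
# {O,K} => E and {O,E} => K reach confidence 1.0; {K,E} => O only reaches 0.75 < 0.8.
```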

6.14

| | hot dogs | ¬ hot dogs | Total |
| --- | --- | --- | --- |
| hamburgers | 2000 | 500 | 2500 |
| ¬ hamburgers | 1000 | 1500 | 2500 |
| Total | 3000 | 2000 | 5000 |
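A minimal sketch computing the standard measures for the rule hot dogs ⇒ hamburgers from this contingency table (the support/confidence thresholds come from the exercise PDF and are not restated here, so no strong/weak verdict is asserted):

```python
# Contingency table counts from 6.14.
both = 2000          # hot dogs and hamburgers
hotdogs_only = 1000  # hot dogs, no hamburgers
hamburgers_only = 500
neither = 1500
total = 5000

support = both / total                                   # P(hot dogs ∧ hamburgers) = 0.40
confidence = both / (both + hotdogs_only)                # P(hamburgers | hot dogs) ≈ 0.667
lift = confidence / ((both + hamburgers_only) / total)   # ≈ 1.33 (> 1: positively correlated)

print(f"support={support:.2f}, confidence={confidence:.3f}, lift={lift:.2f}")
```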

a

b

c