Constructing a Large Node Chow-Liu Tree Based on Frequent Itemsets
Kaizhu Huang, Irwin King, Michael R. Lyu
Multimedia Information Processing Laboratory
The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
{kzhuang, king, lyu}@cse.cuhk.edu.hk
ICONIP 2002, November 19, 2002, Orchid Country Club, Singapore

Outline
Background: Probabilistic Classifiers and the Chow-Liu Tree
Motivation
Large Node Chow-Liu Tree
Experimental Results
Conclusion

A Typical Classification Problem
Given a set of symptoms, one wants to find out whether these symptoms give rise to a particular disease.

Background: Probabilistic Classifiers
The classification function is defined as
$c^{*} = \arg\max_{c} P(C=c \mid A_1,\dots,A_n) = \arg\max_{c} \frac{P(C=c)\,P(A_1,\dots,A_n \mid C=c)}{P(A_1,\dots,A_n)}$,
where the denominator $P(A_1,\dots,A_n)$ is a constant for a given instance of $A_1,\dots,A_n$ and can be dropped.
The joint probability is not easily estimated from the dataset; thus an assumption about the distribution has to be made, namely a dependence or independence relationship among the variables.
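To make the definition concrete, here is a toy Python sketch of this classification function on the symptoms-to-disease example; the attribute names and all probability values are invented for illustration, not taken from the paper.

```python
# Toy sketch of the probabilistic classification function
#   c* = argmax_c P(C = c) * P(a1, ..., an | C = c),
# on a hypothetical two-symptom disease example; all values are invented.

prior = {"disease": 0.1, "healthy": 0.9}  # P(C = c)

# Full class-conditional joint table P(a1, a2 | C = c). Storing this table
# explicitly is what becomes infeasible as the number of attributes grows,
# which is why an assumption about the dependence structure has to be made.
likelihood = {
    "disease": {("fever", "cough"): 0.50, ("fever", "no_cough"): 0.20,
                ("no_fever", "cough"): 0.20, ("no_fever", "no_cough"): 0.10},
    "healthy": {("fever", "cough"): 0.05, ("fever", "no_cough"): 0.10,
                ("no_fever", "cough"): 0.15, ("no_fever", "no_cough"): 0.70},
}

def classify(instance):
    # argmax of the unnormalized posterior: P(a1, ..., an) is a constant
    # for the given instance, so it can be dropped.
    return max(prior, key=lambda c: prior[c] * likelihood[c][instance])

print(classify(("fever", "cough")))  # -> disease
```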

Background: Chow-Liu Tree (CLT)
Assumption: a dependence tree exists among the attribute variables, given the class variable C.
[Figure: a tree dependence structure among the attributes, under the class variable.]
Advantages:
Comparable with some of the state-of-the-art classifiers.
The tree structure gives it resistance to the over-fitting problem and a decomposition characteristic.
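For concreteness, below is a minimal, illustrative Python sketch of the Chow-Liu construction: weight every pair of variables by empirical mutual information, then keep a maximum-weight spanning tree. The function names and the use of NumPy are our own choices; the classifier variant described on this slide would additionally condition the mutual information on the class variable C.

```python
import itertools
import numpy as np

def chow_liu_tree(data):
    """Illustrative Chow-Liu construction on a (samples x variables)
    array of discrete values: weight every variable pair by empirical
    mutual information, then keep a maximum-weight spanning tree."""
    n_vars = data.shape[1]

    def mutual_information(x, y):
        # Empirical MI from joint and marginal frequencies.
        mi = 0.0
        for a in np.unique(x):
            for b in np.unique(y):
                p_xy = np.mean((x == a) & (y == b))
                if p_xy > 0:
                    mi += p_xy * np.log(p_xy / (np.mean(x == a) * np.mean(y == b)))
        return mi

    # All candidate edges, heaviest first.
    edges = sorted(
        ((mutual_information(data[:, i], data[:, j]), i, j)
         for i, j in itertools.combinations(range(n_vars), 2)),
        reverse=True)

    # Kruskal's algorithm: keep an edge unless it closes a cycle.
    parent = list(range(n_vars))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    tree = []
    for weight, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j, weight))
    return tree
```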

Disadvantages:
It cannot model non-tree dependence relationships among the attributes or variables.

Motivation
[Figures (a) and (b): a non-tree dependence structure and a large node tree, each under the class variable.]
1. Fig. (b) can represent the same independence relationship as Fig. (a): given B and E, there is an independence relationship among A, C, and D.
2. Fig. (b) is still a tree structure, which inherits the advantages of a tree.
3. By combining several nodes, a large node tree structure can represent a non-tree structure.

This motivates our Large Node Chow-Liu tree approach.

Overview of Large Node Chow-Liu Tree (LNCLT)
[Figure: the underlying structure and the refined tree, which encode the same independence relationship.]
Step 1. Draft the Chow-Liu tree: draft the CL-tree of the dataset according to the CLT algorithm.
Step 2. Refine the Chow-Liu tree into a large node Chow-Liu tree based on some combination rules.

Combination Rules
(1) Bounded cardinality: the cardinality of each large node should not be greater than a bound k.
(2) Frequent itemsets: each large node should be a frequent itemset.
(3) Father-son or sibling relationship: the nodes in a large node should have a father-son or sibling relationship.

Combination Rules (1): Bounded Cardinality
The cardinality of each large node (that is, the number of nodes combined into it) should not be greater than a bound k. For example, if we set k to the number of attributes, the LNCLT becomes a single large node, which loses all the merits of a tree.

Combination Rules (2): Frequent Itemsets
Food store example: in a food store, if you buy bread, it is highly likely that you will also buy butter; thus {bread, butter} is called a frequent itemset. A frequent itemset is a set of attributes that frequently occur together. Frequent itemsets are natural candidates for large nodes, since the attributes in a frequent itemset act just like one attribute: they frequently occur together at the same time. A toy sketch of frequent itemset counting follows.
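As referenced above, here is a toy sketch of frequent pair counting in the Apriori spirit; the transactions and the support threshold are made up, and the real system uses the full Apriori algorithm [AS1994].

```python
from collections import Counter
from itertools import combinations

# Toy transactions in the food-store spirit; the data are invented.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "beer"},
    {"bread", "butter", "beer"},
]

min_support = 3  # minimum co-occurrence count to call an itemset "frequent"

# Count every 2-itemset. Full Apriori would grow larger itemsets from the
# frequent smaller ones, pruning any superset of an infrequent set.
pair_counts = Counter(
    frozenset(pair)
    for t in transactions
    for pair in combinations(sorted(t), 2)
)

frequent_pairs = {s: n for s, n in pair_counts.items() if n >= min_support}
print(frequent_pairs)  # -> {frozenset({'bread', 'butter'}): 3}
```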

Combination Rules (3): Father-son or Sibling Relationship
Combining father-son and sibling nodes increases the data fitness of the tree structure on the dataset (proved in the paper), and it maintains the graphical structure as a tree. Combining nodes that are not in a father-son or sibling relationship may result in a non-tree structure.
[Figure: a father-son combination and a sibling combination.]

Constructing the Large Node Chow-Liu Tree
1. Generate the frequent itemsets: call Apriori [AS1994] to generate the frequent itemsets of size no greater than k, and record all the frequent itemsets together with their frequency in a list L.
2. Draft the Chow-Liu tree: draft the CL-tree of the dataset according to the CLT algorithm.
3. Combine nodes based on the combination rules: iteratively combine the frequent itemset with maximum frequency that satisfies the combination conditions (father-son or sibling relationship), until L is empty. A minimal sketch of this step follows.
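The sketch below assumes the drafted CLT is given as a child-to-parent map and L as an itemset-to-frequency dictionary. The data structures and frequencies are our own illustration (restricted to pairs, i.e. k = 2, and checked against the originally drafted tree rather than updating the tree after each merge), not the paper's code.

```python
def combine_nodes(parent, itemsets, k=2):
    """Sketch of LNCLT step 3: repeatedly take the most frequent itemset
    in L, and merge its nodes into a large node if they are in a
    father-son or sibling relationship in the drafted CLT.

    parent:   child -> parent map of the drafted CLT (root maps to None)
    itemsets: frozenset -> frequency (the list L from Apriori)
    """
    def qualifies(pair):
        # Pair-only check (k = 2): father-son, or siblings sharing a parent.
        a, b = pair
        return (parent.get(a) == b or parent.get(b) == a
                or (parent.get(a) is not None
                    and parent.get(a) == parent.get(b)))

    L = dict(itemsets)
    large_nodes = []
    while L:
        best = max(L, key=L.get)            # itemset with maximum frequency
        del L[best]
        if len(best) == 2 and qualifies(best):
            large_nodes.append(best)
            # Filter out the itemsets that overlap the new large node.
            L = {s: f for s, f in L.items() if not (s & best)}
    return large_nodes

# The paper's running example tree, with invented frequencies:
# A is the root, B is A's child, C and D are B's children, E is D's child.
parent = {"A": None, "B": "A", "C": "B", "D": "B", "E": "D"}
L = {frozenset("BC"): 9, frozenset("AB"): 8, frozenset("BE"): 7,
     frozenset("BD"): 6, frozenset("DE"): 5, frozenset("AC"): 4}
print(combine_nodes(parent, L))  # -> [frozenset({'B', 'C'}), frozenset({'D', 'E'})]
```

With these invented frequencies, the loop reproduces the running example below: {B,C} is combined first, the overlapping itemsets (including {A,C}) are filtered out, and {D,E} is combined last.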

Example: Constructing the LNCLT
We assume k is 2. After step 1, we get the frequent itemsets {A,B}, {A,C}, {B,C}, {B,E}, {B,D}, {D,E}, with f({B,C}) > f({A,B}) > f({B,E}) > f({B,D}) > f({D,E}), where f(*) represents the frequency of a frequent itemset. Fig. (b) is the CLT drafted in step 2.
1. {A,C} does not satisfy the combination condition, so filter out {A,C}.
2. f({B,C}) is the biggest and {B,C} satisfies the combination condition, so combine B and C into a large node (Fig. (c)).
3. Filter out the frequent itemsets that have coverage (overlap) with {B,C}; only {D,E} is left.
4. {D,E} satisfies the combination condition, so combine D and E into a large node (Fig. (d)).

Experimental Setup
Dataset: MNIST handwritten digit database (28*28 gray-level bitmaps)
Training dataset size: 60000

Testing dataset size: 10000
Experimental Environments
Platform: Windows 2000; development tool: Visual C++ 6.0

Experiments: Data Fitness Comparison
[Figure: minus log likelihood (bits/digit) of LNCLT and CLT on the training and testing datasets, for digits 0 through 9.]

Experiments: Recognition Rate
[Figure: recognition rate comparison of LNCLT and CLT.]

Future Work
Evaluate our algorithm extensively on other benchmark datasets.
Examine other combination rules.

Conclusion
A novel Large Node Chow-Liu tree is constructed based on frequent itemsets.
LNCLT can partially overcome the disadvantages of CLT, i.e., the inability to represent non-tree structures.
We demonstrate, both theoretically and experimentally, that our LNCLT model has better data fitness and better prediction accuracy.

Main References
[AS1994] Agrawal, R. and Srikant, R. (1994). Fast algorithms for mining association rules. In Proc. of VLDB-94.
[Chow, Liu 1968] Chow, C.K. and Liu, C.N. (1968). Approximating discrete probability distributions with dependence trees. IEEE Trans. on Information Theory, 14, pp. 462-467.
[Friedman 1997] Friedman, N., Geiger, D. and Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, 29, pp. 131-161.
[Cheng 1997] Cheng, J., Bell, D.A. and Liu, W. (1997). Learning belief networks from data: an information theory based approach. In Proceedings of ACM CIKM-97.
[Cheng 2001] Cheng, J. and Greiner, R. (2001). Learning Bayesian belief network classifiers: algorithms and system. In E. Stroulia and S. Matwin (Eds.): AI 2001, LNAI 2056, pp. 141-151.

Q&A
Thanks.