Estimate the Number of Relevant Images Using TwoOrder Markov Chain Presented by: WANG Xiaoling Supervisor: Clement LEUNG Outline Introduction Objective Methodology Experiment Results Conclusion and Future Work Introduction Large collections of images have been made available on web. Retrieval effectiveness becomes one of the most important parameters to measure the performance of image retrieval systems. Measures: Precision Recall P R number of relevant images retrieved total number of images returned number of relevant images retrieved total number of relevant images in the database Significant Challenge: the total number of relevant images is not directly observable Basic Models Regression Model Markov Chain Two-Order Markov Chain

Objective To investigate the probabilistic behavior of the distribution of relevant images among the returned results for the image search engines using two-order markov chain Methodology Test Image Search Engine: Query Design 70% provided by authors One word query Two word query Three word query 30% suggestive term Suggestive term with largest returned results Suggestive term with least returned results Methodology Database Setup: Stochastic process {X1, X2,, XJ } where XJ denotes the aggregate rele vance of all the images in page J Equation:X 20 J YJi i 1 where YJi=1 if the i th image on page J is relevant, and YJi =0 if the i th image on page J is not relevant. 20 X J YJi 18 i 1 Page J

XJ 1 18 2 19 3 20 4 19 5 20 6 19 7 20 8 18 9 19 10 18 Forecast Using Two-Order Markov Chain Markov Chain: Stochastic process {XJ, J1} with state space S={0,1,2,20} Two-Order Markov Chain: State space change to S 2 Forecast the state probability distribution of next page (J) J) based on the original state probability distribution (J) 1) and transition probability matrix P . An Example Model Test Mean Absolute Error Experiment Results Forecast Results Using Two-Order Markov Chain Page Google

Yahoo Bing 1 20 20 20 2 20 20 20 3 20 20 20 4 20 20 20 5 20 20 20 6 20 20 17 7 20 20 17 8

20 20 17 9 20 20 17 10 20 20 17 Test Results--Google 20 20 15 15 10 1 2 3 4 5 6 Page Number (a) 7 8 9 10 10 20 20 15 15

1 2 3 4 5 6 Page Number (c) 7 8 9 20 15 10 1 2 3 4 5 6 Page Number (e) 7 8 9 10 10 10 Number of Relevant Images 10 Number of Relevant Images 10 15 15 2 3

4 5 6 Page Number (g) 7 8 9 10 10 20 20 15 15 10 1 2 3 4 5 6 Page Number (i) 7 8 9 10 3 4 5 6 Page Number (b) 7 8 9

10 1 2 3 4 5 6 Page Number (d) 7 8 9 10 1 2 3 4 5 6 Page Number (f) 7 8 9 10 1 2 3 4 5 6 Page Number (h) 7 8 9

10 1 2 3 4 5 6 Page Number 7 8 9 10 15 20 1 2 20 20 10 1 10 (j) Test Results--Yahoo 20 20 15 15 10 1 2 3 4 5

6 Page Number (a) 7 8 9 10 10 20 20 15 15 1 2 3 4 5 6 Page Number (c) 7 8 9 20 15 10 1 2 3 4 5 6 Page Number (e) 7 8 9

10 10 10 Number of Relevant Images 10 Number of Relevant Images 10 15 15 2 3 4 5 6 Page Number (g) 7 8 9 10 10 20 20 15 15 10 1 2 3 4 5 6 Page Number (i) 7

8 9 10 3 4 5 6 Page Number (b) 7 8 9 10 1 2 3 4 5 6 Page Number (d) 7 8 9 10 1 2 3 4 5 6 Page Number (f) 7 8 9

10 1 2 3 4 5 6 Page Number (h) 7 8 9 10 1 2 3 4 5 6 Page Number 7 8 9 10 15 20 1 2 20 20 10 1 10 (j)

2 3 4 1 2 3 4 1 2 3 1 2 1 2 5 6 7 8 9 10 5 6 Page Number (b) 7 8 9 10 4 5 6 Page Number (d) 7

8 9 10 3 4 5 6 Page Number (f) 7 8 9 10 3 4 5 6 Page Number 7 8 9 10 7 8 9 10 Test Results--Bing 20 20 10 10 0 1 2 3

4 5 6 Page Number 7 8 9 0 10 (a) 20 20 10 10 1 2 3 4 5 6 Page Number 7 8 9 (c) 20 10 0 1 2 3 4 5 6 Page Number 7

8 9 10 (e) 10 0 10 10 1 2 3 4 5 6 Page Number 7 8 9 10 0 (h) (g) 20 20 10 10 0 20 20 20 0 0 10 Number of Relevant Images 0

Number of Relevant Images 1 1 2 3 4 5 6 Page Number (i) 7 8 9 10 0 1 2 3 4 5 6 Page Number (j) Measure for Forecast Accuracy Mean Absolute Deviation (MAD): forecast error MAD n One-word Googl 2. 2. 1.1 e 7 3 Yahoo 2. 0. 1.1 0 1 Bing 1. 1. 1.2 3 9

Two-word Three-word 0.8 0.1 0.8 1. 1.9 0.6 7 0.5 4.8 1.1 2. 2.2 0.4 2 4.5 2.0 1.2 1. 10. 1.1 2 5 Mean Absol ut e Er ror Comparative Results 5 4 Regreesi on M odel M arkov Chai n 3 2 Two- Order M arkov Chai n 1 0 Googel Yahoo Bi ng Im age Search Engi ne Best Model: TwoOrder Markov Chain Worst Model: Regression Model Conclusion Two-Order Markov Chain could well represent the distribution of relevant images among the results pages for the major web image search engine. Two-Order Markov Chain is the best model among three models we have worked. Future Work

Our future work will try to apply Hidd en Markov Chain to this topic Thank you! Q&A Two-Order Markov Chain An example (cont) Suppose the stochastic process {Xt, t>=0} wit h a state space S={A, B, C} As to two-order Markov chain, the state space: S2={AA, AB, AC, BA, BB, BC, CA, CB, CC} The state probabilities distribution of period z ero: (0)= (AA, AB, AC, BA, BB, BC, CA, CB, CC) An example (cont) The transition probability matrix: PAA,BA=0 p AA, AA 0 0 pBA, AA p 0 0 pCA, BC 0 0 p AA, AB p AA, AC 0 0 0 0 0 0 0 p AB , BA

p AB , BB p AB , BC 0 0 0 0 0 0 0 p AC ,CA p AC ,CB pBA, AB pBA, AC 0 0 0 0 0 0 0 pBB , BA pBB , BB pBB , BC 0 0 0 0 0 0 0 pBC ,CA pBC ,CB

pCA, AB 0 pCA, AC 0 0 0 0 pCB , BA pCB , BB pCB , BC 0 0 0 0 0 0 0 0 0 pCC ,CA pCC ,CB 0 0 p AC ,CC 0 0 pBC ,CC 0 0 pCC ,CC An example Therefore, the probability distribution o f states for page J will be compute as: (J) J)=(J) J-1)*PP [Return]