Distributed Spatio-Temporal Similarity Search Demetrios Zeinalipour-Yazti [email protected] University

Distributed Spatio-Temporal Similarity Search Demetrios Zeinalipour-Yazti dzeina@cs.ucy.ac.cy University

Distributed Spatio-Temporal Similarity Search Demetrios Zeinalipour-Yazti [email protected] University of Cyprus Song Lin [email protected] University of California - Riverside Dimitrios Gunopulos [email protected] University of California - Riverside http://www.cs.ucr.edu/~slin Song Lin

ICDE 2006 University of California, Trajectories are everywhere Song Lin University of California, Trajectory Similarity Search Habitat monitoring Animal migration patterns Sign language detection Movement of fingers Store surveillance video Customer movement patterns Camera sensor network

Each sensor can monitor the movement of objects within a small area Song Lin University of California, Distributed Similarity Search The setting Monitoring area G with m objects moving inside G is segmented into n non-overlapping cells each having a camera sensor Each record of the trajectory is stored locally at the closest sensor Problem Given a query trajectory Q, retrieve the top K trajectories which are most similar to Q. Song Lin University of California,

An example Distributed top-K problem The trajectories of objects are distributed at different cells It is expensive to collect all the trajectories centrally. a) Map View b) Cell View A6 A1 Q Q A2 C1 C2

C1 C2 C3 C4 G C3 C4 A3 A4 Song Lin A5 University of California,

Finding K most similar trajectories We have to define what is similar We use well known similarity measures for trajectories Euclidean Dynamic Time Wrapping (DTW) Berndt D., Clifford J., Using Dynamic Time Warping to Find Patterns in Time Series, In KDD94, Menlo Park, CA, pp. 229-248, 1994. Longest Common SubSequence (LCSS) Das G., Gunopulos D., Mannila H., Finding Similar Time Series, In PKDD97, Trondheim, Norway, pp. 88-100, LNCS 1263, 1997. We have to find the most similar trajectories We focus on LCSS, but the techniques work for DTW as well. Song Lin

University of California, Similarity Measures Euclidean Matching A) Dynamic Time Warping Matching B) Longest Common SubSequence Matching C) Courtesy of Dr. Eamonn Keogh Song Lin University of California, Longest Common Sub_Sequence (LCSS) n

1 Out-of-phase Match Used in string matching problems Captures out-of-phase matches, Captures outliers (ignore matching with outliers) LCSS Figure: courtesy of Dr. Eamonn Keogh Song Lin University of California, Longest Common Sub_Sequence (LCSS) if A or B is empty 0, 1 LCSS (Tail ( A), Tail ( B )), LCSS ( A, B ) if ai1 - bi 2 and i1 i 2

max ( LCSS (Tail ( A), B ), LCSS ( A, Tail ( B )), otherwise LCSS can be computed in O((ll1+l2) ) by dynamic programming algorithm. In general, it is expensive to compute this similarity exactly, so we can also compute the bounds of it. Song Lin University of California, Centralized LCSS UpperBound EnvHigh[i ] max{Q[ j ] }, i j EnvLow MBE , (Q ) EnvHigh, where EnvLow[i ] min{Q[ j ] }, i j 1, if A[i] within envelop LCSS , ( MBE (Q ), A) 0, therwise

Theorem: LCSS , (Q, A) LCSS , ( MBE (Q ), A) Song Lin University of California, Problem with distributed computation of LCSS In distributed setting, computing lCSS is difficult, because Sequential matching problem Matching may occur across cells Cell 1 Song Lin Cell 2 Cell 3 Cell 4

University of California, Our Solution We compute lower bound and upper bound of the LCSS similarity distributively. We develop new distributed top-K algorithms (UB-K, UBLB-K) that use these bounds to find the most similar trajectories. Song Lin University of California, Distributed LCSS UpperBound Each cell uses LCSS,(MBE(Q), AMBE(MBE(Q), AQ), Aij) to calculate the similarity of each local sub_trajectory Aij to MBE(MBE(Q), AQ) Upper bound DUB_LCSS(MBE(Q), AQ,Ai) is computed by adding the n local results

Theorem 1 Song Lin n j 1 LCSS , ( MBE (Q), Aij ) LCSS , (Q, Ai ) University of California, Distributed LCSS LowerBound For each trajectory Ai, cell cj finds the time region Tij = {ts(p)|p in Aij} when Ai stays in cell cj. Filter Q into Qij such that Qij is in the same time intervals as Aij , Qij = {p| p in Q and ts(p) in Tij}. Each cell performs a local computation of LCSS,(MBE(Q), AQij, Aij) The lower bound DLB_LCSS(MBE(Q), AQ,Ai) is computed by adding the n local results Theorem 2

Song Lin n j 1 LCSS , (Q ' ij , Aij ) LCSS , (Q, Ai ) University of California, Distribute top K algorithms Threshold Algorithm (TA) Fagin R., Lotem A. and Naor M., Optimal Aggregation Algorithms For Middleware, In PODS01, Santa Barbara, CA, pp. 102-113, 2001. Three-Phase Uniform Threshold (TPUT) P. Cao and Z. Wang. Efficient Top-K Query Calculation in Distributed Networks. In PODC, Newfoundland, Canada, 2004.

Threshold Join Algorithm (TJA) D. Zeinalipour-Yazti, Z. Vagena, D. Gunopulos, V. Kalogeraki, V. Tsotras, M. Vlachos, N. Koudas, D. Srivastava. The Threshold Join Algorithm for Top-k Queries in Distributed Sensor Networks. In DMSN,Trondheim, Norway, 2005. Song Lin University of California, Problem with existing approaches Assume the exact partial scores are available The exact scores at each cell can not be computed efficiently (recall that the matching may occur at the crossing cells) We use upper (lower) bounds to perform distributed top-k computation (based on Theorem 1 and Theorem 2) Song Lin University of California,

Distributed top-K computation with bounds m Now we have the Lower and Upper Bounds rather than Exact scores. e.g. instead of sim(A0,Q)=20 it gives us [A0,15,25] v1 id,lb,ub v2 id,lb,ub v3

id,lb,ub METADATA A 2,3,6 A 0,4,8 A4,5,10 A 7,7,9 A3,8,11 A 9,8,9 .... A4,4,5 A2,5,6 A0,5,7 A3,5,6 A 9,8,10 A7,12,13 .... A4,1,3 A 0,6,10

A2,5,7 A9,6,7 A 3,7,10 A 7,11,13 .... A 4,10,18 A 2,13,19 A 0,15,25 A 3,20,27 A 9,22,26 A 7,30,35 .... n id,lb,ub We propose UB-K and UBLB-K algorithms to compute the top-K results. Song Lin

University of California, UB-K Algorithm Query: Find the K=2 highest ranked answers TJA +1 TJA 2+1 METADATA id,lb id,ub DATA A 4,30 A 2,27 A 0,25 A 3,20

A 9,18 A 7,12 .... A 4,23 A 2,22 A 0,16 A 3,18 A 9,15 A 7,10 .... LB 2 ? EXA CT

Why not stop at 25? Because we might have another object X [UB:24, Real:23] Song Lin University of California, UBLB-K Algorithm METADATA id,lb id,lb,ub TJA +1 TJA 2+1 A 4,22,30 A 2,21,27 A 0,15,25 A 3,13,20 A 9,14,18

A 7,10,12 .... ? LB,UB DATA Exact Score A 4,23 A 2,22 A 0,16 A 3,18 A 9,15 A 7,10 .... EXA CT Note: Kth highest LB is: 21 Therefore A3 (UB:20) and below are not necessary

Song Lin University of California, UB-K vs. UBLB-K Both fetch METADATA objects incrementally (+1). UB-K uses upper bounds, while UBLB-K uses both upper bounds and lower bounds UB-K always fetches +1 (: step increment) DATA objects, while UBLB-K may fetch less DATA objects. UB-K fetches DATA incrementally, while UBLB-K uses a final bulk DATA transfer. Song Lin

University of California, Experimental Evaluation Comparison system Centralized UB-K UBLB-K Dataset 25,000 trajectories generated over the Oldenburg street map, using the Network Based Generator of Moving Objects*. * Brinkhoff T., A Framework for Generating Network-Based Moving Objects. In GeoInformatica,6(2), 2002. Song Lin University of California, Performance Evaluation Song Lin

University of California, Scalability Evaluation Song Lin University of California, Varying K and Song Lin University of California, Summary We described and analyzed well known similarity measures for trajectories DUB_LCSS and DLB_LCSS for bounding similarity of two trajectories distributively UB-K and UBLB-K to find K most similar trajectories

Easily extended for DTW and other similarity measures Song Lin University of California, Distributed Spatio-Temporal Similarity Search Demetrios Zeinalipour-Yazti [email protected] University of Cyprus Song Lin [email protected] University of California - Riverside Dimitrios Gunopulos

[email protected] University of California - Riverside http://www.cs.ucr.edu/~slin Song Lin ICDE 2006 University of California,

Recently Viewed Presentations

  • Joint, Marginal, and Conditional Distributions

    Joint, Marginal, and Conditional Distributions

    The joint density function of the lifetimes of the two components, both measured in hours, is . f(x, y) = (x+y)/8, for 0<x<2 and 0<y<2. Calculate the probability that the device fails during its first hour of operation. Sample Exam...
  • Multiple Regression - University of Minnesota

    Multiple Regression - University of Minnesota

    Multiple Regression 18.1 Introduction In this chapter we extend the simple linear regression model, and allow for any number of independent variables. We expect to build a model that fits the data better than the simple linear regression model.
  • Ophthalmic Pathology - midatlanticpas.org

    Ophthalmic Pathology - midatlanticpas.org

    * * * * * * * * * * * * Contents Grossing the globe Ocular histology 10 Representative cases Grossing the ocular specimen Four basic steps Orient the specimen and determine laterality Measurements Transillumination (TI) Sectioning the globe...
  • Harvesting Energy - Western Oregon University

    Harvesting Energy - Western Oregon University

    Harvesting Energy Glycolysis and Cellular Respiration Energy from Glucose _____and _____ break glucose and other carbon compounds apart. Energy released from broken bonds is harnessed to make ____ to run cell processes.
  • Presentazione standard di PowerPoint

    Presentazione standard di PowerPoint

    In the Soviet Union, resonance theory — especially as developed by Linus Pauling — was attacked in the early 1950s as being contrary to the Marxist principles of dialectical materialism, and in June 1951 the Soviet Academy of Sciences under...
  • International Registration of Geographical Indications and Appellations of

    International Registration of Geographical Indications and Appellations of

    Informs consumers of the uniqueness of the products derived from this connection (typicality) Represents the collective goodwill derived from this uniqueness (reputation) ... invalidated the effect of the international registration in its territory.
  • Global Trade Liquidity Program - World Bank

    Global Trade Liquidity Program - World Bank

    GTLP mobilizes liquidity and guarantee public sector and the private sector. GTLP partners with global, regional or local banks to channel liquidity to banks in the developing world in two ways: i) a 40% risk funded participation, and ii) short...
  • ECE 352 Electronics II - Course Overview

    ECE 352 Electronics II - Course Overview

    No channel of electrons for vGS < VTh No drain current for vGS < VTh N-Channel Depletion MOSFET N-type channel VTh Saturation mode operation iDS vGS Cutoff region (vGS < VTh) Triode region (vDS < vDSsat) Triode-saturation boundary at vDS...