This work was supported by grants HKUST 6081/01E and 6070/00E from Hong Kong RGC. Indexing Spatio-Temporal Data Warehouses Dimitris Papadias, Yufei Tao, Panos Kalnis, Jun Zhang Department of Computer Science Hong Kong University of Science and Technology Clear Water Bay, Hong Kong 26, Feb, 2002 Outline Preliminary Spatial data warehouses and aggregate trees Applications and motivation Solution for static objects Solution for dynamic objects Performance study Conclusion 2 Preliminary Spatial Data Warehouses Each spatial object carries some sort of aggregate information (i.e., each landscape may involve the population). A common query is the window aggregate query, which specifies a window query and retrieves the aggregate sum of all objects intersecting it. Analogy of the group-by in conventional data warehouses. Materialization techniques common in traditional data warehouses are of limited use since possible positions of queries are infinite. Ad-hoc group-by 3 R2 R1 150

qs 75 R3 132 R4 12 Preliminaries Spatial Data Warehouse A better approach is to deploy aggregate trees to introduce the spatial hierarchy [Kline and Snodgrass, 1995, Papadias, et al, 2001, Lazaridis and Mehrotra, 2001]. Aggregation R-tree R5 R1 150 qs R2 75 R3 132 R5 225 R4 12 R1 150 R6 Retrieve the sum of aggregate of objects intersecting qs 4 R2 75 R6 144 R3 132 R4 12

Spatio-Temporal DW: Applications and Motivation Spatio-temporal databases deal with objects whose properties may change with time. Traditional studies in spatio-temporal databases focus on retrieving the actual objects that satisfy the query predicates. Retrieve all vehicles that appear in the north district during 3pm to 5pm yesterday. A more useful type of queries may be to retrieve, instead of the actual object IDs, the number of objects that satisfy the query conditions. Retrieve the (approximate) number of vehicles in the north district during 3pm-5pm yesterday. In the above example, the spatial objects (i.e., streets in the north district) that carry aggregate information (i.e., number of cars) are static. Other queries may involve dynamic objects. The mobile phone antenna (i.e., the aggregate information = # of users served by the antenna) whose spatial extents (i.e., covering areas) may change over time. 5 Example (Static Objects) R2 R1 q R3 R4 aggregate results over timestamps s Query qs retrieve the aggregate sum (during time T1-T4) of all rectangles that intersect it. 369 369 367 364 359

1828 12 12 12 12 12 60 R3 132 127 125 127 127 638 75 80 85 90 90 420 R1 150 T1 150 145 135 130 710 T2 T3

R4 regions R2 T4 T5 now FACT TABLE 6 total sum aggregate results over regions Traditional Methods Pre-materialization Even more difficult than spatial DW due to the inclusion of the temporal dimension. Use an aggregation tree. When the aggregate of a region changes, create a 3D box. An aggregate 3D R-tree is used to index all these boxes. Problem: The spatial extent of a region must be duplicated many times although it does not change. 7 3D boxes for region R1 130 T5 T4 135 145 T3 150 T1 Aggregate RB-tree Spatial extents are stored only once. R5 R2 R1 q R3 R4

B-tree for the whole space R6 s 1 1105 3 723 B-tree for R5 1 685 4 445 1 369 3 367 4 364 5 359 B-tree for R6 1 283 3 405 aggregate results over timestamps 369 R4 12 R3 132 R2 75 R1 150 T1 369 367 364 359 12 60 127 125 127 127 638 90 420 150 145 135 130 710 T2 85 12

T3 90 T4 T5 now FACT TABLE 1 225 2 230 1828 12 80 12 total sum 4 225 5 220 R-tree for spatial dimension R5 1130 R6 698 B-tree for R1 1 445 4 265 1 150 3 145 aggregate results over regions R1 710 R2 420 1 155 3 265 8 3 85 4 90 3 137 4 139 B-tree for R4 R3 638 R4 60 4 135 5 130 B-tree for R2 1 75 2 80 1 144 2 139 1 12 B-tree for R3

1 259 3 379 1 132 2 127 3 125 4 127 Example (Dynamic Objects) Situation during timestamps 1-4 qs R2 R1 R3 R4 aggregate results over timestamps Query qs retrieve the aggregate sum (during time T1-T4) of all rectangles that intersect it. regions 369 369 367 364 359 1828 R 4 12 12 12 12 12 60 R3 132

127 125 127 127 638 R2 75 80 85 90 90 420 R 1 150 150 145 135 130 710 T1 T2 T3 T4 T5 now aggregate results FACT TABLE 9

total sum over regions Example (cont.) change position at timestamp 5 R2 R1 qs R3 R4 Query qs retrieve the aggregate sum (during time T1-T4) of all rectangles that intersect it. aggregate results over timestamps regions 369 369 367 364 359 1828 R 4 12 12 12 12 12 60 R3 132

127 125 127 127 638 R2 75 80 85 90 90 420 R 1 150 150 145 135 130 710 T1 T2 T3 T4 T5 now aggregate results FACT TABLE 10 total sum

over regions Aggregate HRB-tree Integrates the previous idea with the spatio-temporal access method HR-trees. timestamp 1-4 R5 timestamp 5 R2 R1 R' 5 R2 R4 R3 R6 qs R3 R' 1 B-tree forR5 B-tree forR1 R-tree for spatial dimension timestamps 1-4 A B R1 B-tree forR2 R6 qs R-tree for spatial dimension R2

R5 timestamp 5 D R6 C R3 R4 E R4 B-tree forR3 R'1 R6 R2 B-tree forR4 11 R'5 B-tree forR6 B-tree forR'1 B-tree forR'5 Aggregate 3D RB-tree Creates a 3D box only when the spatial extent of an object changes. R5 B-tree forR 5 R2 R4 R'5 R3 R1

B-tree forR' 1 B-tree forR6 R6 R'1 B-tree forR'5 5 time B-tree forR1 B-tree forR2 B-tree forR3 12 B-tree forR4 Managing Numerous B-trees If each B-tree is too small (i.e., the rates of spatial extent and aggregate changes are similar) A block contains too few entries and much space is wasted. Not suitable for caching. Our solution is to use a B-File, which packs numerous B-trees into a single file Avoiding empty spaces in a disk page. Maintaining the same query performance. 13 Performance Dataset settings Number of spatial objects = 10,000 History length = 1,000 timestamps Aggregate agility describes how fast the aggregate information changes (4%, 8%, 16%, 32%, 64%) Region agility describes how fast the spatial extents change 0% for static objects 0.01% for dynamic objects (capturing the fact that spatial dimension changes much slower than the aggregate data) Datasets include 500,000 to 6,500,5000 records. Each query contains 2 parameters: (spatial extents and interval length). 14 Results (Static Objects)

400 Mega bytes 350 aRB 180 a3DR node accesses aRB 150 a3DR 300 120 250 90 200 150 60 100 30 50 0 0 4 8 16 aggregate agility (%) 32 64 15 4 8

16 32 aggregate agility (%) 64 Results (Static Objects) 180 node accesses 180 150 aRB a3DR 150 node accesses aRB a3DR 120 120 90 90 60 60 30 30 0 0 1 25 50 query length 75 100 16

1 3 5 query extent (%) 7 9 Results (Dynamic Objects) 400 Mega bytes 350 a3DR 300 a3DRB 180 node accesses 150 a3DR aHRB aHRB 120 250 200 a3DRB 90 150 60 100 30 50 0

4 8 16 32 aggregate agility (%) 0 64 17 4 8 16 aggregate agility (%) 32 64 Results (Dynamic Objects) 200 200 node accesses a3DRB a3DR a3DR aHRB 150 node accesses aHRB 150 100 100 50 50

0 1 25 50 query length 75 0 100 18 a3DRB 1 3 5 query extent (%) 7 9 Conclusion We propose indexing techniques that replace the data cube in spatiotemporal data warehouses and answer ad-hoc group-by queries very efficiently. Both static and dynamic spatial dimensions are discussed. Extensions Cost models that predict the performance of alternative structures. Query optimization based on the cost models. Complex query evaluation 19