Paper2_CSE6331_Vivek_1001053883

Extracting Dense Regions From
HurricaneTrajectory Data[1]
Praveen KumarTripathi, Madhuri Debnath, Ramez Elmasri
Dept. of Computer Science, University ofTexas at Arlington
Presented By:
Vivek Kumar Sharma
[1]”GeoRich’14 Proceedings ofWorkshop on
Managing and Mining Enriched Geo-Spatial Data”

Outline
• Introduction
• Trajectory Clustering on Point Data
• DBSCAN
• Dense Regions Extraction
• Experiments and Analysis
• Related Work
• Conclusion
• References

Introduction
• Weather Data:A good example of Spatial-temporal data(Time & Space)
• Clustering: Key technique to analyze storm trajectories
• Trajectories, Sub-Trajectories and Point Data
• (long, lat) as spatial attributes and (wind-speed, time) as similarity measure
• Monitoring & Predicting Weather conditions
• Identifying origin, destination of Hurricanes and most affected regions
• Hurricane dataset in Atlantic region from 1950-1999 with attributes longitude,
latitude, time, wind-speed, pressure and status

Trajectory Clustering on Point Data
• Consider a trajectory as an individual Point
• Points are not constrained to belong to its respective parent trajectory or
sub-trajectory
[1] http://weather.unisys.com/hurricane/atlantic/2012H/index.php

DBSCAN
• Two global parameters:
• Eps: Maximum radius of the neighborhood
• MinPts: Minimum number of points in an Eps-neighborhood of that point
• Density = number of points within a specified radius r (Eps)
• Core Object: object with at least MinPts objects within a radius ‘Eps-neighborhood’
• Border Object: object that on the border of a cluster
• A noise point is any point that is not a core point or a border point.

Background (I)
• NEps(p): {q belongs to D | dist(p, q) <= Eps}
• Directly density-reachable:
A point p is directly density-reachable from a point q wrt. Eps, MinPts if
• 1) p belongs to NEps(q)
• 2) |NEps (q)| >= MinPts
(core point condition) p
q
MinPts = 5
Eps = 1 cm

Background (II)
• Density-reachable:
• A point p is density-reachable
from a point q wrt. Eps, MinPts if
there is a chain of points p1, …,
pn, p1 = q, pn = p such that pi+1 is
directly density-reachable from
pi
• Density-connected:
• A point p is density-connected
to a point q wrt. Eps, MinPts if
there is a point o such that both,
p and q are density-reachable
from o wrt. Eps and MinPts.
p
q
p1
p q
o

DBSCAN Algorithm
• Arbitrary select a point p
• Retrieve all points density-reachable from p wrt Eps and MinPts.
• If p is a core point, a cluster is formed.
• If p is a border point, no points are density-reachable from p and DBSCAN visits the next
point of the database.
• Continue the process until all of the points have been processed.
Core
Border
Outlier
Eps = 1cm
MinPts = 5

Dense Region Extraction
• How to find regions of high storm activity
• DBSCAN parameter Eps which gives the neighborhood
• First neighborhood considering spatial attributes
• Then adding non-spatial measure wind-speed as well
• SpatialClustering
• Spatial and non spatial clustering
• Spatial temporal clustering

(Spatial Clustering)
• Latitude and Longitude as data points
• Haversine formula as similarity measure between two points
• a = sin(/2) + cos(φ1) × cos(φ2) × sin2(λ/2)
• c = 2 × arctan 2(pa, p(1 − a))
• d = R × c
• where (φ/latitude) = φ2 − φ1 and (λ/longitude) = λ2 − λ1, R = 6371Km is the radius of the
Earth, and d is the Haversine distance.
• This formula is used to calculate spherical distance between two points because of
spatial coordinates which are lying on spherical object(Earth)

(Spatial and non Spatial Clustering)
• Extending DBSCAN to handle non-spatial attribute
• If we have a data set D = {d1, d2, . . . , dn}, where di = (xi, yi, ai, bi) and (xi, yi) be the
spatial attribute & (ai, bi) as non-spatial attribute
• Similarity measure can be calculated as:
• dist_spatial (d1,d2) = √(x1 − x2)^2 + (y1 − y2)^2
• dist_nonspatial (d1,d2) = √(a1 − a2)^2 + (b1 − b2)^2
• Remaining steps for DBSCAN will remain same to calculate dense regions for
spatial and non-spatial attributes
• Finally composite neighborhood will include data points common in both

(SpatialTemporal Clustering)
• Consider Time as non-spatial attribute for analysis
• Hurricane data is sampled at 6 hour interval
• Haversine formula is being used to calculate similarity measure between spatial
attributes of two data points
• Manhattan distance for non-spatial attribute time is used
• Relative time framework instead of Absolute time considered which means relative
time from start of a particular trajectory
• Important to analyze movement patterns of hurricane
• Normalized time component(as per length of trajectory) varies between 0-1.0
• 0.00 signifies initial stage
• 0.5 signifies middle stage
• Higher values close to 1.0 signifies end of hurricane

Experiments and Analysis
• Implementation and Analysis being done on MATLAB
• Different combinations of Eps and MinPts has been tried to get well
separated and compact clusters
• Four different types of analysis being done:
• Analysing Spatial attributes
• QualitativeAnalysis of clusters
• Analysing combination of spatial and non-spatial attributes
• Storm starting and landing information

Experiment and Analysis
(Analysing Spatial Attributes)
• Eps= 35, MinPts= 10, k= 15

(Qualitative Analysis of Clusters)
• Clustering based on spatial and non-spatial attributes
• Wind-speed and Time
• Wind-speed threshold has been varied from 20-100 in the interval of 10
• Time/Temporal threshold from 0.2-1.0 in the interval of 0.1
• Quality measure needed to check the clustering results
• SSE= (Sum of the Square Error)
• numclus is the number of clusters, N is the set of noise points and Ci is the ith cluster
• Smaller the value of SSE, better will be the clustering results

(Combination of Spatial and non-spatial attributes)
• Studying the nature of storms
• Now neighboring points will be close to core data point spatially and will
have similar wind-speed values
• Reduced cluster size as they represents regions with same nature of storms
• As wind-speed being reduced cluster size also reduces

(Combination of Spatial and non-spatial attributes)
• SSE Results :
• Reduction in wind-speed results
compact and homogeneous
clusters.
• Spatial-Temporal Clustering:
• Temporal threshold reduced
from 1.0 to 0.2.

(Storm Starting and Landing Information)
• Storm Starting Clusters
• Considered starting three data points wrt first
three timestamps for each trajectory
• Storm Landing Clusters
• Considered last three data points

RelatedWork
• DENTRAC (DENsity basedTRAjectory Clustering)
• Post processing of clusters to get more domain specific knowledge
• Partition and Group based Clustering
• Using concept of MDL (Minimum Descriptive Length) identifies characteristic points
• Representing trajectory by connecting consecutive characteristics points
• Clustering algorithm using computational geometry and string processing
• Remove noise by preprocessing trajectory data and divide into sub-trajectories
• Slicing-STS-Miner using sequential patterns
• Spatial temporal pattern convoy being proposed
• Region based &Trajectory based Clustering
• Classifying trajectories using duration of trajectory as an important feature

Conclusion
• Analyzed hurricane trajectory data as point data
• Initial clustering based on spatial attributes
• Influence of non-spatial attribute i.e. wind-speed obtained
• Spatial temporal DBSCAN algorithm has been proposed in which
normalized time is also considered as non-spatial attribute
• Post-processing of clusters to get origin, landing and tracking information

References
• http://weather.unisys.com/hurrricane/atlantic/index.html.
• Data Mining: Concepts andTechniques, 2nd ed.Morgan Kaufmann, 2006.
• Derya Birant and Alp Kut. ST-DBSCAN:An algorithm for clustering spatial-temporal data. DataKnowl.
Eng., 60(1):208–221, January 2007.
• Chun-Sheng Chen, Christoph F. Eick, and NouhadJ. Rizk. Mining spatial trajectories using non-
parametric density functions. In Petra Perner, editor, Machine Learning and Data Mining in Pattern
Recognition,volume 6871 of Lecture Notes in Computer Science, pages 496–510. Springer Berlin
Heidelberg, 2011.
• Martin Ester, Hans peter Kriegel, Jorg S, and Xiaowei Xu. A density-based algorithm for discovering
clusters in large spatial databases with noise. pages 226–231.AAAI Press, 1996.
• Jae gil Lee and Jiawei Han.Trajectory clustering:A partition-and-group framework. In In SIGMOD,
pages 593–604, 2007.

References
• Joachim Gudmundsson,AndreasThom, and JanVahrenhold.Of motifs and goals: mining trajectory data. In
Proceedings of the 20th InternationalConference on Advances in Geographic InformationSystems,
SIGSPATIAL ’12, pages 129–138, NewYork, NY, USA, 2012.ACM.
• Yan Huang, Liqin Zhang, and Pusheng Zhang.A framework for mining sequential patterns from spatio-
temporal event data sets. IEEETransactions on Knowledge and Data Engineering, 20(4):433–448, 2008.
• Hoyoung Jeung, Man LungYiu, Xiaofang Zhou,Christian S. Jensen, and HengTao Shen. Discovery of
convoys in trajectory databases. Proc.VLDB Endow.,1(1):1068–1080,August 2008.
• Jae-Gil Lee, Jiawei Han, Xiaolei Li, and Hector Gonzalez.Traclass:Trajectory classification using hierarchical
region-based and trajectory-based clustering. Proc.VLDB Endow., 1(1):1081–1094,August 2008.
• Dhaval Patel, Chang Sheng,Wynne Hsu, and Mong Li Lee. Incorporating duration information for trajectory
classification. In Proceedings of the 2012 IEEE 28th InternationalConference on Data Engineering, ICDE ’12,
pages 1132–1143, 2012.
• MarcosVieira, Petko Bakalov, andVassilisTsotras.On-line discovery of flock patterns in spatio-temporal
data, 2009.

Paper2_CSE6331_Vivek_1001053883

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Paper2_CSE6331_Vivek_1001053883

Similar to Paper2_CSE6331_Vivek_1001053883 (20)

Paper2_CSE6331_Vivek_1001053883