Hotspot Region Analysis
Based on User Trajectory Data
2016.6.26
Group 3
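Problem Description
With the spread of smartphones, people's location data has become easier to collect and is growing explosively.
Mining this large volume of spatio-temporal data can help people travel, for example by recommending popular scenic spots, restaurants, shops and supermarkets, or taxi pick-up locations.
This work focuses on mining hotspot regions.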
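Data Description
GeoLife GPS trajectory dataset from Microsoft Research Asia
Users: 178
Period: April 2007 to October 2011
Total distance: 1,292,951 km
Trajectories: 17,621
Sampling density: one point every 1~5 seconds or every 5~10 meters
Record: 39.980137,116.345113, 347,2008-10-26,16:09:35
(latitude, longitude, altitude, date, time)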
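As a rough illustration of the record layout above, the following sketch parses one record into a typed tuple. It assumes the five comma-separated fields exactly as shown on the slide; the names `GpsPoint` and `parse_record` are hypothetical and not part of the dataset or of any library.

```python
from datetime import datetime
from typing import NamedTuple


class GpsPoint(NamedTuple):
    lat: float        # latitude in decimal degrees
    lon: float        # longitude in decimal degrees
    altitude: float   # altitude as stored in the record
    time: datetime    # date and time fields combined


def parse_record(line: str) -> GpsPoint:
    """Parse one 'lat,lon,altitude,date,time' record (hypothetical helper)."""
    lat, lon, alt, date, clock = (field.strip() for field in line.split(","))
    stamp = datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S")
    return GpsPoint(float(lat), float(lon), float(alt), stamp)


point = parse_record("39.980137,116.345113, 347,2008-10-26,16:09:35")
print(point)
```

Combining date and time into a single `datetime` makes it straightforward to compute the time differences needed later for stay-point detection.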
Tools
• Python
• Sklearn
• RapidMiner
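Modeling Process
• Stay Point
Moving objects behave randomly, so stay points are extracted from each trajectory and used as its features.
A stay point is derived from a run of consecutive GPS points $P = \{p_m, p_{m+1}, \dots, p_n\}$ such that $\forall\, m < i \le n:\ Distance(p_m, p_i) \le D_{threh}$ and $p_n.T - p_m.T \ge T_{threh}$.
The stay point is $S = (Lat, Lngt, arvT, levT)$ with
$s.Lat = \sum_{i=m}^{n} p_i.Lat / |P|$, $s.Lngt = \sum_{i=m}^{n} p_i.Lngt / |P|$,
$s.arvT = p_m.T$, $s.levT = p_n.T$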
[Figure: a GPS trajectory p1, ..., p7 in which a run of consecutive points forms a stay point S. Each point pi is a record (Lati, Lngti, Ti) of latitude, longitude and time.]
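The stay-point definition can be sketched as a single pass over a trajectory. This is a minimal sketch, not the presenters' implementation: the haversine distance, the 6371 km Earth radius, the 200 m and 20 min thresholds, and the names `Point`, `StayPoint` and `detect_stay_points` are all assumptions made for illustration.

```python
import math
from typing import List, NamedTuple


class Point(NamedTuple):
    lat: float  # degrees
    lon: float  # degrees
    t: float    # timestamp in seconds


class StayPoint(NamedTuple):
    lat: float
    lon: float
    arv_t: float  # arrival time, p_m.T
    lev_t: float  # leaving time, p_n.T


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance in metres (mean Earth radius assumed to be 6371 km)."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371000.0 * math.asin(math.sqrt(h))


def detect_stay_points(track: List[Point], d_thresh: float, t_thresh: float) -> List[StayPoint]:
    stays = []
    m = 0
    while m < len(track):
        n = m
        # Extend the window while the next point stays within d_thresh of p_m.
        while n + 1 < len(track) and haversine_m(track[m], track[n + 1]) <= d_thresh:
            n += 1
        if track[n].t - track[m].t >= t_thresh:
            window = track[m:n + 1]
            stays.append(StayPoint(
                sum(p.lat for p in window) / len(window),  # mean latitude
                sum(p.lon for p in window) / len(window),  # mean longitude
                track[m].t,
                track[n].t,
            ))
            m = n + 1  # continue after the stay
        else:
            m += 1
    return stays


track = [
    Point(39.9800, 116.3450, 0),
    Point(39.9801, 116.3451, 600),
    Point(39.9800, 116.3452, 1200),
    Point(39.9801, 116.3450, 1800),
    Point(39.9900, 116.3600, 2100),  # the user has moved away
    Point(40.0000, 116.3800, 2400),
]
stays = detect_stay_points(track, d_thresh=200.0, t_thresh=20 * 60)
print(stays)
```

In this toy track the first four points lie within a few tens of metres of each other over 30 minutes, so they collapse into one stay point; the last two points are in transit and produce none.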
Demo
Distribution of the stay points
DBSCAN
• Density-based spatial clustering of applications with noise
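• DBSCAN is one of the most widely used density-based clustering algorithms. It filters out low-density regions and finds dense groups of sample points. Unlike K-means, it does not require the number of clusters to be specified in advance and can find clusters of arbitrary shape (K-means cannot find non-convex clusters).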
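Input:
Eps: neighborhood radius
MinPts: minimum number of points in the Eps-neighborhood of a point for it to be a core object
D: dataset
Output: set of clusters
Method:
repeat
  determine whether the input point is a core object
  find all points directly density-reachable within the Eps-neighborhood of the core object
until all input points have been examined
repeat
  for the directly density-reachable points in the Eps-neighborhood of each core object, find the maximal set of density-connected objects, merging density-reachable objects where necessary
until the Eps-neighborhoods of all core objects have been traversed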
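The two phases of the method can be sketched in plain Python as follows. This is an illustrative O(n²) version on toy planar points with Euclidean distance, not the Sklearn or RapidMiner implementation used in the project; counting a point as part of its own Eps-neighborhood is an assumption, and the function name `dbscan` and the toy data are hypothetical. For the stay points, a great-circle distance function could be passed as `dist` instead.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1


def dbscan(points: Sequence[Point], eps: float, min_pts: int,
           dist: Callable[[Point, Point], float] = math.dist) -> List[int]:
    """Return one cluster label per point; NOISE (-1) marks noise points."""
    n = len(points)
    # Phase 1: Eps-neighborhood of every point (the point itself included)
    # and the core-object test.
    neighbours = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbours]

    # Phase 2: grow one cluster from each core object not yet assigned.
    labels: List[Optional[int]] = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None or not is_core[i]:
            continue
        labels[i] = cluster
        frontier = [i]  # holds core objects only
        while frontier:
            q = frontier.pop()
            for j in neighbours[q]:
                if labels[j] is None:
                    labels[j] = cluster       # directly density-reachable from q
                    if is_core[j]:
                        frontier.append(j)    # keep expanding through core objects
        cluster += 1
    return [NOISE if lab is None else lab for lab in labels]


toy = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10),
       (50, 50)]
labels = dbscan(toy, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, -1]
```

Border points receive the label of the first cluster that reaches them but are never expanded, which is why only core objects enter `frontier`.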
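Distance Metric
• $C = \sin(MLat_A)\,\sin(MLat_B)\,\cos(MLon_A - MLon_B) + \cos(MLat_A)\,\cos(MLat_B)$
• $Distance = R \cdot \arccos(C) \cdot \pi / 180$
Here R is the Earth's radius. The first line is the spherical law of cosines when MLat is read as the colatitude (90° − latitude); the factor π/180 converts an arccos result given in degrees to radians.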
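A small sketch of this distance computation, under two assumptions the slide does not state: MLat is taken to be the colatitude (90° − latitude), and R is taken to be the mean Earth radius of 6371 km. The function name `great_circle_km` is hypothetical. Because `math.acos` already returns radians, the Pi/180 factor is not needed here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat_a: float, lon_a: float, lat_b: float, lon_b: float) -> float:
    """Spherical law of cosines, with inputs in decimal degrees."""
    mlat_a = math.radians(90.0 - lat_a)  # colatitude of A (assumed meaning of MLatA)
    mlat_b = math.radians(90.0 - lat_b)  # colatitude of B
    dlon = math.radians(lon_a - lon_b)
    c = (math.sin(mlat_a) * math.sin(mlat_b) * math.cos(dlon)
         + math.cos(mlat_a) * math.cos(mlat_b))
    c = max(-1.0, min(1.0, c))  # guard against rounding just outside [-1, 1]
    return EARTH_RADIUS_KM * math.acos(c)


quarter_equator = great_circle_km(0.0, 0.0, 0.0, 90.0)
same_point = great_circle_km(39.980137, 116.345113, 39.980137, 116.345113)
print(quarter_equator, same_point)
```

For points only a few metres apart the arccos form loses precision in floating point, so a haversine formulation may be a safer choice for densely sampled GPS data.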
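Model Evaluation Results
• There is no labeled data, so precision, recall and F-measure cannot be computed.
• Visual inspection of scatter plots
• Silhouette Coefficient: combines cohesion and separation, takes values in [-1, +1], and larger is better.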
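To make the silhouette coefficient concrete, here is a minimal pure-Python sketch on toy planar data. For each point, a is the mean distance to the other members of its own cluster (cohesion), b is the smallest mean distance to any other cluster (separation), and the score is (b − a) / max(a, b). The function name, the Euclidean default and the toy data are assumptions; noise points from DBSCAN would normally be filtered out before calling it.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def silhouette_coefficient(points: Sequence[Point], labels: Sequence[int],
                           dist: Callable[[Point, Point], float] = math.dist) -> float:
    """Mean silhouette value over all points."""
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a: cohesion, mean distance to the rest of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: separation, mean distance to the nearest other cluster
        b = min(sum(dist(points[i], points[j]) for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


toy = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_coefficient(toy, [0, 0, 1, 1])  # compact, well separated clusters
bad = silhouette_coefficient(toy, [0, 1, 0, 1])   # clusters that straddle the gap
print(good, bad)
```

The well-separated labeling scores close to +1 while the mixed labeling scores below 0, which matches the "larger is better" reading.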
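Summary
• Because it uses a single global Eps, DBSCAN performs poorly on data with uneven density
• DBSCAN can find clusters of arbitrary shape
• Clustering without labels is difficult to evaluate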
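References
• Yu Zheng, Lizhu Zhang, Xing Xie, Wei-Ying Ma. Mining Interesting Locations and Travel Sequences from GPS Trajectories.
• Zhang Mingyue. Recommendation of Pick-up Points and Hotspot Regions Based on Taxi Trajectories (in Chinese).
• Tang Zhibo, Jiang Xiaorong, Chen Wei. Analysis of GeoLife User Locations Based on the DBSCAN Algorithm (in Chinese).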
Q&A
THX
