Hung Tien Tran, Hiep Tuan Nguyen, Viet-Trung Tran
Hanoi University of Science and Technology
Introduction
 What is Geographically Weighted Regression?
 What is our work?
Source: http://desktop.arcgis.com
GWR + Spark =
- Large-scale spatial data
- Improved performance
- Distributed computation
Outline
 Background
 Problem
 Scalable GWR on Spark
 Experiments
 Discussion
 Conclusion
Background
 First Law of Geography (Waldo Tobler):
“Everything is related to everything else, but near things are more related than distant things.”
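 GWR model
$$y_i(u) = \beta_{0i}(u) + \beta_{1i}(u) x_{1i} + \beta_{2i}(u) x_{2i} + \dots + \beta_{mi}(u) x_{mi}$$
 The weighted least-squares estimator takes the form
$$\hat{\beta}(u) = (X^T W(u) X)^{-1} X^T W(u) Y$$
where W(u) is the diagonal matrix of kernel weights at location u.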
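The estimator above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' Spark code: the function names `solve_linear_system` and `local_wls` are hypothetical, an intercept column is prepended to the features, and the weights are passed in as a list (the diagonal of W(u)).

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def local_wls(features, labels, weights):
    """Return beta = (X^T W X)^-1 X^T W Y for one location.

    features: list of feature lists, labels: list of floats,
    weights: diagonal of W(u), one weight per observation.
    """
    rows = [[1.0] + list(f) for f in features]  # prepend the intercept
    p = len(rows[0])
    xtwx = [[0.0] * p for _ in range(p)]
    xtwy = [0.0] * p
    for row, y, w in zip(rows, labels, weights):
        for i in range(p):
            xtwy[i] += w * row[i] * y
            for j in range(p):
                xtwx[i][j] += w * row[i] * row[j]
    return solve_linear_system(xtwx, xtwy)
```

Solving the normal equations directly avoids forming an explicit inverse; for data that is exactly linear, the result does not depend on the weights.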
Background
 Kernel function
 Gaussian function
 Bandwidth
Figure: fixed bandwidth vs. adaptive bandwidth
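To make the kernel and bandwidth ideas concrete, here is a minimal sketch. It assumes one common form of the Gaussian kernel, exp(-0.5 (d/h)^2), planar Euclidean distance, and an adaptive bandwidth defined as the distance to the k-th nearest point; the slides do not state which variants the authors use, and all function names are hypothetical.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel weight for a point at `distance` (assumed form)."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def euclidean(p, q):
    """Planar distance between two (x, y) coordinates."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def fixed_weights(reg_point, points, bandwidth):
    """Fixed bandwidth: the same bandwidth at every regression point."""
    return [gaussian_weight(euclidean(reg_point, p), bandwidth) for p in points]


def adaptive_weights(reg_point, points, k):
    """Adaptive bandwidth: use the distance to the k-th nearest point."""
    distances = [euclidean(reg_point, p) for p in points]
    bandwidth = sorted(distances)[k - 1]
    if bandwidth == 0.0:
        raise ValueError("k-th nearest point coincides with the regression point")
    return [gaussian_weight(d, bandwidth) for d in distances]
```

With a fixed bandwidth, sparse regions contribute few effective observations; the adaptive variant widens the kernel there so that roughly k points always carry substantial weight.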
Problem
 Estimating a local model
Choose a kernel function, then solve
$$\hat{\beta}(u) = (X^T W(u) X)^{-1} X^T W(u) Y$$
 Bandwidth selection
Which bandwidth is good?
 Model evaluation
Complexity: O(n^3)
Source: http://rose.bris.ac.uk
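One common way to answer "which bandwidth is good" is leave-one-out cross-validation: for each candidate bandwidth, predict every observation from a local model fitted without it, and keep the bandwidth with the smallest squared error. The slides do not say which criterion the authors use, so the sketch below is an assumption. It is restricted to a single feature plus intercept so the weighted fit has a closed form, and the names `wls_line`, `cv_score`, and `select_bandwidth` are hypothetical.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel weight (assumed form exp(-0.5 (d/h)^2))."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def wls_line(xs, ys, ws):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(ws)
    swx = sum(w * x for w, x in zip(ws, xs))
    swy = sum(w * y for w, y in zip(ws, ys))
    swxx = sum(w * x * x for w, x in zip(ws, xs))
    swxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = sw * swxx - swx * swx
    if abs(det) < 1e-12:
        raise ValueError("singular local fit")
    slope = (sw * swxy - swx * swy) / det
    return (swy - slope * swx) / sw, slope


def cv_score(coords, xs, ys, bandwidth):
    """Leave-one-out sum of squared prediction errors for one bandwidth."""
    total = 0.0
    for i, (ui, vi) in enumerate(coords):
        keep = [j for j in range(len(ys)) if j != i]
        ws = [
            gaussian_weight(math.hypot(coords[j][0] - ui, coords[j][1] - vi), bandwidth)
            for j in keep
        ]
        intercept, slope = wls_line([xs[j] for j in keep], [ys[j] for j in keep], ws)
        total += (ys[i] - (intercept + slope * xs[i])) ** 2
    return total


def select_bandwidth(coords, xs, ys, candidates):
    """Return the candidate bandwidth with the lowest cross-validation score."""
    return min(candidates, key=lambda h: cv_score(coords, xs, ys, h))
```

Each score requires one local fit per observation, which is part of why bandwidth selection dominates the running time on large datasets.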
Problem
 How to apply the model to large-scale data?
 Data points
 Features
 Regression points
Large-Scale GWR on Spark
 Why Spark?
 In-memory cluster-computing platform
 Supports parallel programming
 Develop applications with high-level APIs
 Provides resilient distributed datasets and parallel operations
 Integrates with other Spark components
Large-Scale GWR on Spark
 We propose three approaches to scaling GWR
 Scaling Weighted Linear Regression
 Parallel Multiple WLR models
 Parallel Geographically Weighted Regression (combines the first two approaches)
Scalable GWR on Spark
 Naïve approach – Scaling Weighted Linear Regression
For each regPoint:
    Compute weights (in parallel)
    Fit a weighted linear regression (WLR) model (in parallel)
Summarize the models
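The control flow of the naïve approach can be sketched as follows. This is a single-machine illustration, not the authors' Spark implementation: the outer loop over regression points is sequential, and the two inner steps are the ones that would run as distributed operations on a cluster. It assumes one feature plus intercept and a Gaussian kernel; `naive_gwr` and its helpers are hypothetical names.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel weight (assumed form exp(-0.5 (d/h)^2))."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def wls_line(xs, ys, ws):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(ws)
    swx = sum(w * x for w, x in zip(ws, xs))
    swy = sum(w * y for w, y in zip(ws, ys))
    swxx = sum(w * x * x for w, x in zip(ws, xs))
    swxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = sw * swxx - swx * swx
    slope = (sw * swxy - swx * swy) / det
    return (swy - slope * swx) / sw, slope


def naive_gwr(reg_points, train_coords, xs, ys, bandwidth):
    """Fit one WLR model per regression point, one point after another."""
    models = {}
    for reg_point in reg_points:  # sequential outer loop
        # Step 1: one weight per training point (data-parallel on a cluster).
        weights = [
            gaussian_weight(math.hypot(reg_point[0] - c[0], reg_point[1] - c[1]), bandwidth)
            for c in train_coords
        ]
        # Step 2: fit the weighted model (the distributed fit on a cluster).
        models[reg_point] = wls_line(xs, ys, weights)
    # Summary: regression point -> (intercept, slope).
    return models
```

Because every regression point triggers its own pass over the training data, the cost grows linearly with the number of regression points even when each pass is parallel.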
Scalable GWR on Spark
 Parallel Multiple WLR models
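Figure: the regression dataset and the training dataset are the inputs; weights are computed for each regression point, multiple WLR models are fitted in parallel, and the results are collected into a summary.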
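A rough single-machine analogue of this second approach: the training data is shared with every worker (much like a broadcast variable), and each worker fits the whole WLR model for one regression point. The thread pool below only stands in for Spark executors to show the structure; Python threads do not give real CPU parallelism for this kind of code. One feature plus intercept and a Gaussian kernel are assumed, and all names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel weight (assumed form exp(-0.5 (d/h)^2))."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def fit_at(reg_point, train, bandwidth):
    """Fit y = intercept + slope * x at one regression point.

    train: list of ((u, v), x, y) tuples shared by all workers.
    """
    sw = swx = swy = swxx = swxy = 0.0
    for (u, v), x, y in train:
        w = gaussian_weight(math.hypot(reg_point[0] - u, reg_point[1] - v), bandwidth)
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    det = sw * swxx - swx * swx
    slope = (sw * swxy - swx * swy) / det
    return (swy - slope * swx) / sw, slope


def parallel_wlr_models(reg_points, train, bandwidth, workers=4):
    """Fit one local model per regression point, one task per point."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        fits = pool.map(lambda r: fit_at(r, train, bandwidth), reg_points)
        # Summary: regression point -> (intercept, slope).
        return dict(zip(reg_points, fits))
```

This layout suits the case of many regression points and a training set small enough to fit on every worker.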
Scalable GWR on Spark
 Parallel Geographically Weighted Regression
Figure: the regression dataset (R) and the training dataset (T) are combined into a dataset of (R, T) pairs, which is the input to the distributed GWR computation.
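One plausible reading of the combined dataset, sketched in plain Python rather than Spark: pair every regression point with every training point, map each pair to its contribution to the weighted sums, and reduce the contributions by regression point (the role `reduceByKey` would play on a cluster). This is an assumption about the design, not the authors' confirmed implementation; it uses one feature plus intercept, a Gaussian kernel, and the hypothetical name `parallel_gwr`.

```python
import math
from collections import defaultdict
from itertools import product


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel weight (assumed form exp(-0.5 (d/h)^2))."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def parallel_gwr(reg_points, train, bandwidth):
    """Fit y = intercept + slope * x at every regression point.

    train: list of ((u, v), x, y) tuples.
    """
    # "Combine dataset": every (regression point, training point) pair.
    combined = product(enumerate(reg_points), train)

    # Map each pair to its weighted terms and sum them per regression
    # point id: [sum w, sum wx, sum wy, sum wxx, sum wxy].
    sums = defaultdict(lambda: [0.0] * 5)
    for (rid, r), ((u, v), x, y) in combined:
        w = gaussian_weight(math.hypot(r[0] - u, r[1] - v), bandwidth)
        acc = sums[rid]
        for k, term in enumerate((w, w * x, w * y, w * x * x, w * x * y)):
            acc[k] += term

    # Solve the small normal-equation system for each regression point.
    models = {}
    for rid, (sw, swx, swy, swxx, swxy) in sums.items():
        det = sw * swxx - swx * swx
        slope = (sw * swxy - swx * swy) / det
        models[reg_points[rid]] = ((swy - slope * swx) / sw, slope)
    return models
```

Because the per-pair contributions are simply added, the pairs can be processed in any order and on any partition, so both the regression points and the training points are spread across the cluster.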
Experiments
 Environment
 Cluster: 8 nodes on Amazon Web Services
 4 cores, Intel Xeon E5-2670 v2, 2.5 GHz
 16 GB RAM, 2x40 GB SSD
 Hadoop 2.7.2 and Spark 1.6.1
 Dataset
 |-- x: double (nullable = false)
 |-- y: double (nullable = false)
 |-- label: double (nullable = false)
 |-- features: vector (nullable = false)
Experiments
 Testing large training dataset
Figure: running time (sec) vs. number of training points (10,000; 100,000; 1,000,000; 2,000,000; 5,000,000) for Algorithms 1 to 4; time axis from 0 to 1200 sec.
Experiments
 Testing large regression dataset
Figure: running time (sec) vs. number of regression points (1,000; 5,000; 10,000; 20,000; 50,000) for Algorithms 1 to 4; time axis from 0 to 1200 sec.
Experiments
 Testing large dataset with increasing number of features
Figure: running time (sec) vs. number of features (10, 20, 50, 100, 200) for Algorithms 1 to 4; time axis from 0 to 1800 sec.
Experiments
 Cluster
Figure: running time (sec) vs. number of nodes (2, 4, 8) for Algorithms 1 to 4; time axis from 0 to 2000 sec.
Discussion
 Related work
 Many GWR libraries run on a single machine
 spgwr (multiR on GRID)
 GPU-based implementations
 Our work
 First study of distributed GWR on Spark
 Easy to deploy and benefits from the advantages of Spark
 Scalable and works well on a cluster
Conclusion
 We have
 Proposed three approaches
 Implemented four algorithms based on Spark
 Evaluated our implementation
 Future work
 Improve performance by using Pipeline and Partitions
 Release as an open-source library
Large-Scale Geographically Weighted Regression on Spark

More Related Content

What's hot

TYBSC IT PGIS Unit III Chapter I Spatial Referencing and Positioning
TYBSC IT PGIS Unit III Chapter I Spatial Referencing and PositioningTYBSC IT PGIS Unit III Chapter I Spatial Referencing and Positioning
TYBSC IT PGIS Unit III Chapter I Spatial Referencing and Positioning
Arti Parab Academics
 
GIS data structure
GIS data structureGIS data structure
GIS data structure
Thana Chirapiwat
 
GIS data analysis
GIS data analysisGIS data analysis
GIS data analysis
Arindam Sarkar
 
Gis applications
Gis applicationsGis applications
Gis applications
Kisesa Hamis
 
A Journey to the World of GIS
A Journey to the World of GISA Journey to the World of GIS
A Journey to the World of GIS
Nishant Sinha
 
Introduction of gps global navigation satellite systems
Introduction of gps   global navigation satellite systems Introduction of gps   global navigation satellite systems
Introduction of gps global navigation satellite systems DocumentStory
 
Spatial analysis and modeling
Spatial analysis and modelingSpatial analysis and modeling
Spatial analysis and modelingTolasa_F
 
Geographical information system
Geographical information systemGeographical information system
Geographical information system
Bipin Karki
 
Geospatial machine learning for urban development
Geospatial machine learning for urban developmentGeospatial machine learning for urban development
Geospatial machine learning for urban development
MLconf
 
GEOSTATISTICAL_ANALYST
GEOSTATISTICAL_ANALYSTGEOSTATISTICAL_ANALYST
GEOSTATISTICAL_ANALYST
Putu Santikayasa
 
WEB GIS AND WEB MAP.pptx
WEB GIS AND WEB MAP.pptxWEB GIS AND WEB MAP.pptx
WEB GIS AND WEB MAP.pptx
Asim Pt
 
GIS Data Quality
GIS Data QualityGIS Data Quality
GIS Data Quality
Dr. Zahir Ali
 
spatial databases ADBMS ppt
spatial databases ADBMS pptspatial databases ADBMS ppt
spatial databases ADBMS ppt
RitaThakkar1
 
Remote Sensing ppt
Remote Sensing pptRemote Sensing ppt
Remote Sensing ppt
Ravina Dadhich
 
History of GIS
History of GISHistory of GIS
History of GIS
Walter Simonazzi
 
geo spatial data and its types.pptx
geo spatial data and its types.pptxgeo spatial data and its types.pptx
geo spatial data and its types.pptx
lovezalodhi
 

What's hot (20)

Introduction to GIS
Introduction to GISIntroduction to GIS
Introduction to GIS
 
TYBSC IT PGIS Unit III Chapter I Spatial Referencing and Positioning
TYBSC IT PGIS Unit III Chapter I Spatial Referencing and PositioningTYBSC IT PGIS Unit III Chapter I Spatial Referencing and Positioning
TYBSC IT PGIS Unit III Chapter I Spatial Referencing and Positioning
 
70.mobile gis
70.mobile gis70.mobile gis
70.mobile gis
 
GIS data structure
GIS data structureGIS data structure
GIS data structure
 
GIS data analysis
GIS data analysisGIS data analysis
GIS data analysis
 
Web GIS
Web GISWeb GIS
Web GIS
 
Introduction to gis
Introduction to gisIntroduction to gis
Introduction to gis
 
Gis applications
Gis applicationsGis applications
Gis applications
 
A Journey to the World of GIS
A Journey to the World of GISA Journey to the World of GIS
A Journey to the World of GIS
 
Introduction of gps global navigation satellite systems
Introduction of gps   global navigation satellite systems Introduction of gps   global navigation satellite systems
Introduction of gps global navigation satellite systems
 
Spatial analysis and modeling
Spatial analysis and modelingSpatial analysis and modeling
Spatial analysis and modeling
 
Geographical information system
Geographical information systemGeographical information system
Geographical information system
 
Geospatial machine learning for urban development
Geospatial machine learning for urban developmentGeospatial machine learning for urban development
Geospatial machine learning for urban development
 
GEOSTATISTICAL_ANALYST
GEOSTATISTICAL_ANALYSTGEOSTATISTICAL_ANALYST
GEOSTATISTICAL_ANALYST
 
WEB GIS AND WEB MAP.pptx
WEB GIS AND WEB MAP.pptxWEB GIS AND WEB MAP.pptx
WEB GIS AND WEB MAP.pptx
 
GIS Data Quality
GIS Data QualityGIS Data Quality
GIS Data Quality
 
spatial databases ADBMS ppt
spatial databases ADBMS pptspatial databases ADBMS ppt
spatial databases ADBMS ppt
 
Remote Sensing ppt
Remote Sensing pptRemote Sensing ppt
Remote Sensing ppt
 
History of GIS
History of GISHistory of GIS
History of GIS
 
geo spatial data and its types.pptx
geo spatial data and its types.pptxgeo spatial data and its types.pptx
geo spatial data and its types.pptx
 

Viewers also liked

giasan.vn real-estate analytics: a Vietnam case study
giasan.vn real-estate analytics: a Vietnam case studygiasan.vn real-estate analytics: a Vietnam case study
giasan.vn real-estate analytics: a Vietnam case study
Viet-Trung TRAN
 
Exploring housing patterns and dynamics in low demand neighbourhoods using Ge...
Exploring housing patterns and dynamics in low demand neighbourhoods using Ge...Exploring housing patterns and dynamics in low demand neighbourhoods using Ge...
Exploring housing patterns and dynamics in low demand neighbourhoods using Ge...
Graham Squires
 
Time Series
Time SeriesTime Series
Time Series
STATISTIKA ITS
 
A Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkA Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural Network
Viet-Trung TRAN
 
Neural Networks for OCR
Neural Networks for OCRNeural Networks for OCR
Neural Networks for OCR
David Stark
 
OCR processing with deep learning: Apply to Vietnamese documents
OCR processing with deep learning: Apply to Vietnamese documents OCR processing with deep learning: Apply to Vietnamese documents
OCR processing with deep learning: Apply to Vietnamese documents
Viet-Trung TRAN
 
Giasan.vn @rstars
Giasan.vn @rstarsGiasan.vn @rstars
Giasan.vn @rstars
Viet-Trung TRAN
 
From decision trees to random forests
From decision trees to random forestsFrom decision trees to random forests
From decision trees to random forests
Viet-Trung TRAN
 
success factors for project proposals
success factors for project proposalssuccess factors for project proposals
success factors for project proposals
Viet-Trung TRAN
 
Recent progress on distributing deep learning
Recent progress on distributing deep learningRecent progress on distributing deep learning
Recent progress on distributing deep learning
Viet-Trung TRAN
 
Deep Learning Class #3 - Take Two LSTMs
Deep Learning Class #3 - Take Two LSTMsDeep Learning Class #3 - Take Two LSTMs
Deep Learning Class #3 - Take Two LSTMs
Holberton School
 
3 - Finding similar items
3 - Finding similar items3 - Finding similar items
3 - Finding similar items
Viet-Trung TRAN
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filtering
Viet-Trung TRAN
 
Tamil OCR using Tesseract OCR Engine
Tamil OCR using Tesseract OCR EngineTamil OCR using Tesseract OCR Engine
Tamil OCR using Tesseract OCR Engine
balamurugan.k Kalibalamurugan
 
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
Viet-Trung TRAN
 
ABC ELP Program - Innovation in government
ABC ELP Program - Innovation in governmentABC ELP Program - Innovation in government
ABC ELP Program - Innovation in government
Anne-Marie Elias
 
Video Encoding and HTML5 Playback With Native DRM
Video Encoding and HTML5 Playback With Native DRMVideo Encoding and HTML5 Playback With Native DRM
Video Encoding and HTML5 Playback With Native DRM
Stefan Lederer
 
"Year of the Selfie" [INFOGRAPHIC]
"Year of the Selfie" [INFOGRAPHIC]"Year of the Selfie" [INFOGRAPHIC]
"Year of the Selfie" [INFOGRAPHIC]
Unmetric
 
Living Wall - Arabic
Living Wall - ArabicLiving Wall - Arabic
Living Wall - Arabic
Yousef Taibeh
 

Viewers also liked (20)

giasan.vn real-estate analytics: a Vietnam case study
giasan.vn real-estate analytics: a Vietnam case studygiasan.vn real-estate analytics: a Vietnam case study
giasan.vn real-estate analytics: a Vietnam case study
 
Exploring housing patterns and dynamics in low demand neighbourhoods using Ge...
Exploring housing patterns and dynamics in low demand neighbourhoods using Ge...Exploring housing patterns and dynamics in low demand neighbourhoods using Ge...
Exploring housing patterns and dynamics in low demand neighbourhoods using Ge...
 
Time Series
Time SeriesTime Series
Time Series
 
Riset Sosial
Riset SosialRiset Sosial
Riset Sosial
 
A Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkA Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural Network
 
Neural Networks for OCR
Neural Networks for OCRNeural Networks for OCR
Neural Networks for OCR
 
OCR processing with deep learning: Apply to Vietnamese documents
OCR processing with deep learning: Apply to Vietnamese documents OCR processing with deep learning: Apply to Vietnamese documents
OCR processing with deep learning: Apply to Vietnamese documents
 
Giasan.vn @rstars
Giasan.vn @rstarsGiasan.vn @rstars
Giasan.vn @rstars
 
From decision trees to random forests
From decision trees to random forestsFrom decision trees to random forests
From decision trees to random forests
 
success factors for project proposals
success factors for project proposalssuccess factors for project proposals
success factors for project proposals
 
Recent progress on distributing deep learning
Recent progress on distributing deep learningRecent progress on distributing deep learning
Recent progress on distributing deep learning
 
Deep Learning Class #3 - Take Two LSTMs
Deep Learning Class #3 - Take Two LSTMsDeep Learning Class #3 - Take Two LSTMs
Deep Learning Class #3 - Take Two LSTMs
 
3 - Finding similar items
3 - Finding similar items3 - Finding similar items
3 - Finding similar items
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filtering
 
Tamil OCR using Tesseract OCR Engine
Tamil OCR using Tesseract OCR EngineTamil OCR using Tesseract OCR Engine
Tamil OCR using Tesseract OCR Engine
 
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
Paper@Soict2015: GPSInsights: towards a scalable framework for mining massive...
 
ABC ELP Program - Innovation in government
ABC ELP Program - Innovation in governmentABC ELP Program - Innovation in government
ABC ELP Program - Innovation in government
 
Video Encoding and HTML5 Playback With Native DRM
Video Encoding and HTML5 Playback With Native DRMVideo Encoding and HTML5 Playback With Native DRM
Video Encoding and HTML5 Playback With Native DRM
 
"Year of the Selfie" [INFOGRAPHIC]
"Year of the Selfie" [INFOGRAPHIC]"Year of the Selfie" [INFOGRAPHIC]
"Year of the Selfie" [INFOGRAPHIC]
 
Living Wall - Arabic
Living Wall - ArabicLiving Wall - Arabic
Living Wall - Arabic
 

Similar to Large-Scale Geographically Weighted Regression on Spark

Processing Reachability Queries with Realistic Constraints on Massive Network...
Processing Reachability Queries with Realistic Constraints on Massive Network...Processing Reachability Queries with Realistic Constraints on Massive Network...
Processing Reachability Queries with Realistic Constraints on Massive Network...
BigMine
 
Targeting GPUs using OpenMP Directives on Summit with GenASiS: A Simple and...
Targeting GPUs using OpenMP  Directives on Summit with  GenASiS: A Simple and...Targeting GPUs using OpenMP  Directives on Summit with  GenASiS: A Simple and...
Targeting GPUs using OpenMP Directives on Summit with GenASiS: A Simple and...
Ganesan Narayanasamy
 
Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...
Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...
Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...
Sangmin Park
 
Introduction to Chainer Chemistry
Introduction to Chainer ChemistryIntroduction to Chainer Chemistry
Introduction to Chainer Chemistry
Preferred Networks
 
Svm map reduce_slides
Svm map reduce_slidesSvm map reduce_slides
Svm map reduce_slides
Sara Asher
 
How to Layer a Directed Acyclic Graph (GD 2001)
How to Layer a Directed Acyclic Graph (GD 2001)How to Layer a Directed Acyclic Graph (GD 2001)
How to Layer a Directed Acyclic Graph (GD 2001)
Nikola S. Nikolov
 
"An adaptive modular approach to the mining of sensor network ...
"An adaptive modular approach to the mining of sensor network ..."An adaptive modular approach to the mining of sensor network ...
"An adaptive modular approach to the mining of sensor network ...butest
 
EAGE_prsentation_Anderson.pptx
EAGE_prsentation_Anderson.pptxEAGE_prsentation_Anderson.pptx
Multi-Layer Perceptrons
Multi-Layer PerceptronsMulti-Layer Perceptrons
Multi-Layer PerceptronsESCOM
 
PAC Bayesian for Deep Learning
PAC Bayesian for Deep LearningPAC Bayesian for Deep Learning
PAC Bayesian for Deep Learning
Mark Chang
 
Hyperparameter optimization with approximate gradient
Hyperparameter optimization with approximate gradientHyperparameter optimization with approximate gradient
Hyperparameter optimization with approximate gradient
Fabian Pedregosa
 
Paper.pdf
Paper.pdfPaper.pdf
Paper.pdf
DavCla1
 
Information in the Weights
Information in the WeightsInformation in the Weights
Information in the Weights
Mark Chang
 
Information in the Weights
Information in the WeightsInformation in the Weights
Information in the Weights
Mark Chang
 
Fast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning MachineFast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning Machine
Soma Boubou
 
Implementation of the fully adaptive radar framework: Practical limitations
Implementation of the fully adaptive radar framework: Practical limitationsImplementation of the fully adaptive radar framework: Practical limitations
Implementation of the fully adaptive radar framework: Practical limitations
Luis Úbeda Medina
 
A comparison-of-first-and-second-order-training-algorithms-for-artificial-neu...
A comparison-of-first-and-second-order-training-algorithms-for-artificial-neu...A comparison-of-first-and-second-order-training-algorithms-for-artificial-neu...
A comparison-of-first-and-second-order-training-algorithms-for-artificial-neu...Cemal Ardil
 
Machine Learning meets DevOps
Machine Learning meets DevOpsMachine Learning meets DevOps
Machine Learning meets DevOps
Pooyan Jamshidi
 
Imecs2012 pp440 445
Imecs2012 pp440 445Imecs2012 pp440 445
Imecs2012 pp440 445Rasha Orban
 
User biglm
User biglmUser biglm
User biglm
johnatan pladott
 

Similar to Large-Scale Geographically Weighted Regression on Spark (20)

Processing Reachability Queries with Realistic Constraints on Massive Network...
Processing Reachability Queries with Realistic Constraints on Massive Network...Processing Reachability Queries with Realistic Constraints on Massive Network...
Processing Reachability Queries with Realistic Constraints on Massive Network...
 
Targeting GPUs using OpenMP Directives on Summit with GenASiS: A Simple and...
Targeting GPUs using OpenMP  Directives on Summit with  GenASiS: A Simple and...Targeting GPUs using OpenMP  Directives on Summit with  GenASiS: A Simple and...
Targeting GPUs using OpenMP Directives on Summit with GenASiS: A Simple and...
 
Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...
Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...
Griffin: Grouping Suspicious Memory-Access Patterns to Improve Understanding...
 
Introduction to Chainer Chemistry
Introduction to Chainer ChemistryIntroduction to Chainer Chemistry
Introduction to Chainer Chemistry
 
Svm map reduce_slides
Svm map reduce_slidesSvm map reduce_slides
Svm map reduce_slides
 
How to Layer a Directed Acyclic Graph (GD 2001)
How to Layer a Directed Acyclic Graph (GD 2001)How to Layer a Directed Acyclic Graph (GD 2001)
How to Layer a Directed Acyclic Graph (GD 2001)
 
"An adaptive modular approach to the mining of sensor network ...
"An adaptive modular approach to the mining of sensor network ..."An adaptive modular approach to the mining of sensor network ...
"An adaptive modular approach to the mining of sensor network ...
 
EAGE_prsentation_Anderson.pptx
EAGE_prsentation_Anderson.pptxEAGE_prsentation_Anderson.pptx
EAGE_prsentation_Anderson.pptx
 
Multi-Layer Perceptrons
Multi-Layer PerceptronsMulti-Layer Perceptrons
Multi-Layer Perceptrons
 
PAC Bayesian for Deep Learning
PAC Bayesian for Deep LearningPAC Bayesian for Deep Learning
PAC Bayesian for Deep Learning
 
Hyperparameter optimization with approximate gradient
Hyperparameter optimization with approximate gradientHyperparameter optimization with approximate gradient
Hyperparameter optimization with approximate gradient
 
Paper.pdf
Paper.pdfPaper.pdf
Paper.pdf
 
Information in the Weights
Information in the WeightsInformation in the Weights
Information in the Weights
 
Information in the Weights
Information in the WeightsInformation in the Weights
Information in the Weights
 
Fast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning MachineFast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning Machine
 
Implementation of the fully adaptive radar framework: Practical limitations
Implementation of the fully adaptive radar framework: Practical limitationsImplementation of the fully adaptive radar framework: Practical limitations
Implementation of the fully adaptive radar framework: Practical limitations
 
A comparison-of-first-and-second-order-training-algorithms-for-artificial-neu...
A comparison-of-first-and-second-order-training-algorithms-for-artificial-neu...A comparison-of-first-and-second-order-training-algorithms-for-artificial-neu...
A comparison-of-first-and-second-order-training-algorithms-for-artificial-neu...
 
Machine Learning meets DevOps
Machine Learning meets DevOpsMachine Learning meets DevOps
Machine Learning meets DevOps
 
Imecs2012 pp440 445
Imecs2012 pp440 445Imecs2012 pp440 445
Imecs2012 pp440 445
 
User biglm
User biglmUser biglm
User biglm
 

More from Viet-Trung TRAN

Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
Viet-Trung TRAN
 
Dynamo: Amazon’s Highly Available Key-value Store
Dynamo: Amazon’s Highly Available Key-value StoreDynamo: Amazon’s Highly Available Key-value Store
Dynamo: Amazon’s Highly Available Key-value Store
Viet-Trung TRAN
 
Pregel: Hệ thống xử lý đồ thị lớn
Pregel: Hệ thống xử lý đồ thị lớnPregel: Hệ thống xử lý đồ thị lớn
Pregel: Hệ thống xử lý đồ thị lớn
Viet-Trung TRAN
 
Mapreduce simplified-data-processing
Mapreduce simplified-data-processingMapreduce simplified-data-processing
Mapreduce simplified-data-processing
Viet-Trung TRAN
 
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của FacebookTìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
Viet-Trung TRAN
 
A Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkA Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural Network
Viet-Trung TRAN
 
GPSinsights poster
GPSinsights posterGPSinsights poster
GPSinsights poster
Viet-Trung TRAN
 
Deep learning for nlp
Deep learning for nlpDeep learning for nlp
Deep learning for nlp
Viet-Trung TRAN
 
Introduction to BigData @TCTK2015
Introduction to BigData @TCTK2015Introduction to BigData @TCTK2015
Introduction to BigData @TCTK2015
Viet-Trung TRAN
 
From neural networks to deep learning
From neural networks to deep learningFrom neural networks to deep learning
From neural networks to deep learning
Viet-Trung TRAN
 
Dimensionality reduction: SVD and its applications
Dimensionality reduction: SVD and its applicationsDimensionality reduction: SVD and its applications
Dimensionality reduction: SVD and its applications
Viet-Trung TRAN
 
Introduction to mining massive datasets
Introduction to mining massive datasetsIntroduction to mining massive datasets
Introduction to mining massive datasets
Viet-Trung TRAN
 
6 clustering
6 clustering6 clustering
6 clustering
Viet-Trung TRAN
 
2 association rules
2 association rules2 association rules
2 association rules
Viet-Trung TRAN
 
Tachyon memory centric, fault tolerance storage for cluster framworks
Tachyon  memory centric, fault tolerance storage for cluster framworksTachyon  memory centric, fault tolerance storage for cluster framworks
Tachyon memory centric, fault tolerance storage for cluster framworks
Viet-Trung TRAN
 
Interactive big data analytics
Interactive big data analyticsInteractive big data analytics
Interactive big data analytics
Viet-Trung TRAN
 
Hệ thống phân tích tình trạng giao thông: Ứng dụng công cụ xử lý dữ liệu lớn...
Hệ thống phân tích tình trạng giao thông:  Ứng dụng công cụ xử lý dữ liệu lớn...Hệ thống phân tích tình trạng giao thông:  Ứng dụng công cụ xử lý dữ liệu lớn...
Hệ thống phân tích tình trạng giao thông: Ứng dụng công cụ xử lý dữ liệu lớn...
Viet-Trung TRAN
 
Nosql data models
Nosql data modelsNosql data models
Nosql data models
Viet-Trung TRAN
 
Overview of big data in cloud computing
Overview of big data in cloud computingOverview of big data in cloud computing
Overview of big data in cloud computing
Viet-Trung TRAN
 
Vanilla Hadoop vs. the rest
Vanilla Hadoop vs. the rest Vanilla Hadoop vs. the rest
Vanilla Hadoop vs. the rest
Viet-Trung TRAN
 

More from Viet-Trung TRAN (20)

Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
Bắt đầu tìm hiểu về dữ liệu lớn như thế nào - 2017
 
Dynamo: Amazon’s Highly Available Key-value Store
Dynamo: Amazon’s Highly Available Key-value StoreDynamo: Amazon’s Highly Available Key-value Store
Dynamo: Amazon’s Highly Available Key-value Store
 
Pregel: Hệ thống xử lý đồ thị lớn
Pregel: Hệ thống xử lý đồ thị lớnPregel: Hệ thống xử lý đồ thị lớn
Pregel: Hệ thống xử lý đồ thị lớn
 
Mapreduce simplified-data-processing
Mapreduce simplified-data-processingMapreduce simplified-data-processing
Mapreduce simplified-data-processing
 
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của FacebookTìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
Tìm kiếm needle trong Haystack: Hệ thống lưu trữ ảnh của Facebook
 
A Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkA Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural Network
 
GPSinsights poster
GPSinsights posterGPSinsights poster
GPSinsights poster
 
Deep learning for nlp
Deep learning for nlpDeep learning for nlp
Deep learning for nlp
 
Introduction to BigData @TCTK2015
Introduction to BigData @TCTK2015Introduction to BigData @TCTK2015
Introduction to BigData @TCTK2015
 
From neural networks to deep learning
From neural networks to deep learningFrom neural networks to deep learning
From neural networks to deep learning
 
Dimensionality reduction: SVD and its applications
Dimensionality reduction: SVD and its applicationsDimensionality reduction: SVD and its applications
Dimensionality reduction: SVD and its applications
 
Introduction to mining massive datasets
Introduction to mining massive datasetsIntroduction to mining massive datasets
Introduction to mining massive datasets
 
6 clustering
6 clustering6 clustering
6 clustering
 
2 association rules
2 association rules2 association rules
2 association rules
 
Tachyon memory centric, fault tolerance storage for cluster framworks
Tachyon  memory centric, fault tolerance storage for cluster framworksTachyon  memory centric, fault tolerance storage for cluster framworks
Tachyon memory centric, fault tolerance storage for cluster framworks
 
Interactive big data analytics
Interactive big data analyticsInteractive big data analytics
Interactive big data analytics
 
Hệ thống phân tích tình trạng giao thông: Ứng dụng công cụ xử lý dữ liệu lớn...
Hệ thống phân tích tình trạng giao thông:  Ứng dụng công cụ xử lý dữ liệu lớn...Hệ thống phân tích tình trạng giao thông:  Ứng dụng công cụ xử lý dữ liệu lớn...
Hệ thống phân tích tình trạng giao thông: Ứng dụng công cụ xử lý dữ liệu lớn...
 
Nosql data models
Nosql data modelsNosql data models
Nosql data models
 
Overview of big data in cloud computing
Overview of big data in cloud computingOverview of big data in cloud computing
Overview of big data in cloud computing
 
Vanilla Hadoop vs. the rest
Vanilla Hadoop vs. the rest Vanilla Hadoop vs. the rest
Vanilla Hadoop vs. the rest
 

Recently uploaded

一比一原版(TWU毕业证)西三一大学毕业证成绩单


Large-Scale Geographically Weighted Regression on Spark

  • 1. Hung Tien Tran, Hiep Tuan Nguyen, Viet-Trung Tran (Hanoi University of Science and Technology)
  • 2. Introduction: What is Geographically Weighted Regression (GWR)? What is our work? GWR + Spark = large-scale spatial data, improved performance, distributed computation. Source: http://desktop.arcgis.com
  • 3. Outline: Background; Problem; Scalable GWR on Spark; Experiments; Discussion; Conclusion
  • 4. Background: First Law of Geography (Waldo Tobler): “Everything is related with everything else, but closer things are more related”. GWR model: $y_i(u) = \beta_{0i}(u) + \beta_{1i}(u)x_{1i} + \beta_{2i}(u)x_{2i} + \dots + \beta_{mi}(u)x_{mi}$. The estimator takes the weighted least-squares form $\hat{\beta}(u) = (X^T W(u) X)^{-1} X^T W(u) Y$.
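
The estimator on slide 4 can be illustrated in plain Python. This is a minimal sketch, not the authors' Spark implementation: the function names `solve` and `local_wls` are hypothetical, and the list `w` is assumed to hold the diagonal of $W(u)$, already computed for a single regression point $u$.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] without mutating the inputs.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T W X is singular for this regression point")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def local_wls(X, y, w):
    """Return beta_hat = (X^T W X)^{-1} X^T W y, with W = diag(w)."""
    n, p = len(X), len(X[0])
    xtwx = [[sum(w[i] * X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
            for a in range(p)]
    xtwy = [sum(w[i] * X[i][a] * y[i] for i in range(n)) for a in range(p)]
    return solve(xtwx, xtwy)


# Usage: the first column of X is the intercept term; the data follow y = 1 + 2x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
w = [1.0, 0.5, 0.25, 0.125]
beta = local_wls(X, y, w)  # approximately [1.0, 2.0]
```

Solving the normal equations directly avoids forming the explicit inverse, which is the usual choice when only $\hat{\beta}(u)$ is needed.
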
  • 5. Background: Kernel function; Gaussian function; Bandwidth (fixed bandwidth vs. adaptive bandwidth)
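
The difference between a fixed and an adaptive bandwidth can be sketched as follows. The slide names a Gaussian kernel but does not give its exact form, so the expression $w = \exp(-\tfrac{1}{2}(d/h)^2)$ used here is an assumption (it is one common choice in the GWR literature). The function names, and the convention that the adaptive bandwidth equals the distance to the `k`-th nearest data point, are also assumptions made for illustration.

```python
import math


def gaussian_weight(d, h):
    """Gaussian kernel weight for distance d and bandwidth h (assumed form)."""
    return math.exp(-0.5 * (d / h) ** 2)


def fixed_weights(point, data, h):
    """Fixed bandwidth: the same h is used at every regression point."""
    return [gaussian_weight(math.dist(point, q), h) for q in data]


def adaptive_weights(point, data, k):
    """Adaptive bandwidth: h is the distance from `point` to its k-th nearest data point."""
    dists = [math.dist(point, q) for q in data]
    h = sorted(dists)[k - 1]
    if h == 0.0:
        raise ValueError("k-th nearest neighbour coincides with the regression point")
    return [gaussian_weight(d, h) for d in dists]


# Usage: three data points on a line, weights seen from the origin.
data = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
w_fixed = fixed_weights((0.0, 0.0), data, h=1.0)
w_adaptive = adaptive_weights((0.0, 0.0), data, k=2)
```

With a fixed bandwidth, sparse regions may receive very little effective data; an adaptive bandwidth widens the kernel there so that each local model sees a comparable number of neighbours.
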
  • 6. Problem: Estimating a local model; Bandwidth selection; Evaluation model. Annotations: choose kernel function; $\hat{\beta}(u) = (X^T W(u) X)^{-1} X^T W(u) Y$, $O(n^3)$; which bandwidth is good? Source: http://rose.bris.ac.uk
  • 7. Problem: How can the model be applied to large-scale data? Data points; features; regression points
  • 8. Large-Scale GWR on Spark: Why Spark? In-memory cluster-computing platform; supports parallel programming; applications are developed with high-level APIs; provides resilient distributed datasets and parallel operations; integrates with other Spark components
  • 9. Large-Scale GWR on Spark: We propose three approaches to scaling GWR: Scaling Weighted Linear Regression; Parallel Multiple WLR models; Parallel Geographically Weighted Regression (combines the first two approaches)
  • 10. Scalable GWR on Spark: Naïve approach: Scaling Weighted Linear Regression. For each regPoint: compute weight (in parallel); fit Weighted Linear Regression (compute the WLR model in parallel); summarize the model
  • 11. Scalable GWR on Spark: Naïve approach
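
The loop described for the naïve approach can be sketched in plain Python. This is a single-machine illustration under stated assumptions, not the authors' Spark code: the function names are hypothetical, each training record is assumed to be a tuple `(x, y, label, feature)` with a single feature, and the Gaussian kernel form is assumed. In the slides, the two inner steps (computing weights and fitting the WLR model) are the parts that run in parallel on the cluster; here they run sequentially.

```python
import math


def fit_line(feats, labels, ws):
    """Weighted least squares for label = b0 + b1 * feature; returns (b0, b1)."""
    sw = sum(ws)
    swf = sum(w * f for w, f in zip(ws, feats))
    swl = sum(w * lab for w, lab in zip(ws, labels))
    swff = sum(w * f * f for w, f in zip(ws, feats))
    swfl = sum(w * f * lab for w, f, lab in zip(ws, feats, labels))
    det = sw * swff - swf * swf
    if abs(det) < 1e-12:
        raise ValueError("weighted design matrix is singular")
    return (swff * swl - swf * swfl) / det, (sw * swfl - swf * swl) / det


def gwr_naive(reg_points, train, h):
    """Fit one weighted linear regression per regression point, one after another."""
    feats = [f for _, _, _, f in train]
    labels = [lab for _, _, lab, _ in train]
    models = {}
    for rp in reg_points:  # "Foreach regPoint"
        # Compute weight: Gaussian kernel on the distance to each training point.
        ws = [math.exp(-0.5 * (math.dist(rp, (tx, ty)) / h) ** 2)
              for tx, ty, _, _ in train]
        # Fit Weighted Linear Regression and keep the local coefficients.
        models[rp] = fit_line(feats, labels, ws)
    return models


# Usage: labels follow label = 1 + 2 * feature everywhere, so every local
# model should recover roughly (1.0, 2.0).
train = [(0.0, 0.0, 1.0, 0.0), (1.0, 0.0, 3.0, 1.0),
         (0.0, 1.0, 5.0, 2.0), (1.0, 1.0, 7.0, 3.0)]
models = gwr_naive([(0.5, 0.5), (0.0, 0.0)], train, h=1.0)
```

Because the outer loop is sequential, the running time of this scheme grows linearly with the number of regression points even when each individual fit is distributed.
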
  • 12. Scalable GWR on Spark: Parallel Multiple WLR models. Diagram labels: regression dataset; training dataset; compute weight; WLR (several instances); compute multiple WLR models in parallel; summary
  • 13. Scalable GWR on Spark: Parallel Multiple WLR models
  • 14. Scalable GWR on Spark: Parallel Geographically Weighted Regression. Diagram: the regression dataset (R, R, R) and the training dataset (T, T, T) are merged into a combined dataset (R T, R T, R T), followed by distributed GWR computation
  • 15. Scalable GWR on Spark: Parallel Geographically Weighted Regression
  • 16. Scalable GWR on Spark: Parallel Geographically Weighted Regression
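
One way to read the combined-dataset diagram is as a pairing of every regression point with every training point, followed by a per-regression-point aggregation. The slides do not spell out the computation, so the sketch below is an assumption throughout: the function names are hypothetical, each training record is assumed to be a tuple `(x, y, label, feature)` with a single feature, and the aggregation of weighted sums per key is a guess at how such a combined dataset could be used. On Spark, the pairing and the aggregation would loosely resemble a cartesian product followed by a reduce by key; here they are plain Python.

```python
import math
from collections import defaultdict
from itertools import product


def combine(reg_points, train):
    """Pair every regression point R with every training point T."""
    return product(reg_points, train)


def gwr_combined(reg_points, train, h):
    """Accumulate weighted sums per regression point, then solve each local model."""
    # Per-R running sums: w, w*f, w*label, w*f*f, w*f*label.
    stats = defaultdict(lambda: [0.0] * 5)
    for rp, (tx, ty, lab, f) in combine(reg_points, train):
        w = math.exp(-0.5 * (math.dist(rp, (tx, ty)) / h) ** 2)  # assumed Gaussian kernel
        sums = stats[rp]
        for i, term in enumerate((w, w * f, w * lab, w * f * f, w * f * lab)):
            sums[i] += term
    models = {}
    for rp, (sw, swf, swl, swff, swfl) in stats.items():
        det = sw * swff - swf * swf
        if abs(det) < 1e-12:
            raise ValueError("weighted design matrix is singular")
        # Closed-form solution of the 2x2 weighted normal equations.
        models[rp] = ((swff * swl - swf * swfl) / det,
                      (sw * swfl - swf * swl) / det)
    return models


# Usage: labels follow label = 1 + 2 * feature, so each model is roughly (1.0, 2.0).
train = [(0.0, 0.0, 1.0, 0.0), (1.0, 0.0, 3.0, 1.0),
         (0.0, 1.0, 5.0, 2.0), (1.0, 1.0, 7.0, 3.0)]
models = gwr_combined([(0.5, 0.5), (0.0, 0.0)], train, h=1.0)
```

The running sums are associative, so partial results computed on different partitions could be merged in any order, which is what makes this formulation a natural fit for a distributed reduce.
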
  • 17. Experiments: Environment: cluster of 8 nodes on Amazon Web Services; 4-core Intel Xeon E5-2670 v2, 2.5 GHz; 16 GB RAM, 2x40 GB SSD; Hadoop 2.7.2 and Spark 1.6.1. Dataset schema: |-- x: double (nullable = false); |-- y: double (nullable = false); |-- label: double (nullable = false); |-- features: vector (nullable = false)
  • 18. Experiments: Testing large training dataset. Plot: time (sec) vs. number of training points (10,000 to 5,000,000) for Algorithms 1 to 4
  • 19. Experiments: Testing large regression dataset. Plot: time (sec) vs. number of regression points (1,000 to 50,000) for Algorithms 1 to 4
  • 20. Experiments: Testing large dataset with increasing number of features. Plot: time (sec) vs. number of features (10 to 200) for Algorithms 1 to 4
  • 21. Experiments: Cluster. Plot: time (sec) vs. number of nodes (2, 4, 8) for Algorithms 1 to 4
  • 22. Discussion: Related work: many GWR libraries run on a single machine; spgwr (multiR on GRID); GPU-based implementations. Our work: first study of distributed GWR on Spark; easy deployment with the advantages of Spark; scalable and works well on a cluster
  • 23. Conclusion: We have proposed three approaches, implemented four algorithms based on Spark, and evaluated our implementation. Future work: improve performance by using Pipeline and Partitions; release as an open-source library

Editor's Notes

  1. Scalability, performance, user-friendly APIs