Detect Crowd Levels Using Location Data

reza-asad

Motivation
● Avoid waiting time in crowded areas.

Data
● Let's imagine we had data about people's locations.
● This could be collected from people's cell phones.
● How can we use such data?

Data
● But such data is not available to me ...
● Solution: Engineer the data!
● Take data from Yelp
● Perform a random walk
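
The slides do not say how the random walk is set up, so the following is only a minimal sketch under assumptions: each simulated person starts at a seed coordinate (for example a business location taken from Yelp) and moves by a small uniform random offset at every step. The function name `random_walk`, the `step_size` value, and the starting coordinate are all hypothetical.

```python
import random


def random_walk(start, steps, step_size=0.0005, seed=None):
    """Return a list of (lat, lon) positions, beginning at `start`.

    Each step adds an independent uniform offset in [-step_size, step_size]
    degrees to both latitude and longitude. This is an illustrative model,
    not the one used in the original project.
    """
    rng = random.Random(seed)
    lat, lon = start
    path = [(lat, lon)]
    for _ in range(steps):
        lat += rng.uniform(-step_size, step_size)
        lon += rng.uniform(-step_size, step_size)
        path.append((lat, lon))
    return path


# Hypothetical seed point in San Francisco; in practice this would come
# from a record in the Yelp data.
start = (37.7749, -122.4194)
path = random_walk(start, steps=5, seed=42)
for lat, lon in path:
    print(f"{lat:.5f}, {lon:.5f}")
```

Every position on the path could then be published as one location message, which gives a stream of synthetic points that drift around real places.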

Engineering Challenges
● The area of SF: 46.87 mi²
● For the purpose of this project, each cluster covers 0.09 mi²
● This means k is roughly 500 (46.87 / 0.09 ≈ 521)
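
The estimate of k is a single division of the two areas quoted above. As a quick check, here is a small sketch; the constant and function names are made up for illustration.

```python
SF_AREA_SQ_MI = 46.87       # area of San Francisco, from the slide
CLUSTER_AREA_SQ_MI = 0.09   # target area per cluster, from the slide


def estimate_k(total_area, cluster_area):
    """Number of equal-area clusters needed to cover `total_area`."""
    return round(total_area / cluster_area)


k = estimate_k(SF_AREA_SQ_MI, CLUSTER_AREA_SQ_MI)
print(k)  # 521
```

The exact quotient is about 521, which the slide rounds to roughly 500.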

Engineering Challenges
● Parameters to tune:
– Time it takes to produce the messages
– Processing time for k-means in Spark Streaming
– The update interval for a fixed data point in the database

Goal
● Tune the parameters in order to have a stable system
● The total delay after processing each batch must be constant and comparable to the batch interval.
● You can check this in the Spark API
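
One way to make the stability criterion concrete is to take the total delay reported for a run of recent batches and test two things: the delay stays within a small multiple of the batch interval, and it is not growing over time. The sketch below assumes the delays have already been read off by hand or exported somewhere; the function name `is_stable` and both thresholds are arbitrary illustrative choices, not values from the slides.

```python
from statistics import mean


def is_stable(total_delays, batch_interval, max_ratio=2.0, max_growth=0.1):
    """Heuristic stability check for a streaming job.

    total_delays   -- total delay per batch, in seconds, oldest first
    batch_interval -- batch interval in seconds
    max_ratio      -- largest allowed delay as a multiple of the interval
    max_growth     -- allowed relative increase from first to second half
    """
    if len(total_delays) < 2:
        raise ValueError("need at least two batches")
    half = len(total_delays) // 2
    early = mean(total_delays[:half])
    late = mean(total_delays[half:])
    bounded = max(total_delays) <= max_ratio * batch_interval
    not_growing = late <= early * (1 + max_growth)
    return bounded and not_growing


steady = [2.1, 1.9, 2.0, 2.2, 2.0, 2.1]    # hovers around the interval
growing = [2.0, 2.6, 3.3, 4.1, 5.0, 6.2]   # batches are queueing up

print(is_stable(steady, batch_interval=2.0))   # True
print(is_stable(growing, batch_interval=2.0))  # False
```

A delay that keeps climbing means batches arrive faster than they are processed, which is the situation the tuning is meant to avoid.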

Tackling Challenges
● Having multiple producers and consumers ✔
● Kafka is fast with sending messages and is not the bottleneck
● Establishing some safe limits:
– Using spark.streaming.receiver.maxRate to control the input rate ✔
– Understanding the complexity of the process in Spark Streaming ✔
– Choosing the right batch interval ✔
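
The property spark.streaming.receiver.maxRate caps the number of records per second that each receiver accepts, so together with the batch interval and the number of receivers it bounds the size of a batch. The sketch below works out that bound and formats the setting as `--conf` arguments for `spark-submit`. The rate of 1000, the 2-second interval, the 3 receivers, and both helper names are hypothetical values chosen for illustration, not figures from the project.

```python
def max_records_per_batch(max_rate, batch_interval_s, num_receivers=1):
    """Upper bound on records in one batch under a per-receiver rate cap."""
    return max_rate * batch_interval_s * num_receivers


def to_submit_args(conf):
    """Render a dict of Spark properties as spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical settings: 1000 records/s per receiver.
conf = {"spark.streaming.receiver.maxRate": 1000}

limit = max_records_per_batch(
    conf["spark.streaming.receiver.maxRate"],
    batch_interval_s=2,
    num_receivers=3,
)
print(limit)                 # 6000
print(to_submit_args(conf))  # ['--conf', 'spark.streaming.receiver.maxRate=1000']
```

If processing that many records takes longer than the batch interval, either the rate cap should come down or the interval should go up.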

Data Process
● Data filtering in Spark Streaming

About Me
● Long time ago - B.S. in pure math, University of Toronto
● More recent - M.S. in applied math, University of British Columbia
● The exciting now - A data engineer who wants to go camping with other data engineers

Detect Crowd Levels Using Location Data

1. Crowd Detector. Reza Asad, Insight Data Engineering, June 2015

2. Motivation ● Avoid waiting time in crowded areas.

3. Data ● Let's imagine we had data about people's locations. ● This could be collected from people's cell phones. ● How can we use such data?

4. Naive Approach

5. Demo

6. Data ● But such data is not available to me ... ● Solution: Engineer the data! ● Take data from Yelp ● Perform a random walk

7. Pipeline Data

8. Engineering Challenges ● Choosing K?

9. Engineering Challenges ● The area of SF: 46.87 mi² ● For the purpose of this project, each cluster covers 0.09 mi² ● This means k is roughly 500 (46.87 / 0.09 ≈ 521)

10. Engineering Challenges ● Parameters to tune: – Time it takes to produce the messages – Processing time for k-means in Spark Streaming – The update interval for a fixed data point in the database

11. Goal ● Tune the parameters in order to have a stable system ● The total delay after processing each batch must be constant and comparable to the batch interval. ● You can check this in the Spark API

12. Tackling Challenges ● Having multiple producers and consumers ✔ ● Kafka is fast with sending messages and is not the bottleneck ● Establishing some safe limits: – Using spark.streaming.receiver.maxRate to control the input rate ✔ – Understanding the complexity of the process in Spark Streaming ✔ – Choosing the right batch interval ✔

13. Raw Data

14. Data Process ● Data filtering in Spark Streaming

15. Data Process

16. About Me ● Long time ago - B.S. in pure math, University of Toronto ● More recent - M.S. in applied math, University of British Columbia ● The exciting now - A data engineer who wants to go camping with other data engineers
