RankMySteps
Rank Number of Steps Taken Daily
Anurag Tiwari
github.com/bigdata2/rankMySteps
Motivation
Real-time daily ranking based on number of steps
taken
Challenge users to beat the top-ranking walkers
rankmysteps.xyz
Data
• Synthesized Real-Time Data
• Continuous stream of JSON messages
• Six producer scripts
{"source": "6", "steps": 10, "uuid": 10133, "timestamp": "2016-02-02 18:41:39"}
{"source": "5", "steps": 3, "uuid": 11116, "timestamp": "2016-02-02 18:41:39"}
{"source": "4", "steps": 1, "uuid": 11249, "timestamp": "2016-02-02 18:41:39"}
{"source": "3", "steps": 1, "uuid": 10111, "timestamp": "2016-02-02 18:41:39"}
{"source": "2", "steps": 10, "uuid": 10133, "timestamp": "2016-02-02 18:41:39"}
{"source": "1", "steps": 3, "uuid": 11116, "timestamp": "2016-02-02 18:41:39"}
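As a minimal sketch of how messages in this shape could be rolled up into per-user daily totals, the snippet below parses a few of the sample messages with the standard library. The function name daily_totals and the choice to take the date from the first ten characters of the timestamp are assumptions for illustration, not the project's actual code.

```python
import json
from collections import defaultdict

# Sample messages in the same shape as the producer output above.
messages = [
    '{"source": "6", "steps": 10, "uuid": 10133, "timestamp": "2016-02-02 18:41:39"}',
    '{"source": "5", "steps": 3, "uuid": 11116, "timestamp": "2016-02-02 18:41:39"}',
    '{"source": "2", "steps": 10, "uuid": 10133, "timestamp": "2016-02-02 18:41:39"}',
]


def daily_totals(raw_messages):
    """Sum steps per (date, uuid). Hypothetical helper, not from the repo."""
    totals = defaultdict(int)
    for raw in raw_messages:
        event = json.loads(raw)
        # Assumption: the date is the first 10 characters of the timestamp.
        day = event["timestamp"][:10]
        totals[(day, event["uuid"])] += event["steps"]
    return dict(totals)


totals = daily_totals(messages)
```

Keying the totals by (date, uuid) mirrors the daily ranking the deck describes: one running count per user per day.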
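Data Pipeline
[Diagram: producer scripts (Script 1, Script 2, ... Script 6) feed the pipeline; stage labels: Key,Value and Materialized View]
Cluster: 8 m4.xlarge instances, $1.9 per hour
Throughput: 500K events in 20 sec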
Materialized Views (MV) in Cassandra 3.0
[Diagram, without MV: App, Base Table, Clustered Table]
[Diagram, with MV: App, Base Table, Clustered MV; the app runs SELECT * from MV]
Cassandra Schema
Base Table
  Columns: User Id (Partitioning Key), Date, Total Steps
Materialized View
  Columns: Date (Partitioning Key), User Id (Clustering Key), Total Steps (Clustering Key)
  Order by Total Steps
[Each diagram also marks the Primary Key columns.]
Challenges and Learnings
Spark
To avoid a read from Cassandra, I used Spark in-memory
computation on DStream — updateStateByKey(updateFunc)
— Spark workers ran out of memory when scaled up.
Cassandra
Inserted data into two different tables — a base table and a
sorted data table — faced consistency issues.
Anurag Tiwari
• Staff Design Engineer
• Silicon Program Manager
• CM Program Manager
• Member of Technical Staff
• Ph.D. Computer Science and Engineering
BACKUP
Challenges and Learnings
To avoid a read from Cassandra I used Spark in-memory
computation on DStream — updateStateByKey(updateFunc)
[Diagram: DStream RDDs (5000 records) alongside Previous State RDDs (5M records)]
updateFunc called on 5M records
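The cost described here follows from how updateStateByKey behaves: on every batch, Spark Streaming applies the update function to each key held in state, including keys that received no new values. The toy model below is plain single-process Python, not Spark; update_state_by_key and the sample numbers are assumptions made only to show the call count.

```python
def update_func(new_values, running_total):
    # Same shape as a Spark Streaming update function: new_values holds this
    # key's values in the current batch (empty if the key did not appear),
    # running_total is the previous state (None for a new key).
    return sum(new_values) + (running_total or 0)


def update_state_by_key(batch, state, func):
    """Toy model of updateStateByKey semantics (hypothetical, not Spark)."""
    grouped = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)
    calls = 0
    new_state = {}
    # The function runs for every key in the state or the batch,
    # not only for keys that received new data.
    for key in set(state) | set(grouped):
        new_state[key] = func(grouped.get(key, []), state.get(key))
        calls += 1
    return new_state, calls


state = {user: 100 for user in range(1000)}  # 1000 users already tracked
batch = [(1, 10), (1, 5), (2000, 3)]         # only two users are active
state, calls = update_state_by_key(batch, state, update_func)
```

With 1000 tracked users and a batch touching two of them, the function still runs 1001 times; the same effect at 5M state records is what the slide points at.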
Cassandra Schema
CREATE TABLE rank_steps.walkers_steps2 (
    user int,
    arrival_time text,
    num_steps int,
    PRIMARY KEY (user, arrival_time)
) WITH CLUSTERING ORDER BY (arrival_time ASC);

CREATE MATERIALIZED VIEW rank_steps.top_walkers8 AS
    SELECT arrival_time, num_steps, user
    FROM rank_steps.walkers_steps2
    WHERE user IS NOT NULL AND num_steps IS NOT NULL
      AND arrival_time IS NOT NULL
    PRIMARY KEY (arrival_time, num_steps, user)
    WITH CLUSTERING ORDER BY (num_steps DESC, user ASC);
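To illustrate what the view's primary key buys, here is a small in-memory sketch of the ordering it defines: rows are grouped by arrival_time and, within that partition, sorted by num_steps descending and then user ascending. The sample rows, the date-only arrival_time strings, and the helper top_walkers are assumptions for illustration; Cassandra maintains this order on disk rather than sorting at read time.

```python
# Rows as (user, arrival_time, num_steps), mirroring walkers_steps2.
# Values are made up; arrival_time is assumed to hold a date string.
base_rows = [
    (10133, "2016-02-02", 20),
    (11116, "2016-02-02", 6),
    (11249, "2016-02-02", 20),
    (10111, "2016-02-01", 50),
]


def top_walkers(rows, day, limit):
    """Rows of one arrival_time partition, num_steps DESC then user ASC."""
    partition = [r for r in rows if r[1] == day]
    partition.sort(key=lambda r: (-r[2], r[0]))
    return [(user, steps) for user, _, steps in partition[:limit]]


ranking = top_walkers(base_rows, "2016-02-02", 2)
```

Because the ranking for a day lives in a single partition that is already sorted, reading the top walkers amounts to reading the first rows of that partition.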
Materialized Views (MV) in
Cassandra 3.0
Eliminate the need for developers to denormalize data by hand: no need to create multiple tables for different queries.
Can be queried like any Cassandra table.
Persistent view, NOT an SQL view.
Updates propagate automatically from the base table to the MV, ensuring eventual consistency.
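The propagation idea can be sketched conceptually as follows: a single write to the base table is enough, and the view row for the old value is replaced by one for the new value. This is a synchronous, single-process toy model; the class BaseTableWithView and its methods are hypothetical, and real Cassandra applies the view update asynchronously across replicas.

```python
class BaseTableWithView:
    """Toy in-memory model: one write to the base table updates the view."""

    def __init__(self):
        self.base = {}  # (user, arrival_time) -> num_steps
        self.view = {}  # arrival_time -> set of (num_steps, user)

    def upsert(self, user, arrival_time, num_steps):
        key = (user, arrival_time)
        partition = self.view.setdefault(arrival_time, set())
        old = self.base.get(key)
        if old is not None:
            # Drop the stale view row for the previous step count.
            partition.discard((old, user))
        self.base[key] = num_steps
        partition.add((num_steps, user))

    def ranking(self, arrival_time):
        rows = self.view.get(arrival_time, set())
        ordered = sorted(rows, key=lambda r: (-r[0], r[1]))
        return [(user, steps) for steps, user in ordered]


table = BaseTableWithView()
table.upsert(10133, "2016-02-02", 10)
table.upsert(11116, "2016-02-02", 12)
table.upsert(10133, "2016-02-02", 20)  # same base key, new total
ranked = table.ranking("2016-02-02")
```

The application never writes to the view directly, which is the difference from the earlier two-table approach where the base table and the sorted table could drift apart.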
