Rank mysteps demo


Anurag Tiwari


RankMySteps
Rank Number of Steps Taken Daily
Anurag Tiwari
github.com/bigdata2/rankMySteps

Motivation
Real-time daily ranking based on the number of steps taken
Challenge users to beat the top-ranking walkers
rankmysteps.xyz

Data
• Synthesized Real-Time Data
• Continuous stream of JSON messages
• Six producer scripts
{"source": "6", "steps": 10, "uuid": 10133, "timestamp": "2016-02-02 18:41:39"}
{"source": "5", "steps": 3, "uuid": 11116, "timestamp": "2016-02-02 18:41:39"}
{"source": "4", "steps": 1, "uuid": 11249, "timestamp": "2016-02-02 18:41:39"}
{"source": "3", "steps": 1, "uuid": 10111, "timestamp": "2016-02-02 18:41:39"}
{"source": "2", "steps": 10, "uuid": 10133, "timestamp": "2016-02-02 18:41:39"}
{"source": "1", "steps": 3, "uuid": 11116, "timestamp": "2016-02-02 18:41:39"}
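
A minimal sketch of one producer script, assuming Python (the slides do not show the scripts themselves). It emits messages in the same JSON shape as the samples above. The user-id and step ranges are hypothetical, loosely based on those samples, and the message broker is left out: `produce_batch` simply returns the JSON strings that a real script would keep sending in a loop.

```python
import json
import random
from datetime import datetime
from typing import List

# Hypothetical ranges, loosely based on the sample messages;
# the real producer scripts may use different values.
USER_ID_RANGE = (10000, 11999)
STEPS_RANGE = (1, 10)


def make_message(source: int, rng: random.Random, now: datetime) -> str:
    """Build one step event with the fields source, steps, uuid, timestamp."""
    event = {
        "source": str(source),                 # producer script id, as a string
        "steps": rng.randint(*STEPS_RANGE),    # steps in this event
        "uuid": rng.randint(*USER_ID_RANGE),   # user id
        "timestamp": now.strftime("%Y-%m-%d %H:%M:%S"),
    }
    return json.dumps(event)  # dict insertion order keeps the field order


def produce_batch(source: int, count: int, seed: int = 0) -> List[str]:
    """Return `count` messages for one producer, all stamped with the same
    time; a real script would loop forever and publish each message."""
    rng = random.Random(seed)
    now = datetime.now()
    return [make_message(source, rng, now) for _ in range(count)]


if __name__ == "__main__":
    for source in range(1, 7):  # six producer scripts
        print(produce_batch(source, 1, seed=source)[0])
```

Running six copies with source ids 1 to 6 gives the interleaved stream shown in the samples.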

Data Pipeline
[Diagram: producer scripts (Script 1, Script 2, ..., Script 6) -> key-value messages -> Materialized View]
• 8 m4.xlarge instances, $1.9 per hour
• Throughput: 500K events in 20 seconds
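
The pipeline figures can be turned into a rate and a rough cost per event. The first number follows directly from the slide; the cost line additionally assumes that the measured rate is sustained for a full hour, which the slide does not state.

```python
# Figures taken from the Data Pipeline slide.
EVENTS = 500_000
SECONDS = 20
CLUSTER_COST_PER_HOUR = 1.9  # USD for the 8 m4.xlarge instances
SECONDS_PER_HOUR = 3600

events_per_second = EVENTS / SECONDS

# Assumption (not on the slide): the measured rate holds for a whole hour.
events_per_hour = events_per_second * SECONDS_PER_HOUR
cost_per_million_events = CLUSTER_COST_PER_HOUR / (events_per_hour / 1_000_000)

print(f"{events_per_second:,.0f} events/s")             # 25,000 events/s
print(f"{events_per_hour:,.0f} events/h")               # 90,000,000 events/h
print(f"${cost_per_million_events:.4f} per 1M events")  # $0.0211 per 1M events
```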

Materialized Views (MV) in Cassandra 3.0
[Diagram, left: App, Base Table, Clustered Table. Right: App, Base Table, Clustered MV; the App queries SELECT * from MV.]

Cassandra Schema
• Base Table: User Id (partitioning key), Date, Total Steps
• Materialized View: Date (partitioning key), User Id (clustering key), Total Steps (clustering key); rows ordered by Total Steps

Challenges and Learnings
Spark
• To avoid reading from Cassandra, I used in-memory Spark computation on the DStream with updateStateByKey(updateFunc). The Spark workers ran out of memory when the job was scaled up.
Cassandra
• I inserted the data into two separate tables, a base table and a sorted table, and ran into consistency issues between them.
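
The slide does not say which consistency issues appeared. One common way two application-maintained tables drift apart is a failure between two independent writes; the toy model below (plain Python dicts standing in for the tables, with hypothetical names) shows the base table updated while the sorted table keeps the old total.

```python
from typing import Dict, List, Tuple

BaseTable = Dict[Tuple[int, str], int]    # (user, day) -> total steps
SortedTable = Dict[Tuple[str, int], int]  # (day, user) -> total steps


def write_both(base: BaseTable, ranked: SortedTable, user: int, day: str,
               total: int, fail_between_writes: bool = False) -> None:
    """Two independent writes, as when the application maintains both tables."""
    base[(user, day)] = total  # write 1: base table
    if fail_between_writes:
        # e.g. a timeout or a crashed worker after the first write succeeded
        raise ConnectionError("second write never happened")
    ranked[(day, user)] = total  # write 2: sorted table


def find_mismatches(base: BaseTable, ranked: SortedTable) -> List[Tuple[int, str]]:
    """Return the (user, day) keys whose totals differ between the two tables."""
    return sorted(
        (user, day) for (user, day), total in base.items()
        if ranked.get((day, user)) != total
    )


if __name__ == "__main__":
    base: BaseTable = {}
    ranked: SortedTable = {}
    write_both(base, ranked, 10133, "2016-02-02", 10)
    try:
        write_both(base, ranked, 10133, "2016-02-02", 13, fail_between_writes=True)
    except ConnectionError:
        pass
    print(find_mismatches(base, ranked))
```

After the failed call, the base table says 13 steps while the sorted table still ranks the user at 10.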

Anurag Tiwari
• Staff Design Engineer
• Silicon Program Manager
• CM Program Manager
• Member of Technical Staff
• Ph.D. Computer Science and Engineering

BACKUP

Challenges and Learnings
To avoid reading from Cassandra, I used in-memory Spark computation on the DStream with updateStateByKey(updateFunc).
[Diagram: each batch of the DStream (RDDs of 5000 records) is combined with the previous state (RDDs holding 5M records); updateFunc is called on all 5M records.]
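
The gap between 5000 new records and 5M updateFunc calls comes from how updateStateByKey works: in every batch, Spark applies the update function to every key already in the state, whether or not that key has new data. The pure-Python model below is not Spark code and its names are hypothetical; `update_func` uses the (new_values, last_state) form that PySpark's updateStateByKey expects, and the demo is scaled down 100x from the slide's numbers.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple

UpdateFunc = Callable[[List[int], Optional[int]], Optional[int]]


def update_func(new_values: List[int], running_total: Optional[int]) -> Optional[int]:
    """Running step total per user; running_total is None for a new user."""
    return sum(new_values) + (running_total or 0)


def update_state_by_key(
    state: Dict[int, int],
    batch: Iterable[Tuple[int, int]],
    func: UpdateFunc,
) -> Tuple[Dict[int, int], int]:
    """Model one updateStateByKey step and count the calls to func.

    As in Spark, func runs for every key in the previous state plus every
    key in the batch. Returns (new_state, number_of_calls).
    """
    grouped: Dict[int, List[int]] = defaultdict(list)
    for user, steps in batch:
        grouped[user].append(steps)

    new_state: Dict[int, int] = {}
    calls = 0
    for user in state.keys() | grouped.keys():
        calls += 1
        result = func(grouped.get(user, []), state.get(user))
        if result is not None:  # a None result drops the key, as in Spark
            new_state[user] = result
    return new_state, calls


if __name__ == "__main__":
    # 1/100 scale: 50K users in the state, 50 new records in the batch.
    state = {user: 100 for user in range(50_000)}
    batch = [(user, 7) for user in range(50)]
    state, calls = update_state_by_key(state, batch, update_func)
    print(f"{len(batch)} new records -> {calls:,} update_func calls")
```

Because the work per batch tracks the size of the state rather than the size of the batch, the state (and the memory holding it) keeps growing unless the update function returns None for keys that are no longer needed.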

Cassandra Schema
CREATE TABLE rank_steps.walkers_steps2 (
    user int,
    arrival_time text,
    num_steps int,
    PRIMARY KEY (user, arrival_time)
) WITH CLUSTERING ORDER BY (arrival_time ASC);

CREATE MATERIALIZED VIEW rank_steps.top_walkers8 AS
    SELECT arrival_time, num_steps, user
    FROM rank_steps.walkers_steps2
    WHERE user IS NOT NULL AND num_steps IS NOT NULL
        AND arrival_time IS NOT NULL
    PRIMARY KEY (arrival_time, num_steps, user)
    WITH CLUSTERING ORDER BY (num_steps DESC, user ASC);
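
To make the view's key design concrete, here is a toy in-memory model of the table and view above, sketched in Python. The class and method names are hypothetical, and this is not how Cassandra implements views: Cassandra applies view updates on the server and only guarantees that the view is eventually consistent, whereas this model updates it synchronously. It shows two things: a write that changes num_steps moves the user's row in the view (num_steps is part of the view's primary key), and a read of one arrival_time partition comes back ordered by num_steps DESC, then user ASC. Against a real cluster the equivalent read would be a CQL query such as `SELECT user, num_steps FROM rank_steps.top_walkers8 WHERE arrival_time = '2016-02-02' LIMIT 10;`, assuming arrival_time holds a date string.

```python
from typing import Dict, List, Optional, Set, Tuple

ViewRow = Tuple[str, int, int]  # (arrival_time, num_steps, user)


class ToyMaterializedView:
    """In-memory stand-in for walkers_steps2 and its view top_walkers8."""

    def __init__(self) -> None:
        self.base: Dict[Tuple[int, str], int] = {}  # (user, arrival_time) -> num_steps
        self.view: Set[ViewRow] = set()

    def upsert(self, user: int, arrival_time: str, num_steps: int) -> None:
        """The application's single write, to the base table only."""
        old: Optional[int] = self.base.get((user, arrival_time))
        self.base[(user, arrival_time)] = num_steps
        # num_steps is a clustering column of the view, so a new value means
        # the old view row is removed and a new one is inserted.
        if old is not None:
            self.view.discard((arrival_time, old, user))
        self.view.add((arrival_time, num_steps, user))

    def select(self, arrival_time: str, limit: Optional[int] = None) -> List[ViewRow]:
        """One view partition, ordered by num_steps DESC, then user ASC."""
        rows = sorted(
            (row for row in self.view if row[0] == arrival_time),
            key=lambda row: (-row[1], row[2]),
        )
        return rows if limit is None else rows[:limit]


if __name__ == "__main__":
    mv = ToyMaterializedView()
    mv.upsert(10133, "2016-02-02", 10)
    mv.upsert(11116, "2016-02-02", 3)
    mv.upsert(11116, "2016-02-02", 13)  # the user's total grows
    print(mv.select("2016-02-02", limit=10))
```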

Materialized Views (MV) in Cassandra 3.0
• Eliminate the need for developers to denormalize data by hand: no need to create and maintain multiple tables for different queries.
• Can be queried like any other Cassandra table.
• A persistent (stored) view, NOT a virtual SQL view.
• Updates propagate automatically from the base table to the MV, which is eventually consistent with the base table.
