SlideShare a Scribd company logo
1 of 52
Download to read offline
S3, Cassandra or Outer Space? Dumping
Time Series Data using Spark
Demi Ben-Ari - VP R&D @
ROME 24-25 MARCH 2017
About Me
Demi Ben-Ari, Co-Founder & VP R&D @ Panorays
● BS’c Computer Science – Academic College Tel-Aviv Yaffo
● Co-Founder
○ “Big Things” Big Data Community
○ Google Developer Group Cloud
In the Past:
● Sr. Data Engineer - Windward
● Team Leader & Sr. Java Software Engineer
Missile defense and Alert System - “Ofek” – IAF
Interested in almost every kind of technology – A True Geek
Agenda
● Spark brief overview and Catch Up
● Data flow and Environment
● What’s our time series data like?
● Where we started from - where we got to
○ Problems and our decisions
○ Evolution of the solution
● Conclusions
Spark Brief Overview & Catchup
Scala & Spark (Architecture)
Scala REPL Scala Compiler
Spark Runtime
Scala Runtime
JVM
File System
(eg. HDFS,
Cassandra, S3..)
Cluster Manager
(eg. Yarn, Mesos)
What kind of DSL is Apache Spark
● Centered around Collections
● Immutable data sets equipped with functional transformations
● These are exactly the Scala collection operations
map
flatMap
filter
...
reduce
fold
aggregate
...
union
intersection
...
Spark is A Multi-Language Platform
● Why to use Scala instead of
Python?
○ Native to Spark, Can use
everything without
translation
○ Types help
So Bottom Line…
What’s Spark???
United Tools Platform
United Tools Platform - Single Framework
Batch
InteractiveStreaming
Single Framework
Data flow and Environment
(Our Use Case)
Structure of the Data
● Maritime Analytics Platform
● Geo Locations + Metadata
● Arriving over time
● Different types of messages being reported by satellites
● Encoded
● Might arrive later than actually transmitted
Data Flow Diagram
External
Data
Source
Analytics
Layers
Data Pipeline
Parsed
Raw
Entity Resolution
Process
Building insights
on top of the entities
Data Output
Layer
Anomaly
Detection
Trends
Environment Description
Cluster
Dev Testing
Live
Staging
ProductionEnv
OB1K
RESTful Java Services
Basic Terms
● Missing Parts in Time Series Data
◦ Data arriving from the satellites
● Might be causing delays because of bad transmission
◦ Data vendors delaying the data stream
◦ Calculation in Layers may cause Holes in the Data
● Calculating the Data layers by time slices
Basic Terms
● Idempotence
is the property of certain operations in mathematics and computer
science, that can be applied multiple times without changing the
result beyond the initial application.
● Function: Same input => Same output
Basic Terms
● Partitions == Parallelism
◦ Physical / Logical partitioning
● Resilient Distributed Datasets (RDDs) == Collections
◦ fault-tolerant collection of elements that can be operated on in
parallel.
◦ Applying immutable transformations and actions over RDDs
What RDD’s really are?
So…..
The Problem - Receiving DATA
Beginning state, no data, and the timeline
begins
T = 0
Level 3 Entity
Level 2 Entity
Level 1 Entity
The Problem - Receiving DATA
T = 10
Level 3 Entity
Level 2 Entity
Level 1 Entity
Computation sliding window size
Level 1 entities data arrives
and gets stored
The Problem - Receiving DATA
T = 10
Level 3 Entity
Level 2 Entity
Level 1 Entity
Computation sliding window size
Level 3 entities are created on
top of Level 2’s Data
(Decreased amount of data)
Level 2 entities are created
on top of Level 1’s Data
(Decreased amount of
data)
The Problem - Receiving DATA
T = 20
Level 3 Entity
Level 2 Entity
Level 1 Entity
Computation sliding window size
Because of the sliding window’s
back size, level 2 and 3 entities
would not be created properly and
there would be “Holes” in the Data
Level 1 entity's
data arriving late
Solution to the Problem
● Creating Dependent Micro services forming a data pipeline
◦ Mainly Apache Spark applications
◦ Services are only dependent on the Data - not the previous
service’s run
● Forming a structure and scheduling of “Back Sliding Window”
◦ Know your data and its relevance through time
◦ Don’t try to foresee the future – it might Bias the results
Starting point & Infrastructure
How we started?
● Spark Standalone – via ec2 scripts
◦ Around 5 nodes (r3.xlarge instances)
◦ Didn’t want to keep a persistent HDFS – Costs a lot
◦ 100 GB (per day) => ~150 TB for 4 years
◦ Cost for server per year (r3.xlarge):
- On demand: ~2900$
- Reserved: ~1750$
● Know your costs: http://www.ec2instances.info/
Know Your Costs
Decision
● Working with S3 as the persistence layer
◦ Pay extra for
- Put (0.005 per 1000 requests)
- Get (0.004 per 10,000 requests)
◦ 150TB => ~210$ for 4 years of Data
● Same format as HDFS (CSV files)
◦ s3n://some-bucket/entity1/201412010000/part-00000
◦ s3n://some-bucket/entity1/201412010000/part-00001
◦ ……
What about the serving?
MongoDB for Serving
Worker 1
Worker 2
….
….
…
…
Worker N
MongoDB
Replica Set
Spark
Cluster
Master
Write
Read
Spark Slave - Server Specs
● Instance Type: r3.xlarge
● CPU’s: 4
● RAM: 30.5GB
● Storage: ephemeral
● Amount: 10+
MongoDB - Server Specs
● MongoDB version: 2.6.1
● Instance Type: m3.xlarge (AWS)
● CPU’s: 4
● RAM: 15GB
● Storage: EBS
● DB Size: ~500GB
● Collection Indexes: 5 (4 compound)
The Problem
● Batch jobs
◦ Should run for 5-10 minutes in total
◦ Actual - runs for ~40 minutes
● Why?
◦ ~20 minutes to write with the Java mongo driver – Async
(Unacknowledged)
◦ ~20 minutes to sync the journal
◦ Total: ~ 40 Minutes of the DB being unavailable
◦ No batch process response and no UI serving
Alternative Solutions
● Sharded MongoDB (With replica sets)
◦ Pros:
- Increases Throughput by the amount of shards
- Increases the availability of the DB
◦ Cons:
- Very hard to manage DevOps wise (for a small team of
developers)
- High cost of servers – because each shared need 3 replicas
Workflow with MongoDB
Worker 1
Worker 2
….
….
…
…
Worker N
Spark
Cluster
Master
Write
Read
Master
Our DevOps – After that solution
We had no
DevOps guy at
that time at all
☹
Alternative Solutions
● Apache Cassandra
◦ Pros:
- Very large developer community
- Linearly scalable Database
- No single master architecture
- Proven working with distributed engines like Apache Spark
◦ Cons:
- We had no experience at all with the Database
- No Geo Spatial Index – Needed to implement by ourselves
The Solution
● Migration to Apache Cassandra
● Create easily a Cassandra cluster using DataStax Community
AMI on AWS
◦ First easy step – Using the spark-cassandra-connector
(Easy bootstrap move to Spark ⬄ Cassandra)
◦ Creating a monitoring dashboard to Cassandra
Workflow with Cassandra
Worker 1
Worker 2
….
….
…
…
Worker N
Cassandra
Cluster
Spark
Cluster
Write
Read
Result
● Performance improvement
◦ Batch write parts of the job run in 3 minutes instead of ~ 40
minutes in MongoDB
● Took 2 weeks to go from “Zero to Hero”, and to ramp up a
running solution that work without glitches
So (Again)?
Transferring the Heaviest Process
● Micro service that runs every 10 minutes
● Writes to Cassandra 30GB per iteration
◦ (Replication factor 3 => 90GB)
● At first took us 18 minutes to do all of the writes
◦ Not Acceptable in a 10 minute process
Cluster On OpsCenter - Before
Transferring the Heaviest Process
● Solutions
◦ We chose the i2.xlarge
◦ Optimization of the Cluster
◦ Changing the JDK to Java-8
- Changing the GC algorithm to G1
◦ Tuning the Operation system
- Ulimit, removing the swap
◦ Write time went down to ~5 minutes (For 30GB RF=3)
Sounds good right? I don’t think so
Cloud Watch After Tuning
Cloud Watch After Tuning
The Solution
● Taking the same Data Model that we held in Cassandra (All of the
Raw data per 10 minutes) and put it on S3
◦ Write time went down from ~5 minutes to 1.5 minutes
● Added another process, not dependent on the main one,
happens every 15 minutes
◦ Reads from S3, downscales the amount and Writes them to
Cassandra for serving
Parsed
Raw
Static /
Aggregated
Data
Spark Analytics Layers
UI Serving
Downscaled
Data
Heavy
Fusion
Process
How it looks after all?
Conclusion
● Always give an estimate to your data
◦ Frequency
◦ Volume
◦ Arrangement of the previous phase
● There is no “Best” persistence layer
◦ There is the right one for the job
◦ Don’t overload an existing solution
Conclusion
● Spark is a great framework for distributed collections
◦ Fully functional API
◦ Can perform imperative actions
● “With great power,
comes lots of partitioning”
◦ Control your work and
data distribution via partitions
● https://www.pinterest.com/pin/155514993354583499/ (Thanks)
Questions?
● LinkedIn
● Twitter: @demibenari
● Blog:
http://progexc.blogspot.com/
● demi.benari@gmail.com
● “Big Things” Community
Meetup, YouTube, Facebook,
Twitter
● GDG Cloud
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark  - Demi Ben-Ari - Codemotion Rome 2017

More Related Content

What's hot

Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015Monal Daxini
 
Lifting the Blinds: Monitoring Windows Server 2012
Lifting the Blinds: Monitoring Windows Server 2012Lifting the Blinds: Monitoring Windows Server 2012
Lifting the Blinds: Monitoring Windows Server 2012Datadog
 
Monitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudMonitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudDatadog
 
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Docker, Inc.
 
Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014Monal Daxini
 
Unbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxiniUnbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxiniMonal Daxini
 
Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016aspyker
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1Ruslan Meshenberg
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016Monal Daxini
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Monal Daxini
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per SecondAmazon Web Services
 
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDogRedis Labs
 
NetflixOSS and ZeroToDocker Talk
NetflixOSS and ZeroToDocker TalkNetflixOSS and ZeroToDocker Talk
NetflixOSS and ZeroToDocker Talkaspyker
 
Netflix Open Source: Building a Distributed and Automated Open Source Program
Netflix Open Source:  Building a Distributed and Automated Open Source ProgramNetflix Open Source:  Building a Distributed and Automated Open Source Program
Netflix Open Source: Building a Distributed and Automated Open Source Programaspyker
 
Containerised ASP.NET Core apps with Kubernetes
Containerised ASP.NET Core apps with KubernetesContainerised ASP.NET Core apps with Kubernetes
Containerised ASP.NET Core apps with KubernetesCodemotion Tel Aviv
 
Beaming flink to the cloud @ netflix ff 2016-monal-daxini
Beaming flink to the cloud @ netflix   ff 2016-monal-daxiniBeaming flink to the cloud @ netflix   ff 2016-monal-daxini
Beaming flink to the cloud @ netflix ff 2016-monal-daxiniMonal Daxini
 
Fact-Based Monitoring - PuppetConf 2014
Fact-Based Monitoring - PuppetConf 2014Fact-Based Monitoring - PuppetConf 2014
Fact-Based Monitoring - PuppetConf 2014Puppet
 
Reactive database access with Slick3
Reactive database access with Slick3Reactive database access with Slick3
Reactive database access with Slick3takezoe
 

What's hot (20)

Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
 
Lifting the Blinds: Monitoring Windows Server 2012
Lifting the Blinds: Monitoring Windows Server 2012Lifting the Blinds: Monitoring Windows Server 2012
Lifting the Blinds: Monitoring Windows Server 2012
 
Monitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudMonitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloud
 
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus Monitoring, the Prometheus Way - Julius Voltz, Prometheus
Monitoring, the Prometheus Way - Julius Voltz, Prometheus
 
Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014
 
Unbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxiniUnbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxini
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1
 
Netty training
Netty trainingNetty training
Netty training
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
 
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 
NetflixOSS and ZeroToDocker Talk
NetflixOSS and ZeroToDocker TalkNetflixOSS and ZeroToDocker Talk
NetflixOSS and ZeroToDocker Talk
 
Netflix Open Source: Building a Distributed and Automated Open Source Program
Netflix Open Source:  Building a Distributed and Automated Open Source ProgramNetflix Open Source:  Building a Distributed and Automated Open Source Program
Netflix Open Source: Building a Distributed and Automated Open Source Program
 
Containerised ASP.NET Core apps with Kubernetes
Containerised ASP.NET Core apps with KubernetesContainerised ASP.NET Core apps with Kubernetes
Containerised ASP.NET Core apps with Kubernetes
 
Beaming flink to the cloud @ netflix ff 2016-monal-daxini
Beaming flink to the cloud @ netflix   ff 2016-monal-daxiniBeaming flink to the cloud @ netflix   ff 2016-monal-daxini
Beaming flink to the cloud @ netflix ff 2016-monal-daxini
 
Fact-Based Monitoring - PuppetConf 2014
Fact-Based Monitoring - PuppetConf 2014Fact-Based Monitoring - PuppetConf 2014
Fact-Based Monitoring - PuppetConf 2014
 
Reactive database access with Slick3
Reactive database access with Slick3Reactive database access with Slick3
Reactive database access with Slick3
 

Viewers also liked

Docker Inside/Out: the ‘real’ real-world of stacking containers in production...
Docker Inside/Out: the ‘real’ real-world of stacking containers in production...Docker Inside/Out: the ‘real’ real-world of stacking containers in production...
Docker Inside/Out: the ‘real’ real-world of stacking containers in production...Codemotion
 
Component-Based UI Architectures for the Web - Andrew Rota - Codemotion Rome...
Component-Based UI Architectures for the Web  - Andrew Rota - Codemotion Rome...Component-Based UI Architectures for the Web  - Andrew Rota - Codemotion Rome...
Component-Based UI Architectures for the Web - Andrew Rota - Codemotion Rome...Codemotion
 
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017Codemotion
 
From a Developer's POV: is Machine Learning Reshaping the World? - Simone Sca...
From a Developer's POV: is Machine Learning Reshaping the World? - Simone Sca...From a Developer's POV: is Machine Learning Reshaping the World? - Simone Sca...
From a Developer's POV: is Machine Learning Reshaping the World? - Simone Sca...Codemotion
 
Microservices in GO - Massimiliano Dessì - Codemotion Rome 2017
Microservices in GO - Massimiliano Dessì - Codemotion Rome 2017Microservices in GO - Massimiliano Dessì - Codemotion Rome 2017
Microservices in GO - Massimiliano Dessì - Codemotion Rome 2017Codemotion
 
Il game audio come processo ingegneristico - Davide Pensato - Codemotion Rome...
Il game audio come processo ingegneristico - Davide Pensato - Codemotion Rome...Il game audio come processo ingegneristico - Davide Pensato - Codemotion Rome...
Il game audio come processo ingegneristico - Davide Pensato - Codemotion Rome...Codemotion
 
Comics and immersive storytelling in Virtual Reality - Fabio Corrirossi - Cod...
Comics and immersive storytelling in Virtual Reality - Fabio Corrirossi - Cod...Comics and immersive storytelling in Virtual Reality - Fabio Corrirossi - Cod...
Comics and immersive storytelling in Virtual Reality - Fabio Corrirossi - Cod...Codemotion
 
Galateo semi-serio dell'Open Source - Luigi Dell' Aquila - Codemotion Rome 2017
Galateo semi-serio dell'Open Source -  Luigi Dell' Aquila - Codemotion Rome 2017Galateo semi-serio dell'Open Source -  Luigi Dell' Aquila - Codemotion Rome 2017
Galateo semi-serio dell'Open Source - Luigi Dell' Aquila - Codemotion Rome 2017Codemotion
 
The busy developer guide to Docker - Maurice de Beijer - Codemotion Rome 2017
The busy developer guide to Docker - Maurice de Beijer - Codemotion Rome 2017The busy developer guide to Docker - Maurice de Beijer - Codemotion Rome 2017
The busy developer guide to Docker - Maurice de Beijer - Codemotion Rome 2017Codemotion
 
Kunos Simulazioni and Assetto Corsa, behind the scenes- Alessandro Piva, Fabr...
Kunos Simulazioni and Assetto Corsa, behind the scenes- Alessandro Piva, Fabr...Kunos Simulazioni and Assetto Corsa, behind the scenes- Alessandro Piva, Fabr...
Kunos Simulazioni and Assetto Corsa, behind the scenes- Alessandro Piva, Fabr...Codemotion
 
Barbarians at the Gate(way) - Dave Lewis - Codemotion Rome 2017
Barbarians at the Gate(way) - Dave Lewis - Codemotion Rome 2017Barbarians at the Gate(way) - Dave Lewis - Codemotion Rome 2017
Barbarians at the Gate(way) - Dave Lewis - Codemotion Rome 2017Codemotion
 
Xamarin.Forms Performance Tips & Tricks - Francesco Bonacci - Codemotion Rome...
Xamarin.Forms Performance Tips & Tricks - Francesco Bonacci - Codemotion Rome...Xamarin.Forms Performance Tips & Tricks - Francesco Bonacci - Codemotion Rome...
Xamarin.Forms Performance Tips & Tricks - Francesco Bonacci - Codemotion Rome...Codemotion
 
Web Based Virtual Reality - Tanay Pant - Codemotion Rome 2017
Web Based Virtual Reality - Tanay Pant - Codemotion Rome 2017Web Based Virtual Reality - Tanay Pant - Codemotion Rome 2017
Web Based Virtual Reality - Tanay Pant - Codemotion Rome 2017Codemotion
 
Commodore 64 Mon Amour(2): sprite multiplexing. Il caso Catalypse e altre sto...
Commodore 64 Mon Amour(2): sprite multiplexing. Il caso Catalypse e altre sto...Commodore 64 Mon Amour(2): sprite multiplexing. Il caso Catalypse e altre sto...
Commodore 64 Mon Amour(2): sprite multiplexing. Il caso Catalypse e altre sto...Codemotion
 
Handle insane devices traffic using Google Cloud Platform - Andrea Ulisse - C...
Handle insane devices traffic using Google Cloud Platform - Andrea Ulisse - C...Handle insane devices traffic using Google Cloud Platform - Andrea Ulisse - C...
Handle insane devices traffic using Google Cloud Platform - Andrea Ulisse - C...Codemotion
 
Pronti per la legge sulla data protection GDPR? No Panic! - Domenico Maracci,...
Pronti per la legge sulla data protection GDPR? No Panic! - Domenico Maracci,...Pronti per la legge sulla data protection GDPR? No Panic! - Domenico Maracci,...
Pronti per la legge sulla data protection GDPR? No Panic! - Domenico Maracci,...Codemotion
 
Cyber Security in Multi Cloud Architecture - Luca Di Bari - Codemotion Rome 2017
Cyber Security in Multi Cloud Architecture - Luca Di Bari - Codemotion Rome 2017Cyber Security in Multi Cloud Architecture - Luca Di Bari - Codemotion Rome 2017
Cyber Security in Multi Cloud Architecture - Luca Di Bari - Codemotion Rome 2017Codemotion
 
Meetup Code Garden Roma e Java User Group Roma: metodi asincroni con Spring -...
Meetup Code Garden Roma e Java User Group Roma: metodi asincroni con Spring -...Meetup Code Garden Roma e Java User Group Roma: metodi asincroni con Spring -...
Meetup Code Garden Roma e Java User Group Roma: metodi asincroni con Spring -...Codemotion
 
Resilient microservices with Kubernetes - Mete Atamel - Codemotion Rome 2017
Resilient microservices with Kubernetes - Mete Atamel - Codemotion Rome 2017Resilient microservices with Kubernetes - Mete Atamel - Codemotion Rome 2017
Resilient microservices with Kubernetes - Mete Atamel - Codemotion Rome 2017Codemotion
 
Microservice Plumbing - Glynn Bird - Codemotion Rome 2017
Microservice Plumbing  - Glynn Bird - Codemotion Rome 2017Microservice Plumbing  - Glynn Bird - Codemotion Rome 2017
Microservice Plumbing - Glynn Bird - Codemotion Rome 2017Codemotion
 

Viewers also liked (20)

Docker Inside/Out: the ‘real’ real-world of stacking containers in production...
Docker Inside/Out: the ‘real’ real-world of stacking containers in production...Docker Inside/Out: the ‘real’ real-world of stacking containers in production...
Docker Inside/Out: the ‘real’ real-world of stacking containers in production...
 
Component-Based UI Architectures for the Web - Andrew Rota - Codemotion Rome...
Component-Based UI Architectures for the Web  - Andrew Rota - Codemotion Rome...Component-Based UI Architectures for the Web  - Andrew Rota - Codemotion Rome...
Component-Based UI Architectures for the Web - Andrew Rota - Codemotion Rome...
 
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017
An Introduction to Apache Ignite - Mandhir Gidda - Codemotion Rome 2017
 
From a Developer's POV: is Machine Learning Reshaping the World? - Simone Sca...
From a Developer's POV: is Machine Learning Reshaping the World? - Simone Sca...From a Developer's POV: is Machine Learning Reshaping the World? - Simone Sca...
From a Developer's POV: is Machine Learning Reshaping the World? - Simone Sca...
 
Microservices in GO - Massimiliano Dessì - Codemotion Rome 2017
Microservices in GO - Massimiliano Dessì - Codemotion Rome 2017Microservices in GO - Massimiliano Dessì - Codemotion Rome 2017
Microservices in GO - Massimiliano Dessì - Codemotion Rome 2017
 
Il game audio come processo ingegneristico - Davide Pensato - Codemotion Rome...
Il game audio come processo ingegneristico - Davide Pensato - Codemotion Rome...Il game audio come processo ingegneristico - Davide Pensato - Codemotion Rome...
Il game audio come processo ingegneristico - Davide Pensato - Codemotion Rome...
 
Comics and immersive storytelling in Virtual Reality - Fabio Corrirossi - Cod...
Comics and immersive storytelling in Virtual Reality - Fabio Corrirossi - Cod...Comics and immersive storytelling in Virtual Reality - Fabio Corrirossi - Cod...
Comics and immersive storytelling in Virtual Reality - Fabio Corrirossi - Cod...
 
Galateo semi-serio dell'Open Source - Luigi Dell' Aquila - Codemotion Rome 2017
Galateo semi-serio dell'Open Source -  Luigi Dell' Aquila - Codemotion Rome 2017Galateo semi-serio dell'Open Source -  Luigi Dell' Aquila - Codemotion Rome 2017
Galateo semi-serio dell'Open Source - Luigi Dell' Aquila - Codemotion Rome 2017
 
The busy developer guide to Docker - Maurice de Beijer - Codemotion Rome 2017
The busy developer guide to Docker - Maurice de Beijer - Codemotion Rome 2017The busy developer guide to Docker - Maurice de Beijer - Codemotion Rome 2017
The busy developer guide to Docker - Maurice de Beijer - Codemotion Rome 2017
 
Kunos Simulazioni and Assetto Corsa, behind the scenes- Alessandro Piva, Fabr...
Kunos Simulazioni and Assetto Corsa, behind the scenes- Alessandro Piva, Fabr...Kunos Simulazioni and Assetto Corsa, behind the scenes- Alessandro Piva, Fabr...
Kunos Simulazioni and Assetto Corsa, behind the scenes- Alessandro Piva, Fabr...
 
Barbarians at the Gate(way) - Dave Lewis - Codemotion Rome 2017
Barbarians at the Gate(way) - Dave Lewis - Codemotion Rome 2017Barbarians at the Gate(way) - Dave Lewis - Codemotion Rome 2017
Barbarians at the Gate(way) - Dave Lewis - Codemotion Rome 2017
 
Xamarin.Forms Performance Tips & Tricks - Francesco Bonacci - Codemotion Rome...
Xamarin.Forms Performance Tips & Tricks - Francesco Bonacci - Codemotion Rome...Xamarin.Forms Performance Tips & Tricks - Francesco Bonacci - Codemotion Rome...
Xamarin.Forms Performance Tips & Tricks - Francesco Bonacci - Codemotion Rome...
 
Web Based Virtual Reality - Tanay Pant - Codemotion Rome 2017
Web Based Virtual Reality - Tanay Pant - Codemotion Rome 2017Web Based Virtual Reality - Tanay Pant - Codemotion Rome 2017
Web Based Virtual Reality - Tanay Pant - Codemotion Rome 2017
 
Commodore 64 Mon Amour(2): sprite multiplexing. Il caso Catalypse e altre sto...
Commodore 64 Mon Amour(2): sprite multiplexing. Il caso Catalypse e altre sto...Commodore 64 Mon Amour(2): sprite multiplexing. Il caso Catalypse e altre sto...
Commodore 64 Mon Amour(2): sprite multiplexing. Il caso Catalypse e altre sto...
 
Handle insane devices traffic using Google Cloud Platform - Andrea Ulisse - C...
Handle insane devices traffic using Google Cloud Platform - Andrea Ulisse - C...Handle insane devices traffic using Google Cloud Platform - Andrea Ulisse - C...
Handle insane devices traffic using Google Cloud Platform - Andrea Ulisse - C...
 
Pronti per la legge sulla data protection GDPR? No Panic! - Domenico Maracci,...
Pronti per la legge sulla data protection GDPR? No Panic! - Domenico Maracci,...Pronti per la legge sulla data protection GDPR? No Panic! - Domenico Maracci,...
Pronti per la legge sulla data protection GDPR? No Panic! - Domenico Maracci,...
 
Cyber Security in Multi Cloud Architecture - Luca Di Bari - Codemotion Rome 2017
Cyber Security in Multi Cloud Architecture - Luca Di Bari - Codemotion Rome 2017Cyber Security in Multi Cloud Architecture - Luca Di Bari - Codemotion Rome 2017
Cyber Security in Multi Cloud Architecture - Luca Di Bari - Codemotion Rome 2017
 
Meetup Code Garden Roma e Java User Group Roma: metodi asincroni con Spring -...
Meetup Code Garden Roma e Java User Group Roma: metodi asincroni con Spring -...Meetup Code Garden Roma e Java User Group Roma: metodi asincroni con Spring -...
Meetup Code Garden Roma e Java User Group Roma: metodi asincroni con Spring -...
 
Resilient microservices with Kubernetes - Mete Atamel - Codemotion Rome 2017
Resilient microservices with Kubernetes - Mete Atamel - Codemotion Rome 2017Resilient microservices with Kubernetes - Mete Atamel - Codemotion Rome 2017
Resilient microservices with Kubernetes - Mete Atamel - Codemotion Rome 2017
 
Microservice Plumbing - Glynn Bird - Codemotion Rome 2017
Microservice Plumbing  - Glynn Bird - Codemotion Rome 2017Microservice Plumbing  - Glynn Bird - Codemotion Rome 2017
Microservice Plumbing - Glynn Bird - Codemotion Rome 2017
 

Similar to S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben-Ari - Codemotion Rome 2017

S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...Codemotion Tel Aviv
 
S3 cassandra or outer space? dumping time series data using spark
S3 cassandra or outer space? dumping time series data using sparkS3 cassandra or outer space? dumping time series data using spark
S3 cassandra or outer space? dumping time series data using sparkDemi Ben-Ari
 
Migrating Data Pipeline from MongoDB to Cassandra
Migrating Data Pipeline from MongoDB to CassandraMigrating Data Pipeline from MongoDB to Cassandra
Migrating Data Pipeline from MongoDB to CassandraDemi Ben-Ari
 
Spark Meetup at Uber
Spark Meetup at UberSpark Meetup at Uber
Spark Meetup at UberDatabricks
 
From HDFS to S3: Migrate Pinterest Apache Spark Clusters
From HDFS to S3: Migrate Pinterest Apache Spark ClustersFrom HDFS to S3: Migrate Pinterest Apache Spark Clusters
From HDFS to S3: Migrate Pinterest Apache Spark ClustersDatabricks
 
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and KafkaDataWorks Summit
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned Omid Vahdaty
 
Big Data processing with Apache Spark
Big Data processing with Apache SparkBig Data processing with Apache Spark
Big Data processing with Apache SparkLucian Neghina
 
Stream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and KafkaStream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and KafkaItai Yaffe
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streamingdatamantra
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2aspyker
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadKrivoy Rog IT Community
 
Our Multi-Year Journey to a 10x Faster Confluent Cloud
Our Multi-Year Journey to a 10x Faster Confluent CloudOur Multi-Year Journey to a 10x Faster Confluent Cloud
Our Multi-Year Journey to a 10x Faster Confluent CloudHostedbyConfluent
 
Healthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache SparkHealthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache SparkDatabricks
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | EnglishOmid Vahdaty
 
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and KafkaDatabricks
 
Profiling & Testing with Spark
Profiling & Testing with SparkProfiling & Testing with Spark
Profiling & Testing with SparkRoger Rafanell Mas
 
Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Djamel Zouaoui
 

Similar to S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben-Ari - Codemotion Rome 2017 (20)

S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...
 
S3 cassandra or outer space? dumping time series data using spark
S3 cassandra or outer space? dumping time series data using sparkS3 cassandra or outer space? dumping time series data using spark
S3 cassandra or outer space? dumping time series data using spark
 
Migrating Data Pipeline from MongoDB to Cassandra
Migrating Data Pipeline from MongoDB to CassandraMigrating Data Pipeline from MongoDB to Cassandra
Migrating Data Pipeline from MongoDB to Cassandra
 
Spark Meetup at Uber
Spark Meetup at UberSpark Meetup at Uber
Spark Meetup at Uber
 
From HDFS to S3: Migrate Pinterest Apache Spark Clusters
From HDFS to S3: Migrate Pinterest Apache Spark ClustersFrom HDFS to S3: Migrate Pinterest Apache Spark Clusters
From HDFS to S3: Migrate Pinterest Apache Spark Clusters
 
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
 
Big Data processing with Apache Spark
Big Data processing with Apache SparkBig Data processing with Apache Spark
Big Data processing with Apache Spark
 
Stream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and KafkaStream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and Kafka
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streaming
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High load
 
Our Multi-Year Journey to a 10x Faster Confluent Cloud
Our Multi-Year Journey to a 10x Faster Confluent CloudOur Multi-Year Journey to a 10x Faster Confluent Cloud
Our Multi-Year Journey to a 10x Faster Confluent Cloud
 
Healthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache SparkHealthcare Claim Reimbursement using Apache Spark
Healthcare Claim Reimbursement using Apache Spark
 
Netty training
Netty trainingNetty training
Netty training
 
Cloud arch patterns
Cloud arch patternsCloud arch patterns
Cloud arch patterns
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
 
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
 
Profiling & Testing with Spark
Profiling & Testing with SparkProfiling & Testing with Spark
Profiling & Testing with Spark
 
Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming
 

More from Codemotion

Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Codemotion
 
Pompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyPompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyCodemotion
 
Pastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaPastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaCodemotion
 
Pennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserPennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserCodemotion
 
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Codemotion
 
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Codemotion
 
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Codemotion
 
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 - Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 - Codemotion
 
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Codemotion
 
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Codemotion
 
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Codemotion
 
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Codemotion
 
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Codemotion
 
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Codemotion
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Codemotion
 
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...Codemotion
 
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Codemotion
 
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Codemotion
 
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Codemotion
 
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Codemotion
 

More from Codemotion (20)

Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
 
Pompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyPompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending story
 
Pastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaPastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storia
 
Pennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserPennisi - Essere Richard Altwasser
Pennisi - Essere Richard Altwasser
 
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
 
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
 
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
 
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 - Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
 
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
 
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
 
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
 
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
 
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
 
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
 
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
 
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
 
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
 
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
 
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
 

Recently uploaded

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Recently uploaded (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben-Ari - Codemotion Rome 2017

  • 1. S3, Cassandra or Outer Space? Dumping Time Series Data using Spark Demi Ben-Ari - VP R&D @ ROME 24-25 MARCH 2017
  • 2. About Me Demi Ben-Ari, Co-Founder & VP R&D @ Panorays ● BS’c Computer Science – Academic College Tel-Aviv Yaffo ● Co-Founder ○ “Big Things” Big Data Community ○ Google Developer Group Cloud In the Past: ● Sr. Data Engineer - Windward ● Team Leader & Sr. Java Software Engineer Missile defense and Alert System - “Ofek” – IAF Interested in almost every kind of technology – A True Geek
  • 3. Agenda ● Spark brief overview and Catch Up ● Data flow and Environment ● What’s our time series data like? ● Where we started from - where we got to ○ Problems and our decisions ○ Evolution of the solution ● Conclusions
  • 5. Scala & Spark (Architecture) Scala REPL Scala Compiler Spark Runtime Scala Runtime JVM File System (eg. HDFS, Cassandra, S3..) Cluster Manager (eg. Yarn, Mesos)
  • 6. What kind of DSL is Apache Spark ● Centered around Collections ● Immutable data sets equipped with functional transformations ● These are exactly the Scala collection operations map flatMap filter ... reduce fold aggregate ... union intersection ...
  • 7. Spark is A Multi-Language Platform ● Why to use Scala instead of Python? ○ Native to Spark, Can use everything without translation ○ Types help
  • 10. United Tools Platform - Single Framework Batch InteractiveStreaming Single Framework
  • 11. Data flow and Environment (Our Use Case)
  • 12. Structure of the Data ● Maritime Analytics Platform ● Geo Locations + Metadata ● Arriving over time ● Different types of messages being reported by satellites ● Encoded ● Might arrive later than actually transmitted
  • 13. Data Flow Diagram External Data Source Analytics Layers Data Pipeline Parsed Raw Entity Resolution Process Building insights on top of the entities Data Output Layer Anomaly Detection Trends
  • 15. Basic Terms ● Missing Parts in Time Series Data ◦ Data arriving from the satellites ● Might be causing delays because of bad transmission ◦ Data vendors delaying the data stream ◦ Calculation in Layers may cause Holes in the Data ● Calculating the Data layers by time slices
  • 16. Basic Terms ● Idempotence is the property of certain operations in mathematics and computer science, that can be applied multiple times without changing the result beyond the initial application. ● Function: Same input => Same output
  • 17. Basic Terms ● Partitions == Parallelism ◦ Physical / Logical partitioning ● Resilient Distributed Datasets (RDDs) == Collections ◦ fault-tolerant collection of elements that can be operated on in parallel. ◦ Applying immutable transformations and actions over RDDs
  • 20. The Problem - Receiving DATA Beginning state, no data, and the timeline begins T = 0 Level 3 Entity Level 2 Entity Level 1 Entity
  • 21. The Problem - Receiving DATA T = 10 Level 3 Entity Level 2 Entity Level 1 Entity Computation sliding window size Level 1 entities data arrives and gets stored
  • 22. The Problem - Receiving DATA T = 10 Level 3 Entity Level 2 Entity Level 1 Entity Computation sliding window size Level 3 entities are created on top of Level 2’s Data (Decreased amount of data) Level 2 entities are created on top of Level 1’s Data (Decreased amount of data)
  • 23. The Problem - Receiving DATA T = 20 Level 3 Entity Level 2 Entity Level 1 Entity Computation sliding window size Because of the sliding window’s back size, level 2 and 3 entities would not be created properly and there would be “Holes” in the Data Level 1 entity's data arriving late
  • 24. Solution to the Problem ● Creating Dependent Micro services forming a data pipeline ◦ Mainly Apache Spark applications ◦ Services are only dependent on the Data - not the previous service’s run ● Forming a structure and scheduling of “Back Sliding Window” ◦ Know your data and its relevance through time ◦ Don’t try to foresee the future – it might Bias the results
  • 25. Starting point & Infrastructure
  • 26. How we started? ● Spark Standalone – via ec2 scripts ◦ Around 5 nodes (r3.xlarge instances) ◦ Didn’t want to keep a persistent HDFS – Costs a lot ◦ 100 GB (per day) => ~150 TB for 4 years ◦ Cost for server per year (r3.xlarge): - On demand: ~2900$ - Reserved: ~1750$ ● Know your costs: http://www.ec2instances.info/
  • 28. Decision ● Working with S3 as the persistence layer ◦ Pay extra for - Put (0.005 per 1000 requests) - Get (0.004 per 10,000 requests) ◦ 150TB => ~210$ for 4 years of Data ● Same format as HDFS (CSV files) ◦ s3n://some-bucket/entity1/201412010000/part-00000 ◦ s3n://some-bucket/entity1/201412010000/part-00001 ◦ ……
  • 29. What about the serving?
  • 30. MongoDB for Serving Worker 1 Worker 2 …. …. … … Worker N MongoDB Replica Set Spark Cluster Master Write Read
  • 31. Spark Slave - Server Specs ● Instance Type: r3.xlarge ● CPU’s: 4 ● RAM: 30.5GB ● Storage: ephemeral ● Amount: 10+
  • 32. MongoDB - Server Specs ● MongoDB version: 2.6.1 ● Instance Type: m3.xlarge (AWS) ● CPU’s: 4 ● RAM: 15GB ● Storage: EBS ● DB Size: ~500GB ● Collection Indexes: 5 (4 compound)
  • 33. The Problem ● Batch jobs ◦ Should run for 5-10 minutes in total ◦ Actual - runs for ~40 minutes ● Why? ◦ ~20 minutes to write with the Java mongo driver – Async (Unacknowledged) ◦ ~20 minutes to sync the journal ◦ Total: ~ 40 Minutes of the DB being unavailable ◦ No batch process response and no UI serving
  • 34. Alternative Solutions ● Sharded MongoDB (With replica sets) ◦ Pros: - Increases Throughput by the amount of shards - Increases the availability of the DB ◦ Cons: - Very hard to manage DevOps wise (for a small team of developers) - High cost of servers – because each shared need 3 replicas
  • 35. Workflow with MongoDB Worker 1 Worker 2 …. …. … … Worker N Spark Cluster Master Write Read Master
  • 36. Our DevOps – After that solution We had no DevOps guy at that time at all ☹
  • 37. Alternative Solutions ● Apache Cassandra ◦ Pros: - Very large developer community - Linearly scalable Database - No single master architecture - Proven working with distributed engines like Apache Spark ◦ Cons: - We had no experience at all with the Database - No Geo Spatial Index – Needed to implement by ourselves
  • 38. The Solution ● Migration to Apache Cassandra ● Create easily a Cassandra cluster using DataStax Community AMI on AWS ◦ First easy step – Using the spark-cassandra-connector (Easy bootstrap move to Spark ⬄ Cassandra) ◦ Creating a monitoring dashboard to Cassandra
  • 39. Workflow with Cassandra Worker 1 Worker 2 …. …. … … Worker N Cassandra Cluster Spark Cluster Write Read
  • 40. Result ● Performance improvement ◦ Batch write parts of the job run in 3 minutes instead of ~ 40 minutes in MongoDB ● Took 2 weeks to go from “Zero to Hero”, and to ramp up a running solution that work without glitches
  • 42. Transferring the Heaviest Process ● Micro service that runs every 10 minutes ● Writes to Cassandra 30GB per iteration ◦ (Replication factor 3 => 90GB) ● At first took us 18 minutes to do all of the writes ◦ Not Acceptable in a 10 minute process
  • 44. Transferring the Heaviest Process ● Solutions ◦ We chose the i2.xlarge ◦ Optimization of the Cluster ◦ Changing the JDK to Java-8 - Changing the GC algorithm to G1 ◦ Tuning the Operation system - Ulimit, removing the swap ◦ Write time went down to ~5 minutes (For 30GB RF=3) Sounds good right? I don’t think so
  • 45. Cloud Watch After Tuning Cloud Watch After Tuning
  • 46. The Solution ● Taking the same Data Model that we held in Cassandra (All of the Raw data per 10 minutes) and put it on S3 ◦ Write time went down from ~5 minutes to 1.5 minutes ● Added another process, not dependent on the main one, happens every 15 minutes ◦ Reads from S3, downscales the amount and Writes them to Cassandra for serving
  • 47. Parsed Raw Static / Aggregated Data Spark Analytics Layers UI Serving Downscaled Data Heavy Fusion Process How it looks after all?
  • 48. Conclusion ● Always give an estimate to your data ◦ Frequency ◦ Volume ◦ Arrangement of the previous phase ● There is no “Best” persistence layer ◦ There is the right one for the job ◦ Don’t overload an existing solution
  • 49. Conclusion ● Spark is a great framework for distributed collections ◦ Fully functional API ◦ Can perform imperative actions ● “With great power, comes lots of partitioning” ◦ Control your work and data distribution via partitions ● https://www.pinterest.com/pin/155514993354583499/ (Thanks)
  • 51. ● LinkedIn ● Twitter: @demibenari ● Blog: http://progexc.blogspot.com/ ● demi.benari@gmail.com ● “Big Things” Community Meetup, YouTube, Facebook, Twitter ● GDG Cloud