Ruby to Scala in 9 Weeks 
Evolving antiquated software with modern tools 
Jake Utley (jutley@uw.edu) 
Sept. 9, 2014
WhitePages 2 
Areas of interest over SDLC 
• Understanding the problem 
• Exploring solution space 
• Speed of prototyping 
• Performance 
• Maintainability 
• Integration 
Spectrum: prototyping to production
Scala and Ruby Similarities 
• Readable and concise 
• Object oriented 
• Highly composable (traits and mixins) 
• Highly adaptable
Background: Location Services 
• An internal service at Whitepages, written in Ruby 
• Rationalizes Foursquare data against Whitepages people data 
• Modified to use data from Facebook, LinkedIn, and Twitter 
• Contained many issues that motivated rewrite 
– Unclear code 
– Sloppy code organization 
– Poor performance 
– Minimal documentation
Migration to Scala 
• Concurrency 
– Ruby EventMachine vs. Scala Futures 
• Type system 
• Performance 
– Throughput 
– Latency 
– Hardware Utilization
CONCURRENCY
Concurrency 
• Ruby: No built-in concurrency 
– Many implementations are single-threaded 
– Cooperative multitasking comes from external libraries 
• Scala: Built-in concurrency 
– Built on top of the JVM 
– Supports standard concurrency models 
o Futures, Actors 
Ruby Concurrency: EventMachine 
• Ruby library for cooperative multitasking 
• Uses the reactor pattern 
• Dangerous: blocking incorrectly can kill entire app 
• Long chains of callbacks 
Location Services author created DeferrableSequence to manage callbacks 
Wikipedia: http://bit.ly/1qHMez5
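Scala Concurrency: Futures 
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Start the computation asynchronously
val x = Future { 
  1 + 1 
} 
// Transform the result once it is available
val y = x map { two => 
  two * 2 
} 
// Print the result when y completes
y foreach println 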
• Well known concurrency model 
• Built into Scala 
• Functional and imperative: 
– Composition (via map, zip, etc.) 
– Callbacks (onComplete, etc.) 
• Blends nicely into other code 
• Can be treated like a 1-element 
collection
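The composition and callback styles listed above can be sketched as follows. This is a minimal illustration, not code from Location Services: `lookupA`, `lookupB`, and the `FutureComposition` object are hypothetical names, and the arithmetic bodies stand in for real remote calls.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

object FutureComposition {
  // Hypothetical stand-ins for two independent asynchronous lookups.
  def lookupA(): Future[Int] = Future { 1 + 1 }
  def lookupB(): Future[Int] = Future { 20 * 2 }

  // Composition with zip: both futures are started, then their results are paired.
  def zipped(): Future[(Int, Int)] = lookupA() zip lookupB()

  // A for-comprehension treats each future like a one-element collection.
  def summed(): Future[Int] =
    for {
      a <- lookupA()
      b <- lookupB()
    } yield a + b

  // Callback style: react to success or failure without blocking the caller.
  def report(f: Future[Int]): Unit = f.onComplete {
    case Success(v) => println(s"got $v")
    case Failure(e) => println(s"failed: ${e.getMessage}")
  }
}
```

`FutureComposition.summed()` completes with 42, and `FutureComposition.report(FutureComposition.summed())` prints the value from a callback once it is ready.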
Scala Concurrency: Futures 
• Drawbacks 
– Have to be used in a non-blocking 
way (without Await) 
– Confusing types 
o Future[Future[Int]] 
o Seq[Future[Int]] 
o flatMap and sequence functions 
help to avoid these types 
– Difficult to debug
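The confusing types named above, and the way `flatMap` and `Future.sequence` avoid them, can be shown in a short sketch. The functions `fetchId` and `fetchScore` and the `FlattenFutures` object are hypothetical placeholders for asynchronous steps; they are assumptions for illustration, not part of the original service.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object FlattenFutures {
  // Hypothetical asynchronous steps; the bodies are placeholders.
  def fetchId(name: String): Future[Int] = Future { name.length }
  def fetchScore(id: Int): Future[Int] = Future { id * 10 }

  // Using map with a function that itself returns a Future nests the types.
  def nested(name: String): Future[Future[Int]] = fetchId(name).map(fetchScore)

  // flatMap chains the two steps and keeps a single Future layer.
  def flat(name: String): Future[Int] = fetchId(name).flatMap(fetchScore)

  // Future.sequence turns Seq[Future[Int]] into Future[Seq[Int]].
  def all(names: Seq[String]): Future[Seq[Int]] =
    Future.sequence(names.map(flat))
}
```

With these definitions, `FlattenFutures.all(Seq("ruby", "scala"))` is a single future holding both results in input order, so callers never handle a `Seq[Future[Int]]` directly.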
TYPE SYSTEMS
Type Systems 
• Ruby: Dynamically typed 
• Scala: Statically typed 
• Both have advantages depending on the circumstance
Type Systems: Ruby 
• Dynamic typing 
– Types are checked at runtime 
– More flexibility 
– Easier to express ideas -> faster prototyping 
– More room for errors 
– Less self-documentation, potentially harder 
for others to understand code 
– Defensive coding
Type Systems: Scala 
• Static typing 
– Compiler type checks 
– Many errors are caught at compile time 
– Argument types are always documented 
– Easier for others to maintain code 
– Strict contracts 
– IDEs can tell us the type of any variable 
– Drawback: An IDE becomes a crutch
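A small sketch of what documented argument types and strict contracts look like in practice. The `Checkin` type and both functions are hypothetical, chosen only to echo the check-in data mentioned earlier; they are assumptions, not the service's real model.

```scala
object TypedContract {
  // Hypothetical domain type; the field names are assumptions for illustration.
  final case class Checkin(userId: Long, venue: String)

  // The signature documents input and output types. Passing anything other
  // than a Checkin is rejected by the compiler rather than failing at runtime.
  def describe(c: Checkin): String = s"user ${c.userId} at ${c.venue}"

  // Option makes a possibly missing result explicit in the type, so callers
  // handle absence deliberately instead of adding defensive nil checks.
  def firstVenue(cs: Seq[Checkin]): Option[String] =
    cs.headOption.map(_.venue)
}
```

For example, `TypedContract.firstVenue(Nil)` returns `None`, and the caller has to unwrap the `Option` before using the venue name.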
PERFORMANCE
Performance: Methodology 
• Requests from production sent to services for a 5-minute interval 
• Request rate increased until the majority of requests time out (10 seconds) 
• Each service ran on identical hardware 
• 30-second warmups 
• 3 trials at each request rate
Performance tools: Onslaught 
• Performance testing tool built at Whitepages 
• Reports throughput, p50, p75, p95, p99, p999, 
mean, and max latencies 
• Plan to make Onslaught open-source
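For readers unfamiliar with latency percentiles such as p50 or p99, here is one common way to compute them from recorded samples, the nearest-rank method. How Onslaught itself computes percentiles is not stated in the slides, so this is an assumption-laden sketch; `LatencyPercentiles` and `percentile` are hypothetical names.

```scala
object LatencyPercentiles {
  // Nearest-rank percentile: the smallest sample such that at least
  // p percent of all samples are less than or equal to it.
  def percentile(samples: Seq[Long], p: Double): Long = {
    require(samples.nonEmpty, "need at least one sample")
    require(p > 0 && p <= 100, "p must be in (0, 100]")
    val sorted = samples.sorted
    // 1-based rank, rounded up; clamp to 1 for very small p.
    val rank = math.max(math.ceil(p / 100.0 * sorted.size).toInt, 1)
    sorted(rank - 1)
  }
}
```

Given ten latency samples of 10, 20, ..., 100 μs, `percentile(samples, 50)` returns 50 and `percentile(samples, 95)` returns 100, which shows how a single slow request dominates the high percentiles.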
Performance: Expectations 
• Non-blocking I/O 
– Higher throughput 
– Better CPU usage (no waits) 
– Higher memory usage 
• JVM optimizations 
– Lower latency
Performance: Throughput 
[Figure: two charts of successful responses/s (y-axis, 0 to 450) against requests/s (x-axis). Throughput (Ruby): 25 to 150 requests/s. Throughput (Scala): 100 to 600 requests/s. Plotted values not recoverable.]
Performance: Latency 
[Figure: two charts of latency in μs (y-axis, 0 to 10,000,000) against request rate in req/s (x-axis), each with p50, p95, and p99 series. Latency (Ruby): 25 to 150 req/s. Latency (Scala): 100 to 600 req/s. Plotted values not recoverable.]
Performance: CPU
[Figure: CPU charts, two slides]
Performance: Memory
[Figure: memory chart]
Summary 
Scala or Ruby? 
• Prototyping: 
– Ruby’s dynamic typing allows for fast prototyping 
• Production: 
– Scala’s static typing catches more errors and can make code clearer 
– Scala supports concurrency in its standard library; Ruby does not 
– Scala performs better than Ruby in throughput and hardware utilization
Thank you. 
Questions?

Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
 

Ruby to Scala in 9 weeks

  • 1. Ruby to Scala in 9 Weeks Evolving antiquated software with modern tools Jake Utley (jutley@uw.edu) Sept. 9, 2014
  • 2. WhitePages 2 Areas of interest over SDLC • Understanding the problem • Exploring solution space • Speed of prototyping • Performance • Maintainability • Integration prototyping production
  • 3. WhitePages 3 Scala and Ruby Similarities • Readable and concise • Object oriented • Highly composable (traits and mixins) • Highly adaptable
  • 4. WhitePages 4 Background: Location Services • An internal service at Whitepages, written in Ruby • Rationalizes Foursquare data against Whitepages people data • Modified to use data from Facebook, LinkedIn, and Twitter • Contained many issues that motivated rewrite – Unclear code – Sloppy code organization – Poor performance – Minimal documentation
  • 5. WhitePages 5 Migration to Scala • Concurrency – Ruby EventMachine vs. Scala Futures • Type system • Performance – Throughput – Latency – Hardware Utilization
  • 7. WhitePages 7 Concurrency • Ruby: No built-in concurrency – Many implementations are single-threaded – Cooperative multitasking comes from external libraries • Scala: Built-in concurrency – Built on top of the JVM – Supports standard concurrency models o Futures, Actors
  • 8. WhitePages 8 Ruby Concurrency: EventMachine • Ruby library for cooperative multitasking • Uses the reactor pattern • Dangerous: blocking incorrectly can kill entire app • Long chains of callbacks • Location Services author created DeferrableSequence to manage callbacks Wikipedia: http://bit.ly/1qHMez5
  • 9. WhitePages 9 Scala Concurrency: Futures val x = Future { 1 + 1 } val y = x map { case two => two * 2 } y map println • Well known concurrency model • Built into Scala • Functional and imperative: – Composition (via map, zip, etc.) – Callbacks (onComplete, etc.) • Blends nicely into other code • Can be treated like a 1-element collection
  • 11. WhitePages Confidential 11 Scala Concurrency: Futures val x = Future { 1 + 1 } val y = x map { case two => two * 2 } y map println • Drawbacks – Have to be used in a non-blocking way (without Await) – Confusing types o Future[Future[Int]] o Seq[Future[Int]] o flatMap and sequence functions help to avoid these types – Difficult to debug
  • 12. WhitePages 12 Areas of interest over SDLC • Understanding the problem • Exploring solution space • Speed of prototyping • Performance • Maintainability • Integration prototyping production
  • 14. WhitePages 14 Type Systems • Ruby: Dynamically typed • Scala: Statically typed • Both have advantages depending on the circumstance
  • 15. WhitePages 15 Type Systems: Ruby • Dynamic typing – Types are checked at runtime – More flexibility – Easier to express ideas -> faster prototyping – More room for errors – Less self-documentation, potentially harder for others to understand code – Defensive coding
  • 16. WhitePages 16 Type Systems: Scala • Static typing – Compiler type checks – Many errors are caught at compile time – Argument types are always documented – Easier for others to maintain code – Strict contracts – IDEs can tell us the types of any variable – Drawback: An IDE becomes a crutch
  • 17. WhitePages 17 Areas of interest over SDLC • Understanding the problem • Exploring solution space • Speed of prototyping • Performance • Maintainability • Integration prototyping production
  • 19. WhitePages 19 Performance: Methodology • Requests from production sent to services for 5 minute interval • Request rate increased until majority of requests time out (10 seconds) • Each service ran on identical hardware • 30 seconds warmups • 3 trials at each request rate
  • 20. WhitePages 20 Performance tools: Onslaught • Performance testing tool built at Whitepages • Reports throughput, p50, p75, p95, p99, p999, mean, and max latencies • Plan to make Onslaught open-source
  • 21. WhitePages 21 Performance: Expectations • Non-blocking I/O – Higher throughput – Better CPU usage (no waits) – Higher memory usage • JVM optimizations – Lower latency
  • 22. WhitePages 22 Performance: Throughput [Two charts: Throughput (Ruby) and Throughput (Scala). x-axis: Requests/s (Ruby 25 to 150, Scala 100 to 600). y-axis: Successful responses/s (0 to 450).]
  • 23. WhitePages 23 Performance: Latency [Two charts: Latency (Ruby) and Latency (Scala). x-axis: Request rate (Req/s) (Ruby 25 to 150, Scala 100 to 600). y-axis: Latency (μs) (0 to 10,000,000). Series: p50, p95, p99.]
  • 27. WhitePages 27 Areas of interest over SDLC • Understanding the problem • Exploring solution space • Speed of prototyping • Performance • Maintainability • Integration prototyping production
  • 28. WhitePages 28 Summary Scala or Ruby? • Prototyping: – Ruby’s dynamic typing allows for fast prototyping • Production: – Scala’s static typing catches more errors and can make code clearer – Scala supports concurrency with standard library, Ruby does not – Scala performs better than Ruby in throughput and hardware utilization
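Slide 11 lists Future[Future[Int]] and Seq[Future[Int]] as confusing types and names flatMap and sequence as the way around them. The sketch below is one possible illustration of that point, not code from the talk: lookupScore is a hypothetical stand-in for a non-blocking call, and Await is used only in main so the demo can print its results.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureTypesSketch {
  // Hypothetical stand-in for a non-blocking remote lookup (not from the talk).
  def lookupScore(id: Int): Future[Int] = Future { id * 2 }

  // Using map with a Future-returning function nests the result type.
  def nested(): Future[Future[Int]] =
    lookupScore(1).map(score => lookupScore(score))

  // flatMap chains the same two calls and flattens the type.
  def flat(): Future[Int] =
    lookupScore(1).flatMap(score => lookupScore(score))

  // Mapping a collection over a Future-returning function yields Seq[Future[Int]].
  def many(): Seq[Future[Int]] = Seq(1, 2, 3).map(lookupScore)

  // Future.sequence turns Seq[Future[Int]] into a single Future[Seq[Int]].
  def all(): Future[Seq[Int]] = Future.sequence(many())

  def main(args: Array[String]): Unit = {
    // Blocking with Await only at the edge of the program, for demonstration.
    println(Await.result(flat(), 5.seconds))
    println(Await.result(all(), 5.seconds))
  }
}
```

Inside a service, the results of flat() and all() would normally be composed further with map or flatMap rather than awaited.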

Editor's Notes

  1. Introduction. Been moving a service from Ruby to Scala. Sharing the most interesting aspects of my experience moving a Ruby service that has outgrown Ruby to Scala. Disclaimer: Mostly subjective material, based on experience
  2. Over course of SDLC, we’re interested in different things Different languages allow us to explore different ideas better Ruby better for prototyping, Scala better for production
  3. Not “write-only”
  4. When just Foursquare, resembled prototype Too much added on at a bad state Use Scala for rewrite, better for production What makes Scala better for production? Most issues not due to Ruby
  5. Main implementation, MRI (Matz’s Ruby Interpreter): C-based, single threaded. Similar issues with other interpreted languages (like Python). Outside library I used: EventMachine. In my case: EventMachine and Futures
  6. Ask who is familiar with reactor pattern (handler dispatches requests). Because single threaded, blocking the thread kills the app. Long chains of callbacks: hard to trace code. Author created DeferrableSequence to manage callbacks: extra work, non-standard (unfamiliar, error-prone). EventMachine only ever does one thing at a time, not true concurrency
  7. Built in: reliable, familiar to other developers Callbacks for side effects (don’t return something) Composition for functional code
  8. Built in: reliable, familiar to other developers Callbacks for side effects (don’t return something) Composition for functional code
  9. Still need to avoid blocking, not as severe consequences as EM Overall: Often when prototyping, we’re not as concerned with concurrency, so Ruby works perfectly fine. In production, Scala is more appealing Other major difference: Type System
  10. Other difference: type system
  11. compiler doesn’t get in the way dynamically typed code has high potential to become hard to maintain code
  12. Compiler is our friend Code with a longer shelf life
  13. Type systems and concurrency different, impacts performance
  14. Dynamic vs. static, interpreted vs. compiled, different concurrency models: expect different performance
  15. Point out Devin and Paul
  16. Fairly significant improvement in throughput Seen higher improvements in other services, so we suspect that this can be improved further
  17. clarify x axis
  18. Over period of throughput testing. Graph shows idle CPU, which can be confusing because it reads upside down. About the same amount of CPU use
  19. Graph shows idle CPU, which can be confusing because it reads upside down
  20. Graph shows idle CPU, which can be confusing because it reads upside down. Overall: Scala seems to use resources more heavily, but is much faster
  21. Does anybody not care about performance?
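The notes for the Futures slides distinguish callbacks (for side effects that return nothing) from composition (for functional code). A minimal sketch of that distinction follows. The names compute, doubled, logResult, and safeDivide are hypothetical and chosen for illustration; they do not come from the Location Services code.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success}

object CallbackVsCompositionSketch {
  // Hypothetical asynchronous computation, mirroring the slide's `1 + 1`.
  def compute(): Future[Int] = Future { 1 + 1 }

  // Composition: map returns a new Future that can be composed further.
  def doubled(): Future[Int] = compute().map(_ * 2)

  // Callback: onComplete returns Unit, so it suits side effects such as logging.
  def logResult(f: Future[Int]): Unit = f.onComplete {
    case Success(value) => println(s"got $value")
    case Failure(error) => println(s"failed: ${error.getMessage}")
  }

  // A failure inside a Future becomes a failed Future instead of crashing the caller.
  def safeDivide(a: Int, b: Int): Future[Int] = Future { a / b }

  // recover is also composition: it maps a matching failure to a fallback value.
  def divideOrFallback(a: Int, b: Int): Future[Int] =
    safeDivide(a, b).recover { case _: ArithmeticException => -1 }

  def main(args: Array[String]): Unit = {
    val result = doubled()
    logResult(result)
    // Await only at the program edge so the demo does not exit early.
    println(Await.result(result, 5.seconds))
    println(Await.result(divideOrFallback(1, 0), 5.seconds))
  }
}
```

Because doubled() and divideOrFallback() return Futures, callers can keep chaining them. logResult() ends the chain, which is why it is reserved for side effects.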