A CHANGE OF SEASONS
A big move to Apache Cassandra
Eiti Kimura, IT Coordinator @Movile Brazil
Spreading the word...
Leader in Latin America
Mobile phones, Smartphones and Tablets
Movile is the company behind the
apps that make your life easier.
We think mobile...
Movile develops apps across all platforms for smartphones
and tablets to not only make life easier, but also more fun.
The company recorded an annual average growth of 80% in the last 7 years
3 use cases that constitute the big move to Apache Cassandra
- Move I -
The Subscription and Billing System, a.k.a. SBS
Subscription and Billing Platform
- it is a service API
- responsible for managing users' subscriptions
- responsible for charging users through carriers
- an engine to renew subscriptions
it cannot stop under any circumstances
it has to perform very well
The platform in numbers
88 million subscriptions
66.1 million unique users
105 million transactions a day
Platform Evolution timeline
2008: pure relational database times
2009: Apache Cassandra adoption (v0.6)
2011: the data model was entirely remodeled; cluster upgrade from version 0.7 to 1.0; 4 nodes
2013: cluster upgrade from version 1.0 to 1.2; expanded from 4 to 6 nodes
2014: new data index using time series
2015: THE BIG MOVE, migrating complex queries from the relational database
Initial architecture revisited
[Diagram: several API and Engine instances all connected to a single DB]
Classical solution using a regular RDBMS
Architecture disadvantages
- single point of failure
- slow response times
- the platform went down often
- hard and expensive to scale
- if you scale your platform but forget to scale the database and
other related resources, you will fail
A new architecture has come
[Diagram: API and Engine instances backed by a Cassandra cluster, with a DB still serving regular SQL queries]
A hybrid solution using an Apache Cassandra cluster plus a
relational database to execute complex queries
The benefits of the new solution
- performance problems: solved
- availability problems: solved
- single point of failure: partially solved
- significantly increased read and write throughput
The solution's weaknesses
[Diagram: Engine instances issuing SQL queries against the DB]
- querying the relational database takes time
- it has side effects: it locks data being updated and inserted
- concurrency causes performance degradation
- it does not scale well
- we still need the relational database to execute complex queries
The problems
The complex query...
- queries the subscription table
- selects expired subscriptions
- the subscriptions must be grouped by user
- the result must be ordered by priority, criteria and type of user plan
Query stages: projection, filter criteria, aggregation, sort
SQL Server query
SELECT s.phone, MIN(s.last_renew_attempt) AS min_last_renew_attempt
FROM subscription AS s WITH (NOLOCK)
JOIN configuration AS c WITH (NOLOCK)
  ON s.configuration_id = c.configuration_id
WHERE s.enabled = 1
  AND s.timeout_date < GETDATE()
  AND s.related_id IS NULL
  AND c.carrier_id = ?
  AND c.enabled = 1
  AND (c.renewable = 1 OR c.use_balance_parameter = 1)
GROUP BY s.phone
-- charge_priority is aggregated so the ORDER BY is valid alongside GROUP BY
ORDER BY MAX(charge_priority) DESC, MAX(user_plan) DESC,
         min_last_renew_attempt
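The filter, group and sort steps of the query can be sketched in plain Java streams to make each stage explicit. This is only an illustration: the Sub and Candidate types and the expiredByPhone method are hypothetical names, the join with the configuration table is assumed to be already flattened into each row, and charge_priority is aggregated with MAX as an assumption.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class RenewalQuerySketch {

    // Hypothetical flattened row: subscription fields plus values joined from configuration.
    record Sub(String phone, boolean enabled, long timeoutDate, Long relatedId,
               int chargePriority, int userPlan, long lastRenewAttempt) {}

    // One output row per phone, mirroring the GROUP BY projection.
    record Candidate(String phone, int maxPriority, int maxPlan, long minLastRenewAttempt) {}

    static List<Candidate> expiredByPhone(List<Sub> subs, long now) {
        // Filter: enabled, expired, and not related to another subscription.
        Map<String, List<Sub>> byPhone = subs.stream()
            .filter(s -> s.enabled() && s.timeoutDate() < now && s.relatedId() == null)
            .collect(Collectors.groupingBy(Sub::phone));

        // Aggregate each group, then sort by priority, plan and oldest renew attempt.
        return byPhone.entrySet().stream()
            .map(e -> new Candidate(
                e.getKey(),
                e.getValue().stream().mapToInt(Sub::chargePriority).max().getAsInt(),
                e.getValue().stream().mapToInt(Sub::userPlan).max().getAsInt(),
                e.getValue().stream().mapToLong(Sub::lastRenewAttempt).min().getAsLong()))
            .sorted(Comparator.comparingInt(Candidate::maxPriority).reversed()
                .thenComparing(Comparator.comparingInt(Candidate::maxPlan).reversed())
                .thenComparingLong(Candidate::minLastRenewAttempt))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Sub> subs = List.of(
            new Sub("5511900000001", true, 10L, null, 1, 2, 100L),
            new Sub("5511900000001", true, 20L, null, 3, 1, 50L),
            new Sub("5511900000002", true, 30L, null, 3, 5, 70L),
            new Sub("5511900000003", false, 30L, null, 9, 9, 10L));
        expiredByPhone(subs, 1_000L).forEach(System.out::println);
    }
}
```

On a single machine this works only while the whole table fits in memory, which is why sorting and aggregating across a cluster becomes the main concern.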
The solution
- Extract data from Apache Cassandra instead
of using the relational database
- There is no single point of failure
- Performance improved, but there is more work
querying and filtering data
Main concern: distributed sorting by multiple
criteria and data aggregation
- Apache Spark!?
- Databricks used Apache Spark to sort 100 TB of
data on 206 machines in 23 minutes
https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html
Divide-And-Conquer
Preparing for the new solution
Subscription -> Subscription Index
● configuration_id
  ○ phone-number
Using a new table as an index, applying data denormalization!
● each subscription becomes a
column (time series)
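The idea of a denormalized index row can be sketched with an in-memory sorted map standing in for a Cassandra wide row. This is a rough model under stated assumptions: the class and method names are hypothetical, and prefixing the column name with the timeout timestamp is an assumed layout chosen so that expired subscriptions form a contiguous range; the slides do not show the actual column format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class SubscriptionIndexSketch {

    // Row key: configuration_id. Columns are kept sorted, like the columns of a wide row.
    private final Map<Integer, NavigableMap<String, String>> rows = new HashMap<>();

    // Zero-padded timestamp so lexicographic order matches numeric order (assumed layout).
    private static String column(long timeoutMillis, String phone) {
        return String.format("%013d:%s", timeoutMillis, phone);
    }

    // Each subscription becomes one column in the row of its configuration.
    void put(int configurationId, String phone, long timeoutMillis) {
        rows.computeIfAbsent(configurationId, id -> new TreeMap<>())
            .put(column(timeoutMillis, phone), phone);
    }

    // Range scan over one row: every phone whose subscription expired before `now`.
    List<String> expiredPhones(int configurationId, long now) {
        NavigableMap<String, String> row = rows.get(configurationId);
        if (row == null) {
            return List.of();
        }
        return new ArrayList<>(row.headMap(column(now, ""), false).values());
    }

    public static void main(String[] args) {
        SubscriptionIndexSketch index = new SubscriptionIndexSketch();
        index.put(1, "5511900000001", 100L);
        index.put(1, "5511900000002", 300L);
        System.out.println(index.expiredPhones(1, 250L));
    }
}
```

With this shape, finding expired subscriptions for one configuration is a single ordered slice of one row rather than a filtered scan of the whole subscription table.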
Proof of Concept with Apache Spark
Data Extractor
Processor
Preparing Resources
Processor
Java Code Snippet
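import java.io.File;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Local Spark context using all available cores
JavaSparkContext sc = new JavaSparkContext("local[*]", "Simple App",
    SPARK_HOME, "spark-simple-1.0.jar");
// Get the dataset file from the resources folder
ClassLoader classLoader = SparkFileJob.class.getClassLoader();
File file = new File(classLoader.getResource("dataset-10MM.json").getFile());
// Load the JSON dataset and expose it to Spark SQL as the "subscription" table
SQLContext sqlContext = new SQLContext(sc);
DataFrame df = sqlContext.read().json(file.getPath());
df.registerTempTable("subscription");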
Preparing and Executing Query
// Spark SQL query; the current time is supplied from Java
String query =
    "SELECT phone, MAX(charge_priority) AS max_priority, MAX(user_plan) AS max_plan " +
    "FROM subscription " +
    "WHERE enabled = 1 " +
    "AND timeout_date < " + System.currentTimeMillis() + " " +
    "AND related_id IS NULL " +
    "AND carrier_id IN (1, 4, 2, 5) " +
    "GROUP BY phone " +
    "ORDER BY max_priority DESC, max_plan DESC";

// Java code snippet: run the query and process each resulting row
sqlContext.sql(query)
    .javaRDD()
    .foreach(row -> process(row));
- We have the DataStax Spark-Cassandra-Connector!
- It exposes Cassandra tables as Spark RDDs
- use Apache Spark coupled to Cassandra
https://github.com/datastax/spark-cassandra-connector
https://github.com/eiti-kimura-movile/spark-cassandra
Next Steps
- upgrade cluster version to >= 2.1
- cluster reads improve by 50% when moving from Thrift
to CQL with native protocol v3
- implement the final solution: Cassandra + Spark
- Move II -
The Kiwi Migration
The Kiwi Platform
- it is a common backend platform for smartphone apps
- provides user and device management
- user event and media tracker
- analytics
- push notifications
High performance and availability required
Kiwi: The beginning
[Diagram: API instances, SQS queues, Consumers, DynamoDB and PostgreSQL; push notifications are annotated with "low read throughput"]
The push notification crusade
[Diagram: three Push Publishers reading from PostgreSQL and sending to the Apple and Google notification services]
The problems (déjà vu?)
- single point of failure with PostgreSQL
- high costs paying for 2 storage services
- DynamoDB does not have good read
throughput for linear (sequential) reads
- RDS PostgreSQL tuning limit reached
- low throughput sending notifications
Slowness means frustration
The solution in numbers
- data storage cost
  - Amazon DynamoDB: US$ 4,575.00 / mo
  - PostgreSQL (RDS): US$ 6,250.00 / mo
  - total: US$ 10,825.00 / mo
- read throughput measured
  - Amazon DynamoDB: 1.4k/s (linear, sequential reads)
  - PostgreSQL (RDS): 10k/s
Remodeled solution, the Cassandra way
[Diagram: three Push Publishers sending to the Apple and Google notification services]
Data model changes
- Amazon DynamoDB
  - object serialized with Avro
  - a few columns
- Apache Cassandra
  - exploded object
  - more than 80 columns without serialization
Conclusion
Before Migration
AWS DynamoDB + Postgres = US$ 10,825.00/mo
Read Throughput = ~ 12k/s
After Migration
Apache Cassandra (8 nodes c3.2xlarge) = US$ 2,580.00/mo
Read Throughput = ~ 200k/s
Monthly cost cut by about 76% (the old bill was roughly 4x the new one)!!!
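The figures above can be compared in more than one way, so a small worked calculation helps keep the measures apart. The sketch below uses only the monthly costs and read throughputs quoted in the slides; the class and method names are made up for illustration.

```java
final class MigrationSavings {

    // Fraction of the old monthly bill that is no longer paid.
    static double reduction(double before, double after) {
        return (before - after) / before;
    }

    // How many times larger the first value is than the second.
    static double ratio(double a, double b) {
        return a / b;
    }

    public static void main(String[] args) {
        double before = 4_575.00 + 6_250.00; // DynamoDB + PostgreSQL (RDS), US$ per month
        double after = 2_580.00;             // 8 x c3.2xlarge Cassandra nodes, US$ per month
        System.out.printf("cost reduction: %.1f%%%n", 100 * reduction(before, after));
        System.out.printf("old cost / new cost: %.1fx%n", ratio(before, after));
        System.out.printf("read throughput gain: %.1fx%n", ratio(200_000, 12_000));
    }
}
```

Measured against the old bill, the reduction is roughly 76%; equivalently, the old setup cost a little over four times the new one while delivering about one sixteenth of the read throughput.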
- Move III -
Distributing Resources
What kind of resources?
The blacklisted phone numbers
The ported phone numbers database
Text file resources
Messaging platform
- resources are checked before sending messages
- identify the user's carrier
- resources are loaded into memory (RAM)
- servers are off-cloud (hard to upgrade)
Problem: larger resource files for the
same amount of memory
4GB - 6GB RAM
Loading everything, RAM story
[Diagram: each Message Publisher holds the blacklist and portability data in memory]
- slow JVM responses (GC)
- server memory limit reached
- files continue to grow
- more than 20 instances on different servers
loading the same resources
How about a distributed solution?
- the resource files are the same in all of the
servers
- RAM does not scale well
- It is an expensive solution
So..
- Why not distribute resources around a ring?
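The ring idea can be illustrated with a minimal consistent-hashing sketch: each key, such as a phone number, is hashed to a token and owned by the next node clockwise on the ring. This is a toy model, not how Cassandra is implemented: CRC32 stands in for a real partitioner, replication is not modeled, and the class and node names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

final class ResourceRingSketch {

    // Token -> node name, kept sorted so the ring can be walked clockwise.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // CRC32 is used only for illustration; it is not Cassandra's partitioner.
    static long token(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Place a node at several points (virtual nodes) to spread keys more evenly.
    void addNode(String node, int virtualNodes) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(token(node + "#" + i), node);
        }
    }

    // A key belongs to the first node at or after its token, wrapping around at the end.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> owner = ring.ceilingEntry(token(key));
        return (owner != null ? owner : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        ResourceRingSketch ring = new ResourceRingSketch();
        ring.addNode("node-a", 8);
        ring.addNode("node-b", 8);
        ring.addNode("node-c", 8);
        System.out.println("5511900000001 -> " + ring.nodeFor("5511900000001"));
    }
}
```

The useful property is that adding a node only moves the keys that now fall to the new node; every other key stays where it was, so capacity grows by adding machines rather than by adding RAM to each server.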
The distributed resources solution
[Diagram: a ring spanning DC1, DC2 and DC3, surrounded by Message Publisher instances and other platforms]
- common information is shared across a
Cassandra cluster
- the massive hardware upgrade: solved
- the data is available to other platforms
- it is highly scalable
- it is easy to accommodate more data
Checking the results
Wrapping up the Moves
- always upgrade to the newest versions
- high throughput and availability make a
difference
- costs really, really matter!
- horizontal scalability is great! If your
volume of data grows, increase the number of
nodes
eitikimura eiti-kimura-movile eiti.kimura@movile.com

More Related Content

What's hot

Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and CassandraLow-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Caserta
 
RubyKaigi 2014: ServerEngine
RubyKaigi 2014: ServerEngineRubyKaigi 2014: ServerEngine
RubyKaigi 2014: ServerEngine
Treasure Data, Inc.
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Databricks
 
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetupDataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
Victor Coustenoble
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
Yousun Jeong
 
SQL on Hadoop in Taiwan
SQL on Hadoop in TaiwanSQL on Hadoop in Taiwan
SQL on Hadoop in Taiwan
Treasure Data, Inc.
 
Apache Spark Overview part2 (20161117)
Apache Spark Overview part2 (20161117)Apache Spark Overview part2 (20161117)
Apache Spark Overview part2 (20161117)
Steve Min
 
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
DataStax
 
Spark and Spark Streaming
Spark and Spark StreamingSpark and Spark Streaming
Spark and Spark Streaming
宇 傅
 
[SSA] 04.sql on hadoop(2014.02.05)
[SSA] 04.sql on hadoop(2014.02.05)[SSA] 04.sql on hadoop(2014.02.05)
[SSA] 04.sql on hadoop(2014.02.05)Steve Min
 
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksFour Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Legacy Typesafe (now Lightbend)
 
Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)
zznate
 
Microsoft R - Data Science at Scale
Microsoft R - Data Science at ScaleMicrosoft R - Data Science at Scale
Microsoft R - Data Science at Scale
Sascha Dittmann
 
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Аліна Шепшелей
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and Cassandra
Patrick McFadin
 
Strata NYC 2015: What's new in Spark Streaming
Strata NYC 2015: What's new in Spark StreamingStrata NYC 2015: What's new in Spark Streaming
Strata NYC 2015: What's new in Spark Streaming
Databricks
 
Ingesting data at scale into elasticsearch with apache pulsar
Ingesting data at scale into elasticsearch with apache pulsarIngesting data at scale into elasticsearch with apache pulsar
Ingesting data at scale into elasticsearch with apache pulsar
Timothy Spann
 
Cassandra DataTables Using RESTful API
Cassandra DataTables Using RESTful APICassandra DataTables Using RESTful API
Cassandra DataTables Using RESTful API
Simran Kedia
 
Apache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelApache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data model
Andrey Lomakin
 
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
Vinoth Chandar
 

What's hot (20)

Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and CassandraLow-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
 
RubyKaigi 2014: ServerEngine
RubyKaigi 2014: ServerEngineRubyKaigi 2014: ServerEngine
RubyKaigi 2014: ServerEngine
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
 
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetupDataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
 
SQL on Hadoop in Taiwan
SQL on Hadoop in TaiwanSQL on Hadoop in Taiwan
SQL on Hadoop in Taiwan
 
Apache Spark Overview part2 (20161117)
Apache Spark Overview part2 (20161117)Apache Spark Overview part2 (20161117)
Apache Spark Overview part2 (20161117)
 
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
 
Spark and Spark Streaming
Spark and Spark StreamingSpark and Spark Streaming
Spark and Spark Streaming
 
[SSA] 04.sql on hadoop(2014.02.05)
[SSA] 04.sql on hadoop(2014.02.05)[SSA] 04.sql on hadoop(2014.02.05)
[SSA] 04.sql on hadoop(2014.02.05)
 
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksFour Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
 
Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)
 
Microsoft R - Data Science at Scale
Microsoft R - Data Science at ScaleMicrosoft R - Data Science at Scale
Microsoft R - Data Science at Scale
 
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and Cassandra
 
Strata NYC 2015: What's new in Spark Streaming
Strata NYC 2015: What's new in Spark StreamingStrata NYC 2015: What's new in Spark Streaming
Strata NYC 2015: What's new in Spark Streaming
 
Ingesting data at scale into elasticsearch with apache pulsar
Ingesting data at scale into elasticsearch with apache pulsarIngesting data at scale into elasticsearch with apache pulsar
Ingesting data at scale into elasticsearch with apache pulsar
 
Cassandra DataTables Using RESTful API
Cassandra DataTables Using RESTful APICassandra DataTables Using RESTful API
Cassandra DataTables Using RESTful API
 
Apache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelApache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data model
 
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
 

Viewers also liked

What causes seasons on earth
What causes seasons on earthWhat causes seasons on earth
What causes seasons on earth
nmsouthern
 
Watching the seasons change
Watching the seasons changeWatching the seasons change
Watching the seasons change
lauttasaari
 
the seasons change
the seasons changethe seasons change
the seasons changeliuhanxiang
 
Seasons
SeasonsSeasons
Seasons
RAISSA RO
 
Shadows and Solar-Lunar Eclipses
Shadows and Solar-Lunar EclipsesShadows and Solar-Lunar Eclipses
Shadows and Solar-Lunar Eclipses
Val Bolislis
 
Solar and lunar eclipse
Solar and lunar eclipseSolar and lunar eclipse
Solar and lunar eclipse
Kunal Yadav
 
Seasons
SeasonsSeasons
SeasonsBrandi
 
Cassandra and security
Cassandra and securityCassandra and security
Cassandra and security
Ben Bromhead
 
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable CassandraCassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra
aaronmorton
 
Pythian: My First 100 days with a Cassandra Cluster
Pythian: My First 100 days with a Cassandra ClusterPythian: My First 100 days with a Cassandra Cluster
Pythian: My First 100 days with a Cassandra Cluster
DataStax Academy
 
Hardening cassandra for compliance or paranoia
Hardening cassandra for compliance or paranoiaHardening cassandra for compliance or paranoia
Hardening cassandra for compliance or paranoia
zznate
 
Securing Cassandra The Right Way
Securing Cassandra The Right WaySecuring Cassandra The Right Way
Securing Cassandra The Right Way
DataStax Academy
 
Seasons and weather james
Seasons and weather jamesSeasons and weather james
Seasons and weather jamesroom04
 
Classification power point_with_domain
Classification power point_with_domainClassification power point_with_domain
Classification power point_with_domainSam Crockett
 
What is the reason for the season
What is the reason for the seasonWhat is the reason for the season
What is the reason for the season
Mal Tomlin
 
Science presentation
Science presentationScience presentation
Science presentation
sh2610
 
Cassandra Summit 2015: Real World DTCS For Operators
Cassandra Summit 2015: Real World DTCS For OperatorsCassandra Summit 2015: Real World DTCS For Operators
Cassandra Summit 2015: Real World DTCS For Operators
Jeff Jirsa
 
Reasons for seasons
Reasons for seasonsReasons for seasons
Reasons for seasons
Deborah Devine
 

Viewers also liked (20)

What causes seasons on earth
What causes seasons on earthWhat causes seasons on earth
What causes seasons on earth
 
Watching the seasons change
Watching the seasons changeWatching the seasons change
Watching the seasons change
 
the seasons change
the seasons changethe seasons change
the seasons change
 
Seasons
SeasonsSeasons
Seasons
 
Shadows and Solar-Lunar Eclipses
Shadows and Solar-Lunar EclipsesShadows and Solar-Lunar Eclipses
Shadows and Solar-Lunar Eclipses
 
Solar and lunar eclipse
Solar and lunar eclipseSolar and lunar eclipse
Solar and lunar eclipse
 
Seasons
SeasonsSeasons
Seasons
 
Cassandra and security
Cassandra and securityCassandra and security
Cassandra and security
 
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable CassandraCassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra
 
Pythian: My First 100 days with a Cassandra Cluster
Pythian: My First 100 days with a Cassandra ClusterPythian: My First 100 days with a Cassandra Cluster
Pythian: My First 100 days with a Cassandra Cluster
 
Hardening cassandra for compliance or paranoia
Hardening cassandra for compliance or paranoiaHardening cassandra for compliance or paranoia
Hardening cassandra for compliance or paranoia
 
Securing Cassandra The Right Way
Securing Cassandra The Right WaySecuring Cassandra The Right Way
Securing Cassandra The Right Way
 
Seasons and weather james
Seasons and weather jamesSeasons and weather james
Seasons and weather james
 
Infomercial about Water
Infomercial about WaterInfomercial about Water
Infomercial about Water
 
Classification power point_with_domain
Classification power point_with_domainClassification power point_with_domain
Classification power point_with_domain
 
What is the reason for the season
What is the reason for the seasonWhat is the reason for the season
What is the reason for the season
 
Science presentation
Science presentationScience presentation
Science presentation
 
Cassandra Summit 2015: Real World DTCS For Operators
Cassandra Summit 2015: Real World DTCS For OperatorsCassandra Summit 2015: Real World DTCS For Operators
Cassandra Summit 2015: Real World DTCS For Operators
 
Eclipse
EclipseEclipse
Eclipse
 
Reasons for seasons
Reasons for seasonsReasons for seasons
Reasons for seasons
 

Similar to Cassandra Summit 2015 - A Change of Seasons

Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Alluxio, Inc.
 
Unified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkUnified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache Spark
C4Media
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
Amazon Web Services
 
L21 scalability
L21 scalabilityL21 scalability
L21 scalability
Ólafur Andri Ragnarsson
 
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Nati Shalom
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151xlight
 
Yaroslav Nedashkovsky "How to manage hundreds of pipelines for processing da...
Yaroslav Nedashkovsky  "How to manage hundreds of pipelines for processing da...Yaroslav Nedashkovsky  "How to manage hundreds of pipelines for processing da...
Yaroslav Nedashkovsky "How to manage hundreds of pipelines for processing da...
Lviv Startup Club
 
Amazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian MeyersAmazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian Meyers
huguk
 
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
GeeksLab Odessa
 
Camunda BPM 7.2: Performance and Scalability (English)
Camunda BPM 7.2: Performance and Scalability (English)Camunda BPM 7.2: Performance and Scalability (English)
Camunda BPM 7.2: Performance and Scalability (English)
camunda services GmbH
 
Sql Server
Sql ServerSql Server
Sql Server
SandyShin
 
Application design for the cloud using AWS
Application design for the cloud using AWSApplication design for the cloud using AWS
Application design for the cloud using AWS
Jonathan Holloway
 
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data AnalyticsStrata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
SingleStore
 
Building and deploying large scale real time news system with my sql and dist...
Building and deploying large scale real time news system with my sql and dist...Building and deploying large scale real time news system with my sql and dist...
Building and deploying large scale real time news system with my sql and dist...
Tao Cheng
 
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
Amazon Web Services
 
User-space Network Processing
User-space Network ProcessingUser-space Network Processing
User-space Network Processing
Ryousei Takano
 
Real-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven ApplicationsReal-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven Applications
VMware Tanzu
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
Amazon Web Services
 
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Amazon Web Services
 
Roles y Responsabilidades en SQL Azure
Roles y Responsabilidades en SQL AzureRoles y Responsabilidades en SQL Azure
Roles y Responsabilidades en SQL Azure
Eduardo Castro
 

Similar to Cassandra Summit 2015 - A Change of Seasons (20)

Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
 
Unified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkUnified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache Spark
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
L21 scalability
L21 scalabilityL21 scalability
L21 scalability
 
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151
 
Yaroslav Nedashkovsky "How to manage hundreds of pipelines for processing da...
Yaroslav Nedashkovsky  "How to manage hundreds of pipelines for processing da...Yaroslav Nedashkovsky  "How to manage hundreds of pipelines for processing da...
Yaroslav Nedashkovsky "How to manage hundreds of pipelines for processing da...
 
Amazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian MeyersAmazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian Meyers
 
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
 
Camunda BPM 7.2: Performance and Scalability (English)
Camunda BPM 7.2: Performance and Scalability (English)Camunda BPM 7.2: Performance and Scalability (English)
Camunda BPM 7.2: Performance and Scalability (English)
 
Sql Server
Sql ServerSql Server
Sql Server
 
Application design for the cloud using AWS
Application design for the cloud using AWSApplication design for the cloud using AWS
Application design for the cloud using AWS
 
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data AnalyticsStrata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
 
Building and deploying large scale real time news system with my sql and dist...
Building and deploying large scale real time news system with my sql and dist...Building and deploying large scale real time news system with my sql and dist...
Building and deploying large scale real time news system with my sql and dist...
 
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
 
User-space Network Processing
User-space Network ProcessingUser-space Network Processing
User-space Network Processing
 
Real-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven ApplicationsReal-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven Applications
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
 
Roles y Responsabilidades en SQL Azure
Roles y Responsabilidades en SQL AzureRoles y Responsabilidades en SQL Azure
Roles y Responsabilidades en SQL Azure
 

Cassandra Summit 2015 - A Change of Seasons

  • 1. A CHANGE OF SEASONS: A big move to Apache Cassandra. Eiti Kimura, IT Coordinator @Movile Brazil
  • 4. Leader in Latin America: mobile phones, smartphones and tablets. Movile is the company behind the apps that make your life easier.
  • 7. We think mobile... Movile develops apps across all platforms for smartphones and tablets to not only make life easier, but also more fun. The company recorded an annual average growth of 80% in the last 7 years
  • 9. - Move I - The Subscription and Billing System a.k.a SBS
  • 10. Subscription and Billing Platform - it is a service API - responsible for managing users' subscriptions - responsible for charging users through carriers - an engine to renew subscriptions - it cannot stop under any circumstance - it must perform very well
  • 11. The platform in numbers: 88 million subscriptions, 66.1M unique users, 105M transactions a day
  • 12. Platform Evolution timeline 2008 Pure relational database times 2009 Apache Cassandra adoption (v0.6) 2011 The data model was entirely remodeled 4 nodes Cluster upgrade from version 1.0 to 1.2 2013 Cluster upgrade from version 0.7 to 1.0 Expanded from 4 to 6 nodes 2014 New data index using time series 2015 THE BIG MOVE migrating complex queries from relational database
  • 13. Initial architecture revisited - classical solution using a regular RDBMS (diagram: API and Engine instances around a single DB)
  • 14. Architecture disadvantages - single point of failure - slow response times - the platform went down often - hard and expensive to scale - if you scale your platform and forget to scale the database and other related resources, you'll fail
  • 15. A new architecture has come - a hybrid solution using an Apache Cassandra cluster plus a relational database to execute complex queries (diagram: API and Engine instances, with regular SQL queries going to the DB)
  • 16. The benefits of the new solution - performance problems: solved - availability problems: solved - single point of failure: partially solved - significantly increased read and write throughput
  • 19. The problems - querying the relational database consumes time - it has side effects: it locks data being updated and inserted - concurrency causes performance degradation - it does not scale well - we still need the relational database to execute complex queries
  • 20. The complex query.. - query subscription table - selects expired subscriptions - the subscriptions must be grouped by user - must be ordered by priority, criteria, type of user plan
  • 21. SQL Server's query, annotated on the slide by clause (projection, filter criteria, aggregation, sort data): SELECT s.phone, MIN(s.last_renew_attempt) AS min_last_renew_attempt FROM subscription AS s WITH(nolock) JOIN configuration AS c WITH(nolock) ON s.configuration_id = c.configuration_id WHERE s.enabled = 1 AND s.timeout_date < GETDATE() AND s.related_id IS NULL AND c.carrier_id = ? AND ( c.enabled = 1 AND ( c.renewable = 1 OR c.use_balance_parameter = 1 ) ) GROUP BY s.phone ORDER BY charge_priority DESC, max(user_plan) DESC, min_last_renew_attempt
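
The filter, group and sort steps of the query above can be sketched in plain Java without any database. This is only an illustration under stated assumptions: the `Subscription` and `RenewalCandidate` records are hypothetical, `carrierId` is assumed to be denormalized onto the subscription row (as in the Spark SQL version of the query), and the configuration flags from the JOIN are left out. It needs Java 16 or newer for records.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class RenewalQuerySketch {
    // Hypothetical, simplified subscription row; field names follow the query columns.
    record Subscription(String phone, boolean enabled, long timeoutDate, Long relatedId,
                        int carrierId, int chargePriority, int userPlan, long lastRenewAttempt) {}

    // One aggregated row per phone (the GROUP BY result).
    record RenewalCandidate(String phone, int maxPriority, int maxPlan, long minLastRenewAttempt) {}

    static List<RenewalCandidate> select(List<Subscription> rows, int carrierId, long now) {
        // Filter criteria: enabled, expired, no related subscription, given carrier.
        Map<String, List<Subscription>> byPhone = rows.stream()
            .filter(s -> s.enabled() && s.timeoutDate() < now
                      && s.relatedId() == null && s.carrierId() == carrierId)
            .collect(Collectors.groupingBy(Subscription::phone));

        // Aggregation per phone, then sort: priority DESC, plan DESC, oldest attempt first.
        return byPhone.entrySet().stream()
            .map(e -> new RenewalCandidate(
                e.getKey(),
                e.getValue().stream().mapToInt(Subscription::chargePriority).max().getAsInt(),
                e.getValue().stream().mapToInt(Subscription::userPlan).max().getAsInt(),
                e.getValue().stream().mapToLong(Subscription::lastRenewAttempt).min().getAsLong()))
            .sorted(Comparator.comparingInt(RenewalCandidate::maxPriority).reversed()
                .thenComparing(Comparator.comparingInt(RenewalCandidate::maxPlan).reversed())
                .thenComparingLong(RenewalCandidate::minLastRenewAttempt))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Subscription> rows = List.of(
            new Subscription("5511", true, 100L, null, 1, 2, 1, 50L),
            new Subscription("5511", true, 100L, null, 1, 5, 3, 40L),
            new Subscription("5522", true, 100L, null, 1, 9, 1, 70L));
        select(rows, 1, 500L).forEach(System.out::println);
    }
}
```

On a single machine this is trivial; the difficulty the deck points at is doing the same grouping and multi-key sort across data spread over many nodes.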
  • 22. The solution - Extract data from Apache Cassandra instead of using a relational database - There is no single point of failure - Performance improved, but more work querying and filtering data - Main concern: distributed sorting by multiple criteria and data aggregation - Apache Spark!?
  • 23. - Databricks used Apache Spark to sort 100 TB of data on 206 machines in 23 minutes https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html
  • 24. Divide-and-conquer: preparing for the new solution - a Subscription table plus a Subscription Index table keyed by configuration_id, then phone-number - using a new table as an index, applying data denormalization - each subscription becomes a column (time series)
  • 25. Proof of Concept with Apache Spark: Data Extractor and Processor
  • 26. Preparing resources (Processor, Java code snippet): JavaSparkContext sc = new JavaSparkContext("local[*]", "Simple App", SPARK_HOME, "spark-simple-1.0.jar"); // Get file from resources folder ClassLoader classLoader = SparkFileJob.class.getClassLoader(); File file = new File(classLoader.getResource("dataset-10MM.json").getFile()); SQLContext sqlContext = new SQLContext(sc); DataFrame df = sqlContext.read().json(file.getPath()); df.registerTempTable("subscription");
  • 27. Preparing and executing the query. Spark SQL query: SELECT phone, MAX(charge_priority) AS max_priority, MAX(user_plan) AS max_plan FROM subscription WHERE enabled = 1 AND timeout_date < System.currentTimeMillis() AND related_id IS NULL AND carrier_id in (1, 4, 2, 5) GROUP BY phone ORDER BY max_priority DESC, max_plan DESC. Java code snippet: sqlContext.sql(query) .javaRDD() .foreach(row -> process(row));
  • 28. - We have the DataStax Spark-Cassandra-Connector! - It exposes Cassandra tables as Spark RDDs - use Apache Spark coupled to Cassandra https://github.com/datastax/spark-cassandra-connector https://github.com/eiti-kimura-movile/spark-cassandra
  • 29. Next Steps - upgrade the cluster to version >= 2.1 - cluster reads improve by 50% when moving from Thrift to CQL (native protocol v3) - implement the final solution: Cassandra + Spark
  • 30. - Move II - The Kiwi Migration
  • 31. The Kiwi Platform - it is a common backend platform for smartphones - provides user and device management - user event and media tracker - analytics - push notifications - high performance and availability required
  • 34. The push notification crusade - low read throughput (diagram: PostgreSQL feeding three Push Publisher instances, which send to the Apple and Google notification services)
  • 35. The problems (déjà vu?) - single point of failure with PostgreSQL - high costs paying for 2 storage services - DynamoDB does not have good read throughput for linear reads - RDS PostgreSQL tuning limit reached - low throughput sending notifications
  • 37. The solution in numbers - data storage cost - Amazon DynamoDB: US$ 4,575.00/mo - PostgreSQL (RDS): US$ 6,250.00/mo - total: US$ 10,825.00/mo - read throughput measured - Amazon DynamoDB: 1.4k/s (linear, sequential reads) - PostgreSQL (RDS): 10k/s
  • 38. Remodeled solution, the Cassandra way (diagram: three Push Publisher instances sending to the Apple and Google notification services)
  • 39. Datamodel changes - Amazon DynamoDB - object serialized with Avro - a few columns - Apache Cassandra - exploded object - more than 80 columns without serialization
  • 40. Conclusion - Before migration: AWS DynamoDB + PostgreSQL = US$ 10,825.00/mo, read throughput ~12k/s - After migration: Apache Cassandra (8 nodes, c3.2xlarge) = US$ 2,580.00/mo, read throughput ~200k/s - monthly cost about 76% lower (roughly 4.2x cheaper)
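
The before/after figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the monthly costs and throughput numbers quoted in the deck; the class and method names are made up for illustration. It works out to a cost reduction of roughly 76% (the new bill is about a quarter of the old one) and a read throughput gain of well over ten times.

```java
public class MigrationCostSketch {
    // Percentage saved when a monthly bill drops from `before` to `after`.
    static double savingsPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    // How many times larger `before` is than `after`.
    static double ratio(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        double before = 4575.00 + 6250.00; // DynamoDB + PostgreSQL (RDS), US$ per month
        double after = 2580.00;            // Cassandra, 8 x c3.2xlarge, US$ per month

        System.out.printf("cost before: %.2f, after: %.2f%n", before, after);
        System.out.printf("savings: %.1f%%%n", savingsPercent(before, after));
        System.out.printf("cost ratio: %.1fx%n", ratio(before, after));
        // Read throughput figures from the slide: ~12k/s before, ~200k/s after.
        System.out.printf("throughput ratio: %.1fx%n", ratio(200_000.0, 12_000.0));
    }
}
```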
  • 41. - Move III - Distributing Resources
  • 42. What kind of resources? - the blacklisted phone numbers - the ported phone numbers database - text file resources
  • 43. Messaging platform - resources are checked before sending messages - identify the user's carrier - resources are loaded into memory (RAM) - servers are off-cloud (hard to upgrade) - Problem: larger resource files for the same amount of memory
  • 44. Loading everything, the RAM story - each Message Publisher loads the blacklist and portability files into 4GB - 6GB of RAM - slow JVM responses (GC) - server memory limit reached - files continue to grow - more than 20 instances on different servers loading the same resources
  • 45. How about a distributed solution? - the resource files are the same on all of the servers - RAM does not scale well - it is an expensive solution - So... why not distribute resources around a ring?
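
The idea of spreading resources around a ring can be illustrated with a small consistent-hashing sketch. This is a simplified model, not Cassandra's actual partitioner or replication logic: the MD5-based `token` function, the node names and the number of virtual nodes are all assumptions chosen for the example. Each phone number hashes to a position on the ring and belongs to the first node found clockwise from that position.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

public class ResourceRingSketch {
    // Ring positions (tokens) mapped to the node that owns them.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Hash a key to a ring position using the first 8 bytes of its MD5 digest
    // (an illustrative choice, not what Cassandra uses by default).
    static long token(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                                    .digest(key.getBytes(StandardCharsets.UTF_8));
            long t = 0;
            for (int i = 0; i < 8; i++) {
                t = (t << 8) | (d[i] & 0xffL);
            }
            return t;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Place a node on the ring at several positions (virtual nodes).
    void addNode(String node, int vnodes) {
        for (int i = 0; i < vnodes; i++) {
            ring.put(token(node + "#" + i), node);
        }
    }

    // The owner is the first node at or after the key's token, wrapping around.
    String ownerOf(String phone) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(token(phone));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        ResourceRingSketch ring = new ResourceRingSketch();
        ring.addNode("node-a", 16);
        ring.addNode("node-b", 16);
        ring.addNode("node-c", 16);
        System.out.println("owner of 5511999990000: " + ring.ownerOf("5511999990000"));
    }
}
```

With this layout no single server holds the whole blacklist or portability file, and adding a node only moves the keys that now fall on the new node's ring positions.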
  • 46. The distributed resources solution (diagram: Message Publisher instances in DC1, DC2 and DC3, plus other platforms)
  • 47. Checking the results - common information is shared across a Cassandra cluster - the massive hardware upgrade: solved - the data is available to other platforms - it is highly scalable - easy to accommodate more data
  • 49. Wrapping up the Moves - always upgrade to the newest versions - high throughput and availability make a difference - costs really, really matter! - horizontal scalability is great: if your volume of data grows, increase the number of nodes