A CHANGE OF SEASONS
A big move to Apache Cassandra
Eiti Kimura, IT Coordinator @Movile Brazil
Spreading the word...
Leader in Latin America
Mobile phones, Smartphones and Tablets
Movile is the company behind the
apps that make your life easier.
We think mobile...
Movile develops apps across all platforms for smartphones
and tablets to not only make life easier, but also more fun.
The company recorded an annual average growth of 80% in the last 7 years
3 use cases that constitute the big move to Apache Cassandra
- Move I -
The Subscription and Billing System, a.k.a. SBS
Subscription and Billing Platform
- it is a service API
- responsible for managing users’ subscriptions
- responsible for charging users through carriers
- an engine to renew subscriptions
It cannot stop under any circumstances
It has to perform very well
The platform in numbers
- 88 million subscriptions
- 66.1 million unique users
- 105 million transactions a day
Platform Evolution timeline
2008
Pure relational database times
2009
Apache Cassandra adoption (v0.6)
2011
The data model was entirely remodeled
4 nodes
Cluster upgrade from version 1.0 to 1.2
2013
Cluster upgrade from version 0.7 to 1.0
Expanded from 4 to 6 nodes
2014
New data index using time series
2015
THE BIG MOVE: migrating complex queries from the relational database
Initial architecture revisited
[Diagram: several API and Engine instances connected to a single DB]
Classical solution using a regular RDBMS
Architecture disadvantages
- single point of failure
- slow response times
- the platform went down often
- hard and expensive to scale
- if you scale your platform but forget to scale the database and other related resources, you will fail
A new architecture has come
[Diagram: API and Engine instances; regular SQL queries still go to the DB]
A hybrid solution using an Apache Cassandra cluster plus a relational database to execute complex queries
The benefits of the new solution
- performance problems: solved
- availability problems: solved
- single point of failure: partially solved
- significantly increased read and write throughput
The solution's weaknesses
[Diagram: Engine instances sending SQL queries to the DB]
- querying the relational database takes time
- it has side effects: it locks data that is being updated and inserted
- concurrency degrades performance
- it does not scale well
- we still need the relational database to execute complex queries
The problems
The complex query:
- queries the subscription table
- selects expired subscriptions
- the subscriptions must be grouped by user
- the result must be ordered by priority, criteria, and type of user plan
Operations involved: sorting, aggregation, filter criteria, projection
SQL Server’s query
-- One row per phone: expired, enabled subscriptions for a given carrier
SELECT s.phone, MIN(s.last_renew_attempt) AS min_last_renew_attempt
FROM subscription AS s WITH (NOLOCK)
JOIN configuration AS c WITH (NOLOCK)
  ON s.configuration_id = c.configuration_id
WHERE s.enabled = 1
  AND s.timeout_date < GETDATE()
  AND s.related_id IS NULL
  AND c.carrier_id = ?
  AND ( c.enabled = 1 AND
        ( c.renewable = 1 OR c.use_balance_parameter = 1 ) )
GROUP BY s.phone
-- Columns outside GROUP BY must be aggregated before they can be used for sorting
ORDER BY MAX(charge_priority) DESC, MAX(user_plan) DESC,
         min_last_renew_attempt
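To make the four operations concrete (filter, group, aggregate, sort), here is a minimal plain-Java sketch of what the query computes over an in-memory list. The classes Subscription, RenewCandidate, and ExpiredSubscriptionQuery are hypothetical names invented for this illustration, the join against the configuration table is left out, and the priority is taken as the maximum per phone, as the Spark SQL version later in the deck does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, simplified subscription row; field names follow the query.
final class Subscription {
    final String phone;
    final boolean enabled;
    final long timeoutDate;       // epoch millis
    final Long relatedId;         // null when there is no related subscription
    final int chargePriority;
    final int userPlan;
    final long lastRenewAttempt;  // epoch millis

    Subscription(String phone, boolean enabled, long timeoutDate, Long relatedId,
                 int chargePriority, int userPlan, long lastRenewAttempt) {
        this.phone = phone;
        this.enabled = enabled;
        this.timeoutDate = timeoutDate;
        this.relatedId = relatedId;
        this.chargePriority = chargePriority;
        this.userPlan = userPlan;
        this.lastRenewAttempt = lastRenewAttempt;
    }
}

// One aggregated output row per phone (hypothetical name).
final class RenewCandidate {
    final String phone;
    final int maxPriority;
    final int maxPlan;
    final long minLastRenewAttempt;

    RenewCandidate(String phone, int maxPriority, int maxPlan, long minLastRenewAttempt) {
        this.phone = phone;
        this.maxPriority = maxPriority;
        this.maxPlan = maxPlan;
        this.minLastRenewAttempt = minLastRenewAttempt;
    }
}

final class ExpiredSubscriptionQuery {
    static List<RenewCandidate> run(List<Subscription> rows, long now) {
        // Filter: enabled, expired, and not related to another subscription.
        // Group: one bucket per phone.
        Map<String, List<Subscription>> byPhone = rows.stream()
                .filter(s -> s.enabled && s.timeoutDate < now && s.relatedId == null)
                .collect(Collectors.groupingBy(s -> s.phone));

        // Aggregate: collapse each bucket into a single candidate row.
        List<RenewCandidate> result = new ArrayList<>();
        for (Map.Entry<String, List<Subscription>> e : byPhone.entrySet()) {
            List<Subscription> group = e.getValue();
            int maxPriority = group.stream().mapToInt(s -> s.chargePriority).max().getAsInt();
            int maxPlan = group.stream().mapToInt(s -> s.userPlan).max().getAsInt();
            long minAttempt = group.stream().mapToLong(s -> s.lastRenewAttempt).min().getAsLong();
            result.add(new RenewCandidate(e.getKey(), maxPriority, maxPlan, minAttempt));
        }

        // Sort: priority DESC, plan DESC, then oldest renew attempt first.
        result.sort(Comparator.comparingInt((RenewCandidate c) -> c.maxPriority).reversed()
                .thenComparing(Comparator.comparingInt((RenewCandidate c) -> c.maxPlan).reversed())
                .thenComparingLong(c -> c.minLastRenewAttempt));
        return result;
    }
}
```

On a single machine this is trivial; the difficulty the deck is pointing at is doing the same grouping and multi-key sort when the rows are spread across a cluster.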
The solution
- Extract data from Apache Cassandra instead of using the relational database
- There is no single point of failure
- Performance improved, but querying and filtering data takes more work
Main concern: distributed sorting by multiple criteria, plus data aggregation
- Apache Spark!?
- Databricks used Apache Spark to sort 100 TB of data on 206 machines in 23 minutes
https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html
Divide-And-Conquer
Preparing for the new solution
Subscription / Subscription Index
● configuration_id
○ phone-number
Using a new table as an index, applying data denormalization!
● each subscription becomes a column (time series)
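The idea of a denormalized index row can be sketched in memory as follows. This is only an illustration under assumptions: the class name SubscriptionIndex is hypothetical, and ordering the columns by timeout date is a guess at what "time series" means here, since the slide does not show the actual column key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory stand-in for a wide index row: one row per configuration_id,
// with columns kept sorted by time (assumed here to be the timeout date).
final class SubscriptionIndex {
    // configuration_id -> (timeout timestamp -> phones expiring at that instant)
    private final Map<Integer, NavigableMap<Long, Set<String>>> rows = new HashMap<>();

    // Denormalized write: called in addition to writing the subscription itself.
    void put(int configurationId, long timeoutDate, String phone) {
        rows.computeIfAbsent(configurationId, k -> new TreeMap<>())
            .computeIfAbsent(timeoutDate, k -> new TreeSet<>())
            .add(phone);
    }

    // Range scan over a single row: phones whose subscription expired before `now`.
    List<String> expiredBefore(int configurationId, long now) {
        List<String> phones = new ArrayList<>();
        NavigableMap<Long, Set<String>> row = rows.get(configurationId);
        if (row == null) {
            return phones;
        }
        for (Set<String> bucket : row.headMap(now, false).values()) {
            phones.addAll(bucket);
        }
        return phones;
    }
}
```

Because the columns are already sorted, finding expired subscriptions becomes a range read over one row instead of a filter over the whole subscription table.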
Proof of Concept with Apache Spark
[Diagram: Data Extractor and Processor components]
Preparing Resources
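Java Code Snippet
import java.io.File;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Local Spark context using all available cores (Spark 1.x API)
JavaSparkContext sc = new JavaSparkContext("local[*]", "Simple App",
SPARK_HOME, "spark-simple-1.0.jar");
// Get file from resources folder
ClassLoader classLoader = SparkFileJob.class.getClassLoader();
File file = new File(classLoader.getResource("dataset-10MM.json").getFile());
// Load the JSON dataset as a DataFrame and expose it to Spark SQL as a table
SQLContext sqlContext = new SQLContext(sc);
DataFrame df = sqlContext.read().json(file.getPath());
df.registerTempTable("subscription");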
Preparing and Executing Query
Spark SQL query
SELECT phone, MAX(charge_priority) AS max_priority, MAX(user_plan) AS max_plan
FROM subscription
WHERE enabled = 1
-- current time in milliseconds, supplied by the Java code
AND timeout_date < System.currentTimeMillis()
AND related_id IS NULL
AND carrier_id IN (1, 4, 2, 5)
GROUP BY phone
ORDER BY max_priority DESC, max_plan DESC
Java code snippet
// Run the query and hand each result row to the processor
sqlContext.sql(query)
.javaRDD()
.foreach(row -> process(row));
- We have the DataStax Spark-Cassandra-Connector!
- It allows exposing Cassandra tables as Spark RDDs
- It couples Apache Spark to Cassandra
https://github.com/datastax/spark-cassandra-connector
https://github.com/eiti-kimura-movile/spark-cassandra
Next Steps
- upgrade the cluster to version >= 2.1
- cluster reads improve by 50% moving from Thrift to CQL (native protocol v3)
- implement the final solution: Cassandra + Spark
- Move II -
The Kiwi Migration
The Kiwi Platform
- it is a common backend platform for smartphones
- provides user and device management
- user event and media tracker
- analytics
- push notifications
High performance and availability required
Kiwi: The beginning
[Diagram: API and Consumer instances, SQS queues, DynamoDB, and PostgreSQL]
Push notifications: low read throughput
The push notification crusade
[Diagram: PostgreSQL, three Push Publishers, and the Apple and Google notification services]
The problems (déjà vu?)
- single point of failure with PostgreSQL
- high costs paying for 2 storage services
- DynamoDB does not have good read throughput for linear (sequential) reads
- RDS PostgreSQL tuning limit reached
- low throughput sending notifications
Slowness means frustration
The solution in numbers
- data storage cost
- Amazon DynamoDB: U$ 4,575.00 / mo
- PostgreSQL (RDS): U$ 6,250.00 / mo
- total: U$ 10,825.00 / mo
- read throughput measured
- Amazon DynamoDB: 1.4k/s (linear, sequential reads)
- PostgreSQL (RDS): 10k/s
Remodeled solution, the Cassandra way
[Diagram: three Push Publishers and the Apple and Google notification services]
Data model changes
- Amazon DynamoDB
- object serialized with Avro
- a few columns
- Apache Cassandra
- exploded object
- more than 80 columns without serialization
Conclusion
Before Migration
AWS DynamoDB + Postgres = U$ 10,825.00/mo
Read Throughput = ~ 12k/s
After Migration
Apache Cassandra (8 nodes c3.2xlarge) = U$ 2,580.00/mo
Read Throughput = ~ 200k/s
Monthly cost cut by about 76% (roughly 4x cheaper)!!!
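The before/after figures can be checked with a little arithmetic. The sketch below uses only the numbers quoted in the deck; the class and method names (MigrationNumbers, savingsPercent, ratio) are made up for this illustration.

```java
// Recomputes the cost and throughput comparison from the figures in the deck.
final class MigrationNumbers {
    // Share of the old monthly bill that disappears after the migration.
    static double savingsPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    // How many times larger `before` is than `after`.
    static double ratio(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        double costBefore = 4575.00 + 6250.00; // DynamoDB + PostgreSQL (RDS), U$ per month
        double costAfter = 2580.00;            // Cassandra, 8 x c3.2xlarge, U$ per month
        double readsBefore = 12_000;           // approximate reads per second before
        double readsAfter = 200_000;           // approximate reads per second after

        System.out.printf("cost before: U$ %.2f / mo%n", costBefore);
        System.out.printf("savings: %.1f%%%n", savingsPercent(costBefore, costAfter));
        System.out.printf("old bill is %.1fx the new one%n", ratio(costBefore, costAfter));
        System.out.printf("read throughput gain: %.1fx%n", ratio(readsAfter, readsBefore));
    }
}
```

With these inputs the bill drops by roughly 76%, the old bill is about 4.2 times the new one, and read throughput grows by a factor of about 16.7.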
- Move III -
Distributing Resources
What kind of resources?
The blacklisted phone numbers
The ported phone numbers database
Text file resources
Messaging platform
- resources are checked before sending messages
- they identify the user's carrier
- resources are loaded into memory (RAM)
- servers are off-cloud (hard to upgrade)
Problem: larger resource files for the same amount of memory
4GB - 6GB RAM
Loading everything, RAM story
[Diagram: Message Publisher holding the Black list and Portability resources]
- slow JVM responses (GC)
- server memory limit reached
- files continue to grow
- more than 20 instances on different servers loading the same resources
How about a distributed solution?
- the resource files are the same on all of the servers
- RAM does not scale well
- it is an expensive solution
So...
- Why not distribute resources around a ring?
The distributed resources solution
[Diagram: Message Publishers across DC1, DC2, and DC3, plus Other Platforms]
- common information is shared across a Cassandra cluster
- the massive hardware upgrade: solved
- the data is available to other platforms
- it is highly scalable
- easy to accommodate more data
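The "ring" idea can be illustrated with a small consistent-hashing sketch: each node owns several tokens on a ring, and a phone number is served by the node that owns the first token at or after the phone's hash. This is a generic illustration, not Cassandra's implementation; the class name ResourceRing is hypothetical, and CRC32 is only a stand-in hash chosen because it ships with the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Minimal consistent-hash ring: token -> node name, kept sorted by token.
final class ResourceRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Stand-in hash function; a real cluster uses its own partitioner.
    private static long token(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Each node is placed on the ring several times to spread its share evenly.
    void addNode(String node, int virtualNodes) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(token(node + "#" + i), node);
        }
    }

    // Walk clockwise from the key's token; wrap to the first token if needed.
    String nodeFor(String phone) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> owner = ring.ceilingEntry(token(phone));
        return (owner != null ? owner : ring.firstEntry()).getValue();
    }
}
```

The useful property is that adding a node only moves the keys that the new node takes over; every other key stays where it was, which is why growing the data set means adding nodes rather than adding RAM to every server.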
Checking the results
Wrapping up the Moves
- always upgrade to the newest versions
- high throughput and availability make a difference
- costs really, really matter!
- horizontal scalability is great! If your volume of data grows, increase the number of nodes
eitikimura eiti-kimura-movile eiti.kimura@movile.com

IoT Analytics Company Presentation May 2024IoTAnalytics
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxDavid Michel
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesThousandEyes
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀DianaGray10
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
 

Recently uploaded (20)

Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 

Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra

  • 1. A CHANGE OF SEASONS A big move to Apache Cassandra Eiti Kimura, IT Coordinator @Movile Brazil
  • 4. Leader in Latin America Mobile phones, Smartphones and Tablets Movile is the company behind the apps that make your life easier.
  • 7. We think mobile... Movile develops apps across all platforms for smartphones and tablets to not only make life easier, but also more fun. The company recorded an annual average growth of 80% in the last 7 years
  • 9. - Move I - The Subscription and Billing System a.k.a SBS
  • 10. Subscription and Billing Platform - it is a service API - responsible for managing users' subscriptions - responsible for charging users through their carriers - an engine to renew subscriptions. It cannot stop under any circumstance and it has to be very performant
  • 11. The platform in numbers: 88 million subscriptions, 66.1 million unique users, 105 million transactions a day
  • 12. Platform Evolution timeline 2008 Pure relational database times 2009 Apache Cassandra adoption (v0.6) 2011 The data model was entirely remodeled 4 nodes Cluster upgrade from version 1.0 to 1.2 2013 Cluster upgrade from version 0.7 to 1.0 Expanded from 4 to 6 nodes 2014 New data index using time series 2015 THE BIG MOVE migrating complex queries from relational database
  • 13. Initial architecture revisited: classical solution using a regular RDBMS (diagram: several API and Engine instances, one DB)
  • 14. Architecture disadvantages - single point of failure - slow response times - the platform went down often - hard and expensive to scale - if you scale your platform and forget to scale the database and other related resources, you'll fail
  • 15. A new architecture has come: a hybrid solution using an Apache Cassandra cluster plus a relational database to execute complex queries (diagram: API and Engine instances, the DB, regular SQL queries)
  • 16. The benefits of the new solution - performance problems: solved - availability problems: solved - single point of failure: partially solved - significantly increased read and write throughput
  • 19. The problems - querying the relational database takes time - it has side effects: it locks data being updated and inserted - concurrency causes performance degradation - it does not scale well - we still need the relational database to execute complex queries
  • 20. The complex query - queries the subscription table - selects expired subscriptions - the subscriptions must be grouped by user - results must be ordered by priority, criteria and type of user plan
  • 21. SQL Server's query (projection, filter criteria, aggregation, sort)
        SELECT s.phone, MIN(s.last_renew_attempt) AS min_last_renew_attempt
        FROM subscription AS s WITH(nolock)
        JOIN configuration AS c WITH(nolock)
          ON s.configuration_id = c.configuration_id
        WHERE s.enabled = 1
          AND s.timeout_date < GETDATE()
          AND s.related_id IS NULL
          AND c.carrier_id = ?
          AND ( c.enabled = 1 AND ( c.renewable = 1 OR c.use_balance_parameter = 1 ) )
        GROUP BY s.phone
        ORDER BY charge_priority DESC, max(user_plan) DESC, min_last_renew_attempt
  • 22. The solution - Extract data from Apache Cassandra instead of using the relational database - There is no single point of failure - Performance improved, but there is more work querying and filtering data. Main concern: distributed sorting of data by multiple criteria and data aggregation - Apache Spark!?
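
The "more work querying and filtering data" mentioned on this slide can be pictured with a minimal sketch in plain Java: once the rows come out of Cassandra, the application itself has to filter, group by phone and sort. The class and field names below are hypothetical, chosen to mirror the columns of the SQL query; the join against the configuration table (carrier filter) is left out, and this is not the code used in the platform.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RenewalQuerySketch {

    /** Hypothetical row; fields mirror the columns used by the SQL query. */
    static final class Subscription {
        final String phone; final boolean enabled; final long timeoutDate;
        final Long relatedId; final int chargePriority; final int userPlan;
        final long lastRenewAttempt;

        Subscription(String phone, boolean enabled, long timeoutDate, Long relatedId,
                     int chargePriority, int userPlan, long lastRenewAttempt) {
            this.phone = phone; this.enabled = enabled; this.timeoutDate = timeoutDate;
            this.relatedId = relatedId; this.chargePriority = chargePriority;
            this.userPlan = userPlan; this.lastRenewAttempt = lastRenewAttempt;
        }
    }

    /** One aggregated result row per phone (the GROUP BY output). */
    static final class Candidate {
        final String phone; int maxPriority; int maxPlan; long minLastRenewAttempt;

        Candidate(Subscription s) {
            phone = s.phone; maxPriority = s.chargePriority;
            maxPlan = s.userPlan; minLastRenewAttempt = s.lastRenewAttempt;
        }

        void merge(Subscription s) {
            maxPriority = Math.max(maxPriority, s.chargePriority);
            maxPlan = Math.max(maxPlan, s.userPlan);
            minLastRenewAttempt = Math.min(minLastRenewAttempt, s.lastRenewAttempt);
        }
    }

    static List<Candidate> select(List<Subscription> rows, long now) {
        Map<String, Candidate> byPhone = new LinkedHashMap<>();
        for (Subscription s : rows) {
            // Filter: enabled, already expired, not linked to another subscription.
            if (!s.enabled || s.timeoutDate >= now || s.relatedId != null) continue;
            // Aggregate: fold every matching subscription into its phone's row.
            Candidate c = byPhone.get(s.phone);
            if (c == null) byPhone.put(s.phone, new Candidate(s)); else c.merge(s);
        }
        // Sort: priority DESC, plan DESC, oldest renew attempt first.
        List<Candidate> result = new ArrayList<>(byPhone.values());
        result.sort(Comparator.comparingInt((Candidate c) -> c.maxPriority).reversed()
            .thenComparing(Comparator.comparingInt((Candidate c) -> c.maxPlan).reversed())
            .thenComparingLong(c -> c.minLastRenewAttempt));
        return result;
    }
}
```

On a single machine this works only while the filtered rows fit in memory, which is why a distributed sort and aggregation engine becomes the main concern.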
  • 23. - Databricks used Apache Spark to sort 100 TB of data on 206 machines in 23 minutes https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html
  • 24. Divide-and-conquer: preparing for the new solution - tables: Subscription and Subscription Index (configuration_id > phone-number) - using a new table as an index, applying data denormalization! - each subscription becomes a column (time series)
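
One possible reading of this index layout, sketched in plain Java rather than in Cassandra itself: each `configuration_id` acts as a row, and the entries inside the row are kept sorted by time, so "everything that expired before now" is a single range scan. The class name, the choice of the timeout timestamp as the sort key, and the method names are assumptions made for illustration; the slide does not give the actual schema.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class SubscriptionIndexSketch {
    // configuration_id -> (timeout timestamp -> phones whose subscription expires then)
    private final Map<Integer, NavigableMap<Long, List<String>>> rows = new HashMap<>();

    /** Denormalized write: every subscription is also recorded in the index row. */
    void put(int configurationId, long timeoutDate, String phone) {
        rows.computeIfAbsent(configurationId, k -> new TreeMap<>())
            .computeIfAbsent(timeoutDate, k -> new ArrayList<>())
            .add(phone);
    }

    /** Range scan over the sorted entries: phones that expired strictly before now. */
    List<String> expiredBefore(int configurationId, long now) {
        NavigableMap<Long, List<String>> row =
            rows.getOrDefault(configurationId, Collections.emptyNavigableMap());
        List<String> phones = new ArrayList<>();
        for (List<String> bucket : row.headMap(now, false).values()) {
            phones.addAll(bucket);
        }
        return phones;
    }
}
```

The design point is that the expensive part of the relational query (finding expired subscriptions per configuration) turns into a lookup by key plus an ordered scan, which is the access pattern wide, time-ordered rows are good at.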
  • 25. Proof of Concept with Apache Spark (diagram: Data Extractor, Processor)
  • 26. Preparing Resources (Processor, Java code snippet)
        import java.io.File;
        import org.apache.spark.api.java.JavaSparkContext;
        import org.apache.spark.sql.DataFrame;
        import org.apache.spark.sql.SQLContext;

        JavaSparkContext sc = new JavaSparkContext(
            "local[*]", "Simple App", SPARK_HOME, "spark-simple-1.0.jar");

        // Get file from resources folder
        ClassLoader classLoader = SparkFileJob.class.getClassLoader();
        File file = new File(classLoader.getResource("dataset-10MM.json").getFile());

        // Load the JSON dataset and expose it to Spark SQL as table "subscription"
        SQLContext sqlContext = new SQLContext(sc);
        DataFrame df = sqlContext.read().json(file.getPath());
        df.registerTempTable("subscription");
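  • 27. Preparing and Executing Query (Spark SQL query, Java code snippet)
        // The current time is concatenated into the query text as the expiry cutoff
        String query =
            "SELECT phone, MAX(charge_priority) AS max_priority, MAX(user_plan) AS max_plan "
          + "FROM subscription "
          + "WHERE enabled = 1 AND timeout_date < " + System.currentTimeMillis() + " "
          + "AND related_id IS NULL AND carrier_id IN (1, 4, 2, 5) "
          + "GROUP BY phone "
          + "ORDER BY max_priority DESC, max_plan DESC";

        // Run the query and process each resulting row
        sqlContext.sql(query)
            .javaRDD()
            .foreach(row -> process(row));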
  • 28. - We have the DataStax Spark-Cassandra-Connector! - It allows exposing Cassandra tables as Spark RDDs - use Apache Spark coupled to Cassandra https://github.com/datastax/spark-cassandra-connector https://github.com/eiti-kimura-movile/spark-cassandra
  • 29. Next Steps - upgrade the cluster to version >= 2.1 - cluster read improvement of 50% moving from Thrift to CQL (native protocol v3) - implement the final solution: Cassandra + Spark
  • 30. - Move II - The Kiwi Migration
  • 31. The Kiwi Platform - it is a common backend platform for smartphones - provides user and device management - user event and media tracker - analytics - push notifications. High performance and availability required
  • 34. The push notification crusade: low read throughput (diagram: PostgreSQL, three Push Publishers, Apple notification service, Google notification service)
  • 35. The problems (déjà vu?) - single point of failure with PostgreSQL - high costs paying for 2 storage services - DynamoDB does not have good read throughput for linear reads - RDS PostgreSQL tuning limit reached - low throughput sending notifications
  • 37. The solution in numbers - data storage cost - Amazon DynamoDB: US$ 4,575.00 / mo - PostgreSQL (RDS): US$ 6,250.00 / mo - total: US$ 10,825.00 / mo - read throughput measured - Amazon DynamoDB: 1.4k/s (linear, sequential reads) - PostgreSQL (RDS): 10k/s
  • 38. Remodeled solution, the Cassandra way (diagram: three Push Publishers, Apple notification service, Google notification service)
  • 39. Data model changes - Amazon DynamoDB: object serialized with Avro, a few columns - Apache Cassandra: exploded object, more than 80 columns without serialization
  • 40. Conclusion - Before migration: AWS DynamoDB + PostgreSQL = US$ 10,825.00/mo, read throughput ~12k/s - After migration: Apache Cassandra (8 nodes, c3.2xlarge) = US$ 2,580.00/mo, read throughput ~200k/s - monthly cost cut by about 76% (roughly 4x cheaper)!
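
The cost figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the monthly prices quoted in the deck; the class and method names are made up for the example.

```java
final class MigrationCost {

    /** Percentage by which the monthly bill drops. */
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    /** How many times more expensive the old setup was. */
    static double timesCheaper(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        double before = 4575.00 + 6250.00; // DynamoDB + PostgreSQL (RDS), US$ per month
        double after = 2580.00;            // 8-node Cassandra cluster, US$ per month
        System.out.printf("monthly cost reduction: %.1f%%%n", reductionPercent(before, after));
        System.out.printf("old setup cost %.1fx the new one%n", timesCheaper(before, after));
    }
}
```

With these inputs the bill drops by roughly 76%, which is the same as saying the old setup cost a little over four times as much as the Cassandra cluster.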
  • 41. - Move III - Distributing Resources
  • 42. What kind of resources? - the blacklisted phone numbers - the ported phone numbers database - text file resources
  • 43. Messaging platform - resources are checked before sending messages - identify the user's carrier - resources are loaded into memory (RAM) - servers are off-cloud (hard to upgrade). Problem: larger resource files for the same amount of memory
  • 44. Loading everything, the RAM story (diagram: Message Publisher with 4GB - 6GB RAM holding the blacklist and portability files) - slow JVM responses (GC) - server memory limit reached - files continue to grow - more than 20 instances on different servers loading the same resources
  • 45. How about a distributed solution? - the resource files are the same on all of the servers - RAM does not scale well - it is an expensive solution. So... why not distribute the resources around a ring?
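
The idea of spreading resources around a ring can be illustrated with a small consistent-hashing sketch in Java. This is a teaching model, not Cassandra's partitioner: the class name, node names and the number of tokens per node are hypothetical, and MD5 is used here only because it ships with the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

final class ResourceRing {
    // Token position on the ring -> node that owns the range ending at that token.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    /** Maps a key to a 64-bit ring position using the first 8 bytes of its MD5 digest. */
    static long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                .digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Places a node on the ring at several token positions to even out the load. */
    void addNode(String node, int tokens) {
        for (int i = 0; i < tokens; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    /** Owner of a key: first token at or after the key's position, wrapping around. */
    String ownerOf(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }
}
```

Each phone number from the blacklist or portability files lands on exactly one owner, so no single server has to hold the whole file in RAM, and adding a node only moves the keys that fall into the new node's ranges.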
  • 46. The distributed resources solution (diagram: DC1, DC2 and DC3, nine Message Publishers, other platforms)
  • 47. Checking the results - common information is shared across a Cassandra cluster - the massive hardware upgrade: solved - the data is available for other platforms - it is highly scalable - easy to accommodate more data
  • 49. Wrapping up the Moves - always upgrade to the newest versions - high throughput and availability make a difference - costs really, really matter! - horizontal scalability is great: if your volume of data grows, increase the number of nodes