SlideShare a Scribd company logo
1 of 50
Download to read offline
A CHANGE OF SEASONS
A big move to Apache Cassandra
Eiti Kimura, IT Coordinator @Movile Brazil
Eiti Kimura
Spreading the word...
Leader in Latin America
Mobile phones, Smartphones and Tablets
Movile is the company behind the
apps that make your life easier.
We think mobile...
Movile develops apps across all platforms for smartphones
and tablets to not only make life easier, but also more fun.
The company recorded an annual average growth of 80% in the last 7 years
use
cases3
THAT Constitute
THE BIG
move to
Apache Cassandra
- Move I -
The Subscription and Billing
System a.k.a SBS
Subscription and Billing Platform
- it is a service API
- responsible to manage user’s subscriptions
- responsible to charge users in carriers
- an engine to renew subscriptions
“can not” stop under any circumstance
it has to be very performatic
The platform in numbers
88 Million of
Subscriptions
66,1M of unique
users
105M of
transactions a day
Platform Evolution timeline
2008
Pure relational
database times
2009
Apache Cassandra
adoption (v0.6)
2011
The data model was
entirely remodeled
4 nodes
Cluster upgrade from
version 1.0 to 1.2
2013
Cluster upgrade
from version 0.7
to 1.0
Expanded from
4 to 6 nodes
2014
New data index
using time series
2015
THE BIG MOVE
migrating complex
queries from
relational database
Initial architecture revisited
API
DB
API APIAPI API
Engine
Engine Engine
Classical solution using a regular RDBMS
Architecture disadvantages
- single point of failure
- slow response times
- platform gone down often
- hard and expensive to scale
- if you scale your platform and forget to scale
database and other related resources you’ll
fail
A new architecture has come
API
API
Engine
Engine
DB
A hybrid solution using Apache Cassandra Cluster plus a
relational database solution to execute complex queries
Regular
SQL
Queries
API
API
The benefits of new solution
- performance problems: solved
- availability problems: solved
- single point of failure: partially solved
- significantly increased read and write
throughput
The solution weaknesses
Engine
Engine
DB
SQL Queries
- querying relational database consumes time
- has side effects, it locks data being updated
and inserted
- concurrency causes performance
degradation
- it does not scale well
- we still need to use relational database to
execute complex queries
The problems
The complex query..
- query subscription table
- selects expired subscriptions
- the subscriptions must be grouped by user
- must be ordered by priority, criteria, type of
user plan
Sort data
Aggregation
Filter Criterias
Projection
SQLServer’s query
SELECT s.phone, MIN(s.last_renew_attempt) AS min_last_renew_attempt
FROM subscription AS s WITH(nolock)
JOIN configuration AS c WITH(nolock)
ON s.configuration_id = c.configuration_id
WHERE s.enabled = 1
AND s.timeout_date < GETDATE()
AND s.related_id IS NULL
AND c.carrier_id = ?
AND ( c.enabled = 1 AND
( c.renewable = 1 OR c.use_balance_parameter = 1 ) )
GROUP BY s.phone
ORDER BY charge_priority DESC, max(user_plan) DESC,
min_last_renew_attempt
The solution
- Extract data from Apache Cassandra instead
of use relational database
- There is no single point of failure
- Performance improved, but more work
querying and filtering data
Main concern: distributed sort data by multiple
criterias and data aggregation
- Apache Spark!?
- Databricks to use Apache Spark to sort 100 TB of
data on 206 machines in 23 minutes
https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html
Divide-And-Conquer
Preparing for the new solution
Subscription Subscription Index
● configuration_id
○ phone-number
Using a new table as index applying data denormalization!
● each subscription becomes a
column (time series)
Proof of Concept with Apache Spark
Data Extractor
Processor
Preparing Resources
Processor
Java Code Snippet
JavaSparkContext sc = new JavaSparkContext("local[*]", "Simple App",
SPARK_HOME, "spark-simple-1.0.jar");
// Get file from resources folder
ClassLoader classLoader = SparkFileJob.class.getClassLoader();
File file = new File(classLoader.getResource("dataset-10MM.json").getFile());
SQLContext sqlContext = new SQLContext(sc);
DataFrame df = sqlContext.read().json(file.getPath());
df.registerTempTable("subscription");
Preparing and Executing Query
SELECT phone, MAX(charge_priority) as max_priority,
FROM subscription
WHERE enabled = 1
AND timeout_date < System.currentTimeMillis()
AND related_id IS NULL
AND carrier_id in (1, 4, 2, 5)
GROUP BY phone
ORDER BY max_priority DESC, max_plan DESC
sqlContext.sql(query)
.javaRDD()
.foreach(row -> process(row));
Spark SQL
Query
Java code
snippet
- We have Datastax Spark-Cassandra-Connector!
- It allow to expose Cassandra tables as Spark RDDs
- use Apache Spark coupled to Cassandra
https://github.com/datastax/spark-cassandra-connector
https://github.com/eiti-kimura-movile/spark-cassandra
Next Steps
- upgrade cluster version to >= 2.1
- cluster read improvements in 50% from thrift
to CQL, native protocol v3
- implement the final solution Cassandra +
Spark
- Move II -
The Kiwi Migration
The Kiwi Platform
- it is a common backend smartphone
platform
- provides user and device management
- user event and media tracker
- analytics
- push notifications
High Performance Availability Required
Kiwi: The beginning
API
Consumer
Consumer
API
Dynamo
DB
Queue SQS
Queue SQS
PostgreSQL
Push notifications
low reading throughput
The push notification crusade
PostgreSQL
Push Publisher
Push Publisher
Push Publisher
Apple notification
service
Google notification
service
The problems (dejavú?)
- single point of failure with PostgreSQL
- high costs paying for 2 storage services
- DynamoDB does not have good read
throughput for linear readings
- RDS PostreSQL tuning limit reached
- low throughput sending notifications
Slowness means frustration
The solution in numbers
- data storage cost
- Amazon DynamoDB: U$ 4,575.00 / mo
- PostgreSQL (RDS): U$ 6,250.00 / mo
- read throughput measured
- Amazon DynamoDB: 1,4k /s (linear, sequential reads)
- PostgreSQL (RDS): 10k /s
U$ 10,825.00 / mo
Push Publisher
Push Publisher
Push Publisher
Apple notification
service
Google notification
service
Remodeled solution, Cassandra Way
Datamodel changes
- Amazon DynamoDB
- object serialized with Avro
- a few columns
- Apache Cassandra
- exploded object
- more than 80 columns without serialization
Conclusion
AWS DynamoDB + Postgres = U$ 10,825.00/mo
Read Throughput = ~ 12k/s
Apache Cassandra
(8 nodes c3.2xlarge) = U$ 2,580.00/mo
Read Throughput = ~ 200k/s
Before Migration
After Migration
savings of 300%!!!
- Move III -
Distributing Resources
What a kind of resources?
The black listed phone numbers
The ported phone numbers database
Text file resources
Messaging platform
- resources checked before send messages
- identify the user carrier
- resources loaded up in the memory (RAM)
- servers off-cloud (hard to upgrade)
Problem: larger resource files for the
same amount of memory
4GB - 6GB RAM
Loading everything, RAM story
Message Publisher
Black list Portability
- low JVM responses (GC)
- server memory limit
reached
- files continue to grow
- more than 20 instances in
different servers loading
the same resources
How about a distributed solution?
- the resource files are the same in all of the
servers
- RAM memory does not scale well
- It is an expensive solution
So..
- Why not distribute resources around a ring?
The distributed resources solution
DC1
DC2
DC3
Message
Publisher
Message
Publisher
Message
Publisher
Message
Publisher
Message
Publisher
Message
Publisher
Message
Publisher
Message
Publisher
Message
Publisher
Other
Platforms
- common information are shared across a
Cassandra cluster
- the massive hardware upgrade: solved
- the data are available for other platforms
- it is highly scalable
- easy to accommodate more data
Checking the results
Wrapping up the Moves
- always upgrade to newest versions
- high throughput and availability makes a
difference
- costs really, really matter!
- the horizontal scalability is great! if your
volume of data grow, increase the number of
nodes
eitikimura eiti-kimura-movile eiti.kimura@movile.com

More Related Content

What's hot

Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and CassandraLow-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and CassandraCaserta
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeDatabricks
 
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetupDataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetupVictor Coustenoble
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQLYousun Jeong
 
Apache Spark Overview part2 (20161117)
Apache Spark Overview part2 (20161117)Apache Spark Overview part2 (20161117)
Apache Spark Overview part2 (20161117)Steve Min
 
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016DataStax
 
Spark and Spark Streaming
Spark and Spark StreamingSpark and Spark Streaming
Spark and Spark Streaming宇 傅
 
[SSA] 04.sql on hadoop(2014.02.05)
[SSA] 04.sql on hadoop(2014.02.05)[SSA] 04.sql on hadoop(2014.02.05)
[SSA] 04.sql on hadoop(2014.02.05)Steve Min
 
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksFour Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksLegacy Typesafe (now Lightbend)
 
Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)zznate
 
Microsoft R - Data Science at Scale
Microsoft R - Data Science at ScaleMicrosoft R - Data Science at Scale
Microsoft R - Data Science at ScaleSascha Dittmann
 
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...Аліна Шепшелей
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraPatrick McFadin
 
Strata NYC 2015: What's new in Spark Streaming
Strata NYC 2015: What's new in Spark StreamingStrata NYC 2015: What's new in Spark Streaming
Strata NYC 2015: What's new in Spark StreamingDatabricks
 
Ingesting data at scale into elasticsearch with apache pulsar
Ingesting data at scale into elasticsearch with apache pulsarIngesting data at scale into elasticsearch with apache pulsar
Ingesting data at scale into elasticsearch with apache pulsarTimothy Spann
 
Cassandra DataTables Using RESTful API
Cassandra DataTables Using RESTful APICassandra DataTables Using RESTful API
Cassandra DataTables Using RESTful APISimran Kedia
 
Apache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelApache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelAndrey Lomakin
 
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/HudiVinoth Chandar
 

What's hot (20)

Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and CassandraLow-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
 
RubyKaigi 2014: ServerEngine
RubyKaigi 2014: ServerEngineRubyKaigi 2014: ServerEngine
RubyKaigi 2014: ServerEngine
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
 
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetupDataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
 
SQL on Hadoop in Taiwan
SQL on Hadoop in TaiwanSQL on Hadoop in Taiwan
SQL on Hadoop in Taiwan
 
Apache Spark Overview part2 (20161117)
Apache Spark Overview part2 (20161117)Apache Spark Overview part2 (20161117)
Apache Spark Overview part2 (20161117)
 
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
 
Spark and Spark Streaming
Spark and Spark StreamingSpark and Spark Streaming
Spark and Spark Streaming
 
[SSA] 04.sql on hadoop(2014.02.05)
[SSA] 04.sql on hadoop(2014.02.05)[SSA] 04.sql on hadoop(2014.02.05)
[SSA] 04.sql on hadoop(2014.02.05)
 
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksFour Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
 
Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)
 
Microsoft R - Data Science at Scale
Microsoft R - Data Science at ScaleMicrosoft R - Data Science at Scale
Microsoft R - Data Science at Scale
 
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and Cassandra
 
Strata NYC 2015: What's new in Spark Streaming
Strata NYC 2015: What's new in Spark StreamingStrata NYC 2015: What's new in Spark Streaming
Strata NYC 2015: What's new in Spark Streaming
 
Ingesting data at scale into elasticsearch with apache pulsar
Ingesting data at scale into elasticsearch with apache pulsarIngesting data at scale into elasticsearch with apache pulsar
Ingesting data at scale into elasticsearch with apache pulsar
 
Cassandra DataTables Using RESTful API
Cassandra DataTables Using RESTful APICassandra DataTables Using RESTful API
Cassandra DataTables Using RESTful API
 
Apache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelApache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data model
 
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
[Pulsar summit na 21] Change Data Capture To Data Lakes Using Apache Pulsar/Hudi
 

Viewers also liked

What causes seasons on earth
What causes seasons on earthWhat causes seasons on earth
What causes seasons on earthnmsouthern
 
Watching the seasons change
Watching the seasons changeWatching the seasons change
Watching the seasons changelauttasaari
 
the seasons change
the seasons changethe seasons change
the seasons changeliuhanxiang
 
Shadows and Solar-Lunar Eclipses
Shadows and Solar-Lunar EclipsesShadows and Solar-Lunar Eclipses
Shadows and Solar-Lunar EclipsesVal Bolislis
 
Solar and lunar eclipse
Solar and lunar eclipseSolar and lunar eclipse
Solar and lunar eclipseKunal Yadav
 
Seasons
SeasonsSeasons
SeasonsBrandi
 
Cassandra and security
Cassandra and securityCassandra and security
Cassandra and securityBen Bromhead
 
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable CassandraCassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandraaaronmorton
 
Pythian: My First 100 days with a Cassandra Cluster
Pythian: My First 100 days with a Cassandra ClusterPythian: My First 100 days with a Cassandra Cluster
Pythian: My First 100 days with a Cassandra ClusterDataStax Academy
 
Hardening cassandra for compliance or paranoia
Hardening cassandra for compliance or paranoiaHardening cassandra for compliance or paranoia
Hardening cassandra for compliance or paranoiazznate
 
Securing Cassandra The Right Way
Securing Cassandra The Right WaySecuring Cassandra The Right Way
Securing Cassandra The Right WayDataStax Academy
 
Seasons and weather james
Seasons and weather jamesSeasons and weather james
Seasons and weather jamesroom04
 
Classification power point_with_domain
Classification power point_with_domainClassification power point_with_domain
Classification power point_with_domainSam Crockett
 
What is the reason for the season
What is the reason for the seasonWhat is the reason for the season
What is the reason for the seasonMal Tomlin
 
Science presentation
Science presentationScience presentation
Science presentationsh2610
 
Cassandra Summit 2015: Real World DTCS For Operators
Cassandra Summit 2015: Real World DTCS For OperatorsCassandra Summit 2015: Real World DTCS For Operators
Cassandra Summit 2015: Real World DTCS For OperatorsJeff Jirsa
 

Viewers also liked (20)

What causes seasons on earth
What causes seasons on earthWhat causes seasons on earth
What causes seasons on earth
 
Watching the seasons change
Watching the seasons changeWatching the seasons change
Watching the seasons change
 
the seasons change
the seasons changethe seasons change
the seasons change
 
Seasons
SeasonsSeasons
Seasons
 
Shadows and Solar-Lunar Eclipses
Shadows and Solar-Lunar EclipsesShadows and Solar-Lunar Eclipses
Shadows and Solar-Lunar Eclipses
 
Solar and lunar eclipse
Solar and lunar eclipseSolar and lunar eclipse
Solar and lunar eclipse
 
Seasons
SeasonsSeasons
Seasons
 
Cassandra and security
Cassandra and securityCassandra and security
Cassandra and security
 
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable CassandraCassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra
Cassandra SF 2015 - Repeatable, Scalable, Reliable, Observable Cassandra
 
Pythian: My First 100 days with a Cassandra Cluster
Pythian: My First 100 days with a Cassandra ClusterPythian: My First 100 days with a Cassandra Cluster
Pythian: My First 100 days with a Cassandra Cluster
 
Hardening cassandra for compliance or paranoia
Hardening cassandra for compliance or paranoiaHardening cassandra for compliance or paranoia
Hardening cassandra for compliance or paranoia
 
Securing Cassandra The Right Way
Securing Cassandra The Right WaySecuring Cassandra The Right Way
Securing Cassandra The Right Way
 
Seasons and weather james
Seasons and weather jamesSeasons and weather james
Seasons and weather james
 
Infomercial about Water
Infomercial about WaterInfomercial about Water
Infomercial about Water
 
Classification power point_with_domain
Classification power point_with_domainClassification power point_with_domain
Classification power point_with_domain
 
What is the reason for the season
What is the reason for the seasonWhat is the reason for the season
What is the reason for the season
 
Science presentation
Science presentationScience presentation
Science presentation
 
Cassandra Summit 2015: Real World DTCS For Operators
Cassandra Summit 2015: Real World DTCS For OperatorsCassandra Summit 2015: Real World DTCS For Operators
Cassandra Summit 2015: Real World DTCS For Operators
 
Eclipse
EclipseEclipse
Eclipse
 
Reasons for seasons
Reasons for seasonsReasons for seasons
Reasons for seasons
 

Similar to Cassandra Summit 2015 - A Change of Seasons

Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Alluxio, Inc.
 
Unified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkUnified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkC4Media
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...Nati Shalom
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151xlight
 
Yaroslav Nedashkovsky "How to manage hundreds of pipelines for processing da...
Yaroslav Nedashkovsky  "How to manage hundreds of pipelines for processing da...Yaroslav Nedashkovsky  "How to manage hundreds of pipelines for processing da...
Yaroslav Nedashkovsky "How to manage hundreds of pipelines for processing da...Lviv Startup Club
 
Amazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian MeyersAmazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian Meyershuguk
 
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...GeeksLab Odessa
 
Camunda BPM 7.2: Performance and Scalability (English)
Camunda BPM 7.2: Performance and Scalability (English)Camunda BPM 7.2: Performance and Scalability (English)
Camunda BPM 7.2: Performance and Scalability (English)camunda services GmbH
 
Application design for the cloud using AWS
Application design for the cloud using AWSApplication design for the cloud using AWS
Application design for the cloud using AWSJonathan Holloway
 
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data AnalyticsStrata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data AnalyticsSingleStore
 
Building and deploying large scale real time news system with my sql and dist...
Building and deploying large scale real time news system with my sql and dist...Building and deploying large scale real time news system with my sql and dist...
Building and deploying large scale real time news system with my sql and dist...Tao Cheng
 
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹Amazon Web Services
 
User-space Network Processing
User-space Network ProcessingUser-space Network Processing
User-space Network ProcessingRyousei Takano
 
Real-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven ApplicationsReal-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven ApplicationsVMware Tanzu
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013Amazon Web Services
 
Roles y Responsabilidades en SQL Azure
Roles y Responsabilidades en SQL AzureRoles y Responsabilidades en SQL Azure
Roles y Responsabilidades en SQL AzureEduardo Castro
 

Similar to Cassandra Summit 2015 - A Change of Seasons (20)

Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
 
Unified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkUnified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache Spark
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
L21 scalability
L21 scalabilityL21 scalability
L21 scalability
 
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151
 
Yaroslav Nedashkovsky "How to manage hundreds of pipelines for processing da...
Yaroslav Nedashkovsky  "How to manage hundreds of pipelines for processing da...Yaroslav Nedashkovsky  "How to manage hundreds of pipelines for processing da...
Yaroslav Nedashkovsky "How to manage hundreds of pipelines for processing da...
 
Amazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian MeyersAmazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian Meyers
 
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
 
Camunda BPM 7.2: Performance and Scalability (English)
Camunda BPM 7.2: Performance and Scalability (English)Camunda BPM 7.2: Performance and Scalability (English)
Camunda BPM 7.2: Performance and Scalability (English)
 
Sql Server
Sql ServerSql Server
Sql Server
 
Application design for the cloud using AWS
Application design for the cloud using AWSApplication design for the cloud using AWS
Application design for the cloud using AWS
 
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data AnalyticsStrata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
 
Building and deploying large scale real time news system with my sql and dist...
Building and deploying large scale real time news system with my sql and dist...Building and deploying large scale real time news system with my sql and dist...
Building and deploying large scale real time news system with my sql and dist...
 
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
透過 Amazon Redshift 打造數據分析服務及 Amazon Redshift 新功能案例介紹
 
User-space Network Processing
User-space Network ProcessingUser-space Network Processing
User-space Network Processing
 
Real-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven ApplicationsReal-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven Applications
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
Understanding AWS Database Options (DAT201) | AWS re:Invent 2013
 
Roles y Responsabilidades en SQL Azure
Roles y Responsabilidades en SQL AzureRoles y Responsabilidades en SQL Azure
Roles y Responsabilidades en SQL Azure
 

More from Eiti Kimura

[Datafest 2018] Apache Spark Structured Stream - Moedor de dados em tempo qua...
[Datafest 2018] Apache Spark Structured Stream - Moedor de dados em tempo qua...[Datafest 2018] Apache Spark Structured Stream - Moedor de dados em tempo qua...
[Datafest 2018] Apache Spark Structured Stream - Moedor de dados em tempo qua...Eiti Kimura
 
[Redis conf18] The Versatility of Redis
[Redis conf18] The Versatility of Redis[Redis conf18] The Versatility of Redis
[Redis conf18] The Versatility of RedisEiti Kimura
 
[DEVFEST] Apache Spark Casos de Uso e Escalabilidade
[DEVFEST] Apache Spark  Casos de Uso e Escalabilidade[DEVFEST] Apache Spark  Casos de Uso e Escalabilidade
[DEVFEST] Apache Spark Casos de Uso e EscalabilidadeEiti Kimura
 
[DataFest-2017] Apache Cassandra Para Sistemas de Alto Desempenho
[DataFest-2017] Apache Cassandra Para Sistemas de Alto Desempenho[DataFest-2017] Apache Cassandra Para Sistemas de Alto Desempenho
[DataFest-2017] Apache Cassandra Para Sistemas de Alto DesempenhoEiti Kimura
 
[TDC2016] Apache SparkMLlib: Machine Learning na Prática
[TDC2016] Apache SparkMLlib:  Machine Learning na Prática[TDC2016] Apache SparkMLlib:  Machine Learning na Prática
[TDC2016] Apache SparkMLlib: Machine Learning na PráticaEiti Kimura
 
[TDC2016] Apache Cassandra Estratégias de Modelagem de Dados
[TDC2016]  Apache Cassandra Estratégias de Modelagem de Dados[TDC2016]  Apache Cassandra Estratégias de Modelagem de Dados
[TDC2016] Apache Cassandra Estratégias de Modelagem de DadosEiti Kimura
 
QConSP16 - Apache Cassandra Evoluindo Sistemas Distribuídos
QConSP16 - Apache Cassandra Evoluindo Sistemas DistribuídosQConSP16 - Apache Cassandra Evoluindo Sistemas Distribuídos
QConSP16 - Apache Cassandra Evoluindo Sistemas DistribuídosEiti Kimura
 
SP Big Data Meetup - Conhecendo Apache Cassandra @Movile
SP Big Data Meetup - Conhecendo Apache Cassandra @MovileSP Big Data Meetup - Conhecendo Apache Cassandra @Movile
SP Big Data Meetup - Conhecendo Apache Cassandra @MovileEiti Kimura
 
TDC2015 - Apache Cassandra no Desenvolvimento de Sistemas de Alto Desempenho
TDC2015 - Apache Cassandra no Desenvolvimento de Sistemas de Alto DesempenhoTDC2015 - Apache Cassandra no Desenvolvimento de Sistemas de Alto Desempenho
TDC2015 - Apache Cassandra no Desenvolvimento de Sistemas de Alto DesempenhoEiti Kimura
 
Conhecendo Apache Cassandra @Movile
Conhecendo Apache Cassandra  @MovileConhecendo Apache Cassandra  @Movile
Conhecendo Apache Cassandra @MovileEiti Kimura
 
Cassandra overview: Um Caso Prático
Cassandra overview:  Um Caso PráticoCassandra overview:  Um Caso Prático
Cassandra overview: Um Caso PráticoEiti Kimura
 
QConSP 2014 - Cassandra no Desenvolvimento de Aplicações para serviços Móveis
QConSP 2014 - Cassandra no Desenvolvimento  de Aplicações para  serviços MóveisQConSP 2014 - Cassandra no Desenvolvimento  de Aplicações para  serviços Móveis
QConSP 2014 - Cassandra no Desenvolvimento de Aplicações para serviços MóveisEiti Kimura
 

More from Eiti Kimura (12)

[Datafest 2018] Apache Spark Structured Stream - Moedor de dados em tempo qua...
[Datafest 2018] Apache Spark Structured Stream - Moedor de dados em tempo qua...[Datafest 2018] Apache Spark Structured Stream - Moedor de dados em tempo qua...
[Datafest 2018] Apache Spark Structured Stream - Moedor de dados em tempo qua...
 
[Redis conf18] The Versatility of Redis
[Redis conf18] The Versatility of Redis[Redis conf18] The Versatility of Redis
[Redis conf18] The Versatility of Redis
 
[DEVFEST] Apache Spark Casos de Uso e Escalabilidade
[DEVFEST] Apache Spark  Casos de Uso e Escalabilidade[DEVFEST] Apache Spark  Casos de Uso e Escalabilidade
[DEVFEST] Apache Spark Casos de Uso e Escalabilidade
 
[DataFest-2017] Apache Cassandra Para Sistemas de Alto Desempenho
[DataFest-2017] Apache Cassandra Para Sistemas de Alto Desempenho[DataFest-2017] Apache Cassandra Para Sistemas de Alto Desempenho
[DataFest-2017] Apache Cassandra Para Sistemas de Alto Desempenho
 
[TDC2016] Apache SparkMLlib: Machine Learning na Prática
[TDC2016] Apache SparkMLlib:  Machine Learning na Prática[TDC2016] Apache SparkMLlib:  Machine Learning na Prática
[TDC2016] Apache SparkMLlib: Machine Learning na Prática
 
[TDC2016] Apache Cassandra Estratégias de Modelagem de Dados
[TDC2016]  Apache Cassandra Estratégias de Modelagem de Dados[TDC2016]  Apache Cassandra Estratégias de Modelagem de Dados
[TDC2016] Apache Cassandra Estratégias de Modelagem de Dados
 
QConSP16 - Apache Cassandra Evoluindo Sistemas Distribuídos
QConSP16 - Apache Cassandra Evoluindo Sistemas DistribuídosQConSP16 - Apache Cassandra Evoluindo Sistemas Distribuídos
QConSP16 - Apache Cassandra Evoluindo Sistemas Distribuídos
 
SP Big Data Meetup - Conhecendo Apache Cassandra @Movile
SP Big Data Meetup - Conhecendo Apache Cassandra @MovileSP Big Data Meetup - Conhecendo Apache Cassandra @Movile
SP Big Data Meetup - Conhecendo Apache Cassandra @Movile
 
TDC2015 - Apache Cassandra no Desenvolvimento de Sistemas de Alto Desempenho
TDC2015 - Apache Cassandra no Desenvolvimento de Sistemas de Alto DesempenhoTDC2015 - Apache Cassandra no Desenvolvimento de Sistemas de Alto Desempenho
TDC2015 - Apache Cassandra no Desenvolvimento de Sistemas de Alto Desempenho
 
Conhecendo Apache Cassandra @Movile
Conhecendo Apache Cassandra  @MovileConhecendo Apache Cassandra  @Movile
Conhecendo Apache Cassandra @Movile
 
Cassandra overview: Um Caso Prático
Cassandra overview:  Um Caso PráticoCassandra overview:  Um Caso Prático
Cassandra overview: Um Caso Prático
 
QConSP 2014 - Cassandra no Desenvolvimento de Aplicações para serviços Móveis
QConSP 2014 - Cassandra no Desenvolvimento  de Aplicações para  serviços MóveisQConSP 2014 - Cassandra no Desenvolvimento  de Aplicações para  serviços Móveis
QConSP 2014 - Cassandra no Desenvolvimento de Aplicações para serviços Móveis
 

Recently uploaded

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 

Recently uploaded (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

Cassandra Summit 2015 - A Change of Seasons

  • 1. A CHANGE OF SEASONS A big move to Apache Cassandra Eiti Kimura, IT Coordinator @Movile Brazil
  • 4. Leader in Latin America Mobile phones, Smartphones and Tablets Movile is the company behind the apps that make your life easier.
  • 5.
  • 6.
  • 7. We think mobile... Movile develops apps across all platforms for smartphones and tablets to not only make life easier, but also more fun. The company recorded an annual average growth of 80% in the last 7 years
  • 9. - Move I - The Subscription and Billing System a.k.a SBS
  • 10. Subscription and Billing Platform - it is a service API - responsible to manage user’s subscriptions - responsible to charge users in carriers - an engine to renew subscriptions “can not” stop under any circumstance it has to be very performatic
  • 11. The platform in numbers 88 Million of Subscriptions 66,1M of unique users 105M of transactions a day
  • 12. Platform Evolution timeline 2008 Pure relational database times 2009 Apache Cassandra adoption (v0.6) 2011 The data model was entirely remodeled 4 nodes Cluster upgrade from version 1.0 to 1.2 2013 Cluster upgrade from version 0.7 to 1.0 Expanded from 4 to 6 nodes 2014 New data index using time series 2015 THE BIG MOVE migrating complex queries from relational database
  • 13. Initial architecture revisited API DB API APIAPI API Engine Engine Engine Classical solution using a regular RDBMS
  • 14. Architecture disadvantages - single point of failure - slow response times - platform gone down often - hard and expensive to scale - if you scale your platform and forget to scale database and other related resources you’ll fail
  • 15. A new architecture has come API API Engine Engine DB A hybrid solution using Apache Cassandra Cluster plus a relational database solution to execute complex queries Regular SQL Queries API API
  • 16. The benefits of new solution - performance problems: solved - availability problems: solved - single point of failure: partially solved - significantly increased read and write throughput
  • 18.
  • 19. - querying relational database consumes time - has side effects, it locks data being updated and inserted - concurrency causes performance degradation - it does not scale well - we still need to use relational database to execute complex queries The problems
  • 20. The complex query.. - query subscription table - selects expired subscriptions - the subscriptions must be grouped by user - must be ordered by priority, criteria, type of user plan
  • 21. Sort data Aggregation Filter Criterias Projection SQLServer’s query SELECT s.phone, MIN(s.last_renew_attempt) AS min_last_renew_attempt FROM subscription AS s WITH(nolock) JOIN configuration AS c WITH(nolock) ON s.configuration_id = c.configuration_id WHERE s.enabled = 1 AND s.timeout_date < GETDATE() AND s.related_id IS NULL AND c.carrier_id = ? AND ( c.enabled = 1 AND ( c.renewable = 1 OR c.use_balance_parameter = 1 ) ) GROUP BY s.phone ORDER BY charge_priority DESC, max(user_plan) DESC, min_last_renew_attempt
  • 22. The solution - Extract data from Apache Cassandra instead of use relational database - There is no single point of failure - Performance improved, but more work querying and filtering data Main concern: distributed sort data by multiple criterias and data aggregation - Apache Spark!?
  • 23. - Databricks to use Apache Spark to sort 100 TB of data on 206 machines in 23 minutes https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html
  • 24. Divide-And-Conquer Preparing for the new solution Subscription Subscription Index ● configuration_id ○ phone-number Using a new table as index applying data denormalization! ● each subscription becomes a column (time series)
  • 25. Proof of Concept with Apache Spark Data Extractor Processor
  • 26. Preparing Resources Processor Java Code Snippet JavaSparkContext sc = new JavaSparkContext("local[*]", "Simple App", SPARK_HOME, "spark-simple-1.0.jar"); // Get file from resources folder ClassLoader classLoader = SparkFileJob.class.getClassLoader(); File file = new File(classLoader.getResource("dataset-10MM.json").getFile()); SQLContext sqlContext = new SQLContext(sc); DataFrame df = sqlContext.read().json(file.getPath()); df.registerTempTable("subscription");
  • 27. Preparing and Executing Query SELECT phone, MAX(charge_priority) as max_priority, FROM subscription WHERE enabled = 1 AND timeout_date < System.currentTimeMillis() AND related_id IS NULL AND carrier_id in (1, 4, 2, 5) GROUP BY phone ORDER BY max_priority DESC, max_plan DESC sqlContext.sql(query) .javaRDD() .foreach(row -> process(row)); Spark SQL Query Java code snippet
  • 28. - We have Datastax Spark-Cassandra-Connector! - It allow to expose Cassandra tables as Spark RDDs - use Apache Spark coupled to Cassandra https://github.com/datastax/spark-cassandra-connector https://github.com/eiti-kimura-movile/spark-cassandra
  • 29. Next Steps - upgrade cluster version to >= 2.1 - cluster read improvements in 50% from thrift to CQL, native protocol v3 - implement the final solution Cassandra + Spark
  • 30. - Move II - The Kiwi Migration
  • 31. The Kiwi Platform - it is a common backend smartphone platform - provides user and device management - user event and media tracker - analytics - push notifications High Performance Availability Required
  • 34. low reading throughput The push notification crusade PostgreSQL Push Publisher Push Publisher Push Publisher Apple notification service Google notification service
  • 35. The problems (dejavú?) - single point of failure with PostgreSQL - high costs paying for 2 storage services - DynamoDB does not have good read throughput for linear readings - RDS PostreSQL tuning limit reached - low throughput sending notifications
  • 37. The solution in numbers - data storage cost - Amazon DynamoDB: U$ 4,575.00 / mo - PostgreSQL (RDS): U$ 6,250.00 / mo - read throughput measured - Amazon DynamoDB: 1,4k /s (linear, sequential reads) - PostgreSQL (RDS): 10k /s U$ 10,825.00 / mo
  • 38. Push Publisher Push Publisher Push Publisher Apple notification service Google notification service Remodeled solution, Cassandra Way
  • 39. Datamodel changes - Amazon DynamoDB - object serialized with Avro - a few columns - Apache Cassandra - exploded object - more than 80 columns without serialization
  • 40. Conclusion AWS DynamoDB + Postgres = U$ 10,825.00/mo Read Throughput = ~ 12k/s Apache Cassandra (8 nodes c3.2xlarge) = U$ 2,580.00/mo Read Throughput = ~ 200k/s Before Migration After Migration savings of 300%!!!
  • 41. - Move III - Distributing Resources
  • 42. What a kind of resources? The black listed phone numbers The ported phone numbers database Text file resources
  • 43. Messaging platform - resources checked before send messages - identify the user carrier - resources loaded up in the memory (RAM) - servers off-cloud (hard to upgrade) Problem: larger resource files for the same amount of memory
  • 44. 4GB - 6GB RAM Loading everything, RAM story Message Publisher Black list Portability - low JVM responses (GC) - server memory limit reached - files continue to grow - more than 20 instances in different servers loading the same resources
  • 45. How about a distributed solution? - the resource files are the same in all of the servers - RAM memory does not scale well - It is an expensive solution So.. - Why not distribute resources around a ring?
  • 46. The distributed resources solution DC1 DC2 DC3 Message Publisher Message Publisher Message Publisher Message Publisher Message Publisher Message Publisher Message Publisher Message Publisher Message Publisher Other Platforms
  • 47. - common information are shared across a Cassandra cluster - the massive hardware upgrade: solved - the data are available for other platforms - it is highly scalable - easy to accommodate more data Checking the results
  • 48.
  • 49. Wrapping up the Moves - always upgrade to newest versions - high throughput and availability makes a difference - costs really, really matter! - the horizontal scalability is great! if your volume of data grow, increase the number of nodes