Jeff Bollinger – CTO - @jbollinger
Jeff Smoley – Infrastructure Architect
Scaling With Cassandra
About NativeX
The Backstory
Why Cassandra
Cassandra Overview
NativeX Cassandra Implementation / Metrics
What we Learned
Agenda
Formerly W3i
Marketing technology platform
that enables developers to build
successful businesses around
their apps.
NativeX
Over 620M unique devices on our network
Over 500 apps in network
> 100M Monthly Active Users
100 GB of data ingest per week
Vanity Metrics
A growing mobile advertising network
Backstory
[Chart: API Requests, in billions, by quarter from 2011 Q4 to 2013 Q1]
Infrastructure Intensive Model
[Chart: Session Calls by Week After User Acquired, in millions, weeks 0 to 12]
Lifetime of user
Microsoft SQL Server
2 Node Cluster (failover)
12 cores / node
192 GB of RAM / node
Compellent SAN
172 Disk (SSD,FC,SATA)
Scale Up Architecture
Consistency
Partition Tolerance
Availability
CAP Theorem
SQL Server, MySQL
Cassandra
MongoDB
Scale
•Horizontal
•Incremental cost structure
Resiliency
•No single point of failure
•Geographically distributed
Objectives
Web Application Tier
Database Tier
What Needed to Scale
Web Application Tier is already a server farm that can scale
horizontally through our VMware environment.
Database Tier was one giant monolithic Microsoft SQL
Server machine.
Stands for Not Only SQL
The NoSQL movement is not about silver bullets and
black boxes.
It’s about understanding problems and focusing on
solutions.
It’s about using the right tool for the right problem.
What is NoSQL?
Selecting Cassandra
DB            | Distributed | Maturity | High Availability | Style                     | Documentation | Native Language Drivers | Popularity
MongoDB       | Yes         | Medium   | Yes               | Document - NoSQL          | Excellent     | Major Languages         | High
VoltDB        | Yes         | Low      | Yes               | RDBMS - SQL               | Good          | Major Languages         | Low
MySQL Cluster | Yes         | High     | Yes               | RDBMS - SQL & Key/Value   | Excellent     | Major Languages         | Medium
MySQL ScaleDB | Yes         | Low      | Yes               | RDBMS - SQL               | Good          | Major Languages         | Low
Cassandra     | Yes         | Medium   | Yes               | Key/Value - Column Family | Excellent     | Major; Poor .Net        | High
CouchDB       | No          | Medium   | Yes               | Document - NoSQL          | ?             | No - REST only          | Medium
RavenDB       | Yes?        | Low      | No                | Document - NoSQL          | Poor          | C#, JS, REST            | Medium
Couchbase     | Yes         | Medium   | Yes               | Key/Value - Document      | Good          | Major Languages         | Medium
*Disclaimer: this data was compiled in spring of 2012 and may not reflect the
current state of each database system shown here.
http://nosql.mypopescu.com/ is a helpful site for discovering and learning about
different DB Systems.
Considered Multiple DB Providers
MySQL Cluster
Relational and very familiar.
Has physical row limitations.
MongoDB
Data modeling was simpler than C*.
Not very clear if it had multi-cluster support.
Cassandra
At the very core it’s all about scalability and resiliency.
Data modeling a little scary, limited .Net support.
Top Choices
Cassandra
Multi-node
Multi-cluster
Highly Available
Durable
Tunable Consistency
Shared Nothing
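"Tunable Consistency" can be illustrated with the usual replica-overlap rule: with N replicas per row, a write acknowledged by W replicas and a read that consults R replicas are guaranteed to overlap when R + W > N. The sketch below is only that arithmetic; the replication factor of 3 and the helper names are assumptions, since the deck does not say what NativeX configured.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def read_sees_latest_write(n: int, w: int, r: int) -> bool:
    """True when every set of r read replicas overlaps every set of w written replicas."""
    return r + w > n


N = 3  # assumed replication factor, not taken from the slides

for label, w, r in [
    ("write ONE / read ONE", 1, 1),
    ("write QUORUM / read QUORUM", quorum(N), quorum(N)),
    ("write ALL / read ONE", N, 1),
]:
    print(f"{label}: overlap guaranteed = {read_sees_latest_write(N, w, r)}")
```

Lower levels trade that guarantee for latency and availability, which is the "tunable" part.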
C* was not a replacement DB system, but an addition.
C* solves a very specific problem (for us).
Writing large volumes of data quickly.
Reading very specific data out of a large record set.
NoSQL solutions, like C*, are not meant to be a
replacement for everything.
You will make your life harder if you try!
The same should be said about Relational Databases.
They don’t solve every problem!
C* at NativeX
We have three major classifications of data.
Configuration
Activity Tracking
Device History
Data Classification
This data is relatively small in total size and is used
to operationally run our products. Examples
include:
Mobile Apps
Offers
Campaigns
Restrictions
Queue Settings
This data is typically relational and therefore
continues to be stored in MS SQL Server.
Configuration Data
Data is stored inside of Column Families using nested Key/Value pairs.
A Row Key maps to a collection of Columns.
A Column Name (AKA Column Key) maps to a Column Value.
The Column Name is stored alongside the Value.
A common strategy is to store JSON/XML in the Column Value.
(Side note, if you’ve heard of Super Columns, forget about them, they
hurt more than they help)
The Very Basics of C* Data Modeling
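The nested Key/Value shape described on this slide can be mimicked with plain dictionaries. This is a minimal in-memory sketch, not a Cassandra client: the `device_history` name, the row key, and the timestamp-style column names are all hypothetical. It stores JSON in each Column Value and reads back a range of columns by name, relying on columns being kept in name order within a row.

```python
import json
from collections import defaultdict

# Hypothetical stand-in for one Column Family:
# row key -> {column name -> column value (a JSON string)}.
device_history = defaultdict(dict)


def put(row_key, column_name, value):
    """Store a value under (row key, column name), serialized as JSON."""
    device_history[row_key][column_name] = json.dumps(value)


def get_slice(row_key, start, end):
    """Return (name, value) pairs whose column names fall in [start, end], in name order."""
    row = device_history.get(row_key, {})
    return [
        (name, json.loads(row[name]))
        for name in sorted(row)
        if start <= name <= end
    ]


put("device-123", "2013-04-01T10:00:00", {"event": "click", "offer": 42})
put("device-123", "2013-04-02T09:30:00", {"event": "run", "app": 7})
put("device-123", "2013-03-15T08:00:00", {"event": "reward", "offer": 42})

april = get_slice("device-123", "2013-04-01", "2013-04-30")
for name, value in april:
    print(name, value)
```

Choosing column names that sort in a useful order (timestamps here) is what makes "reading very specific data out of a large record set" a cheap range lookup.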
Raw tracking data for all activities used by the ETL process to
produce OLAP data on an hourly basis.
Synonymous with Time Series, Event Series, or Logging data.
Examples include:
Running of Mobile Apps
Viewing Offers
Clicking on Offers
Receiving Rewards
Activity Tracking Data
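For time-series data feeding an hourly ETL, one common layout is to bucket events into one row per activity per hour, so each ETL run reads a bounded set of rows. The deck does not say which layout NativeX used; the key format and the `hourly_row_key` helper below are purely illustrative.

```python
from datetime import datetime, timezone


def hourly_row_key(activity: str, when: datetime) -> str:
    """Build a hypothetical row key of the form '<activity>:<YYYYMMDDHH>'."""
    return f"{activity}:{when.strftime('%Y%m%d%H')}"


# Hypothetical events: (activity, timestamp, device id).
events = [
    ("click", datetime(2013, 4, 6, 13, 5, tzinfo=timezone.utc), "device-1"),
    ("click", datetime(2013, 4, 6, 13, 40, tzinfo=timezone.utc), "device-2"),
    ("click", datetime(2013, 4, 6, 14, 1, tzinfo=timezone.utc), "device-1"),
]

# Group events by hourly bucket; each bucket stands in for one wide row.
buckets = {}
for activity, when, device in events:
    key = hourly_row_key(activity, when)
    buckets.setdefault(key, []).append((when.isoformat(), device))

for key in sorted(buckets):
    print(key, len(buckets[key]))
```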
Historical activities that each device has performed while
being part of NativeX’s network.
Used for offer classification for a given device.
Examples include:
Clicking on Offers
Running Mobile Apps
Redeeming Rewards
Device History Data
12 Nodes
Cisco UCS Blades
12 Cores @ 2.0GHz with Hyper-threading
64 GB of RAM
2 x 480GB Intel commodity SSDs in RAID 0
10.5 TB total, ~7 TB usable
Red Hat Linux
Hardware
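The "10.5 TB total" figure can be reproduced from the drive counts on this slide, assuming the 480 GB drive size is in decimal gigabytes and the total is reported in binary units. That reading is an assumption; the slide does not explain it, nor how the ~7 TB usable figure was derived. RAID 0 stripes without redundancy, so it costs no capacity.

```python
NODES = 12
SSDS_PER_NODE = 2
SSD_GB = 480  # decimal gigabytes (assumed)

raw_bytes = NODES * SSDS_PER_NODE * SSD_GB * 10**9

raw_tb_decimal = raw_bytes / 10**12  # terabytes, powers of ten
raw_tib_binary = raw_bytes / 2**40   # tebibytes, powers of two

print(f"raw capacity: {raw_tb_decimal:.2f} TB (decimal), {raw_tib_binary:.2f} TiB (binary)")
```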
We chose to use Enterprise hardware for the servers
so that we would have support for them.
However, our workload is very read heavy and 15K
rpm rotational disks were a bottleneck.
We chose to swap out the rotational for commodity
SSDs. (Enterprise SSDs were 10x as expensive)
We have limited support on the hardware because of
this.
Commodity Vs. Enterprise
240 peak Writes per second per node
2,880/sec cluster wide
888 peak Reads per second per node
10,656/sec cluster wide
0.53 ms average Write Latency per request
1.7 ms average Read Latency per request
Almost 3 TB of data, adding 1 TB a month
Internal C* Cluster Stats
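The cluster-wide rates on this slide are the per-node peaks multiplied by the 12 nodes from the Hardware slide. A quick check of that arithmetic, which also gives the read share of peak traffic (variable names are illustrative):

```python
NODES = 12
PEAK_WRITES_PER_NODE = 240  # per second
PEAK_READS_PER_NODE = 888   # per second

cluster_writes = NODES * PEAK_WRITES_PER_NODE
cluster_reads = NODES * PEAK_READS_PER_NODE
read_share = cluster_reads / (cluster_reads + cluster_writes)

print(f"writes/sec cluster wide: {cluster_writes}")
print(f"reads/sec cluster wide:  {cluster_reads}")
print(f"reads as share of peak operations: {read_share:.0%}")
```

This assumes all nodes peak together, which is how the slide's totals appear to have been computed.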
MS SQL
Writes 12 ms
Reads 1.5 ms
C*
Writes 3 ms
Reads 4 ms
Application Side Latencies
We think that in SQL Server, reads were faster
because most of the data sat in memory.
We might be able to achieve lower latencies in C* if
we gave each node just as much memory as our SQL
Server.
To counteract the increased latencies we used
certain techniques like parallel reads using
multi-threading in our web application.
Can We Make Reads Faster?
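The parallel-read technique mentioned above can be sketched with a thread pool: when a request needs several independent rows, issuing the reads concurrently means the total wait is closer to the slowest read than to the sum of all reads. This is an illustration in Python rather than the .NET code NativeX would have used; `read_row` is a hypothetical stand-in that sleeps for about the 4 ms application-side read latency quoted earlier instead of calling a real driver.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def read_row(row_key):
    """Hypothetical stand-in for one blocking C* read."""
    time.sleep(0.004)  # simulate roughly 4 ms of network and storage latency
    return row_key, {"key": row_key}


def read_rows_parallel(row_keys, max_workers=8):
    """Issue all reads concurrently and collect the results into a dict."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(read_row, row_keys))


keys = [f"device-{i}" for i in range(16)]
rows = read_rows_parallel(keys)
print(f"fetched {len(rows)} rows")
```

Threads suit this case because each read spends its time waiting on I/O, not on the CPU.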
There are still challenges with C*, like any complex
system.
More moving parts and things that need to stay in
sync.
Misconfigurations can literally destroy your data.
Certain config settings cannot be changed after you
are live, such as the number of virtual Racks.
Not all Roses
Get into production early
Data Import = Reality
Break down communication barriers
Understanding your IO profile is really important
Cassandra changes quickly, you need to keep up
Scalable systems like C* have a massive number of
knobs, you need to know them
Leverage cloud resources in working toward
right-sizing your cluster
Lessons Learned
We’re hiring
http://nativex.com/careers/
Join the MSP C* Meetup
http://www.meetup.com/Minneapolis-St-Paul-Cassandra-Meetup/
Email us
Jeff.Smoley@nativex.com
Jeff.Bollinger@nativex.com or @jbollinger
Slide Deck
http://www.slideshare.net/JBollinger/minnebar-2013-scaling-with-cassandra
Thanks

Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 

MinneBar 2013 - Scaling with Cassandra

  • 1. Scaling With Cassandra. Jeff Bollinger – CTO – @jbollinger; Jeff Smoley – Infrastructure Architect
  • 2. Agenda: About NativeX; The Backstory; Why Cassandra; Cassandra Overview; NativeX Cassandra Implementation / Metrics; What We Learned
  • 3. NativeX: Formerly W3i. Marketing technology platform that enables developers to build successful businesses around their apps.
  • 4. Vanity Metrics: Over 620M unique devices on our network; over 500 apps in network; > 100M monthly active users; 100 GB of data ingest per week.
  • 5. Backstory: A growing mobile advertising network. [Chart: API Requests, in billions, by quarter from 2011 Q4 to 2013 Q1; y-axis 0 to 6]
  • 6. Infrastructure Intensive Model: Lifetime of user. [Chart: Session Calls by Week After User Acquired, in millions; x-axis weeks 0 to 12, y-axis 0 to 12]
  • 7. Scale Up Architecture: Microsoft SQL Server, 2-node cluster (failover), 12 cores / node, 192 GB of RAM / node. Compellent SAN, 172 disks (SSD, FC, SATA).
  • 9. Objectives: Scale (horizontal; incremental cost structure). Resiliency (no single point of failure; geographically distributed).
  • 10. What Needed to Scale: Web Application Tier and Database Tier. The Web Application Tier is already a server farm that can scale horizontally through our VMware environment. The Database Tier was one giant monolithic Microsoft SQL Server machine.
  • 11. What is NoSQL? Stands for Not Only SQL. The NoSQL movement is not about silver bullets and black boxes. It’s about understanding problems and focusing on solutions. It’s about using the right tool for the right problem.
  • 12. Selecting Cassandra

    | DB | Distributed | Maturity | High Availability | Style | Documentation | Native Language Drivers | Popularity |
    |---|---|---|---|---|---|---|---|
    | MongoDB | Yes | Medium | Yes | Document - NoSQL | Excellent | Major Languages | High |
    | VoltDB | Yes | Low | Yes | RDBMS - SQL | Good | Major Languages | Low |
    | MySQL Cluster | Yes | High | Yes | RDBMS - SQL & Key/Value | Excellent | Major Languages | Medium |
    | MySQL ScaleDB | Yes | Low | Yes | RDBMS - SQL | Good | Major Languages | Low |
    | Cassandra | Yes | Medium | Yes | Key/Value - Column Family | Excellent | Major; Poor .Net | High |
    | CouchDB | No | Medium | Yes | Document - NoSQL | ? | No - REST only | Medium |
    | RavenDB | Yes? | Low | No | Document - NoSQL | Poor | C#, JS, REST | Medium |
    | Couchbase | Yes | Medium | Yes | Key/Value - Document | Good | Major Languages | Medium |

    *Disclaimer: this data was compiled in spring of 2012 and may not reflect the current state of each database system shown here. http://nosql.mypopescu.com/ is a helpful site for discovering and learning about different DB systems.
  • 13. Top Choices: Considered multiple DB providers. MySQL Cluster: relational and very familiar; has physical row limitations. MongoDB: data modeling was simpler than C*; not very clear if it had multi-cluster support. Cassandra: at the very core it’s all about scalability and resiliency; data modeling a little scary, limited .Net support.
  • 15. C* at NativeX: C* was not a replacement DB system, but an addition. C* solves a very specific problem (for us): writing large volumes of data quickly and reading very specific data out of a large record set. NoSQL solutions, like C*, are not meant to be a replacement for everything. You will make your life harder if you try! The same should be said about relational databases. They don’t solve every problem!
  • 16. Data Classification: We have three major classifications of data: Configuration, Activity Tracking, and Device History.
  • 17. Configuration Data: This data is relatively small in total size and is used to operationally run our products. Examples include mobile apps, offers, campaigns, restrictions, and queue settings. This data is typically relational and therefore continues to be stored in MS SQL Server.
  • 18. The Very Basics of C* Data Modeling: Data is stored inside of Column Families using nested Key/Value pairs. A Row Key maps to a collection of Columns. A Column Name (AKA Column Key) maps to a Column Value. The Column Name is stored alongside the Value. A common strategy is to store JSON/XML in the Column Value. (Side note: if you’ve heard of Super Columns, forget about them; they hurt more than they help.)
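The nested key/value shape described on slide 18 can be mimicked with plain Python dictionaries. This is only an in-memory sketch, not driver code: the names `device_history`, `insert_column`, `get_column`, and `slice_columns`, the row key, and the event payloads are all made up for illustration. It shows a row key mapping to a collection of columns, each column name mapping to a JSON-encoded value, and a range read over column names.

```python
import json

# Hypothetical column family: row key -> {column name -> column value}.
device_history = {}

def insert_column(cf, row_key, column_name, value):
    """Store a JSON-encoded value under (row_key, column_name)."""
    cf.setdefault(row_key, {})[column_name] = json.dumps(value)

def get_column(cf, row_key, column_name):
    """Read one column value back and decode the JSON payload."""
    return json.loads(cf[row_key][column_name])

def slice_columns(cf, row_key, start, end):
    """Return (name, value) pairs with start <= name <= end, ordered by name.

    Cassandra keeps the columns of a row sorted by column name; sorted()
    emulates that ordering here.
    """
    row = cf.get(row_key, {})
    return [(name, json.loads(row[name]))
            for name in sorted(row) if start <= name <= end]

# Illustrative data only: timestamps as column names, events as JSON values.
insert_column(device_history, "device-123", "2013-04-06T10:15:00Z",
              {"event": "offer_click", "offer_id": 42})
insert_column(device_history, "device-123", "2013-04-06T09:00:00Z",
              {"event": "app_run", "app_id": 7})

morning = slice_columns(device_history, "device-123",
                        "2013-04-06T00:00:00Z", "2013-04-06T12:00:00Z")
```

Using a sortable timestamp as the column name is one common way to get time-ordered reads for a single row key; whether NativeX modeled its data this way is not stated in the slides.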
  • 19. Activity Tracking Data: Raw tracking data for all activities, used by the ETL process to produce OLAP data on an hourly basis. Synonymous with Time Series, Event Series, or Logging data. Examples include running of mobile apps, viewing offers, clicking on offers, and receiving rewards.
  • 20. Device History Data: Historical activities that each device has performed while being part of NativeX’s network. Used for offer classification for a given device. Examples include clicking on offers, running mobile apps, and redeeming rewards.
  • 21. Hardware: 12 nodes (Cisco UCS blades); 12 cores @ 2.0 GHz with Hyper-Threading; 64 GB of RAM; 2 x 480 GB Intel commodity SSDs in RAID 0; 10.5 TB total, ~7 TB usable; Red Hat Linux.
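As a rough check on the storage figures from slide 21, the raw capacity can be worked out from the node and drive counts. The sketch below assumes the drives are labeled in decimal gigabytes and that the slide's "10.5 TB total" is expressed in binary units (TiB); that interpretation is a guess that happens to reproduce the number, not something the slides state.

```python
NODES = 12           # from the Hardware slide
SSDS_PER_NODE = 2    # 2 x 480 GB in RAID 0 (striping, so no capacity is lost to redundancy)
SSD_GB = 480         # assumed to be decimal gigabytes, as drives are usually labeled

raw_bytes = NODES * SSDS_PER_NODE * SSD_GB * 10**9

raw_tb = raw_bytes / 10**12   # decimal terabytes
raw_tib = raw_bytes / 2**40   # binary tebibytes (assumed unit of the slide's "10.5 TB")

# Share of raw capacity that the slide's "~7 TB usable" would represent.
usable_fraction = 7 / raw_tib

print(f"raw: {raw_tb:.2f} TB decimal, {raw_tib:.2f} TiB binary")
print(f"usable share of raw capacity: {usable_fraction:.0%}")
```

The slides do not say how the usable figure was derived, so the fraction is only descriptive.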
  • 22. Commodity vs. Enterprise: We chose to use enterprise hardware for the servers so that we would have support for them. However, our workload is very read heavy and 15K rpm rotational disks were a bottleneck. We chose to swap out the rotational disks for commodity SSDs. (Enterprise SSDs were 10x as expensive.) We have limited support on the hardware because of this.
  • 23. Internal C* Cluster Stats: 240 peak writes per second per node (2,880/sec cluster wide); 888 peak reads per second per node (10,656/sec cluster wide); 0.53 ms average write latency per request; 1.7 ms average read latency per request; almost 3 TB of data, adding 1 TB a month.
  • 24. Application Side Latencies: MS SQL: writes 12 ms, reads 1.5 ms. C*: writes 3 ms, reads 4 ms.
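The cluster-wide numbers on slide 23 follow from multiplying the per-node peaks by the 12 nodes listed on the hardware slide. The short sketch below reproduces that arithmetic and adds a very rough headroom estimate from the "~7 TB usable" figure; the headroom calculation is an assumption-laden upper bound that ignores compaction overhead and any change in growth rate.

```python
NODES = 12  # node count from the Hardware slide

peak_writes_per_node = 240   # writes/sec
peak_reads_per_node = 888    # reads/sec

cluster_writes = peak_writes_per_node * NODES
cluster_reads = peak_reads_per_node * NODES

# Rough headroom, assuming the slide figures are directly comparable:
# ~7 TB usable, ~3 TB stored, growing ~1 TB per month.
usable_tb = 7
stored_tb = 3
growth_tb_per_month = 1
months_of_headroom = (usable_tb - stored_tb) / growth_tb_per_month

print(cluster_writes, cluster_reads, months_of_headroom)
```

In practice a cluster needs free disk space for compaction, so the real window before adding nodes would be shorter than this estimate.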
  • 25. Can We Make Reads Faster? We think that in SQL Server, reads were faster because most of the data sat in memory. We might be able to achieve lower latencies in C* if we gave each node just as much memory as our SQL Server. To counteract the increased latencies we used certain techniques like parallel reads using multithreading in our web application.
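The parallel-read technique mentioned on slide 25 can be illustrated with a thread pool. This is a minimal sketch in Python rather than the .Net code NativeX would have used, and `read_row` is a hypothetical stand-in that sleeps for about 4 ms to imitate the application-side C* read latency quoted earlier; no real database driver is involved.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def read_row(row_key):
    """Hypothetical stand-in for one C* read; sleeps ~4 ms to simulate latency."""
    time.sleep(0.004)
    return {"row_key": row_key}

def read_sequential(keys):
    """Issue the reads one after another; total time grows with len(keys)."""
    return [read_row(key) for key in keys]

def read_parallel(keys, max_workers=8):
    """Issue the reads concurrently; pool.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(read_row, keys))

keys = [f"device-{i}" for i in range(16)]

start = time.perf_counter()
sequential_rows = read_sequential(keys)
sequential_s = time.perf_counter() - start

start = time.perf_counter()
parallel_rows = read_parallel(keys)
parallel_s = time.perf_counter() - start

print(f"sequential: {sequential_s * 1000:.1f} ms, parallel: {parallel_s * 1000:.1f} ms")
```

Threads suit this case because each read spends its time waiting on I/O rather than computing, so the per-request latency overlaps instead of adding up. The worker count of 8 is arbitrary and would need tuning against the real cluster.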
  • 26. Not All Roses: There are still challenges with C*, like any complex system. More moving parts and things that need to stay in sync. Misconfigurations can literally destroy your data. Certain config settings cannot be changed after you are live, such as the number of virtual racks.
  • 27. Lessons Learned: Get into production early. Data import = reality. Break down communication barriers. Understanding your IO profile is really important. Cassandra changes quickly; you need to keep up. Scalable systems like C* have a massive number of knobs; you need to know them. Leverage cloud resources in working toward right-sizing your cluster.
  • 28. Thanks: We’re hiring (http://nativex.com/careers/). Join the MSP C* Meetup (http://www.meetup.com/Minneapolis-St-Paul-Cassandra-Meetup/). Email us: Jeff.Smoley@nativex.com, Jeff.Bollinger@nativex.com, or @jbollinger. Slide deck: http://www.slideshare.net/JBollinger/minnebar-2013-scaling-with-cassandra