SlideShare a Scribd company logo
1 of 45
introduction to cassandra
eben hewitt
september 29. 2010
web 2.0 expo
new york city
• director, application architecture
at a global corp
• focus on SOA, SaaS, Events
• i wrote this
@ebenhewitt
agenda
• context
• features
• data model
• api
“nosql”  “big data”
• mongodb
• couchdb
• tokyo cabinet
• redis
• riak
• what about?
– Poet, Lotus, Xindice
– they’ve been around forever…
– rdbms was once the new kid…
innovation at scale
• google bigtable (2006)
– consistency model: strong
– data model: sparse map
– clones: hbase, hypertable
• amazon dynamo (2007)
– O(1) dht
– consistency model: client tune-able
– clones: riak, voldemort
cassandra ~= bigtable + dynamo
proven
• The Facebook stores 150TB of data on 150 nodes
web 2.0
• used at Twitter, Rackspace, Mahalo, Reddit,
Cloudkick, Cisco, Digg, SimpleGeo, Ooyala, OpenX,
others
cap theorem
•consistency
– all clients have same view of data
•availability
– writeable in the face of node failure
•partition tolerance
– processing can continue in the face of network failure
(crashed router, broken network)
daniel abadi: pacelc
write consistency
Level Description
ZERO Good luck with that
ANY 1 replica (hints count)
ONE 1 replica. read repair in bkgnd
QUORUM (DCQ for RackAware) (N /2) + 1
ALL N = replication factor
Level Description
ZERO Ummm…
ANY Try ONE instead
ONE 1 replica
QUORUM (DCQ for RackAware) Return most recent TS after (N /2) + 1
report
ALL N = replication factor
read consistency
agenda
• context
• features
• data model
• api
cassandra properties
• tuneably consistent
• very fast writes
• highly available
• fault tolerant
• linear, elastic scalability
• decentralized/symmetric
• ~12 client languages
– Thrift RPC API
• ~automatic provisioning of new nodes
• 0(1) dht
• big data
write op
Staged Event-Driven Architecture
• A general-purpose framework for high
concurrency & load conditioning
• Decomposes applications into stages
separated by queues
• Adopt a structured approach to event-driven
concurrency
instrumentation
data replication
• configurable replication factor
• replica placement strategy
rack unaware  Simple Strategy
rack aware  Old Network Topology Strategy
data center shard  Network Topology Strategy
partitioner smack-down
Random Preserving
• system will use MD5(key) to
distribute data across nodes
• even distribution of keys
from one CF across
ranges/nodes
Order Preserving
• key distribution determined
by token
• lexicographical ordering
• required for range queries
– scan over rows like cursor in
index
• can specify the token for
this node to use
• ‘scrabble’ distribution
agenda
• context
• features
• data model
• api
structure
keyspace
settings
(eg,
partitioner)
column family
settings (eg,
comparator,
type [Std])
column
name value clock
keyspace
• ~= database
• typically one per application
• some settings are configurable only per
keyspace
column family
• group records of similar kind
• not same kind, because CFs are sparse tables
• ex:
– User
– Address
– Tweet
– PointOfInterest
– HotelRoom
think of cassandra as
row-oriented
• each row is uniquely identifiable by key
• rows group columns and super columns
column family
n=
42
user=eben
key
123
key
456
user=alison
icon=
nickname=
The
Situation
json-like notation
User {
123 : { email: alison@foo.com,
icon: },
456 : { email: eben@bar.com,
location: The Danger Zone}
}
0.6 example
$cassandra –f
$bin/cassandra-cli
cassandra> connect localhost/9160
cassandra> set
Keyspace1.Standard1[‘eben’][‘age’]=‘29’
cassandra> set
Keyspace1.Standard1[‘eben’][‘email’]=‘e@e.com’
cassandra> get Keyspace1.Standard1[‘eben'][‘age']
=> (column=6e616d65, value=39,
timestamp=1282170655390000)
a column has 3 parts
1. name
– byte[]
– determines sort order
– used in queries
– indexed
2. value
– byte[]
– you don’t query on column values
3. timestamp
– long (clock)
– last write wins conflict resolution
column comparators
• byte
• utf8
• long
• timeuuid
• lexicaluuid
• <pluggable>
– ex: lat/long
super column
super columns group columns under a common name
<<SCF>>PointOfInterest
super column family
<<SC>>Central
Park
10017
<<SC>>
Empire State Bldg
<<SC>>
Phoenix
Zoo
85255
desc=Fun to
walk in.
phone=212.
555.11212
desc=Great
view from
102nd floor!
PointOfInterest {
key: 85255 {
Phoenix Zoo { phone: 480-555-5555, desc: They have animals here. },
Spring Training { phone: 623-333-3333, desc: Fun for baseball fans. },
}, //end phx
key: 10019 {
Central Park { desc: Walk around. It's pretty.} ,
Empire State Building { phone: 212-777-7777,
desc: Great view from 102nd floor. }
} //end nyc
}
s
super column
super column family
flexible schema
key
column
super column family
about super column families
• sub-column names in a SCF are not indexed
– top level columns (SCF Name) are always indexed
• often used for denormalizing data from
standard CFs
agenda
• context
• features
• data model
• api
slice predicate
• data structure describing columns to return
– SliceRange
• start column name
• finish column name (can be empty to stop on count)
• reverse
• count (like LIMIT)
read api
• get() : Column
– get the Col or SC at given ColPath
COSC cosc = client.get(key, path, CL);
• get_slice() : List<ColumnOrSuperColumn>
– get Cols in one row, specified by SlicePredicate:
List<ColumnOrSuperColumn> results =
client.get_slice(key, parent, predicate, CL);
• multiget_slice() : Map<key, List<CoSC>>
– get slices for list of keys, based on SlicePredicate
Map<byte[],List<ColumnOrSuperColumn>> results =
client.multiget_slice(rowKeys, parent, predicate, CL);
• get_range_slices() : List<KeySlice>
– returns multiple Cols according to a range
– range is startkey, endkey, starttoken, endtoken:
List<KeySlice> slices = client.get_range_slices(
parent, predicate, keyRange, CL);
write api
client.insert(userKeyBytes, parent,
new Column(“band".getBytes(UTF8),
“Funkadelic".getBytes(), clock), CL);
batch_mutate
– void batch_mutate(
map<byte[], map<String, List<Mutation>>> , CL)
remove
– void remove(byte[],
ColumnPath column_path, Clock, CL)
batch_mutate
//create param
Map<byte[], Map<String, List<Mutation>>> mutationMap =
new HashMap<byte[], Map<String, List<Mutation>>>();
//create Cols for Muts
Column nameCol = new Column("name".getBytes(UTF8),
“Funkadelic”.getBytes("UTF-8"), new Clock(System.nanoTime()););
Mutation nameMut = new Mutation();
nameMut.column_or_supercolumn = nameCosc; //also phone, etc
Map<String, List<Mutation>> muts = new HashMap<String, List<Mutation>>();
List<Mutation> cols = new ArrayList<Mutation>();
cols.add(nameMut);
cols.add(phoneMut);
muts.put(CF, cols);
//outer map key is a row key; inner map key is the CF name
mutationMap.put(rowKey.getBytes(), muts);
//send to server
client.batch_mutate(mutationMap, CL);
raw thrift: for masochists only
• pycassa (python)
• fauna (ruby)
• hector (java)
• pelops (java)
• kundera (JPA)
• hectorSharp (C#)
what about…
SELECT WHERE
ORDER BY
JOIN ON
GROUP
rdbms: domain-based model
what answers do I have?
cassandra: query-based model
what questions do I have?
SELECT WHERE
cassandra is an index factory
<<cf>>USER
Key: UserID
Cols: username, email, birth date, city, state
How to support this query?
SELECT * FROM User WHERE city = ‘Scottsdale’
Create a new CF called UserCity:
<<cf>>USERCITY
Key: city
Cols: IDs of the users in that city.
Also uses the Valueless Column pattern
• Use an aggregate key
state:city: { user1, user2}
• Get rows between AZ: & AZ;
for all Arizona users
• Get rows between AZ:Scottsdale &
AZ:Scottsdale1
for all Scottsdale users
SELECT WHERE pt 2
ORDER BY
Rows
are placed according to their Partitioner:
•Random: MD5 of key
•Order-Preserving: actual key
are sorted by key, regardless of partitioner
Columns
are sorted according to
CompareWith or
CompareSubcolumnsWith
rdbms
cassandra
is cassandra a good fit?
• you need really fast writes
• you need durability
• you have lots of data
> GBs
>= three servers
• your app is evolving
– startup mode, fluid data
structure
• loose domain data
– “points of interest”
• your programmers can deal
– documentation
– complexity
– consistency model
– change
– visibility tools
• your operations can deal
– hardware considerations
– can move data
– JMX monitoring
thank you!
@ebenhewitt

More Related Content

Similar to Scaling Web Applications with Cassandra Presentation.ppt

So You Want to Write a Connector?
So You Want to Write a Connector? So You Want to Write a Connector?
So You Want to Write a Connector? confluent
 
Solr 6 Feature Preview
Solr 6 Feature PreviewSolr 6 Feature Preview
Solr 6 Feature PreviewYonik Seeley
 
Hive @ Bucharest Java User Group
Hive @ Bucharest Java User GroupHive @ Bucharest Java User Group
Hive @ Bucharest Java User GroupRemus Rusanu
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into CassandraBrent Theisen
 
Take a Trip Into the Forest: A Java Primer on Maps, Trees, and Collections
Take a Trip Into the Forest: A Java Primer on Maps, Trees, and Collections Take a Trip Into the Forest: A Java Primer on Maps, Trees, and Collections
Take a Trip Into the Forest: A Java Primer on Maps, Trees, and Collections Teamstudio
 
Hadoop Tutorial with @techmilind
Hadoop Tutorial with @techmilindHadoop Tutorial with @techmilind
Hadoop Tutorial with @techmilindEMC
 
SQL Server 2014 In-Memory OLTP
SQL Server 2014 In-Memory OLTPSQL Server 2014 In-Memory OLTP
SQL Server 2014 In-Memory OLTPTony Rogerson
 
Cascading introduction
Cascading introductionCascading introduction
Cascading introductionAlex Su
 
Jan 2015 - Cassandra101 Manchester Meetup
Jan 2015 - Cassandra101 Manchester MeetupJan 2015 - Cassandra101 Manchester Meetup
Jan 2015 - Cassandra101 Manchester MeetupChristopher Batey
 
Solr as a Spark SQL Datasource
Solr as a Spark SQL DatasourceSolr as a Spark SQL Datasource
Solr as a Spark SQL DatasourceChitturi Kiran
 
Cassandra Java APIs Old and New – A Comparison
Cassandra Java APIs Old and New – A ComparisonCassandra Java APIs Old and New – A Comparison
Cassandra Java APIs Old and New – A Comparisonshsedghi
 
MWLUG Session- AD112 - Take a Trip Into the Forest - A Java Primer on Maps, ...
MWLUG Session-  AD112 - Take a Trip Into the Forest - A Java Primer on Maps, ...MWLUG Session-  AD112 - Take a Trip Into the Forest - A Java Primer on Maps, ...
MWLUG Session- AD112 - Take a Trip Into the Forest - A Java Primer on Maps, ...Howard Greenberg
 
New T-SQL Features in SQL Server 2012
New T-SQL Features in SQL Server 2012 New T-SQL Features in SQL Server 2012
New T-SQL Features in SQL Server 2012 Richie Rump
 
No SQL, No problem - using MongoDB in Ruby
No SQL, No problem - using MongoDB in RubyNo SQL, No problem - using MongoDB in Ruby
No SQL, No problem - using MongoDB in Rubysbeam
 
Get started with Lua - Hackference 2016
Get started with Lua - Hackference 2016Get started with Lua - Hackference 2016
Get started with Lua - Hackference 2016Etiene Dalcol
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakSpark Summit
 

Similar to Scaling Web Applications with Cassandra Presentation.ppt (20)

So You Want to Write a Connector?
So You Want to Write a Connector? So You Want to Write a Connector?
So You Want to Write a Connector?
 
Solr 6 Feature Preview
Solr 6 Feature PreviewSolr 6 Feature Preview
Solr 6 Feature Preview
 
Presentation
PresentationPresentation
Presentation
 
Hive @ Bucharest Java User Group
Hive @ Bucharest Java User GroupHive @ Bucharest Java User Group
Hive @ Bucharest Java User Group
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into Cassandra
 
Take a Trip Into the Forest: A Java Primer on Maps, Trees, and Collections
Take a Trip Into the Forest: A Java Primer on Maps, Trees, and Collections Take a Trip Into the Forest: A Java Primer on Maps, Trees, and Collections
Take a Trip Into the Forest: A Java Primer on Maps, Trees, and Collections
 
Hadoop Tutorial with @techmilind
Hadoop Tutorial with @techmilindHadoop Tutorial with @techmilind
Hadoop Tutorial with @techmilind
 
L6.sp17.pptx
L6.sp17.pptxL6.sp17.pptx
L6.sp17.pptx
 
SQL Server 2014 In-Memory OLTP
SQL Server 2014 In-Memory OLTPSQL Server 2014 In-Memory OLTP
SQL Server 2014 In-Memory OLTP
 
Cascading introduction
Cascading introductionCascading introduction
Cascading introduction
 
Jan 2015 - Cassandra101 Manchester Meetup
Jan 2015 - Cassandra101 Manchester MeetupJan 2015 - Cassandra101 Manchester Meetup
Jan 2015 - Cassandra101 Manchester Meetup
 
Solr as a Spark SQL Datasource
Solr as a Spark SQL DatasourceSolr as a Spark SQL Datasource
Solr as a Spark SQL Datasource
 
Cassandra Java APIs Old and New – A Comparison
Cassandra Java APIs Old and New – A ComparisonCassandra Java APIs Old and New – A Comparison
Cassandra Java APIs Old and New – A Comparison
 
MWLUG Session- AD112 - Take a Trip Into the Forest - A Java Primer on Maps, ...
MWLUG Session-  AD112 - Take a Trip Into the Forest - A Java Primer on Maps, ...MWLUG Session-  AD112 - Take a Trip Into the Forest - A Java Primer on Maps, ...
MWLUG Session- AD112 - Take a Trip Into the Forest - A Java Primer on Maps, ...
 
New T-SQL Features in SQL Server 2012
New T-SQL Features in SQL Server 2012 New T-SQL Features in SQL Server 2012
New T-SQL Features in SQL Server 2012
 
No SQL, No problem - using MongoDB in Ruby
No SQL, No problem - using MongoDB in RubyNo SQL, No problem - using MongoDB in Ruby
No SQL, No problem - using MongoDB in Ruby
 
Get started with Lua - Hackference 2016
Get started with Lua - Hackference 2016Get started with Lua - Hackference 2016
Get started with Lua - Hackference 2016
 
01 hbase
01 hbase01 hbase
01 hbase
 
NoSql Database
NoSql DatabaseNoSql Database
NoSql Database
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 

More from ssuserbad56d (7)

search
searchsearch
search
 
search
searchsearch
search
 
Cassandra
CassandraCassandra
Cassandra
 
Redis
RedisRedis
Redis
 
Covered Call
Covered CallCovered Call
Covered Call
 
Lec04.pdf
Lec04.pdfLec04.pdf
Lec04.pdf
 
Project.pdf
Project.pdfProject.pdf
Project.pdf
 

Recently uploaded

KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 

Recently uploaded (20)

KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 

Scaling Web Applications with Cassandra Presentation.ppt

  • 1. introduction to cassandra eben hewitt september 29. 2010 web 2.0 expo new york city
  • 2. • director, application architecture at a global corp • focus on SOA, SaaS, Events • i wrote this @ebenhewitt
  • 4. “nosql”  “big data” • mongodb • couchdb • tokyo cabinet • redis • riak • what about? – Poet, Lotus, Xindice – they’ve been around forever… – rdbms was once the new kid…
  • 5. innovation at scale • google bigtable (2006) – consistency model: strong – data model: sparse map – clones: hbase, hypertable • amazon dynamo (2007) – O(1) dht – consistency model: client tune-able – clones: riak, voldemort cassandra ~= bigtable + dynamo
  • 6. proven • The Facebook stores 150TB of data on 150 nodes web 2.0 • used at Twitter, Rackspace, Mahalo, Reddit, Cloudkick, Cisco, Digg, SimpleGeo, Ooyala, OpenX, others
  • 7. cap theorem •consistency – all clients have same view of data •availability – writeable in the face of node failure •partition tolerance – processing can continue in the face of network failure (crashed router, broken network)
  • 9. write consistency Level Description ZERO Good luck with that ANY 1 replica (hints count) ONE 1 replica. read repair in bkgnd QUORUM (DCQ for RackAware) (N /2) + 1 ALL N = replication factor Level Description ZERO Ummm… ANY Try ONE instead ONE 1 replica QUORUM (DCQ for RackAware) Return most recent TS after (N /2) + 1 report ALL N = replication factor read consistency
  • 11. cassandra properties • tuneably consistent • very fast writes • highly available • fault tolerant • linear, elastic scalability • decentralized/symmetric • ~12 client languages – Thrift RPC API • ~automatic provisioning of new nodes • 0(1) dht • big data
  • 13. Staged Event-Driven Architecture • A general-purpose framework for high concurrency & load conditioning • Decomposes applications into stages separated by queues • Adopt a structured approach to event-driven concurrency
  • 15. data replication • configurable replication factor • replica placement strategy rack unaware  Simple Strategy rack aware  Old Network Topology Strategy data center shard  Network Topology Strategy
  • 16. partitioner smack-down Random Preserving • system will use MD5(key) to distribute data across nodes • even distribution of keys from one CF across ranges/nodes Order Preserving • key distribution determined by token • lexicographical ordering • required for range queries – scan over rows like cursor in index • can specify the token for this node to use • ‘scrabble’ distribution
  • 19. keyspace • ~= database • typically one per application • some settings are configurable only per keyspace
  • 20. column family • group records of similar kind • not same kind, because CFs are sparse tables • ex: – User – Address – Tweet – PointOfInterest – HotelRoom
  • 21. think of cassandra as row-oriented • each row is uniquely identifiable by key • rows group columns and super columns
  • 23. json-like notation User { 123 : { email: alison@foo.com, icon: }, 456 : { email: eben@bar.com, location: The Danger Zone} }
  • 24. 0.6 example $cassandra –f $bin/cassandra-cli cassandra> connect localhost/9160 cassandra> set Keyspace1.Standard1[‘eben’][‘age’]=‘29’ cassandra> set Keyspace1.Standard1[‘eben’][‘email’]=‘e@e.com’ cassandra> get Keyspace1.Standard1[‘eben'][‘age'] => (column=6e616d65, value=39, timestamp=1282170655390000)
  • 25. a column has 3 parts 1. name – byte[] – determines sort order – used in queries – indexed 2. value – byte[] – you don’t query on column values 3. timestamp – long (clock) – last write wins conflict resolution
  • 26. column comparators • byte • utf8 • long • timeuuid • lexicaluuid • <pluggable> – ex: lat/long
  • 27. super column super columns group columns under a common name
  • 28. <<SCF>>PointOfInterest super column family <<SC>>Central Park 10017 <<SC>> Empire State Bldg <<SC>> Phoenix Zoo 85255 desc=Fun to walk in. phone=212. 555.11212 desc=Great view from 102nd floor!
  • 29. PointOfInterest { key: 85255 { Phoenix Zoo { phone: 480-555-5555, desc: They have animals here. }, Spring Training { phone: 623-333-3333, desc: Fun for baseball fans. }, }, //end phx key: 10019 { Central Park { desc: Walk around. It's pretty.} , Empire State Building { phone: 212-777-7777, desc: Great view from 102nd floor. } } //end nyc } s super column super column family flexible schema key column super column family
  • 30. about super column families • sub-column names in a SCF are not indexed – top level columns (SCF Name) are always indexed • often used for denormalizing data from standard CFs
  • 32. slice predicate • data structure describing columns to return – SliceRange • start column name • finish column name (can be empty to stop on count) • reverse • count (like LIMIT)
  • 33. read api • get() : Column – get the Col or SC at given ColPath COSC cosc = client.get(key, path, CL); • get_slice() : List<ColumnOrSuperColumn> – get Cols in one row, specified by SlicePredicate: List<ColumnOrSuperColumn> results = client.get_slice(key, parent, predicate, CL); • multiget_slice() : Map<key, List<CoSC>> – get slices for list of keys, based on SlicePredicate Map<byte[],List<ColumnOrSuperColumn>> results = client.multiget_slice(rowKeys, parent, predicate, CL); • get_range_slices() : List<KeySlice> – returns multiple Cols according to a range – range is startkey, endkey, starttoken, endtoken: List<KeySlice> slices = client.get_range_slices( parent, predicate, keyRange, CL);
  • 34. write api client.insert(userKeyBytes, parent, new Column(“band".getBytes(UTF8), “Funkadelic".getBytes(), clock), CL); batch_mutate – void batch_mutate( map<byte[], map<String, List<Mutation>>> , CL) remove – void remove(byte[], ColumnPath column_path, Clock, CL)
  • 35. batch_mutate //create param Map<byte[], Map<String, List<Mutation>>> mutationMap = new HashMap<byte[], Map<String, List<Mutation>>>(); //create Cols for Muts Column nameCol = new Column("name".getBytes(UTF8), “Funkadelic”.getBytes("UTF-8"), new Clock(System.nanoTime());); Mutation nameMut = new Mutation(); nameMut.column_or_supercolumn = nameCosc; //also phone, etc Map<String, List<Mutation>> muts = new HashMap<String, List<Mutation>>(); List<Mutation> cols = new ArrayList<Mutation>(); cols.add(nameMut); cols.add(phoneMut); muts.put(CF, cols); //outer map key is a row key; inner map key is the CF name mutationMap.put(rowKey.getBytes(), muts); //send to server client.batch_mutate(mutationMap, CL);
  • 36. raw thrift: for masochists only • pycassa (python) • fauna (ruby) • hector (java) • pelops (java) • kundera (JPA) • hectorSharp (C#)
  • 38. rdbms: domain-based model what answers do I have? cassandra: query-based model what questions do I have?
  • 39. SELECT WHERE cassandra is an index factory <<cf>>USER Key: UserID Cols: username, email, birth date, city, state How to support this query? SELECT * FROM User WHERE city = ‘Scottsdale’ Create a new CF called UserCity: <<cf>>USERCITY Key: city Cols: IDs of the users in that city. Also uses the Valueless Column pattern
  • 40. • Use an aggregate key state:city: { user1, user2} • Get rows between AZ: & AZ; for all Arizona users • Get rows between AZ:Scottsdale & AZ:Scottsdale1 for all Scottsdale users SELECT WHERE pt 2
  • 41. ORDER BY Rows are placed according to their Partitioner: •Random: MD5 of key •Order-Preserving: actual key are sorted by key, regardless of partitioner Columns are sorted according to CompareWith or CompareSubcolumnsWith
  • 42. rdbms
  • 44. is cassandra a good fit? • you need really fast writes • you need durability • you have lots of data > GBs >= three servers • your app is evolving – startup mode, fluid data structure • loose domain data – “points of interest” • your programmers can deal – documentation – complexity – consistency model – change – visibility tools • your operations can deal – hardware considerations – can move data – JMX monitoring