Professional Cassandra support and services




Tuesday, August 10, 2010
Cassandra: Present & Future
                           Jonathan Ellis
                             @spyced




Cassandra 0.6 & 0.7
                                 Jonathan Ellis
                                   @spyced




Quiet change of policy

                    • 0.5.1 was bug fixes only
                    • Too early to be strict about bugfix-only
                           policy in stable branch, especially w/ 0.7
                           being longer/more break-y
                    • Maybe after 1.0?

[Chart: mails sent per month, y-axis 0 to 1500; x-axis Jan (0.5), Feb (0.5.1), Mar, Apr (0.6, 0.6.1), May (0.6.2), Jun (0.6.3), Jul (0.6.4)]




Lots of bug fixes


                    • 85 issues marked Resolved/Fixed in 0.6
                           branch after 0.6 released




Runtime configuration

                    • concurrent reads, writes (0.6.2)
                           •   making it easier to bandage your foot after you
                               shoot it

                    • PhiConvictThreshold (0.6.2)


Performance

                    • JVM GC defaults (0.6.2)
                    • Faster commitlog (0.6.2)
                    • Faster range slice, Hadoop jobs (0.6.1, 0.6.2)
                    • Better parallelization of multiget (0.6.4)
                    • UTF8Type, UUIDType optimizations (0.6.5)

Bulletproofing
                    • Hinted handoff (HH) can be disabled (0.6.2)
                    • compaction priority (0.6.3)
                    • HH hourly scan removed (0.6.3)
                    • JMX metrics for row-level bloom filters (0.6.3)
                    • Flow control (0.6.4, 0.6.5)
                    • HH paging (0.6.5)
                    • Dynamic snitch (0.6.5)


Hinted Handoff
                    •      0.6.0: send hints to natural replicas

                    •      0.6.0: fix row-level concurrency bottleneck

                    •      0.6.2: option to disable entirely

                    •      0.6.3: remove hourly scan

                    •      0.6.4: lower priority

                    •      0.6.5: paging of large hinted rows

                    •      0.7.0: large rows


Why keep HH around?




                           https://www.cloudkick.com/blog/2010/jan/12/visual-ec2-latency/



Compaction priority


                    -XX:+UseThreadPriorities 
                    -XX:ThreadPriorityPolicy=42 
                    -Dcassandra.compaction.priority=1 

                              Extended to HH in 0.6.4


http://www.javamex.com/tutorials/threads/priority_what.shtml

JMX for bloom filters


                    • o.a.c.db:ColumnFamilyStores
                           •   getBloomFilterFalsePositives

                               •   [not in nodetool yet]




Flow control in 0.5


                    • Why backpressure doesn’t fit Cassandra



Flow Control in 0.6.4
                    • Replica nodes drop hopeless requests on
                           the floor
                           •   Coordinator node is unaffected

                           •   TimedOutException signals client to back off

                           •   Requires enough memory to buffer
                               RPCTimeout’s worth of requests

                    • (In the short term, you’re still screwed)
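
The drop-on-timeout behaviour described above can be sketched roughly as follows. This is a minimal illustration, not Cassandra's code: the `Request` class, `RPC_TIMEOUT_SECONDS`, and `drain` are hypothetical names, and the sketch assumes each queued request records the time the replica received it.

```python
from collections import deque
from dataclasses import dataclass

RPC_TIMEOUT_SECONDS = 10.0  # hypothetical value standing in for RPCTimeout


@dataclass
class Request:
    payload: str
    received_at: float  # seconds, when the replica received the request


def drain(queue, now, timeout=RPC_TIMEOUT_SECONDS):
    """Process queued requests, dropping any that have already timed out.

    A request older than the timeout is "hopeless": the coordinator has
    already given up on it, so doing the work would be wasted effort.
    Returns (processed, dropped) lists of payloads.
    """
    processed, dropped = [], []
    while queue:
        request = queue.popleft()
        if now - request.received_at > timeout:
            dropped.append(request.payload)    # drop on the floor
        else:
            processed.append(request.payload)  # still worth serving
    return processed, dropped


queue = deque([
    Request("read-1", received_at=0.0),
    Request("write-2", received_at=8.0),
    Request("read-3", received_at=11.5),
])
processed, dropped = drain(queue, now=12.0)
print(processed, dropped)
```

Because nothing is rejected on arrival in this sketch, the queue has to hold up to a timeout's worth of requests, which matches the memory requirement noted on the slide.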
Flow Control, 0.6.4
[Diagram: IncomingTcpConnection, then Message Deserializer (uncapped), then Read and Mutation stages (capped at 4096)]




[Diagram: IncomingTcpConnection, then Message Deserializer, then Read, Gossip, and Mutation stages]




Flow Control, 0.6.5
[Diagram: IncomingTcpConnection, then Read, Gossip, and Mutation stages (uncapped)]




Dynamic snitch


                    • sortByProximity
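
One way to picture a latency-aware `sortByProximity` is the sketch below. It assumes (this is not stated on the slide) that the snitch keeps a recent latency score per replica and orders replicas so the lowest score is tried first. The `sort_by_proximity` function and the score table are hypothetical names, not Cassandra's API.

```python
def sort_by_proximity(replicas, latency_scores, default_score=float("inf")):
    """Return replicas ordered from lowest to highest latency score.

    Replicas with no recorded score sort last. sorted() is stable, so
    replicas with equal scores keep their original (static) order.
    """
    return sorted(replicas, key=lambda r: latency_scores.get(r, default_score))


# Hypothetical recent average latencies in milliseconds.
scores = {"10.0.0.1": 4.2, "10.0.0.2": 1.3, "10.0.0.3": 9.8}
replicas = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
ordered = sort_by_proximity(replicas, scores)
print(ordered)
```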



Open problems

                    • Linux/mmap/swap unholy trio (0.6.5)
                    • Memory fragmentation (0.6.5?)
                    • Compaction effect on caches (0.7.1?)


mmap and swap
                    • The problem
                    • Mitigations
                           •   mmap_index_only

                           •   swappiness=0

                               •   turn off swap

                    • mlockall at startup (Xms=Xmx)
GC Fragmentation


                    • Culprit of infamous CASSANDRA-1014?
                    • Mitigation: tune with much larger new
                           generation / tenuring threshold?




Compaction and caches


                    • Compaction wrecks the OS fs cache
                    • Wrecks Cassandra key cache, too
                           •   (but not row cache)




0.7

New in 0.7

                    • live schema changes
                    • large rows
                    • secondary indexes
                    • efficient Streaming
                    • DatacenterStrategy

Live schema changes


                    • Details: http://www.riptano.com/blog/live-schema-updates-cassandra-07




Large rows


                    • 0.6: smaller of {2GB, memory limit}
                    • 0.7: in_memory_compaction_limit_in_mb


Secondary indexes




Streaming in 0.6
[Diagram: token ring with nodes W, A, F, L, T; range (A-L] marked]




[Diagram: token ring with nodes W, A, F, L, T; ranges (A-F] (shown twice) and (F-L] marked]




[Diagram: token ring with nodes W, A, F, L, T; Data, Index, and Filter components shown]
Streaming in 0.7
[Diagram: token ring with nodes W, A, F, L, T; Index and Filter components shown]
DatacenterStrategy

                    • RackAwareStrategy is tuned for 3 replicas
                           and 2 data centers
                    • DS allows configuring replicas per data
                           center, per Keyspace
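
As an illustration of per-data-center, per-keyspace replica counts, the sketch below uses a plain dictionary. The keyspace names, data center names, and the `replicas_for` and `total_replicas` helpers are hypothetical; this does not reproduce Cassandra's actual configuration syntax.

```python
# Hypothetical replica counts: keyspace -> data center -> replicas.
REPLICATION = {
    "Users": {"DC1": 3, "DC2": 2},
    "Logs": {"DC1": 2, "DC2": 1},
}


def replicas_for(keyspace, datacenter, config=REPLICATION):
    """Replica count for one keyspace in one data center (0 if unset)."""
    return config.get(keyspace, {}).get(datacenter, 0)


def total_replicas(keyspace, config=REPLICATION):
    """Total replicas for a keyspace across all data centers."""
    return sum(config.get(keyspace, {}).values())


print(replicas_for("Users", "DC2"), total_replicas("Users"))
```

The point of the sketch is only that each keyspace can carry its own per-data-center counts, rather than one fixed layout shared by everything.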




Minor features in 0.7

                    • read_repair_chance
                    • per-keyspace request scheduling
                    • Hadoop OutputFormat
                    • Per-CF settings that used to be global
                           (gc_grace_seconds, memtable thresholds)



0.7 API changes

                    • String keys become byte[]
                    • Thrift keyspace argument moved to
                           set_keyspace
                    • i64 timestamp becomes Clock
                    • SlicePredicate for _count methods
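
For client code, keys becoming byte[] means the application chooses the encoding itself. A small sketch, assuming UTF-8 is used for keys that were previously strings (the `encode_key` and `decode_key` helpers are hypothetical, not part of any client library):

```python
def encode_key(key: str) -> bytes:
    """Encode a formerly-string row key as UTF-8 bytes."""
    return key.encode("utf-8")


def decode_key(raw: bytes) -> str:
    """Decode a UTF-8 row key back to text for display or logging."""
    return raw.decode("utf-8")


raw = encode_key("user:josé")
print(raw, decode_key(raw))
```

Keys that were never text need no such round trip and can be passed as raw bytes.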

0.7 performance
                    • Reads roughly 100% faster, thanks largely to
                           removing String creation
                    • Row-cached reads up to 8x faster after
                           optimizations by tjake and jbellis
                    • Optimizations for reads of large rows
                    • 0.7.1? ~20% improvement everywhere from
                           Thrift optimizations


Thrift

                    • OOMs on malformed packets
                    • Python Unicode string issues
                    • PHP support is buggy and maintainerless


After 0.7.0
                    • IndexOperator.GT
                    • Triggers / plugins
                    • Avro?
                    • On-disk data format improvements
                           (Compression, hierarchical data?)
                    • Auth
Questions




State of Cassandra, August 2010

  • 1. Professional Cassandra support and services Tuesday, August 10, 2010
  • 2. Cassandra: Present & Future Jonathan Ellis @spyced Tuesday, August 10, 2010
  • 3. Cassandra 0.6 & 0.7 Jonathan Ellis @spyced Tuesday, August 10, 2010
  • 4. Quiet change of policy • 0.5.1 was bug fixes only • Too early to be strict about bugfix-only policy in stable branch, especially w/ 0.7 being longer/more break-y • Maybe after 1.0? Tuesday, August 10, 2010
  • 5. 1500 mails sent 1125 750 375 0 Jan Feb Apr May Jun Jul (0.5) (0.5.1) Mar (0.6, 0.6.1) (0.6.2) (0.6.3) (0.6.4) Tuesday, August 10, 2010
  • 6. Lots of bug fixes
      • 85 issues marked Resolved/Fixed in 0.6 branch after 0.6 released
  • 7. Runtime configuration
      • concurrent reads, writes (0.6.2)
          • making it easier to bandage your foot after you shoot it
      • PhiConvictThreshold (0.6.2)
  • 8. Performance
      • JVM GC defaults (0.6.2)
      • Faster commitlog (0.6.2)
      • Faster range slice, Hadoop jobs (0.6.1, 2)
      • Better parallelization of multiget (0.6.4)
      • UTF8Type, UUIDType optimizations (0.6.5)
  • 9. Bulletproofing
      • HH disable (0.6.2)
      • compaction priority (0.6.3)
      • HH hourly scan (0.6.3)
      • JMX metrics for row-level bloom filters (0.6.3)
      • Flow control (0.6.4, 5)
      • HH paging (0.6.5)
      • Dynamic snitch (0.6.5)
  • 10. Hinted Handoff
      • 0.6.0: send hints to natural replicas
      • 0.6.0: fix row-level concurrency bottleneck
      • 0.6.2: option to disable entirely
      • 0.6.3: remove hourly scan
      • 0.6.4: lower priority
      • 0.6.5: paging of large hinted rows
      • 0.7.0: large rows
  • 11. Why keep HH around? https://www.cloudkick.com/blog/2010/jan/12/visual-ec2-latency/
  • 12. Compaction priority
      • -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Dcassandra.compaction.priority=1
      • Extended to HH in 0.6.4
  • 14. JMX for bloom filters
      • o.a.c.db:ColumnFamilyStores
      • getBloomFilterFalsePositives
      • [not in nodetool yet]
  • 15. Flow control in 0.5
      • Why backpressure doesn’t fit Cassandra
  • 16. Flow Control in 0.6.4
      • Replica nodes drop hopeless requests on the floor
      • Coordinator node is unaffected
      • TimedOutException signals client to back off
      • Requires enough memory to buffer RPCTimeout’s worth of requests
      • (In the short term, you’re still screwed)
  • 17. Flow Control, 0.6.4. Diagram labels: IncomingTcpConnection, Message Deserializer, Uncapped, Read, Mutation, Capped at 4096
  • 18. Diagram labels: IncomingTcpConnection, Message Deserializer, Read, Gossip, Mutation
  • 19. Flow Control, 0.6.5. Diagram labels: IncomingTcpConnection, Read, Gossip, Mutation, Uncapped
  • 20. Dynamic snitch
      • sortByProximity
  • 21. Open problems
      • Linux/mmap/swap unholy trio (0.6.5)
      • Memory fragmentation (0.6.5?)
      • Compaction effect on caches (0.7.1?)
  • 22. mmap and swap
      • The problem
      • Mitigations
          • mmap_index_only
          • swappiness=0
          • turn off swap
          • mlockall at startup (Xms=Xmx)
  • 23. GC Fragmentation
      • Culprit of infamous CASSANDRA-1014?
      • Mitigation: tune with much larger new generation / tenuring threshold?
  • 24. Compaction and caches
      • Compaction wrecks the OS fs cache
      • Wrecks Cassandra key cache, too
      • (but not row cache)
  • 26. New in 0.7
      • live schema changes
      • large rows
      • secondary indexes
      • efficient Streaming
      • DatacenterStrategy
  • 27. Live schema changes
      • Details: http://www.riptano.com/blog/live-schema-updates-cassandra-07
  • 28. Large rows
      • 0.6: smaller of {2GB, memory limit}
      • 0.7: in_memory_compaction_limit_in_mb
  • 30. Streaming in 0.6. Diagram labels: W, A, F, (A-L], T, L
  • 31. Diagram labels: W, A, F, (A-F], (A-F], T, (F-L], L
  • 32. Diagram labels: W, A, F, Data, T, L, Index, Filter
  • 33. Streaming in 0.7. Diagram labels: W, A, F, T, L, Index, Filter
  • 34. DatacenterStrategy
      • RackAwareStrategy is tuned for 3 replicas and 2 data centers
      • DS allows configuring replicas per data center, per Keyspace
  • 35. Minor features in 0.7
      • read_repair_chance
      • per-keyspace request scheduling
      • Hadoop OutputFormat
      • Per CF what used to be global (gc_grace_seconds, memtable thresholds)
  • 36. 0.7 API changes
      • String keys become byte[]
      • Thrift keyspace argument moved to set_keyspace
      • i64 timestamp becomes Clock
      • SlicePredicate for _count methods
  • 37. 0.7 performance
      • Reads roughly 100% faster, thanks largely to removing String creation
      • Row-cached reads up to 8x faster after optimizations by tjake and jbellis
      • Optimizations for reads of large rows
      • 0.7.1? ~20% improvement everywhere from Thrift optimizations
  • 38. Thrift
      • OOMs on malformed packets
      • Python Unicode string issues
      • PHP support is buggy and maintainerless
  • 39. After 0.7.0
      • IndexOperator.GT
      • Triggers / plugins
      • Avro?
      • On-disk data format improvements (Compression, hierarchical data?)
      • Auth
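
The replica-side behavior on slide 16 (Flow Control in 0.6.4), where replica nodes drop requests that have no hope of being answered in time, might be sketched roughly as below. This is an illustration only, not Cassandra's actual code: the `Request` type, the `drain` function, and the 10-second timeout value are all hypothetical names and numbers chosen for the example.

```python
from collections import deque
from dataclasses import dataclass

# Assumed timeout for illustration; not taken from the slides.
RPC_TIMEOUT_S = 10.0


@dataclass
class Request:
    payload: str
    enqueued_at: float  # time (in seconds) at which the replica queued the request


def drain(queue, now, rpc_timeout=RPC_TIMEOUT_S):
    """Work through the queue, dropping requests that already waited past the timeout.

    A request older than rpc_timeout is "hopeless": the coordinator has
    already given up on it, so doing the work would only waste resources.
    """
    processed, dropped = [], []
    while queue:
        req = queue.popleft()
        if now - req.enqueued_at > rpc_timeout:
            dropped.append(req.payload)    # drop on the floor
        else:
            processed.append(req.payload)  # still worth serving
    return processed, dropped


# Usage: one stale request and one fresh request, inspected at t = 11 s.
queue = deque([Request("old", 0.0), Request("fresh", 9.5)])
processed, dropped = drain(queue, now=11.0)
```

In this sketch the queue has to hold up to a timeout's worth of requests before any are shed, which matches the slide's note that enough memory is needed to buffer RPCTimeout's worth of requests.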