Cassandra Silicon Valley

Presentation at the December meetup of the Silicon Valley Cassandra users group. Summarizes how the NASA supercomputer center at Ames is currently using a Cassandra cluster.

  1. REAL WORLD CASSANDRA AT NASA
     Christopher Keller
     December 13th, 2012
  2. THANKS!
     I failed to copy this to iCloud after the DC presentation
  3. WHO AM I?
     • a CSC solutions architect working at the advanced supercomputing facility (NAS) at NASA Ames in silicon valley
     • consulted at various federal agencies during the tech boom of the 90’s
       • classified and unclassified
     • http://about.me/christopherkeller
  4. WHO I’M NOT
     • a cassandra expert
     • someone pushing a corporate agenda
  5. ENVIRONMENT
     • unix based enterprise (desktops, servers, supercomputers)
     • heavy writes around the clock from incoming data, but far fewer analytical reads
     • we retain the data in a raw format, but it does not need to be in a database (however we can easily load old data)
     • we need flexibility as technology and our requirements evolve over time
  6. THE PROBLEM
     • TL;DR: how to use all of our available data to make supercomputing more secure for our customers
     • replace a COTS security event management system
       • poor query performance
       • difficult to extend and integrate with our custom software
       • pre-defined analytics were a big plus, but more overall minuses for our environment
  7. WHY CASSANDRA
     • snapshotting for backups was lightning fast
     • no single point of failure
     • reads are fast, writes are faster
     • i did research other solutions (couchbase, hbase, mongo, riak, etc), but didn’t find anything compelling enough to trial
  8. WHY CASSANDRA
     • simple clustering = win
       • availability + scalability + replication
     • built in data expiration was key
     • enabling technology that allowed us to ask new questions
  9. IN THE BEGINNING...
     • set up a virtualized three node cluster on a spare server
     • wrote the cassandra equivalent of “hello world” to check
       • replication / availability
       • data expiration
       • rough performance estimates
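The "data expiration" check on this slide can be pictured with a toy model. The sketch below is not Cassandra code; it is a minimal in-memory stand-in that assumes a per-column time-to-live in seconds. The class and method names (`ExpiringStore`, `insert`, `get`) are hypothetical.

```python
import time


class ExpiringStore:
    """Toy in-memory model of per-column expiration (not a Cassandra client)."""

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable clock so expiry can be exercised in tests
        self._rows = {}      # row key -> {column name: (value, expires_at)}

    def insert(self, row_key, column, value, ttl=None):
        """Store a column; ttl is in seconds, None means it never expires."""
        expires_at = None if ttl is None else self._clock() + ttl
        self._rows.setdefault(row_key, {})[column] = (value, expires_at)

    def get(self, row_key):
        """Return only the columns that have not expired yet."""
        now = self._clock()
        return {
            column: value
            for column, (value, expires_at) in self._rows.get(row_key, {}).items()
            if expires_at is None or expires_at > now
        }
```

A "hello world" check along these lines would insert a column with a short TTL, read it back, wait past the TTL, and confirm the column no longer appears.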
  10. ARE YOU KIDDING ME?
     • selling cassandra to management was easier than i thought
     • the NAS is very receptive to new technology even though we prefer to be system integrators rather than developers
     • my testing showed that cassandra works...shocking!!!
     • open source resources are good, DataStax being able to provide support after i leave is better
  11. TAX DOLLARS AT WORK
     • bought five servers for around 22.5k
       • 3 of them for our production cluster
       • 1 for our data parsing and loading
       • 1 for our analytics
     • those were our only purchases, the rest has been primarily my labor hours
  12. [Figure: two benchmark charts, each comparing 6 nodes, 9 nodes (v), and 9 nodes (p): write operations (operations vs. elapsed time) and latency (milliseconds vs. elapsed time).]
     http://christophernkeller.tumblr.com/post/15242366864/cassandra-benchmarks
  13. TAKEAWAY
     • bare metal > virtualized w/ assigned disks > fully virtualized
     • match your hardware to your environment, expertise, and requirements
  14. CURRENT CLUSTER
     • gentoo running xen 4.1.2 & apache cassandra 1.1.3
     • three virtual nodes per physical server
       • 7 cpus, 15 GB RAM, 1.2 TB disk
     • eight disks per physical server
       • 2 running the hypervisor + OS in a RAID 1
       • 2 disks per virtual machine in a RAID 0
  15. ELAPSED TIME
     • empty rack to benchmarks took about five days over the course of christmas/new years 2011
     • very helpful to understand our hardware limits and how cassandra scaled
     • understanding how to model the data and effectively use cassandra took a lot longer...i’m still learning
  16. HELPFUL TIPS
     • always start with the questions you plan to ask the data
     • if you know these your job just got exponentially easier
     • if you never deviate from this, you’re lucky
       • once you realize how powerful cassandra is, you’ll figure out new questions that may change things
     • don’t use supercolumns
  17. MAINTENANCE
     • i haven’t done serious sys-admin in years...had to develop tools from scratch
       • cluster start up and shutdown scripts
       • use good CM software (we use puppet)
       • OS, Cassandra & JVM upgrades
       • cassandra-env.sh & cassandra.yaml
  18. TRIAL AND ERROR
     • a lot of testing dealing with how to organize the data
       • secondary indexes
       • materialized views
     • i’d get failures and errors in cassandra that were solved by changing the schema to be more efficient (based on our questions)
     • try not to think relationally, it wasn’t helping me
  19. THIS WORKED...POORLY
     uid  name    age  gender
     1    chris   39   male
     2    jaeden  2    male

     uid  job        hobby
     1    architect  jiu-jitsu
     2    toddler    gaming

     uid  employer  phone       address
     1    csc       5555555555  123 Main St
     2    mom       4444444444  123 Main St
  20. THIS WORKED WELL
     row key    columns (id → value)
     architect  1234 → json blob {"age":"39","name":"chris","gender":"male"...}
     toddler    1235 → json blob {"age":"2","name":"jaeden","gender":"male"...}

     row key    columns 4567, 7364, 3453, 4554 (ids → values)
     male       json blob, json blob
     chris      json blob
     jaeden     json blob
  21. WHY DID THAT WORK
     • we only have to query a single table
       • aren’t you glad you optimized the schemas for the questions ahead of time?
     • manual joins by reading successive column families resulted in timeout errors even though the cluster was idle and everything was on the same switch segment
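The difference between the two layouts can be sketched in plain Python. This is an illustrative in-memory model, not pycassa code: each dictionary stands in for a column family, the sample rows reuse the data from the slides, and the names `people`, `jobs`, `by_job`, and both lookup functions are hypothetical.

```python
import json

# Relational-style layout: answering "who is an architect?" needs two
# column families and a manual join on uid.
people = {1: {"name": "chris", "age": "39", "gender": "male"},
          2: {"name": "jaeden", "age": "2", "gender": "male"}}
jobs = {1: {"job": "architect", "hobby": "jiu-jitsu"},
        2: {"job": "toddler", "hobby": "gaming"}}


def find_by_job_with_join(job):
    """Scan one column family, then read a second one for every match."""
    matches = [uid for uid, row in jobs.items() if row["job"] == job]
    return [dict(people[uid], **jobs[uid]) for uid in matches]


# Query-first layout: the row key is the value we ask about, and each column
# holds a complete JSON blob, so a single read answers the question.
by_job = {
    "architect": {1234: json.dumps({"name": "chris", "age": "39", "gender": "male"})},
    "toddler": {1235: json.dumps({"name": "jaeden", "age": "2", "gender": "male"})},
}


def find_by_job(job):
    """Single row lookup; no second column family is touched."""
    return [json.loads(blob) for blob in by_job.get(job, {}).values()]
```

In the second layout the cost of the "join" is paid once at write time, which is the trade the slides describe.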
  22. LESSONS LEARNED
     • if your data changes frequently, de-normalization is annoying, but can be solved with discipline
     • give yourself a lot of experimentation time if you’re new to cassandra
       • if you are hitting problems...likely you’re doing it wrong
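One way to read "discipline" here is to funnel every change through a single write path that refreshes all de-normalized copies together. The sketch below is an assumption about how that might look, using plain dictionaries in place of column families; `VIEWS`, `KEY_FIELDS`, and `save_person` are hypothetical names.

```python
import json

# Each "view" stands in for a column family keyed by one attribute we query on.
VIEWS = {"by_name": {}, "by_job": {}, "by_gender": {}}
KEY_FIELDS = {"by_name": "name", "by_job": "job", "by_gender": "gender"}
_current = {}  # uid -> last saved record, needed to find stale copies


def save_person(uid, record):
    """Write one record to every view, removing copies filed under old keys."""
    old = _current.get(uid)
    for view, field in KEY_FIELDS.items():
        if old is not None and old[field] != record[field]:
            # The row key changed, so the old copy would otherwise linger.
            VIEWS[view][old[field]].pop(uid, None)
        VIEWS[view].setdefault(record[field], {})[uid] = json.dumps(record)
    _current[uid] = dict(record)
```

Because no caller writes to a view directly, a change of job or name cannot leave one copy out of date.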
  23. TECHNICAL TIPS
     • use ‘repair -pr’ to repair each node at least once every gc_grace_seconds
     • script which staggers weekly repairs across each node
     • once you assign a token ID, you can remove it from cassandra.yaml and keep the same file across nodes
     • you are free to use the Thrift bindings for the language of your choice, but save yourself time and use a high level client (available for Java, Python, Scala, PHP, Erlang, etc)
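The repair advice above can be sketched as a small scheduling helper. It assumes `nodetool repair -pr` is run against each node from a weekly job; the host names are placeholders, the function names are hypothetical, and the 864000-second figure is Cassandra's default `gc_grace_seconds` (ten days), which a column family may override.

```python
WEEK_SECONDS = 7 * 24 * 3600
DEFAULT_GC_GRACE_SECONDS = 864000  # Cassandra default (10 days); may be overridden


def staggered_offsets(nodes, period=WEEK_SECONDS):
    """Spread one repair per node evenly across the period (offsets in seconds)."""
    step = period // len(nodes)
    return {node: index * step for index, node in enumerate(nodes)}


def repair_command(node):
    """Command line for a primary-range repair of a single node."""
    return ["nodetool", "-h", node, "repair", "-pr"]


def check_interval(period=WEEK_SECONDS, gc_grace=DEFAULT_GC_GRACE_SECONDS):
    """Each node has to be repaired at least once per gc_grace_seconds."""
    if period >= gc_grace:
        raise ValueError("repair period must be shorter than gc_grace_seconds")
    return True
```

With nine nodes and a weekly period, consecutive repairs start 67200 seconds apart, so no two nodes begin a repair at the same time.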
  24. HOW I SPENT MY TIME
     • i’d spend a few hours writing code to load data into cassandra, then another few hours writing code to retrieve it
       • the data browsers aren’t great and are unhelpful with blobs
     • then i’d profile the performance, tweak the code, tweak the schema, reload the data and repeat until i was happy
  25. ANALYTICS
     • all server side analytics are developed in python using sub-processes for parallel performance
       • pycassa is our cassandra client library
     • our web layer is currently ruby on rails, but we might end up going with django to stay language consistent
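The "sub-processes for parallel performance" pattern can be sketched with the standard library alone. This is an assumption about the general shape, not the NAS code: the worker here only counts events per host from JSON passed on stdin, and `WORKER_CODE`, `count_events_parallel`, and the event fields are hypothetical.

```python
import json
import subprocess
import sys
from collections import Counter

# Code run by each child interpreter: read a JSON list of events from stdin,
# count them per host, and print the counts as JSON.
WORKER_CODE = (
    "import sys, json, collections\n"
    "events = json.load(sys.stdin)\n"
    "counts = collections.Counter(e['host'] for e in events)\n"
    "json.dump(counts, sys.stdout)\n"
)


def count_events_parallel(events, workers=3):
    """Split events round-robin and count each chunk in its own sub-process."""
    procs = []
    for i in range(workers):
        chunk = events[i::workers]
        proc = subprocess.Popen([sys.executable, "-c", WORKER_CODE],
                                stdin=subprocess.PIPE, stdout=subprocess.PIPE)
        proc.stdin.write(json.dumps(chunk).encode("utf-8"))
        proc.stdin.close()  # EOF tells the child it has the whole chunk
        procs.append(proc)

    total = Counter()
    for proc in procs:
        output = proc.stdout.read()
        proc.stdout.close()
        if proc.wait() != 0:
            raise RuntimeError("worker exited with an error")
        total.update(json.loads(output.decode("utf-8")))
    return total
```

All workers are started before any result is collected, so the chunks are processed concurrently; in a real job each worker would query the cluster itself rather than receive its data from the parent.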
  26. SHOW STOPPERS
     • dealing with an incredibly annoying recurring JMX crash, but it doesn’t seem to affect cassandra stability
       • other cassandra sites haven’t seen this, so it may just be a consequence of java6 on gentoo
     • commitlog_total_space_in_mb was being ignored in 1.1.3 (FIXED)
  27. RECENT SHOW STOPPERS
     • 1.1.3 accidentally removed the ability to drop column families
       • pick your poison - full disks or data that never goes away
     • recent v6 JVM patches required setting per-thread stack sizes to 180k
       • nodes were up individually, zero log errors, gossip was up, but the nodes weren’t talking collectively
     • cassandra solves a need, but bugs like this make my customers wary
  28. ROAD AHEAD
     cql
     map/reduce
     solr
     ops center
  29. SHOUT OUT
     • the folks at datastax have been very helpful
       • Tyler Hobbs (cassandra developer)
       • Darren Sack (accounts)
       • Michael Shaler (biz dev)
     • everyone in #cassandra on irc.freenode.org
  30. QUESTIONS?
     • cnkeller@gmail.com
     • @cnkeller
     • http://www.linkedin.com/in/christopherkeller
