C* Summit EU 2013: Hardware Agnostic: Cassandra on Raspberry Pi

Speaker: Andy Cobley, Lecturer at University of Dundee
Video: http://www.youtube.com/watch?v=0U4iOSMnRdk&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=1
Abstract: The Raspberry Pi is a credit-card-sized $25 ARM-based Linux box designed to teach children the basics of programming. The machine comes with a 700 MHz ARM processor and 512 MB of memory and boots off an SD card: not much power for running the likes of a Cassandra cluster. This presentation will discuss the problems of getting Cassandra up and running on the Pi and will answer the all-important question: why on Earth would you want to do this!?




Presentation Transcript

  • Hardware Agnostic: Cassandra on Raspberry Pi Andy Cobley | Lecturer, University of Dundee, Scotland #CASSANDRAEU CASSANDRASUMMITEU
  • What we will discuss today…
      * Cassandra is hardware agnostic
      * So why not run it on a Raspberry Pi?
      * How hard can it be?
      * What can we do with it once it works?
  • Who Am I?
      * Andy Cobley
      * Program Director, MSc in Data Science and Business Intelligence
      * School of Computing
      * University of Dundee
      * Twitter: @andycobley
  • What's a Raspberry Pi?
      * Single-chip Linux computer
      * 512 MB RAM
      * Boots off an SD card
      * Ethernet port
      * (graphics and all you need for a general-purpose computer)
  • And, here’s one for real
      * Also a 4-node cluster.
  • The Bad News
      * Cassandra is designed to be fast: fast at writing, fast at reading.
      * This laptop with one instance of Cassandra will do 12,000 write operations
      * A Raspberry Pi will do 200!
  • More bad news!
      * Running an external USB drive is actually worse!
      * Probably a hardware feature
  • And then there’s Java!
      * Oracle Java vs OpenJDK
  • And Raspbian
      * Raspbian is Debian for the Pi
      * Uses the hard floating-point accelerator
      * Much faster than Debian
      * Current official Oracle JDK won’t run on it!
  • Oracle Java
      * http://www.oracle.com/technetwork/java/embedded/downloads/javase/index.html
      * Java SE Embedded version 6
      * Cassandra might prefer 6 (or 7 for Cassandra 2)
      * But
      * https://blogs.oracle.com/henrik/entry/oracle_releases_jdk_for_linux
      * Preview at:
      * https://jdk8.java.net/fxarmpreview/
  • Hard vs Soft Float
      * And then it turns out: actually not much difference in performance
  • The Problem with compression
      * Cassandra uses compression for performance
      * Started in version 1.0
          * 2x-4x reduction in data size
          * 25-35% performance improvement on reads
          * 5-10% performance improvement on writes
  • Compression types
      * Three types:
          * Google Snappy Compressor (faster reads/writes)
          * DeflateCompressor (Java zip, slower, better compression)
      * Snappy compression not available on Pi (requires native methods, so someone might get it to work!)
  • Compression
      * Cassandra 1.2 (and 2) also has LZ4 compression
      * Which is good news!
  • And the startup script
      * Startup script allocates memory
      * Calculates based on number of processors
      * Pi reports zero processors!
      * Boom!
      * Now fixed
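To make the zero-processor failure concrete, here is a minimal sketch of sizing the young generation from the core count with a guard for a zero reading. It is not the actual cassandra-env.sh code: the helper names `count_cores` and `young_gen_mb`, the 100 MB-per-core cap, and the quarter-of-heap rule are assumptions chosen for illustration.

```shell
#!/bin/sh
# Illustrative sketch only, not the real cassandra-env.sh logic.
# count_cores and young_gen_mb are hypothetical helper names.

# Count "processor" entries in a cpuinfo-style file.
# Falls back to 1 if the file is unreadable or nothing matches.
count_cores() {
    cpuinfo_file="${1:-/proc/cpuinfo}"
    cores=$(grep -c '^processor' "$cpuinfo_file" 2>/dev/null) || cores=0
    [ "$cores" -ge 1 ] 2>/dev/null || cores=1
    echo "$cores"
}

# Young generation size in MB: the smaller of (100 MB per core) and
# (a quarter of the heap). A core count below 1 is clamped to 1 so the
# result can never be zero for a sensible heap size.
young_gen_mb() {
    cores="$1"
    heap_mb="$2"
    [ "$cores" -ge 1 ] || cores=1
    per_core_cap=$((cores * 100))
    quarter_heap=$((heap_mb / 4))
    if [ "$quarter_heap" -lt "$per_core_cap" ]; then
        echo "$quarter_heap"
    else
        echo "$per_core_cap"
    fi
}

# Example: a 256 MB heap (an assumed figure for a small board).
echo "HEAP_NEWSIZE=$(young_gen_mb "$(count_cores)" 256)M"
```

Without the clamp, a reported core count of zero would give a per-core cap of 0 MB and therefore a zero-sized young generation, which is the kind of value a JVM would refuse to start with.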
  • JMX Config
      * In cassandra-env.sh
      * JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname="
      * Or else nodetool will not work between nodes
  • JVM OPT UseCondCardMark
      * C* 1.22. added UseCondCardMark as a JVM opt
      * "for better lock handling especially on hotspot with multicore processor"
      * In cassandra-env.sh
            #if [ "$JVM_VERSION" > "1.7" ] ; then
            #    JVM_OPTS="$JVM_OPTS -XX:+UseCondCardMark"
            #fi
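The slide shows the version check commented out. If one wanted to keep a check rather than remove it, a numeric comparison is one option, since an unescaped `>` inside `[ ]` is treated by the shell as an output redirection rather than a comparison. The sketch below assumes version strings of the form `1.7.0_25`; the function name `jvm_at_least_1_7` and the sample version are hypothetical, and whether the flag is accepted by a given ARM JVM is not something this sketch checks.

```shell
#!/bin/sh
# Illustrative sketch; jvm_at_least_1_7 is a hypothetical helper name.

# Succeeds when a version string such as "1.7.0_25" is 1.7 or newer.
# Only the first two numeric fields are compared.
jvm_at_least_1_7() {
    version="$1"
    major=${version%%.*}     # text before the first dot
    rest=${version#*.}       # text after the first dot
    minor=${rest%%.*}        # second field
    if [ "$major" -gt 1 ]; then
        return 0
    fi
    [ "$major" -eq 1 ] && [ "$minor" -ge 7 ]
}

JVM_VERSION="1.7.0_25"   # assumed sample value
JVM_OPTS=""
if jvm_at_least_1_7 "$JVM_VERSION"; then
    JVM_OPTS="$JVM_OPTS -XX:+UseCondCardMark"
fi
echo "JVM_OPTS=$JVM_OPTS"
```

Note that this treats 1.7 itself as new enough, which differs slightly from the strict "greater than 1.7" string test in the slide.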
  • The Good News!
      * We’ve forgotten one thing
      * The Pi costs £25
      * You can power 4 from a USB hub (no need for a power supply on each one)
      * So:
  • So, have a 64-node computer for £2000
      * University of Southampton
  • Or this
      * 32-node Beowulf cluster:
      * Joshua Kiepert, Boise State University
  • Or this
      * Hadoop cluster from LinkedIn
      * http://practicalcloudcomputing.com/post/53996976003/hadoop-running-on-a-14-chip-raspberry-pi-cluster
  • Adding nodes is good
      * Adding nodes adds performance
      * Adding nodes adds replicas of data
      * BUT
      * Make sure your ring is balanced,
      * Pis don’t like to be unbalanced.
  • Vnodes
      * Vnodes (in 1.2) would be very nice
      * However, at this point I haven’t got 1.2 running on a Pi cluster
      * As for Cassandra 2, see later
  • Performance with 3/4 nodes
  • Performance with 5/6 nodes
  • Stress test commands
      * ./stress -d,, -o insert -I DeflateCompressor
      * Note: nodes to use
      * You will get different performance if you insert to fewer nodes than you have in your ring
  • Getting more memory
      * On Debian, you can free memory from the graphics chip
            cd /boot
            sudo cp start.elf start.elf.old
            sudo cp arm224_start.elf start.elf
            reboot
  • Getting more memory
      * Under Raspbian
      * Run with a monitor plugged in for the first time
      * Set options for screen memory
      * Perhaps disable boot to GUI
  • Network address
      * I prefer static network addresses
      * Edit /etc/network/interfaces
            iface eth0 inet static
                address
                netmask
                network
                broadcast
                gateway
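The address values did not survive in the transcript. As a rough sketch of what a filled-in stanza could look like, the helper below prints one for a /24 network. The function name `print_static_stanza` and the 192.168.1.x addresses are assumptions for illustration, not the speaker's settings.

```shell
#!/bin/sh
# Illustrative sketch; print_static_stanza is a hypothetical helper and
# the addresses below are made-up examples on an assumed /24 network.

# Print an /etc/network/interfaces stanza for a static address.
# $1 = node address, $2 = gateway address
print_static_stanza() {
    address="$1"
    gateway="$2"
    prefix=${address%.*}     # first three octets, e.g. 192.168.1
    cat <<EOF
auto eth0
iface eth0 inet static
    address $address
    netmask 255.255.255.0
    network $prefix.0
    broadcast $prefix.255
    gateway $gateway
EOF
}

# Print a stanza for one node; review it before copying it into
# /etc/network/interfaces on that node.
print_static_stanza 192.168.1.101 192.168.1.1
```

Printing the stanza instead of writing the file directly keeps the sketch harmless to run and makes it easy to generate one stanza per node when cloning SD cards.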
  • Multiple nodes
      * Make a master SD card
      * Copy it!
      * Make sure the master version has no data on it.
      * Consider “Puppet” (though I don’t use it)
  • Starting as a service
      * See https://github.com/acobley/CassandraStartup
      * Put the file in /etc/init.d
      * update-rc.d cassandra defaults
  • Pi is for teaching
      * So for £200 we get an 8-node C* cluster
      * It can be reconfigured, blown away, stress tested and generally abused
      * We can simulate data racks, data centers and, I hope, even long network delays.
      * Hopefully our students will use these clusters
  • C* is network aware
      * We know C* can be configured to be aware of:
          * Network racks
          * Data centers
      * We know replicas can be stored across these racks
      * How can we play with this cheaply?
  • Proposed teaching tool
      * [Diagram labels: Noise injection, 10 Mb/s hub, Switch 1, Switch 2]
  • TC?
      * What about the Linux tc command?
      * Let’s look again at the diagram
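As a hedged sketch of what the tc idea could look like, the helpers below build `tc` commands that use the netem queueing discipline to add delay (and optionally packet loss) on an interface. They only print the commands. Running them for real needs root and a kernel with netem support, and the interface name and delay figures are assumptions, not values from the talk.

```shell
#!/bin/sh
# Illustrative dry-run sketch; the helper names are hypothetical.
# The printed commands use the standard tc netem syntax.

# Build a command that adds delay (and optionally loss) on an interface.
# $1 = interface, $2 = delay (e.g. 100ms), $3 = optional loss (e.g. 1%)
netem_add_cmd() {
    iface="$1"
    delay="$2"
    loss="$3"
    cmd="tc qdisc add dev $iface root netem delay $delay"
    if [ -n "$loss" ]; then
        cmd="$cmd loss $loss"
    fi
    echo "$cmd"
}

# Build the command that removes the emulation again.
netem_del_cmd() {
    echo "tc qdisc del dev $1 root"
}

# Print the commands for a simulated long link between two "data centers".
netem_add_cmd eth0 100ms
netem_add_cmd eth0 200ms 1%
netem_del_cmd eth0
```

Applying a delay on the nodes of one simulated rack or data center is one way to approximate a slow inter-site link without extra network hardware.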
  • Network
      * What we can’t do
      * Recommended bandwidth is 1000 Mbit/s (Gigabit) or greater.
      * Bind the Thrift interface (listen_address) to a specific NIC (Network Interface Card).
      * Bind the RPC server interface (rpc_address) to another NIC.
      * http://www.datastax.com/docs/1.2/cluster_architecture/cluster_planning
  • What about Cassandra 2.0
      * Internode compression currently uses Snappy
      * So turn it off in the conf file:
            internode_compression: none
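When the same change has to be made on every node, a small script can help. The sketch below rewrites the `internode_compression` line in a YAML file; the function name `disable_internode_compression` is hypothetical, and the demo runs against a temporary sample file rather than a real cassandra.yaml, whose location varies by install.

```shell
#!/bin/sh
# Illustrative sketch; disable_internode_compression is a hypothetical helper.

# Rewrite the internode_compression setting to "none" in the given file.
# Writes to a temporary copy first, then replaces the original.
disable_internode_compression() {
    yaml_file="$1"
    sed 's/^internode_compression:.*/internode_compression: none/' \
        "$yaml_file" > "$yaml_file.tmp" &&
        mv "$yaml_file.tmp" "$yaml_file"
}

# Demo on a throwaway sample file (not a real configuration).
sample=$(mktemp)
printf 'cluster_name: Test Cluster\ninternode_compression: all\n' > "$sample"
disable_internode_compression "$sample"
cat "$sample"
rm -f "$sample"
```

The node needs a restart for a changed setting to take effect, and every node in the ring should carry the same value.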
  • How does C* 2 run on a Pi
      * Some bad news
      * So need to tune it:
      * See John Berryman’s blog:
      * http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/
  • Pi is discovery
      * Cassandra wouldn’t run on a Pi
      * It does now.
      * Running it on a Pi shook out some Cassandra bugs
      * You can run it in a secure lab
  • Pi is for fun
      * Most important, this was pure geeky fun
  • Obligatory Plug
      * Data Science:
      * http://www.computing.dundee.ac.uk/study/postgrad/degreedetails.asp?17
  • What we discussed today…
      * Raspberry Pi is cheap
      * C* needs some work to run on it
      * You can make clusters cheaply for experimentation
      * It’s fun!