C* Summit 2013: Hardware Agnostic - Cassandra on Raspberry Pi by Andy Cobley


The Raspberry Pi is a credit-card-sized, $25, ARM-based Linux box designed to teach children the basics of programming. The machine comes with a 700 MHz ARM processor and 512 MB of memory and boots off an SD card: not much power for running the likes of a Cassandra cluster. This presentation will discuss the problems of getting Cassandra up and running on the Pi and will answer the all-important question: why on Earth would you want to do this?

Published in: Technology

  1. Hardware Agnostic: Cassandra on Raspberry Pi
     Andy Cobley | Lecturer, University of Dundee, Scotland
  2. Cassandra on Raspberry Pi
     * Cassandra is hardware agnostic
     * So why not run it on a Raspberry Pi?
     * How hard can it be?
     * What can we do with it once it works?
  3. Who Am I?
     * Andy Cobley
     * School of Computing
     * University of Dundee
     * Twitter: @andycobley
  4. What’s a Raspberry Pi?
     * Single-chip Linux computer
     * 512 MB RAM
     * Boots off an SD card
     * Ethernet port
     * (graphics and all you need for a general-purpose computer)
  5. Pi with pound coin
  6. And here’s the Cassandra cluster
     * And here’s one for real
     * Power permitting!
  7. The Bad News
     * Cassandra is designed to be fast: fast at writing, fast at reading.
     * This laptop with one instance of Cassandra will do 12,000 write operations
     * Raspberry Pi will do 200!
  8. More bad news!
     * Running an external USB drive is actually worse!
     * Probably a hardware feature
  9. Raspberry Pi Schematic
  10. And then there’s Java!
      * Oracle Java vs OpenJDK
  11. And Raspbian
      * Raspbian is Debian for the Pi
      * Uses the hardware floating-point accelerator (hard float)
      * Much faster than Debian
      * Current Oracle JDK won’t run on it!
  12. Oracle Java
      * http://www.oracle.com/technetwork/java/embedded/downloads/javase/index.html
      * Java SE Embedded version 6
      * Cassandra might prefer 6
      * But
      * https://blogs.oracle.com/henrik/entry/oracle_releases_jdk_for_linux
      * Preview at:
      * https://jdk8.java.net/fxarmpreview/
  13. Hard vs Soft Float
      * Actually not much difference in performance
  14. The Problem with compression
      * Cassandra uses compression for performance
      * Started in version 1.0
          * 2x-4x reduction in data size
          * 25-35% performance improvement on reads
          * 5-10% performance improvement on writes
  15. Compression types
      * Two types:
          * Google Snappy Compressor (faster reads/writes)
          * DeflateCompressor (Java zip, slower, better compression)
      * Snappy compression not available on Pi (requires native methods, so someone might get it to work!)
  16. And the startup script
      * Startup script allocates memory
      * Calculates based on number of processors
      * Pi reports zero processors!
      * Boom!
      * Now fixed
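The failure on the startup-script slide can be sketched in a few lines. This is a minimal Python illustration, not the actual startup script: the sizing rule (half or a quarter of RAM for the heap, 100 MB of young generation per core) and the function name `heap_sizes` are assumptions chosen to show why a reported core count of zero is fatal and how clamping it avoids the problem.

```python
def heap_sizes(system_memory_mb, cpu_cores):
    """Return (max_heap_mb, new_gen_mb) under an assumed sizing rule."""
    # Guard: a board that reports zero cores is treated as having one.
    # Without this clamp the young generation below would come out as 0 MB.
    cores = max(1, cpu_cores)

    # Assumed rule: max heap = max(min(RAM/2, 1024 MB), min(RAM/4, 8192 MB)).
    half = min(system_memory_mb // 2, 1024)
    quarter = min(system_memory_mb // 4, 8192)
    max_heap_mb = max(half, quarter)

    # Assumed rule: young generation = min(100 MB per core, heap/4).
    new_gen_mb = min(100 * cores, max_heap_mb // 4)
    return max_heap_mb, new_gen_mb
```

Calling `heap_sizes(512, 0)` models a 512 MB Pi that reports no processors; with the clamp in place the young generation stays above zero.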
  17. JMX Config
      * In cassandra-env.sh
      * JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname="
      * Or else nodetool will not work between nodes
  18. JVM OPT UseCondCardMark
      * C* 1.2.2 added UseCondCardMark as a JVM opt
      * "for better lock handling especially on hotspot with multicore processor"
      * In cassandra-env.sh (commented out here):
          #if [ "$JVM_VERSION" \> "1.7" ] ; then
          #    JVM_OPTS="$JVM_OPTS -XX:+UseCondCardMark"
          #fi
  19. The Good News!
      * We’ve forgotten one thing
      * The Pi costs £25
      * You can power 4 from a USB hub (no need for a power supply on each one)
      * So:
  20. So, have a 64 node computer for £2000
      University of Southampton
  21. Or this
      * 32 node Beowulf cluster:
      * Joshua Kiepert, Boise State University
  22. Adding nodes is good
      * Adding nodes adds performance
      * Adding nodes adds replicas of data
      * BUT
      * Make sure your ring is balanced
      * Pis don’t like to be unbalanced.
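One way to check whether a ring is balanced is to compute the share of the token range each node owns. The sketch below is an illustration only: it assumes the 0 to 2**127 token range used on the "Calculating keys" slide and assumes each node owns the range from the previous token up to its own; the function name `ownership` is hypothetical.

```python
from fractions import Fraction

RING_SIZE = 2 ** 127  # token range assumed from the key calculation in this talk


def ownership(tokens):
    """Map each token to the fraction of the ring its node owns.

    Assumes a node owns the range from the previous token (exclusive)
    up to its own token (inclusive), wrapping around the ring.
    """
    if len(set(tokens)) != len(tokens):
        raise ValueError("tokens must be distinct")
    ordered = sorted(tokens)
    shares = {}
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # for i == 0 this wraps to the last token
        span = (token - previous) % RING_SIZE
        if span == 0:  # a single node owns the whole ring
            span = RING_SIZE
        shares[token] = Fraction(span, RING_SIZE)
    return shares
```

For three nodes at tokens `0`, `2**125` and `2**126`, the node at token `0` ends up with half the ring while the other two hold a quarter each, which is the kind of imbalance the slide warns about.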
  23. Vnodes
      * Vnodes (in 1.2) would be very nice
      * However, at this point I haven’t got 1.2 on the Pi running on a cluster
  24. Performance with 3/4 nodes
  25. Performance with 5/6 nodes
  26. Stress test commands
      * ./stress -d,, -o insert -IDeflateCompressor
      * Note: -d gives the nodes to use
      * You will get different performance if you insert to fewer nodes than you have in your ring
  27. Adding Nodes Procedure
      * Adding a node (in the absence of Vnodes):
          * Must seed from a known node
          * Use a program to calculate new keys
          * Bring up new node with the correct key in cassandra.yaml
          * Use nodetool to move other nodes
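  28. Calculating keys
      * Python code:
          import sys

          # Number of nodes from the command line, or prompt for it
          if len(sys.argv) > 1:
              num = int(sys.argv[1])
          else:
              num = int(input("How many nodes? : "))

          # Evenly spaced tokens across the 0..2**127 ring
          for i in range(num):
              print("node %d: %d" % (i, i * (2**127) // num))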
  29. Moving existing nodes
      * Use nodetool:
          sudo ./nodetool -h move 42535295865117307932921825928971026432
      * And cleanup:
          ./nodetool -h cleanup
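The move and cleanup steps can be planned for a whole ring at once. The following is a rough sketch, assuming evenly spaced tokens over the 0 to 2**127 range; the host names passed in (such as `pi1`) and the function names are hypothetical, and the script only builds the command strings rather than running them.

```python
def balanced_tokens(num_nodes):
    """Evenly spaced tokens over the assumed 0..2**127 ring."""
    return [i * (2 ** 127) // num_nodes for i in range(num_nodes)]


def move_commands(hosts):
    """Build nodetool move and cleanup command lines for the given hosts."""
    commands = []
    # One move per node, pairing each host with its balanced token.
    for host, token in zip(hosts, balanced_tokens(len(hosts))):
        commands.append("./nodetool -h %s move %d" % (host, token))
    # Cleanup afterwards so nodes drop data they no longer own.
    for host in hosts:
        commands.append("./nodetool -h %s cleanup" % host)
    return commands
```

With four hypothetical hosts, the second node's move command carries the same token shown on the slide. In practice it is probably safest to run the generated commands one at a time and wait for each move to finish.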
  30. Getting more memory
      * On Debian, you can free memory from the graphics chip:
          cd /boot
          sudo cp start.elf start.elf.old
          sudo cp arm224_start.elf start.elf
          sudo reboot
  31. Getting more memory
      * Under Raspbian
      * Run with a monitor plugged in for the first time
      * Set options for screen memory
      * Perhaps disable boot to GUI
  32. Network address
      * I prefer static network addresses
      * Edit /etc/network/interfaces:
          iface eth0 inet static
              address
              netmask
              network
              broadcast
              gateway
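When cloning many nodes, the static stanza can be generated rather than typed by hand. This is a small sketch using Python's standard `ipaddress` module to derive the network and broadcast values from an address and netmask; the function name `static_stanza` and any addresses passed to it are hypothetical, since the slide leaves the values blank.

```python
import ipaddress


def static_stanza(address, netmask, gateway, iface="eth0"):
    """Render an /etc/network/interfaces stanza for a static address."""
    # ip_interface accepts "address/netmask" and exposes the enclosing network.
    net = ipaddress.ip_interface("%s/%s" % (address, netmask)).network
    lines = [
        "iface %s inet static" % iface,
        "    address %s" % address,
        "    netmask %s" % net.netmask,
        "    network %s" % net.network_address,
        "    broadcast %s" % net.broadcast_address,
        "    gateway %s" % gateway,
    ]
    return "\n".join(lines)
```

For example, `static_stanza("192.0.2.11", "255.255.255.0", "192.0.2.1")` uses a documentation-only address range and fills in the network and broadcast lines automatically.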
  33. Multiple nodes
      * Make a master SD card
      * Copy it!
      * Make sure the master version has no data on it.
      * Consider "Puppet" (though I don’t use it)
  34. Starting as a service
      * See https://github.com/acobley/CassandraStartup
      * Put the file in /etc/init.d
      * update-rc.d cassandra defaults
  35. Pi is for teaching
      * So for £200 we get an 8 node C* cluster
      * It can be reconfigured, blown away, stress tested and generally abused
      * We can simulate data racks, data centers and, I hope, even long network delays.
      * Hopefully our upcoming MSc in Data Science will use these clusters
  36. C* is network aware
      * We know C* can be configured to be aware of:
          * Network racks
          * Data centers
      * We know we can have replicas stored across these racks
      * How can we play with this cheaply?
  37. Proposed teaching tool
      (diagram labels: 10 Mb/s hub, noise injection, Switch 1, Switch 2, and two groups of Pi 1, Pi 2, Pi 3)
  38. Pi is discovery
      * Cassandra wouldn’t run on a Pi
      * It does now.
      * Running it on a Pi shook out some Cassandra bugs
      * You can run it in a secure lab
  39. Pi is for fun
      * Most important, this was pure geeky fun
  40. Obligatory Plug
      * Data Science:
      * http://www.computing.dundee.ac.uk/study/postgrad/degreedetails.asp?17
  41. C* is Hardware Agnostic
      * Raspberry Pi is cheap
      * C* needs some work to run on it
      * You can make clusters cheaply for experimentation
      * It’s fun!
  42. THANK YOU