Hardware Agnostic: Cassandra on Raspberry Pi

Andy Cobley | Lecturer, University of Dundee, Scotland

C* Summit EU 2013: Hardware Agnostic: Cassandra on Raspberry Pi

Speaker: Andy Cobley, Lecturer at University of Dundee
Video: http://www.youtube.com/watch?v=0U4iOSMnRdk&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=1
Abstract: The Raspberry Pi is a credit-card-sized $25 ARM-based Linux box designed to teach children the basics of programming. The machine comes with a 700 MHz ARM processor and 512 MB of memory and boots off an SD card: not much power for running the likes of a Cassandra cluster. This presentation will discuss the problems of getting Cassandra up and running on the Pi and will answer the all-important question: why on earth would you want to do this!?

  1. Hardware Agnostic: Cassandra on Raspberry Pi
     Andy Cobley | Lecturer, University of Dundee, Scotland
     #CASSANDRAEU CASSANDRASUMMITEU
  2. What we will discuss today…
     *  Cassandra is hardware agnostic
     *  So why not run it on a Raspberry Pi?
     *  How hard can it be?
     *  What can we do with it once it works?
  3. Who am I?
     *  Andy Cobley
     *  Program Director, MSc in Data Science and Business Intelligence
     *  School of Computing
     *  University of Dundee
     *  Twitter: @andycobley
  4. What’s a Raspberry Pi?
     *  Single-chip Linux computer
     *  512 MB RAM
     *  Boots off an SD card
     *  Ethernet port
     *  (graphics and all you need for a general-purpose computer)
  5. Pi with pound coin
  6. And here’s one for real
     *  Also a 4-node cluster.
  7. The Bad News
     *  Cassandra is designed to be fast: fast at writing, fast at reading.
     *  This laptop with one instance of Cassandra will do 12,000 write operations
     *  Raspberry Pi will do 200!
  8. More bad news!
     *  Running from an external USB drive is actually worse!
     *  Probably a hardware feature
  9. Raspberry Pi Schematic
  10. And then there’s Java!
      *  Oracle Java vs OpenJDK
  11. And Raspbian
      *  Raspbian is Debian for the Pi
      *  Uses the hardware floating-point unit (hard float)
      *  Much faster than Debian
      *  Current official Oracle JDK won’t run on it!
  12. Oracle Java
      *  http://www.oracle.com/technetwork/java/embedded/downloads/javase/index.html
      *  Java SE Embedded version 6
      *  Cassandra might prefer 6 (or 7 for Cassandra 2)
      *  But
      *  https://blogs.oracle.com/henrik/entry/oracle_releases_jdk_for_linux
      *  Preview at:
      *  https://jdk8.java.net/fxarmpreview/
  13. Hard vs Soft Float
      *  And then it turns out:
         Actually not much difference in performance
  14. The Problem with compression
      *  Cassandra uses compression for performance
      *  Started in version 1.0
         2x-4x reduction in data size
         25-35% performance improvement on reads
         5-10% performance improvement on writes
  15. Compression types
      *  Three types:
         Google Snappy Compressor (faster reads/writes)
         DeflateCompressor (Java zip, slower, better compression)
      *  Snappy compression not available on Pi
         (requires native methods, so someone might get it to work!)
  16. Compression
      *  Cassandra 1.2 (and 2) also has LZ4 compression
      *  Which is good news!
  17. And the startup script
      *  Startup script allocates memory
      *  Calculates based on number of processors
      *  Pi reports zero processors!
      *  Boom!
      *  Now fixed
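To make the failure on slide 17 concrete, here is a minimal sketch of a heap calculation keyed to the processor count, with a guard for a platform that reports zero cores. It is an illustration only and not the actual Cassandra startup script: the function name, the 100 MB-per-core rule and the caps are all assumptions chosen for the example.

```shell
#!/bin/sh
# Hypothetical sketch, not the real Cassandra startup script.
# Usage: calculate_heap_sizes <system_memory_in_mb> <cpu_cores>
calculate_heap_sizes() {
    mem_mb=$1
    cores=$2

    # Guard: a Pi may report zero processors; treat that as one core
    # so the young generation is never sized to 0 MB.
    if [ -z "$cores" ] || [ "$cores" -lt 1 ]; then
        cores=1
    fi

    # Assumed rule: heap is half of RAM, capped at 1024 MB.
    max_heap_mb=$((mem_mb / 2))
    if [ "$max_heap_mb" -gt 1024 ]; then
        max_heap_mb=1024
    fi

    # Assumed rule: young generation is 100 MB per core,
    # capped at a quarter of the heap.
    new_mb=$((100 * cores))
    quarter_heap_mb=$((max_heap_mb / 4))
    if [ "$new_mb" -gt "$quarter_heap_mb" ]; then
        new_mb=$quarter_heap_mb
    fi

    MAX_HEAP_SIZE="${max_heap_mb}M"
    HEAP_NEWSIZE="${new_mb}M"
}

# A 512 MB machine that reports zero cores.
calculate_heap_sizes 512 0
echo "MAX_HEAP_SIZE=$MAX_HEAP_SIZE HEAP_NEWSIZE=$HEAP_NEWSIZE"
```

Without the guard, the per-core multiplication would yield a young generation of 0 MB, which is the kind of value a JVM refuses to start with.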
  18. JMX Config
      *  In cassandra-env.sh
      *  JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=192.168.1.15"
      *  Or else nodetool will not work between nodes
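The JMX setting from slide 18 might look like this inside cassandra-env.sh. `NODE_IP` is a hypothetical helper variable introduced here for readability; each node would set it to its own static address.

```shell
#!/bin/sh
# Sketch of a cassandra-env.sh fragment. NODE_IP is a hypothetical
# helper variable; set it to this node's own static address.
NODE_IP="192.168.1.15"

# Note the leading dash on -D: without it the JVM treats the text
# as a class name rather than a system property.
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=$NODE_IP"

echo "$JVM_OPTS"
```

With this in place on every node, a command such as `nodetool -h 192.168.1.15 ring` issued from another machine should be able to reach the node over JMX.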
  19. JVM OPT UseCondCardMark
      *  C* 1.22. added UseCondCardMark as a JVM Opt
      *  "for better lock handling especially on hotspot with multicore processor"
      *  In cassandra-env.sh
         #if [ "$JVM_VERSION" \> "1.7" ] ; then
         #    JVM_OPTS="$JVM_OPTS -XX:+UseCondCardMark"
         #fi
  20. (no text on this slide)
  21. The Good News!
      *  We’ve forgotten one thing
      *  The Pi costs £25
      *  You can power 4 from a USB hub (no need for a power supply on each one)
      *  So:
  22. So, have a 64-node computer for £2000
      University of Southampton
  23. Or this
      *  32-node Beowulf cluster:
      *  Joshua Kiepert, Boise State University
  24. Or this Hadoop cluster from LinkedIn
      http://practicalcloudcomputing.com/post/53996976003/hadoop-running-on-a-14-chip-raspberry-pi-cluster
  25. Adding nodes is good
      *  Adding nodes adds performance
      *  Adding nodes adds replicas of data
      *  BUT
      *  Make sure your ring is balanced,
      *  Pis don’t like to be unbalanced.
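One common way to balance a ring without vnodes is to space the initial tokens evenly. The formula below assumes the RandomPartitioner, whose tokens run from 0 to 2^127, and N nodes that each own a single token; the slides do not say which partitioner was used, so treat that as an assumption.

```latex
t_i = \left\lfloor \frac{i \cdot 2^{127}}{N} \right\rfloor, \qquad i = 0, 1, \dots, N-1
```

For a 4-node cluster this gives tokens 0, 2^125, 2^126 and 3 * 2^125, each of which would be set as that node's `initial_token` in cassandra.yaml.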
  26. Vnodes
      *  Vnodes (in 1.2) would be very nice
      *  However, at this point I haven’t got 1.2 running on a Pi cluster
      *  As for Cassandra 2, see later
  27. Performance with 3/4 nodes
  28. Performance with 5/6 nodes
  29. Stress test commands
      *  ./stress -d 192.168.1.10,192.168.1.11,192.168.1.12 -o insert -I DeflateCompressor
      *  Note: -d lists the nodes to use
      *  You will get different performance if you insert to fewer nodes than you have in your ring
  30. Getting more memory
      *  On Debian, you can free memory from the graphics chip
         cd /boot
         sudo cp start.elf start.elf.old
         sudo cp arm224_start.elf start.elf
         sudo reboot
  31. Getting more memory
      *  Under Raspbian
      *  Run with a monitor plugged in the first time
      *  Set options for screen memory
      *  Perhaps disable boot to GUI
  32. Network address
      *  I prefer static network addresses
      *  Edit /etc/network/interfaces
         iface eth0 inet static
             address 192.168.1.41
             netmask 255.255.255.0
             network 192.168.1.0
             broadcast 192.168.1.255
             gateway 192.168.1.254
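When several nodes share the same subnet, only the last octet of the address changes from node to node. As a rough sketch, a small helper (the function name `interfaces_stanza` is made up for this example) can print the stanza for each node so it can be reviewed before being copied into /etc/network/interfaces.

```shell
#!/bin/sh
# Print an /etc/network/interfaces stanza for one node.
# Assumption: all nodes sit on 192.168.1.0/24 behind gateway 192.168.1.254,
# as on the slide; only the last octet differs per node.
# Usage: interfaces_stanza <last_octet>
interfaces_stanza() {
    cat <<EOF
iface eth0 inet static
    address 192.168.1.$1
    netmask 255.255.255.0
    network 192.168.1.0
    broadcast 192.168.1.255
    gateway 192.168.1.254
EOF
}

# Example: stanzas for three nodes; nothing is written to /etc here.
for octet in 41 42 43; do
    interfaces_stanza "$octet"
    echo
done
```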
  33. Multiple nodes
      *  Make a master SD card
      *  Copy it!
      *  Make sure the master version has no data on it.
      *  Consider “Puppet” (though I don’t use it)
  34. Starting as a service
      *  See https://github.com/acobley/CassandraStartup
      *  Put the file in /etc/init.d
      *  update-rc.d cassandra defaults
  35. Pi is for teaching
      *  So for £200 we get an 8-node C* cluster
      *  It can be reconfigured, blown away, stress tested and generally abused
      *  We can simulate data racks, data centers and, I hope, even long network delays.
      *  Hopefully our students will use these clusters
  36. C* is network aware
      *  We know C* can be configured to be aware of:
         Network racks
         Data centers
      *  We know replicas can be stored across these racks
      *  How can we play with this cheaply?
  37. Proposed teaching tool
      [Diagram labels: noise injection, 10 Mbit/s hub, switch 1, switch 2]
  38. TC?
      *  What about the Linux tc command?
      *  Let’s look again at the diagram
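As one possible use of tc for simulating long network delays, the netem queueing discipline can add latency to everything leaving an interface. The sketch below only prints the commands rather than running them; the helper names, the device `eth0` and the delay values are assumptions for illustration.

```shell
#!/bin/sh
# Build (but do not run) tc commands that add artificial delay on one
# interface using netem, the Linux network-emulation queueing discipline.
# Usage: netem_delay_cmd <device> <delay> [jitter]
netem_delay_cmd() {
    echo "tc qdisc add dev $1 root netem delay $2${3:+ $3}"
}

# Usage: netem_clear_cmd <device>
netem_clear_cmd() {
    echo "tc qdisc del dev $1 root"
}

# Dry run: print the commands; run them with sudo on a node to apply.
netem_delay_cmd eth0 100ms 10ms
netem_clear_cmd eth0
```

Applying the first command on the nodes of one "data center" would make them look far away from the rest of the cluster without any extra hardware.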
  39. Network
      *  What we can’t do
      *  Recommended bandwidth is 1000 Mbit/s (Gigabit) or greater.
      *  Bind the Thrift interface (listen_address) to a specific NIC (Network Interface Card).
      *  Bind the RPC server interface (rpc_address) to another NIC.
      http://www.datastax.com/docs/1.2/cluster_architecture/cluster_planning
  40. What about Cassandra 2.0
      *  Internode compression currently uses Snappy
      *  So turn it off in the conf file:
         internode_compression: none
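Across several cloned nodes, the setting from slide 40 could be applied with a small script rather than by hand. This is a sketch that assumes the `internode_compression` line already exists, uncommented, in cassandra.yaml; the function name is made up for the example.

```shell
#!/bin/sh
# Set internode_compression to "none" in a cassandra.yaml file.
# Usage: disable_internode_compression <path/to/cassandra.yaml>
disable_internode_compression() {
    conf=$1
    tmp="$conf.tmp"
    # Rewrite the existing setting; write to a temp file first so the
    # original is only replaced if sed succeeds.
    sed 's/^internode_compression:.*/internode_compression: none/' "$conf" > "$tmp" \
        && mv "$tmp" "$conf"
}

# Demo on a throwaway file rather than a live config.
demo=$(mktemp)
printf 'cluster_name: pi\ninternode_compression: all\n' > "$demo"
disable_internode_compression "$demo"
grep '^internode_compression' "$demo"
rm -f "$demo"
```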
  41. How does C* 2 run on a Pi?
      *  Some bad news
      *  So we need to tune it:
      *  See John Berryman’s blog:
      *  http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/
  42. Pi is discovery
      *  Cassandra wouldn’t run on a Pi
      *  It does now.
      *  Running it on a Pi shook out some Cassandra bugs
      *  You can run it in a secure lab
  43. Pi is for fun
      *  Most important, this was pure geeky fun
  44. Obligatory Plug
      *  Data Science:
      *  http://www.computing.dundee.ac.uk/study/postgrad/degreedetails.asp?17
  45. What we discussed today…
      *  Raspberry Pi is cheap
      *  C* needs some work to run on it
      *  You can make clusters cheaply for experimentation
      *  It’s fun!
  46. THANK YOU
