Data stax cassandra_summit_2013_cassandra_raspberrypi-rc1
 


Usage Rights

© All Rights Reserved

  • (Picture Credits below: Glenn Harris 2012) http://www.southampton.ac.uk/~sjc/raspberrypi/pi_supercomputer_southampton.htm

Presentation Transcript

  • Hardware Agnostic: Cassandra on Raspberry Pi
    Andy Cobley | Lecturer, University of Dundee, Scotland
  • Cassandra on Raspberry Pi
    * Cassandra is hardware agnostic
    * So why not run it on a Raspberry Pi?
    * How hard can it be?
    * What can we do with it once it works?
  • Who Am I?
    * Andy Cobley
    * School of Computing
    * University of Dundee
    * Twitter: @andycobley
  • What’s a Raspberry Pi?
    * Single-chip Linux computer
    * 500 MB RAM
    * Boots off an SD card
    * Ethernet port
    * (Graphics and all you need for a general-purpose computer)
  • Pi with pound coin
  • And here’s the Cassandra cluster
    * And here’s one for real
    * Power permitting!
  • The Bad News
    * Cassandra is designed to be fast: fast at writing, fast at reading.
    * This laptop with one instance of Cassandra will do 12,000 write operations
    * Raspberry Pi will do 200!
  • More bad news!
    * Running an external USB drive is actually worse!
    * Probably a hardware feature
  • Raspberry Pi Schematic
  • And then there’s Java!
    * Oracle Java vs OpenJDK
  • And Raspbian
    * Raspbian is Debian for the Pi
    * Uses the hard floating-point accelerator
    * Much faster than Debian
    * Current Oracle JDK won’t run on it!
  • Oracle Java
    * http://www.oracle.com/technetwork/java/embedded/downloads/javase/index.html
    * Java SE Embedded version 6
    * Cassandra might prefer 6
    * But: https://blogs.oracle.com/henrik/entry/oracle_releases_jdk_for_linux
    * Preview at: https://jdk8.java.net/fxarmpreview/
  • Hard vs Soft Float
    * Actually not much difference in performance
  • The Problem with compression
    * Cassandra uses compression for performance
    * Started in version 1.0
        2x-4x reduction in data size
        25-35% performance improvement on reads
        5-10% performance improvement on writes
  • Compression types
    * Two types:
        Google Snappy Compressor (faster reads/writes)
        DeflateCompressor (Java zip, slower, better compression)
    * Snappy compression not available on Pi (requires native methods, so someone might get it to work!)
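
To get a feel for what Deflate-style compression does to data size, here is a minimal sketch using Python's `zlib` module, which implements the same DEFLATE algorithm as Java zip. It is not Cassandra's DeflateCompressor, and the sample rows are made up for illustration, so the ratio it prints says nothing about real Cassandra tables.

```python
import zlib


def deflate_ratio(data: bytes, level: int = 6) -> float:
    """Return original size divided by DEFLATE-compressed size."""
    compressed = zlib.compress(data, level)
    # Round-trip check: decompressing must give back the original bytes.
    assert zlib.decompress(compressed) == data
    return len(data) / len(compressed)


# Hypothetical, highly repetitive sample rows (an assumption, not real data).
sample = b"".join(
    b"sensor-%04d,temperature,21.5\n" % (i % 10) for i in range(1000)
)
ratio = deflate_ratio(sample)
print("compression ratio: %.1fx" % ratio)
```

Repetitive data like this compresses very well; random or already-compressed data would give a ratio close to 1.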
  • And the startup script
    * Startup script allocates memory
    * Calculates based on number of processors
    * Pi reports zero processors!
    * Boom!
    * Now fixed
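
A rough sketch of why a zero processor count breaks a heap calculation of this kind. The sizing rule below is an assumption modelled on the style of cassandra-env.sh (young generation capped at 100 MB per core), and the `max(1, ...)` guard is only one possible fix; the slide does not say how the real fix was made.

```python
def heap_sizes_mb(system_memory_mb, cpu_cores):
    """Return (max_heap_mb, heap_newsize_mb) from RAM size and core count.

    Assumed rule (illustrative, not copied from Cassandra):
      max heap = max(min(ram / 2, 1024), min(ram / 4, 8192))
      new size = min(100 MB per core, max heap / 4)
    """
    # Guard: a reported core count of zero (as on the Pi) is treated as one
    # core, otherwise the new-generation size would come out as 0 MB.
    cores = max(1, cpu_cores)
    half = min(system_memory_mb // 2, 1024)
    quarter = min(system_memory_mb // 4, 8192)
    max_heap = max(half, quarter)
    new_size = min(100 * cores, max_heap // 4)
    return max_heap, new_size


# A 512 MB board that reports zero processors still gets a usable heap.
print(heap_sizes_mb(512, 0))
```

Without the guard, `100 * cpu_cores` would be zero and the JVM would be asked for a zero-sized young generation.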
  • JMX Config
    * In cassandra-env.sh
        JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=192.168.1.15"
    * Or else nodetool will not work between nodes
  • JVM OPT UseCondCardMark
    * C* 1.2.2 added UseCondCardMark as a JVM opt
    * "for better lock handling especially on hotspot with multicore processor"
    * In cassandra-env.sh
        #if [ "$JVM_VERSION" > "1.7" ] ; then
        #    JVM_OPTS="$JVM_OPTS -XX:+UseCondCardMark"
        #fi
  • The Good News!
    * We’ve forgotten one thing
    * The Pi costs £25
    * You can power 4 from a USB hub (no need for a power supply on each one)
    * So:
  • University of Southampton
    * So, have a 64-node computer for £2000
  • Or this
    * 32-node Beowulf cluster:
    * Joshua Kiepert, Boise State University
  • Adding nodes is good
    * Adding nodes adds performance
    * Adding nodes adds replicas of data
    * BUT
    * Make sure your ring is balanced,
    * Pis don’t like to be unbalanced.
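
One way to check whether a ring is balanced is to work out what fraction of the token range each node owns. The sketch below assumes a 2**127 token ring where each node owns the range from the previous token up to its own; the function name and the example token lists are hypothetical.

```python
RING_SIZE = 2 ** 127  # assumed size of the token ring


def ownership(tokens):
    """Map each token to the fraction of the ring it owns.

    A token owns the range from the previous token (exclusive) up to
    itself (inclusive), wrapping around at the end of the ring.
    """
    ordered = sorted(tokens)
    shares = {}
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # index -1 wraps to the last token
        # A single-node ring has span 0 after the modulo; it owns everything.
        span = (token - previous) % RING_SIZE or RING_SIZE
        shares[token] = span / RING_SIZE
    return shares


balanced = [i * RING_SIZE // 4 for i in range(4)]
lopsided = [0, RING_SIZE // 8, RING_SIZE // 4, RING_SIZE // 2]

print(sorted(ownership(balanced).values()))
print(sorted(ownership(lopsided).values()))
```

In the lopsided example the node at token 0 owns half the ring, which is the kind of imbalance a slow node such as a Pi will feel first.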
  • Vnodes
    * Vnodes (in 1.2) would be very nice
    * However, at this point I haven’t got 1.2 on Pi running on a cluster
  • Performance with 3/4 nodes
  • Performance with 5/6 nodes
  • Stress test commands
    * ./stress -d 192.168.1.10,192.168.1.11,192.168.1.12 -o insert -I DeflateCompressor
    * Note: nodes to use
    * You will get different performance if you insert to fewer nodes than you have in your ring
  • Adding Nodes Procedure
    * Adding a node (in the absence of Vnodes)
        Must seed from a known node
        Use a program to calculate new keys
        Bring up new node with the correct key in cassandra.yaml
        Use nodetool to move other nodes
  • Calculating keys
    * Python code
        import sys

        # Number of nodes: from the command line, or prompt for it
        # (Python 3; on Python 2 use raw_input() instead of input())
        if len(sys.argv) > 1:
            num = int(sys.argv[1])
        else:
            num = int(input("How many nodes? :"))
        # Evenly spaced tokens on the 2**127 ring; // keeps them exact integers
        for i in range(num):
            print("node %d: %d" % (i, i * (2**127) // num))
  • Moving existing nodes
    * Use nodetool
        sudo ./nodetool -h 192.168.1.10 move 42535295865117307932921825928971026432
    * And cleanup
        ./nodetool -h 192.168.1.10 cleanup
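
The procedure can be sketched as a small planning script: compute balanced tokens for the old and new ring sizes, then list which existing nodes need a move and which token the new node should start with. The host names, the assumption that existing nodes already hold balanced tokens, and the in-order pairing of nodes to new tokens are all illustrative choices, not something the slides specify.

```python
RING_SIZE = 2 ** 127


def balanced_tokens(num_nodes):
    """Evenly spaced tokens for a ring of num_nodes nodes."""
    return [i * RING_SIZE // num_nodes for i in range(num_nodes)]


def plan_moves(hosts, new_count):
    """Return (moves, new_node_tokens) when growing the ring to new_count.

    Assumes hosts currently hold balanced tokens in list order, and keeps
    that order when assigning the new tokens (one possible assignment).
    """
    old = balanced_tokens(len(hosts))
    new = balanced_tokens(new_count)
    moves = [(host, tok) for host, cur, tok in zip(hosts, old, new) if cur != tok]
    return moves, new[len(hosts):]


# Hypothetical host names for a three-node ring growing to four nodes.
moves, new_tokens = plan_moves(["pi1", "pi2", "pi3"], 4)
for host, token in moves:
    print("nodetool -h %s move %d" % (host, token))
for token in new_tokens:
    print("initial token for new node: %d" % token)
```

Note that `balanced_tokens(4)[1]` is 42535295865117307932921825928971026432, the same value used in the move command on the slide, so that command places a node a quarter of the way round a four-node ring.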
  • Getting more memory
    * On Debian, you can free memory from the graphics chip
        cd /boot
        sudo cp start.elf start.elf.old
        sudo cp arm224_start.elf start.elf
        reboot
  • Getting more Memory
    * Under Raspbian
    * Run with a monitor plugged in for the first time
    * Set options for screen memory
    * Perhaps disable boot to GUI
  • Network address
    * I prefer static network addresses
    * Edit /etc/network/interfaces
        iface eth0 inet static
        address 192.168.1.41
        netmask 255.255.255.0
        network 192.168.1.0
        broadcast 192.168.1.255
        gateway 192.168.1.254
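
When every node gets a hand-edited static address, a quick consistency check helps catch typos before a Pi drops off the network. This is a minimal sketch using Python's standard `ipaddress` module; the function name is hypothetical and the values are the ones from the slide.

```python
import ipaddress


def check_static_config(address, netmask, network, broadcast, gateway):
    """Return a list of inconsistencies in a static interface stanza."""
    iface = ipaddress.ip_interface("%s/%s" % (address, netmask))
    net = iface.network
    problems = []
    if str(net.network_address) != network:
        problems.append("network should be %s" % net.network_address)
    if str(net.broadcast_address) != broadcast:
        problems.append("broadcast should be %s" % net.broadcast_address)
    if ipaddress.ip_address(gateway) not in net:
        problems.append("gateway %s is outside %s" % (gateway, net))
    return problems


problems = check_static_config(
    address="192.168.1.41",
    netmask="255.255.255.0",
    network="192.168.1.0",
    broadcast="192.168.1.255",
    gateway="192.168.1.254",
)
print(problems or "configuration is consistent")
```

The same function can be run over the settings for each node when cloning SD cards, changing only the `address` value.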
  • Multiple nodes
    * Make a master SD card
    * Copy it!
    * Make sure the master version has no data on it.
    * Consider "Puppet" (though I don’t use it)
  • Starting as a service
    * See https://github.com/acobley/CassandraStartup
    * Put the file in /etc/init.d
    * update-rc.d cassandra defaults
  • Pi is for teaching
    * So for £200 we get an 8-node C* cluster
    * It can be reconfigured, blown away, stress tested and generally abused
    * We can simulate data racks, data centers and I hope even long network delays.
    * Hopefully our upcoming MSc in Data Science will use these clusters
  • C* is network aware
    * We know C* can be configured to be aware of:
        Network racks
        Data centers
    * We know we can have replicas stored across these racks
    * How can we play with this cheaply?
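
As a teaching aid, the idea of spreading replicas across racks can be shown with a deliberately simplified placement rule: walk the ring from the first replica and prefer nodes on racks not used yet. This is only an illustration of the concept; Cassandra's own rack-aware strategy is more involved, and the node and rack names below are hypothetical.

```python
def place_replicas(ring, start, rf):
    """Pick rf replica nodes from ring, preferring distinct racks.

    ring  -- list of (node, rack) tuples in token order
    start -- index of the node that owns the key's token
    rf    -- replication factor
    """
    chosen, seen_racks, skipped = [], set(), []
    for step in range(len(ring)):
        if len(chosen) == rf:
            break
        node, rack = ring[(start + step) % len(ring)]
        if rack not in seen_racks:
            chosen.append(node)
            seen_racks.add(rack)
        else:
            skipped.append(node)  # same rack as an earlier replica
    # Fewer racks than replicas: fill up from skipped nodes in ring order.
    for node in skipped:
        if len(chosen) == rf:
            break
        chosen.append(node)
    return chosen


# Hypothetical six-Pi ring split over two racks.
ring = [("pi1", "rack1"), ("pi2", "rack1"), ("pi3", "rack1"),
        ("pi4", "rack2"), ("pi5", "rack2"), ("pi6", "rack2")]
print(place_replicas(ring, start=0, rf=2))
print(place_replicas(ring, start=4, rf=3))
```

With two replicas, each key ends up with one copy per rack, so unplugging a whole rack still leaves one copy reachable.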
  • Proposed teaching tool
    [Diagram labels: 10mbs Hub, Noise injection, Switch 1, Switch 2, and two groups of Pi 1, Pi 2, Pi 3]
  • Pi is discovery
    * Cassandra wouldn’t run on a Pi
    * It does now.
    * Running it on a Pi shook out some Cassandra bugs
    * You can run it in a secure lab
  • Pi is for fun
    * Most important, this was pure geeky fun
  • Obligatory Plug
    * Data Science:
    * http://www.computing.dundee.ac.uk/study/postgrad/degreedetails.asp?17
  • C* is Hardware Agnostic
    * Raspberry Pi is cheap
    * C* needs some work to run on it
    * You can make clusters cheaply for experimentation
    * It’s fun!
  • THANK YOU