DataStax Cassandra Summit 2013: Cassandra on Raspberry Pi (Presentation Transcript)
Hardware Agnostic: Cassandra on Raspberry Pi
Andy Cobley | Lecturer, University of Dundee, Scotland
Cassandra on Raspberry Pi
* Cassandra is hardware agnostic
* So why not run it on a Raspberry Pi?
* How hard can it be?
* What can we do with it once it works?
Who Am I?
* Andy Cobley
* School of Computing
* University of Dundee
* Twitter: @andycobley
What's a Raspberry Pi?
* Single-chip Linux computer
* 500 MB of RAM
* Boots off an SD card
* Ethernet port
* (Graphics and all you need for a general-purpose computer)
Pi with pound coin
* And here’s the Cassandra cluster
* And here’s one for real
* Power permitting!
The Bad News
* Cassandra is designed to be fast: fast at writing, fast at reading.
* This laptop with one instance of Cassandra will do 12,000 write operations
* Raspberry Pi will do 200!
More bad news!
* Running an external USB drive is actually worse!
* Probably a hardware feature
Raspberry Pi Schematic
And then there’s Java!
* Oracle Java vs OpenJDK
And Raspbian
* Raspbian is Debian for the Pi
* Uses the hardware floating-point unit (hard float)
* Much faster than Debian
* Current Oracle JDK won’t run on it!
Oracle Java
* http://www.oracle.com/technetwork/java/embedded/downloads/javase/index.html
* Java SE Embedded version 6
* Cassandra might prefer 6
* But
* https://blogs.oracle.com/henrik/entry/oracle_releases_jdk_for_linux
* Preview at:
* https://jdk8.java.net/fxarmpreview/
Hard vs Soft Float
* Actually not much difference in performance
The Problem with compression
* Cassandra uses compression for performance
* Started in version 1.0
    2x-4x reduction in data size
    25-35% performance improvement on reads
    5-10% performance improvement on writes
Compression types
* Two types:
    Google Snappy Compressor (faster reads/writes)
    DeflateCompressor (Java zip, slower, better compression)
* Snappy compression not available on Pi (requires native methods, so someone might get it to work!)
And the startup script
* Startup script allocates memory
* Calculates based on number of processors
* Pi reports zero processors!
* Boom!
* Now fixed
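To make the zero-processor failure concrete, here is a minimal Python sketch of a heap-sizing rule of the kind the startup script applies. The exact formulas (half or a quarter of RAM for the heap, 100 MB of young generation per core) are an assumption based on the comments in cassandra-env.sh of that era, and the `max(1, cores)` guard is a hypothetical fix, not necessarily the one that was actually applied.

```python
def heap_sizes_mb(system_memory_mb, cpu_cores):
    """Return (max_heap_mb, young_gen_mb) under the assumed sizing rule."""
    # Hypothetical guard: treat a reported core count of zero as one core.
    cores = max(1, cpu_cores)

    # Assumed rule: max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))
    half = min(system_memory_mb // 2, 1024)
    quarter = min(system_memory_mb // 4, 8192)
    max_heap = max(half, quarter)

    # Assumed rule: young generation = min(100 MB per core, 1/4 of the heap).
    # With zero cores and no guard, this would come out as 0 MB.
    young_gen = min(100 * cores, max_heap // 4)
    return max_heap, young_gen


if __name__ == "__main__":
    # A Pi-like machine: 500 MB of RAM, zero processors reported.
    print(heap_sizes_mb(500, 0))
```

Without the guard, the young generation would be sized at 0 MB on a machine that reports zero processors, which is one plausible reading of the "Boom!" above.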
JMX Config
* In cassandra-env.sh:
    JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=192.168.1.15"
* Or else nodetool will not work between nodes
JVM OPT UseCondCardMark
* C* 1.2.2 added UseCondCardMark as a JVM opt
* "for better lock handling especially on hotspot with multicore processor"
* In cassandra-env.sh:
    #if [ "$JVM_VERSION" > "1.7" ] ; then
    #    JVM_OPTS="$JVM_OPTS -XX:+UseCondCardMark"
    #fi
The Good News!
* We’ve forgotten one thing
* The Pi costs £25
* You can power 4 from a USB hub (no need for a power supply on each one)
* So:
So, have a 64 node computer for £2000
University of Southampton
Or this
* 32-node Beowulf cluster:
* Joshua Kiepert, Boise State University
Adding nodes is good
* Adding nodes adds performance
* Adding nodes adds replicas of data
* BUT
* Make sure your ring is balanced
* Pis don’t like to be unbalanced.
Vnodes
* Vnodes (in 1.2) would be very nice
* However, at this point I haven’t got 1.2 on Pi running on a cluster
Performance with 3/4 nodes
Performance with 5/6 nodes
Stress test commands
* ./stress -d 192.168.1.10,192.168.1.11,192.168.1.12 -o insert -I DeflateCompressor
* Note: -d lists the nodes to use
* You will get different performance if you insert to fewer nodes than you have in your ring
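Since the result depends on how many nodes the stress tool writes to, it can help to generate one command per subset of nodes and compare the runs. The sketch below only builds the command line, using the three flags shown on the slide (`-d`, `-o`, `-I`); the helper names and the idea of sweeping over growing node lists are illustrative assumptions, not part of the original talk.

```python
def stress_command(nodes, operation="insert", compression=None, binary="./stress"):
    """Build the argument list for one stress run (flags as on the slide)."""
    cmd = [binary, "-d", ",".join(nodes), "-o", operation]
    if compression:
        cmd += ["-I", compression]
    return cmd


def sweep(nodes, **kwargs):
    """Yield one command per growing prefix of the node list (hypothetical helper)."""
    for count in range(1, len(nodes) + 1):
        yield stress_command(nodes[:count], **kwargs)


if __name__ == "__main__":
    ring = ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
    for cmd in sweep(ring, compression="DeflateCompressor"):
        # Print rather than execute; pass cmd to subprocess.call to run it.
        print(" ".join(cmd))
```

The last command printed matches the one on the slide; the earlier ones target fewer nodes than the ring contains, which is the case the slide warns will perform differently.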
Adding Nodes Procedure
* Adding a node (in the absence of Vnodes):
    Must seed from a known node
    Use a program to calculate new keys
    Bring up new node with the correct key in cassandra.yaml
    Use nodetool to move other nodes
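Calculating keys
* Python code:
    import sys
    # Number of nodes: first command-line argument, or ask interactively
    if len(sys.argv) > 1:
        num = int(sys.argv[1])
    else:
        num = int(input("How many nodes? :"))
    # Evenly spaced tokens across the 2**127 ring
    for i in range(num):
        print("node %d: %d" % (i, i * (2**127) // num))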
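The same token formula can be used to work out which existing nodes have to be moved when the ring grows. This is a minimal sketch under a deliberately naive assumption: existing node i keeps position i in the ring and the new nodes take the remaining tokens. The function names are hypothetical, and a different pairing could move less data.

```python
RING_SIZE = 2 ** 127  # token space used by the formula on the slide


def balanced_tokens(num_nodes):
    """Evenly spaced tokens for a ring of num_nodes nodes."""
    return [i * RING_SIZE // num_nodes for i in range(num_nodes)]


def move_plan(old_count, new_count):
    """Return (moves, joins) for growing a balanced ring.

    moves: (node_index, old_token, new_token) for existing nodes that change.
    joins: tokens to assign to the newly added nodes.
    Assumes existing node i simply takes the i-th token of the larger ring.
    """
    old = balanced_tokens(old_count)
    new = balanced_tokens(new_count)
    moves = [(i, old[i], new[i]) for i in range(old_count) if old[i] != new[i]]
    joins = new[old_count:]
    return moves, joins


if __name__ == "__main__":
    moves, joins = move_plan(3, 4)
    for token in joins:
        print("new node: set initial_token: %d in cassandra.yaml" % token)
    for index, _, token in moves:
        print("node %d: nodetool move %d" % (index, token))
```

Going from 3 to 4 nodes, node 0 stays at token 0, the new node gets the last token, and the other two existing nodes each need a `nodetool move`.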
Multiple nodes
* Make a master SD card
* Copy it!
* Make sure the master version has no data on it.
* Consider "Puppet" (though I don’t use it)
Starting as a service
* See https://github.com/acobley/CassandraStartup
* Put the file in /etc/init.d
* update-rc.d cassandra defaults
Pi is for teaching
* So for £200 we get an 8 node C* cluster
* It can be reconfigured, blown away, stress tested and generally abused
* We can simulate data racks, data centers and, I hope, even long network delays.
* Hopefully our upcoming MSc in Data Science will use these clusters
C* is network aware
* We know C* can be configured to be aware of:
    Network racks
    Data centers
* We know replicas can be stored across these racks
* How can we play with this cheaply?
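One cheap way to play with rack and data center awareness on a Pi cluster is to pretend the nodes live in different racks and data centers. The sketch below generates the contents of a cassandra-topology.properties file, the `ip=DC:RACK` format read by Cassandra's PropertyFileSnitch. The IP range, the data center and rack names, and the round-robin assignment are all assumptions made for illustration; the talk does not say which snitch was used.

```python
from itertools import cycle, product


def simulated_layout(ips, dcs=("DC1", "DC2"), racks=("RAC1", "RAC2")):
    """Assign each IP to a (data center, rack) pair in round-robin order."""
    slots = cycle(product(dcs, racks))
    return {ip: next(slots) for ip in ips}


def topology_properties(layout, default=("DC1", "RAC1")):
    """Render the layout as cassandra-topology.properties text."""
    lines = ["%s=%s:%s" % (ip, dc, rack) for ip, (dc, rack) in sorted(layout.items())]
    # Fallback entry for nodes that are not listed explicitly.
    lines.append("default=%s:%s" % default)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical addresses for an 8 node Pi cluster.
    ips = ["192.168.1.%d" % n for n in range(10, 18)]
    print(topology_properties(simulated_layout(ips)), end="")
```

The same file would be copied to every node, with `endpoint_snitch: PropertyFileSnitch` set in cassandra.yaml, so that the cluster treats the eight Pis as two data centers of two racks each.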