“Eventually consistent” means that an update written to one Cassandra node will eventually propagate to every replica, so that all nodes converge on the same data once no new writes are coming in.
Chiton is a PyGTK app, and I’ve only managed to run it on an Arch Linux install, so it’s pretty useless if you work in Mac OS X or Windows. I remember another app being discussed in the #cassandra IRC channel, but I can’t recall its name or whether it’s web-based.
Reddit has a pretty interesting white paper on their use of Cassandra in place of memcacheDB. I’ll post a link on my Twitter account, @thrillgore, later.
The timestamp is used by Cassandra to reconcile differences between nodes, not to store revisions. It is a long, normally generated from the current Unix time when the write is made. Due to Cassandra’s Java heritage, names and values are stored as byte arrays.
Essentially, you enable the SimpleAuthenticator module and then set up conf files listing each user you want accessing the database. By default Cassandra uses AllowAllAuthenticator, which means there isn’t any authentication and anyone can connect to it.
A quick description of what’s going on here: we’re storing a kind of HVAC component, in this case a louver, in a keyspace dedicated to louvers. Inside, for each individual louver kind, we have a supercolumn whose columns represent the attributes of that louver. For each additional louver, another supercolumn can be created. This approach gives us more control over the (dare I say it) relations in our data, organized spatially rather than in tables.
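To make the louver layout in the notes above easier to picture, here is a rough sketch of it as nested Python dicts: keyspace, then column family, then row key, then one supercolumn per louver holding attribute columns. This is not Cassandra code, and every name in it ("Products", "catalog", "LV-100", "LV-200", the attribute names) is hypothetical.

```python
# Hypothetical sketch of the louver keyspace as nested dicts:
# keyspace -> column family -> row key -> supercolumn (one per louver) -> columns.
# All names below are made up for illustration.
louvers_keyspace = {
    "Products": {                    # hypothetical super column family
        "catalog": {                 # hypothetical row key
            "LV-100": {              # supercolumn for one louver
                "width_in": b"24",
                "height_in": b"12",
                "blade_angle_deg": b"45",
            },
            "LV-200": {              # another louver gets its own supercolumn
                "width_in": b"36",
                "height_in": b"18",
                "finish": b"anodized",  # an attribute LV-100 simply doesn't have
            },
        },
    },
}


def louver_attributes(keyspace: dict, louver_id: str) -> dict:
    """Return the columns stored in one louver's supercolumn."""
    return keyspace["Products"]["catalog"][louver_id]
```

Note that the two louvers do not share a fixed set of fields; each supercolumn holds only the columns that louver actually has.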
The myth of Cassandra
The Myth of Cassandra<br />I’ve had it with these crazed oracles<br />NoSQL Series<br />Cameron Kilgore | @thrillgore<br />
Cas·san·dra[kəˈsændrə], noun<br />[Classical Greek Mythology.] A daughter of Priam and Hecuba, a prophet cursed by Apollo so that her prophecies, though true, were fated never to be believed.<br />[fml. “Apache Cassandra”] An open-source distributed, non-relational (NoSQL) database developed at Facebook, written in Java, and maintained as an Apache Software Foundation product<br />
What Cassandra does<br />Nonrelational associative array (key-value) data storage<br />Distributed<br />One-hop DHT (akin to Amazon Dynamo)<br />Eventually Consistent<br />Column-based storage<br />Can answer queries faster than MySQL for some workloads<br />Based on white papers and real-world use cases<br />Fault tolerant<br />Provides no single point of failure<br />Load balancing<br />
What Cassandra Does Not Do<br />Keep revision history<br />Store relational data<br />There’s this thing called “MySQL” that might be just up your alley<br />Provide an admin app<br />Chiton is an in-development desktop app<br />http://github.com/driftx/chiton<br />Store individual data fields larger than 2<sup>31</sup> − 1 (2,147,483,647) bytes<br />Provide any interfaces other than Thrift and the high-level libraries built on it<br />
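The field-size limit on this slide is 2,147,483,647 bytes, which is the same number as Java’s Integer.MAX_VALUE. A minimal sketch of a client-side size check against that number is below; the function name is hypothetical, and it takes a size rather than the data itself so nothing huge has to be allocated.

```python
# The limit quoted on the slide: 2**31 - 1 bytes (also Java's Integer.MAX_VALUE).
MAX_FIELD_BYTES = 2**31 - 1


def fits_in_field(size_in_bytes: int) -> bool:
    """Hypothetical pre-write check: is a name or value of this size within the limit?"""
    return 0 <= size_in_bytes <= MAX_FIELD_BYTES
```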
She who entangles companies<br />Already in use at Facebook<br />Also being used at:<br />Digg<br />Reddit<br />Twitter<br />Rackspace<br />Cisco<br />IBM<br />Cloudkick<br />OpenX<br />And more…<br />
Introducing Cassandra<br />Understanding how data and scalability work in Cassandra<br />
Columns and Data<br />Data is stored in columns, organized into keyspaces<br />Each column stores data and is looked up by its name, akin to an associative array<br />+name: byte[]<br />+value: byte[]<br />+timestamp: long<br />
Supercolumns<br />What happens when Xzibit uses Cassandra<br />Supercolumns let you nest any number of columns inside another column<br />In turn, a single key can hold any number of supercolumns (not shown here due to an Office fail)<br />
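The two slides above describe a column as name, value and timestamp, and a supercolumn as a column that holds other columns. Here is a rough Python model of those shapes, not Cassandra’s actual Java or Thrift types; the class names mirror the slides, while the `add` method and the sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Column:
    name: bytes       # byte array on the Java side
    value: bytes      # byte array on the Java side
    timestamp: int    # a Java long


@dataclass
class SuperColumn:
    name: bytes
    # Nested columns, looked up by name like an associative array.
    columns: Dict[bytes, Column] = field(default_factory=dict)

    def add(self, column: Column) -> None:
        """Store a column under its name (timestamp reconciliation is left out here)."""
        self.columns[column.name] = column


# One row key can hold any number of supercolumns.
row: Dict[bytes, SuperColumn] = {}
row[b"LV-100"] = SuperColumn(b"LV-100")
row[b"LV-100"].add(Column(b"width_in", b"24", 1))
```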
Anatomy of a Column<br />Cassandra is written in Java, so we abide by the rules of its types<br />Names and values are byte arrays (byte[]), often holding Unicode text<br />+timestamp is the only field not stored as a byte array; it is a long<br />Cassandra compares +timestamp values across nodes to reconcile data<br />It is NOT used for revision history<br />Each column is represented by an unseen UUID<br />
Anatomy of a Column (cont.)<br />Columns are found by their +name value, not their UUID<br />You cannot have multiple columns with the same name (writing one with an existing name overwrites the existing column in that row)<br />
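The timestamp rule on these slides boils down to “last write wins”: when two nodes hold different versions of the same column, the version with the higher timestamp survives. A minimal Python sketch of that idea follows. It ignores deletes, and the tie-break on equal timestamps (compare the raw value bytes) is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Column:
    name: bytes
    value: bytes
    timestamp: int


def reconcile(a: Column, b: Column) -> Column:
    """Pick which version of the same column survives: the higher timestamp wins."""
    if a.name != b.name:
        raise ValueError("can only reconcile two versions of the same column")
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    # Tie-break on the raw value bytes: an assumption of this sketch.
    return a if a.value >= b.value else b


def merge_rows(left: Dict[bytes, Column], right: Dict[bytes, Column]) -> Dict[bytes, Column]:
    """Merge two replicas' copies of a row; same-name columns collapse to one."""
    merged = dict(left)
    for name, column in right.items():
        merged[name] = reconcile(merged[name], column) if name in merged else column
    return merged
```

Because the winner depends only on the columns and not on which node is asked first, merging in either order gives the same row, which is what lets replicas converge.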
Accessing the Data<br />Data is accessed through the Apache Incubator™ Thrift API<br />Thrift clients can be generated for many programming languages<br />Higher-level client libraries exist for several languages<br />For our demos we’re going to use the cassandra-cli client, which lets us insert, edit, and remove data<br />
<INSERT CALL TO DEMO HERE><br />OH GOD HOW DID I GET HERE I AM NOT GOOD WITH COMPUTER<br />
Security in Cassandra<br />Cassandra does have user authentication through a SimpleAuthenticator module that is configured in conf files<br />Very rudimentary<br />Ran out of time and suitable documentation to demonstrate it<br />Cassandra is not ACID-compliant<br />
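As a conceptual illustration of how rudimentary file-based authentication like SimpleAuthenticator works, here is a Python sketch that checks a user against two properties-style files. This is not Cassandra’s Java implementation. The layouts (`user=password` in a password file, `Keyspace=user1,user2` in an access file) are assumptions, so check the sample files in your own conf directory. The user and keyspace names are hypothetical.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # comments (not full .properties syntax)."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        entries[key.strip()] = value.strip()
    return entries


def authorize(passwd_text: str, access_text: str, user: str, password: str, keyspace: str) -> bool:
    """Assumed rule: the password must match AND the user must be listed for the keyspace."""
    passwords = parse_properties(passwd_text)
    access = parse_properties(access_text)
    if passwords.get(user) != password:
        return False
    allowed = {u.strip() for u in access.get(keyspace, "").split(",") if u.strip()}
    return user in allowed
```

Passwords sit in plain text in a file, which is part of why this setup counts as very rudimentary.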
Load Balancing<br />Cassandra 0.6 has load balancing capabilities<br />Not automatic, must be configured per node<br />Load is shared in a token-ring fashion across the nodes in a multi-node configuration<br />Covered in the documentation for Cassandra<br />
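To show what “token-ring fashion” means, here is a toy Python model. Each node is given a token by hand (hence “must be configured per node”). A key is hashed to a token, and the first node whose token is greater than or equal to it owns the key, wrapping around past the last node. Hashing keys with MD5 into 0..2<sup>127</sup> is modeled loosely on Cassandra’s random partitioner and should be treated as an assumption. `balanced_tokens` is a hypothetical helper that spaces N nodes evenly around that range.

```python
import bisect
import hashlib
from typing import Dict, List

RING_SIZE = 2**127  # assumed token range for this sketch


def balanced_tokens(n: int) -> List[int]:
    """Hypothetical helper: evenly spaced tokens so n nodes own equal slices of the ring."""
    return [i * RING_SIZE // n for i in range(n)]


class TokenRing:
    def __init__(self, node_tokens: Dict[str, int]):
        # node_tokens maps node name -> hand-assigned token
        self._ring = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def token_for(key: bytes) -> int:
        """Hash a row key onto the ring (MD5-based, loosely like the random partitioner)."""
        return int.from_bytes(hashlib.md5(key).digest(), "big") % RING_SIZE

    def owner_of_token(self, token: int) -> str:
        """First node whose token is >= the given token owns it; wrap around at the end."""
        i = bisect.bisect_left(self._tokens, token)
        return self._ring[i % len(self._ring)][1]

    def node_for(self, key: bytes) -> str:
        return self.owner_of_token(self.token_for(key))
```

For example, with four nodes you would hand each one of `balanced_tokens(4)` so that random keys spread evenly across them.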
Monitoring Cassandra<br />Cassandra exposes metrics as JMX data, so any JMX monitoring app should be sufficient<br />Nagios<br />Munin<br />OpenNMS<br />Any official Oracle™ Java monitoring and administration software<br />What? I can’t be bothered to search for the name of the software?<br />Cassandra also has its own tools for monitoring node activity; check the docs<br />
Use Case Example<br />And a very simple one at that<br />
Product Ordering Application<br />An ordering application implemented using a SQL database could span hundreds of tables and require constant revision over its lifespan<br />What if the attributes of these products (in this case, HVAC components) were stored in Cassandra, and we kept pricing, user, and session data in an RDBMS?<br />
Benefits of Cassandra<br />Products that need new attributes won’t require new RDBMS fields; we can just add new columns and write our code to skip them when they aren’t there<br />We aren’t limited by RDBMS bottlenecks if we choose to go multinode in our Cassandra setup<br />No single point of failure if we choose to go multinode<br />If we get a lot of users (unlikely), the nodes will share the load<br />Less time spent on queries<br />Depends on how effectively our data is stored and how well our application performs<br />
Downsides of Cassandra<br />We may not have the funding needed to procure a multinode configuration<br />No guarantee that existing data will stay consistent as it is reshaped over time to meet the demands of sales, engineering, executives, etc.<br />Relationships between data must be maintained inside the application itself, with no schema<br />Cassandra lacks a vetted security framework, which could put us at risk<br />Cassandra also lacks a complete administration application<br />Chiton is barely functional as-is<br />Might not make sense when some RDBMSes can also scale across machines<br />
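Here is a minimal sketch of “write our code to skip them when they aren’t there”. The display code walks a list of attributes it knows about and prints only the ones a given product row actually has. Any columns it doesn’t know about are ignored as well. The attribute names and labels are hypothetical.

```python
from typing import Dict, List

# Hypothetical attributes the ordering app knows how to display, in display order.
DISPLAY_ORDER = [
    (b"width_in", "Width (in)"),
    (b"height_in", "Height (in)"),
    (b"finish", "Finish"),
]


def spec_lines(columns: Dict[bytes, bytes]) -> List[str]:
    """Render only the attributes this product actually stores; skip missing ones."""
    lines = []
    for name, label in DISPLAY_ORDER:
        if name in columns:  # a product without this column is simply skipped
            lines.append(f"{label}: {columns[name].decode()}")
    return lines
```

Adding a new attribute later means writing a new column for the products that have it and, if it should be shown, one more entry in the display list. No schema migration is needed.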
A (crude) data map showing our data in practice<br />
Cassandra and PHP<br />This is a PHP User group after all.<br />
Talking to Cassandra<br />Thrift, a low-level framework, is the actual client API for Cassandra<br />In PHP we have two higher-level libraries that work through Thrift<br />phpcassa<br />Pandra<br />Ran out of time to prepare a demo<br />There’s always another time for a demo. Stay tuned.<br />
Any Questions?<br />You will be baked, and there will be cake<br />