C*ollege Credit: Creating Your First App in Java with Cassandra
© All Rights Reserved

    Presentation Transcript

    • Brian O'Neill, Lead Architect, Health Market Science; bone@alumni.brown.edu; @boneill42
    • Background; Setup; Data Model / Schema; Naughty List (Astyanax); Toy List (CQL)
    • Our Problem: Good, bad doctors? Dead doctors? Prescriber eligibility and remediation.
    • The World-Wide, Globally Scalable Naughty List!
        How about a Naughty and Nice list for Santa?
        1.9 billion children
        • That will fit in a single row!
        Queries to support:
        • Children can log in and check their standing.
        • Santa can find nice children by country, state or zip.
    • Installation As easy as…  Download http://cassandra.apache.org/download/  Uncompress tar -xvzf apache-cassandra-1.2.0-beta3-bin.tar.gz  Run bin/cassandra –f (-f puts it in foreground)
    • Configuration conf/cassandra.yamlstart_native_transport: true // CHANGE THIS TO TRUEcommitlog_directory: /var/lib/cassandra/commitlog conf/log4j-server.propertieslog4j.appender.R.File=/var/log/cassandra/system.log
    • Data Model Schema (a.k.a. Keyspace) Table (a.k.a. Column Family) Row  Have arbitrary #‟s of columns  Validator for keys (e.g. UTF8Type) Column  Validator for values and keys  Comparator for keys (e.g. DateType or BYOC) (http://www.youtube.com/watch?v=bKfND4woylw)
    • Distributed Architecture
        Nodes form a token ring.
        Nodes partition the ring by initial token.
        • initial_token: (in cassandra.yaml)
        Partitioners map row keys to tokens.
        • Usually randomly, to evenly distribute the data
        All columns for a row are stored together on disk in sorted order.
    • Visually (diagram: row keys hashed onto the token ring; Token/Hash Range: 0-99)
        Row     Hash
        Alice   50
        Bob     3
        Eve     15 (1-33)
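The ring lookup described on the two slides above can be sketched in plain Java. This is an illustration only, not Cassandra's partitioner: the hash values for Alice, Bob and Eve come from the slide, while the three node names and their tokens (33, 66, 99) are assumptions made up for the example.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy token ring: each node owns the hash range that ends at its token.
class TokenRing {
    // token -> node name, kept sorted so we can search by hash
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(int token, String node) {
        ring.put(token, node);
    }

    // The owner is the first node whose token is >= the row's hash;
    // past the highest token we wrap around to the lowest one.
    String nodeFor(int rowHash) {
        Map.Entry<Integer, String> owner = ring.ceilingEntry(rowHash);
        if (owner == null) {
            owner = ring.firstEntry();
        }
        return owner.getValue();
    }

    // Hypothetical three-node cluster; tokens and names are assumptions.
    static TokenRing threeNodeRing() {
        TokenRing r = new TokenRing();
        r.addNode(33, "Node 1");
        r.addNode(66, "Node 2");
        r.addNode(99, "Node 3");
        return r;
    }

    public static void main(String[] args) {
        TokenRing r = threeNodeRing();
        System.out.println("Alice (50) -> " + r.nodeFor(50));
        System.out.println("Bob (3) -> " + r.nodeFor(3));
        System.out.println("Eve (15) -> " + r.nodeFor(15));
    }
}
```

With these assumed tokens, Bob and Eve land on the same node and Alice on the next one, which is the kind of placement the diagram illustrates.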
    • Java Interpretation
        Each table is a distributed HashMap.
        Each row is a SortedMap.
        Cassandra provides a massively scalable version of:
            HashMap<rowKey, SortedMap<columnKey, columnValue>>
        Implications:
        • Direct row fetch is fast.
        • Searching a range of rows can be costly.
        • Searching a range of columns is cheap.
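To make the analogy concrete, here is a minimal single-process sketch of the map-of-sorted-maps shape. The class name TableAsMap and the sample attribute values are invented for illustration; nothing here talks to Cassandra.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// In-memory stand-in for one table: row key -> (column key -> value), columns kept sorted.
class TableAsMap {
    private final Map<String, SortedMap<String, String>> rows = new HashMap<>();

    void put(String rowKey, String columnKey, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(columnKey, value);
    }

    // Direct row fetch: a single hash lookup by row key.
    SortedMap<String, String> row(String rowKey) {
        SortedMap<String, String> row = rows.get(rowKey);
        return row != null ? row : new TreeMap<String, String>();
    }

    // Column slice: the column keys in [from, to) of one row, in sorted order.
    List<String> sliceKeys(String rowKey, String from, String to) {
        return new ArrayList<>(row(rowKey).subMap(from, to).keySet());
    }

    // Sample row; the attribute values are made up for this sketch.
    static TableAsMap sampleChildren() {
        TableAsMap t = new TableAsMap();
        t.put("owen.oneill", "firstName", "Owen");
        t.put("owen.oneill", "lastName", "O'Neill");
        t.put("owen.oneill", "country", "IRL");
        t.put("owen.oneill", "state", "D");
        t.put("owen.oneill", "zip", "EI33");
        return t;
    }

    public static void main(String[] args) {
        TableAsMap children = sampleChildren();
        System.out.println(children.row("owen.oneill"));
        System.out.println(children.sliceKeys("owen.oneill", "f", "m"));
    }
}
```

The row fetch and the column slice are both cheap here for the same reasons the slide gives: one hash lookup, then a range over already-sorted keys. Finding a range of rows would mean walking the whole HashMap, since its keys are not ordered.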
    • Two Tables
        Children Table
        • Store all the children in the world.
        • One row per child.
        • One column per attribute.
        NaughtyOrNice Table
        • Supports the queries we anticipate
        • Wide-Row Strategy
    • Details of the NaughtyOrNiceList
        One row per standing:country
        • Ensures all children in a country are grouped together on disk.
        One column per child using a compound key
        • Ensures the columns are sorted to support our search at varying levels of granularity
            ○ e.g. All nice children in the US.
            ○ e.g. All naughty children in PA.
    • Visually (diagram: rows of the NaughtyOrNice table placed on three nodes)
        Node 1, row Nice:USA
            CA:94333:johny.b.good
            CA:94333:richie.rich
        Node 2, row Nice:IRL
            D:EI33:collin.oneill
            D:EI33:owen.oneill
        Node 3, row Naughty:USA
            CA:94111:bart.simpson
            CA:94222:dennis.menace
            PA:18964:michael.myers
        Query path: (1) Go to the row. (2) Get the column slice.
        Watch out for:
        • Hot spotting
        • Unbalanced clusters
    • Our Schema bin/cqlsh -3  CREATE KEYSPACE northpole WITH replication = {class:SimpleStrategy, replication_factor:1};  create table children ( childId varchar, firstName varchar, lastName varchar, timezone varchar, country varchar, state varchar, zip varchar, primary key (childId ) ) WITH COMPACT STORAGE;  create table naughtyOrNiceList ( standingByZone varchar, country varchar, state varchar, zip varchar, childId varchar, primary key (standingByZone, country, state, zip, childId) ); bin/cassandra-cli  (the “old school” interface)
    • The CQL -> Data Model Rules
        The first component of the primary key becomes the row key.
        Subsequent components of the primary key form a composite column name.
        One column is then written for each non-primary key column.
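The three rules can be sketched as a small Java helper. This is only an illustration of the mapping, not code from Cassandra: the class and method names are hypothetical, the ":" separator simply imitates how cassandra-cli prints composite names, and treating an empty column name as the marker written when a table has no non-key columns is an interpretation of the trailing ":" in the CLI output.

```java
import java.util.Arrays;
import java.util.List;

// Maps the primary key values of one CQL row onto the storage-level names.
class CqlRowMapping {
    // Rule 1: the first primary key component is the row key.
    static String rowKey(List<String> primaryKeyValues) {
        return primaryKeyValues.get(0);
    }

    // Rules 2 and 3: the remaining primary key values, followed by the CQL
    // column name, form the composite column name for that column.
    static String columnName(List<String> primaryKeyValues, String cqlColumn) {
        List<String> rest = primaryKeyValues.subList(1, primaryKeyValues.size());
        if (rest.isEmpty()) {
            return cqlColumn; // no composite prefix when the key has one component
        }
        return String.join(":", rest) + ":" + cqlColumn;
    }

    public static void main(String[] args) {
        List<String> key = Arrays.asList("nice:IRL", "D", "EI33", "owen.oneill");
        System.out.println("RowKey: " + rowKey(key));
        // Empty column name: the row has no non-primary key columns.
        System.out.println("column=" + columnName(key, ""));
        // "toy" is a hypothetical extra column, not part of the slide's schema.
        System.out.println("column=" + columnName(key, "toy"));
    }
}
```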
    • CQL View
        cqlsh:northpole> select * from naughtyornicelist;

         standingbycountry | state | zip   | childid
        -------------------+-------+-------+---------------
         naughty:USA       | CA    | 94111 | bart.simpson
         naughty:USA       | CA    | 94222 | dennis.menace
         nice:IRL          | D     | EI33  | collin.oneill
         nice:IRL          | D     | EI33  | owen.oneill
         nice:USA          | CA    | 94333 | johny.b.good
         nice:USA          | CA    | 94333 | richie.rich
    • CLI View
        [default@northpole] list naughtyornicelist;
        Using default limit of 100
        Using default column limit of 100
        -------------------
        RowKey: naughty:USA
        => (column=CA:94111:bart.simpson:, value=, timestamp=1355168971612000)
        => (column=CA:94222:dennis.menace:, value=, timestamp=1355168971614000)
        -------------------
        RowKey: nice:IRL
        => (column=D:EI33:collin.oneill:, value=, timestamp=1355168971604000)
        => (column=D:EI33:owen.oneill:, value=, timestamp=1355168971601000)
        -------------------
        RowKey: nice:USA
        => (column=CA:94333:johny.b.good:, value=, timestamp=1355168971610000)
        => (column=CA:94333:richie.rich:, value=, timestamp=1355168971606000)
    • Data Model Implications
        select * from children where childid='owen.oneill';
        select * from naughtyornicelist where childid='owen.oneill';
            Bad Request:
        select * from naughtyornicelist
            where standingbycountry='nice:IRL' and state='D' and zip='EI33'
            and childid='owen.oneill';
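A rough rule of thumb behind these three results is that the restricted columns must cover the primary key from the left, with no gaps. The sketch below encodes only that rule; it is a deliberate simplification (equality restrictions only, no secondary indexes, no filtering options) and is not how Cassandra validates queries. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified check: does the WHERE clause restrict a gap-free prefix of the primary key?
class PrimaryKeyPrefixCheck {
    static boolean isAnswerable(List<String> primaryKey, Set<String> restricted) {
        int matched = 0;
        for (String column : primaryKey) {
            if (!restricted.contains(column)) {
                break; // first unrestricted key column ends the prefix
            }
            matched++;
        }
        // Every restricted column must belong to that leading prefix.
        return matched == restricted.size();
    }

    public static void main(String[] args) {
        List<String> listKey = Arrays.asList("standingbycountry", "state", "zip", "childid");
        Set<String> onlyChild = new HashSet<>(Arrays.asList("childid"));
        Set<String> fullKey = new HashSet<>(listKey);
        System.out.println("childid only: " + isAnswerable(listKey, onlyChild));
        System.out.println("full key:     " + isAnswerable(listKey, fullKey));
    }
}
```

Under this rule the children lookup passes because childid is the whole primary key there, while the same restriction on naughtyornicelist skips the row key and is rejected.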
    • No, seriously. Let's code! What API should we use?
                          Production-Readiness   Potential   Momentum
        Thrift            10                     -1          -1
        Hector            10                     8           8
        Astyanax          8                      9           10
        Kundera (JPA)     6                      9           9
        Pelops            7                      6           7
        Firebrand         8                      10          8
        PlayORM           5                      8           7
        GORA              6                      9           7
        CQL Driver        ?                      ?           ?
        Astyanax FTW!
    • Connect
        this.astyanaxContext = new AstyanaxContext.Builder()
            .forCluster("ClusterName")
            .forKeyspace(keyspace)
            .withAstyanaxConfiguration(…)
            .withConnectionPoolConfiguration(…)
            .buildKeyspace(ThriftFamilyFactory.getInstance());
        Specify:
        • Cluster Name (arbitrary identifier)
        • Keyspace
        • Node Discovery Method
        • Connection Pool Information
    • Write/Update
        MutationBatch mutation = keyspace.prepareMutationBatch();
        columnFamily = new ColumnFamily<String, String>(columnFamilyName,
            StringSerializer.get(), StringSerializer.get());
        mutation.withRow(columnFamily, rowKey)
            .putColumn(entry.getKey(), entry.getValue(), null);
        mutation.execute();
        Process:
        • Create a mutation.
        • Specify the Column Family with Serializers.
        • Put your columns.
        • Execute.
    • Composite Types
        Composite (a.k.a. Compound)
        public class ListEntry {
            @Component(ordinal = 0)
            public String state;
            @Component(ordinal = 1)
            public String zip;
            @Component(ordinal = 2)
            public String childId;
        }
    • Range Builders
        range = entitySerializer.buildRange()
            .withPrefix(state)
            .greaterThanEquals("")
            .lessThanEquals("99999");
        Then...
            .withColumnRange(range).execute();
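What such a range selects can be imitated with a sorted set in plain Java. This sketch does not use Astyanax; it only mirrors the idea of "fix the state prefix, then bound the zip". The class CompositeRange, its nested Entry type, and the use of the highest char value as an upper sentinel for childId are assumptions made for the example, and the sample columns come from the Node 3 row in the earlier diagram.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Prefix-plus-range slice over composite column names kept in sorted order.
class CompositeRange {
    // Mirrors the ListEntry composite: ordered by state, then zip, then childId.
    static final class Entry implements Comparable<Entry> {
        final String state;
        final String zip;
        final String childId;

        Entry(String state, String zip, String childId) {
            this.state = state;
            this.zip = zip;
            this.childId = childId;
        }

        @Override
        public int compareTo(Entry o) {
            int c = state.compareTo(o.state);
            if (c == 0) c = zip.compareTo(o.zip);
            if (c == 0) c = childId.compareTo(o.childId);
            return c;
        }

        @Override
        public String toString() {
            return state + ":" + zip + ":" + childId;
        }
    }

    // All columns with the given state whose zip lies in [zipFrom, zipTo].
    static List<String> slice(TreeSet<Entry> columns, String state, String zipFrom, String zipTo) {
        Entry low = new Entry(state, zipFrom, "");
        Entry high = new Entry(state, zipTo, "\uffff"); // sentinel: sorts after any real childId
        List<String> out = new ArrayList<>();
        for (Entry e : columns.subSet(low, true, high, true)) {
            out.add(e.toString());
        }
        return out;
    }

    static TreeSet<Entry> sampleColumns() {
        TreeSet<Entry> columns = new TreeSet<>();
        columns.add(new Entry("PA", "18964", "michael.myers"));
        columns.add(new Entry("CA", "94222", "dennis.menace"));
        columns.add(new Entry("CA", "94111", "bart.simpson"));
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(slice(sampleColumns(), "CA", "", "99999"));
        System.out.println(slice(sampleColumns(), "PA", "", "99999"));
    }
}
```

Because the columns are already sorted by the composite, the slice is a contiguous run, which is why a column range within one row is cheap.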
    • CQL Collections!
        http://www.datastax.com/dev/blog/cql3_collections
        Set
        • UPDATE users SET emails = emails + {'fb@friendsofmordor.org'} WHERE user_id = 'frodo';
        List
        • UPDATE users SET top_places = [ 'the shire' ] + top_places WHERE user_id = 'frodo';
        Maps
        • UPDATE users SET todo['2012-10-2 12:10'] = 'die' WHERE user_id = 'frodo';
    • CQL vs. Thrift
        http://www.datastax.com/dev/blog/thrift-to-cql3
        Thrift is the legacy API on which all of the Java APIs are built.
        CQL is the new native protocol and driver.
    • Let's get back to cranking…
        Recreate the schema (to be CQL friendly).
            UPDATE children SET toys = toys + [ 'legos' ] WHERE childId = 'owen.oneill';
        Crank out a Dao layer to use CQL collections operations.
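As a starting point for such a Dao, the list-append statement above can be generated by a small helper. This is a sketch under loud assumptions: the class ToyListCql and its methods are hypothetical, it only builds the CQL text and does not connect to Cassandra, and a real Dao would more likely hand values to a driver as bound parameters than splice them into a string.

```java
// Builds the CQL text for appending a toy to a child's list; no driver involved.
class ToyListCql {
    // CQL string literals escape a single quote by doubling it.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Produces the same statement shape as the UPDATE on the slide.
    static String appendToy(String childId, String toy) {
        return "UPDATE children SET toys = toys + [ " + quote(toy)
                + " ] WHERE childId = " + quote(childId) + ";";
    }

    public static void main(String[] args) {
        System.out.println(appendToy("owen.oneill", "legos"));
    }
}
```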
    • Shameless Shoutout(s)
        Virgil: https://github.com/boneill42/virgil
        • REST interface for Cassandra
        https://github.com/boneill42/storm-cassandra
        • Distributed Processing on Cassandra
        • (Webinar in January)