C*ollege Credit: Creating Your First App in Java with Cassandra


Transcript

  • 1. Brian O'Neill, Lead Architect, Health Market Science bone@alumni.brown.edu @boneill42
  • 2. Background; Setup; Data Model / Schema; Naughty List (Astyanax); Toy List (CQL)
  • 3. Our Problem
    • Good, bad doctors? Dead doctors?
    • Prescriber eligibility and remediation.
  • 4. The World-Wide Globally Scalable Naughty List!
    • How about a Naughty and Nice list for Santa?
    • 1.9 billion children
      ○ That will fit in a single row!
    • Queries to support:
      ○ Children can log in and check their standing.
      ○ Santa can find nice children by country, state or zip.
  • 5. Installation
    As easy as…
    • Download: http://cassandra.apache.org/download/
    • Uncompress: tar -xvzf apache-cassandra-1.2.0-beta3-bin.tar.gz
    • Run: bin/cassandra -f (-f puts it in the foreground)
  • 6. Configuration
    • conf/cassandra.yaml
      start_native_transport: true   # CHANGE THIS TO TRUE
      commitlog_directory: /var/lib/cassandra/commitlog
    • conf/log4j-server.properties
      log4j.appender.R.File=/var/log/cassandra/system.log
  • 7. Data Model
    • Schema (a.k.a. Keyspace)
    • Table (a.k.a. Column Family)
    • Row
      ○ Has an arbitrary number of columns
      ○ Validator for keys (e.g. UTF8Type)
    • Column
      ○ Validator for values and keys
      ○ Comparator for keys (e.g. DateType or BYOC)
    (http://www.youtube.com/watch?v=bKfND4woylw)
  • 8. Distributed Architecture
    • Nodes form a token ring.
    • Nodes partition the ring by initial token
      ○ initial_token: (in cassandra.yaml)
    • Partitioners map row keys to tokens.
      ○ Usually randomly, to evenly distribute the data
    • All columns for a row are stored together on disk in sorted order.
  • 9. Visually
    Token/Hash Range: 0-99
    Row    Hash
    Alice  50
    Bob    3
    Eve    15 (1-33)
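The token ring on the two slides above can be sketched in plain Java. This is only an illustration, not Cassandra's partitioner: the three node names, their upper token bounds (33, 66, 99) and the hash-to-token function are assumptions chosen to fit the 0-99 range on the slide.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a toy token ring over the range 0-99.
// Node names and token bounds are assumptions, not values from a real cluster.
class TokenRingSketch {
    // Maps each node's upper token bound (inclusive) to the node name.
    private final TreeMap<Integer, String> ring = new TreeMap<Integer, String>();

    public void addNode(int upperToken, String nodeName) {
        ring.put(upperToken, nodeName);
    }

    // Returns the node whose range contains the token. Tokens above the
    // highest bound wrap around to the first node, as on a ring.
    public String nodeFor(int token) {
        Map.Entry<Integer, String> owner = ring.ceilingEntry(token);
        return owner != null ? owner.getValue() : ring.firstEntry().getValue();
    }

    // Stand-in partitioner: reduces the key's hash code to a token in 0-99.
    public static int tokenFor(String rowKey) {
        return Math.floorMod(rowKey.hashCode(), 100);
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch();
        ring.addNode(33, "node1");
        ring.addNode(66, "node2");
        ring.addNode(99, "node3");
        // Tokens taken from the slide: Alice=50, Bob=3, Eve=15.
        System.out.println("Alice -> " + ring.nodeFor(50));
        System.out.println("Bob   -> " + ring.nodeFor(3));
        System.out.println("Eve   -> " + ring.nodeFor(15));
    }
}
```

With these assumed bounds, Bob and Eve land on the same node while Alice lands on another, which is the kind of placement the slide's table depicts.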
  • 10. Java Interpretation
    • Each table is a distributed HashMap.
    • Each row is a SortedMap.
    Cassandra provides a massively scalable version of:
    HashMap<rowKey, SortedMap<columnKey, columnValue>>
    Implications:
    • Direct row fetch is fast.
    • Searching a range of rows can be costly.
    • Searching a range of columns is cheap.
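The three implications on this slide follow directly from the map-of-sorted-maps picture. Here is a minimal single-process sketch of it; the class name, method names and sample values are hypothetical and have nothing to do with Cassandra's real storage engine.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative only: a table as HashMap<rowKey, SortedMap<columnKey, columnValue>>.
class TableSketch {
    private final Map<String, SortedMap<String, String>> rows =
            new HashMap<String, SortedMap<String, String>>();

    public void put(String rowKey, String columnKey, String value) {
        SortedMap<String, String> row = rows.get(rowKey);
        if (row == null) {
            row = new TreeMap<String, String>();
            rows.put(rowKey, row);
        }
        row.put(columnKey, value);
    }

    // Direct row fetch: one hash lookup.
    public SortedMap<String, String> getRow(String rowKey) {
        SortedMap<String, String> row = rows.get(rowKey);
        return row != null ? row : new TreeMap<String, String>();
    }

    // Column range inside one row: a view over the already-sorted columns.
    public SortedMap<String, String> columnSlice(String rowKey, String from, String toExclusive) {
        return getRow(rowKey).subMap(from, toExclusive);
    }

    // Row range: the HashMap keeps no key order, so every row key is examined.
    public List<String> rowKeysBetween(String from, String toExclusive) {
        List<String> matches = new ArrayList<String>();
        for (String key : rows.keySet()) {
            if (key.compareTo(from) >= 0 && key.compareTo(toExclusive) < 0) {
                matches.add(key);
            }
        }
        Collections.sort(matches);
        return matches;
    }

    public static void main(String[] args) {
        TableSketch children = new TableSketch();
        children.put("owen.oneill", "country", "IRL");
        children.put("owen.oneill", "state", "D");
        children.put("owen.oneill", "zip", "EI33");
        System.out.println(children.getRow("owen.oneill"));
        System.out.println(children.columnSlice("owen.oneill", "country", "zip"));
    }
}
```

The row-range method has to visit every key, which is why range scans over rows are the expensive case in this model.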
  • 11. Two Tables
    • Children Table
      ○ Stores all the children in the world.
      ○ One row per child.
      ○ One column per attribute.
    • NaughtyOrNice Table
      ○ Supports the queries we anticipate
      ○ Wide-Row Strategy
  • 12. Details of the NaughtyOrNiceList
    • One row per standing:country
      ○ Ensures all children in a country are grouped together on disk.
    • One column per child using a compound key
      ○ Ensures the columns are sorted to support our search at varying levels of granularity
        e.g. All nice children in the US.
        e.g. All naughty children in PA.
  • 13. Visually
    (1) Go to the row. (2) Get the column slice.
    Node 1: Nice:USA
      CA:94333:johny.b.good
      CA:94333:richie.rich
    Node 2: Nice:IRL
      D:EI33:collin.oneill
      D:EI33:owen.oneill
    Node 3: Nice:USA
      CA:94111:bart.simpson
      CA:94222:dennis.menace
      PA:18964:michael.myers
    Watch out for:
    • Hot spotting
    • Unbalanced Clusters
  • 14. Our Schema
    • bin/cqlsh -3
      CREATE KEYSPACE northpole
        WITH replication = {'class':'SimpleStrategy', 'replication_factor':1};
      create table children (
        childId varchar,
        firstName varchar,
        lastName varchar,
        timezone varchar,
        country varchar,
        state varchar,
        zip varchar,
        primary key (childId)
      ) WITH COMPACT STORAGE;
      create table naughtyOrNiceList (
        standingByZone varchar,
        country varchar,
        state varchar,
        zip varchar,
        childId varchar,
        primary key (standingByZone, country, state, zip, childId)
      );
    • bin/cassandra-cli
      ○ (the “old school” interface)
  • 15. The CQL -> Data Model Rules
    • The first primary key component becomes the row key.
    • Subsequent components of the primary key form a composite column name.
    • One column is then written for each non-primary key column.
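The three rules above can be written as a small Java helper. This is a sketch only: real composite column names are binary-encoded, and the colon-joined string here merely imitates how cassandra-cli prints them. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: maps a CQL primary key onto the row key / column name
// layout, using ':' as a printable separator (the real encoding is binary).
class CqlMappingSketch {
    // Rule 1: the first primary key component is the row key.
    public static String rowKey(List<String> primaryKey) {
        return primaryKey.get(0);
    }

    // Rule 2: the remaining components form the composite column name prefix.
    public static String columnPrefix(List<String> primaryKey) {
        StringBuilder prefix = new StringBuilder();
        for (int i = 1; i < primaryKey.size(); i++) {
            prefix.append(primaryKey.get(i)).append(':');
        }
        return prefix.toString();
    }

    // Rule 3: one column per non-primary-key column, named prefix + column name.
    // An empty name models a row with no non-key columns.
    public static String columnName(List<String> primaryKey, String nonKeyColumn) {
        return columnPrefix(primaryKey) + nonKeyColumn;
    }

    public static void main(String[] args) {
        List<String> pk = Arrays.asList("naughty:USA", "CA", "94111", "bart.simpson");
        System.out.println("RowKey: " + rowKey(pk));
        System.out.println("column=" + columnName(pk, ""));
    }
}
```

For a table whose columns are all part of the primary key, the last component is empty, which would explain the trailing colon in the CLI listing that follows.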
  • 16. CQL View
    cqlsh:northpole> select * from naughtyornicelist ;
     standingbycountry | state | zip   | childid
    -------------------+-------+-------+---------------
     naughty:USA       | CA    | 94111 | bart.simpson
     naughty:USA       | CA    | 94222 | dennis.menace
     nice:IRL          | D     | EI33  | collin.oneill
     nice:IRL          | D     | EI33  | owen.oneill
     nice:USA          | CA    | 94333 | johny.b.good
     nice:USA          | CA    | 94333 | richie.rich
  • 17. CLI View
    [default@northpole] list naughtyornicelist;
    Using default limit of 100
    Using default column limit of 100
    -------------------
    RowKey: naughty:USA
    => (column=CA:94111:bart.simpson:, value=, timestamp=1355168971612000)
    => (column=CA:94222:dennis.menace:, value=, timestamp=1355168971614000)
    -------------------
    RowKey: nice:IRL
    => (column=D:EI33:collin.oneill:, value=, timestamp=1355168971604000)
    => (column=D:EI33:owen.oneill:, value=, timestamp=1355168971601000)
    -------------------
    RowKey: nice:USA
    => (column=CA:94333:johny.b.good:, value=, timestamp=1355168971610000)
    => (column=CA:94333:richie.rich:, value=, timestamp=1355168971606000)
  • 18. Data Model Implications
    select * from children where childid='owen.oneill';
    select * from naughtyornicelist where childid='owen.oneill';
    Bad Request:
    select * from naughtyornicelist where standingbycountry='nice:IRL' and state='D' and zip='EI33' and childid='owen.oneill';
  • 19. No, seriously. Let's code! What API should we use?
    API             Production-Readiness  Potential  Momentum
    Thrift          10                    -1         -1
    Hector          10                    8          8
    Astyanax        8                     9          10
    Kundera (JPA)   6                     9          9
    Pelops          7                     6          7
    Firebrand       8                     10         8
    PlayORM         5                     8          7
    GORA            6                     9          7
    CQL Driver      ?                     ?          ?
    Astyanax FTW!
  • 20. Connect
    this.astyanaxContext = new AstyanaxContext.Builder()
        .forCluster("ClusterName")
        .forKeyspace(keyspace)
        .withAstyanaxConfiguration(…)
        .withConnectionPoolConfiguration(…)
        .buildKeyspace(ThriftFamilyFactory.getInstance());
    Specify:
    • Cluster Name (arbitrary identifier)
    • Keyspace
    • Node Discovery Method
    • Connection Pool Information
  • 21. Write/Update
    MutationBatch mutation = keyspace.prepareMutationBatch();
    columnFamily = new ColumnFamily<String, String>(columnFamilyName,
        StringSerializer.get(), StringSerializer.get());
    mutation.withRow(columnFamily, rowKey)
        .putColumn(entry.getKey(), entry.getValue(), null);
    mutation.execute();
    Process:
    • Create a mutation
    • Specify the Column Family with Serializers
    • Put your columns.
    • Execute
  • 22. Composite Types
    Composite (a.k.a. Compound)
    public class ListEntry {
        @Component(ordinal = 0)
        public String state;
        @Component(ordinal = 1)
        public String zip;
        @Component(ordinal = 2)
        public String childId;
    }
  • 23. Range Builders
    range = entitySerializer.buildRange()
        .withPrefix(state)
        .greaterThanEquals("")
        .lessThanEquals("99999");
    Then...
    .withColumnRange(range).execute();
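To see what a prefix-plus-bounds range like the one above would select, here is a plain-Java imitation over a sorted set of composite keys. It does not call Astyanax. The ListEntry class mirrors only the three fields from the Composite Types slide, the slice helper is hypothetical, and the sample entries are borrowed from earlier slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Illustrative only: imitates a composite column slice with a TreeSet.
class CompositeSliceSketch {
    // Three components, compared in order: state, then zip, then childId.
    static final class ListEntry implements Comparable<ListEntry> {
        final String state;
        final String zip;
        final String childId;

        ListEntry(String state, String zip, String childId) {
            this.state = state;
            this.zip = zip;
            this.childId = childId;
        }

        @Override
        public int compareTo(ListEntry other) {
            int c = state.compareTo(other.state);
            if (c == 0) c = zip.compareTo(other.zip);
            if (c == 0) c = childId.compareTo(other.childId);
            return c;
        }

        @Override
        public String toString() {
            return state + ":" + zip + ":" + childId;
        }
    }

    // One wide row's columns, kept in sorted order.
    public static TreeSet<ListEntry> sampleRow() {
        TreeSet<ListEntry> columns = new TreeSet<ListEntry>();
        columns.add(new ListEntry("PA", "18964", "michael.myers"));
        columns.add(new ListEntry("CA", "94222", "dennis.menace"));
        columns.add(new ListEntry("CA", "94111", "bart.simpson"));
        return columns;
    }

    // Selects columns whose state equals the prefix and whose zip lies in
    // [zipFrom, zipTo]. Seeks to the start, then reads until out of range.
    public static List<String> slice(TreeSet<ListEntry> columns, String state,
                                     String zipFrom, String zipTo) {
        List<String> result = new ArrayList<String>();
        ListEntry start = new ListEntry(state, zipFrom, "");
        for (ListEntry entry : columns.tailSet(start, true)) {
            if (!entry.state.equals(state) || entry.zip.compareTo(zipTo) > 0) {
                break;
            }
            result.add(entry.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(slice(sampleRow(), "CA", "", "99999"));
    }
}
```

Because the columns are sorted by state first, the loop can stop at the first entry outside the requested state or zip range instead of scanning the whole row.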
  • 24. CQL Collections!
    http://www.datastax.com/dev/blog/cql3_collections
    • Set
      ○ UPDATE users SET emails = emails + {'fb@friendsofmordor.org'} WHERE user_id = 'frodo';
    • List
      ○ UPDATE users SET top_places = [ 'the shire' ] + top_places WHERE user_id = 'frodo';
    • Maps
      ○ UPDATE users SET todo['2012-10-2 12:10'] = 'die' WHERE user_id = 'frodo';
  • 25. CQL vs. Thrift
    http://www.datastax.com/dev/blog/thrift-to-cql3
    • Thrift is the legacy API on which all of the Java APIs are built.
    • CQL is the new native protocol and driver.
  • 26. Let's get back to cranking…
    • Recreate the schema (to be CQL friendly)
      ○ UPDATE children SET toys = toys + [ 'legos' ] WHERE childId = 'owen.oneill';
    • Crank out a Dao layer to use CQL collections operations.
  • 27. Shameless Shoutout(s)
    • Virgil: https://github.com/boneill42/virgil
      ○ REST interface for Cassandra
    • https://github.com/boneill42/storm-cassandra
      ○ Distributed Processing on Cassandra
      ○ (Webinar in January)