C*ollege Credit: Creating Your First App in Java with Cassandra
  1. Brian O'Neill, Lead Architect, Health Market Science (bone@alumni.brown.edu, @boneill42)
  2. Background, Setup, Data Model / Schema, Naughty List (Astyanax), Toy List (CQL)
  3. Our Problem: Good, bad doctors? Dead doctors? Prescriber eligibility and remediation.
  4. The World-Wide, Globally Scalable Naughty List!
      How about a Naughty and Nice list for Santa?
      • 1.9 billion children
      • That will fit in a single row!
      Queries to support:
      • Children can log in and check their standing.
      • Santa can find nice children by country, state, or zip.
  5. 5. Installation As easy as…  Download http://cassandra.apache.org/download/  Uncompress tar -xvzf apache-cassandra-1.2.0-beta3-bin.tar.gz  Run bin/cassandra –f (-f puts it in foreground)
  6. 6. Configuration conf/cassandra.yamlstart_native_transport: true // CHANGE THIS TO TRUEcommitlog_directory: /var/lib/cassandra/commitlog conf/log4j-server.propertieslog4j.appender.R.File=/var/log/cassandra/system.log
  7. 7. Data Model Schema (a.k.a. Keyspace) Table (a.k.a. Column Family) Row  Have arbitrary #‟s of columns  Validator for keys (e.g. UTF8Type) Column  Validator for values and keys  Comparator for keys (e.g. DateType or BYOC) (http://www.youtube.com/watch?v=bKfND4woylw)
  8. Distributed Architecture
      • Nodes form a token ring.
      • Nodes partition the ring by initial token.
        ○ initial_token: (in cassandra.yaml)
      • Partitioners map row keys to tokens.
        ○ Usually randomly, to distribute the data evenly.
      • All columns for a row are stored together on disk, in sorted order.
  9. Visually
      Token/hash range: 0-99
        Row     Hash
        Alice   50
        Bob     3
        Eve     15
      Example node range from the diagram: (1-33)
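The ring lookup on slides 8 and 9 can be sketched in plain Java. This is a toy model, not Cassandra code: it assumes a 0-99 token space, three hypothetical nodes (node1 to node3) with initial tokens 33, 66 and 99, and uses the hash values from the slide directly as tokens. Each node owns the range from the previous node's token (exclusive) up to its own token (inclusive), wrapping around at the end of the ring.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy token ring (slides 8-9). Assumptions: 0-99 token space, hypothetical
// nodes, and the slide's hash values used directly as tokens.
// A node with initial_token T owns the range (previous token, T].
class TokenRing {
    // initial_token -> node name, kept sorted so the ring can be walked
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(int initialToken, String node) {
        ring.put(initialToken, node);
    }

    // Owner = first node whose token is >= the row's token; past the
    // highest token, wrap around to the node with the lowest token.
    String ownerOf(int token) {
        Map.Entry<Integer, String> e = ring.ceilingEntry(token);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        TokenRing r = new TokenRing();
        r.addNode(33, "node1");  // owns 0-33
        r.addNode(66, "node2");  // owns 34-66
        r.addNode(99, "node3");  // owns 67-99
        // Hash values from the slide: Alice=50, Bob=3, Eve=15
        System.out.println("Alice -> " + r.ownerOf(50));
        System.out.println("Bob   -> " + r.ownerOf(3));
        System.out.println("Eve   -> " + r.ownerOf(15));
    }
}
```

With these assumed tokens, Bob (3) and Eve (15) land on node1 and Alice (50) on node2. Keeping the tokens in a sorted map turns the lookup into a single ceiling search.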
  10. Java Interpretation
      • Each table is a distributed HashMap.
      • Each row is a SortedMap.
      Cassandra provides a massively scalable version of:
          HashMap<rowKey, SortedMap<columnKey, columnValue>>
      Implications:
      • Direct row fetch is fast.
      • Searching a range of rows can be costly.
      • Searching a range of columns is cheap.
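A single-process sketch of the HashMap<rowKey, SortedMap<columnKey, columnValue>> view makes the three implications concrete. The class and method names are hypothetical; a real cluster spreads the outer map across nodes, but the relative costs follow the same pattern.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// In-memory stand-in for "a distributed HashMap of SortedMaps" (slide 10).
class TableModel {
    // rowKey -> (columnKey -> columnValue); columns within a row stay sorted
    private final Map<String, SortedMap<String, String>> rows = new HashMap<>();

    void put(String rowKey, String columnKey, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(columnKey, value);
    }

    // Direct row fetch: a single hash lookup (fast).
    SortedMap<String, String> getRow(String rowKey) {
        return rows.getOrDefault(rowKey, new TreeMap<>());
    }

    // Range of columns inside one row: a sorted sub-map view (cheap).
    // fromKey is inclusive, toKey is exclusive.
    SortedMap<String, String> columnSlice(String rowKey, String fromKey, String toKey) {
        return getRow(rowKey).subMap(fromKey, toKey);
    }

    // Range of rows: hash order says nothing about key order, so every
    // row key has to be examined (costly).
    int countRowsBetween(String fromKey, String toKey) {
        int n = 0;
        for (String k : rows.keySet()) {
            if (k.compareTo(fromKey) >= 0 && k.compareTo(toKey) < 0) {
                n++;
            }
        }
        return n;
    }
}
```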
  11. Two Tables
      • Children table
        ○ Store all the children in the world.
        ○ One row per child.
        ○ One column per attribute.
      • NaughtyOrNice table
        ○ Supports the queries we anticipate.
        ○ Wide-row strategy.
  12. Details of the NaughtyOrNiceList
      • One row per standing:country
        ○ Ensures all children in a country are grouped together on disk.
      • One column per child, using a compound key
        ○ Ensures the columns are sorted to support our search at varying levels of granularity,
          e.g. all nice children in the US, or all naughty children in PA.
  13. Visually
      Node 1, row Nice:USA
        CA:94333:johny.b.good
        CA:94333:richie.rich
      Node 2, row Nice:IRL
        D:EI33:collin.oneill
        D:EI33:owen.oneill
      Node 3, row Naughty:USA
        CA:94111:bart.simpson
        CA:94222:dennis.menace
        PA:18964:michael.myers
      To query: (1) go to the row, (2) get the column slice.
      Watch out for:
      • Hot spotting
      • Unbalanced clusters
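The row layout above can be mimicked with one sorted set of column names per row key. This sketch assumes the composite column names are flattened into "state:zip:childId" strings (as the CLI prints them); real composite columns compare component by component. The class name NaughtyOrNiceList and its methods are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy wide-row layout for the naughty/nice list (slides 12-13).
// Assumption: composite column names flattened to "state:zip:childId".
class NaughtyOrNiceList {
    private final Map<String, TreeSet<String>> rows = new HashMap<>();

    // One row per standing:country, e.g. "nice:USA"
    static String rowKey(String standing, String country) {
        return standing + ":" + country;
    }

    void add(String standing, String country, String state, String zip, String childId) {
        rows.computeIfAbsent(rowKey(standing, country), k -> new TreeSet<>())
            .add(state + ":" + zip + ":" + childId);
    }

    // Go to the row, then take the slice of columns sharing a prefix.
    // An empty prefix returns the whole row (e.g. all nice children in the USA);
    // "PA:" narrows it to one state.
    NavigableSet<String> slice(String standing, String country, String prefix) {
        TreeSet<String> row = rows.getOrDefault(rowKey(standing, country), new TreeSet<>());
        // Names starting with prefix sort between prefix and prefix + Character.MAX_VALUE
        // (assuming no column name contains Character.MAX_VALUE).
        return row.subSet(prefix, true, prefix + Character.MAX_VALUE, false);
    }
}
```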
  14. Our Schema
      bin/cqlsh -3
          CREATE KEYSPACE northpole WITH replication = {'class':'SimpleStrategy', 'replication_factor':1};
          USE northpole;

          create table children (
              childId varchar, firstName varchar, lastName varchar, timezone varchar,
              country varchar, state varchar, zip varchar,
              primary key (childId)
          ) WITH COMPACT STORAGE;

          create table naughtyOrNiceList (
              standingByCountry varchar, state varchar, zip varchar, childId varchar,
              primary key (standingByCountry, state, zip, childId)
          );
      bin/cassandra-cli (the "old school" interface)
  15. The CQL -> Data Model Rules
      • The first component of the primary key becomes the row key.
      • The remaining components of the primary key form a composite column name.
      • One column is then written for each non-primary-key column.
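The three rules can be written down as a small function that turns one CQL row into a storage row key plus column names, formatted the way the CLI view on slide 17 prints them. Assumptions: ':' separates composite components, the table is a regular (non-COMPACT STORAGE) CQL3 table, and each row also gets the cell with an empty final component and an empty value that the CLI view shows for naughtyornicelist (treated here as a row marker). The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the CQL -> storage mapping rules (slide 15), CLI-style names.
class CqlToStorage {
    // Rule 1: the first primary key component becomes the row key.
    static String rowKey(List<String> primaryKey, Map<String, String> row) {
        return row.get(primaryKey.get(0));
    }

    // Rules 2 and 3: remaining key components form the composite prefix,
    // and one cell is written per non-primary-key column.
    static Map<String, String> columns(List<String> primaryKey, Map<String, String> row) {
        StringBuilder sb = new StringBuilder();
        for (String pk : primaryKey.subList(1, primaryKey.size())) {
            sb.append(row.get(pk)).append(':');
        }
        String prefix = sb.toString();
        Map<String, String> cells = new LinkedHashMap<>();
        // Assumed row marker: empty last component, empty value (as in the CLI view)
        cells.put(prefix, "");
        for (Map.Entry<String, String> e : row.entrySet()) {
            if (!primaryKey.contains(e.getKey())) {
                cells.put(prefix + e.getKey(), e.getValue());
            }
        }
        return cells;
    }
}
```

For the naughtyornicelist row (naughty:USA, CA, 94111, bart.simpson) this yields row key naughty:USA and the single column CA:94111:bart.simpson:, matching the CLI view.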
  16. CQL View
      cqlsh:northpole> select * from naughtyornicelist ;

       standingbycountry | state | zip   | childid
      -------------------+-------+-------+---------------
       naughty:USA       | CA    | 94111 | bart.simpson
       naughty:USA       | CA    | 94222 | dennis.menace
       nice:IRL          | D     | EI33  | collin.oneill
       nice:IRL          | D     | EI33  | owen.oneill
       nice:USA          | CA    | 94333 | johny.b.good
       nice:USA          | CA    | 94333 | richie.rich
  17. CLI View
      [default@northpole] list naughtyornicelist;
      Using default limit of 100
      Using default column limit of 100
      -------------------
      RowKey: naughty:USA
      => (column=CA:94111:bart.simpson:, value=, timestamp=1355168971612000)
      => (column=CA:94222:dennis.menace:, value=, timestamp=1355168971614000)
      -------------------
      RowKey: nice:IRL
      => (column=D:EI33:collin.oneill:, value=, timestamp=1355168971604000)
      => (column=D:EI33:owen.oneill:, value=, timestamp=1355168971601000)
      -------------------
      RowKey: nice:USA
      => (column=CA:94333:johny.b.good:, value=, timestamp=1355168971610000)
      => (column=CA:94333:richie.rich:, value=, timestamp=1355168971606000)
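  18. Data Model Implications
          select * from children where childid='owen.oneill';
          select * from naughtyornicelist where childid='owen.oneill';
          → Bad Request
          select * from naughtyornicelist where standingbycountry='nice:IRL'
              and state='D' and zip='EI33' and childid='owen.oneill';
      The second query fails because it does not restrict the row key (standingbycountry); the third supplies every primary key component.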
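The rule behind the Bad Request can be approximated as a check on which primary key columns a query restricts. This is a simplified model under stated assumptions: equality restrictions only, no secondary indexes, and no ALLOW FILTERING. The partition key (the row key) must be restricted, and clustering columns may only be restricted as an unbroken prefix in primary-key order. The class name QueryCheck is hypothetical.

```java
import java.util.List;
import java.util.Set;

// Simplified servability check for equality-only SELECTs (slide 18).
// Assumes no secondary indexes and no ALLOW FILTERING.
class QueryCheck {
    static boolean isServable(List<String> primaryKey, Set<String> restricted) {
        if (!primaryKey.containsAll(restricted)) {
            return false;   // non-key column with no index to use
        }
        if (!restricted.contains(primaryKey.get(0))) {
            return false;   // no row key: the query cannot go to a single row
        }
        boolean gap = false;
        for (String col : primaryKey.subList(1, primaryKey.size())) {
            if (!restricted.contains(col)) {
                gap = true;      // later clustering columns can no longer be used
            } else if (gap) {
                return false;    // an earlier clustering column was skipped
            }
        }
        return true;
    }
}
```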
  19. No, seriously. Let's code!
      What API should we use?

      API              Production-Readiness  Potential  Momentum
      Thrift           10                    -1         -1
      Hector           10                    8          8
      Astyanax         8                     9          10
      Kundera (JPA)    6                     9          9
      Pelops           7                     6          7
      Firebrand        8                     10         8
      PlayORM          5                     8          7
      GORA             6                     9          7
      CQL Driver       ?                     ?          ?

      Astyanax FTW!
  20. Connect
          this.astyanaxContext = new AstyanaxContext.Builder()
              .forCluster("ClusterName")
              .forKeyspace(keyspace)
              .withAstyanaxConfiguration(…)
              .withConnectionPoolConfiguration(…)
              .buildKeyspace(ThriftFamilyFactory.getInstance());
      Specify:
      • Cluster name (arbitrary identifier)
      • Keyspace
      • Node discovery method
      • Connection pool information
  21. Write/Update
          MutationBatch mutation = keyspace.prepareMutationBatch();
          ColumnFamily<String, String> columnFamily = new ColumnFamily<String, String>(
              columnFamilyName, StringSerializer.get(), StringSerializer.get());
          mutation.withRow(columnFamily, rowKey)
              .putColumn(entry.getKey(), entry.getValue(), null);  // third argument is the TTL; null = no expiry
          mutation.execute();
      Process:
      • Create a mutation.
      • Specify the column family with serializers.
      • Put your columns.
      • Execute.
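The create/put/execute process can be illustrated without a cluster. The class below is a hypothetical in-memory stand-in, not the Astyanax MutationBatch: puts are buffered and only become visible when execute() applies them to a map of rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical batch-then-execute illustration (slide 21); NOT the Astyanax API.
class InMemoryMutationBatch {
    private final List<String[]> pending = new ArrayList<>();   // {rowKey, column, value}

    // Buffer one column write; returns this to allow chaining.
    InMemoryMutationBatch putColumn(String rowKey, String column, String value) {
        pending.add(new String[] {rowKey, column, value});
        return this;
    }

    // Apply every buffered write to the table (rowKey -> sorted columns), then clear.
    void execute(Map<String, SortedMap<String, String>> table) {
        for (String[] m : pending) {
            table.computeIfAbsent(m[0], k -> new TreeMap<>()).put(m[1], m[2]);
        }
        pending.clear();
    }
}
```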
  22. Composite Types
      Composite (a.k.a. Compound)
          public class ListEntry {
              @Component(ordinal = 0)
              public String state;
              @Component(ordinal = 1)
              public String zip;
              @Component(ordinal = 2)
              public String childId;
          }
  23. Range Builders
          range = entitySerializer.buildRange()
              .withPrefix(state)
              .greaterThanEquals("")
              .lessThanEquals("99999");
      Then...
          ...withColumnRange(range).execute();
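The composite ordering and the prefix-plus-range query can be approximated with java.util collections. This is a plain-Java analogue, not how Astyanax encodes composites: entries compare by state, then zip, then childId (matching the @Component ordinals), and the range fixes the state and bounds the zip. The exclusive upper bound zipMax + "\0" is a trick of this sketch: it is the smallest string greater than zipMax, so every child in the last zip is still included. The class name CompositeRange is hypothetical.

```java
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.TreeSet;

// Plain-Java analogue of slides 22-23 (not Astyanax's encoding).
class CompositeRange {
    static final class ListEntry {
        final String state, zip, childId;
        ListEntry(String state, String zip, String childId) {
            this.state = state;
            this.zip = zip;
            this.childId = childId;
        }
        @Override public String toString() { return state + ":" + zip + ":" + childId; }
    }

    // Compare component 0, then 1, then 2, like the @Component ordinals.
    static final Comparator<ListEntry> ORDER =
        Comparator.comparing((ListEntry e) -> e.state)
                  .thenComparing((ListEntry e) -> e.zip)
                  .thenComparing((ListEntry e) -> e.childId);

    // Prefix = state; zipMin <= zip <= zipMax. The set must be built with ORDER.
    static NavigableSet<ListEntry> range(TreeSet<ListEntry> columns, String state,
                                         String zipMin, String zipMax) {
        ListEntry from = new ListEntry(state, zipMin, "");        // inclusive
        ListEntry to = new ListEntry(state, zipMax + "\0", "");   // exclusive
        return columns.subSet(from, true, to, false);
    }
}
```

Usage mirroring the slide: CompositeRange.range(columns, "CA", "", "99999") returns every CA entry in sorted order and nothing from other states.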
  24. CQL Collections! (http://www.datastax.com/dev/blog/cql3_collections)
      • Set
          UPDATE users SET emails = emails + {'fb@friendsofmordor.org'} WHERE user_id = 'frodo';
      • List
          UPDATE users SET top_places = [ 'the shire' ] + top_places WHERE user_id = 'frodo';
      • Maps
          UPDATE users SET todo['2012-10-2 12:10'] = 'die' WHERE user_id = 'frodo';
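The logical effect of the three collection updates on a single row can be shown with Java collections. This sketch assumes the column types set<text>, list<text> and map<timestamp, text>, simplifies the timestamp keys to strings, and models one row in memory; the class name FrodoCollections is hypothetical. Sets and maps are kept sorted, while the list keeps the order in which elements were placed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory illustration of the CQL3 collection updates on slide 24 (one row).
class FrodoCollections {
    final TreeSet<String> emails = new TreeSet<>();        // set: unique, sorted
    final List<String> topPlaces = new ArrayList<>();      // list: ordered, duplicates allowed
    final TreeMap<String, String> todo = new TreeMap<>();  // map: sorted by key (string here)

    // SET emails = emails + {...}
    void addEmails(String... values) { Collections.addAll(emails, values); }

    // SET top_places = [...] + top_places   (prepend)
    void prependPlaces(String... values) { topPlaces.addAll(0, List.of(values)); }

    // SET todo[key] = value
    void putTodo(String key, String value) { todo.put(key, value); }
}
```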
  25. CQL vs. Thrift (http://www.datastax.com/dev/blog/thrift-to-cql3)
      • Thrift is the legacy API on which all of the Java APIs are built.
      • CQL has a new native protocol and driver.
  26. Let's get back to cranking…
      • Recreate the schema to be CQL friendly (no COMPACT STORAGE, since compact tables cannot hold collections, plus a toys list column).
      • UPDATE children SET toys = toys + [ 'legos' ] WHERE childId = 'owen.oneill';
      • Crank out a DAO layer that uses CQL collection operations.
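A minimal sketch of the DAO step, assuming a hypothetical ToyDao class that only builds the CQL text for the append. It escapes string literals by doubling single quotes, as CQL requires; a production DAO would more likely prepare "UPDATE children SET toys = toys + ? WHERE childId = ?" once and bind values through a driver.

```java
// Hypothetical DAO helper (slide 26): builds CQL text only, no driver calls.
class ToyDao {
    // CQL string literals use single quotes; an embedded quote is doubled.
    static String literal(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    // Append one toy to the child's toys list.
    static String addToyCql(String childId, String toy) {
        return "UPDATE children SET toys = toys + [ " + literal(toy)
             + " ] WHERE childId = " + literal(childId) + ";";
    }
}
```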
  27. Shameless Shoutout(s)
      • Virgil (https://github.com/boneill42/virgil): REST interface for Cassandra
      • storm-cassandra (https://github.com/boneill42/storm-cassandra): distributed processing on Cassandra (webinar in January)