Cassandra Java APIs Old and New – A Comparison


An introductory session at Toronto Cassandra User Group, September 2013

Published in: Education, Technology

  1. Cassandra Java APIs Old and New – A Comparison. Shahryar Sedghi, Toronto Cassandra User Group, Sep. 18, 2013
  2. #TCUG Who am I? @ Founder at
     • Did some work on IBM hierarchical databases (IMS DB / DOS DL1) in the late 70s and early 80s
     • Worked extensively on IBM’s first (the world’s first) relational database (SQL/DS) in the early 80s
     • Have worked with Oracle and DB2 for years (not as a DBA)
     • Started working on Cassandra in late 2011 (1.0.5)
     @parseix
  3. Disclaimer
     • The code samples used here, except for Astyanax (which was taken directly from the website), each worked at one point on some release of Cassandra. Only the (modified) JDBC driver and the new Java Driver have been tested with Cassandra 1.2
  4. Agenda
     • What does a Java API for Cassandra need?
     • A basic introduction to the Cassandra data model
     • Thrift
     • Thrift-based APIs
     • Binary Protocol
     • The new DataStax Java API
  5. A Java Database API
     • Typically used in Java application servers
       – Thread safe
       – Connection pooling
     • When used with Cassandra
       – Tolerates database machine/network failure
       – Load balancing
       – Reconnects to the failed machine when it is back
     • Together they should provide a highly available environment for web apps without an expensive HA investment
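To make the failover and load-balancing requirements on this slide concrete, here is a minimal sketch of a round-robin host selector that skips hosts marked as down. It is an illustration only: the class `FailoverRoundRobin`, its methods, and the host names are hypothetical and are not taken from Hector, Astyanax, or any driver. A real client would also detect failures and reconnect on its own.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; not the implementation of any Cassandra client.
class FailoverRoundRobin {
    private final List<String> hosts;
    private final Set<String> down = ConcurrentHashMap.newKeySet();
    private final AtomicInteger next = new AtomicInteger();

    FailoverRoundRobin(List<String> hosts) {
        this.hosts = List.copyOf(hosts);
    }

    // Called when a request to the host fails.
    void markDown(String host) { down.add(host); }

    // Called when a reconnection attempt succeeds.
    void markUp(String host) { down.remove(host); }

    // Returns the next live host in round-robin order; throws if every host is down.
    String nextHost() {
        for (int i = 0; i < hosts.size(); i++) {
            int idx = Math.floorMod(next.getAndIncrement(), hosts.size());
            String host = hosts.get(idx);
            if (!down.contains(host)) {
                return host;
            }
        }
        throw new IllegalStateException("no live hosts");
    }

    public static void main(String[] args) {
        FailoverRoundRobin lb = new FailoverRoundRobin(List.of("node1", "node2", "node3"));
        lb.markDown("node2");                 // simulate a machine failure
        for (int i = 0; i < 4; i++) {
            System.out.println(lb.nextHost()); // node2 is never handed out here
        }
        lb.markUp("node2");                   // the machine is back
        System.out.println(lb.nextHost());
    }
}
```

The selector is thread safe because it only uses an atomic counter and a concurrent set, which matches the "thread safe" requirement for code running inside an application server.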
  6. Cassandra Data Model at a Glance
     Figure (sample rows and their columns):
       Row B: B1 = Value11, B2 = Value12, B3 = Value13, B4 = Value14
       Row A: A1 = Value21, A2 = Value22, A3 = Value23
       Row D: D1 = Value51, D2 = Value52, D3 = Value53, D4 = Value54, D5 = Value55
       Row C: C1 = Value61, C2 = Value62
       Super Column (Deprecated): D51 = Value551, D52 = Value552, D53 = Value553
     • Row key (B, A, D, C in the figure): by default, and as a best practice, rows are not sorted by key; they are ordered by the hash of the key
     • All columns of one row reside in one node
     • Column name (e.g. B1): one row can hold up to 2 billion distinct column names
     • Columns are sorted by column name (ascending or descending)
     • Column value (e.g. Value11): it can be null, and it can have a different type for each column in each row. E.g. A1 can be an Integer and D1 can be a String
     • If all the 1s, all the 2s, all the 3s, … (e.g. A1, B1, C1) carry the same data type, the column family can be used like a relational table with CQL 2. That gives better scalability with less functionality, but it is not the best use of Cassandra
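One way to picture the model on this slide is as a map of maps: an unordered map from row key to a sorted map of column name to value. The sketch below is only a mental model in plain Java, not how Cassandra stores data; the class `ColumnFamilySketch` and its methods are hypothetical names introduced for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical mental model of a column family; not Cassandra's storage engine.
class ColumnFamilySketch {
    // HashMap: rows are not ordered by key (a stand-in for hash partitioning).
    // TreeMap: columns inside a row are kept sorted by column name.
    private final Map<String, NavigableMap<String, Object>> rows = new HashMap<>();

    // Values are Object because each column in each row may have its own type.
    void put(String rowKey, String columnName, Object value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(columnName, value);
    }

    // Column slice: the columns of one row between two names, inclusive, in name order.
    NavigableMap<String, Object> slice(String rowKey, String from, String to) {
        NavigableMap<String, Object> columns = rows.get(rowKey);
        if (columns == null) {
            return new TreeMap<>();
        }
        return columns.subMap(from, true, to, true);
    }

    public static void main(String[] args) {
        ColumnFamilySketch cf = new ColumnFamilySketch();
        cf.put("A", "A3", "Value23");
        cf.put("A", "A1", 21);          // an Integer value
        cf.put("A", "A2", "Value22");
        cf.put("D", "D1", "Value51");   // a String value in another row
        System.out.println(cf.slice("A", "A1", "A2"));
    }
}
```

Because the columns of a row are sorted, reading a contiguous range of column names is cheap, which is why wide rows and column slices are a natural fit for this model.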
  7. Data Model – Composite Columns
     • We would like to model the following data structure: {departmentId Integer, employeeId Integer, firstName String, lastName String}
     • Physical row 122 (departmentId 122; employeeId 11, 12 and 13): 11:firstName, 11:lastName, 12:firstName, 12:lastName, 13:firstName, 13:lastName
     • Physical row 225 (departmentId 225; employeeId 17 and 19): 17:firstName, 17:lastName, 19:firstName, 19:lastName
     • CQL3:
         create table department(
             departmentid int,
             employeeid int,
             firstname text,
             lastname text,
             PRIMARY KEY (departmentid, employeeid)
         );
     • departmentId is called the partition key
     • employeeId is called the clustering key
     • Each physical row holds several logical rows, one per employeeId
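The mapping from logical rows to one physical row per partition key can be sketched in plain Java. This is a simplified model under stated assumptions: the class `CompositeColumnSketch`, the nested `Composite` type, and the sample names are hypothetical, and the real on-disk layout contains details that are left out here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of composite columns; a simplification, not Cassandra's layout.
class CompositeColumnSketch {
    // Composite column name: (clustering key, CQL column name), printed as 11:firstname.
    static final class Composite {
        final int employeeId;
        final String column;

        Composite(int employeeId, String column) {
            this.employeeId = employeeId;
            this.column = column;
        }

        @Override
        public String toString() {
            return employeeId + ":" + column;
        }
    }

    // Sort by clustering key first, then by column name.
    static final Comparator<Composite> ORDER =
            Comparator.comparingInt((Composite c) -> c.employeeId)
                      .thenComparing(c -> c.column);

    // Partition key (departmentId) -> one physical row of composite columns.
    private final Map<Integer, NavigableMap<Composite, String>> physicalRows = new HashMap<>();

    // One logical row (a CQL3 row) becomes two columns in the physical row.
    void insert(int departmentId, int employeeId, String firstName, String lastName) {
        NavigableMap<Composite, String> row =
                physicalRows.computeIfAbsent(departmentId, k -> new TreeMap<>(ORDER));
        row.put(new Composite(employeeId, "firstname"), firstName);
        row.put(new Composite(employeeId, "lastname"), lastName);
    }

    // Column names of one physical row, in stored order.
    List<String> columnNames(int departmentId) {
        List<String> names = new ArrayList<>();
        NavigableMap<Composite, String> row = physicalRows.get(departmentId);
        if (row != null) {
            for (Composite c : row.keySet()) {
                names.add(c.toString());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        CompositeColumnSketch department = new CompositeColumnSketch();
        department.insert(122, 12, "Jane", "Roe");
        department.insert(122, 11, "John", "Doe");
        department.insert(225, 17, "Ann", "Lee");
        System.out.println(department.columnNames(122));
        System.out.println(department.columnNames(225));
    }
}
```

All employees of one department land in the same physical row, sorted by employeeId, so reading a department is a single-partition column slice.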
  8. Thrift
     • An Apache project
     • Yet another RPC (Remote Procedure Call) framework
     • Has an IDL (Interface Definition Language) like other RPCs
     • Language neutral
     • Easier to use than many others
     • A good fit for early releases of Cassandra, to support all sorts of clients
       – Apparently not every client works as well as Java and Python
     • Is RPC a good fit for database interaction? Yes and no
     • Cassandra Thrift listens on port 9160 by default
  9. Thrift Importance for Cassandra
     • All clients, except the new DataStax drivers for Java and .NET, use Thrift underneath
       – Including Hector, JDBC and Astyanax
     • Supports
       – Ring discovery
       – Native access to Cassandra
       – CQL 2
       – CQL 3
     • JDBC and Astyanax may move to the native driver in the future
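  10. Thrift Example: Ring Discovery
          import java.util.List;
          import org.apache.cassandra.thrift.*;
          import org.apache.thrift.protocol.*;
          import org.apache.thrift.transport.*;

          // The host name is blank in the original slide; 9160 is the default Thrift port
          TTransport transport = new TFramedTransport(new TSocket("", 9160));
          TProtocol protocol = new TBinaryProtocol(transport);
          Cassandra.Client client = new Cassandra.Client(protocol);
          transport.open(); // the transport must be opened before the first call
          // Ask the node which endpoints own each token range of keyspace "mydb"
          List<TokenRange> trList = client.describe_ring("mydb");
          TokenRange tr = trList.get(0);
          for (String endpoint : tr.getEndpoints()) {
              System.out.println(endpoint);
          }
          transport.close();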
  11. Thrift Example: Get All Row Keys
          ColumnParent columnParent = new ColumnParent("xyz");
          SlicePredicate predicate = new SlicePredicate();
          // Empty start and finish with count 1: fetch at most one column per row
          predicate.setSlice_range(new SliceRange(ByteBuffer.wrap(new byte[0]),
                  ByteBuffer.wrap(new byte[0]), false, 1)); // here you can specify a slice
          KeyRange keyRange = new KeyRange(); // get all keys, or set a range
          // Empty start and end keys mean an unbounded range; the row count limit still applies
          keyRange.setStart_key(new byte[0]);
          keyRange.setEnd_key(new byte[0]);
          List<KeySlice> keySlices = client.get_range_slices(columnParent, predicate,
                  keyRange, ConsistencyLevel.ONE); // or null in this case
          ArrayList<Integer> list = new ArrayList<Integer>();
          for (KeySlice ks : keySlices) {
              int key = ByteBuffer.wrap(ks.getKey()).getInt(); // row keys are integers here
              list.add(key);
              System.out.println(key);
          }
  12. Hector
      • Most commonly used Java API for Cassandra
      • Uses Thrift underneath
      • Among its features:
        – Connection pooling
        – Ring discovery and automatic failover
        – Automatic retry of downed hosts
        – Automatic discovery of additional hosts in the cluster
        – Suspension of hosts for a short period of time after several timeouts
  13. Hector Example: Read All Row Keys
          // The host list is blank in the original slide
          Cluster myCluster = HFactory.getOrCreateCluster("MyCluster", "");
          ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
          ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);
          Keyspace myKeyspace = HFactory.createKeyspace("MYDB", myCluster, ccl);
          // Type parameters: row key, column name, column value
          RangeSlicesQuery<Integer, Composite, String> rangeSlicesQuery =
                  HFactory.createRangeSlicesQuery(myKeyspace, IntegerSerializer.get(),
                          CompositeSerializer.get(), StringSerializer.get());
          // CF holds the column family name; only keys are returned, no columns
          QueryResult<OrderedRows<Integer, Composite, String>> result =
                  rangeSlicesQuery.setColumnFamily(CF).setKeys(0, -1).setReturnKeysOnly().execute();
          OrderedRows<Integer, Composite, String> orderedRows = result.get();
          ArrayList<Integer> list = new ArrayList<Integer>();
          for (Row<Integer, Composite, String> row : orderedRows) {
              list.add(row.getKey());
          }
  14. Astyanax
      • Developed by Netflix
      • Supports all Hector functions and is much easier to use
      • Much better connection pool and failover than Hector
      • More than an API for Cassandra
        – Provides some database functionality at the API level, called Recipes
          • Parallel all-rows query
          • Message queue
          • Chunked object store
          • Many more
      • Utilities
        – JSON writer, CSV importer
      • Netflix announced plans to move to the binary protocol at Cassandra Summit 2013
  15. Astyanax Example: Pagination
          ColumnList<String> columns;
          int pageSize = 10;
          try {
              RowQuery<String, String> query = keyspace
                      .prepareQuery(CF_STANDARD1)
                      .getKey("A")
                      .setIsPaginating()
                      .withColumnRange(new RangeBuilder().setMaxSize(pageSize).build());
              // Each execute() returns the next page; an empty page ends the loop
              while (!(columns = query.execute().getResult()).isEmpty()) {
                  for (Column<String> c : columns) {
                      // do something like c.getStringValue()
                  }
              }
          } catch (ConnectionException e) {
              // left empty in the original sample; real code should handle or log this
          }
  16. JDBC (Java Database Connectivity)
      • Standard Java database API
      • Only supports CQL to access Cassandra
      • The current Cassandra JDBC driver is a shallow implementation of JDBC on top of Thrift
      • The URL looks like:
        – jdbc:cassandra://
      • All Java application servers support connection pooling for JDBC
      • No database failover and no Cassandra cluster support
      • Helps to convert relational database apps to Cassandra
  17. JDBC Example: Insert
      • This code can run in a Servlet or an “EJB” (!) with some minor modification
      • Nothing in this code points to Cassandra or Thrift classes
      • The insertQuery for CQL is not always as simple as this
          Context envCtx = (Context) new InitialContext().lookup("java:comp/env");
          DataSource datasource = (DataSource) envCtx.lookup("jdbc/cassandra");
          Connection cqlCon = datasource.getConnection();
          // One placeholder per column: four columns, four values
          String insertQuery = "INSERT INTO department(departmentid, employeeid, firstname, lastname) VALUES (?, ?, ?, ?)";
          PreparedStatement statement = cqlCon.prepareStatement(insertQuery);
          statement.setInt(1, 122);
          statement.setInt(2, 11);
          statement.setString(3, "John");
          statement.setString(4, "Doe");
          statement.executeUpdate(); // run the insert before closing
          statement.close();
          cqlCon.close();
  18. Cassandra Binary Protocol
      • Inherently asynchronous
        – Can be used synchronously as well
      • Frame and stream based
        – Many requests with different stream ids can be sent asynchronously
        – A set of frames coming from the server belongs to the same stream
      • Certain events are pushed from the server
        – Topology change
        – Status change
        – Schema change
      • Because of its asynchronous nature, it can easily be integrated with newer technologies like WebSockets and Servlet 3.0 and 3.1
      • Listens on port 9042
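The stream-id idea on this slide can be sketched in plain Java: each outgoing request is tagged with a stream id, and a response that arrives later, possibly out of order, completes whichever caller is waiting on that id. This is a conceptual sketch only. The class `StreamMultiplexerSketch` and its methods are hypothetical, no real frames are encoded or sent, and ids are never recycled here, which a real client would have to do.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of request/response matching by stream id.
class StreamMultiplexerSketch {
    private final Map<Integer, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    private final AtomicInteger nextStreamId = new AtomicInteger();

    // Allocates a stream id for an outgoing request and remembers who waits for the answer.
    // A real client would now write a frame tagged with this id to the connection.
    int send(CompletableFuture<String> response) {
        int streamId = nextStreamId.getAndIncrement();
        pending.put(streamId, response);
        return streamId;
    }

    // Called when a response frame arrives; completes the matching future.
    // Returns false if nobody is waiting on that stream id.
    boolean onResponse(int streamId, String body) {
        CompletableFuture<String> waiting = pending.remove(streamId);
        if (waiting == null) {
            return false;
        }
        waiting.complete(body);
        return true;
    }

    public static void main(String[] args) {
        StreamMultiplexerSketch connection = new StreamMultiplexerSketch();
        CompletableFuture<String> first = new CompletableFuture<>();
        CompletableFuture<String> second = new CompletableFuture<>();
        int firstId = connection.send(first);
        int secondId = connection.send(second);

        // Responses may come back in a different order than the requests went out.
        connection.onResponse(secondId, "result of second query");
        connection.onResponse(firstId, "result of first query");

        System.out.println(first.join());
        System.out.println(second.join());
    }
}
```

Because callers hold a future rather than blocking on the socket, one connection can carry many requests in flight, and the synchronous style is simply a caller that waits on its future right away.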
  19. DataStax Java Driver
      • Implements the client side of the Binary Protocol
      • Similar to JDBC but easier in certain areas
        – Specific to Cassandra, not portable
      • Supports CQL, with plans to support OO and DB APIs
      • Supports
        – Query Builder (who wants this?)
        – Node discovery
        – Connection pooling
        – Reconnection policies
        – Load balancing policies
        – Retry policies
      • Cursor support was announced during Cassandra Summit 2013
  20. DataStax Java Driver: Cluster and Session
          Cluster cluster = Cluster.builder()
                  // contact point addresses are blank in the original slide
                  .addContactPoints("", "")
                  .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
                  .withReconnectionPolicy(new ConstantReconnectionPolicy(1000L))
                  .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("DC1"))
                  .withCredentials("myuser", "mypassword")
                  .build();
          Session session = cluster.connect("mykeyspace");
  21. DataStax Java Driver Example: Select
          String selectQuery = "select * from department where departmentid = ?";
          PreparedStatement statement = session.prepare(selectQuery);
          statement.setConsistencyLevel(ConsistencyLevel.ONE);
          BoundStatement query = statement.bind(122);
          // you can do async here and get a Future instead
          ResultSet result = session.execute(query);
          for (Row row : result) {
              System.out.println(row.getInt("employeeid"));
              System.out.println(row.getString("firstname"));
              System.out.println(row.getString("lastname"));
          }
  22. References
      • Thrift
      • Hector
      • Astyanax
      • JDBC
      • DataStax Java Driver
        – YouTube presentation, Cassandra Summit 2013
        – Slideshare, Cassandra Summit 2013
        – Mailing list, enroll at
  23. Thanks, and especially Victor Anjos