Cassandra Java APIs Old and New – A Comparison


An introductory session at Toronto Cassandra User Group, September 2013

Published in: Education, Technology

  1. Cassandra Java APIs Old and New – A Comparison. Shahryar Sedghi, Toronto Cassandra User Group, Sep. 18, 2013
  2. #TCUG Who am I? @ Founder at
     • Did some work on IBM hierarchical databases (IMS DB / DOS DL1) in the late 70s and early 80s
     • Worked extensively on IBM’s first (the world’s first) relational database (SQL/DS) in the early 80s
     • Have worked with Oracle and DB2 for years (not as a DBA)
     • Started working on Cassandra in late 2011 (1.0.5)
     @parseix
  3. Disclaimer
     • The code samples used here, except for Astyanax (which was taken directly from the website), each worked once in some release of Cassandra. Only the (modified) JDBC driver and the new Java Driver have been tested with Cassandra 1.2
  4. Agenda
     • What does a Java API for Cassandra need?
     • A basic introduction to the Cassandra data model
     • Thrift
     • Thrift-based APIs
     • Binary Protocol
     • The new DataStax Java API
  5. A Java Database API
     • Typically used in Java application servers
       – Thread safe
       – Connection pooling
     • When used with Cassandra
       – Tolerates database machine or network failure
       – Load balancing
       – Reconnects to the failed machine when it's back
     • Together they should provide a highly available environment for web apps without an expensive HA investment
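The Cassandra-specific requirements on this slide (load balancing, tolerating a failed node, using it again once it is back) can be illustrated with a small standard-library sketch. The class and method names (`RoundRobinHosts`, `markDown`, `markUp`, `next`) are hypothetical and are not taken from any Cassandra client; real drivers handle this with far more care.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only: round-robin host selection that skips hosts marked down.
class RoundRobinHosts {
    private final List<String> hosts;
    private final Set<String> down = ConcurrentHashMap.newKeySet();
    private final AtomicInteger cursor = new AtomicInteger();

    RoundRobinHosts(List<String> hosts) {
        this.hosts = new ArrayList<>(hosts);
    }

    // Called when a request to the host fails (machine or network failure).
    void markDown(String host) {
        down.add(host);
    }

    // Called when a reconnection attempt to the host succeeds.
    void markUp(String host) {
        down.remove(host);
    }

    // Returns the next live host in round-robin order, or throws if every host is down.
    String next() {
        for (int i = 0; i < hosts.size(); i++) {
            int index = Math.floorMod(cursor.getAndIncrement(), hosts.size());
            String candidate = hosts.get(index);
            if (!down.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no live hosts");
    }
}
```

A caller would ask `next()` for a host before each request, call `markDown` when the request fails, and call `markUp` from whatever background task retries the connection.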
  6. Cassandra Data Model at a Glance
     [Figure: rows with keys A, B, C and D, each holding its own columns and values (A1 to A3, B1 to B4, C1 and C2, D1 to D5); columns D51 to D53 are labelled Super Column (Deprecated).]
     • Row key: by default (best practice) rows are not sorted by key; they are ordered by the hash of the key
     • All columns of one row reside on one node
     • Column name: one row can hold 2 billion distinct column names
     • Columns are sorted by column name (ascending or descending)
     • Column value: it can be null, and it can have a different type for each column in each row, e.g. A1 can be an Integer and D1 can be a String
     • If all the 1s, all the 2s, all the 3s, ... (e.g. A1, B1, C1) carry the same data type, the column family can be used like a relational table with CQL 2, with better scalability and less functionality, but this is not the best use of Cassandra
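The properties listed on this slide can be mimicked with plain Java collections. This is only a sketch: `ColumnFamilySketch` and its methods are hypothetical, and `String.hashCode` stands in for Cassandra's partitioner, which uses a different hash.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of one column family; not Cassandra code.
class ColumnFamilySketch {
    private final int nodeCount;
    // Row key -> (column name -> value). The TreeMap keeps column names sorted.
    private final Map<String, TreeMap<String, Object>> rows = new HashMap<>();

    ColumnFamilySketch(int nodeCount) {
        this.nodeCount = nodeCount;
    }

    // Values are untyped: each column in each row may hold a different type, or null.
    void put(String rowKey, String columnName, Object value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(columnName, value);
    }

    // A whole row lives on one node, chosen from the hash of its key (stand-in hash).
    int nodeFor(String rowKey) {
        return Math.floorMod(rowKey.hashCode(), nodeCount);
    }

    // Column names of one row, in sorted (ascending) order.
    List<String> columnNames(String rowKey) {
        TreeMap<String, Object> row = rows.get(rowKey);
        return row == null ? new ArrayList<>() : new ArrayList<>(row.keySet());
    }
}
```

Because placement depends only on the hash of the row key, rows that are adjacent by key generally land on unrelated nodes, while the columns inside a row always come back in name order.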
  7. Data Model - Composite Columns
     • We would like to model the following data structure: {departmentId Integer, employeeId Integer, firstName String, lastName String}
     [Figure, labelled Logical Row and Physical Row: each departmentId is one physical row whose column names are prefixed with the employeeId.]
       – departmentId 122, employeeId 11, 12 and 13: 11:firstName, 11:lastName, 12:firstName, 12:lastName, 13:firstName, 13:lastName
       – departmentId 225, employeeId 17 and 19: 17:firstName, 17:lastName, 19:firstName, 19:lastName
     • CQL3:
         create table department(
           departmentid int,
           employeeid int,
           firstname text,
           lastname text,
           PRIMARY KEY (departmentid, employeeid)
         );
     • departmentId is called the partition key
     • employeeId is called the clustering key
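The mapping from logical rows to one physical row per partition key can be sketched with the standard library. The names here (`DepartmentStorageSketch`, `CompositeName`, `insert`, `physicalColumns`) are hypothetical and only mirror the layout shown on the slide; they are not how Cassandra is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: CQL3 rows stored as one wide physical row per partition key.
class DepartmentStorageSketch {
    // Composite column name: clustering key first, then the CQL column name.
    static final class CompositeName implements Comparable<CompositeName> {
        final int employeeId;
        final String field;

        CompositeName(int employeeId, String field) {
            this.employeeId = employeeId;
            this.field = field;
        }

        @Override
        public int compareTo(CompositeName other) {
            int byId = Integer.compare(employeeId, other.employeeId);
            return byId != 0 ? byId : field.compareTo(other.field);
        }

        @Override
        public String toString() {
            return employeeId + ":" + field;
        }
    }

    // departmentId (partition key) -> physical row of composite columns.
    private final Map<Integer, TreeMap<CompositeName, String>> partitions = new TreeMap<>();

    // One logical row becomes two composite columns in the physical row.
    void insert(int departmentId, int employeeId, String firstName, String lastName) {
        TreeMap<CompositeName, String> physicalRow =
                partitions.computeIfAbsent(departmentId, k -> new TreeMap<>());
        physicalRow.put(new CompositeName(employeeId, "firstName"), firstName);
        physicalRow.put(new CompositeName(employeeId, "lastName"), lastName);
    }

    // Column names of one physical row, in stored order.
    List<String> physicalColumns(int departmentId) {
        List<String> names = new ArrayList<>();
        TreeMap<CompositeName, String> row = partitions.get(departmentId);
        if (row != null) {
            for (CompositeName name : row.keySet()) {
                names.add(name.toString());
            }
        }
        return names;
    }
}
```

Sorting by the clustering key first keeps all columns of one employee adjacent, which is what makes a query for one department a single contiguous read in this model.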
  8. Thrift
     • An Apache project
     • Yet another RPC (Remote Procedure Call) framework
     • Has an IDL (Interface Definition Language) like other RPCs
     • Language neutral
     • Easier to use than many others
     • A good fit for early releases of Cassandra, to support all sorts of clients
       – Apparently not every client works as well as Java and Python
     • Is RPC a good fit for database interaction? Yes and no
     • Cassandra's Thrift interface listens on port 9160 by default
  9. Thrift Importance for Cassandra
     • All clients, except the new DataStax drivers for Java and .NET, use Thrift underneath
       – Including Hector, JDBC and Astyanax
     • Supports
       – Ring discovery
       – Native access to Cassandra
       – CQL 2
       – CQL 3
     • JDBC and Astyanax may move to the native driver in the future
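  10. Thrift Example: Ring Discovery
       import java.util.List;
       import org.apache.cassandra.thrift.*;
       import org.apache.thrift.protocol.*;
       import org.apache.thrift.transport.*;

       TTransport transport = new TFramedTransport(new TSocket("", 9160));
       TProtocol protocol = new TBinaryProtocol(transport);
       Cassandra.Client client = new Cassandra.Client(protocol);
       transport.open(); // the transport must be open before the first call
       // Ask the cluster for its token ranges and print the endpoints of the first one
       List<TokenRange> trList = client.describe_ring("mydb");
       TokenRange tr = trList.get(0);
       for (String endpoint : tr.getEndpoints()) {
           System.out.println(endpoint);
       }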
  11. Thrift Example: Get All Row Keys
       ColumnParent columnParent = new ColumnParent("xyz");
       SlicePredicate predicate = new SlicePredicate();
       // Here you can specify a slice
       predicate.setSlice_range(new SliceRange(ByteBuffer.wrap(new byte[0]), ByteBuffer.wrap(new byte[0]), false, 1));
       KeyRange keyRange = new KeyRange(); // Get all keys, or set a range
       List<KeySlice> keySlices = client.get_range_slices(columnParent, predicate, keyRange, ConsistencyLevel.ONE); // or null in this case
       ArrayList<Integer> list = new ArrayList<Integer>();
       for (KeySlice ks : keySlices) {
           // Row keys in this column family are 4-byte integers
           int key = ByteBuffer.wrap(ks.getKey()).getInt();
           list.add(key);
           System.out.println(key);
       }
  12. Hector
     • Most commonly used Java API for Cassandra
     • Uses Thrift underneath
     • Among its features:
       – Connection pooling
       – Ring discovery and automatic failover
       – Automatic retry of downed hosts
       – Automatic discovery of additional hosts in the cluster
       – Suspension of hosts for a short period of time after several timeouts
  13. Hector Example: Read All RowKeys
       Cluster myCluster = HFactory.getOrCreateCluster("MyCluster", "");
       ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
       ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);
       Keyspace myKeyspace = HFactory.createKeyspace("MYDB", myCluster, ccl);
       // Key, column name and value types, each with its serializer
       RangeSlicesQuery<Integer, Composite, String> rangeSlicesQuery =
           HFactory.createRangeSlicesQuery(myKeyspace, IntegerSerializer.get(), CompositeSerializer.get(), StringSerializer.get());
       QueryResult<OrderedRows<Integer, Composite, String>> result =
           rangeSlicesQuery.setColumnFamily(CF).setKeys(0, -1).setReturnKeysOnly().execute();
       OrderedRows<Integer, Composite, String> orderedRows = result.get();
       ArrayList<Integer> list = new ArrayList<Integer>();
       for (Row<Integer, Composite, String> row : orderedRows) {
           list.add(row.getKey());
       }
  14. Astyanax
     • Developed by Netflix
     • Supports all Hector functions and is much easier to use
     • Much better connection pooling and failover than Hector
     • More than an API for Cassandra
       – Provides some database functionality at the API level, called Recipes
         • Parallel all rows query
         • Message queue
         • Chunked object store
         • Many more
     • Utilities
       – JSON Writer, CSV Importer
     • Netflix announced its plan to move to the binary protocol at Cassandra Summit 2013
  15. Astyanax Example: Pagination
       ColumnList<String> columns;
       int pageSize = 10;
       try {
           RowQuery<String, String> query = keyspace
               .prepareQuery(CF_STANDARD1)
               .getKey("A")
               .setIsPaginating()
               .withColumnRange(new RangeBuilder().setMaxSize(pageSize).build());
           // Each execute() returns the next page; an empty page ends the loop
           while (!(columns = query.execute().getResult()).isEmpty()) {
               for (Column<String> c : columns) {
                   // do something like c.getStringValue()
               }
           }
       } catch (ConnectionException e) {
           // handle or log the failure
       }
  16. JDBC (Java Database Connectivity)
     • Standard Java database API
     • Only supports CQL to access Cassandra
     • The current Cassandra JDBC driver is a shallow implementation of JDBC on top of Thrift
     • The URL looks like:
       – jdbc:cassandra://
     • All Java application servers support connection pooling for JDBC
     • No database failover and no Cassandra cluster support
     • Helps to convert relational database apps to Cassandra
  17. JDBC Example: Insert
     • This code can run in a Servlet or an “EJB”!!! with some minor modification
     • Nothing in this code points to Cassandra or Thrift classes
     • insertQuery for CQL is not always as simple as this
       import java.sql.*;
       import javax.naming.*;
       import javax.sql.DataSource;

       Context envCtx = (Context) new InitialContext().lookup("java:comp/env");
       DataSource datasource = (DataSource) envCtx.lookup("jdbc/cassandra");
       Connection cqlCon = datasource.getConnection();
       // One placeholder per column
       String insertQuery = "INSERT INTO department(departmentid, employeeid, firstname, lastname) VALUES (?, ?, ?, ?)";
       PreparedStatement statement = cqlCon.prepareStatement(insertQuery);
       statement.setInt(1, 122);
       statement.setInt(2, 11);
       statement.setString(3, "John");
       statement.setString(4, "Doe");
       statement.executeUpdate(); // without this call nothing is written
       statement.close();
       cqlCon.close();
  18. Cassandra Binary Protocol
     • Inherently asynchronous
       – Can be used synchronously as well
     • Frame and stream based
       – Many requests with different stream ids can be sent asynchronously
       – A set of frames coming from the server belongs to the same stream
     • Certain events are pushed from the server
       – Topology change
       – Status change
       – Schema change
     • Because of its asynchronous nature, it can easily be integrated with new technologies like WebSockets and Servlet 3.0, 3.1
     • Listens on port 9042
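The idea of sending many requests with different stream ids over one connection can be sketched with `CompletableFuture`. This is not the wire protocol: `StreamMultiplexerSketch`, `send` and `onResponse` are hypothetical names, and frames are reduced to plain strings.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of stream-id multiplexing; not the actual binary protocol.
class StreamMultiplexerSketch {
    // Stream id -> future waiting for the response on that stream.
    private final Map<Integer, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    private final AtomicInteger nextStreamId = new AtomicInteger();

    // Allocates a stream id for a request and remembers who is waiting for the answer.
    // A real client would now write a frame carrying this id to the socket.
    int send(CompletableFuture<String> responseFuture) {
        int streamId = nextStreamId.getAndIncrement();
        pending.put(streamId, responseFuture);
        return streamId;
    }

    // Called when a response frame arrives; responses may arrive in any order.
    // Returns false if no request is waiting on that stream id.
    boolean onResponse(int streamId, String body) {
        CompletableFuture<String> future = pending.remove(streamId);
        if (future == null) {
            return false;
        }
        future.complete(body);
        return true;
    }
}
```

A synchronous call is then just a caller that blocks on its future, which is one way to read the slide's point that the protocol can be used both ways.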
  19. DataStax Java Driver
     • Implements the client side of the Binary Protocol
     • Similar to JDBC but easier in certain areas
       – Specific to Cassandra, not portable
     • Supports CQL and plans to support OO and DB APIs
     • Supports
       – Query Builder (who wants this?)
       – Node discovery
       – Connection pooling
       – Reconnection policies
       – Load balancing policies
       – Retry policies
     • Cursor support was announced during Cassandra Summit 2013
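To give a feel for what a reconnection policy decides, the sketch below computes the delay before each reconnection attempt for a constant schedule and for a capped exponential one. It is a standalone illustration under assumed names (`ReconnectionDelaySketch`, `constantDelayMs`, `exponentialDelayMs`); it does not use or reproduce the driver's own policy classes.

```java
// Hypothetical sketch of reconnection schedules; not the driver's implementation.
class ReconnectionDelaySketch {
    // Constant schedule: wait the same time before every attempt.
    static long constantDelayMs(long delayMs, int attempt) {
        return delayMs;
    }

    // Exponential schedule: the delay doubles on each attempt (attempt 0 waits baseMs)
    // and never exceeds maxMs.
    static long exponentialDelayMs(long baseMs, long maxMs, int attempt) {
        long delay = Math.min(baseMs, maxMs);
        for (int i = 0; i < attempt && delay < maxMs; i++) {
            // Compare against maxMs / 2 first so the doubling cannot overflow.
            delay = (delay > maxMs / 2) ? maxMs : delay * 2;
        }
        return delay;
    }
}
```

A constant schedule notices a recovered node quickly at the cost of steady retry traffic, while a capped exponential one backs off from a node that stays down.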
  20. DataStax Java Driver: Cluster and Session
       import com.datastax.driver.core.*;
       import com.datastax.driver.core.policies.*;

       Cluster cluster = Cluster.builder()
           .addContactPoints("", "") // addContactPoints (plural) accepts several hosts
           .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
           .withReconnectionPolicy(new ConstantReconnectionPolicy(1000L))
           .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("DC1"))
           .withCredentials("myuser", "mypassword")
           .build();
       Session session = cluster.connect("mykeyspace");
  21. DataStax Java Driver Example: Select
       String selectQuery = "select * from department where departmentid = ?";
       PreparedStatement statement = session.prepare(selectQuery);
       statement.setConsistencyLevel(ConsistencyLevel.ONE);
       BoundStatement query = statement.bind(122);
       // you can do async here and get a Future instead
       ResultSet result = session.execute(query);
       for (Row row : result) {
           System.out.println(row.getInt("employeeid"));
           System.out.println(row.getString("firstname"));
           System.out.println(row.getString("lastname"));
       }
  22. References
     • Thrift
     • Hector
     • Astyanax
     • JDBC
     • DataStax Java Driver
       – YouTube presentation, Cassandra Summit 2013
       – Slideshare, Cassandra Summit 2013
       – Mailing list, enroll at
  23. Thanks, and especially Victor Anjos