
Coursera Cassandra Driver

Hear how Coursera uses Cassandra as the core of its scalable online education platform. I'll discuss the strengths of Cassandra that we leverage, as well as some limitations you might run into in practice.

In the second part of this talk, we'll dive into how to use the Datastax Java drivers effectively. We'll dig into how the driver is architected and use that understanding to develop best practices. I'll also share a couple of interesting bugs we've run into at Coursera.

Published in: Technology

  1. 1. Coursera, Cassandra, Java Drivers
  2. 2. Biography Daniel Chia @DanielJHChia Software Engineer, Infrastructure Team 2
  3. 3. 1 Introduction 2 Why We Chose Cassandra 3 Example Use Cases 4 Pain Points 5 Java Drivers
  4. 4. Coursera 4
  5. 5. 5
  6. 6. 6 Web iOS Android
  7. 7. Why Cassandra 7
  8. 8. Coursera Tech Stack • 100% AWS • MySQL + Cassandra • Service-oriented 8
  9. 9. Consistently Fast Latencies 9
  10. 10. Availability 10
  11. 11. Scalability 11
  12. 12. Use Case #1 • Resume video where you left off • High write volume • TTL data 12
  13. 13. 13 CREATE TABLE video_progress_kvs_basic ( user_id int, course_id varchar, video_id varchar, viewed_up_to bigint, updated_at bigint, PRIMARY KEY ((user_id, course_id, video_id)) );
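To make the "high write volume, TTL data" point of Use Case #1 concrete, here is a minimal sketch of how a progress write against the table above might be phrased so that rows expire on their own. The class name, the helper, and the 90-day retention window are assumptions for illustration; the slides do not state the actual TTL or write path.

```java
import java.util.concurrent.TimeUnit;

final class VideoProgressCql {
    // Assumed retention window; the talk does not say which TTL is used.
    static final long ASSUMED_TTL_SECONDS = TimeUnit.DAYS.toSeconds(90);

    /** Builds a parameterized upsert whose last bind marker is the TTL in seconds. */
    static String upsertWithTtl() {
        return "INSERT INTO video_progress_kvs_basic "
             + "(user_id, course_id, video_id, viewed_up_to, updated_at) "
             + "VALUES (?, ?, ?, ?, ?) USING TTL ?";
    }

    public static void main(String[] args) {
        System.out.println(upsertWithTtl());
        System.out.println("ttl seconds = " + ASSUMED_TTL_SECONDS);
    }
}
```

Because every write to the same (user_id, course_id, video_id) key overwrites the previous row, each update would also restart the expiry clock under this scheme.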
  14. 14. Use Case #2: Media Asset Service 14
  15. 15. 15
  16. 16. 16
  17. 17. Use case #3: Video Workflows 17 Input.mp4 Step 1: Audio Step 2: Low Res Video Step 3: High Res Video Assembly 1: Crash Assembly 2: Ok Assembly 3: Crash Assembly 4: Ok Assembly 5: Ok
  18. 18. 18
  19. 19. CREATE TABLE transloadit_workflow ( workflow_id text, step_id text, assembly_id text, step_details text, step_payload map<text, text>, step_status text, PRIMARY KEY (workflow_id, step_id, assembly_id) ) 19
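The primary key above puts every row of one workflow in a single partition, ordered by step_id and then assembly_id. The following in-memory sketch models that layout with nested sorted maps to show why "all attempts for one step" is a cheap range read. The class, method names, and sample statuses are hypothetical and are not Coursera's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class WorkflowPartitionSketch {
    // Models one partition (a single workflow_id): step_id -> assembly_id -> status,
    // kept sorted the way clustering columns keep rows sorted on disk.
    private final TreeMap<String, TreeMap<String, String>> steps = new TreeMap<>();

    /** Analogous to inserting one (workflow_id, step_id, assembly_id) row. */
    void recordAttempt(String stepId, String assemblyId, String status) {
        steps.computeIfAbsent(stepId, k -> new TreeMap<>()).put(assemblyId, status);
    }

    /** Analogous to: SELECT ... WHERE workflow_id = ? AND step_id = ? */
    List<String> attemptsFor(String stepId) {
        List<String> out = new ArrayList<>();
        steps.getOrDefault(stepId, new TreeMap<>())
             .forEach((assembly, status) -> out.add(assembly + "=" + status));
        return out;
    }

    /** A step counts as done once any of its assemblies reports "Ok". */
    boolean stepSucceeded(String stepId) {
        return steps.getOrDefault(stepId, new TreeMap<>()).containsValue("Ok");
    }
}
```

A retried step simply adds another assembly row under the same step_id, so the crash history stays next to the eventual success.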
  20. 20. 20 Looking Back
  21. 21. Cassandra - Initial Pain Points • Can’t execute arbitrary queries • Filtering, sorting, etc. • Can’t be abused as an OLAP database • Worries about ‘eventual’ consistency 21
  22. 22. Gotchas • Lots of truly ad-hoc queries are hard • Don’t use C* directly to explore your data. (Spark?) • Sorting, filtering can be hard • Consider Solr / ElasticSearch • Or even MySQL depending on load / importance 22
  23. 23. Helpful Things • Data modeling consulting • Monitoring • Data access layer for common use cases 23
  24. 24. 24
  25. 25. 25
  26. 26. Java Drivers
  27. 27. Best Practices • Driver Choice • Cluster / Connection Setup • Executing Queries 27
  28. 28. 28 Datastax Java Drivers
  29. 29. 29 public class Scratch { static Cluster cluster; public static void main(String args[]) { cluster = Cluster.builder() .addContactPoint("cassandra") .build(); readRow("asset:QoMqLLyCEeSOi3paAormVw"); cluster.close(); } static void readRow(String id) { Session session = cluster.connect("asset"); ResultSet result = session.execute( "SELECT * from asset_kvs_timestamp where part_key = ?", id); System.out.println(result.one()); session.close(); } }
  30. 30. 30 cluster = Cluster.builder() .addContactPoint("cassandra") .build();
  31. 31. 31 LoadBalancingPolicy policy = new TokenAwarePolicy( new DCAwareRoundRobinPolicy()); cluster = Cluster.builder() .addContactPoint("cassandra") .withLoadBalancingPolicy(policy) .build();
  32. 32. 32 cluster = Cluster.builder() .addContactPoint("cassandra") .withLoadBalancingPolicy(policy) .withRetryPolicy(retryPolicy) .build();
  33. 33. Default Retry Policy • Retries read if enough replicas alive, but data fetch failed. • Retries write only for batched writes. • Retries next host on Unavailable. 2.0.11+ or 2.1.7 (JAVA-709) 33
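The three bullets on the Default Retry Policy slide can be read as a small decision table. The sketch below restates them as plain Java so the conditions are explicit. It is a stand-alone model, not the driver's RetryPolicy interface: the enum and method names are made up, the "at most one retry" guard is an assumption based on my understanding of the driver, and "batched writes" is modeled as a timeout while writing the batch log.

```java
final class DefaultRetrySketch {
    enum Decision { RETRY_SAME_HOST, RETRY_NEXT_HOST, RETHROW }
    enum WriteKind { SIMPLE, BATCH, BATCH_LOG, COUNTER }

    /** Read timeout: retry only if enough replicas answered but the data itself was not fetched. */
    static Decision onReadTimeout(int required, int received, boolean dataRetrieved, int retries) {
        if (retries != 0) return Decision.RETHROW; // assumed: never retry more than once
        return (received >= required && !dataRetrieved)
                ? Decision.RETRY_SAME_HOST : Decision.RETHROW;
    }

    /** Write timeout: retry only when the batch log write timed out. */
    static Decision onWriteTimeout(WriteKind kind, int retries) {
        if (retries != 0) return Decision.RETHROW;
        return kind == WriteKind.BATCH_LOG ? Decision.RETRY_SAME_HOST : Decision.RETHROW;
    }

    /** Unavailable: try the next host once (the behavior the slide attributes to JAVA-709). */
    static Decision onUnavailable(int retries) {
        return retries == 0 ? Decision.RETRY_NEXT_HOST : Decision.RETHROW;
    }
}
```

Note that an ordinary single-row write that times out is rethrown here, since retrying it blindly is only safe when the write is idempotent.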
  34. 34. Share Session! 34 public static void main(String args[]) { cluster = Cluster.builder() .addContactPoint("cassandra").build(); readRow("asset:QoMqLLyCEeSOi3paAormVw"); readRow("asset:7i2ClbKnEeSk_npaAormVw"); readRow("asset:KS1vywpGEeWKtzoMw4q1xg"); cluster.close(); } static void readRow(String id) { Session session = cluster.connect("asset"); ResultSet result = session.execute( "SELECT * from asset_kvs_timestamp where part_key = ?", id); System.out.println(result.one()); session.close(); }
  35. 35. 35 public static void main(String args[]) { cluster = Cluster.builder() .addContactPoint("cassandra").build(); session = cluster.connect(); readRow("asset:QoMqLLyCEeSOi3paAormVw"); readRow("asset:7i2ClbKnEeSk_npaAormVw"); readRow("asset:KS1vywpGEeWKtzoMw4q1xg"); session.close(); cluster.close(); } static void readRow(String id) { ResultSet result = session.execute( "SELECT * from asset.asset_kvs_timestamp where part_key = ?", id); System.out.println(result.one()); }
  36. 36. Use prepared statements • If doing query more than once • Better performance • Token aware routing 36
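The "token aware routing" bullet follows from the fact that a prepared statement tells the driver which bound value is the partition key, so the driver can work out which node owns the row before sending the request. Below is a toy sketch of that lookup. It is an assumption-laden model: it uses String.hashCode() as a stand-in for Cassandra's real partitioner, one token per node, and invented class and host names.

```java
import java.util.TreeMap;

final class TokenRingSketch {
    // token -> host; each host owns the range ending at its token.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(int token, String host) {
        ring.put(token, host);
    }

    /** Toy token function; real clusters use a partitioner such as Murmur3. */
    static int toyToken(String partitionKey) {
        return partitionKey.hashCode();
    }

    /** Owner is the first node whose token is >= the given token, wrapping to the start. */
    String ownerOfToken(int token) {
        Integer owner = ring.ceilingKey(token);
        return ring.get(owner != null ? owner : ring.firstKey());
    }

    /** What a token-aware policy needs: partition key in, preferred host out. */
    String ownerOf(String partitionKey) {
        return ownerOfToken(toyToken(partitionKey));
    }
}
```

With a plain string query the partition key is buried in the query text, so this lookup cannot happen and the request may land on a node that has to forward it. The sketch assumes at least one node has been added; an empty ring makes firstKey() throw.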
  37. 37. 37 static PreparedStatement statement; public static void main(String args[]) { … session = cluster.connect(); statement = session.prepare( "SELECT * from asset.asset_kvs_timestamp where part_key = ?"); readRow("asset:QoMqLLyCEeSOi3paAormVw"); … } static void readRow(String id) { BoundStatement bound = statement.bind().setString("part_key", id); ResultSet result = session.execute(bound); System.out.println(result.one()); }
  38. 38. There Be Dragons.. JAVA-420 statement = session.prepare( "SELECT part_key, time_key, content from asset.asset_kvs_timestamp where part_key = ?") 38 Always specify columns explicitly for prepared statements!
  39. 39. Consider Async static List<String> readRows(List<String> ids) { return ids.stream().map(id -> { BoundStatement bound = statement.bind().setString("part_key", id); ResultSet result = session.execute(bound); return result.one().getString("c_enc"); }).collect(Collectors.toList()); } 39
  40. 40. Async.. static ListenableFuture<List<String>> readRowsAsync(List<String> ids) { List<ListenableFuture<String>> futures = ids.stream().map(id -> { BoundStatement bound = statement.bind().setString("part_key", id); ResultSetFuture future = session.executeAsync(bound); return Futures.transform(future, (ResultSet result) -> result.one().getString("c_enc")); }).collect(Collectors.toList()); return Futures.allAsList(futures); } 40 http://www.datastax.com/dev/blog/java-driver-async-queries
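The gain in the async version comes from starting every query before waiting on any result. The same fan-out shape can be sketched with only the JDK's CompletableFuture, which may be easier to experiment with outside a cluster. This is a swapped-in technique, not the driver's API: the slide uses Guava's ListenableFuture, and the `lookup` function here is a hypothetical stand-in for `session.executeAsync` plus the row mapping.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

final class AsyncFanOutSketch {
    /** Starts every lookup up front, then gathers the results in input order. */
    static <K, V> CompletableFuture<List<V>> readAll(
            List<K> ids, Function<K, CompletableFuture<V>> lookup) {
        // collect() is terminal, so all lookups are in flight after this line.
        List<CompletableFuture<V>> futures =
                ids.stream().map(lookup).collect(Collectors.toList());

        // allOf completes when every future has; join() then returns without blocking.
        return CompletableFuture
                .allOf(futures.toArray(new CompletableFuture[0]))
                .thenApply(done -> futures.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }
}
```

As with Futures.allAsList on the slide, one failed lookup fails the combined future, so callers that want partial results would need to handle errors per element.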
  41. 41. Thank you
  42. 42. Cassandra Summit 2016, September 7-9, San Jose, CA. Get 15% Off with Code: MeetupPromo. Cassandrasummit.org
