
Multi-criteria Queries on a Cassandra Application

You shouldn't normally do multi-criteria queries with Cassandra.

We will detail the technical challenges and limitations involved, and how we successfully implemented such a solution for one of our biggest clients, using Java 8 and the CQL 3 driver.

This session will also focus on the tools we used (JHipster), our Java application framework (Spring Boot), and the cluster configuration we set up.

Jerome Mainaud - Ippon Technologies
Jerome Mainaud is a Java software architect at Ippon Technologies. He helps Ippon's clients create, deploy and run applications based on NoSQL databases and search indexes. He mainly works with Cassandra, in both its Apache and DataStax flavors. Jerome is a certified DataStax Enterprise Solution Architect.

http://blog.ippon.fr/

  1. Multi-criteria Queries on a Cassandra Application, Jérôme Mainaud
  2. Ippon Technologies © 2015 #CassandraSummit. Who am I? Jérôme Mainaud ➔ @jxerome ➔ Software Architect at Ippon Technologies, Paris ➔ DataStax Certified Solution Architect
  3. Ippon Technologies ● 200 software engineers in France and the US ➔ Paris, Nantes, Bordeaux ➔ Richmond (Virginia), Washington (DC) ● Expertise ➔ Digital, Big Data and Cloud ➔ Java & Agile ● Open-source projects: ➔ JHipster ➔ Tatami … ● @ipponusa
  4. Agenda: (1) Context, (2) Technical Stack, (3) Data Modeling, (4) Implementation, (5) Results
  5. Warning: The following slideshow features data patterns and code performed by professionals. Accordingly, Ippon and the conference organisers must insist that no one attempt to recreate any data pattern or code performed in this slideshow.
  6. Once upon a time, an app…
  7. Once upon a time, an app… Invoice application in SaaS ➔ A single database for all users ➔ Data isolation for each user. High data volume ➔ 1 year ➔ 500 million invoices ➔ 2 billion invoice lines
  8. Once upon a time, an app…
  9. Once upon a time, an app…
  10. Back-end Evolution
  11. Technical Stack
  12. Technical Stack: JHipster ➔ Spring Boot + AngularJS application generator ➔ Supports JPA, MongoDB ➔ and now Cassandra! It let us generate the first version very fast ➔ Application skeleton ready in 5 minutes ➔ Add entity tables, objects and mappings ➔ Configuration, build, log management, etc. ➔ Gatling tests ready to use. http://jhipster.github.io
  13. Technical Stack: Spring Boot ➔ Built on Spring ➔ Convention over configuration ➔ Many "starters" ready to use. Web services ➔ CXF instead of Spring MVC REST. Cassandra ➔ DataStax Enterprise. Java 8
  14. JHipster: Code Generator ● But ➔ Cassandra was not yet supported ➔ No AngularJS, no front end ➔ CXF instead of Spring MVC
  15. JHipster: Code Generator ● But ➔ Cassandra was not yet supported ➔ No AngularJS, no front end ➔ CXF instead of Spring MVC ● JHipster alpha generator ➔ Secret generator used to validate concepts before writing the Yeoman generator
  16. JHipster: Code Generator. Julien Dubois Code Generator
  17. Cassandra Driver Configuration ● Spring Boot configuration ➔ No integration of the DataStax Java Driver in Spring Boot ➔ Created a Spring Boot auto-configuration for the DataStax Java Driver ➔ Uses the standard YAML file ● Offered to Spring Boot 1.3 ➔ GitHub ticket #2064 "Add a spring-boot-starter-data-cassandra" ➔ Still open ● Improved by the community ➔ The JHipster version was improved by pull requests ➔ Authentication, load-balancer configuration
  18. Data Model
  19. Conceptual Model
  20. Physical Model
  21. Table

          create table invoice (
              invoice_id        timeuuid,
              user_id           uuid static,
              firstname         text static,
              lastname          text static,
              invoice_date      timestamp static,
              payment_date      timestamp static,
              total_amount      decimal static,
              delivery_address  text static,
              delivery_city     text static,
              delivery_zipcode  text static,
              item_id           timeuuid,
              item_label        text,
              item_price        decimal,
              item_qty          int,
              item_total        decimal,
              primary key (invoice_id, item_id)
          );
  22. Multi-criteria Search
  23. Multi-criteria Search: Mandatory criteria ➔ User (implicit) ➔ Invoice date (range of dates). Additional criteria ➔ Client last name ➔ Client first name ➔ City ➔ Zip code. Paginated result
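The search form above splits criteria into mandatory ones (user, date range) and optional ones, and the service code later calls crit.hasIndexedCriteria() to choose between the day index and the per-criterion indexes. The slides never show the Criteria class, so the following is a minimal, hypothetical sketch: the fields come from the getters used on the slides, while the validation and the blank-to-null normalization are assumptions. Pagination fields (limit, day offset, invoice id offset) are left out.

```java
import java.time.LocalDate;
import java.util.UUID;

/**
 * Hypothetical search criteria, reconstructed from the getters used on the slides.
 * Mandatory: user and inclusive date range. Optional: the four indexed fields.
 */
final class Criteria {
    private final UUID userId;        // implicit: the logged-in user
    private final LocalDate firstDay; // inclusive range of invoice days
    private final LocalDate lastDay;
    private final String lastName;    // optional criteria: null when not filled
    private final String firstName;
    private final String city;
    private final String zipCode;

    Criteria(UUID userId, LocalDate firstDay, LocalDate lastDay,
             String lastName, String firstName, String city, String zipCode) {
        if (userId == null || firstDay == null || lastDay == null) {
            throw new IllegalArgumentException("user and date range are mandatory");
        }
        if (lastDay.isBefore(firstDay)) {
            throw new IllegalArgumentException("lastDay is before firstDay");
        }
        this.userId = userId;
        this.firstDay = firstDay;
        this.lastDay = lastDay;
        this.lastName = blankToNull(lastName);
        this.firstName = blankToNull(firstName);
        this.city = blankToNull(city);
        this.zipCode = blankToNull(zipCode);
    }

    // Assumption: an empty form field means "not filled", not "match the empty string".
    private static String blankToNull(String s) {
        return s == null || s.trim().isEmpty() ? null : s;
    }

    /** True when at least one optional criterion is filled, so index tables must be queried. */
    boolean hasIndexedCriteria() {
        return lastName != null || firstName != null || city != null || zipCode != null;
    }

    UUID getUserId() { return userId; }
    LocalDate getFirstDay() { return firstDay; }
    LocalDate getLastDay() { return lastDay; }
    String getLastName() { return lastName; }
    String getFirstName() { return firstName; }
    String getCity() { return city; }
    String getZipCode() { return zipCode; }
}
```

Keeping the mandatory criteria in the constructor means every query can always be restricted to a (user, day) partition, which is what the index tables below rely on.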
  24. Shall we use Solr?
  25. Shall we use Solr? ● Integrated in DataStax Enterprise ● Atomic and automatic index updates ● Full-text search
  26. Shall we use Solr? ● We search on static columns ➔ Solr doesn't support them ● We search partitions ➔ Solr searches rows
  27. Shall we use Solr? ● We search on static columns ➔ Solr doesn't support them ● We search partitions ➔ Solr searches rows
  28. Shall we use secondary indexes? ● Only one index is used per query ● Hard to get good performance with them
  29. Index Table: Use index tables ➔ Partition key: the mandatory criteria and one additional criterion ○ user_id ○ invoice day (truncated invoice date) ○ additional criterion ➔ Clustering column: invoice UUID
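To make the index-table layout above concrete without a cluster, here is an in-memory sketch of one index table such as invoice_by_lastname: each map entry plays the role of a partition keyed by (user_id, invoice day, value), and the ids inside it are kept newest first. It is an illustration under simplifying assumptions (long ids instead of timeuuids, a single String criterion), not the project's repository; the null check mirrors the insert method on slide 39.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.UUID;

/**
 * In-memory stand-in for one index table (illustration only).
 * Partition key: (user_id, invoice_day, value); clustering column: invoice id, newest first.
 */
final class InMemoryIndexTable {
    // Each map entry plays the role of one Cassandra partition.
    private final Map<List<Object>, NavigableSet<Long>> partitions = new HashMap<>();

    private static List<Object> partitionKey(UUID userId, LocalDate day, String value) {
        return Arrays.asList(userId, day, value); // element-wise equals/hashCode
    }

    /** An invoice without a value for this criterion is simply not indexed. */
    void insert(UUID userId, LocalDate day, String value, long invoiceId) {
        if (value == null) {
            return;
        }
        partitions.computeIfAbsent(partitionKey(userId, day, value),
                k -> new TreeSet<Long>(Comparator.reverseOrder()))
            .add(invoiceId);
    }

    /**
     * Single-partition read, newest first. A non-null offset keeps ids <= offset,
     * like an "invoice_id <= ?" restriction, so a page can resume at the offset.
     */
    List<Long> find(UUID userId, LocalDate day, String value, Long offset) {
        NavigableSet<Long> ids = partitions.getOrDefault(
            partitionKey(userId, day, value), Collections.emptyNavigableSet());
        // In descending order, tailSet(offset, true) holds offset and every smaller id.
        return new ArrayList<>(offset == null ? ids : ids.tailSet(offset, true));
    }
}
```

Because every lookup names the full partition key, each search query touches exactly one partition, which is what keeps the query count predictable on slide 35.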
  30. Index Table
  31. Materialized View (new in 3.0)

          CREATE MATERIALIZED VIEW invoice_by_firstname AS
              SELECT invoice_id FROM invoice
              WHERE firstname IS NOT NULL
              PRIMARY KEY ((user_id, invoice_day, firstname), invoice_id)
              WITH CLUSTERING ORDER BY (invoice_id DESC)
  32. Parallel search on the indexes, in-memory merge by the application
  33. Parallel item detail queries ➔ Result page (ids)
  34. Search on a date range ➔ Loop over every day in the range and stop when there are enough results for a page
  35. Search Complexity ● Query count ➔ For each day in the date range: ○ 1 query per filled additional criterion (one partition per query) ➔ 1 query per item in the result page (one partition per query) ● Search complexity ➔ partitions read, one per query ● Example: 3 criteria, 7 days, 100 items per page ➔ query count ≤ 3 × 7 + 100 = 121
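The bound on slide 35 can be written as a small function. This sketch follows the slide's counting (one single-partition index query per filled criterion per day, plus one detail query per invoice on the page); the Math.max covers the case with no additional criterion, where the service on slide 47 reads the day partition instead, still one query per day. The class and method names are illustrative.

```java
/** Upper bound on the single-partition queries needed for one result page (illustrative). */
final class QueryBudget {
    private QueryBudget() {}

    static int maxQueries(int filledCriteria, int days, int pageSize) {
        if (filledCriteria < 0 || days < 0 || pageSize < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        // With no additional criterion, the day index is queried once per day instead.
        int indexQueriesPerDay = Math.max(1, filledCriteria);
        // Index queries for every scanned day, plus one detail query per invoice on the page.
        return indexQueriesPerDay * days + pageSize;
    }
}
```

QueryBudget.maxQueries(3, 7, 100) gives the slide's 121. It is only an upper bound: the days loop stops as soon as the page is full, so fewer days may be scanned.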
  36. Java Indexes
  37. Index: Instances

          @Repository
          public class InvoiceByLastNameRepository extends IndexRepository<String> {
              public InvoiceByLastNameRepository() {
                  super("invoice_by_lastname", "lastname", Invoice::getLastName, Criteria::getLastName);
              }
          }

          @Repository
          public class InvoiceByFirstNameRepository extends IndexRepository<String> {
              public InvoiceByFirstNameRepository() {
                  super("invoice_by_firstname", "firstname", Invoice::getFirstName, Criteria::getFirstName);
              }
          }
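  38. Index: Parent Class

          public class IndexRepository<T> {

              @Inject
              private Session session;

              private final String tableName;                       // index table, e.g. "invoice_by_lastname"
              private final String valueName;                       // indexed column, e.g. "lastname"
              private final Function<Invoice, T> valueGetter;       // value to index, read from an invoice
              private final Function<Criteria, T> criteriumGetter;  // value to search, read from the criteria

              private PreparedStatement insertStmt;
              private PreparedStatement findStmt;
              private PreparedStatement findWithOffsetStmt;

              // Constructor implied by the subclasses' super(...) calls (not shown on the slide).
              protected IndexRepository(String tableName, String valueName,
                                        Function<Invoice, T> valueGetter,
                                        Function<Criteria, T> criteriumGetter) {
                  this.tableName = tableName;
                  this.valueName = valueName;
                  this.valueGetter = valueGetter;
                  this.criteriumGetter = criteriumGetter;
              }

              @PostConstruct
              public void init() {
                  /* initialize PreparedStatements */
              }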
  39. Index: Insert

          @Override
          public void insert(Invoice invoice) {
              T value = valueGetter.apply(invoice);
              // Invoices without a value for this criterion are not indexed.
              if (value != null) {
                  session.execute(
                      insertStmt.bind(
                          invoice.getUserId(),
                          Dates.toDate(invoice.getInvoiceDay()),
                          value,
                          invoice.getId()));
              }
          }
  40. Index: Insert (Prepared Statement)

          // bindMarker() is statically imported from QueryBuilder.
          insertStmt = session.prepare(
              QueryBuilder.insertInto(tableName)
                  .value("user_id", bindMarker())
                  .value("invoice_day", bindMarker())
                  .value(valueName, bindMarker())
                  .value("invoice_id", bindMarker()));
  41. Index: Insert (Date Conversion)

          // LocalDate -> java.util.Date at local midnight in the JVM's default time zone.
          public static Date toDate(LocalDate date) {
              return date == null
                  ? null
                  : Date.from(date.atStartOfDay().atZone(ZoneId.systemDefault()).toInstant());
          }
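The conversion above uses the JVM's default time zone, so the timestamp stored for a given invoice_day depends on how each application node is configured. One way to pin it down, sketched here as a hypothetical helper rather than the project's Dates class, is to pass the zone explicitly (for example UTC) and use the same zone for the reverse conversion.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.Date;

/** LocalDate <-> java.util.Date conversions with an explicit zone (hypothetical helper). */
final class DayConversion {
    private DayConversion() {}

    /** Midnight of the given day in the given zone, or null for a null day. */
    static Date toDate(LocalDate day, ZoneId zone) {
        return day == null ? null : Date.from(day.atStartOfDay(zone).toInstant());
    }

    /** Inverse of toDate for the same zone, so a stored invoice_day reads back as the same day. */
    static LocalDate toLocalDate(Date date, ZoneId zone) {
        return date == null ? null : date.toInstant().atZone(zone).toLocalDate();
    }
}
```

With a fixed zone, every node writing to or reading from an index table computes the same partition key for the same calendar day.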
  42. Index: Search

          @Override
          public CompletableFuture<Iterator<UUID>> find(Criteria criteria, LocalDate day, UUID offset) {
              T criterium = criteriumGetter.apply(criteria);
              if (criterium == null) {
                  // Criterion not filled: no query; the caller filters out this null result.
                  return CompletableFuture.completedFuture(null);
              }
              BoundStatement stmt;
              if (offset == null) {
                  stmt = findStmt.bind(criteria.getUserId(), Dates.toDate(day), criterium);
              } else {
                  stmt = findWithOffsetStmt.bind(criteria.getUserId(), Dates.toDate(day), criterium, offset);
              }
              return Jdk8.completableFuture(session.executeAsync(stmt))
                  .thenApply(rs -> Iterators.transform(rs.iterator(), row -> row.getUUID(0)));
          }
  43. Index: Search (Prepared Statement)

          // Reads one index partition, keeping only invoice_id <= offset (the offset row is included).
          findWithOffsetStmt = session.prepare(
              QueryBuilder.select()
                  .column("invoice_id")
                  .from(tableName)
                  .where(eq("user_id", bindMarker()))
                  .and(eq("invoice_day", bindMarker()))
                  .and(eq(valueName, bindMarker()))
                  .and(lte("invoice_id", bindMarker())));
  44. Index: Search (Guava to Java 8)

          // Adapts the driver's Guava ListenableFuture (from executeAsync) to a CompletableFuture.
          public static <T> CompletableFuture<T> completableFuture(ListenableFuture<T> guavaFuture) {
              CompletableFuture<T> future = new CompletableFuture<>();
              Futures.addCallback(guavaFuture, new FutureCallback<T>() {
                  @Override
                  public void onSuccess(T result) {
                      future.complete(result);
                  }

                  @Override
                  public void onFailure(Throwable t) {
                      future.completeExceptionally(t);
                  }
              });
              return future;
          }
  45. Java Search Service
  46. Service: Class

          @Service
          public class InvoiceSearchService {

              @Inject
              private InvoiceRepository invoiceRepository;

              @Inject
              private InvoiceByDayRepository byDayRepository;

              @Inject
              private InvoiceByLastNameRepository byLastNameRepository;

              @Inject
              private InvoiceByFirstNameRepository byFirstNameRepository;

              @Inject
              private InvoiceByCityRepository byCityRepository;

              @Inject
              private InvoiceByZipCodeRepository byZipCodeRepository;
  47. Service: Search

          public ResultPage findByCriteria(Criteria criteria) {
              return byDateInteval(criteria, (crit, day, offset) -> {
                  CompletableFuture<Iterator<UUID>> futureUuidIt;
                  if (crit.hasIndexedCriteria()) {
                      /*
                       * ... Doing multi-criteria search; see next slide ...
                       */
                  } else {
                      // No additional criterion: read the day index partition directly.
                      futureUuidIt = byDayRepository.find(crit.getUserId(), day, offset);
                  }
                  return futureUuidIt;
              });
          }
  48. Service: Search (Multi-criteria)

          // One asynchronous query per index; unfilled criteria complete immediately with null.
          CompletableFuture<Iterator<UUID>>[] futures = Stream.<IndexRepository<?>>of(
                  byLastNameRepository, byFirstNameRepository, byCityRepository, byZipCodeRepository)
              .map(repo -> repo.find(crit, day, offset))
              .toArray(CompletableFuture[]::new);

          // When all queries are done, intersect the non-null id iterators, newest first.
          // Iterators.intersection is a helper that is not shown on the slides.
          futureUuidIt = CompletableFuture.allOf(futures)
              .thenApply(v -> Iterators.intersection(TimeUUIDComparator.desc,
                  Stream.of(futures)
                      .map(CompletableFuture::join)
                      .filter(Objects::nonNull)
                      .collect(Collectors.toList())));
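The multi-criteria branch relies on Iterators.intersection, which the slides do not show; Guava's Iterators class does not appear to provide such a method, so it is presumably a project helper. Below is a hypothetical JDK-only version: since every index partition is read in the same descending clustering order, the intersection can be computed lazily with a leapfrog walk that advances each iterator past values sorting before the current candidate. Elements are assumed non-null and unique within each source, which holds for clustering keys.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical lazy intersection of iterators that are all sorted by the same comparator. */
final class SortedIntersection {
    private SortedIntersection() {}

    static <T> Iterator<T> intersection(Comparator<? super T> cmp, List<Iterator<T>> sources) {
        if (sources.isEmpty()) {
            return Collections.emptyIterator();
        }
        return new Iterator<T>() {
            private T next = advance(cmp, sources); // one-element lookahead

            @Override
            public boolean hasNext() {
                return next != null;
            }

            @Override
            public T next() {
                if (next == null) {
                    throw new NoSuchElementException();
                }
                T current = next;
                next = advance(cmp, sources);
                return current;
            }
        };
    }

    /** Leapfrog step: next value present in every source, or null once any source runs out. */
    private static <T> T advance(Comparator<? super T> cmp, List<Iterator<T>> sources) {
        int n = sources.size();
        Iterator<T> first = sources.get(0);
        if (!first.hasNext()) {
            return null;
        }
        T candidate = first.next();
        int agreeing = 1; // consecutive sources (in cycle order) positioned on the candidate
        int i = 1 % n;
        while (agreeing < n) {
            Iterator<T> it = sources.get(i);
            T value;
            do { // skip values that sort before the candidate
                if (!it.hasNext()) {
                    return null;
                }
                value = it.next();
            } while (cmp.compare(value, candidate) < 0);
            if (cmp.compare(value, candidate) == 0) {
                agreeing++;
            } else { // value sorts after the candidate: it becomes the new candidate
                candidate = value;
                agreeing = 1;
            }
            i = (i + 1) % n;
        }
        return candidate;
    }
}
```

Laziness matters for pagination: the days loop only pulls as many ids as the page needs, plus the one id that becomes the next page's cursor.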
  49. Service: UUIDs Comparator

          /**
           * TimeUUID comparator equivalent to Cassandra's comparator:
           * @see org.apache.cassandra.db.marshal.TimeUUIDType#compare()
           */
          public enum TimeUUIDComparator implements Comparator<UUID> {
              desc {
                  @Override
                  public int compare(UUID o1, UUID o2) {
                      // Newest first: compare the embedded timestamps in reverse order...
                      long delta = o2.timestamp() - o1.timestamp();
                      if (delta != 0) {
                          return Ints.saturatedCast(delta);
                      }
                      // ...then break ties on the whole UUID, also reversed.
                      return o2.compareTo(o1);
                  }
              };
          }
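The comparator above narrows a long difference with Guava's Ints.saturatedCast. A JDK-only equivalent can use Long.compare on UUID.timestamp(), which is only defined for version-1 (time-based) UUIDs. This sketch keeps the slide's tie-break on the reversed UUID.compareTo; timeUuid is a hypothetical test utility that packs a raw 60-bit timestamp into a version-1 UUID.

```java
import java.util.Comparator;
import java.util.UUID;

/** Newest-first order for version-1 (time-based) UUIDs, using only the JDK. */
final class TimeUuidOrder {
    private TimeUuidOrder() {}

    /** Later timestamp first; ties broken by reversed UUID.compareTo, as on the slide. */
    static final Comparator<UUID> DESC = (a, b) -> {
        // UUID.timestamp() throws UnsupportedOperationException for non-time-based UUIDs.
        int byTime = Long.compare(b.timestamp(), a.timestamp());
        return byTime != 0 ? byTime : b.compareTo(a);
    };

    /** Hypothetical test helper: RFC 4122 layout (time_low, time_mid, version + time_hi). */
    static UUID timeUuid(long timestamp, long clockSeqAndNode) {
        long timeLow = timestamp & 0xFFFF_FFFFL;
        long timeMid = (timestamp >>> 32) & 0xFFFFL;
        long timeHi = (timestamp >>> 48) & 0x0FFFL;
        long msb = (timeLow << 32) | (timeMid << 16) | 0x1000L | timeHi; // 0x1000 = version 1
        long lsb = (clockSeqAndNode & 0x3FFF_FFFF_FFFF_FFFFL) | 0x8000_0000_0000_0000L; // variant 10
        return new UUID(msb, lsb);
    }
}
```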
  50. Service: Days Loop

          @FunctionalInterface
          private interface DayQuery {
              CompletableFuture<Iterator<UUID>> find(Criteria criteria, LocalDate day, UUID invoiceIdOffset);
          }

          private ResultPage byDateInteval(Criteria criteria, DayQuery dayQuery) {
              int limit = criteria.getLimit();
              List<Invoice> resultList = new ArrayList<>(limit);
              LocalDate dayOffset = criteria.getDayOffset();
              UUID invoiceIdOffset = criteria.getInvoiceIdOffset();

              /* ... Loop on days; to be seen in next slide ... */

              return new ResultPage(resultList);
          }
  51. Service: Days Loop (continued)

          // Resume at the previous page's cursor day if there is one, else start from the newest day.
          LocalDate day = dayOffset != null ? dayOffset : criteria.getLastDay();
          do {
              Iterator<UUID> uuidIt = dayQuery.find(criteria, day, invoiceIdOffset).join();
              limit -= loadInvoices(resultList, uuidIt, limit);
              if (uuidIt.hasNext()) {
                  // Page full: the next unread id becomes the cursor for the next page.
                  return new ResultPage(resultList, day, uuidIt.next());
              }
              day = day.minusDays(1);
              invoiceIdOffset = null; // the id offset only applies to the first day scanned
          } while (!day.isBefore(criteria.getFirstDay()));
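The days loop above can be exercised without Cassandra by replacing the per-day query with an in-memory function. The following simplified sketch keeps the same control flow: walk backwards from the newest day, fill the page, and return the first unread id as the next cursor; it resumes from the cursor's day when one is given (the role the slides' getDayOffset() value appears to play). Page, DayQuery and page(...) are hypothetical names, and ids are longs instead of time UUIDs.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Simplified, in-memory sketch of cursor-based pagination over a range of days. */
final class DayPager {
    private DayPager() {}

    /** A page of ids plus the cursor (day, first unread id) for the next page, or nulls at the end. */
    static final class Page {
        final List<Long> ids;
        final LocalDate nextDay;
        final Long nextId;

        Page(List<Long> ids, LocalDate nextDay, Long nextId) {
            this.ids = ids;
            this.nextDay = nextDay;
            this.nextId = nextId;
        }
    }

    /** Stand-in for one day's index lookup: ids newest first, from offset (inclusive) if non-null. */
    interface DayQuery {
        Iterator<Long> find(LocalDate day, Long offset);
    }

    static Page page(LocalDate firstDay, LocalDate lastDay, LocalDate dayOffset, Long idOffset,
                     int limit, DayQuery query) {
        List<Long> ids = new ArrayList<>(limit);
        LocalDate day = dayOffset != null ? dayOffset : lastDay; // resume, or start at the newest day
        Long offset = idOffset;
        while (!day.isBefore(firstDay)) {
            Iterator<Long> it = query.find(day, offset);
            while (ids.size() < limit && it.hasNext()) {
                ids.add(it.next());
            }
            if (it.hasNext()) {
                // Page full but this day has more ids: the next one becomes the cursor.
                return new Page(ids, day, it.next());
            }
            day = day.minusDays(1); // walk backwards, newest day first
            offset = null;          // the id offset only applies to the resumed day
        }
        return new Page(ids, null, null); // range exhausted: no further page
    }
}
```

Returning the first unread id, rather than the last returned one, pairs with the inclusive "invoice_id <= ?" restriction: the next page starts exactly at that id without skipping or repeating anything.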
  52. Service: Invoices Loading

          // Starts up to limit detail queries in parallel, then waits for them in result order.
          private int loadInvoices(List<Invoice> resultList, Iterator<UUID> uuidIt, int limit) {
              List<CompletableFuture<Invoice>> futureList = new ArrayList<>(limit);
              for (int i = 0; i < limit && uuidIt.hasNext(); ++i) {
                  futureList.add(invoiceRepository.findOne(uuidIt.next()));
              }
              futureList.stream()
                  .map(CompletableFuture::join)
                  .forEach(resultList::add);
              return futureList.size();
          }
  53. Results
  54. Limits ● We got an exact-match search ➔ No full-text search ➔ No "starts with" search ➔ No pattern-based search ● Requires highly discriminating mandatory criteria ➔ user_id & invoice_day ● Pagination doesn't give the total item count ➔ Could be done at the cost of an additional query ● No sort available
  55. Hardware ● Hosted by Ippon Hosting ● 8 nodes ➔ 16 GB RAM ➔ Two 256 GB SSDs in RAID 0 ● 6 nodes dedicated to the Cassandra cluster ● 2 nodes dedicated to the application
  56. Application ● 5,000 concurrent users ● 9 months of data loaded ➔ Legacy system: stores 1 year; searches the last 3 months ➔ Target: 3 years of history ● Real-time search results ➔ Data is immediately available ➔ Legacy system: data available the next day ● Cost killer
  57. Q & A
  58. PARIS BORDEAUX NANTES WASHINGTON NEW YORK RICHMOND contact@ippon.fr www.ippon.fr - www.ippon-hosting.com - www.ippon-digital.fr @ippontech - 01 46 12 48 48
