2. Agenda
• Lightning talks / community announcements
• Main session
• Beer @ Feierabend, 422 Yale Ave North
• Hashtags: #Seattle #Hadoop
3. GigaSpaces
• DeWayne’s talk will cover joining a real-time service/data fabric with NoSQL big data to create a complete, linearly scalable solution that supports analytics, complex event processing, and reporting in both real-time and batch domains.
4. Expedia (Cassandra)
• Todd’s session: Expedia needs the ability to search by price quickly and efficiently. Prices are complex objects containing base rate, taxes, fees, etc., which means a calculation is required to determine the customer price. This makes searching by price difficult. What to do?
5. Building an Elastic Real-Time NoSQL Platform
Creating a platform for unlimited elastic computation power and storage
6. Motivation
• Complete elastic solution stack
• For applications that need both massive “strategic” storage (disk-based NoSQL) and a real-time (“tactical”) component
• Horizontally and vertically scalable
• Highly available
• Self-healing
• Fault tolerant: suitable for a commodity hardware strategy
• Simpler management and monitoring than conventional multi-product solutions
© Copyright 2011 GigaSpaces Ltd. All Rights Reserved
7. What Is Real-Time?
• In this context, it means “really fast”.
• Reads as low as 5 μs, and typically under 1 ms for a fully replicated write.
Source: http://blog.gigaspaces.com/2010/12/06/possible-impossibility-the-race-to-zero-latency/
8. Two Layer Approach
• Advantage: minimal “impedance mismatch” between layers
  – Both are NoSQL cluster technologies, with similar advantages
• Grid layer serves as an in-memory cache for interactive requests
• Grid layer serves as a real-time computation fabric for CEP, with limited (to allocated memory) real-time map/reduce capability
[Diagram labels: Raw Event Stream (×3), Real Time Events, In Memory Compute Cluster, Reporting Engine, Raw And Derived Events, NoSQL Cluster, SCALE]
9. Two Layer Approach (continued)
• Grid layer doing CEP can act as a filter: many raw events get converted into semantic/business events, reducing meaningless data verbosity
• Grid layer provides scalable messaging
• NoSQL layer provides virtually unlimited, cheap storage on commodity hardware
• NoSQL layer provides virtually unlimited processing power at scale
10. Basics of In-Memory Data Grid Technology
• An In-Memory Data Grid (IMDG) is a data store
• “Grid” just means “cluster”
• Data can be partitioned across cluster nodes
• Processing power near data storage
• Distributed hash table
• Data model denormalized and optimized for the application
• Nodes are typically configured with one or more replicas (sound familiar yet?)
• Not a “cache”: a system of record, but it can be used as a cache, or both
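The following Java sketch makes “partitioned, distributed hash table with replicas” concrete by showing hash-based routing. It is a hypothetical illustration, not GigaSpaces’ actual routing scheme. The class name PartitionRouter, the modulo placement of partitions onto nodes, and the single backup on the “next” node are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of IMDG routing: a key hashes to a partition, each
 * partition has a primary node and one backup on a different node.
 * Not the routing algorithm of any specific product.
 */
final class PartitionRouter {
    private final int partitionCount;
    private final int nodeCount;

    PartitionRouter(int partitionCount, int nodeCount) {
        if (partitionCount <= 0 || nodeCount < 2) {
            throw new IllegalArgumentException("need at least 1 partition and 2 nodes");
        }
        this.partitionCount = partitionCount;
        this.nodeCount = nodeCount;
    }

    /** Maps a routing key to a partition id in [0, partitionCount). */
    int partitionFor(Object key) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    /** Node hosting the primary copy of a partition (simple modulo placement). */
    int primaryNode(int partition) {
        return partition % nodeCount;
    }

    /** Node hosting the backup copy; always differs from the primary node. */
    int backupNode(int partition) {
        return (primaryNode(partition) + 1) % nodeCount;
    }

    /** Primary node first, then backup node, for the partition owning this key. */
    List<Integer> replicaNodesFor(Object key) {
        int p = partitionFor(key);
        List<Integer> nodes = new ArrayList<>();
        nodes.add(primaryNode(p));
        nodes.add(backupNode(p));
        return nodes;
    }
}
```

For example, `new PartitionRouter(8, 4).replicaNodesFor("order-42")` returns two distinct node ids. Because the backup never shares a node with its primary, one node can fail without losing any partition. That property is what the “one or more replicas” bullet is getting at.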
11. Advanced Capabilities
• Business logic (code) co-resident with data shards
• Scalable messaging
• Dynamic code execution across the cluster
• Multi-language support
• Object-oriented
• Document-oriented/schema-free
• Multi-level indexing
• SQL queries
• Full ACID transaction support
• Elastic scaling (automatic and manual)
• Write-behind persistence
12. Features: IMDG vs NoSQL
[Diagram comparing Data Grid and NoSQL features: disk based, low latency, eventual/tunable consistency, horizontally scalable, code co-location, service remoting, parallel execution, unlimited scale, fault tolerant, cloud enabled, Hadoop tools, transactional, highly available, elastic, messaging, platform independent, complex event processing, flexible schema]
13. Vive La Difference
• The IMDG complements a NoSQL store:
  – Can serve as a short-term request cache (side cache or inline)
  – Can serve as a cache for map/reduce (MR) results
  – Enables event-driven architectures / CEP
  – In-memory map/reduce
  – Very fast writes, regardless of the NoSQL store
  – Transactional layer: can essentially turn “eventual” consistency into fully transactional persistence without a performance hit
  – Highly available and independently scalable
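The “side cache” option can be sketched as the cache-aside pattern. The application checks an in-memory map first and reads the slower NoSQL store only on a miss. This is a single-threaded, minimal sketch under stated assumptions: the class name SideCache is hypothetical, and a plain Function stands in for a real NoSQL client read.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Minimal cache-aside sketch: memory first, backing store on a miss.
 * The backing store is a stand-in Function, not a real NoSQL client API.
 */
final class SideCache<K, V> {
    private final Map<K, V> memory = new HashMap<>();
    private final Function<K, V> backingStore; // hypothetical NoSQL read
    private int misses = 0;

    SideCache(Function<K, V> backingStore) {
        this.backingStore = backingStore;
    }

    V get(K key) {
        V cached = memory.get(key);
        if (cached != null) {
            return cached; // hit: served at memory speed
        }
        misses++;
        V loaded = backingStore.apply(key); // miss: go to the slower store
        if (loaded != null) {
            memory.put(key, loaded); // populate so the next request is a hit
        }
        return loaded;
    }

    /** Drop a key, e.g. after the underlying record changes. */
    void invalidate(K key) {
        memory.remove(key);
    }

    int misses() {
        return misses;
    }
}
```

With a side cache, the application owns both the cache and store calls. An inline cache would instead route all reads and writes through the grid, which then talks to the NoSQL store itself.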
14. A Complete Scalable Application Platform
[Diagram labels: Raw Event Stream (×3), Real Time Events, In Memory Compute Cluster, Reporting Engine, Raw And Derived Events, NoSQL Cluster, SCALE]
15. Key Implementation Issues
• Grid must support reliable asynchronous persistence
  – If not reliable: in-flight data is at risk. Ideally, reliability is tunable to accommodate differing risk tolerances.
  – If not asynchronous: too slow
  – If not persistent: obviously nothing gets sent to disk
• To do more than a distributed cache, the grid must support code and data partitioning
  – Ideally, code is co-located in memory with its data partition
  – Needed to support CEP, application, and service remoting capabilities
16. Key Implementation Issues (continued)
• Grid ideally supports FIFO entry ordering
  – Key to using the grid as a queue
  – Key to scaling messaging without an additional tier
  – Combined with co-located business logic, operates at memory speed
• Write speed of the NoSQL layer
  – The grid is, in effect, queuing entries to the NoSQL layer
  – If the NoSQL layer cannot keep up, the in-memory grid backs up
  – This behavior is an asset, unless an unanticipated, sustained flood occurs
  – The faster the write speed, the better
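The write-behind behavior on the last two slides can be sketched in Java. Writes are accepted in memory immediately and queued in FIFO order for a slower NoSQL writer. Because the queue is bounded, the grid “backs up” visibly when the store cannot keep up. The names WriteBehindBuffer, write, and flush are hypothetical, and a list stands in for the NoSQL store. This sketch is not reliable in the slides’ sense: the pending queue lives in one JVM and would be lost on a crash, whereas a real grid would replicate it to a backup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Hypothetical write-behind sketch: a bounded FIFO queue between fast
 * in-memory writes and a slower persistent store. Assumes a single flusher.
 */
final class WriteBehindBuffer<E> {
    private final BlockingQueue<E> pending;               // FIFO, bounded
    private final List<E> persisted = new ArrayList<>(); // stand-in for the NoSQL store

    WriteBehindBuffer(int capacity) {
        this.pending = new ArrayBlockingQueue<>(capacity);
    }

    /** Accepts a write without waiting for disk; false means the buffer is full (backed up). */
    boolean write(E entry) {
        return pending.offer(entry);
    }

    /** Moves up to maxBatch entries, oldest first, to the backing store. */
    int flush(int maxBatch) {
        List<E> batch = new ArrayList<>();
        pending.drainTo(batch, maxBatch);
        persisted.addAll(batch); // a real writer would batch-insert into the NoSQL layer here
        return batch.size();
    }

    int backlog() {
        return pending.size();
    }

    List<E> persisted() {
        return persisted;
    }
}
```

When `write` starts returning false, the NoSQL layer is not draining fast enough. A caller can throttle or block at that point. A brief burst is absorbed by the queue, but a sustained flood eventually fills it, which matches the slide’s caveat.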
17. Use Case 1 – Event Cloud
• Complex event processing
  – Collect events in real time: Interactions, Orders, Bills, Payments, Activations, App usage, …
  – Transform them into decision factors: Good customer, Pays 3-6 days early, Decreasing usage, Missed payment, Unusual bill
• Original events, possibly scrubbed or annotated, are passed through
• Business logic derives “synthetic events” from the raw event stream; possible rule engine integration (e.g. Drools)
• Derived events and analytics are passed on to the NoSQL layer
• Other events are forwarded to external listeners and systems
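A minimal Java sketch shows how business logic might turn raw payment events into two of the decision factors listed above. It is a hand-written rule, not Drools, and every name here (PaymentRules, PaymentEvent, DerivedEvent, derive) is hypothetical. The sketch also assumes a payment counts as missed when it is unpaid after its due date. Events matching no rule are dropped, which is the “filter” role the grid plays in CEP.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

final class PaymentRules {
    /** Hypothetical raw event; paidDate is null when no payment was received. */
    record PaymentEvent(String customerId, LocalDate dueDate, LocalDate paidDate) {}

    /** Hypothetical derived ("synthetic") business event. */
    record DerivedEvent(String customerId, String factor) {}

    /**
     * Converts raw payments into decision factors. Events that match no rule
     * produce nothing here (they could still be passed through raw elsewhere).
     */
    static List<DerivedEvent> derive(List<PaymentEvent> raw, LocalDate today) {
        List<DerivedEvent> out = new ArrayList<>();
        for (PaymentEvent e : raw) {
            if (e.paidDate() == null) {
                // Assumed rule: unpaid after the due date counts as a missed payment
                if (today.isAfter(e.dueDate())) {
                    out.add(new DerivedEvent(e.customerId(), "Missed payment"));
                }
                continue;
            }
            // Positive when the payment arrived before the due date
            long daysEarly = ChronoUnit.DAYS.between(e.paidDate(), e.dueDate());
            if (daysEarly >= 3 && daysEarly <= 6) {
                out.add(new DerivedEvent(e.customerId(), "Pays 3-6 days early"));
            }
        }
        return out;
    }
}
```

In the architecture described above, the derived events would then be written to the NoSQL layer. The raw events could be forwarded unchanged to external listeners.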
18. Use Case 2 – Time Bounded
• Time bounded: suited to operations with a daily business cycle (e.g. trading)
• The current day (or any other time period that fits in memory) is held in memory, along with related application state, caching, etc.
• Operations still stream to the underlying NoSQL platform, or are held for an end-of-day flush if the back end can’t write fast enough
• Supports application hosting, messaging, and complex event processing
• External clients are aware of the “current day” store vs. the archive
• Large-scale reports/analytics run in the background on the NoSQL archive
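The “current day vs. archive” split can be sketched as a small router that clients consult before issuing a query. This is an assumption-laden illustration: TimeBoundedRouter, storeFor, and rollOver are hypothetical names. The sketch also assumes that anything dated today or later is served from the grid.

```java
import java.time.LocalDate;

/** Hypothetical router: the current business day lives in the grid, older days in the NoSQL archive. */
final class TimeBoundedRouter {
    enum Store { GRID, NOSQL_ARCHIVE }

    private LocalDate currentDay;

    TimeBoundedRouter(LocalDate currentDay) {
        this.currentDay = currentDay;
    }

    /** Days before the current day are archival; the current day is in memory. */
    Store storeFor(LocalDate date) {
        return date.isBefore(currentDay) ? Store.NOSQL_ARCHIVE : Store.GRID;
    }

    /** End-of-day rollover, after the previous day has been flushed to NoSQL. */
    void rollOver(LocalDate nextDay) {
        this.currentDay = nextDay;
    }
}
```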
19. Use Case 3 – LRU
• Grid holds a subset of the NoSQL store and supports an LRU caching model
• Inline or side cache
• Appropriate only when, as with any cache, the usage pattern does not generate many cache misses
• Still supports CEP, messaging, and computation scaling (provided the grid product supports it)
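An LRU subset of a larger store can be sketched with the JDK’s LinkedHashMap in access order. It evicts the least recently used entry once a size bound is exceeded. This is a single-JVM illustration of the eviction policy only; the class name LruCache is hypothetical, and a real grid would apply such a policy per partition.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Bounded LRU map: holds at most maxEntries of a larger store. Not thread-safe. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the most-recent end
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked after each put; evicts the least recently used entry once over capacity
        return size() > maxEntries;
    }
}
```

As the slide warns, this only pays off when most reads hit the retained subset. Under a random access pattern, entries are evicted before they are reused.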
20. Wishlist
• This platform concept is still at an early stage
• For GigaSpaces, integrations already exist for Cassandra and MongoDB
• Customers are currently implementing solutions
• Stuff I’d like to see:
  – Unified management and scaling; shared infrastructure
  – A grid/NoSQL-aware Hive façade that can run MR jobs on both; perhaps integration with other Hadoop tools
  – Deeper integration, to further optimize write speed/capacity, and perhaps to offload some in-memory aspects of the underlying NoSQL platform to minimize duplication and possibly improve elasticity
21. Conclusion
• Two shared-nothing “NoSQL” architectures complementing each other
• Fully elastic/scalable
• Ultra-high performance/low latency combined with unlimited scale
• Full application stack
• Highly reliable and self-healing
• Scalable complex event handling
• Multi-language
• Simple: two products
22. DataStax is the company behind Apache Cassandra. Besides contributing the majority of the code for the open source project, DataStax also provides products and services for Apache Cassandra.
• DataStax Community – the 100% free way to get started with Apache Cassandra (free management software and packaging!)
• DataStax Enterprise – Hadoop analytics and support!
Download + docs at http://www.datastax.com/dev
23. Expedia Hotel Price Cache
B. Todd Burruss
24. Who Am I
• B. Todd Burruss – Sr. Architect, Expedia
• Worked with Cassandra for nearly 2 years
• Committer on Hector (Java client)
• General testing on Cassandra and working with the community, but not a committer
25. Expedia’s Motivation
We need the ability to search by price in a fast and efficient manner. Prices are complex objects containing base rate, taxes, fees, etc., which means a calculation is required to determine the customer price. This makes searching by price difficult. What to do?
26. What to Do?
• Precalculate the total price! Let’s look at hotels
• Hotel prices vary by date, length of stay (LOS), and number of adult travelers (AT)
• Customers book in advance, so we must have prices fairly far into the future (1 year)
• Approximately 140,000 hotels in our inventory
• Support 1-14 LOS and 1-4 AT
• Over 2 billion prices!
27. Example of Hotel Pricing
• A customer’s family of 4 wants to stay 7 nights at the Hilton in Maui: check in on 12/1/2011, check out on 12/8/2011
• Each night could have a different rate because of the day of the week, a conference in the area, a holiday, etc.
• So we must sum the rate, taxes, and fees for each night to get the total room price
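The per-night summation in this example is the value that gets precalculated. A minimal Java sketch follows. StayPricing and NightlyRate are hypothetical names, and real price objects carry more components than these three. BigDecimal is used so that currency sums do not pick up floating-point rounding error.

```java
import java.math.BigDecimal;
import java.util.List;

final class StayPricing {
    /** Hypothetical price components for one night of a stay. */
    record NightlyRate(BigDecimal baseRate, BigDecimal taxes, BigDecimal fees) {
        BigDecimal total() {
            return baseRate.add(taxes).add(fees);
        }
    }

    /** Total room price = sum over all nights of (rate + taxes + fees). */
    static BigDecimal totalPrice(List<NightlyRate> nights) {
        BigDecimal sum = BigDecimal.ZERO;
        for (NightlyRate night : nights) {
            sum = sum.add(night.total());
        }
        return sum;
    }
}
```

For the 7-night Hilton stay, the list would hold seven NightlyRate entries, one per night from 12/1 through 12/7. The result is stored under that date + LOS + AT combination, so a search never repeats the calculation.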
28. Use Case: Median Price
• Example: what is the median hotel price in Seattle for each day between 11/1 and 11/30?
• 200 hotels × 30 days = 6,000 prices returned from Cassandra; the median is calculated on the client
• The idea is that the customer searches a city and date range, then narrows the search to a smaller area and set of dates
• Prices are volatile, so we want close to real-time updates
29. Enter Cassandra: Expectations
• Cassandra handles large amounts of data nicely: billions of price objects
• Cassandra is very fast (reads and writes), so it can handle the volatile prices
• Cluster expands easily; our dataset is growing
• Easy to set up, administer, and use
• Operational costs are good
• Support is available
30. Solution: Data Model
• 1 ColumnFamily: Prices
• Row key: date + LOS + AT
• Column name: hotel ID; 140,000 columns per row (integer comparator)
• Column value: precalculated hotel price for date + LOS + AT
• 365 × 14 × 4 = 20,440 row keys
• 20,440 × 140,000 = 2,861,600,000 price objects
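The following Java sketch builds a row key and checks the sizing arithmetic above. The slide only says the key is “date + LOS + AT”. The ISO date, the “:” separator, and the class name PriceKeys are assumptions for illustration, not Expedia’s actual key format. The price count needs a long, because 2,861,600,000 exceeds Java’s int range.

```java
import java.time.LocalDate;

final class PriceKeys {
    /**
     * Hypothetical row-key format "yyyy-MM-dd:LOS:AT".
     * LocalDate.toString() yields the ISO yyyy-MM-dd form.
     */
    static String rowKey(LocalDate checkin, int lengthOfStay, int adults) {
        return checkin + ":" + lengthOfStay + ":" + adults;
    }

    /** One row per (date, LOS, AT) combination. */
    static long rowCount(int days, int maxLos, int maxAdults) {
        return (long) days * maxLos * maxAdults;
    }

    /** One column (price) per hotel in every row; long because the total exceeds int range. */
    static long priceCount(int days, int maxLos, int maxAdults, int hotels) {
        return rowCount(days, maxLos, maxAdults) * hotels;
    }
}
```

`PriceKeys.rowCount(365, 14, 4)` gives 20,440 and `PriceKeys.priceCount(365, 14, 4, 140_000)` gives 2,861,600,000, matching the slide. The same arithmetic applied to the reduced test dataset (90 days, 7 LOS, 4 AT, 70k hotels) gives 2,520 rows and 176,400,000 prices.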
31. Solution: Retrieving Prices
• Generate keys for each check-in day, LOS, and AT combination wanted
• Query Cassandra with the generated keys, requesting specific column names (hotel IDs)
• For the family example: one key, one column = 12/1/2011 + 7 + 4 → total price for the Hilton hotel
• For the median example: 30 keys, 200 columns per key. The client receives 30 result rows, then calculates the median for each row
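The retrieval steps above can be sketched in Java without a live cluster. The sketch generates one key per check-in day, reads only the requested hotel-ID columns from each row, and computes each row’s median on the client. Several parts are assumptions. A plain Map stands in for the result of a Cassandra multi-key query, so no Hector or Cassandra API is used. The key format is hypothetical, since the slides only say “date + LOS + AT”. MedianQuery and its methods are illustrative names.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class MedianQuery {
    /** Hypothetical key format "yyyy-MM-dd:LOS:AT". */
    static String rowKey(LocalDate checkin, int los, int adults) {
        return checkin + ":" + los + ":" + adults;
    }

    /** One row key per check-in day in [from, to], inclusive. */
    static List<String> keysFor(LocalDate from, LocalDate to, int los, int adults) {
        List<String> keys = new ArrayList<>();
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            keys.add(rowKey(d, los, adults));
        }
        return keys;
    }

    /** Median of one row's prices (average of the two middle values for even counts). */
    static double median(double[] prices) {
        if (prices.length == 0) throw new IllegalArgumentException("no prices");
        double[] sorted = prices.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    /**
     * rows stands in for a multi-key query result: row key -> (hotel id -> price).
     * Only the requested hotel ids (column names) are read from each row.
     */
    static Map<String, Double> medianPerDay(Map<String, Map<Integer, Double>> rows,
                                            List<String> keys, List<Integer> hotelIds) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (String key : keys) {
            Map<Integer, Double> row = rows.get(key);
            if (row == null) continue; // no prices stored for this day
            double[] prices = hotelIds.stream()
                    .map(row::get)
                    .filter(Objects::nonNull) // skip hotels without a price in this row
                    .mapToDouble(Double::doubleValue)
                    .toArray();
            if (prices.length > 0) result.put(key, median(prices));
        }
        return result;
    }
}
```

For the Seattle example, `keysFor(LocalDate.of(2011, 11, 1), LocalDate.of(2011, 11, 30), los, adults)` yields the 30 keys. Passing the 200 Seattle hotel IDs as `hotelIds` reproduces the 30 rows × 200 columns shape described on the slide.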
32. Testing Scenario
• Found 19 old boxes with 16 GB RAM and one 4-core CPU
• The 18th and 19th boxes are both clients and Cassandra servers (don’t do this in prod)
• Can never find enough hardware :)
• 2 keyspaces on the cluster
• Reduced the dataset to 90 days, up to 7-day LOS, up to 4 AT, and 70k hotels; this removes disk I/O from the test and leaves some RAM for caching
• We believe our hot data will be in RAM
• Query 30 days and 200 hotels: 6,000 price objects
33. Results: Page 1 – Default Memory and Column Index Settings
• ~50 ms: -Xmn400m, 64k index, 8 GB, no row cache, no key cache: pretty good
• ~800 ms: -Xmn400m, 64k index, 8 GB, no key cache, 600 row cache (Serializing); copying to heap accounts for the slowness
34. Results: Page 2 – Change Index to 1k Column Pages
• ~45 ms: -Xmn400m, 1k index, 8 GB, no row cache, no key cache
• ~29 ms: -Xmn400m, 1k index, 8 GB, 600 row cache (ConcurrentLinkedHash)
The 1k index saves a little, but the data is all in RAM. Bigger savings come when hitting disk.
35. Results: Page 3 – Tune Memory
• ~45 ms: -Xmn200m, 1k index, 8 GB, no row cache, no key cache
• ~29 ms: -Xmn200m, 1k index, 8 GB, 600 row cache (ConcurrentLinkedHash)
Increasing the old generation will not help because all data fits in RAM, based on reported JVM usage. Reducing the new generation moves from less frequent long pauses to more frequent short pauses. No help.
36. Take Away
• The test is a worst-case scenario: a completely random usage pattern, which is rarely (if ever) the case in production. It causes cache churn if the cache is too small
• Wide rows are not always bad. Accessing columns sequentially or by range works very well (e.g. time series data)
• The serializing cache trades off storing serialized objects against the cost of copying to/from off-heap storage
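The point about wide rows is that columns are kept sorted by name, so a range read returns one contiguous slice rather than scattered lookups. A minimal Java sketch models a single wide row as a TreeMap keyed by column name, here a timestamp-like long. TreeMap is only an in-memory stand-in for sorted column storage, not Cassandra’s on-disk format, and the WideRow class is hypothetical.

```java
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical model of one wide row whose columns are sorted by name. */
final class WideRow {
    private final NavigableMap<Long, String> columns = new TreeMap<>();

    void put(long columnName, String value) {
        columns.put(columnName, value);
    }

    /** Columns with names in [start, end), returned in sorted order as a view. */
    SortedMap<Long, String> slice(long start, long end) {
        return columns.subMap(start, end);
    }
}
```

With time-series column names, a query such as “events between t1 and t2” is a single `slice(t1, t2)` call. By contrast, the random-access test above asked for 200 scattered hotel-ID columns per row.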
37. References
• Query plan description by Aaron Morton: http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/
• Disk sizing by B. Todd Burruss (me): http://btoddb-cass-storage.blogspot.com/