
SnappyData overview NikeTechTalk 11/19/15


This slide deck was used for our 11/19/15 Nike Tech Talk to give a detailed overview of the SnappyData technology vision. The slides were presented by Jags Ramnarayan, Co-Founder and CTO of SnappyData.


  1. SnappyData: Getting Spark ready for real-time, operational analytics. www.snappydata.io. Jags Ramnarayan (jramnarayan@snappydata.io), Co-founder, SnappyData. Nov 2015
  2. SnappyData, an EMC/Pivotal spin-out. ● A new Spark-based open source project started by Pivotal GemFire founders and engineers ● Decades of in-memory data management experience ● Focus on real-time, operational analytics: Spark inside an OLTP+OLAP database. www.snappydata.io
  3. Lambda Architecture (LA) for Analytics
  4. Perspective on LA for real time. [Diagram: streams flow into a transform stage, then into data-in-motion analytics, backed by an in-memory DB for interactive queries and updates and a deep-scale, high-volume MPP DB; the application receives alerts.]
  5. Use case: telemetry. Revenue generation: real-time location-based mobile advertising (B2B2C); location-based services (B2C, B2B, B2B2C). Revenue protection: customer experience management to reduce churn; customer sentiment analysis. Network efficiency: network bandwidth optimization; network signalling maximization. • Network optimization, e.g. re-route a call to another cell tower if congestion is detected • Location-based ads: match the incoming event to the subscriber profile; if 'opt-in', show a location-sensitive ad • Challenge: too much streaming data: many subscribers, lots of 2G/3G/4G voice/data, and network events (location events, CDRs, network issues)
  6. Challenge: keeping up with streams. [Same LA pipeline diagram.] • Millions of events/sec • HA: continuous ingest • Cannot throttle the stream • Diverse formats
  7. Challenge: the transform is expensive. [Same LA pipeline diagram, plus a reference DB (enterprise Oracle, …).] • Filter, normalize, transform • Normalization needs reference data: point lookups into the reference DB
  8. Challenge: stream joins and correlations. [Same LA pipeline diagram.] Analyze over a time window: ● simple rules, e.g. if CallDroppedCount > threshold then alert ● or complex, OLAP-like queries ● TopK, trending, joins with reference data, correlation with history. How do you keep up with OLAP-style analytics when there are millions of events in the window and billions of records in the reference data? (See the sketch below.)
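A hedged sketch of the windowed-join problem above in vanilla Spark Streaming (Scala). The socket source, the field layout, and the callEvents/refProfiles names are invented for illustration; reduceByKeyAndWindow, transform, and join are standard Spark 1.x APIs.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("WindowJoin"), Seconds(1))

    // (subscriberId, droppedCallCount) events; source and format are invented
    val callEvents = ssc.socketTextStream("localhost", 9999)
      .map(_.split(","))
      .map(f => (f(0), f(1).toInt))

    // Reference data held as a static RDD keyed the same way
    val refProfiles = ssc.sparkContext.parallelize(
      Seq(("sub1", "gold"), ("sub2", "silver")))

    callEvents
      .reduceByKeyAndWindow(_ + _, Seconds(60), Seconds(10)) // 60s window, sliding every 10s
      .transform(_.join(refProfiles))                        // stream-to-reference join per batch
      .filter { case (_, (dropped, _)) => dropped > 5 }      // simple rule: CallDroppedCount > threshold
      .print()

    ssc.start()
    ssc.awaitTermination()

With millions of events per window, this join re-shuffles both sides every slide interval, which is exactly the cost the slide is pointing at.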
  9. Challenge: state management. [Same LA pipeline diagram.] Manage the generated state: ● mutating state, millions of counters ● "once and only once" semantics ● consistency across a distributed system ● state HA
  10. Challenge: interactive query speed. [Same LA pipeline diagram.] Interactive queries: OLAP-style queries with high concurrency and low response time
  11. Today: queue -> process -> NoSQL. The messaging cluster adds extra hops and management overhead; there is no distributed, HA data store; and streaming joins, or joins with external state, are slow and in many cases not scalable
  12. SnappyData: a new approach. A single unified HA cluster: OLTP + OLAP + streams for real-time analytics, combining a batch design for high throughput with a real-time design center: low latency, HA, concurrency. Vision: drastically reduce the cost and complexity of modern big data
  13. SnappyData: a new approach. A single unified HA cluster (OLTP + OLAP + streams) for real-time operational analytics, holding TBs in memory. [Architecture diagram: an RDB with rows, transactions, and indexes; a columnar API; stream processing; access via ODBC, JDBC, REST, and Spark (Scala, Java, Python, R); integration with HDFS and MPP DBs.] The first commercial project on Approximate Query Processing (AQP)
  14. Why columnar storage?
  15. Why Spark? ● Blends streaming, interactive, and batch analytics ● Appeals to Java, R, Python, and Scala developers ● Succinct programs (see the example below) ● A rich set of transformations and libraries ● RDDs and fault tolerance without replication ● Stream processing with high throughput
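To make "succinct programs" concrete, here is the classic word count in the Spark Scala API of this era; the input path is a placeholder.

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("WordCount"))
    sc.textFile("hdfs://.../input.txt")   // placeholder path
      .flatMap(_.split("\\s+"))           // split lines into words
      .map(word => (word, 1))
      .reduceByKey(_ + _)                 // sum counts per word
      .take(10)
      .foreach(println)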
  16. Spark myths. ● "It is a distributed in-memory database": no, it is a computational framework with immutable caching ● "It is highly available": fault tolerance is not the same as HA ● "It is well suited for real-time, operational environments": it does not handle concurrency well
  17. Common Spark Streaming architecture. [Diagram: a Kafka queue feeds two executors; each holds RDD partitions at t0, t1, t2; results land in Cassandra.] The client submits the stream app; the queue is buffered in the executors, and the driver submits a batch job every second, which produces a new RDD per stream batch from the buffer. Short-term state is immutable in the executors; long-term state lives in an external DB
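A hedged sketch of this receiver-based architecture (Spark 1.x Kafka API). The ZooKeeper address, consumer group, and topic name are placeholders, and the external-store write is left abstract rather than assuming a specific connector.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(new SparkConf().setAppName("StreamArch"), Seconds(1))

    // Receiver-based stream: the queue is buffered in the executor and the
    // driver submits a batch job every second
    val events = KafkaUtils.createStream(
      ssc, "zk-host:2181", "consumer-group", Map("events" -> 1)).map(_._2)

    events.foreachRDD { rdd =>
      // each micro-batch arrives as a new immutable RDD; long-term state
      // has to go to an external store such as Cassandra
      rdd.foreachPartition { records =>
        records.foreach { r => /* write r to the external store */ }
      }
    }

    ssc.start()
    ssc.awaitTermination()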
  18. Challenge: the Spark driver is not HA. [Diagram: the client submits the stream app to the driver, which coordinates the executors.] If the driver fails, the executors automatically exit and all cached state has to be re-hydrated
  19. Challenge: sharing state. [Diagram: each client's driver gets its own executors.] • Spark is designed for total isolation across client apps • Sharing state across clients requires an external DB or Tachyon
  20. Challenge: external state management. [Same Kafka-to-Cassandra streaming diagram.] Key-based access might keep up, but joins and analytic operators are a problem: serialization and copying costs are too high, especially in JVMs. Spark can update state as batches arrive, newDStream = wordDstream.updateStateByKey[Int](func), but this requires a full iteration over the state RDD (expanded in the sketch below)
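Expanding the slide's snippet into a runnable sketch: the update function folds each batch's new values into the per-key state, and Spark of this era touches every key's state on every batch, which is the full-RDD iteration cost the slide calls out. The socket source and checkpoint path are placeholders.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("StateSketch"), Seconds(1))
    ssc.checkpoint("/tmp/state-checkpoints") // required for stateful operators

    val wordDstream = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split("\\s+"))
      .map((_, 1))

    // Merge this batch's counts for a key with the previous running total
    val func = (newValues: Seq[Int], state: Option[Int]) =>
      Some(newValues.sum + state.getOrElse(0))

    val newDStream = wordDstream.updateStateByKey[Int](func)
    newDStream.print()

    ssc.start()
    ssc.awaitTermination()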
  21. Challenge: "once and only once" is hard. [Diagram: an executor applies X = X+10, taking X from 10 to 20 in Cassandra; after a failure, the recovered partition replays X = X+10 and X ends up at 30 instead of 20.]
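A minimal, store-agnostic illustration of the failure above and the usual fix: make the write idempotent by keying it on the batch ID, so a replayed batch overwrites its own slot instead of incrementing twice. The map stands in for an external table.

    import scala.collection.mutable

    val perBatchDeltas = mutable.Map[Long, Int]() // batchId -> delta applied to X

    def applyBatch(batchId: Long, delta: Int): Unit =
      perBatchDeltas(batchId) = delta // overwrite, never increment

    applyBatch(7L, 10)
    applyBatch(7L, 10) // replay after recovery lands on the same slot
    val x = 10 + perBatchDeltas.values.sum
    println(s"X = $x") // X = 20, not 30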
  22. Challenge: always on. [Same Kafka streaming diagram.] The HA requirement: if something fails, there is always a redundant copy that is fully in sync, and failover is instantaneous. Fault tolerance in Spark: recover state from the original source or a checkpoint by tracking lineage, which can take too long
  23. Challenge: concurrent queries are too slow. SELECT SUBSTR(sourceIP, 1, X), SUM(adRevenue) FROM uservisits GROUP BY SUBSTR(sourceIP, 1, X) (Berkeley AMPLab Big Data Benchmark; AWS m2.4xlarge; 342 GB total)
  24. SnappyData: a P2P cluster with consensus. [Diagram: data server JVM1, JVM2, JVM3.] ● The cluster elects a coordinator ● Consistent views across members ● Virtual synchrony across members ● Why? Strong consistency during replication, and failure detection that is accurate and fast
  25. Colocated row/column tables in Spark. [Diagram: each Spark executor hosts a row table, a column table, stream processing, tasks, and the Spark block manager.] ● Spark executors are long-lived and shared across multiple apps ● The Gem memory manager and the Spark block manager are integrated
  26. Tables can be partitioned or replicated. [Diagram: three nodes; a replicated table keeps a consistent replica on each node, while a partitioned table spreads buckets A-H, I-P, Q-W across the nodes, with partition replicas held on other nodes.] Data is partitioned with one or more replicas
  27. Linearly scale with shared partitions. [Diagram: a Kafka queue partitioned by subscriber (A-M, N-Z) feeds Spark executors holding the matching subscriber partitions plus reference data.] Linearly scale with partition pruning: the input queue, stream, in-memory DB, and output queue all share the same partitioning strategy
  28. Point access, updates, fast writes. ● Row tables with primary keys are distributed HashMaps, with secondary indexes ● Support for transactional semantics: read_committed, repeatable_read ● Support for scalable, high write rates: streaming data goes through stages (queued streams, an intermediate delta row buffer, then immutable compressed columns)
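A hedged sketch of a transactional point update over plain JDBC; slide 29 notes that tables appear like any JDBC-sourced table, but the connection URL, table, and columns here are assumptions, not documented SnappyData strings.

    import java.sql.{Connection, DriverManager}

    val conn: Connection =
      DriverManager.getConnection("jdbc:snappydata://localhost:1527/") // placeholder URL
    conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED)
    conn.setAutoCommit(false)
    try {
      val ps = conn.prepareStatement("UPDATE subscribers SET plan = ? WHERE sub_id = ?")
      ps.setString(1, "gold")
      ps.setInt(2, 42)
      ps.executeUpdate() // a PK-based update: effectively a distributed HashMap point write
      conn.commit()
    } finally {
      conn.close()
    }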
  29. Full Spark compatibility. ● Any table is also visible as a DataFrame ● Any RDD[T]/DataFrame can be stored in SnappyData tables ● Tables appear like any JDBC-sourced table, but live in executor memory by default ● Additional APIs for updates, inserts, and deletes. // Save a DataFrame using the Spark context: context.createExternalTable("T1", "ROW", myDataFrame.schema, props) // Save using the DataFrame API: dataDF.write.format("ROW").mode(SaveMode.Append).options(props).saveAsTable("T1")
  30. Extends Spark SQL: CREATE [TEMPORARY] TABLE [IF NOT EXISTS] table_name (<column definition>) USING 'JDBC | ROW | COLUMN' OPTIONS ( COLOCATE_WITH 'table_name', // default: none PARTITION_BY 'PRIMARY KEY | column_name', // replicated table by default REDUNDANCY '1', // manages HA PERSISTENT 'DISKSTORE_NAME ASYNCHRONOUS | SYNCHRONOUS', // an empty string maps to the default disk store OFFHEAP 'true | false', EVICTION_BY 'MEMSIZE 200 | COUNT 200 | HEAPPERCENT', … ) [AS select_statement]
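A hedged usage sketch of the DDL above (the snappyContext handle and all table and column names are assumptions): a row table partitioned by primary key with one redundant copy, plus a column table colocated with it so joins on the partitioning key stay node-local.

    // Row table: hash-partitioned by primary key, one replica per bucket for HA
    snappyContext.sql(
      "CREATE TABLE subscribers (sub_id INT PRIMARY KEY, plan VARCHAR(16)) " +
      "USING ROW OPTIONS (PARTITION_BY 'PRIMARY KEY', REDUNDANCY '1')")

    // Column table colocated with the row table: same buckets, local joins
    snappyContext.sql(
      "CREATE TABLE call_detail (sub_id INT, dropped INT) " +
      "USING COLUMN OPTIONS (PARTITION_BY 'sub_id', COLOCATE_WITH 'subscribers')")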
  31. Key feature: synopses data. ● Maintain stratified samples: intelligent sampling keeps error bounds low ● Probabilistic data: TopK for time series (using CMS with time aggregation and item aggregation), histograms, HyperLogLog, Bloom filters, wavelets. CREATE SAMPLE TABLE sample_table_name USING columnar OPTIONS ( BASETABLE 'table_name', // the source column table or stream table [SAMPLINGMETHOD 'stratified | uniform',] STRATA name ( QCS ('comma-separated-column-names') [FRACTION 'frac'] ),+ // one or more QCS )
  32. Stratified sampling Spark demo. www.snappydata.io
  33. Driver HA and a JobServer for interactive jobs. ● A REST-based JobServer shares a single context across clients: clients use REST to execute streaming jobs, queries, and DML; a secondary JobServer provides HA, with primary election via Gem clustering ● A native SnappyData cluster manager keeps executors long-running and reuses the same executors across apps and jobs ● Low-latency scheduling skips the Spark driver altogether
  34. Unified OLAP/OLTP streaming with Spark. ● Far fewer resources: a TB problem becomes a GB problem, and CPU contention drops ● Far less complex: a single cluster for stream ingestion, continuous queries, interactive queries, and machine learning ● Much faster: compressed data managed in distributed memory in columnar form reduces volume and is much more responsive
  35. www.snappydata.io. SnappyData is open source. ● The beta will be on GitHub before December; we are looking for contributors! ● Learn more and register for the beta: www.snappydata.io ● Connect: ○ twitter: www.twitter.com/snappydata ○ facebook: www.facebook.com/snappydata ○ linkedin: www.linkedin.com/snappydata ○ slack: http://snappydata-slackin.herokuapp.com ○ IRC: irc.freenode.net #snappydata
  36. Extras. www.snappydata.io
  37. OLAP/OLTP with synopses. [Architecture diagram: user applications process events and issue interactive queries; a micro-batch processing module (plugins) with a sliding window emits batches; CQ subscriptions and an OLAP query engine run over a summary DB (time series with decay; TopK and frequency summary structures; counters; histograms; stratified samples; raw data windows) and an exact DB (row + column oriented).]
  38. Not a panacea, but it comes close. ● Synopses require prior workload knowledge ● Not all queries benefit: complex queries will result in high error rates ○ a single cluster for stream ingestion and analytics queries (both streaming and interactive) ● Our strategy: be an adjunct to MPP databases ○ first compute the error estimate; if the error is above tolerance, delegate to the exact store
  39. Adjunct store in certain scenarios
  40. Speed/accuracy tradeoff. [Chart: error vs. execution time (sample size); executing on the entire dataset takes about 30 minutes, while interactive queries on a sample answer in about 2 seconds.]
  41. Stratified sampling. ● Random sampling has intuitive semantics ● However, data is typically skewed and our queries are multi-dimensional ○ e.g., the average sales order price for each product class in each geography ○ some products may have little to no sales ○ stratification ensures that each "group" (product class) is represented (see the sketch below)
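The same idea is expressible in vanilla Spark 1.5 (this is not the SnappyData API): DataFrameStatFunctions.sampleBy draws each stratum at its own rate so that small groups survive into the sample. The DataFrame and the fractions are invented.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("StratifiedSketch"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val sales = Seq(("bikes", 120.0), ("bikes", 80.0), ("skis", 900.0))
      .toDF("productClass", "price")

    // Per-stratum sampling rates: the rare class is kept in full so every
    // group is represented
    val fractions = Map("bikes" -> 0.1, "skis" -> 1.0)
    sales.stat.sampleBy("productClass", fractions, seed = 42L).show()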
  42. Stratified sampling challenges. ● Solutions exist for batch data (BlinkDB) ● It needs to work for infinite streams of data ○ Answer: combine stratification with other techniques such as Bernoulli/reservoir sampling ○ and decay samples exponentially over time
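A minimal sketch of the reservoir-sampling building block mentioned above (Algorithm R): it maintains a uniform fixed-size sample of an unbounded stream, and a per-stratum variant would simply run one reservoir per group. Illustrative only, not SnappyData's implementation.

    import scala.util.Random

    class Reservoir[T](k: Int, rng: Random = new Random) {
      private val buf = new scala.collection.mutable.ArrayBuffer[T](k)
      private var seen = 0L

      def add(item: T): Unit = {
        seen += 1
        if (buf.size < k) buf += item
        else {
          // keep the new item with probability k/seen, evicting a random slot
          val j = (rng.nextDouble() * seen).toLong
          if (j < k) buf(j.toInt) = item
        }
      }
      def sample: Seq[T] = buf.toSeq
    }

    val r = new Reservoir[Int](100)
    (1 to 1000000).foreach(r.add)
    println(r.sample.take(5))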
  43. Dealing with errors and latency. ● Well-known error techniques exist for "closed-form aggregations" ● We are exploring other techniques, such as the analytical bootstrap ● The user can specify an error bound with a confidence interval: SELECT avg(sessionTime) FROM Table WHERE city='San Francisco' ERROR 0.1 CONFIDENCE 95.0% ● The engine first determines whether it can satisfy the error bound ● If not, it delegates execution to an "exact" store (GPDB, etc.) ● Query execution can also be latency-bounded
  44. Sketching techniques. ● Sampling is not effective for outlier detection (MAX/MIN, etc.) ● Other probabilistic structures: CMS, heavy hitters, etc. ● We implemented Hokusai to capture the frequencies of items in a time series ● The design permits TopK queries over arbitrary time intervals, e.g. the top 100 popular URLs: SELECT pageURL, count(*) frequency FROM Table WHERE … GROUP BY … ORDER BY frequency DESC LIMIT 100
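For a flavor of these structures, here is a minimal count-min sketch (illustrative only; it says nothing about how Hokusai is implemented in SnappyData): d hash rows of w counters, where an item's estimate is the minimum across its rows and can only over-count, never under-count.

    import scala.util.hashing.MurmurHash3

    class CountMinSketch(w: Int, d: Int) {
      private val table = Array.ofDim[Long](d, w)

      // row-seeded hash into [0, w)
      private def bucket(item: String, row: Int): Int =
        (MurmurHash3.stringHash(item, row) & Int.MaxValue) % w

      def add(item: String, count: Long = 1L): Unit =
        (0 until d).foreach(r => table(r)(bucket(item, r)) += count)

      // min across rows: hash collisions only inflate counters, so this
      // upper-bounds the true frequency
      def estimate(item: String): Long =
        (0 until d).map(r => table(r)(bucket(item, r))).min
    }

    val cms = new CountMinSketch(w = 2048, d = 5)
    Seq("a.com", "b.com", "a.com").foreach(u => cms.add(u))
    println(cms.estimate("a.com")) // >= 2, the true count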
  45. Demo. [Diagram: a Zeppelin server with a Zeppelin Spark interpreter (driver) in front of three Spark executor JVMs, each holding a row cache and compressed columnar storage.]
  46. A new approach to real-time analytics: streaming analytics, probabilistic data, and distributed in-memory SQL, through a deep integration of Spark + Gem; a unified, always-on, cloud-ready cluster that integrates with deep-scale, high-volume MPP DBs. Vision: drastically reduce the cost and complexity of modern big data, using a fraction of the resources: 10X better response time, 10X lower resource cost, 10X less complexity
