
SnappyData, the Spark Database. A unified cluster for streaming, transactions & interactive analytics


Apache Spark 2.0 offers many enhancements that make continuous analytics quite simple. In this talk, we discuss many other things you can do with your Apache Spark cluster. We explain how a deep integration of Apache Spark 2.0 and in-memory databases can bring you the best of both worlds! In particular, we discuss how to manage mutable data in Apache Spark, run consistent transactions at the same speed as state-of-the-art in-memory grids, build and use indexes for point lookups, and run 100x more analytics queries at in-memory speeds. There is no need to bridge multiple products or to manage and tune multiple clusters. We explain how one can take regular Apache Spark SQL OLAP workloads and speed them up by up to 20x using optimizations in SnappyData.

We then walk through several use-case examples, including IoT scenarios where one has to ingest streams from many sources, cleanse them, manage the deluge by pre-aggregating and tracking metrics per minute, store all recent data in an in-memory store along with history in a data lake, and permit interactive analytic queries on this constantly growing data. Rather than stitching together multiple clusters as the Lambda architecture proposes, we walk through a design where everything is achieved in a single, horizontally scalable Apache Spark 2.0 cluster: a design that is simpler, a lot more efficient, and lets you do everything from machine learning and data science to transactions and visual analytics in one single cluster.



  1. SnappyData: A Unified Cluster for Streaming, Transactions, & Interactive Analytics. © SnappyData Inc 2017, www.snappydata.io. Jags Ramnarayan, CTO & Co-Founder.
  2. Our Pedigree: team of ~30; founded GemFire (in-memory data grid); Pivotal spin-out; 25+ VMware/Pivotal database engineers. Investors: GTD Ventures.
  3. Mixed Workloads Are Everywhere: stream processing, transactions, and interactive analytics; analytics on mutating data; correlating and joining streams with large histories; maintaining state or counters while ingesting streams.
  4. Telco Use Case: Location-Based Services, Network Optimization
     • Revenue generation: real-time location-based mobile advertising (B2B2C); location-based services (B2C, B2B, B2B2C)
     • Revenue protection: customer experience management to reduce churn; customer sentiment analysis
     • Network efficiency: network bandwidth optimisation; network signalling maximisation
     • Network optimization, e.g. re-route a call to another cell tower if congestion is detected
     • Location-based ads: match each incoming event to the subscriber profile; if 'opt-in', show a location-sensitive ad
     • Challenge: streaming analytics plus interactive real-time dashboards
     ● Simple rules, e.g. if (CallDroppedCount > threshold) then alert
     ● Or complex (OLAP-like) queries: TopK, trending, joins with reference data, correlating with history
     Today's stream processors handle the simple rules; the need is stream analytics.
  5. Stream Pipeline (diagram): IoT devices and cell towers emit events (IoT sensors, Call Data Records (CDRs), ad impressions, …) into a stream process with a time window; output flows to an aggregated summary and a perpetually growing sink table, while summaries are maintained and reference tables are fed from external systems.
  6. A Number of Processing Challenges (same pipeline diagram): JOIN across streams, a large table, and reference data; the window can be large (an hour) while sliding every 2 seconds; fast writes; updates; exactly-once semantics; HA. A sketch of the windowed join follows.
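To make the join challenge concrete, here is a minimal sketch in plain Spark Streaming (the era's DStream API), not SnappyData-specific; the source host, file path, and record layout are hypothetical:

      import org.apache.spark.SparkConf
      import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

      val conf = new SparkConf().setAppName("cdr-pipeline")
      val ssc  = new StreamingContext(conf, Seconds(2))

      // Reference data: subscriber profiles keyed by subscriberId (hypothetical path).
      val refData = ssc.sparkContext.textFile("hdfs:///ref/subscribers")
        .map { line => val f = line.split(','); (f(0), f(1)) }

      // Stream of (subscriberId, droppedCallCount) events (hypothetical source).
      val events = ssc.socketTextStream("streamhost", 9999)
        .map { line => val f = line.split(','); (f(0), f(1).toInt) }

      // A one-hour window sliding every 2 seconds, joined with reference data.
      // Recomputing so large a window so often is precisely what strains a
      // pure streaming engine and motivates a mutable, queryable store.
      events.window(Minutes(60), Seconds(2))
        .reduceByKey(_ + _)
        .transform(rdd => rdd.join(refData))
        .print()

      ssc.start()
      ssc.awaitTermination()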
  7. More Processing Challenges (same diagram): INTERACTIVE (ad-hoc) QUERIES with high concurrency; point lookups; scans and aggregations.
  8. Why Is Supporting Mixed Workloads Difficult? The workloads pull in different directions along three dimensions:
     Data Structures | Query Processing Paradigm | Scheduling & Provisioning
     Columnar        | Batch processing          | Long-running
     Row stores      | Point lookups             | Short-lived
     Sketches        | Delta / incremental       | Bursty
  9. Lambda Architecture (diagram): (1) new data is dispatched to both layers; (2) the batch layer holds the master dataset (HDFS, HBase); (3) the serving layer materializes batch views (SQL-on-Hadoop, MPP DB); (4) the speed layer maintains real-time views (Storm, Spark Streaming, Samza, …); (5) queries merge the batch views and real-time views.
  10. Lambda Architecture Is Complex
     • Complexity: learn and master multiple products, data models, disparate APIs & configs
     • Wasted resources
     • Slower: excessive copying, serialization, and shuffles; impossible to achieve interactive-speed analytics on large or mutating data
  11. Spark Streaming with NoSQL for State (diagram): each user/app runs its own Spark master and executors (a framework for streaming, SQL, ML, …) over an immutable cache; shared state lives in a SQL/NoSQL store, which becomes the bottleneck. The cache cannot be updated and is repeated for each user/app. Costs: (1) pushing filters down to NoSQL partitions; (2) serializing/deserializing into Spark executors; (3) multiple copies of large data sets; (4) lost optimizations such as vectorization. Interactive and continuous queries: TOO SLOW.
  12. Can We Simplify & Optimize?
  13. Our Solution: SnappyData, a single unified cluster: OLTP + OLAP + streaming for real-time analytics.
  14. Our Solution: Fuse Spark with an In-Memory Database. Spark brings a batch design with high throughput, a rich API, and its ecosystem; the MPP in-memory database brings a real-time design with deep scale, high volume, low latency, HA, and concurrency, matured over 13 years. Together: a single unified HA cluster for OLTP + OLAP + streaming, for real-time analytics.
  15. We Transform Spark from a Computational Engine … (diagram as on slide 11: per-user/app Spark masters and executors with immutable caches; SQL/NoSQL/HDFS as the shared, bottlenecked state; caches cannot be updated and are repeated for each user/app).
  16. … Into an "Always-On" Hybrid Database! (diagram): Spark executors become long-running JVMs that host both the Spark runtime (stream processing, SQL, ML, …) and an in-memory store with mutable, transactional ROW + COLUMN tables and in-memory indexes, backed by shared-nothing persistence; a Spark driver serves Spark jobs while JDBC/ODBC clients connect directly; history overflows to HDFS and deep-scale, high-volume MPP/SQL/NoSQL systems.
  17. … Into an "Always-On" Hybrid Database! The SnappyData stack: the Spark API (streaming, ML, graph) over DataFrames, RDDs, and Datasets; full SQL with transactions, indexing, and HA; in-memory storage as columnar Spark cache, rows, and synopses (samples); a unified catalog and unified data access (virtual tables) spanning the native store and external sources (HDFS/HBase, S3, JSON/CSV/XML, SQL DBs, Cassandra, MPP DBs, stream sources). Access via Spark jobs, Scala/Java/Python/R APIs, JDBC/ODBC, and the object API (RDDs, Datasets).
  18. Overview (diagram): SnappyData servers (Spark executor + store) run under a cluster manager & scheduler with a distributed membership service for HA; a parser and query optimizer sit over OLAP and transaction engines, a data synopsis engine, and stream processing; the hybrid store holds rows, columns, indexes, and probabilistic structures; access is via DataFrames/RDDs and tables over ODBC/JDBC, with separate low-latency and high-latency execution paths; servers can be added or removed online.
  19. Unified Data Model & API (same architecture diagram), adding: mutability (DML + transactions); indexing; SQL-based streaming.
  20. Hybrid Store (diagram): unbounded streams are ingested through a row buffer that rolls into column tables for OLAP and stream analytics; row tables with indexes handle transactional state updates and random writes (reference data); real-time sampling maintains probabilistic sample tables.
  21. Simple API and Spark Compatible

      // Use the Spark Data Source API
      val snappy = new org.apache.spark.sql.SnappySession(spark.sparkContext)
      snappy.createTable("TableName", "Row | Column | Sample", schema, options)
      someDataFrame.write.insertInto("TableName")

      // Update, delete, put using the SnappySession
      snappy.update("tableName", filterExpr, Row(<newColumnValues>), updatedColumns)
      snappy.delete("tableName", filterExpr)

      // Or just use Spark SQL syntax
      snappy.sql("select x from tableName").count

      // Or JDBC/ODBC, to access it like a regular database
      jdbcStatement.executeUpdate("insert into tableName values …")
  22. Extends Spark SQL

      CREATE [TEMPORARY] TABLE [IF NOT EXISTS] table_name ( <column definition> )
      USING 'ROW | COLUMN'
      OPTIONS (
        COLOCATE_WITH 'table_name',
        PARTITION_BY 'PRIMARY KEY | column name',  -- replicated table by default
        REDUNDANCY '1',                            -- manage HA
        PERSISTENT 'DISKSTORE_NAME ASYNCHRONOUS | SYNCHRONOUS',  -- default: in-memory only
        OFFHEAP 'true | false',
        EVICTION_BY 'MEMSIZE 200 | COUNT 200 | HEAPPERCENT',
        …
      )
      [AS select_statement];
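For instance, a concrete (hypothetical) instance of this DDL, issued through the SnappySession from slide 21; all names and option values are illustrative:

      // Partitioned, redundant, disk-backed row table.
      snappy.sql("""
        CREATE TABLE subscribers (
          subscriber_id BIGINT,
          region VARCHAR(24),
          plan VARCHAR(16)
        ) USING 'ROW' OPTIONS (
          PARTITION_BY 'subscriber_id',
          REDUNDANCY '1',
          PERSISTENT 'subscriber_store ASYNCHRONOUS'
        )
      """)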
  23. Stream SQL DDL and Continuous Queries (based on Spark Streaming)
      Pipeline: consume from stream → transform raw data → continuous analytics → ingest into the in-memory store → overflow table to HDFS.

      CREATE STREAM TABLE AdImpressionLog (<columns>)
      USING directkafka_stream OPTIONS (
        <socket endpoints>,
        topics 'adnetwork-topic',
        rowConverter 'AdImpressionLogAvroDecoder'
      )

      // Register a continuous query and ingest into the in-memory store
      streamingContext.registerCQ(
        """select publisher, geo, avg(bid) as avg_bid, count(*) imps,
                  count(distinct(cookie)) uniques
           from AdImpressionLog window (duration '2' seconds, slide '2' seconds)
           where geo != 'unknown' group by publisher, geo""")
        .foreachDataFrame(df => {
          df.write.format("column").mode(SaveMode.Append)
            .saveAsTable("adImpressions")
        })
  24. Updates & Deletes on Column Tables (diagram): within a partition, writes first land in a row buffer and periodically roll into time-ranged column segments ([t1, t2), [t2, t3), …) with summary metadata; updates and deletes are tracked per segment (a bitmask over keys K11, K12, … and column chunks C11, C12, …, C21, C22, …) under MVCC; periodic compaction rewrites segments, and new segments are replicated for HA. A sketch of the mutation API follows.
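Tying this to the API on slide 21, a minimal sketch of mutations against a column table (the filter and values are hypothetical; `snappy` is the session from slide 21, and adImpressions is the table populated on slide 23). Per the slide, such writes land in the row buffer while readers keep a consistent view via MVCC:

      // Raise the recorded bid for one cookie; the filter selects the rows,
      // the Row carries the new value(s), and the last argument names the
      // updated column(s).
      snappy.update("adImpressions", "cookie = 'c-123'",
        org.apache.spark.sql.Row(25.0), "bid")

      // Remove rows we no longer want to keep.
      snappy.delete("adImpressions", "geo = 'unknown'")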
  25. Can We Use Statistical Techniques to Shrink Data?
     • Most apps are happy to trade off 1% accuracy for a 200x speedup!
     • You can usually get a 99.9% accurate answer by looking at only a tiny fraction of the data!
     • You can often make perfectly accurate decisions with imperfect answers! (A/B testing, visualization, …)
     • The data itself is usually noisy, so processing the entire data set doesn't necessarily mean exact answers!
  26. Probabilistic Store: Sketches + Uniform & Stratified Samples
     1. Streaming CMS (Count-Min Sketch): per-interval sketches over [t1, t2), [t2, t3), [t3, t4), [t4, now), with higher resolution for more recent time ranges (interval widths 4T, 2T, T, ≤T).
     2. Top-K queries with (almost) arbitrary filters: unlike a traditional CMS, maintain a small sample at each CMS cell (CMS + samples).
     3. Fully distributed stratified samples: always include the timestamp as a stratification column for streams; samples age from the in-memory row store to the on-disk column store.
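For background on item 1, here is a minimal, generic Count-Min Sketch in Scala. This is an illustrative sketch of the data structure only, not SnappyData's implementation; the depth, width, and hashing scheme are arbitrary choices:

      import scala.util.hashing.MurmurHash3

      // Count-Min Sketch: `depth` hash rows x `width` counters. Estimates
      // item frequency with one-sided error (may over-count, never under).
      class CountMinSketch(depth: Int = 5, width: Int = 2048) {
        private val table = Array.ofDim[Long](depth, width)

        private def bucket(item: String, row: Int): Int =
          (MurmurHash3.stringHash(item, row) & Int.MaxValue) % width

        def add(item: String, count: Long = 1L): Unit =
          for (r <- 0 until depth) table(r)(bucket(item, r)) += count

        // Point estimate: the minimum across rows bounds the over-count.
        def estimate(item: String): Long =
          (0 until depth).map(r => table(r)(bucket(item, r))).min
      }

      // Usage: count ad impressions per publisher in a stream batch.
      val cms = new CountMinSketch()
      Seq("pub1", "pub2", "pub1").foreach(cms.add(_))
      println(cms.estimate("pub1")) // ~2

The per-cell samples described in item 2 would hang off each counter, letting top-k queries apply filters that a plain sketch cannot.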
  27. Overview (architecture recap; see slide 18).
  28. Supporting Real-Time & HA (diagram): a locator provides the catalog service for JDBC/ODBC clients; the lead node hosts the managed Spark driver (Spark context, REST, Spark jobs, Spark programs); executor JVMs (servers) hold Spark block memory plus the Snappy store (tables, stream tables) in a peer-to-peer cluster with shared-nothing persistence.
     • Spark executors are long-running; a driver failure doesn't shut down the executors
     • Driver HA: drivers are "managed" by SnappyData, with a standby secondary
     • Data HA: consensus-based clustering is integrated for eager replication
  29. Overview (architecture recap; see slide 18).
  30. Transactions
     • Support for Read Committed & Repeatable Read
     • W-W and R-W conflict detection at write time
     • MVCC for non-blocking reads and snapshot isolation
     • Distributed failure detection integrated with the commit protocol: evict unresponsive replicas; ensure consistency when replicas recover
     A sketch of the JDBC surface follows.
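A minimal sketch of using these guarantees through plain JDBC (the URL, port, tables, and statements are hypothetical; the isolation level is one the slide names):

      import java.sql.{Connection, DriverManager, SQLException}

      // Hypothetical connection to a SnappyData locator.
      val conn: Connection =
        DriverManager.getConnection("jdbc:snappydata://locator:1527/")
      conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED)
      conn.setAutoCommit(false)
      val stmt = conn.createStatement()
      try {
        stmt.executeUpdate("update subscribers set plan = 'gold' where subscriber_id = 42")
        stmt.executeUpdate("insert into audit values (42, 'upgrade')")
        conn.commit()     // W-W / R-W conflicts surface at write time (per slide)
      } catch {
        case e: SQLException =>
          conn.rollback() // on conflict, roll back (and typically retry)
          throw e
      } finally {
        conn.close()
      }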
  31. Overview (architecture recap; see slide 18).
  32. Query Optimization
     • Bypass the scheduler for transactions and low-latency jobs
     • Minimize shuffles aggressively: dynamic replication for reference data; retain "join indexes" whenever possible; colocate related data sets
     • Optimized "Hash Join", "Scan", "GroupBy" compared to Spark: uses more variables in code generation, vectorized structures
     • Column segment pruning through statistics
  33. Co-partitioning & Co-location (diagram): Kafka queues are partitioned by subscriber (A-M, N-Z) and feed Spark executors that hold the matching subscriber partitions alongside replicated reference data; queries scale linearly with partition pruning. A DDL sketch follows.
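Building on the hypothetical `subscribers` table from the slide-22 sketch, this layout could be declared with the co-location options from that DDL (names illustrative):

      // Fact table partitioned on the same key and colocated with its
      // dimension, so subscriber joins stay partition-local (no shuffle).
      snappy.sql("""
        CREATE TABLE cdr_events (subscriber_id BIGINT, dropped INT, ts TIMESTAMP)
        USING 'COLUMN' OPTIONS (
          PARTITION_BY 'subscriber_id',
          COLOCATE_WITH 'subscribers'
        )
      """)

      // Small reference data: no PARTITION_BY means it is replicated to
      // every node by default (the slide's "Ref data" boxes).
      snappy.sql("CREATE TABLE cell_towers (tower_id INT, geo VARCHAR(24)) USING 'ROW'")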
  34. Overview (architecture recap; see slide 18).
  35. Approximate Query Processing (AQP): Academia vs. Industry. 25+ years of successful research in academia (AQUA, online aggregation, MapReduce Online, STRAT, ABS, BlinkDB/G-OLA, …), yet user-facing AQP is almost non-existent in the commercial world, beyond some approximate features in Infobright, Yahoo's Druid, Facebook's Presto, Oracle 12c, … Why? Consider:

      select geo, avg(bid) from adImpressions
      group by geo having avg(bid) > 10
      with error 0.05 at confidence 95

      geo   avg(bid)   error   prob_existence
      MI    21.5       ±0.4    0.99
      CA    18.3       ±5.1    0.80
      MA    15.6       ±2.4    0.81
      ...   ...        ...     ...

     But: (1) incompatible with BI tools; (2) complex semantics; (3) a bad sales pitch!
  36. A First Industrial-Grade AQP Engine: iSight (Immediate inSight)
     1. High-Level Accuracy Contract (HAC)
        • Concurrency: tens of queries in shared clusters
        • Resource usage: everyone hates their AWS bill
        • Fewer network shuffles
        • Immediate results while waiting for final results
     2. Fully compatible with BI tools
        • Set HAC behavior at the JDBC/ODBC connection level
     3. Better marketing!
        • The user picks a single number p, where 0 ≤ p ≤ 1 (by default p = 0.95)
        • Snappy guarantees that s/he only sees results that are at least p% accurate
        • Snappy handles (and hides) everything else!
     Example output: geo avg(bid): MI 21.5, WI 42.3, NY 65.6, … A usage sketch follows.
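A usage sketch of the `with error … at confidence …` clause from the previous slide, issued through the SnappySession from slide 21 (the table is the deck's adImpressions; HAC behavior would be set at the connection level as noted above):

      // Approximate aggregate; per the HAC, results that miss the accuracy
      // contract are replaced, dropped, or left alone depending on the
      // configured behavior.
      snappy.sql("""
        select geo, avg(bid) as avg_bid
        from adImpressions
        group by geo
        having avg(bid) > 10
        with error 0.05 at confidence 95
      """).show()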
  37. Conclusion
  38. Unified OLAP/OLTP/Streaming with Spark
     ● Far fewer resources: a TB problem becomes a GB one; CPU contention drops
     ● Far less complex: a single cluster for stream ingestion, continuous queries, interactive queries, and machine learning
     ● Much faster: compressed data managed in distributed memory in columnar form reduces volume and is much more responsive
  39. Lessons Learned
     1. A unique experience marrying two different breeds of distributed systems: lineage-based for high throughput vs. (consensus-)replication-based for low latency
        - Stream processing ≠ stream analytics
        - Top-k with almost arbitrary predicates + one-pass stratified sampling over streams
     2. A unified cluster is simpler, cheaper, and faster
        - By sharing state across apps, we decouple apps from data servers and provide HA
        - Save memory, data copying, serialization, and shuffles
        - Co-partitioning and co-location for faster joins and stream analytics
     3. Advantages over HTAP engines: deep stream integration + AQP
     4. Commercializing academic work is lots of work, but also lots of fun
  40. THANK YOU! Try our iSight cloud for free: http://snappydata.io/iSight
  41. iSight: Immediate inSight. iSight's immediate answer to the query: 1.7 s; the final answer: 42.7 s. A 25x speedup!
  42. Our Solution: High-Level Accuracy Contract (HAC)
     • A single number p, 0 ≤ p ≤ 1 (by default p = 0.95)
     • We guarantee that you only see results that are at least p% accurate
     • We handle (and hide) everything else. Choose a behavior for the rest: REPLACE WITH SPECIAL SYMBOL (the default), DO NOTHING, or DROP THE ROW
