How to build streaming data applications - evaluating the top contenders

Originally presented at:

British Computer Society (BCS) SPA-287, London, UK, 3 March 2015
http://www.eventbrite.co.uk/e/spa-287-how-to-build-streaming-data-applications-evaluating-the-top-contenders-tickets-15735307729/

Published in: Technology
  1. HOW TO BUILD STREAMING DATA APPLICATIONS: EVALUATING THE TOP CONTENDERS. Akmal B. Chaudhri, about.me/akmalchaudhri
  2. INTRODUCTION. © 2015 VoltDB PROPRIETARY
  3. VOLTDB OVERVIEW
     Founded in 2009 by database luminary Mike Stonebraker.
     FAST: World Record Cloud Benchmark, YCSB (Yahoo Cloud Serving Benchmark): 2.4 million tps (transactions per second).
     Other Stonebraker Companies. Customers.
     Technology:
     • In-Memory (but data is durable to disk)
     • Scale-Out shared-nothing architecture
     • Reliability and fault tolerance
     • SQL + Java with ACID
     • Hadoop and data warehouse integration
     • Open source and commercially licensed (24x7)
  4. VOLTDB BENCHMARK ON AMAZON VIRTUAL AND IBM SOFTLAYER BARE-METAL SERVERS
     • Yahoo Cloud Serving Benchmark (YCSB) is a popular industry-standard benchmark for cloud databases
     • AWS: virtualized servers
     • SoftLayer: bare-metal servers
     • Workload “B”: 95% reads with 5% updates
     • Results: Best in class cloud performance (run in the cloud)!
     • AWS: 285k tps for 3 nodes scaling linearly to 724k tps for a 12 node cluster
     • IBM SoftLayer: 1.02 million tps for 3 nodes scaling linearly to 2.4 million tps for a 12 node cluster
     Charts: SoftLayer and AWS throughput (ops/sec); SoftLayer update and read latency (ms).
  5. PREDICTION: All businesses will compete on their ability to make decisions “in the moment” using Fast Data.
  6. FAST DATA SOURCES AND DRIVERS
     Mobile, IoT, Social, Sensors, Logs
     Data is doubling every two years
     • 26 billion connected devices by 2020 (Gartner 2014)
     • 37% of most data will be processed at the edge in milliseconds (Cisco IoT Study 12/11/14)
  7. EVERY COMPANY HAS FAST DATA PROBLEMS
     • Mobile: Billing and rights management, subscriber marketing, etc.
     • IoT, Energy, Sensor: Smart grid/meters, asset tracking & management
     • Personalized Targeting: Ad optimization, audience segmenting
     • Capital Markets: Risk, market data management, customer mgt
     • Infrastructure: Data pipeline, system performance, streaming ETL
     Diagram labels: UK Smart Meter, VoltDB Customers
  8. FAST DATA IS A COMPETITIVE ADVANTAGE TODAY!
     Instant insight. Instant action. Instant awareness.
     • “Event triggered, real-time recommendations based on customer behavior have 10-15 times the response rates than mass marketing”
     • “We get competitive advantage by analyzing device and user data to create an interactive and personalized consumer experience across all devices.”
     • “Real time contextual offers increase offer uptake rates by 75% and data revenues by 15%.”
     * VoltDB customers
  9. TRADITIONAL RDBMS
     • Heavy Overhead
     • 1000s of concurrent versions
     • Contention for locked records
     • Contention for latching on lock table
     • Index bottlenecks
     • Disk I/O bottlenecks
     • Architecture limits scaling
  10. ARCHITECTURE IS IMPORTANT: Fast data requires a different architecture.
  11. BIG DATA + FAST DATA
  12. Big data ecosystem has several components: Collect, Explore (Data Science), Analyze, Act (Discoveries/Optimizations)
  13. DATA ARCHITECTURE FOR FAST + BIG DATA
      Diagram labels: Enterprise Apps ETL CRM ERP Etc. Data Lake (HDFS, etc.) BIG DATA SQL on Hadoop Map Reduce Exploratory Analytics BI Reporting Fast Operational Database FAST DATA Export Ingest / Interactive Real-time Analytics Fast Serve Analytics Decisioning
  14. VOLTDB AND FAST DATA PIPELINE
      Diagram labels: Calculations Serving of Results Real Time, Per Event, Interactive
  15. IN THE BIG CORNER
      Systems facilitating exploration and analytics of large collections.
      Example Technologies: Columnar OLAP warehouses; Hadoop Ecosystem
      • MapReduce
      • Hive, Pig
      • SQL.next: Impala, Drill, Shark
      Example Applications
      • User segmentation & pre-scoring
      • Seasonal trending
      • Recommendation matrices
      • Building search indexes
      • Data Science: statistical clustering, machine learning
  16. IN THE FAST CORNER
      Systems facilitating real time ingest, analytics and decisions against incoming streams of events.
      Example Technologies
      • Streaming frameworks (e.g. Spark)
      • Fast OLAP (e.g. HANA)
      • Fast OLTP (e.g. VoltDB)
      Example Applications
      • Micro-personalization
      • Recommendation serving
      • Alerting/alarming
      • Operational monitoring
      • Data enrichment (ETL elimination)
      • High throughput authorization
        • Ex: API quota enforcement
  17. TYPICAL FAST DATA QUESTIONS
      Diagram labels: Hadoop Volume SQL / OLAP Data Science Fast Velocity
      • Is the fast layer streaming?
        • It is often more like fast OLTP
      • How do the pieces communicate?
        • OLAP analytics from Big -> Fast
        • New events from Fast -> Big
      • Where do “analytics” belong?
        • Analytics per-event: with Fast
        • Analytics across history: with Big
      • Are streaming frameworks equivalent?
        • Traditional SQL CEP (Esper, Streambase)
        • Tuple DAGs (Storm)
        • Window processors on Hadoop (Spark)
  18. HOW TO SOLVE IT*
      * With admiring credit to G. Polya
      Considering Data
      • What are the types of data to be managed in fast data applications?
      Considering Processing
      • How does data flow through fast data applications?
      • What are the calculations & analytics that are necessary?
  19. Data | Temporality | Examples
      • Incoming events | Event Stream | Click stream, tick stream, sensors, metrics
      • Real-Time Analytic Results | Persistent (Queryable) | Counters, streaming aggregates, Time-series rollups
      • Event metadata | Persistent (Look-Ups) | Device version, location, user profiles, point-of-interest data
      • OLAP Analytics Used in Real-Time Decisions | Persistent (Look-Ups) | Scoring models, seasonal usage, demographic trends
      • Responses/side effects | Event Stream | Policy enforcement decisions, personalization recommendations
      • Outgoing events | Event Stream | Enriched, filtered, correlated transform of input feed
  20. SOURCES OF STATE
      1. Analytics outputs must be query-able.
      2. “Lookup tables” to create groupings for analytics and to supply enrichment data.
      3. Session management: grouping, filtering and aggregating create intermediate state.
  21. Considering Data
      • What are the types of data to be managed in fast data applications?
      Considering Processing
      • How does data flow through fast data applications?
      • What are the calculations & analytics that are necessary?
  22. DATA FLOWS: Real-Time Analytics
      • Streaming summaries for operations
      • KPI measurement
      • Analytics for apps
  23. DATA FLOWS: Fast Request/Response (and side effects)
      • Mobile Authorization
      • Campaign Evaluation
      • Quota Enforcement
      • Micro-Personalization
      • Recommendation Serving
  24. DATA FLOWS: Data Pipelines
      • Data enrichment
      • Sessionization and re-assembly of incoming events
      • Correlation (by time, location, identity)
      • Filtering
      Diagram labels: Pipeline, Data Lake
  25. Considering Data
      • What are the types of data to be managed in fast data applications?
      Considering Processing
      • How does data flow through fast data applications?
      • What are the calculations & analytics that are necessary?
  26. Diagram labels: Continuous Query Transactional Event Evaluation Transformation
  27. FAST DATA STACK
      Applications, Message Queues, Data Sources
      Ingest, Analyze, Decide
      • Counters
      • Aggregations
      • Time series
      • Statistics
      • Store results
      • Query and recombine
      • Fast serving
      • Per-event policy evaluations
      • Responses (synchronous): authorization, personalization
      • Side-effects (asynchronous): alerts, alarms
      Export & Pipeline
  28. APACHE-ISH TECHNOLOGY STACK
      Applications, Message Queues, Data Sources
      Ingest, Analyze, Decide: Counters, Aggregations, Time series, Statistics, Store results, Query and recombine, Fast serving, Per-event policy evaluations, Responses (synchronous), Side-effects (asynchronous)
      Export & Pipeline
      Diagram labels: Kafka / RabbitMQ Storm, Flume, Sqoop Storm + Serving Layer Spark + Serving Layer Cassandra, HBase Hadoop, Message queues
  29. VOLTDB TECHNOLOGY STACK
      Applications, Message Queues, Data Sources
      Ingest, Analyze, Decide: Counters, Aggregations, Time series, Statistics, Store results, Query and recombine, Fast serving, Per-event policy evaluations, Responses (synchronous), Side-effects (asynchronous)
      Export & Pipeline
      Diagram labels: Kafka / RabbitMQ VoltDB SQL, Java for Analytics Transactions / ACID Hadoop, Message queues
  30. Diagram labels: OLTP (Transactions First) Streaming Event Processors OLAP (Columnar Analytics)
  31. STREAM TECHNOLOGY STACK
      Applications, Message Queues, Data Sources
      Ingest, Analyze, Decide: Counters, Aggregations, Time series, Statistics, Store results, Query and recombine, Fast serving, Per-event policy evaluations, Responses (synchronous), Side-effects (asynchronous)
      Export & Pipeline
  32. OLAP TECHNOLOGY STACK
      Applications, Message Queues, Data Sources
      Ingest, Analyze, Decide: Counters, Aggregations, Time series, Statistics, Store results, Query and recombine, Fast serving, Per-event policy evaluations, Responses (synchronous), Side-effects (asynchronous)
      Export & Pipeline
  33. STREAMING DATA PIPELINE
      Diagram labels: Applications & Streams Logs, Sensors, Meter Readings, IoT, Location Real-Time Applications Message Queue Ingest Kafka Loader CSV loaders C++, C#, PHP, Python Java (and others) Export CSV Data Thrift Messages JDBC HTTP Local File Extensible Connectors SQL Views Java Analyze ACID Txns State Decide Downstream Pipeline Hadoop Data Warehouse Message Queue
  34. FAST DATA PATTERNS
  35. THREE FAST DATA APPLICATION PATTERNS
      • Real-Time Analytics
        • Real-time analytics for operations
        • Real-time KPI measurement
        • Real-time analytics for apps
      • Data Pipelines
        • Streaming data enrichment
        • Sessionization / re-assembly
        • Correlation (by time, by location, by id)
        • Filtering
        • Pre-aggregation
      • Fast Request/Response
        • Mobile Authorization
        • Campaign Authorization
        • Fast API Quota Enforcement
        • Micro-Personalization
        • Recommendation Serving
  36. VOLTDB: REAL-TIME ANALYTICS
      Diagram labels: Ingest, VoltDB, Metadata (Dimension table), Session state (Fact table), SQL, Views
      • Operational analytics and monitoring
      • RT analytics enabling user-facing applications
      • KPI for internal BI/Dashboards
      • In-memory MPP SQL over ODBC/JDBC
      • Cheap + correct materialized views for streaming aggregations
  37. VOLTDB: DATA PIPELINES WITH EXPORT
      Diagram labels: VoltDB, Metadata (Dimension table), Session state (Fact table), Export
      • Filtering (ex: only RFID / iBeacon readings that show change from previous location)
      • Sessionization
      • Common version re-writing
      • Data enrichment
      • MPP streaming Export
      • Row data, Thrift messages, CSV
      • OLAP, HDFS and message queues
  38. VOLTDB: REQUEST/RESPONSE DECISIONS
      Diagram labels: Metadata (Dimension table), Session state (Fact table), ACID Transactions
      • Authorization
        • RT balance checks, quota enforcement
      • Personalization and Recommendation Serving
        • Combine pre-score with immediate context
      • Fully ACID transaction model
        • Thousands to Millions per second
        • At less than 5ms latencies
  39. VOLTDB V5.0
  40. VOLTDB V5.0 – ACCELERATING FAST DATA APPLICATION DEVELOPMENT
      • Hadoop/Big Data Ecosystem Integrations
      • Fast Data Pipeline Sample Applications
      • Ease of Database Development (traditional API)
      • VoltDB Management Center (VMC)
      • Updated Hortonworks HDP Certification
  41. FAST DATA INTEGRATIONS - IMPORTERS
      • Kafka Loader
        • Subscribe to a Kafka topic and insert each message into a VoltDB Table
      • JDBC Loader
        • Load a JDBC result set into a VoltDB Table
      • Vertica Udx
        • User-defined function to load Vertica result sets into a VoltDB Table
      • Apache Hive and Apache Pig
        • Hadoop OutputFormat to load Hive and Pig result sets into VoltDB
  42. FAST DATA INTEGRATIONS - EXPORTERS
      • HDFS Export
        • Hadoop export via WebHDFS and HttpFS
      • HTTP Export
        • Delivery and Alerting via HTTP post/get
      • Kafka Export, RabbitMQ Export
        • Message queue delivery
      • Export format configurable
        • Avro, CSV, TSV, more coming…
  43. FAST DATA PIPELINE SAMPLE APPLICATION
      • Streaming Data, Real-time Analytics
      • Export to Hadoop
      • Export to OLAP (Vertica, others)
      • Place historical decision making intelligence into VoltDB
      • Closed Loop, via Hive, Pig OutputFormat or Vertica Udx
      • Download: https://github.com/VoltDB/app-fastdata
      • And see our blog posts: http://voltdb.com/blog/fast-data-look-voltdb-sample-app
  44. LAMBDA ARCHITECTURE SAMPLE APPLICATION
      HOW MANY UNIQUE USERS INTERACTED WITH MY APP TODAY?
      • Type of application: Real-time analytics
      • Demonstrates how to simplify the “Speed Layer”
      • Using VoltDB, developers can replace both the streaming and the operational data store portions of the speed layer.
      • Less code, greatly reduced complexity
      • Improving the Lambda Architecture
      • Perform real-time analytics AND react, per event, to the incoming data stream
      • Try it yourself: http://voltdb.com/community/applications
  45. VOLTDB MANAGEMENT CENTER (VMC): A browser-based management tool for monitoring, examining, and querying a running VoltDB database
  46. UPDATED HORTONWORKS CERTIFICATION
  47. CUSTOMER CASE STUDIES
  48. VOLTDB DELIVERS SUPERIOR CUSTOMER VALUE
      Customers | Business Value
      • Smart Meter, Energy Management | 60 Million meters under management, saving millions in efficiency, reduced waste
      • Internet Service Provider | Discover 100% of DoS attacks, and improved response time by 97%
      • Communications Service Provider | Improved infrastructure utilization by 150%
      • Online Game Analytics | Increased free-to-pay conversion rate by 30%
      • Mobile Network Management | Saves $0.5 million/customer installation; unlimited scale in the cloud
      • Mobile Ad Service Provider | OpEx – 93% reduction in servers (100 to 7); saved millions in ad budget overages
  50. TRY V5.0 TODAY FOR FREE
      • VoltDB Enterprise Edition
        • Production-ready
        • Fully durable, highly available
        • Commercial license, fully supported
        • http://voltdb.com/download/software
      • Sample apps (in a Docker container)
        • http://voltdb.com/community/demo
      • VoltDB Community Edition – open source
        • http://github.com/voltdb
      VoltDB runs over 6 BILLION transactions/day in production!
  51. Comparing Fast Data Application Platforms: From Simple Streaming to Real-Time Interaction with Decision Making
      Ingestion -> Analytics w/o Queries -> Analytics with queries -> Data Enrichment -> Real time Decisions
      Fast data applications have three unique requirements: rapid data ingestion, real-time analytics on streaming data, and per event real-time decisions
      Capability | Spark Streaming | Storm | TIBCO Streambase | IBM Streams | Google Dataflow | Amazon Kinesis | VoltDB
      • Focus | Micro Batching for Hadoop | Infrastructure for data capture | Complex Event Processing | Stream processing and analytics without queries | Next gen MapReduce in the cloud | Infrastructure for data capture | Stream processing, analytics with queries, and real-time decision making
      • Programming Model | Java, Scala | Clojure, Java, Ruby, Python | SQL | Proprietary - Stream Processing Language (SPL) | Java | Java | Java, Relational, SQL, ACID-compliant
      • Latency (milliseconds) | > 1,000 milliseconds | milliseconds | 1 millisecond | 1 millisecond | > 2,000 milliseconds | 35-100 milliseconds | 1 millisecond
      • Data Capture/Ingestion | Batch | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
      • Stateful Operation | X | X | X | X | X | X | ✓
      • Ad hoc queries (Interactive SQL) | X | X | X | X | X | X | ✓
      • Analytics w/o Queries | ✓ | with add on DDLs | ✓ | ✓ | ✓ | ✓ | ✓
      • Analytics with queries and per-event decision making | X | X | X | X | X | X | ✓
      • Real time Data Enrichment (using metadata to enrich, denormalize, etc., incoming event streams) | X | X | X | X | X | X | ✓
      • Apply OLAP results to real time data stream | X | X | X | ✓ | X | X | ✓
      • Scale-out architecture | ✓ | ✓ | X | ✓ | ✓ | ✓ | ✓
      • Reliability: ability to persist data | X X X X X ✓
      • Fault Tolerant | ✓ ✓ ✓ ✓ ✓ ✓ (Requires Zookeeper for HA)
      • Reliability: ability to persist data | X | X | ✓ | ✓ | X | X | ✓
      • Cluster & Resource Management | Need to add-on Zookeeper | Need to add-on Zookeeper; supports YARN | Built-In | Built-In | Built-In | Built-In | Built-In
      • Support | Cloudera | Hortonworks | TIBCO | IBM | Google | Amazon | VoltDB
      • Output (OLAP Integration) | HDFS, Flume, Kafka, ZeroMQ | HDFS, Kafka, Redis, RDBMS | HDFS, CSV, IBM Netezza, HP Vertica, Microsoft, Oracle, Sybase | HDFS, CSV, IBM Netezza, HP Vertica, Microsoft, Oracle, Sybase | Google | Amazon | HDFS, Kafka, RabbitMQ, CSV, Netezza, HP Vertica, JDBC
      • Available as Open Source | Yes, Apache license | Yes, Apache license | X | X | X | X | Yes, AGPL License
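
The three requirements named on the final slide (rapid ingestion, real-time analytics on streaming data, and per-event decisions) can be illustrated with a minimal in-memory sketch. This is not VoltDB code and uses no VoltDB API; the class `FastDataPipeline`, the event fields `user` and `bytes`, and the quota policy are all hypothetical names chosen for illustration. It only shows the shape of the ingest, analyze, decide loop, with state kept queryable after each event.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FastDataPipeline:
    """Toy stand-in for the ingest -> analyze -> decide stages (hypothetical)."""

    quota_per_user: int  # assumed policy: maximum authorized requests per user
    request_counts: dict = field(default_factory=lambda: defaultdict(int))
    bytes_by_user: dict = field(default_factory=lambda: defaultdict(int))

    def process(self, event: dict) -> bool:
        """Ingest one event and return True if it is authorized."""
        user = event["user"]
        # Decide: evaluate the per-event policy against the current state.
        if self.request_counts[user] >= self.quota_per_user:
            return False
        # Analyze: update counters and aggregates incrementally.
        self.request_counts[user] += 1
        self.bytes_by_user[user] += event["bytes"]
        return True

    def top_users(self, n: int = 3) -> list:
        """Ad hoc query over the stored aggregates."""
        ranked = sorted(self.bytes_by_user.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:n]


pipeline = FastDataPipeline(quota_per_user=2)
events = [
    {"user": "alice", "bytes": 100},
    {"user": "bob", "bytes": 50},
    {"user": "alice", "bytes": 200},
    {"user": "alice", "bytes": 300},  # third request from alice, over quota
]
decisions = [pipeline.process(e) for e in events]
print(decisions)             # [True, True, True, False]
print(pipeline.top_users())  # [('alice', 300), ('bob', 50)]
```

In a real deployment the check and the update would have to run as one atomic step per event (the slides attribute this to ACID transactions); the single-threaded sketch gets that property trivially.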