MapR Seattle Streams Meetup Oct 2016

By Will Ochandarena

Published in: Data & Analytics

  1. When Your Stream is the System of Record. Seattle Kafka Meetup. Will Ochandarena, Sr Dir, Product. October 24 2016. © 2016 MapR Technologies. MapR Confidential.
  2. Agenda
     • Streaming System of Record - What?
     • A Little About MapR Streams
     • Versioning a Real-time Data Pipeline
       - Demo - MapR + StreamSets
  3. Streaming System of Record. System of Record (n): information storage system that is the authoritative data source for a given data element or piece of information.
  4. Who Does This Today? [Diagram labels: Events, Processing, DB, More Processing, Long Term Storage]
  5. Reprocessing is Hard [Diagram labels: Events, Processing, DB, More Processing, Long Term Storage, ?, Medium Term Storage; time ranges: 3d ago -> Now, 1 Year ago -> ~an hour ago]
  6. Easy Fix - Streaming System of Persistence [Diagram labels: Events, Processing, DB, More Processing, Long Term Storage; Long Term Storage, Events]
  7. How Can a Stream Be a System of Record? Imagine each event as a change to an entry in a database.
     Stream DMV_Updates:
       0: { WillO : {City : Mountain View}, ts : 7/5/2009 04:01:01, src : dmv201 }
       1: { BradA : {City : Atlanta}, ts : 5/11/2010 05:11:31, src : dmv1341 }
       2: { BradA : {Points : +2}, ts : 6/22/2011 03:31:10, src : officer1213 }
       3: { WillO : {City : San Jose}, ts : 11/1/2012 04:01:01, src : dmv1661 }
     Table (initial value -> value after updates):
       DL_ID | City                      | Points
       WillO | Mountain View -> San Jose | 0
       BradA | Atlanta                   | 0 -> 2
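
The idea on slide 7, that each event is a change to a database entry, can be sketched as a replay loop. This is a minimal illustration and not MapR code: the dictionary layout of `EVENTS` and the `replay` function are hypothetical, and treating `Points` as a delta that starts at 0 is an assumption read off the slide's `+2` entry.

```python
# Events transcribed from the DMV_Updates slide. The dict layout is an
# assumption made for this sketch; the slide only shows the values.
EVENTS = [
    {"id": "WillO", "change": {"City": "Mountain View"}, "ts": "7/5/2009 04:01:01", "src": "dmv201"},
    {"id": "BradA", "change": {"City": "Atlanta"}, "ts": "5/11/2010 05:11:31", "src": "dmv1341"},
    {"id": "BradA", "change": {"Points": "+2"}, "ts": "6/22/2011 03:31:10", "src": "officer1213"},
    {"id": "WillO", "change": {"City": "San Jose"}, "ts": "11/1/2012 04:01:01", "src": "dmv1661"},
]


def replay(events):
    """Fold events, in offset order, into the current table state."""
    table = {}
    for event in events:
        # Assumed defaults: no city yet, zero points.
        row = table.setdefault(event["id"], {"City": None, "Points": 0})
        for field, value in event["change"].items():
            if field == "Points":
                row["Points"] += int(value)  # points arrive as deltas ("+2")
            else:
                row[field] = value  # other fields overwrite the old value
    return table


current = replay(EVENTS)        # state after all four events
as_of_2011 = replay(EVENTS[:3])  # state before WillO moved
```

Replaying a prefix of the stream gives the table as it stood at that offset, which the table alone cannot provide.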
  8. Streams and Databases in Harmony [Diagram labels: Key-Val, Document, Graph, Wide Column, Time Series, Relational, ???; Inserts, Updates]
  9. Which Makes a Better System of Record? Which of these can be used to reconstruct the other?
     Stream:
       0: { WillO : {City : Mountain View}, ts : 7/5/2009 04:01:01, src : dmv201 }
       1: { BradA : {City : Atlanta}, ts : 5/11/2010 05:11:31, src : dmv1341 }
       2: { BradA : {Points : +2}, ts : 6/22/2011 03:31:10, src : officer1213 }
       3: { WillO : {City : San Jose}, ts : 11/1/2012 04:01:01, src : dmv1661 }
     Table:
       DL_ID | City     | Points
       WillO | San Jose | 0
       BradA | Atlanta  | 2
  10. Other Benefits of Streaming System of Record
     • Auditing - “how did BradA’s points get so high?”
     • Lineage - “who added points to BradA’s license?”
     • History - “where did WillO used to live?”
     • Integrity - “can I trust this data hasn’t been tampered with?”
       • Yup - Streams are immutable
     Stream:
       0: { WillO : {City : Mountain View}, ts : 7/5/2009 04:01:01, src : dmv201 }
       1: { BradA : {City : Atlanta}, ts : 5/11/2010 05:11:31, src : dmv1341 }
       2: { BradA : {Points : +2}, ts : 6/22/2011 03:31:10, src : officer1213 }
       3: { WillO : {City : San Jose}, ts : 11/1/2012 04:01:01, src : dmv1661 }
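
The auditing, lineage, and history questions on slide 10 reduce to filters over the event log. The sketch below is illustrative only: the `EVENTS` layout and the `history` helper are hypothetical names chosen for this example, not part of any MapR or Kafka API.

```python
# Events transcribed from the slide; the dict layout is an assumption.
EVENTS = [
    {"id": "WillO", "change": {"City": "Mountain View"}, "ts": "7/5/2009 04:01:01", "src": "dmv201"},
    {"id": "BradA", "change": {"City": "Atlanta"}, "ts": "5/11/2010 05:11:31", "src": "dmv1341"},
    {"id": "BradA", "change": {"Points": "+2"}, "ts": "6/22/2011 03:31:10", "src": "officer1213"},
    {"id": "WillO", "change": {"City": "San Jose"}, "ts": "11/1/2012 04:01:01", "src": "dmv1661"},
]


def history(events, dl_id, field):
    """Return (ts, value, src) for every change to one field of one license."""
    return [
        (event["ts"], event["change"][field], event["src"])
        for event in events
        if event["id"] == dl_id and field in event["change"]
    ]


# Auditing and lineage: how did BradA's points change, and who changed them?
points_changes = history(EVENTS, "BradA", "Points")

# History: every city WillO has lived in, oldest first.
willo_cities = [value for _, value, _ in history(EVENTS, "WillO", "City")]
```

Because every event keeps its timestamp and source, the same log answers all three questions without any extra audit table.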
  11. What Do I Need For This to Work?
     • Infinitely persisted events
     • A way to query your persisted stream data
     • An integrated security model across data services
     • Applied Streaming System of Record @ Liaison Blog
  12. About MapR & MapR Streams
  13. MapR Streams: Global Pub-sub Event Streaming System for Big Data
     • Producers publish billions of events/sec to a topic in a stream.
     • Events persisted and immediately delivered to all consumers, guaranteed.
     • Tie together geo-dispersed clusters. Worldwide.
     • Standard real-time API (Kafka). Integrates with Spark Streaming, Storm, Apex, and Flink.
     • Direct data access (OJAI API) from analytics frameworks.
     [Diagram labels: Producers, Stream, Topic, Topic, Consumers, Remote sites and consumers, Batch analytics]
  14. Streams Offers a Durable, Persistent System of Record
     [
       {"Topic1Part0Seq5001": {
           "timestamp" : 1456246886,
           "topic" : "Topic1",
           "partition" : 0,
           "producer" : "wochanda",
           "offset" : 5001,
           "key" : "MsgKey",
           "data" : {...} } },
       {"Topic2Part0Seq5002": { … } },
       …
     ]
     ● Reliable
     ● Secure
     ● Immutable
     ● Auditable
     ● Replayable
  15. Streams Enables Global Applications and Analytics
     Provides
     ● Arbitrary topology of thousands of clusters
     ● Automatic loop prevention
     ● DNS-based discovery
     ● Globally synchronized message offsets and consumer cursors
     Enables
     ● Global applications & data collection
     ● Producer & consumer failover
     ● Analysis/filtering/aggregation at the edge
     ● “Occasional” connections
     [Diagram labels: Producers, Consumers]
  16. Fun Facts: MapR Streams
     Column headings: Converged, Global Scale, Secure & Multi-Tenant
     • Single cluster for files, tables, and streams.
     • Global, IoT-scale “fabrics” with failover.
     • Tenant-owned streams, logical grouping of topics and messages.
     • Authentication, authorization, encryption.
     • Unified policy with all other platform services.
     • Infinite “system of record” persistence.
     • Metadata tracked internally, no dependencies on ZK.
     • Consumers, topics scale into millions.
  17. MapR Converged Data Platform [Architecture diagram]
     • Top layer: Open Source Engines & Tools, Commercial Engines & Applications, Cloud and Managed Services, Custom Apps, Search and Others
     • Side labels: Data Processing, Unified Management and Monitoring
     • APIs: HDFS API; POSIX, NFS; HBase API; JSON API; Kafka API
     • MapR Data Platform Services: Web-Scale Storage (MapR-FS), Database (MapR-DB), Event Streaming (MapR Streams)
     • Platform properties: Global Namespace | No Single Point of Failure | Data Protection | Multi-tenancy | Workload Management | Multi Temperature | Global Multi Datacenter | High Performance Low Latency | Security | Management & Monitoring
     • Base layer: Commodity Hardware/Storage, Clouds, & Containers
  18. Versioning a Real-time Data Pipeline
  19. Challenges of a Streaming App Developer Pre-Production [Diagram labels: Streaming System, Database, Hadoop Cluster, App Environment; events, logs, events2, logs2; v2, v2; /clicks, /clicks2; ...]
  20. Challenges with Versioning Post-Production
     Input Data + App Logic = Output Data (Output Streams, Database Tables, Logs, Metrics)
     What if you deploy a new version of your application? What happens to all of this?
  21. Example: Versioning in Production
     Input_Stream: 45 40 60 30 37 39 72 79 60
     App versions: Calculate_Mean_3, Calculate_Median_3
     Output_Stream: 45 35 70
     Output_Table:
       Time     | Value
       00:00:00 | 70
       00:00:05 | 35
       00:00:10 | 45
  22. Versioning with Converged App Volumes
     Input_Stream: 45 40 60 30 37 39 72 79 60
     Calculate_Mean_3 Volume:
       Output_Stream: 35 70
       Output_Table:
         Time     | Value
         00:00:00 | 70
         00:00:05 | 35
         00:00:10 |
     Calculate_Median_3 Volume:
       Output_Stream: 45 37 72
       Output_Table:
         Time     | Value
         00:00:00 | 72
         00:00:05 | 37
         00:00:10 | 45
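
The numbers on slides 21 and 22 are consistent with two versions of an app that each reduce non-overlapping windows of three input values, one by mean and one by median, and write to their own output. The sketch below reproduces that under stated assumptions: the arrival order of `INPUT_STREAM` (the slide appears to draw the newest value on the left), the integer truncation of the mean, and the `run_versions` helper are all guesses made for illustration, and a plain dictionary stands in for the per-version volumes.

```python
from statistics import median

# Assumed arrival order: the slide's Input_Stream read right to left.
INPUT_STREAM = [60, 79, 72, 39, 37, 30, 60, 40, 45]


def tumbling_windows(values, size):
    """Yield consecutive, non-overlapping windows of `size` values."""
    for start in range(0, len(values) - size + 1, size):
        yield values[start:start + size]


def calculate_mean_3(values):
    # Integer division is an assumption that matches the slide's 70 and 35.
    return [sum(window) // len(window) for window in tumbling_windows(values, 3)]


def calculate_median_3(values):
    return [median(window) for window in tumbling_windows(values, 3)]


def run_versions(values, versions):
    """Run every app version on the same input, keeping outputs separate."""
    return {name: app(values) for name, app in versions.items()}


outputs = run_versions(
    INPUT_STREAM,
    {"Calculate_Mean_3": calculate_mean_3, "Calculate_Median_3": calculate_median_3},
)
```

Because both versions read the same retained input and write to separate outputs, the new version can be replayed from the start and compared with the old one window by window.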
  23. Versioning & A/B Testing [Diagram: traffic split 80%, 10%, 10% across versions A, B, C]
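
One way to get the 80/10/10 split shown on slide 23 is to hash each message key into one of 100 buckets and assign bucket ranges to versions. This is a sketch of a common technique, not something the slides describe: the `SPLIT` table, the `route` function, and the use of SHA-256 are all assumptions chosen for illustration.

```python
import hashlib

# Hypothetical split table: version name and its share of traffic in percent.
SPLIT = [("A", 80), ("B", 10), ("C", 10)]


def route(key, split=SPLIT):
    """Map a message key to a version, deterministically and by weight."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    upper = 0
    for version, percent in split:
        upper += percent
        if bucket < upper:
            return version
    raise ValueError("split percentages must sum to 100")


# Count how 10,000 hypothetical keys are distributed across versions.
counts = {"A": 0, "B": 0, "C": 0}
for i in range(10_000):
    counts[route(f"user-{i}")] += 1
```

Hashing the key rather than drawing a random number keeps each key on the same version across restarts, so a given user sees consistent results during the test.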
  24. DEMO - MapR & StreamSets: Versioning a Production Data Pipeline. Rupal Shah - StreamSets
  25. StreamSets Data Collector™
     StreamSets Data Collector™ is open source software for building and deploying individual any-to-any ingest pipelines in the face of data drift.
     Adaptable Pipelines -> Efficiency
     ❑ Intent-driven ingest (minimal schema specification).
     ❑ Data drift handling.
     Pipeline KPIs -> Visibility
     ❑ Real-time stage, edge and bad data metrics.
     ❑ Alerts via profiling, sampling and threshold-based rules.
     Containerized Architecture -> Agility
     ❑ Flexible deployment: edge, cluster, embedded, pipeline, pub/sub
     ❑ Zero-downtime upgrades due to logical component isolation.
  26. StreamSets Dataflow Performance Manager™
     StreamSets Dataflow Performance Manager (DPM™) provides a single pane of glass to map, measure and master big data in motion.
     MAP: Dataflow Lineage, Live Data Architecture
     MEASURE: Any Path, Any Time
     MASTER: Availability & Accuracy, Proactive Remediation
  27. …helping you put data technology to work
     ● Find answers
     ● Ask technical questions
     ● Join on-demand training course discussions
     ● Follow release announcements
     ● Share and vote on product ideas
     ● Find Meetup and event listings
     Connect with fellow Apache Hadoop and Spark professionals: community.mapr.com
  28. Backup
  29. Find my slides & other related materials to this talk here: bit.ly/tbd or search:
