
Streaming Patterns Revolutionary Architectures with the Kafka API

Building a robust, responsive, secure data service for healthcare is tricky. For starters, healthcare data lends itself to multiple models:

• Document representation for patient profile view or update

• Graph representation to query relationships between patients, providers, and medications

• Search representation for advanced lookups

Keeping these different systems up to date requires an architecture that can synchronize them in real time as data is updated. Furthermore, meeting audit requirements in healthcare requires the ability to apply granular cross-datacenter replication policies to data and to provide detailed lineage information for each record. This post will describe how stream-first architectures can solve these challenges, and look at how this has been implemented at a Health Information Network provider.

This talk will go over the Kafka API with these design patterns:

• Turning the database upside down

• Event Sourcing, Command Query Responsibility Separation, Polyglot Persistence

• Kappa Architecture

Published in: Software

  1. © 2016 MapR Technologies. Streaming Patterns, Revolutionary Architectures. Carol McDonald
  2. Agenda. Streams Core Components: Topics, Partitions; Fault Tolerance; High Availability. Patterns: Event Sourcing; Duality of Streams and Databases; Command Query Responsibility Separation; Polyglot Persistence, Multiple Materialized Views; Turning the Database Upside Down. Real World Examples: Fraud Detection; Healthcare Exchange.
  3. Which products are we discussing?
  4. Streams Core Components
  5. What’s a Stream? A stream is an unbounded sequence of events carried from a set of producers to a set of consumers. (Diagram: Producers -> Events_Stream -> Consumers)
  6. What is Streaming Data? Got Some Examples? Data Collection Devices, Smart Machinery, Phones and Tablets, Home Automation, RFID Systems, Digital Signage, Security Systems, Medical Devices
  7. Why Streams? Many Big Data sources are event oriented. Trigger events: Stock Prices, User Activity, Sensor Data. (Diagram labels: Event Data, Stream, Topic, Real-Time Analytics)
  8. Analyze Data: What if you need to analyze data as it arrives?
  9. Batch Processing with HDFS. Stored readings are analyzed after the fact: 6:01 P.M.: 72°, 6:02 P.M.: 75°, 6:03 P.M.: 77°, 6:04 P.M.: 85°, 6:05 P.M.: 90°, 6:06 P.M.: 85°, 6:07 P.M.: 77°, 6:08 P.M.: 75°. Result: 90°. “It was hot at 6:05 yesterday!”
  10. Event Processing with Streams. The reading 6:05 P.M.: 90° arrives on the Temperature topic of the stream: “Turn on the air conditioning!”
  11. Organize Data: What if you need to organize data as it arrives?
  12. Integrating Many Data Sources and Applications: Sources (Producers) and Applications (Consumers) are unorganized, complicated, and tightly coupled.
  13. Organize Data into Topics with MapR Streams: Topics organize events into categories and decouple producers from consumers. (Diagram: a MapR cluster with topics Pressure, Temperature, and Warnings; consumers connect through the Kafka API)
  14. Process High Volume of Data: What if you need to process a high volume of data as it arrives?
  15. What if BP had detected problems before the oil hit the water? 1M samples/sec. High performance at scale is necessary!
  16. Legacy Messaging: millions of sources, hundreds of destinations. Legacy message queue: message rate <100K/s. (Diagram labels: Publish, insert, Acks; Consume, delete, Acks)
  17. Mechanisms for Decoupling: Traditional message queues? Huge performance hit for persistence: message acknowledgement per message per consumer; lots of non-sequential disk I/O when messages are added or removed.
  18. Scalable Messaging with MapR Streams: Topics are partitioned for throughput and scalability. (Diagram: Servers 1, 2, and 3 hold Partitions 1, 2, and 3, respectively, of the Pressure, Temperature, and Warning topics)
  19. Scalable Messaging with MapR Streams: Producers are load balanced between partitions (Kafka API). (Diagram: Partitions 1 to 3 of the Pressure, Temperature, and Warning topics)
  20. Scalable Messaging with MapR Streams: Consumer groups can read in parallel (Kafka API). (Diagram: Partitions 1 to 3 of the Pressure, Temperature, and Warning topics, read by consumers)
  21. Core Components: Partitions. Partitions: messages are appended in order. Offset: sequential ID of a message in a partition. (Diagram: producers append to Partitions 1, 2, and 3 of the Admission topic, one per server on the MapR cluster; the partitions hold offsets 1 to 6, 1 to 3, and 1 to 5, from old message to new message; consumers read from the partitions)
  22. Read Cursors. Read cursor: offset ID of the most recently read message. Producers append new messages to the tail. Consumers read from the head. (Diagram: producers append to a partition with offsets 1 to 6 on the MapR cluster; two consumer groups each have a read cursor)
  23. Events are delivered in the order they are received, like a queue. Partitioned, sequential access = high performance. (Diagram: Partitions 1 to 3 of the Admission topic, one per server, with producers and consumers)
  24. Unlike a queue, events are persisted even after they’re delivered. Messages remain on the partition, available to other consumers. Minimizes non-sequential disk reads and writes. (Diagram: a consumer polls through the client library and gets unread events 3, 2, 1 from Partition 1 of the Warning topic on a one-server MapR cluster)
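
The partition, offset, and read-cursor ideas from the slides above can be sketched in a few lines of plain Java. This is an in-memory illustration only, not the MapR Streams or Kafka client: the class `PartitionLog` and its methods are hypothetical names, offsets start at 0 here, and the cursor stores the next offset to read rather than the most recently read one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for one partition: an append-only list indexed by offset. */
class PartitionLog {
    private final List<String> messages = new ArrayList<>();
    // One read cursor per consumer group: the offset of the next message to read.
    private final Map<String, Integer> cursors = new HashMap<>();

    /** Appends a message to the tail and returns the offset it was assigned. */
    int append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    /** Returns the messages this group has not read yet, in order, and advances its cursor. */
    List<String> poll(String groupId) {
        int from = cursors.getOrDefault(groupId, 0);
        List<String> unread = new ArrayList<>(messages.subList(from, messages.size()));
        cursors.put(groupId, messages.size());
        return unread; // messages stay in the log, so other groups can still read them
    }
}
```

Because reading only moves a cursor and never deletes anything, a second consumer group polling the same log later still receives every message from offset 0.
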
  25. Considering a Messaging Platform: Kafka-esque Logs? Sequential writing/reading disk: messages are persisted sequentially as produced, and read sequentially when consumed. Performance plus persistence: performance of up to a billion messages per second at millisecond-level delivery times. Kafka model is BLAZING fast: Kafka 0.9 API with message sizes at 200 bytes; MapR Streams on a 5 node cluster sustained 18 million events / sec; throughput of 3.5GB/s and over 1.5 trillion events / day.
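
As a quick sanity check, the benchmark figures on this slide are consistent with each other. The sketch below only multiplies the numbers quoted above; the class and method names are made up for illustration.

```java
/** Back-of-the-envelope check of the quoted benchmark figures. */
class ThroughputCheck {
    static final long EVENTS_PER_SECOND = 18_000_000L; // from the slide
    static final long BYTES_PER_EVENT = 200L;          // from the slide
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    /** Raw payload bytes moved per second. */
    static long bytesPerSecond() {
        return EVENTS_PER_SECOND * BYTES_PER_EVENT;
    }

    /** Events handled in one day at the sustained rate. */
    static long eventsPerDay() {
        return EVENTS_PER_SECOND * SECONDS_PER_DAY;
    }
}
```

`bytesPerSecond()` gives 3,600,000,000 bytes, in line with the quoted 3.5GB/s, and `eventsPerDay()` gives 1,555,200,000,000, which matches "over 1.5 trillion events / day".
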
  26. When Are Messages Deleted? Messages can be persisted forever, or older messages can be deleted automatically based on time to live.
  27. Parallelism When Reading. To read messages from the same topic in parallel: create consumer groups (consumers with the same group.id); partitions are assigned dynamically, round-robin. (Diagram: consumer group “Oil Wells” with Consumers A, B, and C reading Partitions 1 to 5 of the Warning topic on the MapR cluster)
  28. Fault Tolerance Consumption: Partitions Re-Assigned Dynamically. If a consumer goes offline, its partitions are re-assigned. (Diagram: consumer group.id “Oil Wells” with Consumers A and C reading Partitions 1 to 5 of the Warning topic on the MapR cluster)
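
A minimal sketch of round-robin partition assignment, and of what re-assignment looks like when a consumer drops out of the group. This is not the assignment code of MapR Streams or Kafka; `RoundRobinAssignor` and `assign` are hypothetical names, and re-assignment is modeled simply as running the assignment again over the surviving consumers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that deals partitions out to the consumers of one group. */
class RoundRobinAssignor {
    /** Assigns partitions 1..partitionCount to the consumers in turn. */
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("a consumer group needs at least one consumer");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int partition = 1; partition <= partitionCount; partition++) {
            String owner = consumers.get((partition - 1) % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }
}
```

With consumers A, B, and C and five partitions this yields A=[1, 4], B=[2, 5], C=[3]. If B goes offline, calling `assign` again with A and C yields A=[1, 3, 5], C=[2, 4], so every partition still has exactly one reader in the group.
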
  29. Processing Same Message for Different Views. Pub Sub: multiple consumers, multiple destinations. (Diagram labels: Producers, Kafka API, Consumers, MapR-FS)
  30. Partition Fault Tolerance
  31. Message Recovery: What if you need to recover messages in case of server failure?
  32. Partitions are Replicated for Fault Tolerance. (Diagram: producers write to Partition 1 of the Warning topic on Server 1, Partition 2 on Server 2, and Partition 3 on Server 3; each partition is also listed with the other two servers)
  33. Partitions are Replicated for Fault Tolerance. (Diagram: Servers 1, 2, and 3 each hold one Warning partition plus replicas of the other two; labels: Producers, Security Investigation & Event Management, Operational Intelligence, Real-time Analytics)
  34. Partitions are Replicated for Fault Tolerance. (Diagram: same labels as slide 33)
  35. Partitions are Replicated for Fault Tolerance. (Diagram: same labels as slide 33)
  36. Streams and High Availability
  37. Core Components: Streams. Stream: collection of topics managed together. Manage stream: replication, security, time-to-live, number of partitions. (Diagram: two streams, each with Pressure, Temperature, and Warning topics, producers, and consumers, linked by replication)
  38. Real-time Access: What if you need real-time access to live data distributed across multiple clusters and multiple data centers?
  39. Lack of Global Replication. (Diagram label: Topic: C)
  40. Streams and Replication. Streams: are a collection of topics; can be replicated worldwide. (Diagram: Topics A, B, and C replicating to another cluster)
  41. Streams and Replication. Streams: high availability; disaster recovery. (Diagram: Topics A, B, and C; fail over)
  42. Replicating Streams: Master-Slave Replication. High availability backup for Venezuela. (Diagram labels: Venezuela cluster and Venezuela_HA cluster, each with a Metrics stream; Producers; Consumers; Master; Slave)
  43. Replicating Streams: Many-to-One Replication. Analyze all data from Houston. (Diagram labels: Houston, Venezuela, and Mexico, each with a Metrics stream, producers, and consumers; Many; One)
  44. Replicating Streams: Multi-Master Replication. Both send and receive updates. (Diagram labels: Seoul and San Francisco, each with a Metrics stream, producers, and consumers)
  45. Stream Replication. (Diagram: three streams, each with Pressure, Temperature, and Warning topics, connected over a WAN)
  46. Ship picks up containers… Singapore
  47. Arrives at destination… Tokyo
  48. While en route to next destination… Washington
  49. Where does the data live… Singapore, Washington, Tokyo
  50. What is important about this? Data is generated on the ship: there must be an easy (i.e. foolproof) way to move the data off the ship. Each port stores the data from the ship: moving data between locations; analytics could happen at any location. This is a multi-data center time series data use case: events from sensors = metrics; same concepts as data center monitoring.
  51. Patterns
  52. Event Sourcing. Imagine each event as a change to an entry in a database. Change log (credit, debit events): 1: WillO : Deposit : 100.00; 2: BradA : Deposit : 50.00; 3: BradA : Withdraw : 30.00; 4: WillO : Withdraw : 20.00. Updates produce the current account balances (Account Id, Balance): WillO 80.00; BradA 20.00. Source: https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
  53. Replication Change Log. Duality of Streams and Tables: Database: captures data at rest. Stream: captures data change. Master: append writes. Slave: apply writes in order. Source: https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
  54. Which Makes a Better System of Record? Which of these can be used to reconstruct the other? Change log: 1: WillO : Deposit : 100.00; 2: BradA : Deposit : 50.00; 3: BradA : Withdraw : 30.00; 4: WillO : Withdraw : 20.00. Account balances (Account Id, Balance): WillO 80.00; BradA 20.00.
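
One way to see the answer: the balances table can be rebuilt by replaying the change log, but the log cannot be rebuilt from the balances. The sketch below replays the four events from the slide; the class `AccountReplay`, its `Event` type, and the method names are hypothetical and only illustrate the idea.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical replay of the slide's change log into a balances table. */
class AccountReplay {
    /** One immutable entry of the change log. */
    static final class Event {
        final String account;
        final String type; // "Deposit" or "Withdraw"
        final BigDecimal amount;

        Event(String account, String type, String amount) {
            this.account = account;
            this.type = type;
            this.amount = new BigDecimal(amount);
        }
    }

    /** Folds the log, in order, into the current balance per account. */
    static Map<String, BigDecimal> replay(List<Event> log) {
        Map<String, BigDecimal> balances = new LinkedHashMap<>();
        for (Event e : log) {
            BigDecimal delta;
            switch (e.type) {
                case "Deposit":  delta = e.amount; break;
                case "Withdraw": delta = e.amount.negate(); break;
                default: throw new IllegalArgumentException("unknown event type: " + e.type);
            }
            balances.merge(e.account, delta, BigDecimal::add);
        }
        return balances;
    }

    /** The four events shown on the slide. */
    static List<Event> slideLog() {
        return Arrays.asList(
            new Event("WillO", "Deposit", "100.00"),
            new Event("BradA", "Deposit", "50.00"),
            new Event("BradA", "Withdraw", "30.00"),
            new Event("WillO", "Withdraw", "20.00"));
    }
}
```

`AccountReplay.replay(AccountReplay.slideLog())` returns WillO=80.00 and BradA=20.00, the same balances as the table on the slide. Replaying only a prefix of the log would give the balances as of an earlier point in time.
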
  55. Rewind: Reprocessing Events. A consumer reprocesses from the oldest message to create a new view, index, or cache. (Diagram: producers append to a partition with offsets 1 to 6 on the MapR cluster)
  56. Rewind: Reprocessing Events. The consumer reprocesses up to the newest message to build the new view; applications then read from the new view. (Diagram: producers append to a partition with offsets 1 to 6 on the MapR cluster)
  57. Event Sourcing, Command Query Responsibility Separation: Turning the Database Upside Down. (Diagram labels: Events, Updates; Key-Val, Document, Graph, Wide Column, Time Series, Relational, ???)
  58. What Else Do I Use My Stream For? Lineage: “how did BradA’s balance get so low?” Auditing: “who deposited/withdrew from BradA’s account?” History: to see the status of the accounts last year. Integrity: “can I trust this data hasn’t been tampered with?” Yup, streams are immutable. Log: 0: WillO : Deposit : 100.00; 1: BradA : Deposit : 50.00; 2: BradA : Withdraw : 30.00; 3: WillO : Withdraw : 20.00
  59. What Do I Need For This to Work? Infinitely persisted events. A way to query your persisted stream data. An integrated security model across the stream and databases.
  60. Fraud Detection. Point of Sale -> Data Center: is the transaction fraud? Lots of requests. Need an answer within ~50 to 100 milliseconds. (Diagram: the Point of Sale sends location, time, card# to the Data Center, which answers fraud yes/no)
  61. Traditional Solution. (Diagram: POS 1..n -> Fraud detector -> Last card use) 1. Look up last card use. 2. Compute the card velocity: subtract last location, time from current location, time. 3. Update last card use.
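
Step 2 can be made concrete as distance travelled divided by time elapsed between two uses of the same card. The slide does not say how distance is measured; the sketch below assumes latitude/longitude coordinates and the haversine great-circle formula, and the names `CardVelocity`, `distanceKm`, and `velocityKmPerHour` are hypothetical.

```java
/** Hypothetical card-velocity calculation: distance between two card uses over elapsed time. */
class CardVelocity {
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance between two points in degrees, using the haversine formula. */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp to 1.0 to guard against floating-point overshoot before asin.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(Math.min(1.0, a)));
    }

    /** Speed implied by the last and current card use; times are epoch seconds. */
    static double velocityKmPerHour(double lastLat, double lastLon, long lastEpochSeconds,
                                    double lat, double lon, long epochSeconds) {
        double hours = (epochSeconds - lastEpochSeconds) / 3600.0;
        if (hours <= 0) {
            throw new IllegalArgumentException("current use must be later than last use");
        }
        return distanceKm(lastLat, lastLon, lat, lon) / hours;
    }
}
```

A fraud rule would then compare the result with a threshold: a card used in two cities an hour apart at an implied speed no vehicle can reach is suspicious. The threshold itself is a modeling choice the slides do not specify.
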
  62. What Happens Next? (Diagram: several fraud detectors, each serving POS 1..n, share one “Last card use” store) 1. Look up last card use. 2. Compute the card velocity. 3. Update last card use. Bottleneck!
  63. Service Isolation: Separate Read from Write. (Diagram labels: POS 1..n, Fraud detector, Last card use, Updater, card activity; Read; Read last card use)
  64. Separate Read Model from the Write Model: Command Query Responsibility Separation. (Diagram labels: POS 1..n, Fraud detector, Last card use, Updater, card activity; Read last card use; Event; Write last card use)
  65. Event Sourcing: New Uses of Data. Processing Same Message for Multiple Views. (Diagram labels: POS 1..n, Fraud detector, Last card use, Updater, Card location history, Other, card activity)
  66. Scaling Through Isolation Allows Multiple Consumers. Multiple fraud detectors can use the same message queue. De-coupling and isolation are key. Propagate events, not table updates. (Diagram labels: POS 1..n, Fraud detector, Last card use, Updater, card activity)
  67. Decoupled Architecture. More than one component can make use of the same stream of messages for a variety of uses. (Diagram labels: Producers, Activity Handler, Historical, Interesting Data, Real-time Analysis, Results Dashboard, Anomaly Detection)
  68. Lessons. De-coupling and isolation are key. Propagate events, not table updates.
  69. Building Enterprise Software vs Internet Companies. Enterprise Software: complexity of domain => business logic, business rules (banking, healthcare, telecom); compliance => security. Internet Companies: volume of data => complex data infrastructure; large scale; availability, recovery. Reference: Martin Kleppmann
  70. Building Enterprise Software vs Internet Companies. Enterprise Software: Event Sourcing. Internet Companies: Stream Processing. Reference: Martin Kleppmann
  71. Real World Solution
  72. Credit Card Fraud Model Building
  73. Fraud Stream Processing Architecture. (Diagram labels: Source, Data Ingest, Stream Processing, NoSQL Storage, Serve; Topics A, B, and C; MapR-FS; MapR-DB)
  74. Fraud Processing. (Diagram labels: Streams Messaging with raw, enriched, and alerts topics; Stream Processing: derive features, process, Model; Batch Processing: build model, update model; MapR-FS; MapR-DB)
  75. Fraud Event Processing. 1. Parse raw event. 2. Read card holder profile from MapR-DB. 3. Derive features. 4. Get prediction from model with features. 5. Publish not fraud to enriched topic. 6. Publish fraud to fraud topic. (Diagram labels: Streams Messaging with Raw, Enriched, and Fraud topics; Stream Processing; NoSQL Storage; MapR-FS; MapR-DB)
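
The six steps above can be sketched as a single routing function. Everything concrete in this sketch is an assumption made for illustration: the raw event format (`cardId,amount`), the profile (an average spend per card held in a `Map` rather than MapR-DB), the two derived features, the `Predicate` standing in for the model, and the topic names. The actual pipeline in the talk is not shown in this level of detail.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical router: decides which topic a raw transaction is published to. */
class FraudRouter {
    static final String ENRICHED_TOPIC = "enriched"; // placeholder topic names
    static final String FRAUD_TOPIC = "fraud";

    /** Steps 1 to 3: parse "cardId,amount", look up the profile, derive features. */
    static Map<String, Double> deriveFeatures(String rawEvent, Map<String, Double> averageSpendByCard) {
        String[] fields = rawEvent.split(",");
        String cardId = fields[0].trim();
        double amount = Double.parseDouble(fields[1].trim());
        // An unknown card falls back to its own amount, giving a neutral ratio of 1.0.
        double average = averageSpendByCard.getOrDefault(cardId, amount);
        Map<String, Double> features = new HashMap<>();
        features.put("amount", amount);
        features.put("ratioToAverage", average == 0 ? 0.0 : amount / average);
        return features;
    }

    /** Steps 4 to 6: ask the model for a prediction and pick the destination topic. */
    static String route(String rawEvent, Map<String, Double> averageSpendByCard,
                        Predicate<Map<String, Double>> model) {
        Map<String, Double> features = deriveFeatures(rawEvent, averageSpendByCard);
        return model.test(features) ? FRAUD_TOPIC : ENRICHED_TOPIC;
    }
}
```

In a real deployment the returned topic name would be passed to a producer's `send` call, and the `Predicate` would wrap a trained model rather than a hand-written rule.
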
  76. Fraud Processing: Same Message for Different Views. (Diagram: Partitions 1 to 3 of the Raw Trans, Enriched, and Fraud Alert topics, each read by consumers; labels: MapR-FS, MapR-DB)
  77. Real World Solution
  78. Transforming the Health Care Ecosystem: Electronic Medical Records. “The Stream is the System of Record” (Brad Anderson, VP Big Data Informatics). (Diagram labels: JSON DB (MapR-DB), Graph DB (Titan on MapR-DB), Search Engine (Elastic-Search))
  79. Liaison ALLOY™ Platform. Data Integration: ingest, transform, syndicate. Data Management: master, deduplicate, harmonize, relate, merge, tokenize, store / persist, analyze, summarize, report, distill, recommend, explore, query, sandbox, batch transform, learn, traverse
  80. Use Case: Streaming System of Record for Healthcare. Objective: build a flexible, secure healthcare exchange. Challenges: many different data models; security and privacy issues; HIPAA compliance. (Diagram labels: Records, Analysis, Applications)
  81. ALLOY Health: Exchange State HIE. Clinical Data Viewer. Analytics queries like: What are the outcomes in the entire state on diabetes? Are there doctors that are doing this better than others? (Diagram labels: Clinical Data, Financial Data, Provider Organizations)
  82. 2000+ Practices, 200+ Labs, 30,000+ Clinicians. OrdersAnywhere: PORTAL (no EHR); EHR with HL7 ONLY; EHR with WORKFLOW INTEGRATION; RADIOLOGY; LAB
  83. This is a PAIN! COMPLIANCE: SECURITY CONTROLS, COMPLIANCE FEATURES, PRIVACY. PCI DSS 3.0; 21 CFR Part 11; SSAE16 / SOC2; HIPAA/HITECH
  84. WHY NOW? http://bit.ly/29aBatK
  85. WHY NOW? 2014 FQ4 profit: $ -440 M. Total cost estimate: $ -12 B
  86. Why Now? The relational database is not the only tool. Context and relationships. Patient: patient_id 1234, Name Jon Smith, Age 50. Patient: patient_id 999, Name Jonathan Smith, DOB Jun 1965. Provider: provider_id 86, Name Dr. Nora Paige, Specialty Diabetes. Prescription: rx_id 9876, Name Sitagliptin, Dosage 325mg. Relationships: Visited, Prescribed, WasPrescribed
  87. WHY NOW? Mind the Gap
  88. Streaming System of Record for Healthcare. (Diagram labels: Records, API, Stream Topic with offsets 1 to 6 (Streaming System of Record), Micro Services, Materialized Views: Search, Graph DB, JSON, HBase; Applications)
  89. The Promised Land. (Diagram labels: App, API, pre-processor, Immutable Log, Raw Data, Document Log (MapR-FS), workflow, micro service, CEP; materialized views: Key/Value (MapR-DB), Search Engine, Graph (ArangoDB), Time Series (OpenTSDB); Compliance Auditor)
  90. The Promised Land. Auditor smiley faces: Data Lineage; Audit Logging; Wire-level encryption; At Rest encryption. Replication: Disaster Recovery; EU: data can’t leave. Non-Stream / Non-”Big Data”: Software Development Lifecycle; System Hardening; Separation of Concerns (Dev vs Ops); Patch Management. (Diagram label: Compliance Auditor)
  91. Solution. Design/architecture solved some: Streams; Data Lineage/System of Record; Kappa Architecture (Kreps/Kleppmann). MapR solved others: Unified Security; Replication DC to DC; Converge Kafka/HBase/Hadoop to one cluster; Multi-tenancy (lots of topics, for lots of tenants)
  92. API
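  93. Sample Producer: All Together
      import java.util.Properties;
      import org.apache.kafka.clients.producer.KafkaProducer;
      import org.apache.kafka.clients.producer.ProducerRecord;

      public class SampleProducer {
          // static so that main() can use it; MapR Streams topics are named <stream path>:<topic>
          public static String topic = "/streams/pump:warning";
          public static KafkaProducer<String, String> producer;

          public static void main(String[] args) {
              producer = setUpProducer();
              for (int i = 0; i < 3; i++) {
                  String txt = "msg " + i;
                  ProducerRecord<String, String> rec = new ProducerRecord<String, String>(topic, txt);
                  producer.send(rec);
                  System.out.println("Sent msg number " + i);
              }
              producer.close(); // flushes buffered records before exiting
          }

          // Not shown on the slide: an assumed minimal setup. Plain Apache Kafka also needs bootstrap.servers.
          private static KafkaProducer<String, String> setUpProducer() {
              Properties props = new Properties();
              props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
              props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
              return new KafkaProducer<String, String>(props);
          }
      }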
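  94. Sample Consumer: All Together
      import java.util.Arrays;
      import java.util.Properties;
      import org.apache.kafka.clients.consumer.ConsumerRecord;
      import org.apache.kafka.clients.consumer.ConsumerRecords;
      import org.apache.kafka.clients.consumer.KafkaConsumer;

      public class MyConsumer {
          // Assumed to be the topic the sample producer writes to (the slide reads "/stream/pump:warning")
          public static String topic = "/streams/pump:warning";
          public static KafkaConsumer<String, String> consumer;
          static final long pollTimeOut = 1000; // milliseconds; value assumed, not given on the slide

          public static void main(String[] args) {
              configureConsumer(args);
              consumer.subscribe(Arrays.asList(topic)); // the 0.9 API takes a list of topics
              try {
                  while (true) {
                      ConsumerRecords<String, String> msg = consumer.poll(pollTimeOut);
                      for (ConsumerRecord<String, String> record : msg) {
                          System.out.println("read " + record.toString());
                      }
                  }
              } finally {
                  consumer.close(); // reached only if the loop exits with an exception
              }
          }

          // Not shown on the slide: an assumed minimal setup with a placeholder group.id.
          static void configureConsumer(String[] args) {
              Properties props = new Properties();
              props.put("group.id", "sample-group");
              props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
              props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
              consumer = new KafkaConsumer<String, String>(props);
          }
      }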
  95. Summary
  96. Can we get “Extreme”? Extreme: 1+ trillion events per day; millions of producers (billions of events per second); multiple consumers (potentially for every event); multiple data centers (plan for success; plan for drastic failure). Average reality: Think that is crazy? Consider having 100 servers and performing monitoring and application logs: 100 metrics per server; 60 samples per minute; 50 metrics per request; 1,000 log entries per request (abnormally small, depends on level); 1 million requests per day; ~2 billion events per day, for one small(ish) use case.
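
The "~2 billion events per day" figure follows from the numbers on the slide under one reading of them, assumed here: each of the 100 metrics on each server is sampled 60 times per minute, and the 1 million requests per day is a total across all servers. The class and method names are made up for the example.

```java
/** Worked arithmetic for the "average reality" estimate, under the stated assumptions. */
class EventVolume {
    static final long SERVERS = 100;
    static final long METRICS_PER_SERVER = 100;
    static final long SAMPLES_PER_MINUTE = 60;
    static final long MINUTES_PER_DAY = 24 * 60;
    static final long REQUESTS_PER_DAY = 1_000_000;
    static final long METRICS_PER_REQUEST = 50;
    static final long LOG_ENTRIES_PER_REQUEST = 1_000;

    /** Monitoring samples: servers x metrics x samples per minute x minutes per day. */
    static long monitoringEventsPerDay() {
        return SERVERS * METRICS_PER_SERVER * SAMPLES_PER_MINUTE * MINUTES_PER_DAY;
    }

    /** Per-request metrics plus log entries, over all requests in a day. */
    static long requestEventsPerDay() {
        return REQUESTS_PER_DAY * (METRICS_PER_REQUEST + LOG_ENTRIES_PER_REQUEST);
    }

    static long totalEventsPerDay() {
        return monitoringEventsPerDay() + requestEventsPerDay();
    }
}
```

Monitoring contributes 864,000,000 events and requests contribute 1,050,000,000, for a total of 1,914,000,000 events per day, which rounds to the slide's ~2 billion.
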
  97. Building a Complete Data Architecture. MapR Converged Data Platform: MapR File System (MapR-FS), MapR Database (MapR-DB), MapR Streams. (Diagram labels: Sources/Apps, Stream Processing, Bulk Processing)
  100. Find my slides & other related materials to this talk here: bit.ly/jjug-aug2016 or search:
  101. MapR Blog: https://www.mapr.com/blog/
  102. community.mapr.com: …helping you put data technology to work. Find answers; ask technical questions; join on-demand training course discussions; follow release announcements; share and vote on product ideas; find Meetup and event listings. Connect with fellow Apache Hadoop and Spark professionals.
