
Live Tutorial – Streaming Real-Time Events Using Apache APIs

This talk explores the power of streaming real-time events in the context of the IoT and smart cities.

http://info.mapr.com/WB_Streaming-Real-Time-Events_Global_DG_17.08.02_RegistrationPage.html

Published in: Data & Analytics

Live Tutorial – Streaming Real-Time Events Using Apache APIs

  1. 1. © 2017 MapR Technologies Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Time Uber Data Using Apache APIs: Kafka, Spark, HBase Carol McDonald @caroljmcdonald
  2. 2. © 2017 MapR Technologies Agenda Using An End to End Distributed Pipeline for Real-Time Uber Data Using Apache APIs: Kafka, Spark, HBase, we will discuss: •  Why IOT? •  Why combine Machine Learning with IOT? •  What is Machine Learning? How do you do it? •  Why Spark with Machine Learning? •  What is Streaming? •  Why Kafka (-ish –esque) Distributed Immutable Log? •  Why Spark Streaming? •  Why Kafka + WebSockets? •  Why NoSQL HBase? Note: this code example is mine; only the data is from Uber
  3. 3. © 2017 MapR Technologies Why IOT? Lots of Things are Producing Streaming Data Data Collection Devices Smart Machinery Phones and Tablets Home Automation RFID Systems Digital Signage Security Systems Medical Devices
  4. 4. © 2017 MapR Technologies What’s a Stream? A stream is an unbounded sequence of events carried from a set of producers to a set of consumers. Diagram labels: Producers, Events_Stream, Consumers, Events
  5. 5. © 2017 MapR Technologies Why Stream Processing? 6:05 P.M.: 90° Topic Stream Temperature Turn on the air conditioning! It’s becoming important to process events as they arrive
  6. 6. © 2017 MapR Technologies Why combine IOT with Machine Learning? •  Audi and Daimler harness the power of deep learning in order to achieve their goal of building autonomous vehicles –  Using MapR platform to scale deep learning efforts https://mapr.com/company/press-releases/norcom-selects-mapr-deep-learning/ •  Audi's new A8 takes us further down the road to self-driving cars than ever before –  https://www.cnet.com/roadshow/news/audis-new-a8-is-designed-to-let-you-play-candy-crush-in-rush-hour-traffic-safely/
  7. 7. © 2017 MapR Technologies Why combine IOT with Machine Learning? •  Cheaper sensors and machine learning are making it possible for doctors to rapidly apply smart medicine to their patients’ cases –  https://www.wsj.com/articles/the-smart-medicine-solution-to-the-health-care-crisis-1499443449
  8. 8. © 2017 MapR Technologies Why combine IOT with Machine Learning? •  A Stanford team has shown that a machine-learning model can identify heart arrhythmias from an electrocardiogram (ECG) better than an expert –  https://www.technologyreview.com/s/608234/the-machines-are-getting-ready-to-play-doctor/
  9. 9. © 2017 MapR Technologies Why combine IOT with Machine Learning? •  Connected care ensuring quicker Sepsis treatment: –  Blood pressures, pulse rates and oxygen levels from monitoring devices combined with algorithms to automatically calculate a score, and provide alerts –  http://www.computerweekly.com/news/450422258/Putting-sepsis-algorithms-into-electronic-patient-records
  10. 10. © 2017 MapR Technologies Applying Machine Learning to Live Patient Data •  https://www.slideshare.net/caroljmcdonald/applying-machine-learning-to-live-patient-data
  11. 11. © 2017 MapR Technologies Why combine IOT with Machine Learning? •  Smart Cities will be using 1.39 billion connected cars, IoT sensors, and devices by 2020 •  http://www.cisco.com/c/en/us/solutions/industries/smart-connected-communities.html
  12. 12. © 2017 MapR Technologies Why combine IOT with Machine Learning? •  Uber Near Realtime Price Surging –  https://www.slideshare.net/ConfluentInc/kafka-uber-the-worlds-realtime-transit-infrastructure-aaron-schildkrout •  machine learning & geolocation data is being used in: –  telecom, travel, marketing, and manufacturing –  identify patterns and trends: –  recommendations, anomaly detection, and fraud. NEAR REALTIME PRICE SURGING
  13. 13. © 2017 MapR Technologies Why combine Streaming Events with Machine Learning? Fraud detection Smart Machinery Utility Smart Meters Home Automation Networks Manufacturing Security Systems Patient Monitoring
  14. 14. © 2017 MapR Technologies What if BP had detected problems before the oil hit the water ? •  1M samples/sec •  High performance at scale is necessary!
  15. 15. © 2017 MapR Technologies End to End Application Architecture
  16. 16. © 2017 MapR Technologies Part 1: Spark Machine Learning •  End to End Application for Monitoring Uber Data using Spark ML •  https://mapr.com/blog/monitoring-real-time-uber-data-using-spark-machine-learning-streaming-and-kafka-api-part-1/
  17. 17. © 2017 MapR Technologies What is Machine Learning? Data Build Model Train Algorithm Finds patterns New Data Use Model (prediction function) Predictions Contains patterns Recognizes patterns
  18. 18. © 2017 MapR Technologies ML Discovery Model Building Model Training/ Building Training Set Test Model Predictions Test Set Evaluate Results Historical Data Deployed Model Predictions Data Discovery, Model Creation Production Feature Extraction Feature Extraction ●  Churn Modelling Uber trips Stream Topic Uber trips New Data
  19. 19. © 2017 MapR Technologies Examples of ML Algorithms Supervised •  Classification –  Naïve Bayes –  SVM –  Random Decision Forests •  Regression –  Linear –  Logistic Machine Learning Unsupervised •  Clustering –  K-means •  Dimensionality reduction –  Principal Component Analysis –  SVD
  20. 20. © 2017 MapR Technologies Supervised Algorithms use labeled data Data features Build Model New Data features Predict Use Model
  21. 21. © 2017 MapR Technologies Supervised Machine Learning: Classification & Regression Classification Identifies category for item
  22. 22. © 2017 MapR Technologies Classification: Definition Form of ML that: •  Identifies which category an item belongs to •  Uses supervised learning algorithms –  Data is labeled Sentiment
  23. 23. © 2017 MapR Technologies If it Walks/Swims/Quacks Like a Duck …… Then It Must Be a Duck swims walks quacks Features: walks quacks swims Features:
  24. 24. © 2017 MapR Technologies Car Insurance Fraud Example •  What are we trying to predict? –  This is the Label or Target outcome: –  The amount of Fraud •  What are the “if questions” or properties we can use to predict? –  These are the Features: –  The claim Amount
  25. 25. © 2017 MapR Technologies Label: Amount of Fraud Y X Feature: claimed amount Data point: fraud amount, claimed amount AmntFraud = intercept + coeff * claimedAmnt Car Insurance Fraud Regression Example
  26. 26. © 2017 MapR Technologies Credit Card Fraud Example •  What are we trying to predict? –  This is the Label: –  The probability of Fraud •  What are the “if questions” or properties we can use to predict? –  These are the Features: –  transaction amount, type of merchant, distance from and time since last transaction
  27. 27. © 2017 MapR Technologies Label Probability of Fraud 1 X Features: trans amount, type of store, Time Location difference last trans. Fraud 0 Not Fraud .5 Credit Card Fraud Logistic Regression Example
  28. 28. © 2017 MapR Technologies Supervised Learning: Classification & Regression •  Classification: –  identifies which category (eg fraud or not fraud) •  Linear Regression: –  predicts a value (eg amount of fraud) •  Logistic Regression: –  predicts a probability (eg probability of fraud)
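
The three prediction styles on the slide above can be illustrated in plain Scala without Spark. This is only a minimal sketch: the `intercept` and `coeff` values are hypothetical numbers chosen for illustration (a real model would learn them from labeled data), and the function names are not from the talk.

```scala
object FraudRegressionSketch {
  // Hypothetical coefficients for illustration only; a trained model would learn these.
  val intercept = 100.0
  val coeff = 0.25

  // Linear regression: predicts a value (amount of fraud) from the claimed amount.
  def predictFraudAmount(claimedAmount: Double): Double =
    intercept + coeff * claimedAmount

  // Logistic function: maps any linear score to a value between 0 and 1.
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Logistic regression: predicts a probability from weighted features.
  def fraudProbability(features: Array[Double], weights: Array[Double], bias: Double): Double =
    sigmoid(bias + features.zip(weights).map { case (x, w) => x * w }.sum)

  // Classification: turns the probability into a category using a 0.5 threshold.
  def isFraud(probability: Double): Boolean = probability >= 0.5
}
```

For example, `FraudRegressionSketch.predictFraudAmount(1000.0)` evaluates `100.0 + 0.25 * 1000.0`, and `isFraud` applies the 0.5 cut-off shown on the logistic regression slide.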
  29. 29. © 2017 MapR Technologies Examples of ML Algorithms Machine Learning Unsupervised •  Clustering –  K-means •  Dimensionality reduction –  Principal Component Analysis –  SVD Supervised •  Classification –  Naïve Bayes –  SVM –  Random Decision Forests •  Regression –  Linear –  Logistic
  30. 30. © 2017 MapR Technologies Unsupervised Algorithms use Unlabeled data Customer Groups Build Model Train Algorithm Finds patterns New Customer Purchase Data Use Model (prediction function) Predict Group Contains patterns Recognizes patterns Customer purchase data
  31. 31. © 2017 MapR Technologies Unsupervised Machine Learning: Clustering Clustering group news articles into different categories
  32. 32. © 2017 MapR Technologies Clustering: Definition •  Unsupervised learning task •  Groups objects into clusters of high similarity
  33. 33. © 2017 MapR Technologies Clustering: Definition •  Unsupervised learning task •  Groups objects into clusters of high similarity –  Search results grouping –  Grouping of customers, patients –  Text categorization –  recommendations •  Anomaly detection: find what’s not similar
  34. 34. © 2017 MapR Technologies Clustering: Example •  Group similar objects
  35. 35. © 2017 MapR Technologies Clustering: Example •  Group similar objects •  Use MLlib K-means algorithm 1.  Initialize coordinates to center of clusters (centroid) x x x x x
  36. 36. © 2017 MapR Technologies Clustering: Example •  Group similar objects •  Use MLlib K-means algorithm 1.  Initialize coordinates to center of clusters (centroid) 2.  Assign all points to nearest centroid x x x x x
  37. 37. © 2017 MapR Technologies Clustering: Example •  Group similar objects •  Use MLlib K-means algorithm 1.  Initialize coordinates to center of clusters (centroid) 2.  Assign all points to nearest centroid 3.  Update centroids to center of points x x x x x
  38. 38. © 2017 MapR Technologies Clustering: Example •  Group similar objects •  Use MLlib K-means algorithm 1.  Initialize coordinates to center of clusters (centroid) 2.  Assign all points to nearest centroid 3.  Update centroids to center of points 4.  Repeat until conditions met x x x x x
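
The four steps above can be sketched in plain Scala to show what the algorithm does, independent of MLlib. This is an illustrative sketch, not the MLlib implementation: the initial centroids are passed in by the caller, and the stopping condition assumed here is "centroids stopped moving or `maxIter` reached".

```scala
object KMeansSketch {
  type Point = (Double, Double)

  def squaredDistance(a: Point, b: Point): Double = {
    val dx = a._1 - b._1
    val dy = a._2 - b._2
    dx * dx + dy * dy
  }

  // Step 2: index of the centroid nearest to a point.
  def nearest(p: Point, centroids: Vector[Point]): Int =
    centroids.indices.minBy(i => squaredDistance(p, centroids(i)))

  // Step 3 helper: the center (mean) of a non-empty group of points.
  def mean(points: Seq[Point]): Point =
    (points.map(_._1).sum / points.size, points.map(_._2).sum / points.size)

  // Step 1 is the caller's choice of `initial`; step 4 is the loop condition.
  def fit(points: Seq[Point], initial: Vector[Point], maxIter: Int): Vector[Point] = {
    var centroids = initial
    var iter = 0
    var changed = true
    while (iter < maxIter && changed) {
      val groups = points.groupBy(p => nearest(p, centroids))
      // A centroid with no assigned points keeps its previous position.
      val updated = centroids.indices.map { i =>
        groups.get(i).map(mean).getOrElse(centroids(i))
      }.toVector
      changed = updated != centroids
      centroids = updated
      iter += 1
    }
    centroids
  }
}
```

With the points `(0,0)`, `(0,2)`, `(10,10)`, `(10,12)` and initial centroids `(0,0)` and `(10,10)`, the centroids settle after one update at the centers of the two pairs.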
  39. 39. © 2017 MapR Technologies Cluster Uber Trip Locations
  40. 40. © 2017 MapR Technologies Uber Data •  Date/Time: The date and time of the Uber pickup •  Lat: The latitude of the Uber pickup •  Lon: The longitude of the Uber pickup •  Base: The TLC base company affiliated with the Uber pickup The Data Records are in CSV format. An example line is shown below: •  2014-08-01 00:00:00,40.729,-73.9422,B02598
  41. 41. © 2017 MapR Technologies Uber Example •  What are the “if questions” or properties we can use to group? –  These are the Features: –  Latitude, longitude, Day of the week, time, rush hour … NEAR REALTIME PRICE SURGING
  42. 42. © 2017 MapR Technologies Spark ML workflow
  43. 43. © 2017 MapR Technologies Zeppelin Notebook with Spark Data Engineer Data Scientist
  44. 44. © 2017 MapR Technologies Load the data into a Dataframe: Define the Schema case class Uber(dt: String, lat: Double, lon: Double, base: String) val schema = StructType(Array( StructField("dt", TimestampType, true), StructField("lat", DoubleType, true), StructField("lon", DoubleType, true), StructField("base", StringType, true) )) Input Comma Separated Values: datetime, latitude, longitude, base 2014-08-01 00:00:00,40.729,-73.9422,B02598
  45. 45. © 2017 MapR Technologies Data Frame Load data Load the data into a Dataset val train: Dataset[Uber] = spark.read.option("inferSchema", "false") .schema(schema).csv("uber.csv").as[Uber]
  46. 46. © 2017 MapR Technologies Dataset merged with Dataframe •  in Spark 2.0, DataFrame APIs merged with Datasets APIs •  A Dataset is a collection of typed objects •  A DataFrame is a Dataset of generic Row objects
  47. 47. © 2017 MapR Technologies Spark Distributed Datasets Dataset W Executor P4 W Executor P1 P3 W Executor P2 partitioned Partition 1 8213034705, 95, 2.927373, jake7870, 0…… Partition 2 8213034705, 115, 2.943484, Davidbresler2, 1…. Partition 3 8213034705, 100, 2.951285, gladimacowgirl, 58… Partition 4 8213034705, 117, 2.998947, daysrus, 95…. •  Read only collection of typed objects •  Partitioned across a cluster •  Operated on in parallel •  Cached in memory
  48. 48. © 2017 MapR Technologies Spark Distributed Datasets Spark revolves around RDDs •  Read only collection of elements •  Partitioned across a cluster •  Operated on in parallel •  Cached in memory
  49. 49. © 2017 MapR Technologies Extract the Features Image reference O’Reilly Learning Spark + + ̶+ ̶ ̶ Feature Vectors Model Featurization Training Model Evaluation Best Model Training Data + + ̶+ ̶ ̶ + + ̶+ ̶ ̶ + + ̶+ ̶ ̶ + + ̶+ ̶ ̶ Feature Vectors are vectors of numbers representing the value for each feature
  50. 50. © 2017 MapR Technologies Data Frame Load data Add column DataFrame + Features Use VectorAssembler to put features in vector column val featureCols = Array("lat", "lon") val assembler = new VectorAssembler() .setInputCols(featureCols) .setOutputCol("features")
  51. 51. © 2017 MapR Technologies Data Frame Load data transform Estimator val kmeans = new KMeans() .setK(8) .setFeaturesCol("features") .setMaxIter(5) Create Kmeans Estimator, Set Features DataFrame + Features
  52. 52. © 2017 MapR Technologies Data Frame Load data transform Estimator val Array(trainingData, testData) = df2.randomSplit(Array(0.7, 0.3), 5043) val model = kmeans.fit(trainingData) Create Kmeans Estimator, Set Features DataFrame + Features fit fitted model input
  53. 53. © 2017 MapR Technologies Data Frame Load data transform Estimator model.clusterCenters.foreach(println) [40.76930621976264,-73.96034885367698] [40.67562793272868,-73.79810579052476] [40.68848772848041,-73.9634449047477] [40.78957777777776,-73.14270740740741] [40.32418330308531,-74.18665245009073] [40.732808848486286,-74.00150153727878] [40.75396549974632,-73.57692359208531] [40.901700842900674,-73.868760398198] Create Kmeans Estimator, Set Features DataFrame + Features fit fitted model input
  54. 54. © 2017 MapR Technologies fitted model Evaluate Clusters from K-Means Estimator transform features val clusters = model.transform(testData) prediction DataFrame + Features DataFrame + Features + prediction
  55. 55. © 2017 MapR Technologies Kafka API and Streaming Data
  56. 56. © 2017 MapR Technologies Part 2: MapR Event Streams with Kafka API and Spark Streaming •  End to End Application for Monitoring Uber Data using Spark ML •  https://mapr.com/blog/monitoring-real-time-uber-data-using-spark-machine-learning-streaming-and-kafka-api-part-2/
  57. 57. © 2017 MapR Technologies What Do We Need to Do? Diagram stages: Data Sources, Collect Data, Process Data, Store Data, Serve Data ? ? ? ?
  58. 58. © 2017 MapR Technologies Collect the Data Data Ingest MapR-FS Source Stream Topic •  Data Ingest: –  Network Based: MapR Streams, Kafka, Kinesis, Twitter, Sockets... –  File Based: NFS with MapR-FS, HDFS
  59. 59. © 2017 MapR Technologies Organize Data into Topics with MapR Streams Topics Organize Events into Categories and Decouple Producers from Consumers Consumers MapR Cluster Topic: Pressure Topic: Temperature Topic: Warnings Consumers Consumers Kafka API Kafka API
  60. 60. © 2017 MapR Technologies Scalable Messaging with MapR Streams Server 1 Partition1: Topic - Pressure Partition1: Topic - Temperature Partition1: Topic - Warning Server 2 Partition2: Topic - Pressure Partition2: Topic - Temperature Partition2: Topic - Warning Server 3 Partition3: Topic - Pressure Partition3: Topic - Temperature Partition3: Topic - Warning Topics are partitioned for throughput and scalability
  61. 61. © 2017 MapR Technologies Scalable Messaging with MapR Streams Partition1: Topic - Pressure Partition1: Topic - Temperature Partition1: Topic - Warning Partition2: Topic - Pressure Partition2: Topic - Temperature Partition2: Topic - Warning Partition3: Topic - Pressure Partition3: Topic - Temperature Partition3: Topic - Warning Producers are load balanced between partitions Kafka API
  62. 62. © 2017 MapR Technologies Scalable Messaging with MapR Streams Partition1: Topic - Pressure Partition1: Topic - Temperature Partition1: Topic - Warning Partition2: Topic - Pressure Partition2: Topic - Temperature Partition2: Topic - Warning Partition3: Topic - Pressure Partition3: Topic - Temperature Partition3: Topic - Warning Consumers Consumers Consumers Consumer groups can read in parallel Kafka API
  63. 63. © 2017 MapR Technologies Partition is like a Queue Consumers MapR Cluster Topic: Admission / Server 1 Topic: Admission / Server 2 Topic: Admission / Server 3 Consumers Consumers Partition 1 New Messages are appended to the end Partition 2 Partition 3 6 5 4 3 2 1 3 2 1 5 4 3 2 1 Producers Producers Producers New Message 6 5 4 3 2 1 Old Message
  64. 64. © 2017 MapR Technologies Events are delivered in the order they are received, like a queue messages are delivered in the order they are received MapR Cluster 6 5 4 3 2 1 Consumer group Producers Read cursors Consumer group
  65. 65. © 2017 MapR Technologies Unlike a queue, events are persisted even after they’re delivered Messages remain on the partition, available to other consumers Minimizes Non-Sequential disk read-writes MapR Cluster (1 Server) Topic: Warning Partition 1 3 2 1 Unread Events Get Unread 3 2 1 Client Library Consumer Poll
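
The queue-like but persistent behavior of a partition can be modeled in a few lines of plain Scala. This is a toy in-memory sketch, not the Kafka or MapR Streams API: the `Partition` class, its `append` and `poll` methods, and the per-group cursor map are all hypothetical names used only to illustrate append-only storage with one read cursor per consumer group.

```scala
import scala.collection.mutable

object PartitionLogSketch {
  final class Partition {
    private val events = mutable.ArrayBuffer.empty[String]
    // One read cursor (next offset to read) per consumer group.
    private val cursors = mutable.Map.empty[String, Int]

    // New messages are appended to the end; returns the offset of the new message.
    def append(event: String): Int = {
      events += event
      events.size - 1
    }

    // Returns this group's unread events in arrival order and advances its cursor.
    // Events are never removed, so other groups can still read them.
    def poll(group: String): Vector[String] = {
      val from = cursors.getOrElse(group, 0)
      val unread = events.slice(from, events.size).toVector
      cursors(group) = events.size
      unread
    }
  }
}
```

Because each group only moves its own cursor, a second consumer group polling later still receives every event from the start, which is the "processing the same message for different purposes" pattern.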
  66. 66. © 2017 MapR Technologies How do we do this with High Performance at Scale? Parallel operations and minimize disk read/write time
  67. 67. © 2017 MapR Technologies Processing Same Message for Different Purposes Consumers Consumers Consumers Producers Producers Producers MapR-FS Kafka API Kafka API
  68. 68. © 2017 MapR Technologies Use the Model with Streaming Data
  69. 69. © 2017 MapR Technologies Collect Data Process the Data with Spark Streaming and Spark Machine Learning Process Data Stream Topic •  Extension of the core Spark API •  Enables scalable, high-throughput, fault-tolerant stream processing of live data
  70. 70. © 2017 MapR Technologies ML Discovery Model Building Model Training/ Building Training Set Test Model Predictions Test Set Evaluate Results Historical Data Deployed Model Predictions Data Discovery, Model Creation Production Feature Extraction Feature Extraction ●  Churn Modelling Uber trips Stream Topic Uber trips New Data
  71. 71. © 2017 MapR Technologies Use Case: Real-Time Analysis of Geographically Clustered Vehicles Uber trip data enrich with K-means Cluster location Stream Topic Stream Topic Spark Streaming Spark Streaming Write to MapR-DB SQL
  72. 72. © 2017 MapR Technologies Use Case: Time Series Data Uber trip data Stream Topic 2014-08-01 00:00:00, 40.729,-73.9422,B02598 {"dt":"2014-08-01 00:00:00.0", "lat":40.3495,"lon":-74.0667, "base":"B02682","cluster":5} Enrich with K-means cluster id Spark Streaming read Stream Topic
  73. 73. © 2017 MapR Technologies Processing Spark DStreams Data stream divided into batches of X milliseconds = DStreams
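
The idea of cutting a stream into fixed-length batches can be sketched without Spark. The function below is a hypothetical illustration, not the Spark Streaming API: it assumes each event carries a non-negative arrival time in milliseconds and groups events by the batch interval they fall into.

```scala
object MicroBatchSketch {
  // Groups (arrivalMillis, event) pairs into batches of `batchMillis` milliseconds.
  // Returns (batchStartMillis, events) pairs ordered by batch start time.
  def toBatches[A](events: Seq[(Long, A)], batchMillis: Long): Vector[(Long, Vector[A])] =
    events
      .groupBy { case (ts, _) => ts / batchMillis }
      .toVector
      .sortBy(_._1)
      .map { case (batch, evs) =>
        (batch * batchMillis, evs.sortBy(_._1).map(_._2).toVector)
      }
}
```

In Spark Streaming each such batch is represented as one RDD, and the sequence of those RDDs is the DStream.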
  74. 74. © 2017 MapR Technologies Function to Parse the Message Data to Uber Objects 2014-08-01 00:00:00, 40.729,-73.9422,B02598
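
The parsing function itself did not survive in this transcript. A minimal sketch of what such a function could look like is shown here, reusing the `Uber` case class from the schema slide; the name `parseUber` and the choice to return an `Option` for malformed lines are assumptions, not the talk's code.

```scala
import scala.util.Try

object UberParserSketch {
  case class Uber(dt: String, lat: Double, lon: Double, base: String)

  // Parses "datetime,latitude,longitude,base"; returns None for malformed input.
  def parseUber(line: String): Option[Uber] = {
    val p = line.split(",").map(_.trim)
    if (p.length != 4) None
    else Try(Uber(p(0), p(1).toDouble, p(2).toDouble, p(3))).toOption
  }
}
```

Returning `Option` lets a streaming job drop bad records with `flatMap` instead of failing the whole batch on one unparsable line.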
  75. 75. © 2017 MapR Technologies Load the saved model // load model for getting clusters val model = KMeansModel.load(modelpath)
  76. 76. © 2017 MapR Technologies Create a DStream DStream: a sequence of RDDs representing a stream of data val messagesDStream = KafkaUtils.createDirectStream[String, String](ssc, LocationStrategies.PreferConsistent, consumerStrategy) // get message values from the consumer records and parse to Uber objects val uDStream = messagesDStream.map(_.value()).map(_.split(",")) .map(p => Uber(p(0), p(1).toDouble, p(2).toDouble, p(3))) batch time 0 to 1 batch time 1 to 2 batch time 2 to 3 dStream Stored in memory as an RDD
  77. 77. © 2017 MapR Technologies Parse message txt to Uber Object and convert to DataFrame uDStream.foreachRDD{ rdd => val df = rdd.toDF() // get cluster centers and add to df // send to Topic } ssc.start() ssc.awaitTermination()
  78. 78. © 2017 MapR Technologies Enrich Data with Cluster
  79. 79. © 2017 MapR Technologies Convert to JSON send to Topic, Send the Enriched Message
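
The enrichment and JSON steps on the last two slides can be illustrated in plain Scala. This sketch is an assumption about the shape of the logic rather than the talk's code: in the Spark pipeline the fitted model's `transform` adds the cluster as a prediction column, whereas here `nearestCluster` does the equivalent lookup by hand, and `toJson` builds the string manually on the assumption that field values contain no characters needing JSON escaping.

```scala
object EnrichSketch {
  case class Uber(dt: String, lat: Double, lon: Double, base: String)

  // Index of the cluster center closest to the pickup location.
  def nearestCluster(lat: Double, lon: Double, centers: Vector[(Double, Double)]): Int =
    centers.indices.minBy { i =>
      val (cLat, cLon) = centers(i)
      val dLat = lat - cLat
      val dLon = lon - cLon
      dLat * dLat + dLon * dLon
    }

  // Serializes the enriched trip in the same field order as the example message.
  def toJson(u: Uber, cluster: Int): String =
    s"""{"dt":"${u.dt}","lat":${u.lat},"lon":${u.lon},"base":"${u.base}","cluster":$cluster}"""
}
```

The resulting string is what would be sent as the message value to the enriched stream topic.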
  80. 80. © 2017 MapR Technologies Process DStream Streaming Application Output dStream RDDs batch time 2 to 3 batch time 1 to 2 batch time 0 to 1 ValueDStream RDDs Transformed RDDs map map map Stream Topic
  81. 81. © 2017 MapR Technologies Real Time Dashboard
  82. 82. © 2017 MapR Technologies Part 3: Realtime Dashboard using Vert.x •  End to End Application for Monitoring Uber Data using Spark ML •  https://mapr.com/blog/monitoring-uber-with-spark-streaming-kafka-and-vertx/
  83. 83. © 2017 MapR Technologies What Do We Need to Do? Diagram stages: Data Sources, Collect Data (Stream Topic), Process Data, Serve Data, MapR-FS
  84. 84. © 2017 MapR Technologies Use Case: Real-Time Analysis of Geographically Clustered Vehicles Uber trip data enrich with K-means Cluster location Stream Topic Stream Topic Spark Streaming Spark Streaming Write to MapR-DB SQL
  85. 85. © 2017 MapR Technologies The Vert.x toolkit and Web Application Architecture •  Event-driven •  Event Bus •  Verticles single threaded
  86. 86. © 2017 MapR Technologies Use Case Dashboard
  87. 87. © 2017 MapR Technologies Dashboard Architecture
  88. 88. © 2017 MapR Technologies Create a Vert.x Service create a Router object, which routes HTTP request URLs to handlers
  89. 89. © 2017 MapR Technologies Create a Vert.x Service Route paths that match /eventbus/* to be associated with an event bus bridge SockJSHandler
  90. 90. © 2017 MapR Technologies Create a Vert.x Service create an HttpServer object tell the server to listen on the configured port for incoming requests
  91. 91. © 2017 MapR Technologies Dashboard Architecture
  92. 92. © 2017 MapR Technologies Vert.x Service Kafka consumer
  93. 93. © 2017 MapR Technologies Vert.x Service Kafka consumer Create Kafka Consumer Subscribe to Uber topic
  94. 94. © 2017 MapR Technologies Vert.x Service Kafka consumer Publish received messages to the Vert.x event bus address “dashboard.”
  95. 95. © 2017 MapR Technologies The Dashboard Vert.x HTML5 Javascript Client
  96. 96. © 2017 MapR Technologies Javascript packages
  97. 97. © 2017 MapR Technologies Initializing the Heatmap
  98. 98. © 2017 MapR Technologies Dashboard Architecture
  99. 99. © 2017 MapR Technologies Creating the Vertx EventBus •  create an instance of the vertx.EventBus object •  add an onopen listener, which registers an event bus handler for the address “dashboard.” •  handler will receive all messages published to the “dashboard” address
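
The publish/subscribe pattern behind the event bus can be modeled in a few lines of Scala. This is a toy in-memory sketch, not the Vert.x API: the `EventBus` class and its `register` and `publish` methods are hypothetical names that only illustrate how every handler registered on an address receives every message published to it.

```scala
import scala.collection.mutable

object EventBusSketch {
  final class EventBus {
    private val handlers = mutable.Map.empty[String, Vector[String => Unit]]

    // Registers a handler for an address; several handlers may share one address.
    def register(address: String)(handler: String => Unit): Unit =
      handlers(address) = handlers.getOrElse(address, Vector.empty) :+ handler

    // Delivers the message to every handler registered on that address.
    def publish(address: String, message: String): Unit =
      handlers.getOrElse(address, Vector.empty).foreach(h => h(message))
  }
}
```

In the dashboard, the Kafka consumer service plays the publisher role on the "dashboard" address and the browser client plays the registered handler.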
  100. 100. © 2017 MapR Technologies Add Event Trip location points to Map
  101. 101. © 2017 MapR Technologies Add Event Trip location points to Map Parse JSON message
  102. 102. © 2017 MapR Technologies Add Event Trip location points to Map Add latitude and longitude points to heatmap
  103. 103. © 2017 MapR Technologies Add Event Trip location points to Map If cluster center is new then add marker
  104. 104. © 2017 MapR Technologies Spark and HBase
  105. 105. © 2017 MapR Technologies Part 4: using MapR-DB with HBase API •  https://mapr.com/blog/monitoring-uber-pt4/
  106. 106. © 2017 MapR Technologies What Do We Need to Do? Diagram stages: Data Sources, Collect Data (Stream Topic), Process Data, Store Data, Serve Data, MapR-FS
  107. 107. © 2017 MapR Technologies Use Case: Real-Time Analysis of Geographically Clustered Vehicles Uber trip data enrich with K-means Cluster location Stream Topic Stream Topic Spark Streaming Spark Streaming Write to MapR-DB SQL
  108. 108. © 2017 MapR Technologies MapR-DB (HBase API) is Designed to Scale Key Range xxxx xxxx Key Range xxxx xxxx Key Range xxxx xxxx Fast Reads and Writes by Key! Data is automatically partitioned by Key Range! Key colB colC xxx val val xxx val val Key colB colC xxx val val xxx val val Key colB colC xxx val val xxx val val
  109. 109. © 2017 MapR Technologies Store Lots of Data with NoSQL MapR-DB Storage Model RDBMS: Normalized schema → Joins for queries can cause bottleneck MapR-DB: De-Normalized schema → Data that is read together is stored together Key colB colC xxx val val xxx val val Key colB colC xxx val val xxx val val Key colB colC xxx val val xxx val val
  110. 110. © 2017 MapR Technologies Spark Streaming writing to MapR-DB (HBase API)
  111. 111. © 2017 MapR Technologies Spark HBase and MapR-DB Binary Connector •  HConnection object in every Spark Executor: •  allowing for distributed parallel writes, reads, or scans
  112. 112. © 2017 MapR Technologies Spark HBase streamBulkPut •  HBaseContext streamBulkPut method parameters: •  message value DStream, the TableName to write to, function to convert the DStream values to HBase put records.
  113. 113. © 2017 MapR Technologies Massively Parallel writes to HBase The Spark Streaming bulk put enables massively parallel sending of puts to HBase
  114. 114. © 2017 MapR Technologies HBase Schema To use the Spark HBase Connector, you need to define the Catalog for the schema mapping between HBase and Spark
  115. 115. © 2017 MapR Technologies SparkSQL and DataFrames: Define the Schema define the Catalog for the schema mapping between HBase and Spark
  116. 116. © 2017 MapR Technologies Loading data from MapR-DB into a Spark DataFrame Use Catalog defining schema
  117. 117. © 2017 MapR Technologies Spark Dataframes combine filters and select filters rows for cluster ids (the beginning of the row key) >= 9. The select selects a set of columns: key, lat, and lon.
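
The filter-then-select combination can be illustrated on an in-memory collection. This plain Scala sketch does not use the Spark HBase connector, and the row key layout `<clusterId>_<timestamp>` is an assumption made for illustration; the slide only states that the cluster id is at the beginning of the row key.

```scala
object RowKeyFilterSketch {
  case class TripRow(key: String, lat: Double, lon: Double, base: String)

  // Assumed key layout: "<clusterId>_<timestamp>", for example "9_2014-08-01 00:01:00".
  def clusterId(key: String): Int = key.takeWhile(_ != '_').toInt

  // Filter: keep rows whose cluster id is at least `minCluster`.
  // Select: project each remaining row to (key, lat, lon).
  def select(rows: Seq[TripRow], minCluster: Int): Seq[(String, Double, Double)] =
    rows
      .filter(r => clusterId(r.key) >= minCluster)
      .map(r => (r.key, r.lat, r.lon))
}
```

Putting the cluster id first in the key keeps rows for the same cluster adjacent, so a key-range filter reads them together.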
  118. 118. © 2017 MapR Technologies Stream Processing Building a Complete Data Architecture MapR File System (MapR-XD) MapR Converged Data Platform MapR Database (MapR-DB) MapR Event Streams Sources/Apps Bulk Processing
  119. 119. © 2017 MapR Technologies
  120. 120. © 2017 MapR Technologies To Learn More: •  MapR Free ODT http://learn.mapr.com/
  121. 121. © 2017 MapR Technologies MapR Blog • https://www.mapr.com/blog/
  122. 122. © 2017 MapR Technologies …helping you put data technology to work ●  Find answers ●  Ask technical questions ●  Join on-demand training course discussions ●  Follow release announcements ●  Share and vote on product ideas ●  Find Meetup and event listings Connect with fellow Apache Hadoop and Spark professionals community.mapr.com
  123. 123. © 2017 MapR Technologies Open Source Engines & Tools Commercial Engines & Applications Enterprise-Grade Platform Services Data Processing Web-Scale Storage MapR-XD MapR-DB Search and Others Real Time Unified Security Multi-tenancy Disaster Recovery Global Namespace High Availability MapR Event Streams Cloud and Managed Services Search and Others Unified Management and Monitoring Search and Others Event Streaming Database Custom Apps MapR Converged Data Platform HDFS API POSIX, NFS Kafka API HBase API OJAI API
  124. 124. © 2017 MapR Technologies Q&A ENGAGE WITH US
