Hortonworks Data in Motion Webinar Series - Part 1

VIEW THE ON-DEMAND WEBINAR: http://hortonworks.com/webinar/introduction-hortonworks-dataflow/

Learn about Hortonworks DataFlow (HDF™) and how you can easily augment your existing data systems, Hadoop and otherwise. Learn what DataFlow is all about and how Apache NiFi, MiNiFi, Kafka and Storm work together for streaming analytics.

Published in: Technology

Hortonworks Data in Motion Webinar Series - Part 1

  1. Harnessing Data-in-Motion with Hortonworks DataFlow: Introduction to HDF 2.0
     Haimo Liu, Product Manager; Aldrin Piri, Technical Staff
  2. Agenda
     • HDF 2.0: Flow Management
       – NiFi basics
       – NiFi use cases
       – NiFi demos
     • HDF 2.0: Streaming Analytics
     © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  3. Simplistic View of Enterprise Data Flow
     Diagram labels: Data Flow; Process and Analyze Data; Acquire Data; Store Data
  4. Realistic View of Enterprise Data Flow
     Interacting with different business partners and customers
  5. Apache NiFi: Designed for 8 challenges of global enterprise dataflow
     • Visual Command and Control: for agile and immediate creation, configuration, and control of dataflows
     • Data Lineage (Provenance): ensures trust of your data
     • Data Prioritization: because not all data is of equal importance
     • Data Buffering/Back-Pressure: since not all senders/receivers/connections work perfectly all the time
     • Control Latency vs Throughput: adapt to different situations with different requirements
     • Secure Control Plane/Data Plane: security of data, and data access
     • Scale-out Clustering: scalability
     • Extensibility: ecosystem flexibility and growth
  6. Use Cases for Apache NiFi
     What is Apache NiFi used for?
     • Reliable and secure transfer of data between systems
     • Delivery of data from sources to analytic platforms
     • Enrichment and preparation of data:
       – Conversion between formats
       – Extraction/Parsing
       – Routing decisions
     What is Apache NiFi NOT used for?
     • Distributed Computation
     • Complex Event Processing
     • Joins / Complex Rolling Window Operations
  7. Terminology
     FlowFile
     • Unit of data moving through the system
     • Content + Attributes (key/value pairs)
     Processor
     • Performs the work, can access FlowFiles
     Connection
     • Links between processors
     • Queues that can be dynamically prioritized
  8. Analogy: FlowFiles are like HTTP Data
     HTTP Data
     • Header: HTTP/1.1 200 OK; Date: Sun, 10 Oct 2010 23:26:07 GMT; Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g; Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT; Content-Type: text/html
     • Content: Hello world XXXXXXXXXXXXXXXXXXXXXXXXXXXX
     FlowFile
     • Header: Key: 'entryDate', Value: 'Fri Jun 17 17:15:04 EDT 2016'; Key: 'fileSize', Value: '23609'; Key: 'filename', Value: '15650246997242'; Key: 'path', Value: './'
     • Content: 0101010101110101010101010101 (Binary)
  9. Create, Run, View, Start, Stop, Change, and Fix Dataflows in Real Time
     1. Drag and drop processors to build a flow
     2. Start, stop, and configure components in real time
     3. View errors and corresponding error messages
     4. View statistics and health of the data flow
     5. Create templates of common processors and connections
  10. Apache NiFi Demo: Tail Logs, Route on Content, Buffer in Kafka, Deliver to HDFS
  11. What is Data Provenance and Why is it Important?
      Diagram labels: BEGIN; END; LINEAGE
      IT and Cloud Operators
      • Understand traceability, lineage
      • Enable recovery and replay
      Compliance Regulations
      • Provide an audit trail
      • Remediation capabilities
  12. Provenance Enables Easy Access and Traceability of Changes
  13. Need Fine-Grained Security and Compliance?
      Security
      • Secured authentication
      • Enterprise authorization services – entitlements change often
      • Encrypted content, encrypted communications
      • People and systems with different roles require different access levels
      • Tagged/classified data
  14. Repositories – Pass by Reference
  15. Repositories – Copy on Write
  16. Agenda
      • HDF 2.0 Flow Management
      • HDF 2.0 Platform Evolution
        – Product offering
        – Example use case
  17. Hortonworks DataFlow Manages Data in Motion
      Diagram tiers: Sources; Regional Infrastructure; Core Infrastructure
      • Constrained; High-latency; Localized context
      • Hybrid – cloud / on-premises; Low-latency; Global context
  18. DataFlow Management and Stream Processing
      Diagram tiers: Sources; Regional Infrastructure; Core Infrastructure
      • Constrained; High-latency; Localized context
      • Hybrid – cloud / on-premises; Low-latency; Global context
  19. Edge Intelligence with Apache MiNiFi
      Key Features
      • Guaranteed delivery
      • Data buffering
        – Backpressure
        – Pressure release
      • Prioritized queuing
      • Flow specific QoS
        – Latency vs. throughput
        – Loss tolerance
      • Data provenance
      • Recovery / recording a rolling log of fine-grained history
      • Designed for extension
      Different from Apache NiFi
      • Design and Deploy
      • Warm re-deploys
  20. NiFi vs. MiNiFi Java Agent
      • MiNiFi: NiFi Framework; Components
      • NiFi: NiFi Framework; User Interface; Components
  21. Real-Time Insights Require DataFlow Mgmt and Stream Processing
      Example: Company X provides alerting services when a user's resting heart rate is higher than a threshold.
      Diagram labels:
      • Acquire Data: Company X Cloud Instance 1, Cloud Instance 2, Cloud Instance 3
      • Acquire Data Across Cloud Instances
      • Parse, Filter, Validate, Enrich and Route
      • Core Data Center
      • Analytics/Pattern Match
      • Data Store
      • Alerts
      • Dashboards/Visualization
      Legend: Flow Management; Stream Processing
  22. Data in Motion Needs Dataflow Management and Stream Processing
      • Acquire data from the cloud instances of various wearable devices
      • Move data from customer cloud instances to an on-premises instance
      • Perform intelligent routing and filtering of data. The routing and filtering rules will often be changed at run time.
      • Deliver the data to various downstream systems. New downstream apps will always appear, and the data should be fed to each one when it comes online.
      • Parse the device data into a standardized format that downstream systems can understand
      • Enrich the data with contextual information, including patient/customer info (age, sex, etc.)
      • Recognize the pattern when the resting heart rate exceeds a certain threshold (the insight), and then create an alert/notification.
      • Run an outlier detection model on the incoming streaming heart rate. If the score is above a certain threshold, alert on the heart rate.
      Legend: Flow Management (NiFi, MiNiFi); Stream Processing (Storm, Kafka)
  23. Use Cases for Data in Motion
      When Only DataFlow Management is Required
      Use Cases for Data-in-Motion Using DataFlow Mgmt
      • Data Ingestion
      • Edge Intelligence
      • First Mile Problem
      • Physical Data Movement
      • Simple event processing such as Route, Filter, Enrich, Transform, etc.
      When Both DataFlow Management and Stream Processing
      Use Cases for Data-in-Motion Using DataFlow Mgmt and Stream Processing
      • Flow Management to deliver data for Stream Processing
      • PLUS: Complex pattern matching on unbounded streams of data.
  24. HDF 2.0: Data-in-Motion Platform
      Diagram labels:
      • DATA IN MOTION; DATA AT REST
      • IoT Data Sources
      • AWS; Azure; Google Cloud; Hadoop
      • NiFi; Kafka; Storm; Others…
      • Multiple NiFi and MiNiFi instances
      • Enterprise Services: Ambari; Ranger; Other services
      • Flow management; Flow management + Stream Processing
  25. New Stream Processing Features HDF 2.0
      Categories: Developer Productivity; Enterprise Readiness; Operational Simplicity
      • New Storm Connectors
      • Storm-Kafka Spout using new client APIs
      • Storm Distributed Log Search
      • Storm Dynamic Worker Profiling
      • Kafka Grafana Integration
      • Storm Grafana Integration
      • Improved Nimbus HA
      • Storm Automatic Back Pressure
      • Storm Distributed cache
      • Storm Windowing and State Management
      • Storm Performance improvements
      • Improved Kafka SASL
      • Storm Topology Event inspector
      • Storm Resource Aware Scheduling
      • Storm Dynamic Log Levels
      • Pacemaker Storm Daemon
      • Kafka Rack Awareness
  26. For More Info: https://community.hortonworks.com/
      Hortonworks Community Connection: Data Ingestion and Streaming
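
Slide 7 describes a FlowFile as content plus key/value attributes, and a Connection as a queue between processors that can be prioritized; slide 5 adds back-pressure as a design goal. A minimal Python sketch of those three ideas follows. It is not NiFi's API: the class names, the `priority` attribute, the `max_queued` threshold, and the choice to refuse a FlowFile when the queue is full are all assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FlowFile:
    """Unit of data: opaque content plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Queue between two processors with a priority rule and back pressure."""

    def __init__(self, prioritizer, max_queued=3):
        self._prioritizer = prioritizer  # lower key is dequeued first
        self._max_queued = max_queued    # hypothetical back-pressure threshold
        self._heap = []
        self._seq = count()              # tie-breaker keeps arrival order

    def is_full(self):
        return len(self._heap) >= self._max_queued

    def offer(self, flowfile):
        """Enqueue unless back pressure applies; return True if accepted."""
        if self.is_full():
            return False
        entry = (self._prioritizer(flowfile), next(self._seq), flowfile)
        heapq.heappush(self._heap, entry)
        return True

    def poll(self):
        """Return the highest-priority FlowFile, or None when empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


def by_priority_attribute(flowfile):
    # Illustrative rule: a smaller "priority" attribute goes first.
    return int(flowfile.attributes.get("priority", "9"))


conn = Connection(by_priority_attribute, max_queued=3)
accepted = [
    conn.offer(FlowFile(b"debug line", {"filename": "a.log", "priority": "5"})),
    conn.offer(FlowFile(b"error line", {"filename": "b.log", "priority": "1"})),
    conn.offer(FlowFile(b"info line", {"filename": "c.log", "priority": "3"})),
    conn.offer(FlowFile(b"late line", {"filename": "d.log", "priority": "0"})),
]
drained = [conn.poll().attributes["filename"] for _ in range(3)]
```

The fourth `offer` is refused because the queue already holds three FlowFiles, and the remaining three are drained in priority order rather than arrival order. Refusing the item outright is a simplification chosen to keep the sketch short.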

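
Slides 21 and 22 walk through a heart-rate alerting example: parse device data into a standard format, enrich it with customer information, route it, and raise an alert when the resting heart rate exceeds a threshold. The steps can be sketched in plain Python as below. The record layout, the `CUSTOMERS` lookup, and the threshold of 100 are hypothetical, and the sketch runs every step in one process, whereas the slides split the work between flow management (NiFi, MiNiFi) and stream processing (Storm, Kafka).

```python
import json

RESTING_HR_THRESHOLD = 100  # hypothetical threshold, beats per minute

# Hypothetical customer lookup used for enrichment.
CUSTOMERS = {"u1": {"age": 54, "sex": "F"}, "u2": {"age": 31, "sex": "M"}}


def parse(raw):
    """Parse one device record into a standard dict; None if invalid."""
    try:
        record = json.loads(raw)
        return {
            "user": str(record["user"]),
            "resting_hr": int(record["resting_hr"]),
        }
    except (ValueError, KeyError, TypeError):
        return None


def enrich(record):
    """Attach customer context (age, sex) when the user is known."""
    return {**record, **CUSTOMERS.get(record["user"], {})}


def route(record):
    """Name the path a record should follow based on its content."""
    if record["resting_hr"] > RESTING_HR_THRESHOLD:
        return "alert"
    return "normal"


def run(raw_records):
    """Parse, enrich, and route each raw record; collect results by path."""
    routed = {"alert": [], "normal": [], "invalid": []}
    for raw in raw_records:
        record = parse(raw)
        if record is None:
            routed["invalid"].append(raw)
            continue
        enriched = enrich(record)
        routed[route(enriched)].append(enriched)
    return routed


result = run([
    '{"user": "u1", "resting_hr": 112}',
    '{"user": "u2", "resting_hr": 58}',
    "not json",
])
```

Keeping the threshold and the routing rule as small separate pieces mirrors the slide's point that routing and filtering rules change often at run time.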