
Webinar Series Part 5 New Features of HDF 5

Overview of the newest features of Hortonworks DataFlow, highlighting the new processors, the new user interface, edge intelligence powered by Apache MiNiFi, new support for multi-tenancy, and the new zero-master clustering architecture.

Published in: Technology

  1. Guide to New Features of Hortonworks DataFlow 2.0. Haimo Liu, Product Manager; Bryan Bende, Sr. Software Engineer
  2. © Hortonworks Inc. 2011 – 2016. All Rights Reserved. Connected Data Platforms
  3. HDF 2.0 – Data in Motion Platform (diagram labels): Stream Processing; Flow Management; Enterprise Services; At the edge; Security; Visualization; On premises; In the cloud; Registries/Catalogs; Governance (Security/Compliance); Operations
  4. HDF 2.0 – Data in Motion Platform (diagram labels): Flow Management; Flow management + Stream Processing; DATA IN MOTION; DATA AT REST; IoT Data Sources; AWS; Azure; Google Cloud; Hadoop; NiFi; Kafka; Storm; Others…; MiNiFi; Enterprise Services: Ambari, Ranger, Other services
  5. Dataflow Management
  6. Problems Today: Timely Access to Data and Decisions. “HDF helps us to streamline the flow of data and build models and visualisations quickly, so that my team can work iteratively with business colleagues on building solutions that work for the business.” (Royal Mail) Source: http://diginomica.com/2016/04/22/royal-mail-starts-to-deliver-on-hortonworks-data-in-motion-promise
  7. HDF Makes Big Data Ingest Easy. Before: complicated, messy, and takes weeks to months to move the right data into Hadoop. With HDF: streamlined, efficient, easy. (HDP: Hortonworks Data Platform, powered by Apache Hadoop)
  8. Create a live dataflow in minutes. How would that change your business?
  9. Add processor for data intake. Time: 1 minute. Step 1: Drag and drop processor from top menu.
  10. Choose the specific processor. Step 2: Choose one of the processors (currently 170+ available).
  11. Example: Pick Twitter Processor
  12. Configure the processor. Time: 2 minutes. Step 3: Select processor and choose option to Configure. Step 4: Adjust parameters as required.
  13. Another processor for data output. Time: 1 minute. Steps 5 and 6: Drag and drop processor from top menu; filter for and select a “Put” processor.
  14. Configure second processor. Time: 1 minute. Step 7: Configure 2nd processor.
  15. Connect processors, configure connection. Time: 2 minutes. Step 8: Configure connection. Note: Sample flow is different from previous example of PutHDFS. This dataflow is PutFile. Same concepts apply.
  16. Click Start to Begin Processing. Time total: 7 minutes. Step 9: Click start (“play”) to begin processing (will run continuously until you select stop).
  17. HDF 2.0: what’s new?
  18. Challenges. Different devices. Globally distributed organization. Intelligence on the edge. Time to delivery. Getting the right data to the right place at the right time is not trivial!
  19. Challenges & NiFi. Different devices: different standards/protocols/formats • Out of the box processors • Intuitive GUI to combine processors and build ingestion pipeline • Extensible framework, extremely easy to add a new source/protocol. Globally distributed organizations. Intelligence on the edge. Time to delivery. Support disparate, distributed systems with easy drag & drop.
  20. Challenges & NiFi & HDF 2.0. Different devices: different standards/protocols/formats • Out of the box processors • Intuitive GUI to combine processors and build ingestion pipeline • Extensible framework, extremely easy to add a new source/protocol • Deeper ecosystem integration, 170+ processors in total. Globally distributed organizations. Intelligence on the edge. Time to delivery. Expanded ecosystem.
  21. HDF 2.0 has 170+ Processors, 30% Increase from HDF 1.2. Hash, Extract, Merge, Duplicate, Scan, GeoEnrich, Replace, Convert, Split, Translate, Route Content, Route Context, Route Text, Control Rate, Distribute Load, Generate Table Fetch, Jolt Transform JSON, Prioritized Delivery, Encrypt, Tail, Evaluate, Execute, Fetch, HL7, FTP, UDP, XML, SFTP, HTTP, Syslog, Email, HTML, Image, AMQP, MQTT. All Apache project logos are trademarks of the ASF and the respective projects.
  22. Deeper Ecosystem Integration – New Processors (processor: description). Publish/ConsumeKafka: two NARs, with Kafka 0.9 and 0.10 client libraries, respectively. JoltTransformJSON: manipulate JSON data on the fly, with a preview functionality. GenerateTableFetch: incremental fetch + parallel fetch against source table partitions. PutHiveQL: ingest to Hive tables. SelectHiveQL: select from Hive tables. PutHiveStreaming: ingest streaming data to Hive, leveraging the Hive streaming API. ConvertAvroToORC: format conversion, Avro to ORC. Publish/ConsumeMQTT: MQTT is a popular protocol in the IoT world.
  23. Challenges & NiFi & HDF 2.0. Different devices: different standards/protocols/formats • Out of the box processors • Intuitive GUI to combine processors and build ingestion pipeline • Extensible framework, extremely easy to add a new source/protocol • Deeper ecosystem integration, 170+ processors in total • Redesigned UI, refreshed user experience. Globally distributed organizations. Intelligence on the edge. Time to delivery. More intuitive user interface.
  24. Modernized UI – Complete Interface Redesign
  25. Challenges & NiFi. Different devices. Globally distributed organizations: dataflow across multiple data centers • Internal Site-to-Site communication, secured by 2-way SSL • Environment-neutral. Intelligence on the edge. Time to delivery. Secure communications across disparate, distributed systems.
  26. Challenges & NiFi & HDF 2.0. Different devices. Globally distributed organizations: dataflow across multiple data centers • Internal Site-to-Site communication, secured by 2-way SSL • Environment-neutral • Variable registry. Intelligence on the edge. Time to delivery. Simplifies flow provisioning.
  27. Variable Registry. Variable registry: automatically resolves environment-specific values • Example: connection string • The same key referenced in a template can be mapped to different values in DEV vs PROD. In-memory variable registry.
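
As a rough illustration of the variable registry idea on slide 27, the sketch below resolves the same key to different values per environment. It is not NiFi code: the registry contents, the key name `db_url`, and the `resolve` helper are hypothetical, and Python's `string.Template` merely stands in for whatever substitution mechanism the product uses.

```python
from string import Template

# Hypothetical per-environment registries; keys and values are made up
# for illustration and do not come from the slides.
REGISTRIES = {
    "DEV": {"db_url": "jdbc:postgresql://dev-db:5432/flows"},
    "PROD": {"db_url": "jdbc:postgresql://prod-db:5432/flows"},
}


def resolve(property_value: str, environment: str) -> str:
    """Replace ${key} references using the registry of one environment."""
    return Template(property_value).substitute(REGISTRIES[environment])


# The flow template stores only the key, never the concrete value.
connection_property = "${db_url}"

dev_value = resolve(connection_property, "DEV")
prod_value = resolve(connection_property, "PROD")
print(dev_value)
print(prod_value)
```

The template itself stays identical across environments; only the registry it is resolved against changes.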
  28. Challenges & NiFi & HDF 2.0. Different devices. Globally distributed organizations: dataflow across multiple data centers • Internal Site-to-Site communication, secured by 2-way SSL • Environment-neutral • Variable registry • Better deployment management, Apache Ambari integration. Intelligence on the edge. Time to delivery. Simplified operations in distributed environments.
  29. Ambari Integration. Ambari-NiFi Integration: NiFi cluster management (start/stop NiFi service; centralized place for managing config files); Ambari to display NiFi metrics; Ambari to manage Kerberos authentication. HDF 2.0 Deployment Model: automated deployment by Ambari; manual RPM deployment; tar.gz/zip deployment (NiFi/MiNiFi Java); tar.gz for most Linux/Mac, compile your own for other OS (MiNiFi C++).
  30. Challenges & NiFi & HDF 2.0. Different devices. Globally distributed organizations: dataflow across multiple data centers • Internal Site-to-Site communication, secured by 2-way SSL • Environment-neutral • Variable registry • Better deployment management, Apache Ambari integration • Enhanced Site-to-Site communication. Intelligence on the edge. Time to delivery. Modularized Site-to-Site (S2S) to support pluggable protocols.
  31. Challenges & NiFi. Different devices, Globally distributed organizations. Intelligence on the edge: analytics on resource constrained devices • Run single node on the edge, communicating back via S2S • Bi-directional communication. Time to delivery. Analytics at the Edge.
  32. Challenges & NiFi & HDF 2.0. Different devices, Globally distributed organizations. Intelligence on the edge: analytics on resource constrained devices • Run single node on the edge, communicating back via Site-to-Site protocol • Bi-directional communication • Apache MiNiFi, bi-directional command and control on the edge. Time to delivery. Edge Intelligence for the first mile.
  33. Edge Intelligence with Apache MiNiFi. Key Features: guaranteed delivery; data buffering (backpressure, pressure release); prioritized queuing; flow specific QoS (latency vs. throughput, loss tolerance); data provenance; recovery / recording a rolling log of fine-grained history; designed for extension. Different from Apache NiFi: design and deploy; warm re-deploys.
  34. NiFi vs. MiNiFi Java Agent (diagram). MiNiFi: NiFi Framework, Components. NiFi: NiFi Framework, User Interface, Components.
  35. Challenges & NiFi. Different devices, Globally distributed organizations, Intelligence on the edge. Time to delivery: need an application, out of the box solution • Data provenance, traceability and compliance issues • Flow visibility, big picture of the enterprise dataflow • Automatic failure handling. FAST AND EASY: to get results, tune and change dataflows.
  36. Challenges & NiFi & HDF 2.0. Different devices, Globally distributed organizations, Intelligence on the edge. Time to delivery: need an application, out of the box solution • Data provenance, traceability and compliance issues • Flow visibility, big picture of the enterprise dataflow • Automatic failure handling • Control plane high availability, zero-master clustering. High availability.
  37. Zero-master Clustering, HDF 2.0 (NiFi 1.0.0). New clustering paradigm. Zero-master clustering: multiple entry points, no master node, no single point of failure; auto-elected cluster coordinator for cluster maintenance; automatic failover handling.
  38. Zero-master Clustering
  39. Zero-master Clustering. Heartbeat messages (every 5s by default). Node status: connecting/connected/disconnecting/disconnected.
  40. Zero-master Clustering
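
The heartbeat tracking mentioned on slide 39 can be sketched as follows. This is a simplified model, not NiFi's implementation: the 5-second interval comes from the slide, while the `Coordinator` class, its method names, and the threshold of eight missed heartbeats are assumptions made for the example. Coordinator election and the intermediate "disconnecting" state are left out, and a node that has never sent a heartbeat is simply reported as "connecting".

```python
from dataclasses import dataclass, field

HEARTBEAT_INTERVAL_S = 5.0  # default interval stated on the slide
MISSED_BEATS_ALLOWED = 8    # assumption for this sketch, not from the slides


@dataclass
class Coordinator:
    """Hypothetical cluster coordinator that tracks node heartbeats."""

    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, node_id: str, now: float) -> None:
        # Record the time at which the node last reported in.
        self.last_seen[node_id] = now

    def status(self, node_id: str, now: float) -> str:
        seen = self.last_seen.get(node_id)
        if seen is None:
            # No heartbeat received yet from this node.
            return "connecting"
        if now - seen > HEARTBEAT_INTERVAL_S * MISSED_BEATS_ALLOWED:
            # Too many intervals passed without a heartbeat.
            return "disconnected"
        return "connected"


coordinator = Coordinator()
coordinator.heartbeat("node-1", now=0.0)
print(coordinator.status("node-1", now=10.0))
print(coordinator.status("node-1", now=41.0))
```

Passing the clock value in explicitly keeps the sketch deterministic; a real system would read a monotonic clock instead.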
  41. Challenges & NiFi & HDF 2.0. Different devices, Globally distributed organizations, Intelligence on the edge. Time to delivery: need an application, out of the box solution • Data provenance, traceability and compliance issues • Flow visibility, big picture of the enterprise dataflow • Automatic failure handling • Control plane high availability, zero-master clustering • Multi-tenancy flow editing, and authorization. Secured enterprise wide collaboration.
  42. Multi-tenant Flow Editing, HDF 2.0 (NiFi 1.0.0). Multi-tenant flow editing: self-service collaborative model, Google Docs-style user experience; multiple teams making edits to different processors at the same time; only the component being modified is locked, not the entire flow; scalable model to speed up flow editing.
  43. Multi-tenant Authorization, HDF 2.0 (NiFi 1.0.0). Component level authorization: new authorizer API; “Read” and “Write” permissions; protection against unauthorized usage without losing context. Authorization management: internal management (NiFi); external management (Ranger, etc.).
  44. Multi-tenant Authorization. Read permission: processor name visible; processor configuration visible.
  45. Multi-tenant Authorization. NO read permission: processor name & configuration invisible (content); statistics visible (context).
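
The read-permission behaviour on slides 44 and 45 can be sketched as a small filter. This is only an illustration under assumptions: the `Processor` class, the `view_for` function, and the field names are hypothetical and are not the NiFi authorizer API.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    """Hypothetical stand-in for a flow component."""

    component_id: str
    name: str
    config: dict
    flowfiles_in: int


def view_for(processor: Processor, permissions: set) -> dict:
    """Build what a user may see of a component.

    Statistics (context) are always included; the name and the
    configuration (content) require the "read" permission.
    """
    view = {
        "id": processor.component_id,
        "stats": {"flowfiles_in": processor.flowfiles_in},
    }
    if "read" in permissions:
        view["name"] = processor.name
        view["config"] = dict(processor.config)
    return view


proc = Processor("p-1", "PutFile", {"Directory": "/tmp/out"}, flowfiles_in=120)
print(view_for(proc, {"read"}))
print(view_for(proc, set()))
```

A user without read permission still sees that a component exists and how much data flows through it, which matches the "without losing context" point on slide 43.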
  46. Questions? Hortonworks Community Connection: Data Ingestion and Streaming https://community.hortonworks.com/
