HDF 3.1 : An Introduction to New Features

HDF 3.1: Presentation at the 13 February 2018 Mardi Gras Meetup at TRAC Intermodal in Princeton, NJ. Discussion of Kafka 1.0, Apache Ambari, Apache NiFi 1.5, Schema Registry, SAM and NiFi Registry.

Published in: Technology

HDF 3.1 : An Introduction to New Features

  1. Hortonworks DataFlow 3.1. Timothy Spann, Solutions Engineer, Hortonworks, @PaaSDev. © Hortonworks Inc. 2011–2018. All rights reserved. Hortonworks confidential and proprietary information.
  2. Disclaimer • This document may contain product features and technology directions that are under development, may be under development in the future, or may ultimately not be developed. • Technical feasibility, market demand, user feedback, and the Apache Software Foundation community development process can all affect timing and final delivery. • This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product. • Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. • Since this document contains an outline of general product development plans, customers should not rely upon it when making a purchase decision.
  3. Global Data Management With Hortonworks. Diagram labels: MULTIPLE CLUSTERS AND SOURCES, MULTIHYBRID, DATAPLANE SERVICE (DPS), MANAGE, GOVERN, SECURE, DATA LIFECYCLE MANAGER, DATA STEWARD STUDIO*, ISV SERVICES, EXTENSIBLE SERVICES, IBM DSX*, CLOUDBREAK*, DATA ANALYTICS STUDIO* (*not yet available, coming soon), CONNECTED DATA PLATFORMS, HORTONWORKS DATA PLATFORM (HDP®) DATA-AT-REST, HORTONWORKS DATAFLOW (HDF™) DATA-IN-MOTION, MODERN DATA USE CASES, EDW OPTIMIZATION, CYBER SECURITY, DATA SCIENCE, ADVANCED ANALYTICS, PARTNER SOLUTIONS, IOT/STREAMING ANALYTICS, HORTONWORKS CONNECTION, ENTERPRISE SUPPORT, PREMIER SUPPORT, EDUCATIONAL SERVICES, PROFESSIONAL SERVICES, COMMUNITY CONNECTION, HORTONWORKS PLATFORM SERVICES, OPERATIONAL SERVICES, SMARTSENSE™
  4. HDF Data-In-Motion Platform – with HDF 3.1 GA Release
  5. HDF 3.1 New and Enhanced Features: Ease of Use, Core Enhancements, Cross-Product Integration, Flow Management, Stream Processing • NiFi-Atlas, -SmartSense, and -Knox integration (HDF on HDP scenario only) • NiFi-Ranger: Group-based policy support for NiFi resources • New SAM operations module • SAM "Test Mode" • Kafka 1.0 Support • Schema Registry • Schema Version Lifecycle Management • SAM extensibility improvements • Ambari and Ranger support for Kafka 1.0 • Improved Ambari experience: Automate adding NiFi nodes to existing cluster • Apache NiFi Registry (new) • Flow migration and version control • MiNiFi C++, Java, Android/iOS libraries GA • Containerized deployment (Docker)
  6. Improved Operational Efficiency: MiNiFi C++ Agent. MiNiFi C++ has many configuration options. Depending on the use case, they may help with: • Minimizing memory footprint • Lowering CPU consumption • Reducing size on disk https://community.hortonworks.com/articles/167193/building-and-running-minifi-cpp-in-orangepi-zero.html
  7. Integrated Provisioning and Security: Kafka 1.0 Support. To enhance data governance and lineage, users can now manage access control policies using resource or tag-based security in Ranger for Kafka 1.0 clusters. Users can now install, configure, manage, upgrade, monitor, and secure Kafka 1.0 clusters with Ambari. New processors in NiFi and Streaming Analytics Manager support Kafka 1.0 features including message headers and transactions.
  8. When HDF is co-located with HDP… Integrations with Atlas, Knox and SmartSense
  9. 220+ Processors for Deeper Ecosystem Integration. Processor labels: Hash, Extract, Merge, Duplicate, Scan, GeoEnrich, Replace, Convert, Split, Translate, Route Content, Route Context, Route Text, Control Rate, Distribute Load, Generate Table Fetch, Jolt Transform JSON, Prioritized Delivery, Encrypt, Tail, Evaluate, Execute, Fetch, HTTP, Syslog, Email, HTML, Image, HL7, FTP, UDP, XML, SFTP, AMQP, WebSocket. All Apache project logos are trademarks of the ASF and the respective projects.
  10. HDF 3.1 for Big Data Engineers: Multiple users, frameworks, languages, data sources & clusters. BIG DATA ENGINEER • Experience in ETL • Coding skills in Scala, Python, Java • Experience with Apache Hadoop • Knowledge of tools such as Hive, Flume or Pig • Knowledge of SQL • Expert in ETL (Eating, Ties and Laziness) • Social Media Maven • Deep SME in Buzzwords • No Coding skills • Interest in Pig and Falcon CAT AI • Will Drive your Car • Will Fix Your Code • Will Not Be Discussed Today • Will Not Finish This Talk For Me, This Time
  11. Collect: Bring Together. Aggregate all data from sensors, drones, logs, geo-location devices, machines and social feeds. Conduct: Mediate the Data Flow. Mediate point-to-point and bi-directional data flows, delivering data reliably to Apache HBase, Apache Hive, HDFS, Slack and Email. Curate: Gain Insights. Parse, filter, join, transform, fork, query, sort, dissect, enrich with weather, location, Apache OpenNLP and Apache MXNet.
  12. DataFlow Registry • NiFi Flow Registry • Standalone application/service (URL) • Standard API with pluggable components • Design and deploy mechanism for flow migration (SDLC) use cases. Diagram labels: Flow Registry (API, Persistence, Other services), NiFi (Dev), NiFi (QA), NiFi (PROD), MiNiFi agents, Register, Deploy
  13. Kafka. Powerful Pattern with Kafka Headers: Pass Schema Key in Kafka Header. The Truck Geo Sensor and Truck Speed Sensor publish CSV events to the Kafka topic raw-all_truck_events_csv, with schema metadata from the centralized schema repository (SR) stored in the Kafka header. Each Kafka event published by the sensor producing app has a Kafka header with key schema.name, which holds the metadata needed to look up the schema in HWX SR, and a Kafka payload, which holds the CSV binary event. Data movement and processing is done by NiFi using the new record-based processing.
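The header pattern on slide 13 can be illustrated with a minimal Python sketch. It does not talk to Kafka or to the Hortonworks Schema Registry: the event is a plain dictionary, the `SCHEMAS` dictionary is a hypothetical in-memory stand-in for the registry, and the schema names and field lists are invented for illustration. Only the header key `schema.name` comes from the slide.

```python
import csv
import io

# Hypothetical stand-in for a schema registry: schema name -> ordered field names.
SCHEMAS = {
    "truck_geo_event": ["event_time", "truck_id", "latitude", "longitude"],
    "truck_speed_event": ["event_time", "truck_id", "speed"],
}


def make_event(schema_name, csv_line):
    """Build a Kafka-like record: headers are (key, bytes) pairs, the payload is bytes."""
    return {
        "headers": [("schema.name", schema_name.encode("utf-8"))],
        "value": csv_line.encode("utf-8"),
    }


def decode_event(event, schemas=SCHEMAS):
    """Look up the schema named in the header and map the CSV payload onto its fields."""
    headers = dict(event["headers"])
    schema_name = headers["schema.name"].decode("utf-8")
    fields = schemas[schema_name]
    row = next(csv.reader(io.StringIO(event["value"].decode("utf-8"))))
    if len(row) != len(fields):
        raise ValueError(
            f"payload has {len(row)} columns, schema {schema_name!r} expects {len(fields)}"
        )
    return dict(zip(fields, row))


# Two sensors share one topic; the header tells the consumer how to read each payload.
geo = make_event("truck_geo_event", "2018-02-13T19:00:00,42,40.35,-74.66")
speed = make_event("truck_speed_event", "2018-02-13T19:00:01,42,61")
print(decode_event(geo))
print(decode_event(speed))
```

Keeping the schema name in the header rather than in the payload means the CSV body stays unchanged, and a consumer can pick the right schema before parsing a single byte of it.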
  14. Kafka. NiFi and Kafka 1.0 – Use Case for Kafka Message Headers
  15. Kafka. Grafana & Kafka 1.0 Integration: Monitoring Topic Level KPIs, Broker Level KPIs
  16. Apache Spark Integration
  17. Apache Spark Integration
  18. SDLC. New: Integrated Registry Service • Integrated Flow Registry Service • Sharable between NiFi environments for Dev/UAT/Prod promotion • API or GUI driven • Can be integrated with Enterprise Version Control, e.g. GitLab • ‘Buckets’ of Flows for security and access control
  19. SDLC. New: Integrated Variable Registry Service • Integrated Variable Registry • Sets of key:value pairs available on every Process Group • Referenced with NiFi Expression Language • Dynamically changeable at runtime • Use within Versioned Flows to set Environment Variables • GUI or API driven
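The variable lookup described on slide 19 can be sketched in plain Python. This is a toy model, not NiFi code: it handles only simple `${name}` references (the real Expression Language also supports functions), the variable names and values are hypothetical, and it assumes a nested Process Group falls back to its parent's variables.

```python
import re
from collections import ChainMap

# Matches simple ${name} references only.
_REFERENCE = re.compile(r"\$\{([A-Za-z0-9_.\-]+)\}")


def resolve(text, variables):
    """Replace each ${name} with its value; leave unknown references untouched."""
    return _REFERENCE.sub(lambda m: str(variables.get(m.group(1), m.group(0))), text)


# Hypothetical variables on a root process group and on a nested child group.
root_vars = {"kafka.brokers": "dev-broker:6667", "env": "dev"}
child_vars = {"topic": "truck_events"}

# ChainMap looks in the child scope first, then falls back to the parent scope.
scope = ChainMap(child_vars, root_vars)

print(resolve("${env}: ${kafka.brokers}/${topic}", scope))

# Changing a value at runtime changes what later evaluations see.
root_vars["kafka.brokers"] = "prod-broker:6667"
root_vars["env"] = "prod"
print(resolve("${env}: ${kafka.brokers}/${topic}", scope))
```

The same flow definition produces different connection strings in each environment because only the variable values differ, which is what makes variables useful inside versioned flows.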
  20. SDLC. Regression test with Golden Datasets • Wrap atomic functions in harnesses for regression testing • Integrate via the REST API to automate testing through Jenkins etc. • Automate triggering tests when new versions are pushed to the Flow Registry
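A golden-dataset regression harness of the kind slide 20 describes might look like the following Python sketch. It tests a plain function rather than a NiFi flow; `normalize_event`, its fields, the 65 mph threshold and the golden records are all hypothetical and exist only to show the compare-against-known-good pattern.

```python
import json


def normalize_event(record):
    """Hypothetical 'atomic function' under test: tidy one sensor record."""
    return {
        "truck_id": int(record["truck_id"]),
        "speed_mph": round(float(record["speed"]), 1),
        "speeding": float(record["speed"]) > 65.0,
    }


def run_golden_test(func, golden_cases):
    """Run func on every golden input and return the cases whose output differs."""
    failures = []
    for case in golden_cases:
        actual = func(case["input"])
        if actual != case["expected"]:
            failures.append(
                {"input": case["input"], "expected": case["expected"], "actual": actual}
            )
    return failures


# Golden dataset: in practice this would be a file kept under version control.
GOLDEN_JSON = """
[
  {"input": {"truck_id": "42", "speed": "61.04"},
   "expected": {"truck_id": 42, "speed_mph": 61.0, "speeding": false}},
  {"input": {"truck_id": "7", "speed": "72.5"},
   "expected": {"truck_id": 7, "speed_mph": 72.5, "speeding": true}}
]
"""

failures = run_golden_test(normalize_event, json.loads(GOLDEN_JSON))
print("PASS" if not failures else f"FAIL: {failures}")
```

A CI job such as Jenkins could run a harness like this whenever a new version is pushed and fail the build if the list of failures is not empty.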
  21. SDLC. Build & Test Composite DataFlows • Nest Versioned Process Groups to test composite functions • Wrap in test harnesses to validate functionality • Flow Versioning provides visibility as components of Composites are updated
  22. SDLC. New: Design & Deploy complementing Command & Control • SDLC Dev: Place Process Groups under Version Control • Make changes and commit to new version • Roll Versions back or forward
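The commit and roll back/forward cycle on slide 22 can be modelled with a small in-memory Python class. This is a toy illustration, not the NiFi Registry API: the class name, its methods, and the flow contents are hypothetical, and a real registry would persist snapshots rather than hold them in a list.

```python
import copy


class VersionedFlow:
    """Toy in-memory model of a versioned process group (not the NiFi Registry API)."""

    def __init__(self, name):
        self.name = name
        self._snapshots = []  # index i holds version i + 1
        self.current = 0      # 0 means nothing has been committed yet

    def commit(self, flow_definition, comment=""):
        """Store a copy of the flow and return its new version number."""
        self._snapshots.append(
            {"flow": copy.deepcopy(flow_definition), "comment": comment}
        )
        self.current = len(self._snapshots)
        return self.current

    def change_version(self, version):
        """Roll back or forward to an existing version and return its flow definition."""
        if not 1 <= version <= len(self._snapshots):
            raise ValueError(f"unknown version {version}")
        self.current = version
        return copy.deepcopy(self._snapshots[version - 1]["flow"])


flow = VersionedFlow("ingest-trucks")
v1 = flow.commit({"processors": ["ConsumeKafka"]}, "initial flow")
v2 = flow.commit({"processors": ["ConsumeKafka", "PutHDFS"]}, "write to HDFS")

rolled_back = flow.change_version(v1)     # roll back
rolled_forward = flow.change_version(v2)  # roll forward
print(v1, v2, flow.current)
print(rolled_back)
print(rolled_forward)
```

Snapshots are deep-copied on the way in and out so that editing a checked-out flow never alters a committed version, which is the property that makes rolling back safe.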
  23. SDLC. New: Design & Deploy complementing Command & Control • Get notifications of local changes or new versions available in the repository • Revert or commit local changes via the GUI or REST API • Use the REST API to integrate with Jenkins, etc.
  24. Administration
  25. Administration
  26. Schema Registry
  27. Schema Registry
  28. Schema Registry
  29. Schema Registry. Lifecycle Action 1: Fork Schema Version to Branch called Dev
  30. Atlas Integration. More Data Set Coverage: AtlasNiFiFlowLineage (ReportingTask) 1. Static flow lineage from NiFi flow definition 2. Add DataSet entities from NiFi Data Provenance events. Diagram labels: NiFi Flow, NiFi Data Provenance, Kafka topic
  31. Atlas Integration. Diagram labels: sensor-data, tweets, default.sensor_data, path0, path1, path2
  32. Registry
  33. Registry
  34. Registry
  35. Registry
  36. Questions?
  37. Resources https://community.hortonworks.com/articles/161761/new-features-in-apache-nifi-15-apache-nifi-registr.html https://community.hortonworks.com/articles/171787/hdf-31-executing-apache-spark-via-executesparkinte.html https://community.hortonworks.com/articles/171960/using-apache-mxnet-on-an-apache-nifi-15-instance-w.html https://community.hortonworks.com/articles/171893/hdf-31-executing-apache-spark-via-executesparkinte-1.html
  38. Contact https://github.com/tspannhw/ApacheBigData101/tree/master https://community.hortonworks.com/users/9304/tspann.html https://dzone.com/users/297029/bunkertor.html https://www.meetup.com/futureofdata-princeton/ https://twitter.com/PaaSDev https://community.hortonworks.com/articles/155435/using-the-new-mxnet-model-server.html
  39. Hortonworks Community Connection: Read access for everyone, join to participate and be recognized • Full Q&A Platform (like StackOverflow) • Knowledge Base Articles • Code Samples and Repositories
  40. Community Engagement: Participate now at community.hortonworks.com • 4,000+ Registered Users • 10,000+ Answers • 15,000+ Technical Assets • One Website!