The document discusses how Apache NiFi can empower self-organizing teams by allowing them to control their own data movement pipelines. It describes NiFi as a data-movement production line that enables teams to build and change pipelines quickly. This helps reduce time-to-production and feedback loops by moving responsibility for data integration to individual teams rather than having changes controlled in a centralized manner.
Hortonworks for Financial Analysts Presentation - Hortonworks
Hortonworks was founded in 2011 by former Yahoo engineers to support the growth of Apache Hadoop. Their strategy is to overcome technology gaps by making Hadoop easier to install and use, enable an ecosystem of partners by defining open APIs, and overcome knowledge gaps by expanding technical content and training. This will help drive wider adoption of Apache Hadoop as the platform for managing big data in the enterprise.
The document discusses an overview presentation on Apache NiFi given by Timothy Spann. The presentation covered what NiFi is, how to install it, its terminology, user interface, extensibility, and ecosystem. It also included a demonstration of how to add a processor for data intake within 1 minute. The presentation was part of a larger meetup event on the future of data.
Hive2 Introduction -- Interactive SQL for Big Data - Yifeng Jiang
Introducing the new features of Hive 2 and how it achieves interactive SQL for big data. Features include the new LLAP engine, ACID merge, Hive + Druid integration, etc. I will explain what it is, how it works and what use cases it is for. I will also have some benchmark numbers to show.
This document discusses measurements of IPv6 deployment in Finland. It finds that while IPv6 readiness among internet infrastructure organizations in Finland is high, actual end-user IPv6 usage was only around 4% until recently, when one major Finnish ISP increased their IPv6 customers to 11%. IPv6 performance measurements show IPv6 is sometimes faster but usually on par with IPv4, suggesting dual-stack networks can take advantage of both protocols for optimal performance. In conclusion, Finland is beginning to deploy IPv6 more broadly but significant latent IPv6 capacity remains among infrastructure and end-users.
REAL-TIME INGESTING AND TRANSFORMING SENSOR DATA & SOCIAL DATA w/ NIFI + TENS... - Timothy Spann
This document discusses using Apache NiFi and TensorFlow to ingest and analyze sensor and social media data in real-time. It describes setting up data flows in NiFi to ingest camera images and sensor data, run the images through TensorFlow to recognize objects, and store the enriched data in Hadoop. NiFi is also used to ingest social media feeds and perform sentiment analysis on the textual data. The document provides an overview of NiFi and TensorFlow and various options for integrating the two technologies.
The document provides an overview of machine learning concepts and techniques using Apache Spark. It discusses supervised and unsupervised learning methods like classification, regression, clustering and collaborative filtering. Specific algorithms like k-means clustering, decision trees and random forests are explained. It also introduces Apache Spark MLlib and how to build machine learning pipelines and models with Spark ML APIs.
The document discusses key considerations for running Hadoop in the cloud. It notes that running Hadoop in the cloud provides unlimited elastic scale, ephemeral and long-running workloads, no upfront hardware costs, and IT and business agility. It outlines some of the major cloud Hadoop solutions according to a Forrester Wave report and discusses architectural considerations like shared data and storage, on-demand ephemeral workloads, elastic resource management, and shared metadata, security, and governance.
- Introduction
This will cover a brief introduction to Apache NiFi and MiNiFi.
- Customer Use Cases
We will discuss a few use cases using Apache NiFi.
- How to create your own Processor in NiFi
We will show a sample implementation of a NiFi Processor
We will showcase simple log processing using Apache NiFi
This document discusses Hortonworks DataFlow (HDF) 3.0 for building IoT platforms. It introduces HDF 3.0 and its key components for data ingestion, management, security, and real-time analysis. These include NiFi for data movement, Streaming Analytics Manager (SAM) for building streaming analytics apps visually, and Schema Registry for managing schemas. The document also presents example IoT use cases and demonstrates building a real-time analytics app in SAM to analyze vehicle event data.
Introduction to Hortonworks Data Cloud for AWS - Yifeng Jiang
Hortonworks Data Cloud is a new cloud product from Hortonworks that offers pay-as-you-go pricing for launching and managing Hadoop clusters on AWS. It handles common big data use cases and focuses on ease of use by providing prescriptive cluster types. The product aims to improve enterprise readiness in the cloud by providing scalable storage, security and governance features, and reliability through auto-recovery of unhealthy nodes. It also matches Hadoop with cloud capabilities like scalable storage, customizability, and cost-effective compute.
Hortonworks Data Cloud for AWS 1.11 Updates - Yifeng Jiang
This document discusses Hortonworks Data Cloud, which provides an enterprise-ready Hadoop distribution on AWS. Key points include: HDC offers pre-configured Hortonworks Data Platform clusters on AWS that can be easily deployed and managed; the latest release of HDC (version 1.11) introduces compute nodes that allow using spot instances to reduce costs; and node recipes enable running custom scripts during cluster installation and configuration.
At the heart of much of the Big Data revolution is the Apache Software Foundation. Many of the projects, including the big ones like Hadoop, Spark, Hive, and Kafka, are Apache projects. This means they follow "The Apache Way". Maybe you have heard phrases like "community over code" or "if it didn't happen on the lists, it didn't happen" and wondered what they meant. Maybe you would like to get involved with one or more of these projects but have not been sure how. Maybe you would just like to learn how Apache works, and how its process differs from the way companies build software. If so, this talk is for you. This talk will introduce Apache, how it is organized, the roles people play, who can contribute (hint, it is not just coders), Apache's tenets of community, meritocracy, collaboration, and openness, give some practical tips for new contributors and even old hands, as well as touch briefly on licenses and trademarks.
Using Apache Hadoop and related technologies as a data warehouse has been an area of interest since the early days of Hadoop. In recent years Hive has made great strides towards enabling data warehousing by expanding its SQL coverage, adding transactions, and enabling sub-second queries with LLAP. But data warehousing requires more than a full powered SQL engine. Security, governance, data movement, workload management, monitoring, and user tools are required as well. These functions are being addressed by other Apache projects such as Ranger, Atlas, Falcon, Ambari, and Zeppelin. This talk will examine how these projects can be assembled to build a data warehousing solution. It will also discuss features and performance work going on in Hive and the other projects that will enable more data warehousing use cases. These include use cases like data ingestion using merge, support for OLAP cubing queries via Hive’s integration with Druid, expanded SQL coverage, replication of data between data warehouses, advanced access control options, data discovery, and user tools to manage, monitor, and query the warehouse.
Speaker
Alan Gates, Co-founder, Hortonworks
This document discusses security requirements and solutions for Apache Spark production deployments. It covers authenticating users with Kerberos/AD, authorizing access to Spark jobs and data with Ranger, auditing access, and encrypting data at rest and in motion. It provides examples of configuring Kerberos authentication for Spark, using Ranger to control authorization to HDFS and SparkSQL, and demonstrates dynamic row filtering and masking of sensitive data in SparkSQL queries based on user policies.
Apache Hive is an Enterprise Data Warehouse built on top of Hadoop. Hive supports Insert/Update/Delete SQL statements with transactional semantics and read operations that run at Snapshot Isolation. This talk will describe the intended use cases, architecture of the implementation, new features such as the SQL Merge statement and recent improvements. The talk will also cover the Streaming Ingest API, which allows writing batches of events into a Hive table without using SQL. This API is used by Apache NiFi, Storm and Flume to stream data directly into Hive tables and make it visible to readers in near real time.
HPLN Web Performance Optimization - Liran Tal
Liran Tal presenting at the HP Office in Cluj Romania - review of how we optimized HP Live Network's web marketplace performance in various layers of the server-side stack to achieve 10x performance improvement.
Ranger’s pluggable architecture allows resource access policy administration and enforcement for standard and custom services from a “single pane of glass”. Apache Ranger has a rich Authorization Model, which provides the mechanism to author Policy in a Ranger Admin Server and serves as policy decision and audit point in authorizing user’s resource access within various components of Hadoop ecosystem.
This session will provide a deep dive into the Ranger framework and a cook-book for extending Ranger to do authorization / auditing on resource access to external applications, including technical details of REST APIs, Ranger policy engine and enriching authorization requests, with a demo of a sample application. We will then demonstrate a real-world example of how Ranger has simplified security enforcement for a Hadoop-native MPP SQL engine like Apache HAWQ (incubating), which previously used its built-in Postgres-like authorization mechanisms. The integration design includes a Ranger Plugin Service that allows transparent authorization API calls between C-based Apache HAWQ and Java-based Apache Ranger.
Ambari 2.4.0 includes several new features and enhancements:
- Alerts now allow customizable check counts and parameters to avoid unnecessary notifications. New HDFS alerts also watch trends.
- Host filtering allows searching by various host attributes, services, and components for easier management.
- Services can now be removed directly from the Ambari web interface.
- Other improvements include customizable Ambari log and PID directories, a database consistency check, and View framework enhancements.
Developer and Fusion Middleware 2 _Greg Kirkendall _ How Australia Post teach... - InSync2011
This document provides an introduction to Service Oriented Architecture (SOA) concepts using Australia Post as an example. It explains how SOA can be visualized as a data distribution system that routes business data (packages) between applications (cities) via an enterprise service bus (distribution centers). It then provides two examples of SOA solutions to common business requirements: 1) sharing customer data between an ERP and CRM, and 2) adding a sales portal while maintaining existing integration. The goal is to help translate technical SOA terms to business concepts.
O365con14 - the 4 major steps to migrate content from any on-premise source i... - NCCOMMS
This document outlines the four major steps to migrate content from an on-premise source into SharePoint Online: analysis, requirements, structure and metadata, and preparation and testing. The analysis step involves examining the configuration, content, processes and performance of both the source and target systems. Requirements identify specific needs for the migration. Structure and metadata determines how content will be organized and metadata populated in the new system. Preparation and testing validates the migration strategy through testing with sample data before full deployment.
This document announces a PL/SQL Office Hours event focused on real world testing of PL/SQL code. It introduces several guest speakers who will share their experiences with challenges of testing, tools for testing, and promoting testing excellence, including Jasmin Fluri, Swathi Ambati and Maik Becker, Deepthi Bandari, Patrick Barel, and Samuel Nitsche. Attendees are invited to ask questions during the event.
S3Guard: What's in your consistency model? - Hortonworks
S3Guard provides a consistent metadata store for S3 using DynamoDB. It allows file system operations on S3, like listing and getting file status, to be consistent by checking results from S3 against metadata stored in DynamoDB. Mutating operations write to both S3 and DynamoDB, while read operations first check S3 results against DynamoDB to handle eventual consistency in S3. The goal is to improve performance of real workloads by providing consistent metadata operations on S3 objects written with S3Guard enabled.
SCONUL Conference 20-21 June 2013, Dublin
SCONUL Fringe session - LSPs and APIs: Integration and the next generation of library management systems, with Colin Carter, Sales Account Manager for the UK and Northern Europe, Innovative Interfaces Inc.
The Apache Way describes the community patterns and style of governance that all projects at the Apache Software Foundation are guided by. With a span of more than 20 years, and now more than 300 projects, the Apache Way has helped to establish long lasting, diverse communities of volunteers who collaborate to build software used by millions of users worldwide.
In this talk, I’ll outline the underlying principles of the Apache Way, what this means for projects and their ecosystems, and how the Apache Software Foundation is structured to support such a large number of projects.
Speaker
Brett Porter, Director, Apache Software Foundation
Hadoop and NoSQL technologies like DynamoDB are complementary for managing and analyzing big data. Amazon's Elastic MapReduce (EMR) integrates with DynamoDB, providing an out-of-the-box solution that eliminates the high costs of administering and maintaining Hadoop clusters. EMR allows vast amounts of data to be moved into and analyzed against DynamoDB using SQL-like queries, distributing tasks across EMR instances.
The document is a presentation slide deck on Oracle Analytics Cloud. It provides an overview and demo of the product. The presentation agenda includes an overview of platform as a service (PaaS), an introduction to Oracle Analytics Cloud, its features and capabilities, and a demo. Key capabilities discussed include connecting to various data sources, preparing and analyzing data, visualizing insights, predictive modeling, collaborative sharing and embedding analytics applications. The presentation emphasizes that Oracle Analytics Cloud provides a unified platform for managed data discovery.
OOW16 - Running your E-Business Suite on Oracle Cloud (IaaS + PaaS) - Why, Wh... - vasuballa
Oracle E-Business Suite is a powerful, complete suite of applications that can deliver tremendous value to organizations around the world. That value can be greatly extended when coupling it with Oracle Cloud offerings delivered by Oracle Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS). The purpose of this session is to understand the value proposition, the solutions offered and the use cases for Oracle E-Business Suite customers to deploy their environments on Oracle Cloud (IaaS and PaaS).
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFi - Bryan Bende
This document provides an overview of a presentation about taking dataflow management to the edge with Apache NiFi and MiNiFi. The presentation discusses the problem of moving data between systems with different formats, protocols, and security requirements. It introduces Apache NiFi as a solution for dataflow management and introduces Apache MiNiFi for managing dataflows at the edge. The presentation includes a demo and time for Q&A.
How Customers are Optimizing their EDW for Fast, Secure, and Effective Insights - Hortonworks
Hortonworks' Hadoop-powered EDW (Enterprise Data Warehouse) Optimization Solution with Syncsort DMX-h enables organizations to liberate data from across the enterprise, quickly create and populate the data lake, and deliver actionable insights.
Customer case studies across a variety of industries will bring to life how organizations are using this solution to gain bigger insights from their enterprise data – securely and cost-effectively – with faster time to value.
Devnexus 2018 - Let Your Data Flow with Apache NiFi - Bryan Bende
Introduction to Apache NiFi features such as interactive command and control, version control of process groups, record processing, provenance, and prioritization, and building custom extensions.
The document discusses the evolution of Apache Hadoop from its origins in 2006 to the present day. It describes how Hadoop has grown from early implementations focused on HDFS, MapReduce, and batch applications to a full ecosystem that enables enterprise interoperability, deployment in data centers and clouds, and assembly of modern data applications. The document also presents several talks that will take place at the conference, focusing on topics like YARN, security, data governance, and running Hadoop in the cloud.
[Hortonworks] Future Of Data: Madrid - HDF & Data in motion - Raúl Marín
First Future Of Data meetup event, where we introduced Hortonworks DataFlow (HDF).
The slides describe what HDF is, and we presented a very simple demo about sentiment analysis of tweets using Apache OpenNLP as the NLP framework to do so.
Hortonworks Data in Motion Webinar Series - Part 1 - Hortonworks
VIEW THE ON-DEMAND WEBINAR: http://hortonworks.com/webinar/introduction-hortonworks-dataflow/
Learn about Hortonworks DataFlow (HDFTM) and how you can easily augment your existing data systems – Hadoop and otherwise. Learn what Dataflow is all about and how Apache NiFi, MiNiFi, Kafka and Storm work together for streaming analytics.
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NiFi/MiNiFi - Haimo Liu
Introducing the new Hortonworks DataFlow (HDF) release, HDF 2.0. Also provides an introduction to the flow management part of the platform, powered by Apache NiFi and MiNiFi.
Learn about HDF and how you can easily augment your existing data systems - Hadoop and otherwise. Learn what Dataflow is all about and how Apache NiFi, MiNiFi, Kafka and Storm work together for streaming analytics.
Learn more: http://hortonworks.com/hdf/
Log data can be complex to capture, typically collected in limited amounts and difficult to operationalize at scale. HDF expands the capabilities of log analytics integration options for easy and secure edge analytics of log files in the following ways:
More efficient collection and movement of log data by prioritizing, enriching and/or transforming data at the edge to dynamically separate critical data. The relevant data is then delivered into log analytics systems in a real-time, prioritized and secure manner.
Cost-effective expansion of existing log analytics infrastructure by improving error detection and troubleshooting through more comprehensive data sets.
Intelligent edge analytics to support real-time content-based routing, prioritization, and simultaneous delivery of data into Connected Data Platforms, log analytics and reporting systems for comprehensive coverage and retention of Internet of Anything data.
Keynote slides from Big Data Spain Nov 2016. Has some thoughts on how Hadoop ecosystem is growing and changing to support the enterprise, including Hive, Spark, NiFi, security and governance, streaming, and the cloud.
This document provides an overview of Apache NiFi, a dataflow management software. It begins with an introduction to dataflow and challenges in moving data effectively. It then discusses key features of Apache NiFi like guaranteed delivery, data buffering, and data provenance. The document outlines NiFi's architecture including repositories and extension points. It also advertises an upcoming Birds of a Feather session on streaming, dataflow and cybersecurity. Finally, it encourages learning more about NiFi and getting involved in the community.
Apache NiFi Crash Course San Jose Hadoop Summit - Daniel Madrigal
This document provides an overview of Apache NiFi, a dataflow system. It begins with an introduction to dataflow and challenges in moving data effectively. It then discusses key features of Apache NiFi like guaranteed delivery, data buffering, and data provenance. The document outlines NiFi's architecture including repositories and discusses extension points. It proposes using NiFi and MiNiFi to manage dataflow for a global courier service example. Finally, it promotes learning more and participating in the Apache NiFi community.
Introduction to Streaming Analytics Manager - Yifeng Jiang
This document introduces Streaming Analytics Manager (SAM), an open source project led by Hortonworks to simplify building streaming analytics applications. SAM aims to provide the same easy experience for streaming analytics as NiFi does for flow management applications. It allows users to create a streaming analytics application in 10 minutes and supports prescriptive, predictive, and descriptive analytics functions including routing, filtering, predictive modeling, and real-time dashboards. SAM applications are scalable through one-click deployment on distributed streaming platforms.
This document provides an overview of Apache NiFi and dataflow. It begins with an introduction to the challenges of moving data effectively within and between systems. It then discusses Apache NiFi's key features for addressing these challenges, including guaranteed delivery, data buffering, prioritized queuing, and data provenance. The document outlines NiFi's architecture and components like repositories and extension points. It also previews a live demo and invites attendees to further discuss Apache NiFi at a Birds of a Feather session.
As Apache Solr becomes more powerful and easier to use, the accessibility of high quality data becomes key to unlocking the full potential of Solr’s search and analytic capabilities. Traditional approaches to acquiring data frequently involve a combination of homegrown tools and scripts, often requiring significant development efforts and becoming hard to change, hard to monitor, and hard to maintain. This talk will discuss how Apache NiFi addresses the above challenges and can be used to build production-grade data pipelines for Solr. We will start by giving an introduction to the core features of NiFi, such as visual command & control, dynamic prioritization, back-pressure, and provenance. We will then look at NiFi’s processors for integrating with Solr, covering topics such as ingesting and extracting data, interacting with secure Solr instances, and performance tuning. We will conclude by building a live dataflow from scratch, demonstrating how to prepare data and ingest to Solr.
SQL on Hadoop Batch, Interactive and Beyond.
Public Presentation showing history and where Hortonworks is looking to go with 100% Open Source Technology.
Apache Hive, Apache SparkSQL, Apache Phoenix, and Apache Druid
Introduction to Apache NiFi - Seattle Scalability Meetup - Saptak Sen
The document introduces Apache NiFi, an open source tool for data flow. It discusses how data from the Internet of Things is growing faster than can be consumed and highlights Apache NiFi's ability to securely collect, process and distribute this data in motion. The key concepts of Apache NiFi are described as managing the flow of information, ensuring data provenance, and securing the control and data planes. Example use cases are provided and the document demonstrates Apache NiFi's visual interface for creating data flows between processors to ingest, transform and output data in real-time.
The document discusses bringing multi-tenancy to Apache Zeppelin through the use of Apache Livy. Livy is an open-source REST interface that allows interacting with Spark from anywhere and enables features like multi-user sessions and security. It improves on previous versions of interactive analysis in Zeppelin by allowing custom user sessions through Livy and improving security and isolation between users through mechanisms like SPNEGO and impersonation. The integration of Livy provides multi-tenancy, security, and isolation for interactive analysis in Zeppelin.
[ 1 min ] [ 1.12 min ]
Hi There! I’m Sebastian
I’m a senior consultant here in EMEA and I focus on HDF technologies and Apache NiFi
Started using NiFi about three years ago in Australia.
Pre open-source days -- very different to now but that early learning has definitely helped
Using NiFi ever since those early days and really enjoyed the NiFi journey.
Fantastic product that really solves a lot of problems unlike anything else out there
One of the most transformational NiFi applications I have seen
[ 1 min ] [ 1 min ]
So first a quick poll:
Who has heard of NiFi?
Keep them up if you know what NiFi does?
Keep them up if you have ever used NiFi?
I summarise NiFi as the Data-Movement Production-line
[ 1 min ]
Data movement
Moving Data around
See on image
Business units
Machines
Bandwidth links
Availability times
Sounds easy but it isn’t
There are some tools that specialise in some areas, but *in general*, if you're moving data, NiFi is probably what you want
[ 1:50 min ]
Production line
Flow-based programming model
FlowFiles - the actual data (the product moving down the line)
Processors - the work units
Queues - the connectors
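To make the production-line picture concrete, here is a minimal sketch of a custom processor in Java, the kind of work unit that sits on the line. The class name and behaviour are illustrative rather than anything from the talk: it takes the next FlowFile off its incoming queue, stamps it, and transfers it to an outgoing connection.

```java
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Set;

// Hypothetical example processor: stamps each FlowFile with an attribute
// and appends a marker line to its content.
public class StampFlowFileProcessor extends AbstractProcessor {

    public static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .description("FlowFiles that were stamped successfully")
            .build();

    @Override
    public Set<Relationship> getRelationships() {
        return Collections.singleton(REL_SUCCESS);
    }

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
        FlowFile flowFile = session.get();       // take the next item off the incoming queue
        if (flowFile == null) {
            return;                              // nothing queued this scheduling cycle
        }
        // The "work unit": add an attribute and append to the content
        flowFile = session.putAttribute(flowFile, "stamped.by", getIdentifier());
        flowFile = session.append(flowFile, out ->
                out.write("processed\n".getBytes(StandardCharsets.UTF_8)));
        session.transfer(flowFile, REL_SUCCESS); // place it on the outgoing connection
    }
}
```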
[ 30 sec ]
Why would I use NiFi? Moving Data
What does it resemble? The production line
Don’t worry youll see more later
[ 1 min ] [ 2:40 ]
So armed with that ridiculously brief overview, I want to talk about how NiFi can be used to facilitate the agile philosophy of the self-organising team
What is a self-organising team? One where the team defines the how, and management defines the what.
Why is this good? Well, let's take an example:
Management says - at 3pm, take the red truck out the back, load all the apples and deliver across the bridge to the supermarket.
For this to go well, everything has to work, all the steps have to be known in advance.
What happens if … truck is blue? Take it? No fuel? Break down?
Putting the decisions into the hands of the people who are most qualified to make them - speeds up delivery and increases quality (often solving the problems in novel ways to boot)
Generally it increases quality and decreases time to completion
How does this work with Data Ingestion Pipelines?
[ 1 min ]
Using NiFi can help to speed up the process:
Less risk of losing perishable insights
Reduce costs
[ 2 min ]
You might have a very simplistic representation of the organisation like so
We’re agile, so we have cross-functional teams (maybe consisting of developers, sysadmins, data professionals and subject matter experts)
This depicts the information flow through the teams
This isn’t that special at the moment, looks like a normal ‘change managed, data backbone’ -everything goes through core
And anytime one of the teams needs a change, it goes through the core team.
For example, if the in-store team wants information from the supply chain team, they have to coordinate with core, who then talk to supply chain, with all the complexities that entails.
It's difficult to get core to prioritise your work because requests come to them thick and fast
Often leads to prioritisation by volume - or those who yell the loudest win
Often have to do testing in test infrastructure to verify that changes work
Then can submit change request
Only to find out that test isn’t actually like production, you exceed the change window and have to revert
Starting the change management cycle over again.
Even companies that don’t have these specific procedures are often bogged down in slow change cycles
BUT .. let’s assume these are all NiFi instances.
Website, core, in store etc all have their own NiFi instance - we haven’t changed the organisational structure at all
Just changed the tool that passes data around - so right now it looks the same. But let's replay the scenario from above - In Store wants supply chain data
[ 1 min ]
Direct Connect
S2S (Site-to-Site)
Easy to use once set up
Intuitive UI
Flow based
For simple data movement, anyone can use
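As a rough sketch of what "easy to use once set up" means in practice: the receiving team exposes an input port on their canvas and enables a handful of Site-to-Site entries in nifi.properties, and the sending team points a Remote Process Group at that instance. The hostname and port below are made-up illustrative values, not configuration from the talk.

```
# nifi.properties on the receiving NiFi instance (illustrative values)
nifi.remote.input.host=nifi-supplychain.example.internal
nifi.remote.input.socket.port=10443
nifi.remote.input.secure=true
nifi.remote.input.http.enabled=true
```

The sending side then drags a Remote Process Group onto its canvas, enters the receiver's URL, and connects a queue to the exposed remote input port - no change request through core needed.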
[ 1 min ]
Team to Team - No Core
Direct connect - straight to the team in question
No core
No change requests minimal process
Team affected by change are the ones who implement it
Decisions made by people most well positioned to do so
[ 1 min ]
Individual to Individual
Just 2 people!
[ 1 min ]
Not just techies
Could theoretically be anyone on the team - not just the NiFi guru
Whoever is using the data can get it
[ 1 min ]
Productionable
NiFi itself and these features are not gimmicks
They are the same robust and secure features used in all our deployments
[ 1 min ]
Immediate
Changes take effect immediately
Can quickly see and debug issues
[ 1 min ]
Not Just S2S!
[ 4 min ]
Flexible - almost any endpoint
Integrators' dream!
Options include: files, Hadoop, Kafka, plain TCP, HTTP, JMS, CDC, WebSockets, email, HBase, Mongo, SNMP, Solr, Splunk and even Twitter!
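For reference, each of those endpoints maps onto processors that ship with NiFi. The pairing below is a non-exhaustive sketch and exact processor names can differ between NiFi versions:

```
files       GetFile / PutFile, ListFile / FetchFile
Hadoop      PutHDFS / GetHDFS
Kafka       PublishKafka / ConsumeKafka
TCP         ListenTCP / PutTCP
HTTP        InvokeHTTP, ListenHTTP / HandleHttpRequest
JMS         PublishJMS / ConsumeJMS
CDC         CaptureChangeMySQL
WebSockets  ListenWebSocket / ConnectWebSocket
Email       ConsumeIMAP / ConsumePOP3 / PutEmail
HBase       GetHBase / PutHBaseJSON
Mongo       GetMongo / PutMongo
SNMP        GetSNMP / SetSNMP
Solr        GetSolr / PutSolrContentStream
Splunk      GetSplunk / PutSplunk
Twitter     GetTwitter
```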
[ 1 min ]
Now I want to look at improvement across the organisation as a whole
You've seen the improvements that NiFi can bring to a team if all teams have their own NiFi instance that is under their control
But this requires NiFi everywhere, which is the case for only a handful of organisations
So how do we get there? Well, we could just change overnight, right? Mandate that all teams use NiFi? Pump huge amounts of money into looking at the potential risks, mitigating those, rolling out changes and all the traditional stuff? But ...
[ 2 min ]
[ .5 min ]
Here we have a traditional data movement pipeline
The Buyers are looking at historical sales trends and trying to see if their predictions were correct. For this to happen we need to go all the way back to the warehouse database:
Start at the database, the warehouse team want to get a report on all the items in the database
Ops probably does some sort of manual process - logging in to a firewalled machine, bringing up a shell and executing the required report
Then placed on the Shared Drive
The warehouse team pick this up and load into excel
Check that things look OK and pass to the Supply Chain team
However supply chain don’t sit in the warehouse and the Shared Drive is different. So it’s placed on an SFTP server
Its joined with other reports from other warehouses, reconciled and place on the second HQ internal SAN
Picked up by the buyers
They don’t need to modify the report, simply ingest the data for analysis
Do so and pass the results to the business by email.
All the markings in
While obviously fictional, these sorts of ingestion pipelines are the norm, not the exception; they are especially hard to root out as each team is generally siloed and doesn't have visibility of the process as a whole
Let's pick a team that has decided to try out NiFi for automating some of this movement - the supply chain team.
We go ahead and install NiFi inside the Supply chain team
We change just one task at first: using the ListSFTP/FetchSFTP processors to watch the server and
pick up the warehouse report
And place it in a location where it can be worked on
So we have just automated one step
The employee looking after this step now knows that the file will appear, ready to be worked on
To take care of another simple step: once we receive this file, we can send an email (PutEmail) to the supply chain team notifying them of its arrival.
Now the warehouse employee is freed from doing that step.
Great! That's two down. There's still the manual analysis, but that's what humans are good at, so let them do that.
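For teams that prefer scripting to clicking, the same step can also be driven through NiFi's REST API. A minimal sketch, assuming a 1.x-style API, a placeholder process-group id and made-up SFTP details; FetchSFTP and PutEmail would be added and connected the same way, though in practice most people just drag the processors onto the canvas.

    # Hedged sketch: add a ListSFTP processor to an existing process group.
    # The process-group id, host and path are placeholders.
    import requests

    NIFI_API = "https://nifi-supplychain.example.com:8443/nifi-api"
    PROCESS_GROUP_ID = "<process-group-uuid>"  # hypothetical

    payload = {
        "revision": {"version": 0},  # 0 for a brand-new component
        "component": {
            "type": "org.apache.nifi.processors.standard.ListSFTP",
            "position": {"x": 100.0, "y": 100.0},
            "config": {
                "properties": {
                    "Hostname": "sftp.warehouse.example.com",
                    "Username": "supplychain",
                    "Remote Path": "/reports/daily",
                }
            },
        },
    }

    resp = requests.post(
        f"{NIFI_API}/process-groups/{PROCESS_GROUP_ID}/processors",
        json=payload,
        verify=False,  # use real certificates and authentication in practice
    )
    resp.raise_for_status()
    print("Created processor:", resp.json()["id"])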
Now the warehouse guy comments that this is a pretty cool thing
You say: well, if you had your own NiFi, you could do the same thing!
So you convince them to try setting up their own NiFi
At first, it just automates one step: moving the file to the SFTP server
That's great for the warehouse guys - one less thing to do
But now we can eliminate the clunky file → SFTP → pick up → put down chain with NiFi S2S
[ 10 sec ]
We've seen what benefits can come from the NiFi web
But why NiFi?
What features make it a good fit?
[ 2 min ]
Cross-functional teams - wide skill sets
We want anyone to come in with minimal training and be in control of their own data
Most people are aware of flow charts and Graphical web interfaces
Don’t need special software - web browser
Changes take effect immediately:
Allows faster feedback cycles
Avoids long or specialised deployment cycles (e.g. needing to know how to use git, Jenkins or pull requests)
Lots of good visual cues to help show where issues are and how to resolve them
[ 1:30 min ]
In many systems it's difficult to get a snapshot of the stages
You would have to get a sample, transport it to somewhere you could access and then probably download it
This is often just more work for the developer
Fast feedback
Again fast feedback!
Can see exactly what has changed
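In NiFi that snapshot is just the queue-listing feature, which is one right-click in the UI; for completeness, a hedged sketch of driving it from Python (the connection id is a placeholder, endpoint and field names as exposed by the 1.x REST API).

    # Hedged sketch: list the FlowFiles queued on one connection to get a
    # snapshot of that stage. Connection id is a placeholder.
    import time
    import requests

    NIFI_API = "https://nifi.example.com:8443/nifi-api"
    CONNECTION_ID = "<connection-uuid>"  # hypothetical

    # Kick off a listing request, then poll until NiFi has finished it.
    req = requests.post(
        f"{NIFI_API}/flowfile-queues/{CONNECTION_ID}/listing-requests",
        verify=False,
    ).json()["listingRequest"]

    while not req["finished"]:
        time.sleep(0.5)
        req = requests.get(
            f"{NIFI_API}/flowfile-queues/{CONNECTION_ID}/listing-requests/{req['id']}",
            verify=False,
        ).json()["listingRequest"]

    for ff in req["flowFileSummaries"]:
        print(ff["uuid"], ff["filename"], ff["size"])
    # (In practice you would DELETE the listing request when done.)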
(12:20 to here from slide 41)
So as I’ve mentioned previously, S2S is a very nice way to communicate with NiFi clusters
Only have to open one port
Easy to configure
Ties in nicely with the UI
Maintains Attributes!
Balances load across the cluster
But as previously mentioned
Integrates with bloody everything!
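And to show what the team-to-team "direct connect" looks like if scripted rather than dragged onto the canvas: a hedged sketch that adds a Remote Process Group pointing at the other team's instance (URLs and ids are placeholders; older releases use a single "targetUri" field instead of "targetUris").

    # Hedged sketch: create a Remote Process Group that points at another
    # team's NiFi so the two instances exchange data over Site-to-Site.
    import requests

    NIFI_API = "https://nifi-instore.example.com:8443/nifi-api"
    PROCESS_GROUP_ID = "<process-group-uuid>"  # hypothetical

    payload = {
        "revision": {"version": 0},
        "component": {
            "targetUris": "https://nifi-supplychain.example.com:8443/nifi",
            "position": {"x": 300.0, "y": 100.0},
        },
    }

    resp = requests.post(
        f"{NIFI_API}/process-groups/{PROCESS_GROUP_ID}/remote-process-groups",
        json=payload,
        verify=False,
    )
    resp.raise_for_status()
    print("Remote process group id:", resp.json()["id"])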
Can see anyone who has made changes
Strong authorisation: you can be sure that the people you have authorised are the ones making the changes
You can then reverse changes if required (there is no undo, but you can simply re-apply the old setting)
Can very tightly control who can do what on the system
E.g. have someone who can see the data but can't move it, and someone who can move it but not see it
Can now use process groups to group flows and secure them
Could have one group with access to one and one group with access to the other but no cross talk
Similar model to the NiFi Web but on one cluster
Has pros/cons but is possible
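As a hedged illustration of how fine-grained that control is: each process group carries its own view/modify policies, and (assuming the built-in file-based authorizer and the 1.x REST API; the id is a placeholder) you can inspect who is allowed to see a given group.

    # Hedged sketch: list who may *view* a particular process group.
    # Returns an error if no explicit policy exists or if an external
    # authorizer (e.g. Ranger) manages policies instead.
    import requests

    NIFI_API = "https://nifi.example.com:8443/nifi-api"
    PROCESS_GROUP_ID = "<process-group-uuid>"  # hypothetical

    resp = requests.get(
        f"{NIFI_API}/policies/read/process-groups/{PROCESS_GROUP_ID}",
        verify=False,
    )
    resp.raise_for_status()
    policy = resp.json()["component"]

    print("Can view this process group:")
    for user in policy.get("users", []):
        print("  user:", user["component"]["identity"])
    for group in policy.get("userGroups", []):
        print("  group:", group["component"]["identity"])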
Make managing these things easier
Can get started in 10 minutes, but it will also handle gigabytes of data and thousands of events per second
Can be secured