SlideShare a Scribd company logo

Building Data Pipelines for Solr with Apache NiFi

Bryan Bende
Bryan Bende
Bryan BendePrincipal Software Engineer at Cloudera

Apache NiFi is an easy to use, powerful, and reliable system to process and distribute data. It supports highly configurable directed graphs of data routing, transformation, and system mediation logic. Some of NiFi's key features include a web-based user interface for monitoring and controlling data flows, guaranteed delivery, data provenance, and easy extensibility through custom processor development. These features make NiFi a perfect candidate for building production quality data pipelines that interact with Apache Solr. This talk will demonstrate how to use a NiFi processor that delivers data to a Solr update handler, as well as a processor for extracting data from Solr on regular intervals for delivery to down-stream systems. In addition we will show how these processors can be combined with other built-in NiFi processors to solve a variety of use cases, including log aggregation, and indexing messages received from Kafka.

Building Data Pipelines for Solr with Apache NiFi

1 of 35
Download to read offline
Page1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Building Data Pipelines for Solr with
Apache NiFi
Bryan Bende – Member of Technical Staff
Page2 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Outline
• Introduction to Apache NiFi
• Solr Indexing & Update Handlers
• NiFi/Solr Integration
• Use Cases
Page3 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
About Me
• Member of Technical Staff at Hortonworks
• Apache NiFi Committer & PMC Member since June 2015
• Solr/Lucene user for several years
• Developed Solr integration for Apache NiFi 0.1.0 release
• Twitter: @bbende / Blog: bryanbende.com
Page4 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Introduction
Installing Solr and getting started - easy (extract, bin/solr start)
Defining a schema and configuring Solr - easy
Getting all of your incoming data into Solr - not as easy
A lot of time spent…
• Cleaning and parsing data
• Writing custom code/scripts
• Building approaches for monitoring and debugging
• Deploying updates to code/scripts for small changes
Need something to make this easier…
Page5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Introduction to Apache NiFi
Page6 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Apache NiFi
• Powerful and reliable system to process and
distribute data
• Directed graphs of data routing and transformation
• Web-based User Interface for creating, monitoring,
& controlling data flows
• Highly configurable - modify data flow at runtime,
dynamically prioritize data
• Data Provenance tracks data through entire
system
• Easily extensible through development of custom
components
[1] https://nifi.apache.org/

Recommended

Drone Data Flowing Through Apache NiFi
Drone Data Flowing Through Apache NiFiDrone Data Flowing Through Apache NiFi
Drone Data Flowing Through Apache NiFiTimothy Spann
 
Data ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiData ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiLev Brailovskiy
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...GetInData
 
Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)Kai Wähner
 
Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016Timothy Spann
 

More Related Content

What's hot

Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFiReal-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFiTimothy Spann
 
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseDataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseAldrin Piri
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsTimothy Spann
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiFlink Forward
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developersconfluent
 
Dataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataWorks Summit
 
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks
 
Getting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and GrafanaGetting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and GrafanaSyah Dwi Prihatmoko
 
[FFE19] Build a Flink AI Ecosystem
[FFE19] Build a Flink AI Ecosystem[FFE19] Build a Flink AI Ecosystem
[FFE19] Build a Flink AI EcosystemJiangjie Qin
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseDataWorks Summit
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoopclairvoyantllc
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimDatabricks
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explainedconfluent
 
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseUsing Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseDataWorks Summit
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019Timothy Spann
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka ConnectFrom Zero to Hero with Kafka Connect
From Zero to Hero with Kafka Connectconfluent
 
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native WayMigrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native WayDatabricks
 
Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018Grafana Labs
 

What's hot (20)

Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFiReal-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
 
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseDataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration Options
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developers
 
Dataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFi
 
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
 
Getting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and GrafanaGetting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and Grafana
 
[FFE19] Build a Flink AI Ecosystem
[FFE19] Build a Flink AI Ecosystem[FFE19] Build a Flink AI Ecosystem
[FFE19] Build a Flink AI Ecosystem
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoop
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
 
Apache Nifi Crash Course
Apache Nifi Crash CourseApache Nifi Crash Course
Apache Nifi Crash Course
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
 
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseUsing Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka ConnectFrom Zero to Hero with Kafka Connect
From Zero to Hero with Kafka Connect
 
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native WayMigrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
Migrating Airflow-based Apache Spark Jobs to Kubernetes – the Native Way
 
Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018
 

Viewers also liked

Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiManish Gupta
 
Apache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesApache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesIsheeta Sanghi
 
Integrating NiFi and Flink
Integrating NiFi and FlinkIntegrating NiFi and Flink
Integrating NiFi and FlinkBryan Bende
 
NJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep DiveNJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep DiveBryan Bende
 
Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop EcosystemApache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop EcosystemBryan Bende
 
The Avant-garde of Apache NiFi
The Avant-garde of Apache NiFiThe Avant-garde of Apache NiFi
The Avant-garde of Apache NiFiJoe Percivall
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkHortonworks
 
A Comparative Performance Evaluation of Apache Flink
A Comparative Performance Evaluation of Apache FlinkA Comparative Performance Evaluation of Apache Flink
A Comparative Performance Evaluation of Apache FlinkDongwon Kim
 
Monitoring as code
Monitoring as codeMonitoring as code
Monitoring as codeIcinga
 
Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)Spark Summit
 
How to choose the right Integration Framework - Apache Camel (JBoss, Talend),...
How to choose the right Integration Framework - Apache Camel (JBoss, Talend),...How to choose the right Integration Framework - Apache Camel (JBoss, Talend),...
How to choose the right Integration Framework - Apache Camel (JBoss, Talend),...Kai Wähner
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash courseTommaso Teofili
 

Viewers also liked (20)

Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFi
 
Apache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesApache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup Slides
 
Integrating NiFi and Flink
Integrating NiFi and FlinkIntegrating NiFi and Flink
Integrating NiFi and Flink
 
NJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep DiveNJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep Dive
 
Integrating Apache Spark and NiFi for Data Lakes
Integrating Apache Spark and NiFi for Data LakesIntegrating Apache Spark and NiFi for Data Lakes
Integrating Apache Spark and NiFi for Data Lakes
 
Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop EcosystemApache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem
 
The Avant-garde of Apache NiFi
The Avant-garde of Apache NiFiThe Avant-garde of Apache NiFi
The Avant-garde of Apache NiFi
 
IOT, Streaming Analytics and Machine Learning
IOT, Streaming Analytics and Machine Learning IOT, Streaming Analytics and Machine Learning
IOT, Streaming Analytics and Machine Learning
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Tuning Solr & Pipeline for Logs
Tuning Solr & Pipeline for LogsTuning Solr & Pipeline for Logs
Tuning Solr & Pipeline for Logs
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache Flink
 
The Elephant in the Clouds
The Elephant in the CloudsThe Elephant in the Clouds
The Elephant in the Clouds
 
A Comparative Performance Evaluation of Apache Flink
A Comparative Performance Evaluation of Apache FlinkA Comparative Performance Evaluation of Apache Flink
A Comparative Performance Evaluation of Apache Flink
 
Monitoring as code
Monitoring as codeMonitoring as code
Monitoring as code
 
Building a Smarter Home with Apache NiFi and Spark
Building a Smarter Home with Apache NiFi and SparkBuilding a Smarter Home with Apache NiFi and Spark
Building a Smarter Home with Apache NiFi and Spark
 
Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)
 
From Zero to Data Flow in Hours with Apache NiFi
From Zero to Data Flow in Hours with Apache NiFiFrom Zero to Data Flow in Hours with Apache NiFi
From Zero to Data Flow in Hours with Apache NiFi
 
How to choose the right Integration Framework - Apache Camel (JBoss, Talend),...
How to choose the right Integration Framework - Apache Camel (JBoss, Talend),...How to choose the right Integration Framework - Apache Camel (JBoss, Talend),...
How to choose the right Integration Framework - Apache Camel (JBoss, Talend),...
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
 
Dataflow with Apache NiFi - Crash Course - HS16SJ
Dataflow with Apache NiFi - Crash Course - HS16SJDataflow with Apache NiFi - Crash Course - HS16SJ
Dataflow with Apache NiFi - Crash Course - HS16SJ
 

Similar to Building Data Pipelines for Solr with Apache NiFi

Integrating Apache NiFi and Apache Apex
Integrating Apache NiFi and Apache Apex Integrating Apache NiFi and Apache Apex
Integrating Apache NiFi and Apache Apex Apache Apex
 
Integrating NiFi and Apex
Integrating NiFi and ApexIntegrating NiFi and Apex
Integrating NiFi and ApexBryan Bende
 
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache FalconDriving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache FalconDataWorks Summit
 
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015 Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015 Seetharam Venkatesh
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIsheeta Sanghi
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIsheeta Sanghi
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIsheeta Sanghi
 
Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search Hortonworks
 
WebSocket in Enterprise Applications 2015
WebSocket in Enterprise Applications 2015WebSocket in Enterprise Applications 2015
WebSocket in Enterprise Applications 2015Pavel Bucek
 
State of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & CommunityState of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & CommunityAccumulo Summit
 
You Can't Search Without Data
You Can't Search Without DataYou Can't Search Without Data
You Can't Search Without DataBryan Bende
 
[253] apache ni fi
[253] apache ni fi[253] apache ni fi
[253] apache ni fiNAVER D2
 
Apache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming MeetupApache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming MeetupJoseph Witt
 
BigData Techcon - Beyond Messaging with Apache NiFi
BigData Techcon - Beyond Messaging with Apache NiFiBigData Techcon - Beyond Messaging with Apache NiFi
BigData Techcon - Beyond Messaging with Apache NiFiAldrin Piri
 
HDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical WorkshopHDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical WorkshopHortonworks
 
Stinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of HortonworksStinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of HortonworksData Con LA
 
Mission to NARs with Apache NiFi
Mission to NARs with Apache NiFiMission to NARs with Apache NiFi
Mission to NARs with Apache NiFiHortonworks
 

Similar to Building Data Pipelines for Solr with Apache NiFi (20)

Nifi workshop
Nifi workshopNifi workshop
Nifi workshop
 
Integrating Apache NiFi and Apache Apex
Integrating Apache NiFi and Apache Apex Integrating Apache NiFi and Apache Apex
Integrating Apache NiFi and Apache Apex
 
Integrating NiFi and Apex
Integrating NiFi and ApexIntegrating NiFi and Apex
Integrating NiFi and Apex
 
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache FalconDriving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
 
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015 Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache Flink
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache Flink
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache Flink
 
Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search
 
WebSocket in Enterprise Applications 2015
WebSocket in Enterprise Applications 2015WebSocket in Enterprise Applications 2015
WebSocket in Enterprise Applications 2015
 
State of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & CommunityState of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & Community
 
You Can't Search Without Data
You Can't Search Without DataYou Can't Search Without Data
You Can't Search Without Data
 
Apache Nifi Crash Course
Apache Nifi Crash CourseApache Nifi Crash Course
Apache Nifi Crash Course
 
[253] apache ni fi
[253] apache ni fi[253] apache ni fi
[253] apache ni fi
 
Apache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming MeetupApache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming Meetup
 
BigData Techcon - Beyond Messaging with Apache NiFi
BigData Techcon - Beyond Messaging with Apache NiFiBigData Techcon - Beyond Messaging with Apache NiFi
BigData Techcon - Beyond Messaging with Apache NiFi
 
HDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical WorkshopHDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical Workshop
 
Stinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of HortonworksStinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of Hortonworks
 
Mission to NARs with Apache NiFi
Mission to NARs with Apache NiFiMission to NARs with Apache NiFi
Mission to NARs with Apache NiFi
 
Falcon Meetup
Falcon Meetup Falcon Meetup
Falcon Meetup
 

More from Bryan Bende

Apache NiFi SDLC Improvements
Apache NiFi SDLC ImprovementsApache NiFi SDLC Improvements
Apache NiFi SDLC ImprovementsBryan Bende
 
Apache NiFi Meetup - Introduction to NiFi Registry
Apache NiFi Meetup - Introduction to NiFi RegistryApache NiFi Meetup - Introduction to NiFi Registry
Apache NiFi Meetup - Introduction to NiFi RegistryBryan Bende
 
Devnexus 2018 - Let Your Data Flow with Apache NiFi
Devnexus 2018 - Let Your Data Flow with Apache NiFiDevnexus 2018 - Let Your Data Flow with Apache NiFi
Devnexus 2018 - Let Your Data Flow with Apache NiFiBryan Bende
 
Apache NiFi Record Processing
Apache NiFi Record ProcessingApache NiFi Record Processing
Apache NiFi Record ProcessingBryan Bende
 
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFi
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFiTaking DataFlow Management to the Edge with Apache NiFi/MiNiFi
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFiBryan Bende
 
Document Similarity with Cloud Computing
Document Similarity with Cloud ComputingDocument Similarity with Cloud Computing
Document Similarity with Cloud ComputingBryan Bende
 
Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Bryan Bende
 

More from Bryan Bende (7)

Apache NiFi SDLC Improvements
Apache NiFi SDLC ImprovementsApache NiFi SDLC Improvements
Apache NiFi SDLC Improvements
 
Apache NiFi Meetup - Introduction to NiFi Registry
Apache NiFi Meetup - Introduction to NiFi RegistryApache NiFi Meetup - Introduction to NiFi Registry
Apache NiFi Meetup - Introduction to NiFi Registry
 
Devnexus 2018 - Let Your Data Flow with Apache NiFi
Devnexus 2018 - Let Your Data Flow with Apache NiFiDevnexus 2018 - Let Your Data Flow with Apache NiFi
Devnexus 2018 - Let Your Data Flow with Apache NiFi
 
Apache NiFi Record Processing
Apache NiFi Record ProcessingApache NiFi Record Processing
Apache NiFi Record Processing
 
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFi
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFiTaking DataFlow Management to the Edge with Apache NiFi/MiNiFi
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFi
 
Document Similarity with Cloud Computing
Document Similarity with Cloud ComputingDocument Similarity with Cloud Computing
Document Similarity with Cloud Computing
 
Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014
 

Recently uploaded

Enabling Enterprise-wide OT Data access with Matrikon Data Broker.pdf
Enabling Enterprise-wide OT Data access  with Matrikon Data Broker.pdfEnabling Enterprise-wide OT Data access  with Matrikon Data Broker.pdf
Enabling Enterprise-wide OT Data access with Matrikon Data Broker.pdfJohn Archer
 
India's_Generative_AI_Startup_Landscape_Report_2023_Inc42 (1).pdf
India's_Generative_AI_Startup_Landscape_Report_2023_Inc42 (1).pdfIndia's_Generative_AI_Startup_Landscape_Report_2023_Inc42 (1).pdf
India's_Generative_AI_Startup_Landscape_Report_2023_Inc42 (1).pdfgranitesrijan
 
Microsoft Dynamics 365 IA - Copilot/ Fabric
Microsoft Dynamics 365 IA - Copilot/ FabricMicrosoft Dynamics 365 IA - Copilot/ Fabric
Microsoft Dynamics 365 IA - Copilot/ FabricJuan Fabian
 
Get Your Hands Off the Teams Work.pdf
Get Your Hands Off the Teams Work.pdfGet Your Hands Off the Teams Work.pdf
Get Your Hands Off the Teams Work.pdfAngela Johnson
 
SATToSE_2023_Presentation_slideshare.pdf
SATToSE_2023_Presentation_slideshare.pdfSATToSE_2023_Presentation_slideshare.pdf
SATToSE_2023_Presentation_slideshare.pdfnatarajan8993
 
unit I lecture 3 - Software Process Models.pdf
unit I lecture 3 - Software Process Models.pdfunit I lecture 3 - Software Process Models.pdf
unit I lecture 3 - Software Process Models.pdfStephenTec
 
Open Sprintera (Where Open Source Sparks a Sprint of Possibilities)
Open Sprintera (Where Open Source Sparks a Sprint of Possibilities)Open Sprintera (Where Open Source Sparks a Sprint of Possibilities)
Open Sprintera (Where Open Source Sparks a Sprint of Possibilities)GDSCNiT
 
unit I lecture 2 - Software Engineering Ethics - Software Process.pdf
unit I lecture 2 - Software Engineering Ethics - Software Process.pdfunit I lecture 2 - Software Engineering Ethics - Software Process.pdf
unit I lecture 2 - Software Engineering Ethics - Software Process.pdfStephenTec
 
BotSE2022-Natarajan.pdf
BotSE2022-Natarajan.pdfBotSE2022-Natarajan.pdf
BotSE2022-Natarajan.pdfnatarajan8993
 
owasp top 10 security risk categories and CWE
owasp top 10 security risk categories and CWEowasp top 10 security risk categories and CWE
owasp top 10 security risk categories and CWEArun Voleti
 
Microsoft 365 De Security pdf
Microsoft 365 De Security pdfMicrosoft 365 De Security pdf
Microsoft 365 De Security pdfMarkus Moeller
 
unit I lecture 5 - Software Development Life Cycle.pdf
unit I lecture 5 - Software Development Life Cycle.pdfunit I lecture 5 - Software Development Life Cycle.pdf
unit I lecture 5 - Software Development Life Cycle.pdfStephenTec
 
App Builder - Hierarchical Data Apps.pptx
App Builder - Hierarchical Data Apps.pptxApp Builder - Hierarchical Data Apps.pptx
App Builder - Hierarchical Data Apps.pptxPoojitha B
 
Slide Deck - Milestone 9 alx mils .pptx
Slide Deck  - Milestone 9 alx mils .pptxSlide Deck  - Milestone 9 alx mils .pptx
Slide Deck - Milestone 9 alx mils .pptxYassineBissaoui1
 
unit 1 lecture 1 - Introduction - Software Engineering Myths.pdf
unit 1 lecture 1 - Introduction - Software Engineering Myths.pdfunit 1 lecture 1 - Introduction - Software Engineering Myths.pdf
unit 1 lecture 1 - Introduction - Software Engineering Myths.pdfStephenTec
 
Manual de la Mezcladora SoundCraft Notepad -12Fx
Manual de la Mezcladora SoundCraft Notepad -12FxManual de la Mezcladora SoundCraft Notepad -12Fx
Manual de la Mezcladora SoundCraft Notepad -12Fxjavierdavidvelasco17
 
unit I lecture 4 - AGILE DEVELOPMENT AND PLAN-DRIVEN.pdf
unit I lecture 4 - AGILE DEVELOPMENT AND PLAN-DRIVEN.pdfunit I lecture 4 - AGILE DEVELOPMENT AND PLAN-DRIVEN.pdf
unit I lecture 4 - AGILE DEVELOPMENT AND PLAN-DRIVEN.pdfStephenTec
 
100 TOOLS TO MEASURE AND ANALYSE YOUR DIGITAL MARKETING EFFORTS
100 TOOLS TO MEASURE AND ANALYSE YOUR DIGITAL MARKETING EFFORTS100 TOOLS TO MEASURE AND ANALYSE YOUR DIGITAL MARKETING EFFORTS
100 TOOLS TO MEASURE AND ANALYSE YOUR DIGITAL MARKETING EFFORTSi-engage
 

Recently uploaded (20)

Enabling Enterprise-wide OT Data access with Matrikon Data Broker.pdf
Enabling Enterprise-wide OT Data access  with Matrikon Data Broker.pdfEnabling Enterprise-wide OT Data access  with Matrikon Data Broker.pdf
Enabling Enterprise-wide OT Data access with Matrikon Data Broker.pdf
 
India's_Generative_AI_Startup_Landscape_Report_2023_Inc42 (1).pdf
India's_Generative_AI_Startup_Landscape_Report_2023_Inc42 (1).pdfIndia's_Generative_AI_Startup_Landscape_Report_2023_Inc42 (1).pdf
India's_Generative_AI_Startup_Landscape_Report_2023_Inc42 (1).pdf
 
Microsoft Dynamics 365 IA - Copilot/ Fabric
Microsoft Dynamics 365 IA - Copilot/ FabricMicrosoft Dynamics 365 IA - Copilot/ Fabric
Microsoft Dynamics 365 IA - Copilot/ Fabric
 
Get Your Hands Off the Teams Work.pdf
Get Your Hands Off the Teams Work.pdfGet Your Hands Off the Teams Work.pdf
Get Your Hands Off the Teams Work.pdf
 
SATToSE_2023_Presentation_slideshare.pdf
SATToSE_2023_Presentation_slideshare.pdfSATToSE_2023_Presentation_slideshare.pdf
SATToSE_2023_Presentation_slideshare.pdf
 
unit I lecture 3 - Software Process Models.pdf
unit I lecture 3 - Software Process Models.pdfunit I lecture 3 - Software Process Models.pdf
unit I lecture 3 - Software Process Models.pdf
 
Open Sprintera (Where Open Source Sparks a Sprint of Possibilities)
Open Sprintera (Where Open Source Sparks a Sprint of Possibilities)Open Sprintera (Where Open Source Sparks a Sprint of Possibilities)
Open Sprintera (Where Open Source Sparks a Sprint of Possibilities)
 
unit I lecture 2 - Software Engineering Ethics - Software Process.pdf
unit I lecture 2 - Software Engineering Ethics - Software Process.pdfunit I lecture 2 - Software Engineering Ethics - Software Process.pdf
unit I lecture 2 - Software Engineering Ethics - Software Process.pdf
 
BotSE2022-Natarajan.pdf
BotSE2022-Natarajan.pdfBotSE2022-Natarajan.pdf
BotSE2022-Natarajan.pdf
 
owasp top 10 security risk categories and CWE
owasp top 10 security risk categories and CWEowasp top 10 security risk categories and CWE
owasp top 10 security risk categories and CWE
 
Microsoft 365 De Security pdf
Microsoft 365 De Security pdfMicrosoft 365 De Security pdf
Microsoft 365 De Security pdf
 
unit I lecture 5 - Software Development Life Cycle.pdf
unit I lecture 5 - Software Development Life Cycle.pdfunit I lecture 5 - Software Development Life Cycle.pdf
unit I lecture 5 - Software Development Life Cycle.pdf
 
Features of IETM Software -Code and Pixels
Features of IETM Software -Code and PixelsFeatures of IETM Software -Code and Pixels
Features of IETM Software -Code and Pixels
 
App Builder - Hierarchical Data Apps.pptx
App Builder - Hierarchical Data Apps.pptxApp Builder - Hierarchical Data Apps.pptx
App Builder - Hierarchical Data Apps.pptx
 
Slide Deck - Milestone 9 alx mils .pptx
Slide Deck  - Milestone 9 alx mils .pptxSlide Deck  - Milestone 9 alx mils .pptx
Slide Deck - Milestone 9 alx mils .pptx
 
unit 1 lecture 1 - Introduction - Software Engineering Myths.pdf
unit 1 lecture 1 - Introduction - Software Engineering Myths.pdfunit 1 lecture 1 - Introduction - Software Engineering Myths.pdf
unit 1 lecture 1 - Introduction - Software Engineering Myths.pdf
 
Manual de la Mezcladora SoundCraft Notepad -12Fx
Manual de la Mezcladora SoundCraft Notepad -12FxManual de la Mezcladora SoundCraft Notepad -12Fx
Manual de la Mezcladora SoundCraft Notepad -12Fx
 
unit I lecture 4 - AGILE DEVELOPMENT AND PLAN-DRIVEN.pdf
unit I lecture 4 - AGILE DEVELOPMENT AND PLAN-DRIVEN.pdfunit I lecture 4 - AGILE DEVELOPMENT AND PLAN-DRIVEN.pdf
unit I lecture 4 - AGILE DEVELOPMENT AND PLAN-DRIVEN.pdf
 
100 TOOLS TO MEASURE AND ANALYSE YOUR DIGITAL MARKETING EFFORTS
100 TOOLS TO MEASURE AND ANALYSE YOUR DIGITAL MARKETING EFFORTS100 TOOLS TO MEASURE AND ANALYSE YOUR DIGITAL MARKETING EFFORTS
100 TOOLS TO MEASURE AND ANALYSE YOUR DIGITAL MARKETING EFFORTS
 
Importance Of Smaket In Your Buussiness
Importance Of Smaket In Your BuussinessImportance Of Smaket In Your Buussiness
Importance Of Smaket In Your Buussiness
 

Building Data Pipelines for Solr with Apache NiFi

  • 1. Page1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Building Data Pipelines for Solr with Apache NiFi Bryan Bende – Member of Technical Staff
  • 2. Page2 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Outline • Introduction to Apache NiFi • Solr Indexing & Update Handlers • NiFi/Solr Integration • Use Cases
  • 3. Page3 © Hortonworks Inc. 2011 – 2015. All Rights Reserved About Me • Member of Technical Staff at Hortonworks • Apache NiFi Committer & PMC Member since June 2015 • Solr/Lucene user for several years • Developed Solr integration for Apache NiFi 0.1.0 release • Twitter: @bbende / Blog: bryanbende.com
  • 4. Page4 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Introduction Installing Solr and getting started - easy (extract, bin/solr start) Defining a schema and configuring Solr - easy Getting all of your incoming data into Solr - not as easy A lot of time spent… • Cleaning and parsing data • Writing custom code/scripts • Building approaches for monitoring and debugging • Deploying updates to code/scripts for small changes Need something to make this easier…
  • 5. Page5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Introduction to Apache NiFi
  • 6. Page6 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Apache NiFi • Powerful and reliable system to process and distribute data • Directed graphs of data routing and transformation • Web-based User Interface for creating, monitoring, & controlling data flows • Highly configurable - modify data flow at runtime, dynamically prioritize data • Data Provenance tracks data through entire system • Easily extensible through development of custom components [1] https://nifi.apache.org/
  • 7. Page7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved NiFi - Terminology FlowFile • Unit of data moving through the system • Content + Attributes (key/value pairs) Processor • Performs the work, can access FlowFiles Connection • Links between processors • Queues that can be dynamically prioritized Process Group • Set of processors and their connections • Receive data via input ports, send data via output ports
  • 8. Page8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved NiFi - User Interface • Drag and drop processors to build a flow • Start, stop, and configure components in real time • View errors and corresponding error messages • View statistics and health of data flow • Create templates of common processor & connections
  • 9. Page9 © Hortonworks Inc. 2011 – 2015. All Rights Reserved NiFi - Provenance • Tracks data at each point as it flows through the system • Records, indexes, and makes events available for display • Handles fan-in/fan-out, i.e. merging and splitting data • View attributes and content at given points in time
  • 10. Page10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved NiFi - Queue Prioritization • Configure a prioritizer per connection • Determine what is important for your data – time based, arrival order, importance of a data set • Funnel many connections down to a single connection to prioritize across data sets • Develop your own prioritizer if needed
  • 11. Page11 © Hortonworks Inc. 2011 – 2015. All Rights Reserved NiFi - Extensibility Built from the ground up with extensions in mind Service-loader pattern for… • Processors • Controller Services • Reporting Tasks • Prioritizers Extensions packaged as NiFi Archives (NARs) • Deploy NiFi lib directory and restart • Provides ClassLoader isolation • Same model as standard components
  • 12. Page12 © Hortonworks Inc. 2011 – 2015. All Rights Reserved NiFi - Architecture OS/Host JVM Flow Controller Web Server Processor 1 Extension N FlowFile Repository Content Repository Provenance Repository Local Storage OS/Host JVM Flow Controller Web Server Processor 1 Extension N FlowFile Repository Content Repository Provenance Repository Local Storage OS/Host JVM NiFi Cluster Manager – Request Replicator Web Server Master NiFi Cluster Manager (NCM) OS/Host JVM Flow Controller Web Server Processor 1 Extension N FlowFile Repository Content Repository Provenance Repository Local Storage Slaves NiFi Nodes
  • 13. Page13 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Solr Indexing & Update Handlers
  • 14. Page14 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Solr – Indexing Data Update Handlers • XML, JSON, CSV • https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers Clients • Java, PHP, Python, Ruby, Scala, Perl, and more • https://wiki.apache.org/solr/IntegratingSolr
  • 15. Page15 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Solr Update Handlers - XML Adding documents <add> <doc> <field name=”foo”>bad</field> </doc> </add> Deleting documents <delete> <id>1234567</id> <query>foo:bar</query> </delete> Other Operations <commit waitSearcher="false"/> <commit waitSearcher="false" expungeDeletes="true"/> <optimize waitSearcher="false"/>
  • 16. Page16 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Solr Update Handlers - JSON Solr-Style JSON… Add Documents [ { "id": "1”, "title": "Doc 1” }, { "id": "2”, "title": "Doc 2” } ] Commands { "add": { "doc": { "id": "1”, "title": { "boost": 2.3, "value": "Doc1” } } } }
  • 17. Page17 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Solr Update Handlers - JSON Custom JSON • Transform custom JSON based on Solr schema • Define paths to split JSON into multiple Solr documents • Field mappings from JSON field name to Solr field name Produces two Solr documents: - John, Math, term1, 90 - John, Biology, term1, 86 split=/exams& f=name:/name& f=subject:/exams/subject& f=test:/exams/test& f=marks:/exams/marks { "name": "John", "exams": [ { "subject": "Math", "test" : "term1", "marks" : 90}, { "subject": "Biology", "test" : "term1", "marks" : 86} ] }
  • 18. Page18 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Solr Update Handlers - CSV /update with Content-Type:application/csv Important parameters: • separator • trim • header • fieldnames • skip • rowid
  • 19. Page19 © Hortonworks Inc. 2011 – 2015. All Rights Reserved SolrJ Client SolrDocument Update SolrInputDocument doc = new SolrInputDocument(); doc.addField("first", "bob"); doc.addField("last", "smith"); solrClient.add(doc); ContentStream Update ContentStreamUpdateRequest request = new ContentStreamUpdateRequest( "/update/json/docs"); request.setParam("json.command", "false"); request.setParam("split", "/exams"); request.getParams().add("f", "name:/name"); request.getParams().add("f", "subject:/exams/subject"); request.getParams().add("f","test:/exams/test"); request.getParams().add("f","marks:/exams/marks"); request.addContentStream(new ContentStream...);
  • 20. Page20 © Hortonworks Inc. 2011 – 2015. All Rights Reserved NiFi/Solr Integration
  • 21. Page21 © Hortonworks Inc. 2011 – 2015. All Rights Reserved NiFi Solr Processors • Support Solr Cloud and stand-alone Solr instances • Leverage SolrJ – CloudSolrClient & HttpSolrClient • Extract new documents based on a date/time field – GetSolr • Stream FlowFile content to an update handler - PutSolrContentStream
  • 22. Page22 © Hortonworks Inc. 2011 – 2015. All Rights Reserved PutSolrContentStream • Choose Solr Type - Cloud or Standard • Specify ZooKeeper hosts, or the Solr URL • Specify a collection if using Solr Cloud • Specify the Solr path for the ContentStream • Dynamic properties sent as key/value pairs on the request • Relationships for success, failure, and connection_failure
  • 23. Page23 © Hortonworks Inc. 2011 – 2015. All Rights Reserved GetSolr • Solr Type, Solr Location, and Collection are the same as PutSolr • Specify a query to run on each execution of the processor • Specify a sort clause and a date field used to filter results • Schedule processor to run on a cron, or timer • Retrieves documents with ‘Date Field’ greater than time of last execution • Produces output in SolrJ XML
  • 24. Page24 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Use Cases
  • 25. Page25 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Use Cases – Index JSON 1. Pull in Tweets using Twitter API 2. Extract language and text into FlowFile attributes 3. Get non-empty English tweets ${twitter.text:isEmpty():not():and( ${twitter.lang:equals("en")})} 4. Merge together JSON documents based on quantity, or time 5. Use dynamic field mappings to select fields for indexing:
  • 26. Page26 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Use Cases – Issue Commands 1. Generate a FlowFile on a cron, or timer, to initiate an action 2. Replace the contents of the FlowFile with a Solr command <delete> <query> timestamp:[* TO NOW-1HOUR] </query> </delete> 3. Send the command to the appropriate update handler
  • 27. Page27 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Use Cases – Multiple Collections 1. Set a FlowFile attribute containing the name of a Solr collection 2. Use expression language when setting the Collection property on the Solr processor: ${solr.collection} Note: • If merging documents, merge per collection in this case • Current bug preventing this scenario from working: https://issues.apache.org/jira/browse/NIFI-959
  • 28. Page28 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Use Cases – Log Aggregation 1. Listen for log events over UDP on a given port • Set ‘Flow File Per Datagram’ to true 2. Send JSON log events • Syslog UDP forwarding • Logback/log4j UDP appenders 3. Merge JSON events together based on size, or time 4. Stream JSON update to Solr http://bryanbende.com/development/2015/05/17/c ollecting-logs-with-apache-nifi/
  • 29. Page29 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Use Cases – Index Avro 1. Receive an Avro datafile with binary encoding 2. Convert Avro to JSON using built in ConvertAvroToJSON processor 3. Stream JSON documents to Solr
  • 30. Page30 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Use Cases – Index a Relational Database 1. GenerateFlowFile acts a timer to trigger ExecuteSQL (Future plans to not require in an incoming FlowFile to ExecuteSQL NIFI-932) 2. ExecuteSQL performs a SQL query and streams the results as an Avro datafile Use expression language to construct a dynamic date range: ${now():toNumber():minus(60000) :format(‘YYYY-MM-DD’} 3. Convert Avro to JSON using built in ConvertAvroToJSON processor 4. Stream JSON update to Solr
  • 31. Page31 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Use Case – Extraction in a Cluster 1. Schedule GetSolr to run on Primary Node 2. Send results to a Remote Process Group pointing back to self 3. Data gets redistributed to “Solr XML Docs” Input Ports across cluster 4. Perform further processing on each node
  • 32. Page32 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Future Work Unofficial ideas… PutSolrDocument • Parse FlowFile InputStream into one or more SolrDocuments • Allow developers to provide “FlowFile to SolrDocument” converter PutSolrAttributes • Create a SolrDocument from FlowFile attributes • Processor properties specify attributes to include/exclude Distribute & Execute Solr Commands • DistributeSolrCommand learns about Solr shards and produces commands per shard • ExecuteSolrCommand performs action based on the incoming command
  • 33. Page33 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Summary Resources • Apache NiFi Mailing Lists – https://nifi.apache.org/mailing_lists.html • Apache NiFi Documentation – https://nifi.apache.org/docs.html • Getting started developing extensions – https://cwiki.apache.org/confluence/display/NIFI/Maven+Projects+for+Extensions – https://nifi.apache.org/developer-guide.html Contact Info: • Email: bbende@hortonworks.com • Twitter: @bbende
  • 34. Page34 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Sources [1] https://nifi.apache.org/ [2] https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers [3] https://wiki.apache.org/solr/IntegratingSolr [4] http://lucidworks.com/blog/indexing-custom-json-data/
  • 35. Page35 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Thank you