From Zero to Data Flow in Hours with Apache NiFi
DataWorks Summit/Hadoop Summit
1.
From Zero to Data Flow in Hours with Apache NiFi
Hadoop Summit – San Jose 2016
Chris Herrera, Schlumberger
Copyright © 2016, Schlumberger, All rights reserved.
2.
Agenda
• Why composable data flow is important to the drilling industry
• Current state of the system
• The breaking point that led to the new system
• An unexpected workflow in testing
• How we are using it today
• What's next
3.
Legal Notices
This presentation is for informational purposes only. STATEMENTS AND OPINIONS EXPRESSED IN THIS PRESENTATION ARE THOSE OF THE PRESENTER AND DO NOT REFLECT THE OPINIONS OF SCHLUMBERGER. SCHLUMBERGER AND THE PRESENTER HEREBY DISCLAIM ANY REPRESENTATIONS AND/OR WARRANTIES EXPRESS OR IMPLIED. SCHLUMBERGER AND THE PRESENTER HEREBY DISCLAIM ANY RESPONSIBILITY FOR THE CONTENT, ACCURACY, AND/OR COMPLETENESS OF THE INFORMATION IN THIS PRESENTATION.
This presentation, and any recordings or reproductions in various media formats, including, without limitation, print, audio, and video, is the copyrighted work of Schlumberger, and Schlumberger hereby retains all intellectual property and/or proprietary rights related thereto. Schlumberger and the Schlumberger logo are trademarks of Schlumberger in the U.S. and/or other countries. Other names and brands referenced in this presentation are the trademarks of their respective owners, and any references thereto are not endorsements or approvals.
4.
Introduction
Chris Herrera, Schlumberger
• 2 years managing product development and innovation teams working on real-time data ingestion and delivery
• 5 years of experience in the Hadoop ecosystem
• 11 years of experience with various aspects of the oilfield (operational and technical)
5.
The Major Components of a Drilling Project
Diagram labels: Wireline; Measurement / Logging While Drilling; Mud logging; Fluids; Completions; Cementing; Rig
• Several contractors are brought in to develop and complete the well
• They can come from one company or, most of the time, from many companies
• Each brings its own system, often without a central repository of data
• The site can be within decent cell connectivity, or deep in the middle of a jungle with only 128k of high-latency bandwidth
6.
Where Does This Data Need to Go?
Diagram labels: RT Server; Operational Support; Client Monitoring; Processing and Print Centers
7.
Workflow of Data During and Post Operations
[Workflow diagram. Labels as extracted, in order: Processing Center; Acquisition; Data Server; Classification & Labelling; Quality Control; Classification; Quality Control; Hosting; QC & Labelling; Conversion; Data Delivery; KPI & Reporting; Processing; Acq; Sales and Job Planning; Data Processor; Customer Manager; Client Data Delivery; Sales; Field Engineer]
8.
What Does This Mean in a Data Sense
Diagram: inputs and outputs around the RT Server
Input
• DLIS
• LAS 1.2, 2.0, 3.0
• WITS Level 0, Level 1, Level 2
• CSV
• Profibus
• Modbus
Output
• CSV
• PDS
• LAS 1.2, 2.0, 3.0
• DLIS
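Supporting several versions of one format means something has to tell them apart before parsing. As a minimal sketch (not taken from the talk), the Python below reads the version from the `VERS` line that LAS files carry in their `~Version` section. The function names and the route labels are hypothetical, chosen only for illustration.

```python
import re
from typing import Optional

# Matches a VERS line such as "VERS.   2.0 :   CWLS LOG ASCII STANDARD - VERSION 2.0".
_VERS_LINE = re.compile(r"^VERS\s*\.\s*([0-9]+(?:\.[0-9]+)?)\s*:", re.IGNORECASE)


def detect_las_version(text: str) -> Optional[str]:
    """Return the version string from the ~Version section, or None if absent."""
    in_version_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("~"):
            # Section headers start with "~"; the version section starts with "~V".
            in_version_section = stripped[1:2].upper() == "V"
            continue
        if in_version_section:
            match = _VERS_LINE.match(stripped)
            if match:
                return match.group(1)
    return None


def route_for(text: str) -> str:
    """Pick a (hypothetical) route label for the supported LAS versions."""
    version = detect_las_version(text)
    if version in ("1.2", "2.0", "3.0"):
        return "las-" + version
    return "unsupported"
```

Routing on a cheap header check like this keeps the version-specific parsers separate, which is the kind of decision a routing step in a data flow would make.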
9.
What Does This Mean in a Volume Sense
• ~9000 users / month
• ~10 files / minute
• ~480 data queries / sec
• ~3050 wells / month
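To put the per-minute and per-second rates on a daily scale, here is a back-of-the-envelope conversion. It assumes the rates hold steadily around the clock, which the slide does not state, so treat the results as rough orders of magnitude.

```python
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = 24 * 60 * 60


def per_day_from_per_minute(rate_per_minute: int) -> int:
    """Scale a steady per-minute rate to a daily count."""
    return rate_per_minute * MINUTES_PER_DAY


def per_day_from_per_second(rate_per_second: int) -> int:
    """Scale a steady per-second rate to a daily count."""
    return rate_per_second * SECONDS_PER_DAY


# Rates from the slide (approximate figures).
files_per_day = per_day_from_per_minute(10)      # 10 * 1,440 = 14,400
queries_per_day = per_day_from_per_second(480)   # 480 * 86,400 = 41,472,000
```

Under that assumption the system handles roughly 14,400 files and about 41.5 million data queries per day.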
10.
A Quick(ish) Note on the Importance of Data Provenance
Chart labels: Context; Fidelity; Time; Acquisition - Field; Interpretation - Office
• Need to retain the fidelity throughout the flow.
11.
Typical Data Problems / Concerns
• What is the time zone of the data we are receiving? One day UTC...
• "Ahh, I see you did not implement that part of the standard..."
• Wait, why are you sending data at 5 times the sampling rate of the sensor...
• I did not get the memo that you were changing your data model today...
• Governmental / client data residency concerns
12.
Current Solution…
• 100+ man-years of effort over 14 years
• ~2,000,000+ lines of code
• Extreme barrier to entry for workflow changes
• Very little understanding of what happened to the data
Diagram: Input (DLIS; LAS 1.2, 2.0, 3.0; WITS Level 0, Level 1, Level 2; CSV; Profibus; Modbus); RT Server; Output (CSV; PDS; LAS 1.2, 2.0, 3.0; DLIS)
13.
We Needed a Simpler, More Maintainable Solution…
14.
The Original Plan…
Diagram labels: Rabbit MQ; DLIS Parser; ETP Endpoint; LAS Parser; Data Writer; {} DB; Event Publisher; Node JS
What about:
• Data cleansing
• Routing
• The ability to debug what has gone wrong
• TIME (estimated 6 man-months)
15.
How Does NiFi Fit into the Equation?
• Knowing where data came from is crucial to real-time decision making (and is often missing)
• The ability to visualize the data flow at a granular level aids in troubleshooting and operational understanding
• With several processors already available, there is a low barrier to entry when it comes to data flow creation
16.
Enter NiFi…
Diagram labels, as extracted: Processor Creation; Data Flow Creation; Creation; Play…; 10 Man Hours; ETP; WITSML 1.3.1.1 / 1.4.1.1; LAS 1.2 / 2.0; 1 Man Day
17.
Prototype Setup
2 Man Days
Diagram labels: Data Source; Processor Input; Data Cleansing; Data Enrichment; Routing; Put Data; Data Storage; { } Repo
Data Enrichment
• Append Well Name
• Append Client Name
• Append Run Name
• Append Pass Name
Data Cleansing
• Process Group: Get Update
• Process Group: Fix Time Zone, Remove Absent Indexes
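The cleansing and enrichment steps named on this slide can be sketched in plain Python. This is an illustration of the logic only, not the NiFi processors used in the prototype. The record layout (a dict with a naive `time` index), the fixed UTC offset, and every function name here are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator


def fix_time_zone(record: dict, source_utc_offset_hours: float) -> dict:
    """Treat the naive 'time' index as local time at the given offset; convert to UTC."""
    local_tz = timezone(timedelta(hours=source_utc_offset_hours))
    local_time = record["time"].replace(tzinfo=local_tz)
    return {**record, "time": local_time.astimezone(timezone.utc)}


def remove_absent_indexes(records: Iterable[dict]) -> Iterator[dict]:
    """Drop records whose index ('time' in this sketch) is missing."""
    for record in records:
        if record.get("time") is not None:
            yield record


def enrich(record: dict, well: str, client: str, run: str, pass_name: str) -> dict:
    """Append the well, client, run, and pass names to the record."""
    return {**record, "well": well, "client": client, "run": run, "pass": pass_name}


def cleanse_and_enrich(records: Iterable[dict], *, source_utc_offset_hours: float,
                       well: str, client: str, run: str, pass_name: str) -> Iterator[dict]:
    """Cleansing first (drop absent indexes, fix time zone), then enrichment."""
    for record in remove_absent_indexes(records):
        fixed = fix_time_zone(record, source_utc_offset_hours)
        yield enrich(fixed, well, client, run, pass_name)
```

Each step returns a new dict instead of mutating its input, so the record before and after any step can be compared when something looks wrong.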
18.
What About Testing!
19.
Testing Landscape Today
2.2 TB Test Data
• 22 applications
• 14 different formats of data
• Data of questionable quality
• Stored on a file share
Effort
• 0.5 man effort / sprint on maintenance
• 2 weeks to perform a full test
20.
Step 1: Data Set Curation – Creating the Reference Set
Diagram labels: 2.2 TB Test Data; LAS 1.2, 2.0, 3.0; WITS Level 0, Level 1, Level 2; CSV; Clean Test Data Set
6 Hours
21.
Step 2: Immediate Test Harness
Diagram labels: Docker; Clean Test Data Set
• Step 1: Need data
• Step 2: docker pull xxx.xxx.xxx.xxx:xxxx/flowTest
• Step 3: Add put processor
• Step 4: Start dataflow
From: 2 weeks to set up a test to:
22.
Step 3: Immediate Live Data Testing
Diagram labels: Docker; Production RT System; Processor Input; Testing Processor Group; Anonymize Data
• Significantly cuts down the time to test an application against real data
• Especially in brownfield applications
• Brings a level of confidence to the project that would otherwise be missing
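The slide does not say how the "Anonymize Data" step works. One common approach, shown here purely as an assumption, is keyed pseudonymization: identifying fields are replaced with an HMAC-derived token so the same well or client always maps to the same opaque value while measurements pass through untouched. The field names and the 16-character token length are hypothetical.

```python
import hashlib
import hmac

# Hypothetical set of identifying fields; the real flow's fields are not given in the talk.
SENSITIVE_FIELDS = ("well", "client")


def pseudonymize(value: str, key: bytes) -> str:
    """Map a value to a stable, opaque token using HMAC-SHA256."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def anonymize(record: dict, key: bytes, fields=SENSITIVE_FIELDS) -> dict:
    """Return a copy of the record with the sensitive fields pseudonymized."""
    result = dict(record)
    for name in fields:
        if result.get(name) is not None:
            result[name] = pseudonymize(str(result[name]), key)
    return result
```

Because the token is deterministic for a given key, records from the same well still group together in the test system, which keeps live-data tests meaningful without exposing the original names.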
23.
Next Steps
24.
Use Cases to Be Explored for MiNiFi – Rig Data Ingestion with Provenance
Diagram label: RT Server
• Understanding the chain of custody from sensor to user
• Tracking the provenance of the data as it traverses the system
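To make "chain of custody" concrete, here is a small plain-Python illustration of the idea. It is not NiFi's or MiNiFi's provenance API. Each processing step records a hash of the record before and after it ran, so the events link into a chain from the first input to the final output. All names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class ProvenanceEvent:
    step: str          # name of the step that handled the record
    input_hash: str    # content hash before the step ran
    output_hash: str   # content hash after the step ran


def content_hash(record: dict) -> str:
    """Hash a JSON-serializable record in a key-order-independent way."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_with_provenance(
    record: dict, steps: Sequence[Tuple[str, Callable[[dict], dict]]]
) -> Tuple[dict, List[ProvenanceEvent]]:
    """Apply each step in order and record one provenance event per step."""
    events: List[ProvenanceEvent] = []
    for name, step in steps:
        before = content_hash(record)
        record = step(record)
        events.append(ProvenanceEvent(name, before, content_hash(record)))
    return record, events
```

With a log like this, a user looking at the final record can walk the events backwards, matching each `input_hash` to the previous `output_hash`, to see which step changed the data and in what order.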
25.
Thank You! Questions?
Editor's Notes
• Different arrival times
• Different data streams
• Exchanging data amongst themselves
• Unknown quality