From Zero to Data Flow in Hours with Apache NiFi

Copyright © 2016, Schlumberger, All rights reserved.
From Zero to Data Flow
In Hours with Apache NiFi
Hadoop Summit – San Jose 2016
Chris Herrera
Schlumberger

Agenda
• Why is composable data flow important to the drilling industry?
• Current State of the System
• The Breaking Point to the new system
• An unexpected workflow in testing
• How are we using it today?
• What’s Next?

Legal Notices
This presentation is for informational purposes only. STATEMENTS AND OPINIONS EXPRESSED IN THIS PRESENTATION ARE
THOSE OF THE PRESENTER AND DO NOT REFLECT THE OPINIONS OF SCHLUMBERGER. SCHLUMBERGER AND THE
PRESENTER HEREBY DISCLAIM ANY REPRESENTATIONS AND/OR WARRANTIES EXPRESS OR IMPLIED. SCHLUMBERGER AND
THE PRESENTER HEREBY DISCLAIM ANY RESPONSIBILITY FOR THE CONTENT, ACCURACY, AND/OR COMPLETENESS OF THE
INFORMATION IN this presentation.
This presentation, and any recordings or reproductions in various media formats, including, without limitation, print, audio,
and video, is the copyrighted work of Schlumberger, and Schlumberger hereby retains all intellectual property and/or
proprietary rights related thereto.
Schlumberger and the Schlumberger logo are trademarks of Schlumberger in the U.S. and/or other countries. Other names
and brands referenced in this presentation are the trademarks of their respective owners, and any references thereto are
not endorsements or approvals.

Introduction
• 2 years managing product development and innovation teams working on real-time data ingestion and delivery
• 5 years of experience in the Hadoop ecosystem
• 11 years of experience with various aspects of the oilfield (operational and technical)
Chris Herrera
Schlumberger

The Major Components of a Drilling Project
Services on the rig: Wireline, Measurement / Logging While Drilling, Mud logging, Fluids, Completions, Cementing
• Several contractors are brought in to develop and complete the well
• The work may involve one company or, more often, many
• Each brings its own system, often without a central repository of data
• The rig can be within decent cell connectivity, or deep in the middle of a jungle with only 128k of high-latency bandwidth

Where Does This Data Need to Go?
• RT Server
• Operational Support
• Client Monitoring
• Processing and Print Centers

Workflow of Data During and Post Operations
Figure labels (workflow diagram): Sales and Job Planning; Acquisition; Processing; Client Data Delivery; Data Server; Processing Center; Classification & Labelling; Quality Control; Hosting; Conversion; Data Delivery; KPI & Reporting
Roles shown: Sales, Field Engineer, Data Processor, Customer Manager

What Does This Mean In A Data Sense
Input
• DLIS
• LAS (1.2, 2.0, 3.0)
• WITS (Level 0, Level 1, Level 2)
• CSV
• Profibus
• Modbus
Output
• CSV
• PDS
• LAS (1.2, 2.0, 3.0)
• DLIS
• RT Server
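With three LAS versions on the input side, a flow usually has to identify the version before it can pick a parser. A minimal sketch, assuming the file carries the standard LAS version section (a section header starting with "~V" followed by a "VERS." line); the function name detect_las_version is hypothetical and is not taken from the presentation:

```python
import re
from typing import Optional

# Matches a line such as " VERS.   2.0 :   CWLS LOG ASCII STANDARD - VERSION 2.0"
_VERS_LINE = re.compile(r"^\s*VERS\s*\.\s*([0-9]+(?:\.[0-9]+)?)\s*:", re.IGNORECASE)


def detect_las_version(text: str) -> Optional[str]:
    """Return the LAS version as '1.2', '2.0' or '3.0', or None if it is not found.

    Only the version section (header starting with '~V') is inspected.
    """
    in_version_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("~"):
            # A new section starts; remember whether it is the version section.
            in_version_section = stripped[1:2].upper() == "V"
            continue
        if in_version_section:
            match = _VERS_LINE.match(line)
            if match:
                # Normalise '1.20' and '1.2' to the same label.
                return f"{float(match.group(1)):.1f}"
    return None
```

The returned label could then drive routing to a version-specific parser; files that return None would go to a manual-review path.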

What Does This Mean in a Volume Sense
• ~9,000 users / month
• ~10 files / minute
• ~480 data queries / sec
• ~3,050 wells / month

A Quick(ish) Note On The Importance of Data Provenance
• Need to retain the fidelity throughout the flow.
Figure labels: Context, Fidelity, Time; Acquisition (Field), Interpretation (Office)

Typical Data Problems and Concerns
• What is the time zone of the data we are receiving? One day it is UTC...
• “Ahh, I see you did not implement that part of the standard...”
• Wait, why are you sending data at 5 times the sampling rate of the sensor...
• I did not get the memo that you were changing your data model today...
• Governmental / client data residency concerns
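Two of these problems, unknown time zones and data arriving faster than the sensor samples, lend themselves to small cleansing steps. A minimal sketch of one possible approach, assuming samples arrive as (timestamp, value) pairs sorted by time and that the source's UTC offset is known out of band; the function names are hypothetical and this is not the presenter's implementation:

```python
from datetime import datetime, timedelta, timezone
from typing import List, Sequence, Tuple

Sample = Tuple[datetime, float]


def to_utc(ts: datetime, assumed_offset_hours: float) -> datetime:
    """Return ts in UTC. Naive timestamps are assumed to be at the given UTC offset."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone(timedelta(hours=assumed_offset_hours)))
    return ts.astimezone(timezone.utc)


def drop_oversampled(samples: Sequence[Sample], min_interval: timedelta) -> List[Sample]:
    """Keep a sample only if it is at least min_interval after the last kept one.

    Assumes samples are already sorted by timestamp.
    """
    kept: List[Sample] = []
    for ts, value in samples:
        if not kept or ts - kept[-1][0] >= min_interval:
            kept.append((ts, value))
    return kept
```

Dropping samples is only one option; if the extra points carry real information, averaging them per sensor interval may be the better choice.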

Current Solution…
• 100+ man-years of effort over 14 years
• ~2,000,000+ lines of code
• Extreme barrier to entry for workflow changes
• Very little understanding of what happened to the data
Input: DLIS, LAS (1.2, 2.0, 3.0), WITS (Level 0, Level 1, Level 2), CSV, Profibus, Modbus
Output: CSV, PDS, LAS (1.2, 2.0, 3.0), DLIS, RT Server

We Needed a Simpler, More Maintainable Solution…

The Original Plan…
Planned components: RabbitMQ, DLIS Parser, LAS Parser, ETP Endpoint, Data Writer, {} DB, Event Publisher, Node.js
What About:
• Data cleansing
• Routing
• The ability to debug what has gone wrong
• TIME (estimated 6 man-months)

How does NiFi fit into the equation?
• Knowing where data came from is crucial to real-time decision making (and often missing)
• The ability to visualize the data flow at a granular level aids in troubleshooting and operational understanding
• With several processors already available, there is a low barrier to entry when it comes to data flow creation

Enter NiFi…
• Processor Creation: 10 man-hours (ETP, WITSML 1.3.1.1 / 1.4.1.1, LAS 1.2 / 2.0)
• Data Flow Creation: 1 man-day
• Play…

Prototype Setup
2 Man Days
Flow stages: Data Source, Processor Input, Data Cleansing, Data Enrichment, Routing, Put Data, Data Storage ({ } Repo)
Process Group: Data Enrichment (Get, Update)
• Append Well Name
• Append Client Name
• Append Run Name
• Append Pass Name
Process Group: Data Cleansing
• Fix Time Zone
• Remove Absent Indexes
What About Testing!

Testing Landscape Today
2.2 TB Test Data
• 22 Applications
• 14 Different formats of data
• Data of questionable quality
• Stored on a file share
Effort
• 0.5 man effort per sprint spent on maintenance
• 2 weeks to perform a full test

Step 1: Data Set Curation – Creating the Set of Reference
Input formats: LAS (1.2, 2.0, 3.0), WITS (Level 0, Level 1, Level 2), CSV
From 2.2 TB of test data to a clean test data set: 6 hours

Step 2: Immediate Test Harness
Docker + Clean Test Data Set
• Step 1: Need data
• Step 2: docker pull xxx.xxx.xxx.xxx:xxxx/flowTest
• Step 3: Add put processor
• Step 4: Start dataflow
From: 2 weeks to set up a test to:

Step 3: Immediate Live Data Testing
Figure labels: Docker; Production RT System; Processor Input; Testing Processor Group; Anonymize Data
• Significantly cuts down the time to test an application against real data
• Especially in brownfield applications
• Brings a level of confidence to the project that would otherwise be missing.
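The anonymize step is what makes production data safe to feed into a test flow. A minimal sketch of one way to do it, assuming identifying fields are replaced by keyed hashes so the same well always maps to the same pseudonym; the field names and the functions pseudonymize and anonymize_record are hypothetical, and the slides do not say how the anonymization was actually implemented:

```python
import hashlib
import hmac
from typing import Any, Dict, Sequence

# Hypothetical list of identifying fields; adjust to the real data model.
SENSITIVE_FIELDS = ("well_name", "client_name")


def pseudonymize(value: str, secret: bytes) -> str:
    """Map a value to a stable pseudonym using HMAC-SHA256 with a secret key."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "anon-" + digest[:12]


def anonymize_record(
    record: Dict[str, Any],
    secret: bytes,
    fields: Sequence[str] = SENSITIVE_FIELDS,
) -> Dict[str, Any]:
    """Return a copy of the record with sensitive fields replaced by pseudonyms."""
    out = dict(record)
    for name in fields:
        if out.get(name) is not None:
            out[name] = pseudonymize(str(out[name]), secret)
    return out
```

A keyed hash keeps records from the same well grouped together in the test system while the real name stays out of it, provided the secret is kept on the production side.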

Next Steps

Use Cases to be Explored for MiNiFi – Rig Data Ingestion with Provenance
RT Server
• Understanding the chain of custody from sensor to user
• Tracking the provenance of the data as it traverses the system
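Chain of custody can be illustrated with a linked list of events, where each event records which component touched the data and points back to the previous event. A minimal sketch under that assumption; the ProvenanceEvent class and the helper functions are hypothetical illustrations and do not represent the NiFi or MiNiFi provenance API:

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProvenanceEvent:
    component: str            # who handled the data (sensor, rig agent, server, ...)
    event_type: str           # what happened (free-form label in this sketch)
    content_hash: str         # SHA-256 of the data as seen by this component
    parent_id: Optional[str]  # event_id of the previous event, None for the first

    @property
    def event_id(self) -> str:
        payload = json.dumps(
            [self.component, self.event_type, self.content_hash, self.parent_id]
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def record_event(
    chain: List[ProvenanceEvent], component: str, event_type: str, content: bytes
) -> ProvenanceEvent:
    """Append an event that links back to the current end of the chain."""
    parent_id = chain[-1].event_id if chain else None
    event = ProvenanceEvent(
        component, event_type, hashlib.sha256(content).hexdigest(), parent_id
    )
    chain.append(event)
    return event


def chain_of_custody(chain: List[ProvenanceEvent]) -> List[str]:
    """List the components that handled the data, in order."""
    return [event.component for event in chain]


def verify_chain(chain: List[ProvenanceEvent]) -> bool:
    """Check that every event points at the event immediately before it."""
    expected_parent = None
    for event in chain:
        if event.parent_id != expected_parent:
            return False
        expected_parent = event.event_id
    return True
```

Because each event id depends on its parent id, altering an earlier event breaks the links that follow it, which is what makes the sensor-to-user trail checkable.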

Thank You! Questions?
