
From Zero to Data Flow in Hours with Apache NiFi

Published in: Technology

  1. From Zero to Data Flow in Hours with Apache NiFi
     Hadoop Summit, San Jose 2016
     Chris Herrera, Schlumberger
     Copyright © 2016, Schlumberger, All rights reserved.
  2. Agenda
     • Why composable data flow is important to the drilling industry
     • Current state of the system
     • The breaking point that led to the new system
     • An unexpected workflow in testing
     • How we are using it today
     • What's next
  3. Legal Notices
     This presentation is for informational purposes only. STATEMENTS AND OPINIONS EXPRESSED IN THIS PRESENTATION ARE THOSE OF THE PRESENTER AND DO NOT REFLECT THE OPINIONS OF SCHLUMBERGER. SCHLUMBERGER AND THE PRESENTER HEREBY DISCLAIM ANY REPRESENTATIONS AND/OR WARRANTIES EXPRESS OR IMPLIED. SCHLUMBERGER AND THE PRESENTER HEREBY DISCLAIM ANY RESPONSIBILITY FOR THE CONTENT, ACCURACY, AND/OR COMPLETENESS OF THE INFORMATION IN THIS PRESENTATION.
     This presentation, and any recordings or reproductions in various media formats, including, without limitation, print, audio, and video, is the copyrighted work of Schlumberger, and Schlumberger hereby retains all intellectual property and/or proprietary rights related thereto. Schlumberger and the Schlumberger logo are trademarks of Schlumberger in the U.S. and/or other countries. Other names and brands referenced in this presentation are the trademarks of their respective owners, and any references thereto are not endorsements or approvals. Copyright © 2016, Schlumberger, All rights reserved.
  4. Introduction
     Chris Herrera, Schlumberger
     • 2 years managing product development and innovation teams working on real-time data ingestion and delivery
     • 5 years of experience in the Hadoop ecosystem
     • 11 years of experience with various aspects of the oilfield (operational and technical)
  5. The Major Components of a Drilling Project
     Components: Wireline, Measurement / Logging While Drilling, Mud logging, Fluids, Completions, Cementing, Rig
     • Several contractors are brought in to develop and complete the well
     • These can come from one company or, most of the time, from many
     • Each brings its own system, often without a central repository of data
     • The site can be within decent cell connectivity, or deep in the middle of a jungle with only 128k of high-latency bandwidth
  6. Where Does This Data Need to Go?
     Destinations: RT Server, Operational Support, Client Monitoring, Processing and Print Centers
  7. Workflow of Data During and Post Operations
     Diagram labels: ProcessingCenter Acquisition DataServer Classification & Labelling Quality Control Classification Quality Control Hosting QC & Labelling Conversion Data Delivery KPI&Reporting ProcessingAcq Sales and Job Planning Data Processor Customer Manager Client Data Delivery Sales Field Engineer
  8. What Does This Mean In A Data Sense
     Input: DLIS; LAS (1.2, 2.0, 3.0); WITS (Level 0, Level 1, Level 2); CSV; Profibus; Modbus
     Output: CSV; PDS; LAS (1.2, 2.0, 3.0); DLIS
     RT Server
  9. What Does This Mean in a Volume Sense
     • ~9000 users / month
     • ~10 files / minute
     • ~480 data queries / sec
     • ~3050 wells / month
  10. A Quick(ish) Note On The Importance of Data Provenance
      Diagram labels: Context, Fidelity, Time; Acquisition - Field; Interpretation - Office
      • Need to retain the fidelity throughout the flow.
  11. Typical Data Problems / Concerns
      • What is the time zone of the data we are receiving: one day UTC...
      • "Ahh, I see you did not implement that part of the standard..."
      • Wait, why are you sending data at 5 times the sampling rate of the sensor...
      • I did not get the memo that you were changing your data model today...
      • Governmental / client data residency concerns
  12. Current Solution…
      • 100+ man-years of effort over 14 years
      • ~2,000,000+ lines of code
      • Extreme barrier to entry for workflow changes
      • Very little understanding of what happened to the data
      Input: DLIS; LAS (1.2, 2.0, 3.0); WITS (Level 0, Level 1, Level 2); CSV; Profibus; Modbus
      Output: CSV; PDS; LAS (1.2, 2.0, 3.0); DLIS
      RT Server
  13. We Needed a Simpler, More Maintainable Solution…
  14. The Original Plan…
      Diagram labels: RabbitMQ, DLIS Parser, ETP Endpoint, LAS Parser, Data Writer, {} DB, Event Publisher, Node.js
      What about:
      • Data cleansing
      • Routing
      • The ability to debug what has gone wrong
      • TIME (estimated 6 man-months)
  15. How does NiFi fit into the equation?
      • Knowing where data came from is crucial to real-time decision making, and that knowledge is often missing
      • The ability to visualize the data flow at a granular level aids in troubleshooting and operational understanding
      • With several processors already available, there is a low barrier to entry when it comes to data flow creation
  16. Enter NiFi…
      Diagram labels: Processor Creation; Data Flow Creation; Creation; Play…; 10 Man Hours; ETP; WITSML 1.3.1.1 / 1.4.1.1; LAS 1.2 / 2.0; 1 Man Day
  17. Prototype Setup
      Flow: Data Source, Processor Input, Data Cleansing, Data Enrichment, { } Repo, Data Storage, Put Data
      Effort: 2 Man Days
      • Append Well Name
      • Append Client Name
      • Append Run Name
      • Append Pass Name
      Process Group: Get Update
      Process Group: Fix Time Zone, Remove Absent Indexes
      Data Cleansing
      Routing
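
The enrichment and cleansing steps named on the prototype slide can be sketched in plain Python. This is only an illustration of the logic, not the NiFi processors from the talk: the record layout, the field names (`timestamp`, `depth`, `well_name`, and so on), the UTC offset argument, and the use of -999.25 as the "absent" marker are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumed marker for an absent index value (hypothetical for this sketch).
ABSENT = -999.25

CONTEXT_KEYS = ("well_name", "client_name", "run_name", "pass_name")


def enrich(record, context):
    """Return a copy of the record with well, client, run and pass names appended."""
    enriched = dict(record)
    for key in CONTEXT_KEYS:
        enriched[key] = context[key]
    return enriched


def fix_time_zone(record, utc_offset_hours):
    """Treat the naive timestamp as local time at the given offset and convert it to UTC."""
    local = datetime.fromisoformat(record["timestamp"])
    local = local.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    fixed = dict(record)
    fixed["timestamp"] = local.astimezone(timezone.utc).isoformat()
    return fixed


def remove_absent_indexes(records, index_key="depth"):
    """Drop records whose index value is missing or equal to the absent marker."""
    return [
        r for r in records
        if r.get(index_key) is not None and r[index_key] != ABSENT
    ]
```

Keeping each step as a small pure function mirrors the composable style of the flow: each one takes records in and hands records out, so the steps can be reordered or tested in isolation.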
  18. What About Testing!
  19. Testing Landscape Today
      2.2 TB Test Data
      • 22 applications
      • 14 different formats of data
      • Data of questionable quality
      • Stored on a file share
      Effort
      • 0.5 man effort / sprint on maintenance
      • 2 weeks to perform a full test
  20. Step 1: Data Set Curation: Creating the Reference Set
      Formats: LAS (1.2, 2.0, 3.0); WITS (Level 0, Level 1, Level 2); CSV
      Diagram labels: Clean Test Data Set; 2.2 TB Test Data; 6 Hours
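
One small piece of such a curation pass, sorting LAS files by version, might look like the sketch below. It assumes the usual LAS layout in which a `~Version` section carries a `VERS.` line; the function names and the `unknown` bucket are hypothetical, and the slides do not describe how the actual flow classified files.

```python
import re

# Matches lines such as " VERS.   2.0 : CWLS LOG ASCII STANDARD - VERSION 2.0"
VERS_LINE = re.compile(r"^\s*VERS\s*\.\s*([0-9]+(?:\.[0-9]+)?)\s*:", re.IGNORECASE)


def detect_las_version(text):
    """Return the version string from the ~Version section, or None if absent."""
    in_version_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("~"):
            # Section headers start with "~"; the version section starts with "~V".
            in_version_section = stripped[1:2].upper() == "V"
            continue
        if in_version_section:
            match = VERS_LINE.match(line)
            if match:
                return match.group(1)
    return None


def group_by_version(files):
    """Group {file name: file text} into {version: [file names]}."""
    groups = {}
    for name, text in files.items():
        version = detect_las_version(text) or "unknown"
        groups.setdefault(version, []).append(name)
    return groups
```

Files that land in the `unknown` bucket are the ones worth inspecting by hand before they are admitted to a reference set.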
  21. Step 2: Immediate Test Harness
      Diagram labels: Docker, Clean Test Data Set
      • Step 1: Need data
      • Step 2: docker pull xxx.xxx.xxx.xxx:xxxx/flowTest
      • Step 3: Add put processor
      • Step 4: Start dataflow
      From: 2 weeks to set up a test to:
  22. Step 3: Immediate Live Data Testing
      Diagram labels: Docker, Production RT System, Processor Input, Testing Processor Group, Anonymize Data
      • Significantly cuts down the time to test an application against real data
      • Especially in brownfield applications
      • Brings a level of confidence to the project that otherwise would be missing.
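
The slide does not say how the "Anonymize Data" step works. One common approach, shown here purely as an assumption, is keyed pseudonymization: identifying fields are replaced by an HMAC of their value, so the same well or client always maps to the same token while measurements pass through untouched. The field names and the `anon-` prefix are hypothetical.

```python
import hashlib
import hmac

# Hypothetical list of fields treated as identifying in this sketch.
SENSITIVE_FIELDS = ("well_name", "client_name")


def pseudonymize(value, secret):
    """Map a value to a stable token using HMAC-SHA256 with a secret key."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "anon-" + digest[:12]


def anonymize(record, secret):
    """Return a copy of the record with sensitive fields replaced by tokens."""
    out = dict(record)
    for field in SENSITIVE_FIELDS:
        if field in out:
            out[field] = pseudonymize(str(out[field]), secret)
    return out
```

Because the mapping is deterministic for a given key, records from the same well still group together in the test system, which keeps the data useful for testing without exposing the original names.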
  23. Next Steps
  24. Use Cases to be Explored for MiNiFi: Rig Data Ingestion with Provenance
      Diagram label: RT Server
      • Understanding the chain of custody from sensor to user
      • Tracking the provenance of the data as it traverses the system
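
To make the chain-of-custody idea concrete, the sketch below links custody events with hashes so that any later edit to an earlier event is detectable. It is a generic illustration and not how NiFi or MiNiFi store provenance; the event fields (`actor`, `action`, `content_sha256`) are assumptions chosen for the example.

```python
import hashlib
import json


def _hash_body(body):
    """Hash an event body in a stable way (sorted keys, UTF-8)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def add_event(chain, actor, action, payload):
    """Return a new chain with one more event linked to the previous event's hash."""
    body = {
        "actor": actor,
        "action": action,
        "content_sha256": hashlib.sha256(payload).hexdigest(),
        "previous_hash": chain[-1]["hash"] if chain else "",
    }
    return chain + [dict(body, hash=_hash_body(body))]


def verify(chain):
    """Check that every event hashes correctly and links to its predecessor."""
    previous_hash = ""
    for event in chain:
        body = {k: v for k, v in event.items() if k != "hash"}
        if event["previous_hash"] != previous_hash or event["hash"] != _hash_body(body):
            return False
        previous_hash = event["hash"]
    return True
```

A chain built as sensor, then rig gateway, then RT server verifies cleanly, and changing any field of an earlier event makes `verify` return `False`.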
  25. Thank You! Questions?
