THE BIG DATA GUSHER: 
Big Data Analytics, the Internet of Things 
and the Oil Business 
SPONSORED BY
Innovation: Needed Again
Jim Crompton
Noah Consulting
Innovation Has Always Been Part of the Oil and Gas Industry 
• Deepwater
• High Pressure/High Temperature
• Enhanced Oil Recovery
• Heavy Oil/Steam Flooding
• Complex wells & real-time drilling
• Imaging beneath salt
Shale Gas Extraction
U.S. Oil and Gas Production
[Chart: U.S. oil and gas production in MBOE/D, 1980 to 2035, by Conventional Oil, Unconventional Oil, Conventional Gas and Unconventional Gas]
Price of Oil
The Digitization of the Oil Field 
Hortonworks & Noah Consulting, LLC copyright 2014 
But We Can’t Quit Yet 
• Falling oil and gas prices threaten margins 
• Complex reservoirs require new completion and production techniques 
• Complex supply chains require factory-like coordination and management 
• Unconventional resources require a supply chain solution that extends all the way to the sales point
Become a Data-Driven Organization with Hadoop
Don Hilborn
Hortonworks
Why is Data the Next Natural Resource of our Century? 
“In God we trust, all others must bring data.”
Dr. W. Edwards Deming
Hadoop in Oil and Gas 
Production Optimization 
• Production parameter optimization is 
intelligent management of the 
parameters that maximize a well’s 
useful life, such as pressures, flow 
rates, and thermal characteristics of 
injected fluid mixtures. 
Real Time Operations 
• Join disparate sources of data together 
presenting real time and historical 
combinations of E&P data at each stage 
of the oil and gas production process. 
LAS (Log ASCII Standard) Predictive Analytics
• Leverage the “shovel-ready” nature of LAS well-log files for predictive analytics across multiple datasets, and the power of Hadoop for normalization, transformation and economical storage.
Seismic Analytics/Management
• Storing seismic data from multiple experiences permits learning in the aggregate across all of those experiences.
Enterprise Archive (Unstructured)
• Process unstructured data into an enterprise archive, and blend search with machine-learning algorithms to discover value and automatically categorize the data for eDiscovery and other applications.
Other
• Preventive Maintenance
• Condition Monitoring
• Supply Chain and Manufacturing
• Asset Optimization
• Lease Bidding
• QHSE (Quality, Health, Safety and Environment)
© Hortonworks Inc. 2011 – 2014. All Rights Reserved
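LAS files are plain text organized into tilde-prefixed sections, which is much of what makes them “shovel-ready” for bulk processing. The normalization step could look roughly like the sketch below, which assumes simple, unwrapped LAS 2.0 files: curve names come from the ~C section, data rows from ~A, and the NULL sentinel declared in ~W is mapped to None. The function name `parse_las` is hypothetical and this is not a full LAS reader; a maintained library such as lasio would be the usual choice in practice.

```python
from typing import Dict, List, Optional


def parse_las(text: str) -> Dict[str, List[Optional[float]]]:
    """Parse a simple, unwrapped LAS 2.0 file into {curve mnemonic: values}.

    Hypothetical helper: ignores ~V/~P/~O content and does not handle WRAP YES.
    """
    section = ""
    null_value: Optional[float] = None
    mnemonics: List[str] = []
    rows: List[List[float]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith("~"):
            section = line[1:2].upper()  # first letter identifies the section
            continue
        if section == "W" and line.upper().startswith("NULL"):
            # e.g. "NULL.   -999.25 : NULL VALUE" -> -999.25
            null_value = float(line.split(".", 1)[1].split(":", 1)[0])
        elif section == "C":
            # e.g. "GR  .GAPI : GAMMA RAY" -> "GR"
            mnemonics.append(line.split(".", 1)[0].strip())
        elif section == "A":
            rows.append([float(token) for token in line.split()])
    curves: Dict[str, List[Optional[float]]] = {name: [] for name in mnemonics}
    for row in rows:
        for name, value in zip(mnemonics, row):
            # Normalize the file's declared null sentinel to None.
            curves[name].append(None if value == null_value else value)
    return curves
```

Returning one list per curve keeps the output column-oriented, which is convenient when many wells are later combined and normalized in bulk.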
Hadoop Allows a Shift from Reactive to Proactive Interactions
Hadoop allows organizations to shift interactions from reactive (post-transaction) to proactive and prescriptive.
• A shift in Production: from speed constraints to accelerated intervention
• A shift in Drilling: from gut feel to real-time automation
• A shift in GeoScience: from manual process to dynamic automation
• A shift in Retail: from static branding to real-time personalization
• A shift in Refining: from break-then-fix to repair before break
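The “repair before break” shift can be illustrated with a deliberately simple predictive-maintenance rule: fit a least-squares line to recent readings from one sensor (for example, pump vibration) and estimate how many hours remain before the trend crosses an alarm limit. The function name, the hourly time base, and the linear-trend model are all assumptions for illustration; real programs would typically use richer models.

```python
from typing import Optional, Sequence


def hours_until_threshold(times: Sequence[float], readings: Sequence[float],
                          threshold: float) -> Optional[float]:
    """Estimate hours after the last sample until a linear trend reaches threshold.

    Hypothetical helper. Returns None when the trend is flat or falling,
    and 0.0 when the fitted line has already crossed the threshold.
    """
    n = len(times)
    if n < 2 or n != len(readings):
        raise ValueError("need at least two (time, reading) pairs")
    mean_t = sum(times) / n
    mean_r = sum(readings) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("sample times must not all be equal")
    sxy = sum((t - mean_t) * (r - mean_r) for t, r in zip(times, readings))
    slope = sxy / sxx                      # least-squares slope
    intercept = mean_r - slope * mean_t    # least-squares intercept
    if slope <= 0:
        return None  # reading is flat or improving: no crossing predicted
    crossing_time = (threshold - intercept) / slope
    return max(0.0, crossing_time - times[-1])
```

For readings of 2.0, 2.5, 3.0 and 3.5 at hours 0 to 3 and a limit of 5.5, the fitted line rises 0.5 per hour and reaches the limit at hour 7, four hours after the last sample.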
Hadoop Value: Manage the Modern Volume, Velocity and Variety of Data
• Clickstream: Capture and analyze website visitors’ data trails and optimize your website
• Sensors: Discover patterns in data streaming automatically from remote sensors and machines
• Server Logs: Research logs to diagnose process failures and prevent security breaches
• Sentiment: Understand how your customers feel about your brand and products, right now
• Geographic: Analyze location-based data to manage operations where they occur
• Unstructured: Understand patterns in files across millions of web pages, emails, and documents
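For the Sensors case, one simple way to “discover patterns” in a stream is to flag readings that sit far outside the recent norm. The sketch below uses a rolling z-score over a fixed window; the window size, the 3-sigma limit, and the function name are illustrative assumptions, not part of any Hadoop component.

```python
from collections import deque
from math import sqrt
from typing import Deque, Iterable, Iterator, Tuple


def flag_anomalies(stream: Iterable[float], window: int = 20,
                   z_limit: float = 3.0) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for readings far outside the preceding window.

    Hypothetical helper: each reading is compared with the mean and
    population standard deviation of the previous `window` readings.
    """
    if window < 2:
        raise ValueError("window must hold at least two readings")
    recent: Deque[float] = deque(maxlen=window)
    for i, value in enumerate(stream):
        if len(recent) == window:
            mean = sum(recent) / window
            std = sqrt(sum((x - mean) ** 2 for x in recent) / window)
            if std > 0 and abs(value - mean) / std > z_limit:
                yield i, value
        recent.append(value)  # oldest reading drops out automatically
```

Because the window slides, a sustained change in operating level gradually becomes the new baseline, while a one-off spike is reported.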
A Brief History of Apache Hadoop
Timeline (2004 to 2013): Apache project established; Yahoo! begins to operate at scale; Hortonworks Data Platform; Enterprise Hadoop
• Focus on INNOVATION. 2005: Yahoo! creates a team under E14 to work on Hadoop.
• Focus on OPERATIONS. 2008: The Yahoo! team extends its focus to operations to support multiple projects and growing clusters.
• Focus on STABILITY. 2011: Hortonworks is created to focus on “Enterprise Hadoop”, starting with 24 key Hadoop engineers from Yahoo!.
Hadoop Makes Handling Big Data Feasible
• Low Cost: open source software on commodity hardware. Limit risk and allow experimentation.
• Linear Scale: infrastructure demands proportional to data volume. Grow smoothly with data needs.
• Any Data Type: structure not required to store data in Hadoop. Store everything and gain value over time.
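“Linear scale” can be made concrete with back-of-the-envelope sizing arithmetic. The sketch assumes HDFS's default replication factor of 3 plus a hypothetical 25% of disk held back for temporary and intermediate data; the per-node capacity and the function name are illustrative, not a Hortonworks sizing rule.

```python
import math


def nodes_needed(raw_tb: float, usable_tb_per_node: float,
                 replication: int = 3, headroom: float = 0.25) -> int:
    """Rough count of storage nodes needed to hold raw_tb of data.

    Hypothetical sizing helper: replication=3 mirrors the HDFS default,
    headroom is an assumed fraction of disk kept free for scratch space.
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    stored_tb = raw_tb * replication          # every block is stored `replication` times
    required_tb = stored_tb / (1 - headroom)  # reserve spare capacity on each node
    return math.ceil(required_tb / usable_tb_per_node)
```

With 24 TB of usable disk per node, 100 TB of raw data needs 17 nodes and 200 TB needs 34: doubling the data roughly doubles the hardware, rather than forcing a move to a different class of system.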
Hadoop: It’s About Scale & Structure or Lack Thereof
Traditional RDBMS | Hadoop
• Schema: required on write | required on read
• Governance: standards and structured | multiple structures
• Processing: limited, no data processing | processing coupled with data
• Data types: structured | multi and unstructured
• Best fit use: complex ACID transactions, operational data store, interactive analytics | data discovery, processing unstructured data
As SCALE (storage & processing) grows, the traditional RDBMS is optimized for reliable transactions, while Hadoop is optimized for analytics.
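The “schema required on read” contrast can be sketched in a few lines: raw records are kept exactly as they arrived, and each reader applies its own field names and type casts at read time. The CSV layout, field names, and sample records below are hypothetical; this mimics the idea behind schema-on-read rather than any specific Hadoop API.

```python
import csv
from typing import Callable, Dict, Iterable, Iterator, List

# Raw records are stored exactly as they arrived; nothing is validated on write.
RAW_LINES: List[str] = [
    "2014-06-01T00:00:00,WELL-7,pressure,3120.5",
    "2014-06-01T00:00:00,WELL-7,temperature,88.1",
    "corrupted line without enough fields",
]


def read_with_schema(lines: Iterable[str], fields: List[str],
                     casts: Dict[str, Callable[[str], object]]) -> Iterator[Dict[str, object]]:
    """Apply field names and per-field casts at read time (schema-on-read sketch)."""
    for row in csv.reader(lines):
        if len(row) != len(fields):
            continue  # the reader, not the writer, decides how to treat bad rows
        yield {name: casts.get(name, str)(value) for name, value in zip(fields, row)}
```

Two analysts can read the same stored lines differently, for example one casting `value` to `float` for numeric analysis and another keeping every field as text, without rewriting or reloading the data.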
Hortonworks Process for Enterprise Hadoop
Upstream Community Projects (Apache Hadoop, Hive, HBase, Pig, Falcon, Knox, Storm): Design & Develop → Test & Patch → Release
Downstream Enterprise Product (HDP 2.1): Integrate & Test → Package & Certify → Distribute
Virtuous cycle: development and fixed issues are done upstream, and stable project releases flow downstream.
Certified at scale using the most advanced Hadoop test bed on the planet:
• Thousands of production nodes at Yahoo!
• Over 1,500 unit & system tests
HDP2 and YARN Enable the Modern Data Architecture
Hortonworks architected and led development of YARN.
Common data set, multiple applications
• Optionally land all data in a single cluster
• Batch, interactive & real-time use cases
• Support multi-tenant access, processing & segmentation of data
YARN: Architectural center of Hadoop
• Consistent security, governance & operations
• Ecosystem applications certified by Hortonworks to run natively in Hadoop
[Diagram: SOURCES (Existing Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured) feed a DATA SYSTEM in which HDFS (Hadoop Distributed File System) and YARN: Data Operating System span nodes 1 to N and run batch, interactive and real-time workloads alongside RDBMS, EDW and MPP systems, serving APPLICATIONS (Business Analytics, Custom Applications, Packaged Applications).]
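Multi-tenant access on YARN is commonly configured with the Capacity Scheduler, where each queue is given a guaranteed percentage of its parent's resources and sibling percentages must sum to 100. The sketch below models only that guarantee for top-level queues; the queue names and cluster size are hypothetical, and it deliberately omits elasticity (queues borrowing idle capacity up to a maximum), user limits, and CPU.

```python
from typing import Dict


def guaranteed_memory_gb(cluster_gb: float,
                         queue_pct: Dict[str, float]) -> Dict[str, float]:
    """Split cluster memory across top-level queues by guaranteed percentage.

    Simplified model of Capacity Scheduler guarantees; queue names are hypothetical.
    """
    total = sum(queue_pct.values())
    if abs(total - 100.0) > 1e-9:
        raise ValueError(f"queue capacities must sum to 100, got {total}")
    return {queue: cluster_gb * pct / 100.0 for queue, pct in queue_pct.items()}
```

Splitting a 1,024 GB cluster 50/30/20 across batch, interactive and real-time queues guarantees 512, 307.2 and 204.8 GB respectively, which is how one shared cluster can host the batch, interactive and real-time use cases listed above.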
HDP and Platfora
Hadoop: Hortonworks Data Platform (HDP). Complex Datasets, Enterprise Analytics
[Diagram: SOURCES (Unstructured, Existing Systems, Web & Social, Weather, Geolocation, Sensor & Machine, Logs) land in HDFS (Hadoop Distributed File System), managed by YARN: Data Operating System (Cluster Resource Management) across nodes 1 to N. Engines running on YARN, with Tez and Slider: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Others (ISV Engines). Business users and data scientists ANALYZE the results.]
Big Data Analytics for Oil & Gas
Molly Stamos
Platfora
Proven Leader in Big Data Analytics for Hadoop
MOMENTUM OVER THE PAST 12 MONTHS
• Launched 9 product versions with feature innovations
• Grew customers by 4x and employees by 2x
PROVEN COMPANY TO WATCH
• April 2014: 10 Hot Hadoop Startups to Watch
• Ones to Watch in Big Data
• CRN: 10 Coolest Big Data Products of 2013
WORLD CLASS CUSTOMERS
BACKED BY LEADING INVESTORS
Introducing Platfora 
MISSION 
LEAD THE INDUSTRY TRANSITION FROM BUSINESS 
INTELLIGENCE TO BIG DATA ANALYTICS. 
#1 Big Data Analytics 
platform native on Hadoop 
End-to-end platform built 
for Multi-Structured Data 
Self-service, iterative, 
interactive, and fast
Platfora Data Workflow
Data sources (Hadoop, uploaded files, other) → Datasets → Lenses → Vizboards
1. Link to the location of the source files.
2. Create a metadata definition of how the raw data will be read and the relationships between the data.
3. Analyst selects data fields and filtering conditions of interest. Data is materialized into a lens.
4. Analyst visually explores the data from the lens in the vizboards.
Users: Data Architect, Data Scientist, Power Analyst; Data Scientist, Analysts, Viewers
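The four workflow steps can be mimicked in plain Python to show what “materializing a lens” means: a metadata definition says how raw lines are read, a lens keeps only the selected fields and filter and pre-aggregates them, and a vizboard view queries that small result instead of the raw data. The `Dataset` class, `build_lens`, `top_groups`, and the sample well records are hypothetical illustrations, not Platfora's API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Dataset:
    """Step 2 (hypothetical): metadata describing how raw lines are read."""
    fields: List[str]
    delimiter: str = ","

    def read(self, raw_lines: List[str]) -> List[Dict[str, str]]:
        # Step 1 is represented by raw_lines, i.e. data at a linked source location.
        return [dict(zip(self.fields, line.split(self.delimiter))) for line in raw_lines]


def build_lens(rows: List[Dict[str, str]], group_by: str, measure: str,
               keep: Callable[[Dict[str, str]], bool]) -> Dict[str, float]:
    """Step 3 (hypothetical): apply the filter and materialize sum(measure) per group."""
    lens: Dict[str, float] = defaultdict(float)
    for row in rows:
        if keep(row):
            lens[row[group_by]] += float(row[measure])
    return dict(lens)


def top_groups(lens: Dict[str, float], n: int = 5) -> List[Tuple[str, float]]:
    """Step 4 (hypothetical): the kind of ranked view a vizboard chart would plot."""
    return sorted(lens.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

With rows such as `"WELL-1,2014-05,1200"` read through `Dataset(["well", "month", "bbl"])`, a lens grouped by well and summing `bbl` answers repeated exploratory questions from the pre-aggregated result rather than re-scanning the raw files.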
Additional Information
• Big Data Analytics Blog: www.Platfora.com/blog
• Big Data Resources: www.Platfora.com/resources
• Big Data Solutions: www.Platfora.com/solutions/
THANK YOU! 