1
Copyright © 2018 All rights reserved.
George Trujillo
Designing the Next Generation Data Lake
George Trujillo Jr.
www.linkedin.com/in/georgetrujillo @georgetrujillo
2
George Trujillo, Jr.
Director of Global Enablement
NE Tier One Data Specialist, COE
Master Principal Big Data Specialist
Vice President of Big Data
Managing Director of Big Data
Chief Executive Officer
3
George Trujillo, Jr.
 20+ years Oracle: RAC, Data Warehousing, Data Guard, Oracle Middle-Tier, …
 Recognized Oracle Double ACE
 Independent Oracle Users Group (IOUG) Board of Directors
 Served on Oracle Fusion Council & Oracle Beta Leadership Council
 Recognized as one of the “Oracles of Oracle” by IOUG
 Sun Microsystems Ambassador for Appl. Middleware Platform
 Recognized VMware vExpert
 VMware Certified Instructor (VCI)
 MySQL Certified DBA
4
Agenda
 Vision and Direction
 Analytic Platforms Have to Change
 What is Causing Change
 How are Hadoop, Big Data and Data Lakes Changing
 Impacts of Cloud Technologies
 Self Driving Data Platforms
 Evolving Big Data Architectures
 Impact to You
5
Imagining the Speed of Trains
What can be more palpably absurd than the prospect held out of
locomotives traveling twice as fast as stagecoaches? The Quarterly Review,
March, 1825.
6
The Speed of Trains Today, Tomorrow?
270 mph? 4000 mph?
7
The Future of Movies
"Who in Hades wants to hear actors talk?" --H.M. Warner, Warner Brothers,
1927
8
What do we need, or want?
Would a silent movie “customer” panel in 1927 have come up with green
screens, computer animation and 3-D?
9
Are you pointed at the Right Target?
Can you innovate with linear thought? How can you improve your
organization's ability to deliver insight faster while avoiding linear thought?
10
What do we need, or want?
How do you help keep your company from being at a competitive
disadvantage?
11
What Do All These Have in Common?
“Space Travel is Impossible”, Lee De Forest, inventor of the vacuum tube, 1957
Telephones and the Internet are just toys
 1890: “Telephones were considered for the fancy of the rich, it’s ridiculous to
consider the cost required to lay telephone wires across a city let alone the
country or the world.”
 1980s: “The Internet is ridiculous because: it’s ridiculous to consider the cost
required to lay cables across a city let alone the country or the world.”
"Remote shopping, while entirely feasible, will flop.” — Time Magazine, 1966
“The more important fundamental laws and facts of physical science have all been
discovered, and these are now so firmly established that the possibility of their ever
being supplanted in consequence of new discoveries is exceedingly remote.” –
Albert A. Michelson, physicist, 1894.
“We’ll never put our data in the cloud”, 2016
“An invention has to make sense in the world it
finishes in, not in the world it started.”
12
So Where Are Analytical Platforms Headed?
 Analytical platforms are not keeping up with business demands today
 Most data lakes have been built one use case at a time
Culture eats strategy for breakfast
Data Marshall Yard
Data Refinery
Data Lake Enterprise Data Hub Data Reservoir
Data Warehouses
13
Are We Ready For the Future, Predictions by 2025
 80% of production apps will be in the cloud
 Two SaaS Suite providers will have 80% market share
 Number of corporate-owned data centers will decrease by 80%.
 80% of IT budgets will be spent on cloud services.
 80% of IT budgets will be spent on business innovation, and only 20% on
system maintenance.
 All enterprise data will be stored in the cloud
 100% of application development and testing will be done in the cloud
 Enterprise clouds will be the most secure place for IT processing
14
How to Compete, When Everything is Getting Faster
15
Challenges Today
16
Starting New Projects
Compute (CPUs) Data Warehouse
Networking Proof of Concept
Storage Data Mart
17
Resources for Projects
18
Resources for Projects
19
How Do We Improve Our Analytical Platforms?
20
Cloud Technologies are Changing Data Lake Strategies
 Cloud technologies are adding significant new capabilities and flexibility to
data lakes
 A defining characteristic of a data lake is its storage repository
 Object storage has significant advantages over HDFS
 Replication to data centers
 Detach compute from storage
 Lower cost storage
 Dynamic scaling reduces the need for YARN
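One way to picture "detach compute from storage" is that compute engines address data by URI, so the storage backend can change without changing the logical layout of the lake. The following is a minimal sketch under stated assumptions: the `StorageBackend` class, the host name, the bucket name, and the dataset path are all hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class StorageBackend:
    """Where the data lake's files physically live (all names are hypothetical)."""
    scheme: str      # e.g. "hdfs" or "s3a"
    authority: str   # namenode host:port, or an object-store bucket name

    def uri(self, layer: str, dataset: str) -> str:
        # Build a fully qualified URI for a dataset in a given layer.
        return f"{self.scheme}://{self.authority}/{layer}/{dataset}"


# Coupled: storage is the cluster's own HDFS, so data lives and dies with it.
hdfs = StorageBackend(scheme="hdfs", authority="namenode.example.internal:8020")

# Detached: storage is an object store that outlives any compute cluster.
object_store = StorageBackend(scheme="s3a", authority="example-data-lake")

for backend in (hdfs, object_store):
    location = backend.uri("raw", "clickstream/2018/01")
    parsed = urlparse(location)
    # Only scheme and authority differ; the logical path is identical.
    print(parsed.scheme, parsed.netloc, parsed.path)
```

Because only the scheme and authority differ, a cluster can in principle be resized or torn down while the data stays where it is.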
21
Data Architecture
DLM (Batch, Microbatch)
WebHDFS
Storm (Streaming)
Kafka (Messaging)
Source Data
CRM
Social Connection
Ratings/Reviews
Jive
Article Comments
Ask/Answer
Social Data
LinkedIn
Facebook
Twitter
EDW
File
JMS
REST
Streaming
Data Ingestion
Transactional (PI, WI, FI)
FBSI, FPRS, FILI
Tools (Talend, Trifacta, …)
PIG
HIVE
Raw Layer
Serving Layer
Access Layer
Data Lake - Ingest, Storage, Compute, Analytics Grid
HCatalog (Schema metadata repository)
Scheduling (Control-M ?, Oozie, Talend, etc.)
Speed Layer
Sqoop
Flume
22
Data Architecture
Raw Layer
(Oracle Object Store, S3, HDFS, …)
Serving Layer
(Oracle Object Store, S3, HDFS)
Access Layer
Data Lake - Ingest, Storage, Compute, Analytics Grid
Speed Layer
(Spark, NoSQL, Alluxio, LLAP, …)
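The layer names on this slide resemble the batch/speed split of a lambda-style architecture. As a rough illustration only, the toy sketch below models each layer as an in-memory Python structure. The event fields, function names, and merge rule are assumptions made for this example and do not describe the author's actual platform.

```python
from collections import Counter

# Raw layer: immutable, append-only events exactly as they were ingested.
raw_layer = [
    {"user": "a", "clicks": 3},
    {"user": "b", "clicks": 1},
    {"user": "a", "clicks": 2},
]


def build_serving_view(events):
    """Batch job: recompute an aggregated view from the full raw layer."""
    view = Counter()
    for event in events:
        view[event["user"]] += event["clicks"]
    return view


# Serving layer: precomputed batch view, refreshed periodically.
serving_layer = build_serving_view(raw_layer)

# Speed layer: increments for events that arrived after the last batch run.
speed_layer = Counter()


def on_stream_event(event):
    """Streaming path: update the speed layer as each event arrives."""
    speed_layer[event["user"]] += event["clicks"]


def query_clicks(user):
    """Access layer: merge the batch view with the recent increments."""
    return serving_layer[user] + speed_layer[user]


on_stream_event({"user": "b", "clicks": 6})
print(query_clicks("a"))  # 5 (batch view only)
print(query_clicks("b"))  # 7 (batch view plus one streamed event)
```

In a real deployment the raw and serving layers would sit in object storage or HDFS and the speed layer in something like Spark or a NoSQL store, as the slide lists; the merge at query time is the part this sketch is meant to show.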
23
Big Data 1.0 – Monolithic Architecture
Compute (YARN)
Storage (HDFS)
Service Discovery (ZooKeeper)
Libraries, Notebooks inside Cluster
Tightly coupled storage and compute
HDFS as the Data Lake
Artifacts stored inside cluster
24
Big Data 2.0 – A Micro Services Based Architecture
Compute (YARN)
Storage (Cloud Storage)
Service Discovery (ZooKeeper)
Libraries, Notebooks etc. Outside Cluster
Independent Elastic Compute and Storage
Cloud Storage as the Data Lake
Artifacts stored outside cluster
25
Directionally Correct
Yesterday: SunOS, HP-UX, AIX, Windows, …
Today: Hortonworks, Cloudera, MapR, Oracle Distribution of Hadoop, …
Tomorrow: Oracle Cloud, Amazon, Microsoft, …
26
“Status Quo is Latin for ‘the mess we’re in’” – Ronald Reagan
 "It’s easier to let disillusionment with data inspire inertia than work to tame the data
beast”
27
Critical Factors for Success For Enterprise Data Platforms
Data Architecture
Data Governance
Data Security
28
More Management Tasks Than People to Do the Work
Less time on Administration
Less time on Infrastructure
Less time on Patching, Upgrades
Less time on Ensuring Availability
Less time on Tuning
Less time on Troubleshooting
More time on Innovation
More time on Design
More time on New Applications
More time on Analytics
More time on Securing data
More time on Delivering
29
Empowering Users
Streaming Engine Data Lake Enterprise Data & Reporting
Discovery Lab
Input
Events
Execution
Innovation
Discovery
Output
Data
Structured
Enterprise
Data
Notebooks/Analytic Services
Object Store Hadoop/HDFS
Actionable Events
Actionable Metrics
Actionable Data Sets
30
The Power of SQL – Unified Query with Big Data SQL
Hive
DN
DN
DN
DN
ORACLE SQL Engine
Storage
Table Table
Big Data-enabled
Oracle Tables
Python, Graph, R, node.js, Java, REST, SQL
Data Local Processing
Big Data SQL Cells
Leverage Metadata
Oracle Big Data SQL
Oracle Data Visualization
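The idea of a unified query, where one SQL statement spans tables held in different stores, can be illustrated by analogy. The sketch below does not use Oracle Big Data SQL or its API. It uses SQLite's `ATTACH DATABASE` from the Python standard library as a stand-in, and the table names, columns, and rows are invented for the example.

```python
import sqlite3

# Two separate in-memory databases stand in for two systems (an assumption
# for illustration): "main" plays the relational warehouse, "lake" plays
# the big data store.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS lake")

conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE lake.web_events (customer_id INTEGER, page TEXT)")

conn.executemany(
    "INSERT INTO main.customers VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)
conn.executemany(
    "INSERT INTO lake.web_events VALUES (?, ?)",
    [(1, "/home"), (1, "/pricing"), (2, "/home")],
)

# One SQL statement joins tables that live in two different stores.
rows = conn.execute(
    """
    SELECT c.name, COUNT(*) AS page_views
    FROM main.customers AS c
    JOIN lake.web_events AS e ON e.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

print(rows)  # [('Ada', 2), ('Grace', 1)]
conn.close()
```

The point of the analogy is that the caller writes ordinary SQL and does not need to know which store holds which table.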
31
The First Self-Driving Database – OOW October 2017
 The Autonomous Data Warehouse Cloud
 Easy
 Automated management
 Automated tuning: Simply load data and run
 Fast
 Based on Oracle’s unique data warehouse technology
 Elastic
 Instant scaling of compute or storage with no downtime
32
Determine your Target
Big Data Strategy
Hadoop
Data Lakes
Analytics Strategy
Requirements, Capabilities
Centralized Data Architecture
Don’t Focus on Technology, Focus on Delivering Results
33
Summary
How Will:
 Impact of Cloud Technologies
 Object Storage
 Micro Services Architecture
 Self Driving Data Platforms
 Speed to Insight
Impact Future:
 Projects
 Career goals
 Skill Development
34
Questions
Thank you
Questions?

Designing the Next Generation Data Lake

  • 1. 1 Copyright  2018 All rights reserved. George Trujillo Designing the Next Generation Data Lake George Trujillo Jr. www.linkedin.com/in/georgetrujillo @georgetrujillo
  • 2. George Trujillo, Jr.
      • Director of Global Enablement
      • NE Tier One Data Specialist, COE
      • Master Principal Big Data Specialist
      • Vice President of Big Data
      • Managing Director of Big Data
      • Chief Executive Officer
  • 3. George Trujillo, Jr.
      • 20+ years Oracle: RAC, Data Warehousing, Data Guard, Oracle Middle-Tier, …
      • Recognized Oracle Double ACE
      • Independent Oracle Users Group (IOUG) Board of Directors
      • Served on Oracle Fusion Council & Oracle Beta Leadership Council
      • Recognized as one of the “Oracles of Oracle” by IOUG
      • Sun Microsystems Ambassador for Appl. Middleware Platform
      • Recognized VMware vExpert
      • VMware Certified Instructor (VCI)
      • MySQL Certified DBA
  • 4. Agenda
      • Vision and Direction
      • Analytic Platforms Have to Change
      • What is Causing Change
      • How are Hadoop, Big Data and Data Lakes Changing
      • Impacts of Cloud Technologies
      • Self-Driving Data Platforms
      • Evolving Big Data Architectures
      • Impact to You
  • 5. Imagining the Speed of Trains: “What can be more palpably absurd than the prospect held out of locomotives traveling twice as fast as stagecoaches?” (The Quarterly Review, March 1825)
  • 6. The Speed of Trains Today, Tomorrow? 270 mph? 4000 mph?
  • 7. The Future of Movies: "Who in Hades wants to hear actors talk?" (H.M. Warner, Warner Brothers, 1927)
  • 8. What do we need, or want? Would a silent movie “customer” panel in 1927 have come up with green screens, computer animation and 3-D?
  • 9. Are you pointed at the Right Target? Can you innovate with linear thought? How can you improve your organization's ability to deliver insight faster while avoiding linear thought?
  • 10. What do we need, or want? How do you help keep your company from being at a competitive disadvantage?
  • 11. What Do All These Have in Common?
      • “Space Travel is Impossible” (Lee De Forest, inventor of the vacuum tube, 1957)
      • Telephones and the Internet are just toys
          • 1890: “Telephones were considered for the fancy of the rich, it’s ridiculous to consider the cost required to lay telephone wires across a city let alone the country or the world.”
          • 1980s: “The Internet is ridiculous because: it’s ridiculous to consider the cost required to lay cables across a city let alone the country or the world.”
      • “Remote shopping, while entirely feasible, will flop.” (Time Magazine, 1966)
      • “The more important fundamental laws and facts of physical science have all been discovered, and these are now so firmly established that the possibility of their ever being supplanted in consequence of new discoveries is exceedingly remote.” (Albert A. Michelson, physicist, 1894)
      • “We’ll never put our data in the cloud” (2016)
      • “An invention has to make sense in the world it finishes in, not in the world it started.”
  • 12. So Where Are Analytical Platforms Headed?
      • Analytical platforms are not keeping up with business demands today
      • Most data lakes have been built one use case at a time
      • Culture eats strategy for breakfast
      • Diagram labels: Data Marshall Yard, Data Refinery, Data Lake, Enterprise Data Hub, Data Reservoir, Data Warehouses
  • 13. Are We Ready For the Future? Predictions by 2025
      • 80% of production apps will be in the cloud
      • Two SaaS suite providers will have 80% market share
      • The number of corporate-owned data centers will decrease by 80%
      • 80% of IT budgets will be spent on cloud services
      • 80% of IT budgets will be spent on business innovation, and only 20% on system maintenance
      • All enterprise data will be stored in the cloud
      • 100% of application development and testing will be done in the cloud
      • Enterprise clouds will be the most secure place for IT processing
  • 14. How to Compete, When Everything is Getting Faster
  • 15. Challenges Today
  • 16. Starting New Projects (diagram labels: Compute (CPUs), Data Warehouse, Networking, Proof of Concept, Storage, Data Mart)
  • 17. Resources for Projects
  • 18. Resources for Projects
  • 19. How Do We Improve Our Analytical Platforms?
  • 20. Cloud Technologies are Changing Data Lake Strategies
      • Cloud technologies are adding significant new capabilities and flexibility to data lakes
      • A defining characteristic of a data lake is its storage repository
      • Object storage has significant advantages over HDFS
          • Replication to data centers
          • Detach compute from storage
          • Lower cost storage
      • Dynamic scaling reduces the need for YARN
  • 21. Data Architecture (diagram labels)
      • Source Data: CRM; Social Connection (Ratings/Reviews, Jive, Article Comments, Ask/Answer); Social Data (LinkedIn, Facebook, Twitter); EDW; Transactional (PI, WI, FI); FBSI, FPRS, FILI
      • Data Ingestion: File, JMS, REST, Streaming; DLM (Batch, Microbatch); WebHDFS; Storm (Streaming); Kafka (Messaging); Sqoop; Flume
      • Data Lake - Ingest, Storage, Compute, Analytics Grid: Raw Layer, Speed Layer, Serving Layer, Access Layer
      • Tools (Talend, Trifacta, …), PIG, HIVE
      • HCatalog (Schema metadata repository)
      • Scheduling (Control-M ?, Oozie, Talend, etc.)
  • 22. Data Architecture: Data Lake - Ingest, Storage, Compute, Analytics Grid
      • Raw Layer (Oracle Object Store, S3, HDFS, …)
      • Speed Layer (Spark, NoSQL, Alluxio, LLAP, …)
      • Serving Layer (Oracle Object Store, S3, HDFS)
      • Access Layer
  • 23. Big Data 1.0: Monolithic Architecture
      • Compute (YARN), Storage (HDFS), Service Discovery (ZooKeeper)
      • Libraries, Notebooks inside Cluster
      • Tightly coupled storage and compute
      • HDFS as the Data Lake
      • Artifacts stored inside cluster
  • 24. Big Data 2.0: A Microservices-Based Architecture
      • Compute (YARN), Storage (Cloud Storage), Service Discovery (ZooKeeper)
      • Libraries, Notebooks etc. Outside Cluster
      • Independent Elastic Compute and Storage
      • Cloud Storage as the Data Lake
      • Artifacts stored outside cluster
  • 25. Directionally Correct
      • Yesterday: SunOS, HP-UX, AIX, Windows, …
      • Today: Hortonworks, Cloudera, MapR, Oracle Distribution of Hadoop, …
      • Tomorrow: Oracle Cloud, Amazon, Microsoft, …
  • 26. “Status Quo is Latin for ‘the mess we’re in’” (Ronald Reagan)
      • "It’s easier to let disillusionment with data inspire inertia than work to tame the data beast"
  • 27. Critical Factors for Success For Enterprise Data Platforms: Data Architecture, Data Governance, Data Security
  • 28. More Management Tasks Than People to Do the Work
      • Less time on: Administration; Infrastructure; Patching, Upgrades; Ensuring Availability; Tuning; Troubleshooting
      • More time on: Innovation; Design; New Applications; Analytics; Securing data; Delivering
  • 29. Empowering Users (diagram labels): Streaming Engine, Data Lake, Enterprise Data & Reporting, Discovery Lab; Input Events, Execution, Innovation, Discovery, Output Data; Structured Enterprise Data, Notebooks/Analytic Services, Object Store, Hadoop/HDFS; Actionable Events, Actionable Metrics, Actionable Data Sets
  • 30. The Power of SQL: Unified Query with Big Data SQL (diagram labels): Hive; DN, DN, DN, DN; Oracle SQL Engine; Storage; Table, Table; Big Data-enabled Oracle Tables; Python, Graph, R, node.js, Java, REST, SQL; Data Local Processing; Big Data SQL Cells; Leverage Metadata; Oracle Big Data SQL; Oracle Data Visualization
  • 31. The First Self-Driving Database (OOW, October 2017)
      • The Autonomous Data Warehouse Cloud
      • Easy
          • Automated management
          • Automated tuning: Simply load data and run
      • Fast
          • Based on Oracle’s unique data warehouse technology
      • Elastic
          • Instant scaling of compute or storage with no downtime
  • 32. Determine your Target
      • Big Data Strategy; Hadoop; Data Lakes; Analytics Strategy; Requirements, Capabilities; Centralized Data Architecture
      • Don’t Focus on Technology
      • Focus on Delivering Results
  • 33. Summary
      • How Will:
          • Impact of Cloud Technologies
          • Object Storage
          • Micro Services Architecture
          • Self Driving Data Platforms
          • Speed to Insight
      • Impact Future:
          • Projects
          • Career goals
          • Skill Development
  • 34. Questions? Thank you.
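
Slides 20, 22, and 24 describe detaching compute from storage: cloud object storage holds the raw and serving layers, while compute is elastic and independent of it. A minimal Python sketch of that idea follows. It assumes an in-memory `ObjectStore` class standing in for a real object store and a thread pool standing in for an elastic cluster. The bucket name `example-lake`, the `LAYERS` mapping, and the `run_job` helper are all hypothetical and do not come from the slides; only the `s3a://` URI scheme is an existing Hadoop convention.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layer-to-URI mapping; the bucket name and prefixes are illustrative only.
LAYERS = {
    "raw": "s3a://example-lake/raw/",
    "serving": "s3a://example-lake/serving/",
}


class ObjectStore:
    """In-memory stand-in for a cloud object store (key -> object body)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, body):
        self._objects[key] = body

    def get(self, key):
        return self._objects[key]

    def list(self, prefix):
        # Return every key under a prefix, sorted for stable output.
        return sorted(k for k in self._objects if k.startswith(prefix))


def count_words(text):
    """Per-object unit of work: count whitespace-separated words."""
    return Counter(text.split())


def run_job(store, workers):
    """Start an ephemeral worker pool, read the raw layer, write the serving layer."""
    keys = store.list(LAYERS["raw"])
    # The pool exists only for the duration of the job; the data outlives it.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda k: count_words(store.get(k)), keys))
    total = dict(sum(partials, Counter()))
    store.put(LAYERS["serving"] + "word_counts", total)
    return total


# Storage is created once and is independent of any compute pool.
store = ObjectStore()
store.put(LAYERS["raw"] + "crm/part-0", "lake lake hub")
store.put(LAYERS["raw"] + "crm/part-1", "hub lake")

# The same job runs on a small pool and on a larger one.
small = run_job(store, workers=1)
large = run_job(store, workers=4)
```

Because the data lives in the store rather than in the cluster, the job gives the same answer with one worker or four, and nothing is lost when the pool shuts down. In the monolithic layout of slide 23, by contrast, removing the cluster would also remove the data.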