1. 1
Copyright 2018 All rights reserved.
George Trujillo
Designing the Next Generation Data Lake
George Trujillo Jr.
www.linkedin.com/in/georgetrujillo @georgetrujillo
2. 2
George Trujillo, Jr.
Director of Global Enablement
NE Tier One Data Specialist, COE
Master Principal Big Data Specialist
Vice President of Big Data
Managing Director of Big Data
Chief Executive Officer
3. 3
George Trujillo, Jr.
20+ years Oracle: RAC, Data Warehousing, Data Guard, Oracle Middle-Tier, …
Recognized Oracle Double ACE
Independent Oracle Users Group (IOUG) Board of Directors
Served on Oracle Fusion Council & Oracle Beta Leadership Council
Recognized as one of the “Oracles of Oracle” by IOUG
Sun Microsystems Ambassador for Application Middleware Platform
Recognized VMware vExpert
VMware Certified Instructor (VCI)
MySQL Certified DBA
4. 4
Agenda
Vision and Direction
Analytic Platforms Have to Change
What is Causing Change
How are Hadoop, Big Data and Data Lakes Changing
Impacts of Cloud Technologies
Self Driving Data Platforms
Evolving Big Data Architectures
Impact to You
5. 5
Imagining the Speed of Trains
“What can be more palpably absurd than the prospect held out of
locomotives traveling twice as fast as stagecoaches?” – The Quarterly Review,
March 1825
6. 6
The Speed of Trains Today, Tomorrow?
270 mph? 4000 mph?
7. 7
The Future of Movies
"Who in Hades wants to hear actors talk?" --H.M. Warner, Warner Brothers,
1927
8. 8
What do we need, or want?
Would a silent movie “customer” panel in 1927 have come up with green
screens, computer animation and 3-D?
9. 9
Are you pointed at the Right Target?
Can you innovate with linear thought? How can you improve your
organization's ability to deliver insight faster while avoiding linear thought?
10. 10
What do we need, or want?
How do you help keep your company from being at a competitive
disadvantage?
11. 11
What Do All These Have in Common?
“Space Travel is Impossible”, Lee De Forest, inventor of the vacuum tube, 1957
Telephones and the Internet are just toys
1890: “Telephones were considered for the fancy of the rich, it’s ridiculous to
consider the cost required to lay telephone wires across a city let alone the
country or the world.”
1980s: “The Internet is ridiculous because: it’s ridiculous to consider the cost
required to lay cables across a city let alone the country or the world.”
“Remote shopping, while entirely feasible, will flop.” – Time Magazine, 1966
“The more important fundamental laws and facts of physical science have all been
discovered, and these are now so firmly established that the possibility of their ever
being supplanted in consequence of new discoveries is exceedingly remote.” –
Albert A. Michelson, physicist, 1894.
“We’ll never put our data in the cloud”, 2016
“An invention has to make sense in the world it finishes in, not in the world it started.”
12. 12
So Where Are Analytical Platforms Headed?
Analytical platforms are not keeping up with business demands today
Most data lakes have been built one use case at a time
Culture eats strategy for breakfast
Data Marshall Yard
Data Refinery
Data Lake Enterprise Data Hub Data Reservoir
Data Warehouses
13. 13
Are We Ready for the Future? Predictions by 2025
80% of production apps will be in the cloud
Two SaaS Suite providers will have 80% market share
Number of corporate-owned data centers will decrease by 80%
80% of IT budgets will be spent on cloud services
80% of IT budgets will be spent on business innovation, and only 20% on
system maintenance
All enterprise data will be stored in the cloud
100% of application development and testing will be done in the cloud
Enterprise clouds will be the most secure place for IT processing
14. 14
How to Compete, When Everything is Getting Faster
19. 19
How Do We Improve Our Analytical Platforms?
20. 20
Cloud Technologies are Changing Data Lake Strategies
Cloud technologies are adding significant new capabilities and flexibility to
data lakes
A core characteristic of a data lake is its storage repository
Object storage has significant advantages over HDFS
Replication to data centers
Detach compute from storage
Lower cost storage
Dynamic scaling reduces the need for YARN
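A rough sizing sketch of why detaching compute from storage matters. It assumes a coupled cluster needs enough nodes to hold every HDFS replica (HDFS defaults to a replication factor of 3), while a cluster that reads from an object store is sized for compute alone. The function names and every figure in the usage note are hypothetical, and the sketch ignores object storage fees and network throughput.

```python
import math


def coupled_nodes(data_tb, replication, disk_tb_per_node, compute_nodes_needed):
    """Nodes required when each node supplies both disk and CPU.

    The cluster must be large enough for whichever demand is bigger:
    holding all replicas of the data, or running the workload.
    """
    storage_nodes = math.ceil(data_tb * replication / disk_tb_per_node)
    return max(storage_nodes, compute_nodes_needed)


def decoupled_nodes(compute_nodes_needed):
    """Nodes required when data lives in an object store.

    Storage is provisioned separately, so the cluster is sized
    for compute demand only (an assumption of this sketch).
    """
    return compute_nodes_needed
```

With a hypothetical 500 TB of data, 48 TB of disk per node, and 10 nodes of compute demand, `coupled_nodes(500, 3, 48, 10)` gives 32 nodes, while `decoupled_nodes(10)` gives 10.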
21. 21
Data Architecture
[Architecture diagram]
Source Data: CRM, Social Connection, Ratings/Reviews, Jive, Article Comments, Ask/Answer, Social Data (LinkedIn, Facebook, Twitter), EDW, Transactional (PI, WI, FI), FBSI, FPRS, FILI
Data Ingestion: File, JMS, REST, Streaming; DLM (Batch, Microbatch), WebHDFS, Storm (Streaming), Kafka (Messaging), Sqoop, Flume
Data Lake - Ingest, Storage, Compute, Analytics Grid: Raw Layer, Speed Layer, Serving Layer, Access Layer
Tools (Talend, Trifacta, …), Pig, Hive
HCatalog (Schema metadata repository)
Scheduling (Control-M ?, Oozie, Talend, etc.)
22. 22
Data Architecture
Raw Layer
(Oracle Object Store, S3, HDFS, …)
Serving Layer
(Oracle Object Store, S3, HDFS)
Access Layer
Data Lake - Ingest, Storage, Compute, Analytics Grid
Speed Layer
(Spark, NoSQL, Alluxio, LLAP, …)
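One way to picture the layered layout is as a small mapping from layer name to a base location in the storage tier. This is only a minimal sketch: the bucket name `example-lake`, the `s3a://` prefix, the `dt=` partition convention, and the helper `dataset_uri` are illustrative assumptions, not part of the architecture shown.

```python
# Hypothetical base locations for the storage-backed layers.
LAYER_BASE_URIS = {
    "raw": "s3a://example-lake/raw",
    "serving": "s3a://example-lake/serving",
}


def dataset_uri(layer, dataset, partition_date=None):
    """Build the location of a dataset within a layer.

    Raises ValueError for a layer that has no configured base location.
    """
    if layer not in LAYER_BASE_URIS:
        raise ValueError(f"unknown layer: {layer!r}")
    uri = f"{LAYER_BASE_URIS[layer]}/{dataset}"
    if partition_date is not None:
        # Date-partitioned folder; the dt= naming is an assumed convention.
        uri += f"/dt={partition_date}"
    return uri
```

Because jobs ask for a layer by name, moving a layer from HDFS to an object store would mean changing one entry in the mapping rather than every job.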
23. 23
Big Data 1.0 – Monolithic Architecture
Compute (YARN)
Storage (HDFS)
Service Discovery (ZooKeeper)
Libraries, Notebooks inside Cluster
Tightly coupled storage and compute
HDFS as the Data Lake
Artifacts stored inside cluster
24. 24
Big Data 2.0 – A Micro Services Based Architecture
Compute (YARN)
Storage (Cloud Storage)
Service Discovery (ZooKeeper)
Libraries, Notebooks etc. Outside Cluster
Independent Elastic Compute and Storage
Cloud Storage as the Data Lake
Artifacts stored outside cluster
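Keeping artifacts outside the cluster is what allows a cluster to be deleted and recreated without losing anything. The check below is a minimal sketch of that idea: it flags any job reference that still points at cluster-local storage. The job layout, the URIs, and the choice of which schemes count as cluster-local are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Schemes treated as cluster-local in this sketch (an assumption):
# bare paths, local files, and the cluster's own HDFS.
CLUSTER_LOCAL_SCHEMES = {"", "file", "hdfs"}


def cluster_local_refs(job):
    """Return the job's URIs that would be lost if the cluster were deleted."""
    return [
        uri
        for uri in job.values()
        if urlparse(uri).scheme in CLUSTER_LOCAL_SCHEMES
    ]


# Hypothetical job definition: one library still lives inside the cluster.
job = {
    "input": "s3a://example-lake/raw/crm",
    "library": "hdfs:///user/libs/etl.jar",
    "notebook": "s3a://example-lake/notebooks/report.ipynb",
    "output": "s3a://example-lake/serving/crm",
}
```

Here `cluster_local_refs(job)` returns `["hdfs:///user/libs/etl.jar"]`, the one artifact that ties this job to a specific cluster.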
25. 25
Directionally Correct
Yesterday: Sun OS, HP-UX, AIX, Windows, …
Today: Hortonworks, Cloudera, MapR, Oracle Distribution of Hadoop, …
Tomorrow: Oracle Cloud, Amazon, Microsoft, …
26. 26
“Status quo is Latin for ‘the mess we’re in’” – Ronald Reagan
“It’s easier to let disillusionment with data inspire inertia than work to tame the data
beast”
27. 27
Critical Factors for Success for Enterprise Data Platforms
Data Architecture
Data Governance
Data Security
28. 28
More Management Tasks Than People to Do the Work
Less time on Administration
Less time on Infrastructure
Less time on Patching, Upgrades
Less time on Ensuring Availability
Less time on Tuning
Less time on Troubleshooting
More time on Innovation
More time on Design
More time on New Applications
More time on Analytics
More time on Securing data
More time on Delivering
29. 29
Empowering Users
[Diagram labels]
Streaming Engine, Data Lake, Enterprise Data & Reporting
Discovery Lab
Input Events
Execution, Innovation, Discovery
Output Data
Structured Enterprise Data
Notebooks/Analytic Services
Object Store, Hadoop/HDFS
Actionable Events, Actionable Metrics, Actionable Data Sets
30. 30
The Power of SQL – Unified Query with Big Data SQL
[Diagram labels]
Python, Graph, R, node.js, Java, REST, SQL
Oracle Data Visualization
Oracle SQL Engine
Big Data-enabled Oracle Tables
Oracle Big Data SQL
Big Data SQL Cells: Data Local Processing, Leverage Metadata
Hive
Storage: DN, DN, DN, DN
Table, Table
31. 31
The First Self-Driving Database – OOW October 2017
The Autonomous Data Warehouse Cloud
Easy
Automated management
Automated tuning: Simply load data and run
Fast
Based on Oracle’s unique data warehouse technology
Elastic
Instant scaling of compute or storage with no downtime
32. 32
Determine your Target
Big Data Strategy
Hadoop
Data Lakes
Analytics Strategy
Requirements, Capabilities
Centralized Data Architecture
Don’t Focus on Technology; Focus on Delivering Results
33. 33
Summary
How Will:
Impact of Cloud Technologies
Object Storage
Micro Services Architecture
Self Driving Data Platforms
Speed to Insight
Impact Future:
Projects
Career goals
Skill Development