Data Lake, beyond the Warehouse
Cheow Lan Lake, Thailand
Komes Chandavimol
February 3, 2016
Data Science Thailand Meetup#4
Shifting to the 3rd gen platform with Data Lake
http://www.adweek.com/prnewser/how-many-times-do-the-worlds-social-media-users-click-every-minute/117427
https://www.domo.com/learn/data-never-sleeps-3-0
The Growth of Data
Can these tools support Big Data?
• Spreadsheet?
• Database?
• Data Mart?
• Data Warehouse?
Source: Forrester Research’s James Kobielus
The Emergence of Big Data Tools
http://blogs.forrester.com/category/hadoop
http://solutions.forrester.com/Global/FileLib/webinars/Big_Data_-_Gold_Rush_or_Illusion.pdf
HADOOP
http://opensource.com/life/14/8/intro-apache-hadoop-big-data
Analytics 3.0
Data Mining Tools
Data Discovery and Visualization Tools
Tableau.com, RapidMiner.com
How to apply to current environment?
http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/
Traditional Data Warehouse
http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/
New Data Management Architecture
http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/
Data Lake
https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now
Data Lake
A single place to store every type of data in its native format,
with no fixed limits on account size or file size, high throughput
to increase analytic performance, and native integration with the
Hadoop ecosystem.
Reference: James Serra's Blog
Data Lake Development with Big Data, Pradeep Pasupuleti (2015)
https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now
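
The "native format" part of the definition can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product works: a temporary local directory stands in for the lake's storage (in practice HDFS or an object store), and the helper names `land_raw` and `read_on_demand`, the `raw/<source>/` layout, and the sample payloads are all hypothetical. Files are stored byte-for-byte, and a schema is applied only when the data is read.

```python
import csv
import json
import tempfile
from pathlib import Path


def land_raw(lake_root: Path, source: str, filename: str, payload: bytes) -> Path:
    """Store a file byte-for-byte under raw/<source>/ without parsing it."""
    target = lake_root / "raw" / source / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


def read_on_demand(path: Path) -> list:
    """Apply a schema only at read time, chosen by file extension."""
    if path.suffix == ".json":
        text = path.read_text(encoding="utf-8")
        return [json.loads(line) for line in text.splitlines() if line]
    if path.suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    raise ValueError(f"no reader registered for {path.suffix}")


# Hypothetical sample data: one JSON-lines file and one CSV file.
lake = Path(tempfile.mkdtemp())
clicks = land_raw(lake, "web", "clicks.json", b'{"user": "a", "page": "/home"}\n')
orders = land_raw(lake, "shop", "orders.csv", b"order_id,amount\n1,250\n")

click_rows = read_on_demand(clicks)
order_rows = read_on_demand(orders)
print(click_rows)
print(order_rows)
```

Because nothing is parsed at write time, a new file type only needs a new reader, not a change to how data is landed.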
Data Lake Processes
www.emc.com
Data Lake and Data Warehouse
Hadoop Distributed Compared, BlazeClan Technology, 2015
Data Lakes
http://www.kdnuggets.com/2015/09/data-lake-vs-data-warehouse-key-differences.html
Data Lake
• Type of Data
  - Raw Data
  - Derived Data
  - Aggregated Data
• Type of Environment
  - Discovery Environment
  - Production Environment
The Definition of Data Lake, John O’Brien(2015)
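
The three types of data can be illustrated with a small Python sketch. The sample events, field names, and the `derive` and `aggregate` helpers are assumptions made for this example only; the point is that raw records are kept untouched, derived records are cleaned and typed copies, and aggregated records are summaries computed from the derived ones.

```python
from collections import defaultdict

# Raw data: events kept exactly as they arrived (hypothetical sample records).
raw_events = [
    {"user": "a", "amount": "250", "ts": "2016-02-01T10:00:00"},
    {"user": "b", "amount": "100", "ts": "2016-02-01T11:30:00"},
    {"user": "a", "amount": "50", "ts": "2016-02-02T09:15:00"},
]


def derive(event: dict) -> dict:
    """Derived data: a cleaned, typed row for each raw event."""
    return {
        "user": event["user"],
        "amount": int(event["amount"]),  # string -> integer
        "day": event["ts"][:10],         # keep only the date part
    }


def aggregate(rows: list) -> dict:
    """Aggregated data: total amount per day."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["day"]] += row["amount"]
    return dict(totals)


derived = [derive(event) for event in raw_events]
daily_totals = aggregate(derived)
print(daily_totals)
```

Keeping the raw list unchanged means the derived and aggregated layers can always be rebuilt with different rules later.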
How does the Data Lake work?
http://www.clearpeaks.com/blog/category/tableau
Traditional Enterprise Data Warehouse
New Data Management Architecture
http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/
http://www.kdnuggets.com/2014/05/big-data-landscape-v30-analyzed.html
Data Lake Maturity
The Definition of Data Lake, John O’Brien(2015)
4 Maturity Stages of the Data Lake
• Stage 1 – Pilot Project (Understand the Technology)
• Stage 2 – Productionize Hadoop and its capabilities
• Stage 3 – Proactively consolidate data for (Big) Data Analytics
• Stage 4 – Establish the Data Lake platform as a core competency
The Definition of Data Lake, John O’Brien(2015)
Putting the Data Lake to Work, Teradata, Hortonworks (2015)
Stage 1 – Pilot Project
• Handling data at scale
• Involves getting the plumbing in place and learning to acquire
and transform data at scale.
• The analytics may be quite simple, but much is learned about
making Hadoop work the way you desire.
The Definition of Data Lake, John O’Brien(2015)
Putting the Data Lake to Work, Teradata, Hortonworks (2015)
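
A typical "simple analytics" pilot is a word count in the MapReduce style, Hadoop's classic processing model. The sketch below is plain Python and does not use any Hadoop API; the function names `map_phase`, `shuffle`, and `reduce_phase` and the input lines are assumptions for illustration. It only shows the shape of the computation that a cluster would spread across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str) -> list:
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs) -> dict:
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list) -> tuple:
    """Combine all values for one key into a single count."""
    return key, sum(values)


# Hypothetical input: in a real pilot these lines would come from files in the lake.
lines = ["data lake", "data warehouse", "Data Lake"]
pairs = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)
```

Each map call depends only on its own line and each reduce call only on its own key, which is what lets the same logic run in parallel at scale.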
Stage 2 – Productionize Hadoop and its capabilities
• Involves improving the ability to transform and analyze data.
• Teams find the tools that are most appropriate to their skill sets.
• Acquire more data and build applications.
The Definition of Data Lake, John O’Brien(2015)
Putting the Data Lake to Work, Teradata, Hortonworks (2015)
Stage 3 – Proactively consolidate data for (Big) Data Analytics
• Involves getting data and analytics into the hands of as many
people as possible.
• It is in this stage that the data lake and the enterprise data
warehouse start to work in unison, each playing its role.
• Companies that started with a data lake eventually added an
enterprise data warehouse to operationalize their data.
The Definition of Data Lake, John O’Brien(2015)
Putting the Data Lake to Work, Teradata, Hortonworks (2015)
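
One way to picture the lake and the warehouse working in unison: curated data from the lake is summarized and loaded into a structured warehouse table for reporting. In this sketch an in-memory SQLite database stands in for the enterprise data warehouse, and the `daily_sales` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical curated rows produced in the lake: (day, user, amount).
lake_rows = [
    ("2016-02-01", "a", 250),
    ("2016-02-01", "b", 100),
    ("2016-02-02", "a", 50),
]

# In-memory SQLite stands in for the enterprise data warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE daily_sales (day TEXT PRIMARY KEY, total INTEGER NOT NULL)"
)

# Summarize in the lake layer, then load only the result into the warehouse.
totals = {}
for day, _user, amount in lake_rows:
    totals[day] = totals.get(day, 0) + amount

with warehouse:  # commits the inserts as one transaction
    warehouse.executemany(
        "INSERT INTO daily_sales (day, total) VALUES (?, ?)",
        sorted(totals.items()),
    )

report = warehouse.execute(
    "SELECT day, total FROM daily_sales ORDER BY day"
).fetchall()
print(report)
```

The lake keeps the detailed records for exploration, while the warehouse serves the fixed-schema table that production reports query.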
Big Data Analytics
http://dataofthings.blogspot.com/2014/04/the-bbbt-sessions-hortonworks-big-data.html
Data Lake and Big Data Analytics
http://hortonworks.com/blog/big-data-refinery-fuels-next-generation-data-architecture/
Stage 4 – Establish the Data Lake platform as a core competency
• Enterprise capabilities are added to the data lake.
• Few companies have reached this level of maturity, but many
will as the use of big data grows.
• Requires data governance, compliance, security, and auditing
(incorporated into the company data strategy).
The Technology of the Business Data Lake, Capgemini (2013)
Business Data Lake
The Technology of the Business Data Lake, Capgemini (2014)
https://shefsite.files.wordpress.com/2014/04/where.jpg
http://image.slidesharecdn.com/mapr-db-in-hadoop-nosql-overview-150929062856-lva1-app6892/95/maprdb-the-first-inhadoop-document-database-12-638.jpg?cb=1443536326
http://www.predictiveanalyticstoday.com/waterline-data-self-service-for-the-hadoop-data-lake/
The Data Lake Unifies Data Discovery,
Data Science, and BI 3.0
[Word cloud: Big Data, Self Serve Business, Data Science, Machine Learning, Visual Analytics, Business Discovery, Deep Learning, Hadoop, Feature Engineering, Spark, Business Intelligence 3.0, YARN, Predictive Analytics, Hive, Data Lake, Data Visualization, Graph Analytics]
• 20+ posts relate to “Data Lake”
• To find them, search for “Data Science Thailand” “Data Lake”
http://www.clearpeaks.com/blog/category/tableau
Traditional Enterprise Data Warehouse
Questions?

Data Lake, beyond the Data Warehouse