The document discusses the concept of data lakes as an evolution of traditional data management systems, emphasizing their ability to store various types of data in its native format without fixed limits. It outlines the four maturity stages for implementing a data lake, from pilot projects to fully integrated systems that enhance enterprise capabilities. Additionally, it highlights the importance of big data analytics and the tools that facilitate data discovery and visualization in the context of data lakes.
Data Lake, Beyond the Warehouse
Cheow Lan Lake, Thailand
Komes Chandavimol (โกเมษ จันทวิมล)
February 3, 2016
Data Science Thailand Meetup #4
Shifting to the 3rd-generation platform with a data lake
Can these tools support Big Data?
Spreadsheet?
Database?
Data Mart?
Data Warehouse?
Source: Forrester Research’s James Kobielus
The Emergence of Big Data Tools
http://blogs.forrester.com/category/hadoop
http://solutions.forrester.com/Global/FileLib/webinars/Big_Data_-_Gold_Rush_or_Illusion.pdf
Data Lake
A single place to store every type of data in its native format, with no fixed limits on account size or file size, high throughput to increase analytic performance, and native integration with the Hadoop ecosystem.
Reference: James Serra's Blog
Data Lake Development with Big Data, Pradeep Pasupuleti (2015)
https://www.digitalnewsasia.com/business/forget-data-warehousing-its-data-lakes-now
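To make "native format" concrete, here is a minimal Python sketch, assuming a local directory stands in for the lake's storage layer (a real deployment would sit on HDFS or a cloud object store). The raw-zone path layout and the helpers `land_raw` and `read_with_schema` are hypothetical. Files are written byte-for-byte with no schema enforced on write, and structure is applied only when the data is read, an approach commonly called schema-on-read.

```python
import csv
import io
import json
import tempfile
from pathlib import Path


# Hypothetical "raw zone" layout: <lake_root>/raw/<source>/<filename>
def land_raw(lake_root: Path, source: str, filename: str, payload: bytes) -> Path:
    """Store the payload byte-for-byte, in its native format, under the raw zone."""
    target = lake_root / "raw" / source / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)  # no parsing or schema enforcement on write
    return target


def read_with_schema(path: Path) -> list[dict]:
    """Apply structure only at read time, choosing a parser by file extension."""
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".csv":
        return list(csv.DictReader(io.StringIO(text)))
    if path.suffix == ".jsonl":
        # One JSON object per non-blank line
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # Unknown formats are still kept in the lake; here they are exposed as raw lines.
    return [{"line": line} for line in text.splitlines()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        csv_path = land_raw(root, "crm", "customers.csv", b"id,name\n1,Somchai\n2,Malee\n")
        json_path = land_raw(root, "web", "clicks.jsonl", b'{"user": 1, "page": "/home"}\n')
        log_path = land_raw(root, "app", "server.log", b"INFO start\nWARN slow query\n")
        for p in (csv_path, json_path, log_path):
            print(p.relative_to(root), read_with_schema(p))
```

The design choice here is that ingestion never rejects data for not matching a schema; the interpretation cost moves to read time, where each consumer decides how to parse the stored bytes.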
Data Lake
Type of Data
Raw Data
Derived Data
Aggregated Data
Type of Environment
Discovery Environment
Production Environment
The Definition of Data Lake, John O’Brien (2015)
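The progression from raw to derived to aggregated data can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a method taken from the cited sources: the sales records, field names, and the `derive`/`aggregate` helpers are hypothetical. Raw records are kept exactly as received, derived records are cleaned and typed, and aggregated records are summarized for reporting. One plausible reading of the two environments is that analysts in the discovery environment work against the raw and derived layers, while the production environment serves governed outputs such as the aggregates; that mapping is our assumption.

```python
from collections import defaultdict

# Raw data: hypothetical records exactly as received, including messy values.
RAW_SALES = [
    {"store": "BKK-01", "amount": "120.50", "ts": "2016-02-01"},
    {"store": "bkk-01 ", "amount": "79.50", "ts": "2016-02-01"},
    {"store": "CNX-02", "amount": "n/a", "ts": "2016-02-01"},
    {"store": "CNX-02", "amount": "300", "ts": "2016-02-02"},
]


def derive(raw):
    """Derived data: cleaned, typed records. The raw list itself is never modified."""
    derived = []
    for rec in raw:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # unparseable row survives only in the raw layer
        derived.append({
            "store": rec["store"].strip().upper(),  # normalize store codes
            "amount": amount,
            "day": rec["ts"],
        })
    return derived


def aggregate(derived):
    """Aggregated data: total amount per store, ready for reporting."""
    totals = defaultdict(float)
    for rec in derived:
        totals[rec["store"]] += rec["amount"]
    return dict(totals)


if __name__ == "__main__":
    derived = derive(RAW_SALES)
    print(aggregate(derived))  # → {'BKK-01': 200.0, 'CNX-02': 300.0}
```

Keeping the raw layer untouched means a cleaning rule can later be changed and the derived and aggregated layers rebuilt from the original records.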
How Does the Data Lake Work?
http://www.clearpeaks.com/blog/category/tableau
Traditional Enterprise Data Warehouse
New Data Management Architecture
http://hortonworks.com/blog/optimize-your-data-architecture-with-hadoop/
4 Maturity Stages of a Data Lake
Stage 1 – Pilot Project (Understand the Technology)
Stage 2 – Productionize Hadoop and its capabilities
Stage 3 – Proactively consolidate data into (Big) Data Analytics
Stage 4 – Make the Data Lake a Core Competency
The Definition of Data Lake, John O’Brien (2015)
Putting the Data Lake to Work, Teradata, Hortonworks (2015)
Stage 1 – Pilot Project
Handling data at scale
Involves getting the plumbing in place and learning to acquire and transform data at scale.
The analytics may be quite simple, but much is learned about making Hadoop work the way you desire.
The Definition of Data Lake, John O’Brien (2015)
Putting the Data Lake to Work, Teradata, Hortonworks (2015)
Stage 2 – Productionize Hadoop and Its Capabilities
Improve the ability to transform and analyze data.
Find the tools most appropriate to the team's skill set.
Acquire more data and build applications.
The Definition of Data Lake, John O’Brien (2015)
Putting the Data Lake to Work, Teradata, Hortonworks (2015)
Stage 3 – Proactively Consolidate Data into (Big) Data Analytics
Involves getting data and analytics into the hands of as many people as possible.
It is in this stage that the data lake and the enterprise data warehouse start to work in unison, each playing its role.
Companies that started with a data lake eventually added an enterprise data warehouse to operationalize their data.
The Definition of Data Lake, John O’Brien (2015)
Putting the Data Lake to Work, Teradata, Hortonworks (2015)
Data Lake and Big Data Analytics
http://hortonworks.com/blog/big-data-refinery-fuels-next-generation-data-architecture/
Stage 4 – Make the Data Lake a Core Competency
Enterprise capabilities are added to the data lake.
Few companies have reached this level of maturity, but many will as the use of big data grows.
Requires data governance, compliance, security, and auditing, incorporated into the company's data strategy.
The Technology of the Business Data Lake, Capgemini (2013)
The Data Lake Unifies Data Discovery, Data Science, and BI 3.0
Word cloud: Big Data, Self-Serve Business, Data Science, Machine Learning, Visual Analytics, Business Discovery, Deep Learning, Hadoop, Feature Engineering, Spark, Business Intelligence 3.0, YARN, Predictive Analytics, Hive, Data Lake, Data Visualization, Graph Analytics
20+ posts relate to “Data Lake”
Search for “Data Science Thailand” “Data Lake”