A glimpse of the advantages and limitations of Hadoop, the goals and business benefits of a data warehouse, and a few use cases where Hadoop can be used to strengthen an organization's Enterprise Data Warehouse.
2. Hadoop
- A Set of Technologies
Data Warehouse
- A Concept or Process
And many more..
3. Comparing Hadoop vs Enterprise Data Warehouse?
Any attempt to implement Hadoop technology to replace an organization's existing data warehouse may lead to failure.
4. The Hadoop set of technologies should be used to make the EDW more powerful.
A meaningful and honest assessment needs to be done to decide where and how Hadoop can be integrated to achieve an optimized architecture.
5. Let's get into some more detail..
Explore Data Warehouse Business Goals / Benefits
Glimpse of Core Advantages of Hadoop
Understand Limitations of Hadoop
Finally, look at a few high-level use cases utilizing Hadoop capabilities in the DWH
6. Enterprise Data Warehouse Business Goals / Benefits:
• Evaluate, monitor, manage and improve corporate performance.
• Customer relationship management and enhancement.
• Cleanse and improve the quality of the organization's data.
• Decision support and forecasting of future growth and needs.
• Support, monitor and modify marketing campaigns.
7. Hadoop Core Advantages
Scalable
Hadoop is highly scalable: it can easily store and distribute very large datasets across servers that operate in parallel.
Cost Effective
Hadoop is very cost-effective. It is based on a scale-out architecture that can affordably store large volumes of data for future use.
Fast
Data is managed in clusters built on a distributed file system, and the technique used to map the data (MapReduce) results in faster data processing.
Flexible
Hadoop gives enterprises an easy way to access and process data to generate the value they need, providing them with the tools to get valuable insights from various types of data sources operating in parallel.
Failure Resistant
One of the great advantages of Hadoop is its fault tolerance, which is provided by replicating data to other nodes in the cluster. If a node fails, the data can be read from a replica on another node.
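To make the replication idea behind "Failure Resistant" concrete, here is a minimal, simplified Python sketch. It is not how HDFS actually places replicas: real HDFS uses a rack-aware policy managed by the NameNode. The node names, block IDs and round-robin placement below are hypothetical. The sketch only shows why a block stays readable after a node failure when it has a replication factor of 3, which is the HDFS default (the `dfs.replication` setting).

```python
REPLICATION_FACTOR = 3  # HDFS default replication; configurable via dfs.replication


def place_blocks(blocks, nodes, replication=REPLICATION_FACTOR):
    """Assign each block to `replication` distinct nodes.

    Simplified round-robin placement for illustration only (not rack-aware).
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for offset, block in enumerate(blocks):
        # Pick `replication` consecutive nodes, wrapping around the node list.
        placement[block] = [nodes[(offset + i) % len(nodes)] for i in range(replication)]
    return placement


def read_block(block, placement, failed_nodes):
    """Return the first live node that holds a replica of `block`."""
    for node in placement[block]:
        if node not in failed_nodes:
            return node
    raise IOError(f"all replicas of {block} are unavailable")


nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(["blk_1", "blk_2"], nodes)
# blk_1 -> node1, node2, node3; blk_2 -> node2, node3, node4
print(read_block("blk_1", placement, failed_nodes={"node1"}))  # node2
```

Even with node1 down, the read is served by node2. A block is lost only when every node holding one of its replicas fails.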
8. Hadoop Limitations
Vulnerable
Hadoop is written in Java, one of the most widely used languages, which has been heavily exploited by cyber attackers and, as a result, implicated in numerous security breaches.
Unsuitable for Small Data
Hadoop is not suited for small data. HDFS lacks the ability to efficiently support random reads of small files because of its high-capacity design.
Stability Issues
Being an open source platform, Hadoop has a fair possibility of stability issues.
Latency
HDFS is optimized to access batches of a data set quickly (high throughput) rather than particular records in that data set (low latency).
Security Concern
Encryption at the storage and network levels is not enabled by default, which is a major concern. Hadoop supports Kerberos authentication, which is not easy to manage.
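Much of the small-files limitation comes from the HDFS NameNode, which keeps metadata for every file and block in memory. The back-of-the-envelope Python sketch below rests on two approximations. The first is a widely quoted rule of thumb of roughly 150 bytes of NameNode heap per file or block object. The second is a 128 MB block size, the default `dfs.blocksize` in recent Hadoop releases. Neither figure is an exact cost.

```python
import math

BYTES_PER_OBJECT = 150          # rule-of-thumb NameNode heap per file/block object (approximate)
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the default dfs.blocksize in recent Hadoop releases


def namenode_objects(file_count, file_size_bytes, block_size=BLOCK_SIZE):
    """Count the file + block objects the NameNode tracks for equally sized files."""
    blocks_per_file = math.ceil(file_size_bytes / block_size)  # an empty file has no blocks
    return file_count * (1 + blocks_per_file)


def namenode_heap_bytes(file_count, file_size_bytes):
    """Estimate NameNode heap usage using the rule-of-thumb cost per object."""
    return namenode_objects(file_count, file_size_bytes) * BYTES_PER_OBJECT


one_gb = 1024 ** 3
# The same 1 GB stored as one large file vs. as 8,192 files of 128 KB each.
large = namenode_objects(1, one_gb)
small = namenode_objects(8_192, 128 * 1024)
print(large, small)  # 9 16384
```

The same gigabyte costs the NameNode 9 objects as one file but 16,384 objects as many small files. Multiplied across millions of files, that overhead is what makes HDFS a poor fit for small data.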
9. Some scenarios where the power of Hadoop is needed to strengthen the Data Warehouse
Storage and processing of semi-structured and unstructured data
Reducing the cost of data storage for huge data volumes
Increasing data retention to avoid premature data death
Pre-processing of big volumes of data
10. Conventional Data Warehouse Architecture
[Figure: Source Systems (CRM, ERP, Legacy, Third Party / External Data) → ETL Layer (Extract, Transform & Load) → Data Repository Layer (Enterprise Data Warehouse, ODS, Data Marts) → Analytics Layer]
This is the traditional data warehouse architecture used by many organizations. There are variations of it depending on technical and organizational needs.
11. [Figure: Unstructured, semi-structured and structured data sources feeding the Enterprise Data Warehouse, with Advanced Analytical Applications and a Business Intelligence Layer]
In this use case, Hadoop is used to load unstructured and semi-structured data and make it available to the EDW according to the organization's requirements, while also offering it for further analytical processing. Integrating these new data sources into the existing EDW enables organizations to perform more and deeper analytics and gain more insights.
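As a rough illustration of making semi-structured data available to the EDW, the Python sketch below flattens nested JSON records into rows with a fixed column list that a relational warehouse table could accept. The web-event record layout and the column names are hypothetical. In a real Hadoop deployment this step would typically run as a distributed job (for example, in Hive or Spark) rather than in a single Python process.

```python
import json

# Hypothetical target columns of an EDW staging table.
COLUMNS = ["event_id", "customer_id", "event_type", "device_os"]


def flatten(record, parent_key="", sep="_"):
    """Recursively flatten nested dicts: {"device": {"os": "x"}} -> {"device_os": "x"}."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


def to_rows(json_lines):
    """Parse JSON lines and project them onto COLUMNS; missing fields become None."""
    rows = []
    for line in json_lines:
        flat = flatten(json.loads(line))
        rows.append(tuple(flat.get(col) for col in COLUMNS))
    return rows


raw = [
    '{"event_id": 1, "customer_id": "C42", "event_type": "click", "device": {"os": "android"}}',
    '{"event_id": 2, "customer_id": "C7", "event_type": "view"}',
]
print(to_rows(raw))
# [(1, 'C42', 'click', 'android'), (2, 'C7', 'view', None)]
```

Missing fields are mapped to None instead of being rejected. Semi-structured sources rarely have every attribute on every record, and the warehouse table can store them as NULLs.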
12. [Figure: Source systems (CRM, ERP, Legacy, Third Party / External Data) → Extract, Transform & Load → Enterprise Data Warehouse, ODS, Data Marts → Analytics; alongside: unstructured sources (XMLs, Doc Files, Web Logs, Emails, Images, Videos), a File Copy step and Analytic Tools]
In this use case, Hadoop is used as the main data repository, and data from the data warehouse is archived in Hadoop to take advantage of its low-cost storage. The data warehouse is treated here as a source for Hadoop. Note also that the organization's existing EDW setup does not change.
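Here is a minimal sketch of the archiving idea, built on a hypothetical retention policy: rows loaded before a cutoff date are moved out of the "hot" EDW table into a cheaper archive store. The archive stands in for Hadoop/HDFS. The warehouse stays lean while the history is retained rather than deleted. Both stores are plain Python lists here. The actual transfer (a bulk export or file copy, as in the figure) is not modeled, and the column names are illustrative.

```python
from datetime import date


def archive_old_rows(edw_rows, archive, cutoff):
    """Move rows whose `load_date` is before `cutoff` from edw_rows into archive.

    Returns the rows that remain in the EDW. Nothing is deleted outright:
    every row ends up either in the EDW or in the archive.
    """
    keep = []
    for row in edw_rows:
        if row["load_date"] < cutoff:
            archive.append(row)  # cold data goes to low-cost storage (e.g. HDFS)
        else:
            keep.append(row)     # recent data stays in the warehouse
    return keep


edw = [
    {"order_id": 1, "load_date": date(2015, 3, 1)},
    {"order_id": 2, "load_date": date(2019, 6, 15)},
    {"order_id": 3, "load_date": date(2021, 1, 10)},
]
hadoop_archive = []
edw = archive_old_rows(edw, hadoop_archive, cutoff=date(2018, 1, 1))
print([r["order_id"] for r in edw], [r["order_id"] for r in hadoop_archive])
# [2, 3] [1]
```

Old data is moved to cheaper storage instead of being purged. That is how this pattern supports the "increase data retention" scenario listed earlier.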
13. [Figure: Structured sources (CRM, ERP, Legacy) and unstructured sources (XMLs, Doc Files, Web Logs, Emails, Images, Videos) → Enterprise Data Warehouse, ODS, Data Marts → Analytics Layer / Analytic Tools]
In this use case, Hadoop is shown as a layer in front of the existing EDW. It sources all of the data and uses its parallel processing capability to offload the majority of transformations from the EDW, feeding it pre-processed data. The EDW can then focus more on aggregations and analytical reporting.
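The offloading pattern can be illustrated with a MapReduce-style aggregation, sketched in plain Python. The log format, the input splits and the function names are hypothetical. Here the map and reduce phases run sequentially in one process. On a Hadoop cluster, each input split would be handled by a separate mapper in parallel and the framework itself would perform the shuffle. The result is a compact, pre-aggregated feed (page views per page) that could be loaded into the EDW in place of the raw logs.

```python
from collections import defaultdict


def map_phase(log_lines):
    """Mapper: emit (page, 1) for every successful GET request in one input split."""
    for line in log_lines:
        parts = line.split()
        # Assumed (hypothetical) log format: <ip> <method> <page> <status>
        if len(parts) == 4 and parts[1] == "GET" and parts[3] == "200":
            yield parts[2], 1


def shuffle(mapped_pairs):
    """Group values by key, as the framework does between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reducer: sum the counts for each page."""
    return {page: sum(counts) for page, counts in grouped.items()}


splits = [
    ["10.0.0.1 GET /home 200", "10.0.0.2 GET /cart 200"],
    ["10.0.0.3 GET /home 200", "10.0.0.4 POST /cart 200", "10.0.0.5 GET /home 404"],
]
pairs = [pair for split in splits for pair in map_phase(split)]  # one mapper per split
page_views = reduce_phase(shuffle(pairs))
print(page_views)  # {'/home': 2, '/cart': 1}
```

The filtering and counting happen before the data reaches the warehouse. The EDW receives only the small aggregated result, which is the workload split this use case describes.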
14. [Figure: Data Sources (XMLs, Doc Files, Web Logs, Emails, Images, Videos, CRM, ERP, Legacy) → Extract & Load → Data Lake, with an Analytic Sandbox; Transformation → Enterprise Data Warehouse → Business Intelligence Layer]
In this scenario, a data lake is used, and ELT is preferred over ETL. A data lake is a storage repository that holds a vast amount of raw data in its native form, which can be transformed later as needed. The EDW applies the transformations and uses the data. This kind of architecture suits an organization's data science needs well, since data scientists can use the sandbox to apply their models to the raw data stored in the data lake.
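The difference between ETL and ELT is mainly about when the transformation happens. In the minimal Python sketch below, raw payloads are loaded into a "data lake" exactly as received, with no cleansing. A transformation is applied only later, on demand, for the consumer that needs it; this is often called schema-on-read. All names, sources and the CSV layout are hypothetical, and the list merely stands in for files in HDFS.

```python
import csv
import io

data_lake = []  # stand-in for raw files stored in HDFS


def extract_and_load(raw_text, source):
    """E + L: store the raw payload untouched, tagged with its source."""
    data_lake.append({"source": source, "payload": raw_text})


def transform_for_edw(source):
    """T (later, on demand): parse one source's raw CSV payloads into typed rows."""
    rows = []
    for item in data_lake:
        if item["source"] != source:
            continue
        for rec in csv.DictReader(io.StringIO(item["payload"])):
            rows.append((rec["customer_id"], rec["country"].strip().upper(), float(rec["amount"])))
    return rows


extract_and_load("customer_id,country,amount\nC1, us ,10.5\nC2,de,3\n", source="erp")
extract_and_load("<email>raw text kept as-is</email>", source="email")
print(transform_for_edw("erp"))
# [('C1', 'US', 10.5), ('C2', 'DE', 3.0)]
```

The raw payloads are never modified. A data scientist working in the sandbox could therefore apply a different transformation to the same data without reloading it from the source systems.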
15. To Conclude..
Data warehouse architects have more tools to play with, and a detailed analysis of the organization and its business goals is needed before choosing the right set of technologies to build a data warehouse.
The core benefits of a data warehouse are still needed and always will be. There is always an opportunity to strengthen them through smart use of appropriate tools and technologies.
Hadoop can only fail if it is used simply to replace an existing data warehouse without proper feasibility analysis and without the intent to come up with an optimized architecture aligned with organizational goals.