Hadoop
- A Set of Technologies
Data Warehouse
- A Concept or Process
And many more..
Comparing Hadoop with an Enterprise Data Warehouse?
Any attempt to use Hadoop technology to replace an organization's existing data warehouse may lead to failure.
• The Hadoop set of technologies should be used to make the EDW more powerful.
• A meaningful and honest assessment needs to be done to decide where and how Hadoop can be integrated to achieve an optimized architecture.
Let's get into some more detail:
• Explore data warehouse business goals and benefits
• Glimpse the core advantages of Hadoop
• Understand the limitations of Hadoop
• Finally, look at a few high-level use cases that use Hadoop capabilities in the DWH
Enterprise Data Warehouse Business Goals / Benefits:
• Evaluate, monitor, manage, and improve corporate performance.
• Manage and enhance customer relationships.
• Cleanse and improve the quality of the organization's data.
• Support decisions and forecast future growth and needs.
• Support, monitor, and modify marketing campaigns.
Hadoop Core Advantages
Scalable
Hadoop is highly scalable: it can store and distribute very large datasets across servers that operate in parallel.
Cost Effective
Hadoop is cost-effective. It is based on a scale-out architecture that can affordably store large volumes of data for future use.
Fast
Data is managed through clusters based on a distributed file system. Processing is mapped to the nodes where the data is stored, which results in faster data processing.
Flexible
Hadoop lets enterprises access and process data easily to generate the values they need, giving them the tools to get valuable insights from various types of data sources operating in parallel.
Failure Resistant
One of the great advantages of Hadoop is its fault tolerance, which is provided by replicating the data to other nodes in the cluster. If a node fails, the data from a replica can be used instead.
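The fault-tolerance point can be illustrated with a minimal sketch. It assumes a simple round-robin placement and made-up node and block names; real HDFS placement is rack-aware and more involved, so treat this only as an illustration of reading from a surviving replica.

```python
def place_replicas(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified stand-in for replica placement (an assumption, not HDFS's policy).
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [nodes[(i + k) % len(nodes)] for k in range(replication)]
    return placement


def read_block(block, placement, failed_nodes):
    """Return the first live node holding the block, or None if every replica is lost."""
    for node in placement[block]:
        if node not in failed_nodes:
            return node
    return None


# Hypothetical cluster of four nodes and three blocks.
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_replicas(["blk-1", "blk-2", "blk-3"], nodes, replication=3)

# node-a has failed, so blk-1 is served from the next replica.
survivor = read_block("blk-1", placement, failed_nodes={"node-a"})
print(survivor)
```

A block becomes unreadable only when every node holding one of its replicas is down, which is why a higher replication factor trades storage for resilience.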
Hadoop Limitations
Vulnerable
Hadoop is written in Java, a very widely used language that has been heavily exploited by cyber attackers and, as a result, implicated in numerous security breaches.
Latency
HDFS is optimized for fast access to batches of a data set (high throughput) rather than to particular records in that data set (low latency).
Poor Fit for Small Data
Hadoop is not suited to small data. Because of its high-capacity design, HDFS cannot efficiently support random reads of small files.
Stability Issues
As an open source platform, Hadoop has a fair possibility of stability issues.
Security Concern
Hadoop has historically lacked encryption at the storage and network levels, which is a major concern. It supports Kerberos authentication, which is not easy to manage.
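The small-data limitation can be made concrete with some rough arithmetic. The sketch below assumes a 128 MB block size (a common HDFS default) and a figure of about 150 bytes of NameNode memory per file or block object; that figure is a widely quoted rule of thumb, used here as an assumption rather than a measured value.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024   # assumed HDFS block size: 128 MB
BYTES_PER_OBJECT = 150           # assumed rule-of-thumb metadata cost per object


def namenode_objects(file_sizes):
    """Count the file and block objects the NameNode would have to track."""
    total = 0
    for size in file_sizes:
        blocks = math.ceil(size / BLOCK_SIZE)
        total += 1 + blocks      # one object for the file, one per block
    return total


# Roughly the same data volume, stored two ways.
small_files = [100 * 1024] * 10_000   # 10,000 files of 100 KB each
large_file = [1024 ** 3]              # one file of 1 GB

small_objects = namenode_objects(small_files)
large_objects = namenode_objects(large_file)

print(small_objects, small_objects * BYTES_PER_OBJECT)
print(large_objects, large_objects * BYTES_PER_OBJECT)
```

Under these assumptions the same volume of data costs thousands of times more metadata when it arrives as many small files, which is one reason small files are usually combined before landing in HDFS.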
Some scenarios where the power of Hadoop is needed to strengthen the data warehouse:
• Storing and processing semi-structured and unstructured data
• Reducing the cost of data storage for huge data volumes
• Increasing data retention to avoid premature data death
• Pre-processing large volumes of data
Conventional Data Warehouse Architecture
[Diagram: Source Systems (CRM, ERP, Legacy, Third Party, External Data) -> ETL Layer (Extract, Transform & Load) -> Data Repository Layer (ODS, Enterprise Data Warehouse, Data Marts) -> Analytics Layer (Analytics)]
This is the traditional data warehouse architecture used by many organizations. There are some variations on it based on technical and organizational needs.
[Diagram labels: Unstructured Data Sources; Semi-structured Data Sources; Structured Data Sources; Enterprise Data Warehouse; Advanced Analytical Applications; Business Intelligence Layer]
In this use case, Hadoop loads the unstructured and semi-structured data and makes it available to the EDW according to the organization's requirements, while also offering it for further analytical processing. Integrating new data sources into the existing EDW gives organizations more and deeper analytics and insights.
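As a small illustration of turning semi-structured input into rows an EDW could load, here is a sketch that parses web-log lines. The log layout and field names are assumptions chosen for the example, and in practice this step would run as a distributed job rather than a single script.

```python
import re

# Assumed access-log layout: ip, two ignored fields, [timestamp], "request", status, size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Turn one raw log line into a structured row, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    row = match.groupdict()
    row["status"] = int(row["status"])
    row["size"] = 0 if row["size"] == "-" else int(row["size"])
    return row


# Hypothetical sample input, including one line that cannot be parsed.
raw = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /home HTTP/1.1" 200 512',
    'corrupted line',
    '10.0.0.2 - - [01/Jan/2024:10:00:05 +0000] "POST /cart HTTP/1.1" 500 -',
]

rows = [r for r in (parse_line(line) for line in raw) if r is not None]
for row in rows:
    print(row)
```

Unparseable lines are dropped here for brevity; a real pipeline would more likely route them to a reject file for inspection.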
[Diagram labels: CRM, ERP, Legacy, Third Party, External Data; Extract, Transform & Load; Enterprise Data Warehouse, ODS, Data Marts, Analytics; Unstructured Sources (XMLs, Doc Files, Web Logs, Emails, Images, Videos); File Copy; Analytic Tools]
In this use case, Hadoop is used as the main data repository, and data from the data warehouse is archived in Hadoop to take advantage of its low-cost storage. The data warehouse acts here as a source for Hadoop. Note also that the organization's existing EDW setup does not change.
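The archiving idea can be sketched as a simple retention split. The row layout, the `order_date` field, and the cutoff date are all hypothetical; the point is only that older warehouse data is selected for copying to low-cost storage.

```python
from datetime import date


def split_for_archive(rows, cutoff):
    """Partition rows into (recent, to_archive) by comparing order_date to the cutoff."""
    recent, to_archive = [], []
    for row in rows:
        if row["order_date"] < cutoff:
            to_archive.append(row)
        else:
            recent.append(row)
    return recent, to_archive


# Hypothetical warehouse rows.
warehouse_rows = [
    {"order_id": 1, "order_date": date(2015, 3, 1)},
    {"order_id": 2, "order_date": date(2023, 7, 15)},
    {"order_id": 3, "order_date": date(2016, 11, 30)},
]

recent, to_archive = split_for_archive(warehouse_rows, cutoff=date(2020, 1, 1))
print([r["order_id"] for r in to_archive])
```

Whether archived rows are later purged from the warehouse is a separate policy decision that this sketch does not make.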
[Diagram labels: Unstructured Sources (XMLs, Doc Files, Web Logs, Emails, Images, Videos); Structured Sources (CRM, ERP, Legacy); Enterprise Data Warehouse, ODS, Data Marts; Analytics Layer; Analytic Tools]
In this use case, Hadoop sits as a layer in front of the existing EDW. It sources all of the data and uses its parallel processing capability to offload the majority of transformations from the EDW, feeding it pre-processed data. The EDW can then focus more on aggregations and analytical reporting.
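The kind of pre-processing that can be offloaded is often expressed in the map, shuffle, and reduce style. Below is a single-process sketch of that pattern with made-up sales records; on a cluster the map and reduce steps would run in parallel across nodes, which this example does not attempt to show.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (key, value) pair for each input record."""
    yield (record["store"], record["amount"])


def shuffle(pairs):
    """Group all values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse the values for one key into a single total."""
    return key, sum(values)


# Hypothetical raw records awaiting pre-aggregation.
records = [
    {"store": "north", "amount": 10},
    {"store": "south", "amount": 5},
    {"store": "north", "amount": 20},
]

pairs = [pair for record in records for pair in map_phase(record)]
totals = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(totals)
```

The totals, rather than the raw records, are what would be fed to the EDW in this use case.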
[Diagram labels: Data Sources (XMLs, Doc Files, Web Logs, Emails, Images, Videos, CRM, ERP, Legacy); Extract & Load; Data Lake; Analytic Sandbox; Transformation; Enterprise Data Warehouse; Business Intelligence Layer]
In this scenario, a data lake is used, with ELT rather than ETL. A data lake is a storage repository that holds a vast amount of raw data in its native form, which can be transformed later as needed. The EDW applies transformations and uses the data. This kind of architecture suits an organization's data science needs, since data scientists can use the sandbox to apply their models to the raw data stored in the data lake.
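The difference between ELT and ETL can be shown in a few lines: the load step stores records untouched, and the transformation is applied only when the data is read for a particular purpose. The event records and field names below are hypothetical.

```python
import json

# Hypothetical raw events, stored exactly as they arrived.
raw_events = [
    '{"user": "u1", "event": "click", "ms": 120}',
    '{"user": "u2", "event": "view"}',
    '{"user": "u1", "event": "click", "ms": 80, "extra": {"ab": "B"}}',
]


def load(lines):
    """Load step: keep every line in its native form, with no schema enforced."""
    return list(lines)


def transform_clicks(stored):
    """Transform step, applied at read time: keep click events and pick two fields."""
    result = []
    for line in stored:
        doc = json.loads(line)
        if doc.get("event") == "click":
            result.append({"user": doc["user"], "ms": doc.get("ms", 0)})
    return result


lake = load(raw_events)
clicks = transform_clicks(lake)
print(clicks)
```

Because the raw lines are retained, a different transformation can be written later (for example, one that reads the `extra` field) without reloading the sources.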
To Conclude
Data warehouse architects have more tools to work with, and a detailed analysis of the organization and its business goals is needed before choosing the right set of technologies to build a data warehouse.
The core benefits of a data warehouse are still needed and always will be. There is always an opportunity to strengthen them through smart use of appropriate tools and technologies.
Hadoop can fail only if there is an attempt to use it purely as a replacement for an existing data warehouse, without a proper feasibility analysis and the intent to arrive at an optimized architecture aligned with organizational goals.
Hadoop & Data Warehouse
