ETL: STILL RELEVANT IN
THE AGE OF HADOOP
Yaniv Mor, CEO and co-founder of Xplenty
THE NEW KID
Some people think that Hadoop means the end of ETL (extract,
transform, load). Not surprising. Often the birth of a new
technology spells death for the old one.
While ETL stems from data warehousing methodologies
created in the 1960s, Hadoop, born in 2005, has only caught
fire over the last few years.
OLD HABITS
The data architecture world may be tired of using the same
process for over 40 years, and Hadoop does revolutionize Big
Data. Nevertheless, it doesn’t look like ETL is going anywhere.
HIGH HOPES
Hadoop comes with great promise:
▪ Stores large amounts of data on fairly inexpensive
distributed systems
▪ Holds structured, semi-structured, and unstructured data on
the same platform
▪ Scales out easily, with practically unlimited storage space
THE END OF ETL?!?
A company’s raw data can be dumped straight into Hadoop
from several sources and later analyzed without having to
change it even one bit.
H(ADOOP)OUDINI?
But ETL didn’t disappear! In fact, it turns out that Hadoop is
commonly used for three main cases:
▪ Cheap storage
▪ ETL
▪ Data Exploration
Cheap storage makes complete sense - Hadoop is a great
alternative to specialized servers and RAID technology since
you can always add machines with more space to the Hadoop
cluster. Still, using Hadoop for ETL? How come? Isn’t it
supposed to replace that old horse?
NOT FOR EVERYONE…
Technology is only a means to an end. Who is using it and what
are their needs? Data scientists, for instance, might not need
any ETL. They could really gain from access to huge amounts of
raw data via Hadoop and take their time to find insights.
…BUT GOOD FOR MOST
However, the majority of workers in an organization need
something else. They need to do analysis and reporting quickly,
using their existing BI and reporting tools.
No matter which source it comes from, this data has to be
clean, well formed, and expressed in the business terminology
shared across the organization.
This means the majority still need ETL.
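To make "clean, well formed, and expressed in shared terminology" concrete, here is a minimal Python sketch of a transform step. The field names, the alias table, and the date format are all hypothetical, chosen only for illustration; a real pipeline would take them from the organization's own data dictionary.

```python
from datetime import datetime

# Hypothetical mapping from source-specific field names to shared business terms.
FIELD_ALIASES = {
    "cust_id": "customer_id",
    "CustomerID": "customer_id",
    "amt": "order_total",
    "total": "order_total",
    "dt": "order_date",
    "date": "order_date",
}


def transform(record):
    """Return a cleaned record with canonical field names, or None if unusable."""
    # Rename source-specific keys to the shared vocabulary.
    renamed = {FIELD_ALIASES.get(key, key): value for key, value in record.items()}
    try:
        return {
            "customer_id": str(renamed["customer_id"]).strip(),
            "order_total": round(float(renamed["order_total"]), 2),
            # Assumes ISO-style dates (YYYY-MM-DD) in every source.
            "order_date": datetime.strptime(
                str(renamed["order_date"]).strip(), "%Y-%m-%d"
            ).date().isoformat(),
        }
    except (KeyError, ValueError, TypeError):
        return None  # missing field or malformed value


# Three raw rows from two imaginary sources; the last one is malformed.
raw = [
    {"cust_id": " 42 ", "amt": "19.990", "dt": "2014-03-01"},
    {"CustomerID": 7, "total": 5, "date": "2014-03-02"},
    {"cust_id": "9", "amt": "n/a", "dt": "2014-03-03"},
]
clean = [row for row in (transform(r) for r in raw) if row is not None]
```

Only the rows in `clean` would be loaded for reporting. Whether rejected rows are dropped, logged, or sent to a quarantine table is a design choice this sketch leaves open.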
PROGRESS
The good news: Hadoop helps the ETL process. There are good
reasons why ETL tools like Informatica, Talend, and Pentaho
integrate with Hadoop, and handling huge volumes of data is
one of them.
While traditional tools have a limit on the file sizes that they
can process, Hadoop can handle petabytes - just ask Facebook,
which stores 100 PB of data on Hadoop (and that was over a
year ago).
$PEED
Because Hadoop is a distributed system, it processes data in
parallel on the cluster’s machines, so offloading heavy ETL
tasks to Hadoop makes them run faster. Scalability and price
are other well-known advantages.
Whereas traditional ETL usually needs expensive machines that
scale vertically, Hadoop scales horizontally by adding
off-the-shelf servers to the cluster.
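The shape of that parallel processing can be sketched in a few lines of Python. This is not Hadoop's API: worker threads on one machine stand in for cluster nodes, and the partitions, the `map_partition` and `run_job` names, and the event data are invented for illustration. It shows only the structure (split the data, process each part independently, merge the partial results), not real speedups.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_partition(lines):
    """Count events per country within one partition (the "map" step)."""
    counts = Counter()
    for line in lines:
        country, _event = line.split(",")
        counts[country] += 1
    return counts


def run_job(partitions, workers=4):
    """Process partitions concurrently, then merge the partial counts (the "reduce" step)."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each partition is handled independently, as a cluster node would handle its block.
        for partial in pool.map(map_partition, partitions):
            total.update(partial)
    return total


# Hypothetical data, already split into three partitions.
partitions = [
    ["US,click", "DE,view", "US,view"],
    ["DE,click", "FR,view"],
    ["US,click"],
]
totals = run_job(partitions)
```

Because each partition is processed without reference to the others, adding workers (or, on a real cluster, adding servers) raises capacity without changing the job's logic, which is the horizontal scaling described above.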
IT’S ALIVE!!!
The fact of the matter is that ETL is far from dead, and with the
help of its good friend Hadoop, it is alive and kicking. We see it
too.
Our Data Integration-as-a-Service is used by our clients mainly
as an ETL tool. They use Xplenty to extract data from cloud
sources, process and transform the data on Hadoop, and load it
back into the cloud or their data warehouse.
NOT GOING ANYWHERE SOON
Hadoop does provide a new option to skip ETL and process raw
data in any format directly and in one location. Although this
helps some professionals, the rest of us still need to process
the data before we can use it, so we can’t get rid of ETL that
easily.
MULTI-FACETED
Luckily, Hadoop is proving to be an enterprise-class solution not
just for Big Data in general, but also for taking a load off good
old ETL and giving its electric wheelchair a friendly boost.
XPLENTY
WWW.XPLENTY.COM
