ETL Process in Data Warehouse
Babjee Reddy
babjee@gmail.com
BI-Gems Technology
OLTP
• The objective of OLTP is to process data as quickly as possible
• Supports client-server technology
• Supports large amounts of data
• Data is secure
OLAP/DSS/DWH
• The objective of an OLAP database is to process data as quickly as possible with less complexity
• It is used for decision-making purposes
• Used by management
Difference between OLTP and OLAP
OLTP                                     OLAP
Transaction-oriented                     Analysis-oriented
Normalized tables                        Denormalized tables
Used by clerical staff                   Used by management
Current data                             Historical data
Inserts, updates, deletes                Bulk loads
SELECT queries retrieve few records      Queries retrieve a large number of records
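The contrast in the table can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `orders` table, its columns, and the sample rows are hypothetical, and in practice the OLTP system and the warehouse would be separate databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP-style work: small, frequent writes and single-row lookups.
conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("north", 120.0))
conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("south", 80.0))
conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("north", 50.0))
conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (90.0, 2))
one_order = conn.execute(
    "SELECT region, amount FROM orders WHERE order_id = ?", (2,)
).fetchone()

# OLAP-style work: a read-only query that scans many rows and aggregates them.
totals_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
conn.close()
```

The first half touches one row at a time; the second half reads every row to answer a management-style question (total amount per region).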
ETL
---------------------------------------------------------------------------------------------------------------------
• This section covers
  • What ETL is
  • Motivation
  • Where to use ETL
  • How to implement ETL
  • Key ETL aspects
Motivation
---------------------------------------------------------------------------------------------------------------------
• Is ETL an interesting area?
  • 70 to 80% of a BI (DI or DW) project is building a reliable ETL process
• Let's look at the DW & DI market size
  • In 2003, DI was a USD 9.3 billion market
  • In 2008, DI was a USD 13 billion market
  • By 2015, it was estimated to grow to USD 20 billion a year
• The more systems in the world, the more work in Data Integration!
What is ETL?
• ETL = Extract – Transform – Load
• Extract
  • Get the data from the source system as efficiently as possible
• Transform
  • Perform calculations on the data
• Load
  • Load the data into the target storage
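The three steps can be sketched as a small Python script. This is a minimal sketch, not a production design: the CSV text, the `order_totals` table, and the function names are hypothetical, and an in-memory SQLite database stands in for the target storage.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would be a file or a query result.
SOURCE_CSV = """order_id,quantity,unit_price
1,2,9.50
2,1,20.00
3,4,2.25
"""

def extract(text):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: compute a line total for each row."""
    return [
        (int(r["order_id"]), int(r["quantity"]) * float(r["unit_price"]))
        for r in rows
    ]

def load(conn, records):
    """Load: write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS order_totals (order_id INTEGER, total REAL)"
    )
    conn.executemany("INSERT INTO order_totals VALUES (?, ?)", records)
    conn.commit()

target = sqlite3.connect(":memory:")
load(target, transform(extract(SOURCE_CSV)))
loaded = target.execute(
    "SELECT order_id, total FROM order_totals ORDER BY order_id"
).fetchall()
```

Keeping each step in its own function makes it possible to test or replace one stage (for example, a different source) without touching the others.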
Why is ETL (System) Important?
• Adds value to data
  › Removes mistakes and corrects data
  › Provides documented measures of confidence in data
  › Captures the flow of transactional data
  › Adjusts data from multiple sources to be used together (conforming)
  › Structures data to be usable by BI tools
  › Enables subsequent business / analytical data processing
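As an illustration of conforming, the sketch below maps records from two source systems onto one shared layout. The two sources, their field names, and the country lookup are assumptions made up for this example.

```python
# Hypothetical records from two source systems that describe customers differently.
crm_rows = [{"cust": "A17", "country": "DE", "revenue_eur": 1200.0}]
billing_rows = [
    {"customer_id": "a17", "country_name": "Germany", "revenue_cents": 34050}
]

# Assumed lookup table that translates country names into shared codes.
COUNTRY_CODES = {"Germany": "DE", "France": "FR"}

def conform_crm(row):
    """Rename CRM fields to the shared layout and normalize the key."""
    return {
        "customer_id": row["cust"].upper(),
        "country": row["country"],
        "revenue_eur": row["revenue_eur"],
    }

def conform_billing(row):
    """Translate country names to codes and convert cents to euros."""
    return {
        "customer_id": row["customer_id"].upper(),
        "country": COUNTRY_CODES[row["country_name"]],
        "revenue_eur": row["revenue_cents"] / 100,
    }

conformed = [conform_crm(r) for r in crm_rows] + [
    conform_billing(r) for r in billing_rows
]
```

After conforming, both records share the same keys, codes, and units, so they can be joined or aggregated together.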
ETL Disambiguation
• ETL = Extract – Transform – Load
  › Not tied specifically to DW anymore
• Process/System
  › A complete process including
    • Data extraction
    • Enforcing DQ and consistency standards
    • Conforming data from disparate systems
    • Delivering data to the target
    • People, HW, documentation, support, etc.
• Tool
  › A piece of software implementing the three (four) E-(C)-T-L steps
  › A tool designed specifically to perform data transformations
ETL Process
ETL Tool: True Data Integration
ETL Data Integration Solutions
Where is ETL used?
How to implement ETL system
How to implement ETL
• Scripting (shell, Perl, Python)
• PL/SQL, sqlldr
• Transformation hardcoded in Java, C#
• Develop a (universal) ETL tool in-house
• Use an off-the-shelf ETL tool
ETL Tool Key Features
• Extract, Load => flexible on interfaces
  › Flat files, DBMS, XML data, XLS
  › MQ, web services, LDAP
  › Semi-structured data (emails, web logs, wiki pages)
  › Unstructured data (blogs, documents)
  › Extensibility with custom connectors
  › Local data, remote data via FTP(S), SFTP, SCP, HTTP(S)
• Clean
  › Lookups, validations, filters, translations
• Transform
  › Changing data structure, joins, (de)normalization, aggregation, roll-up, sorting, partitioning, data de-duplication
  › Ability to call external tools
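The Clean and Transform features listed above can be sketched in plain Python. The rows, field names, and lookup tables below are hypothetical, and the example covers only de-duplication, validation, lookup, translation, filtering, and aggregation.

```python
from collections import defaultdict

# Hypothetical raw rows; field names and values are illustrative only.
raw = [
    {"id": "1", "status": "A", "dept": "10", "salary": "1000"},
    {"id": "2", "status": "I", "dept": "10", "salary": "1500"},
    {"id": "2", "status": "I", "dept": "10", "salary": "1500"},  # duplicate
    {"id": "3", "status": "A", "dept": "20", "salary": "abc"},   # invalid salary
    {"id": "4", "status": "A", "dept": "99", "salary": "700"},   # unknown dept
    {"id": "5", "status": "A", "dept": "20", "salary": "2000"},
]
DEPT_LOOKUP = {"10": "Sales", "20": "Finance"}          # lookup table
STATUS_TRANSLATION = {"A": "active", "I": "inactive"}   # code translation

def clean(rows):
    """De-duplicate on id, validate fields, then apply lookup and translation."""
    seen, good, rejected = set(), [], []
    for row in rows:
        if row["id"] in seen:             # de-duplication on the key
            continue
        seen.add(row["id"])
        if not row["salary"].isdigit() or row["dept"] not in DEPT_LOOKUP:
            rejected.append(row)          # validation failure: keep for review
            continue
        good.append({
            "id": int(row["id"]),
            "status": STATUS_TRANSLATION[row["status"]],
            "dept": DEPT_LOOKUP[row["dept"]],
            "salary": int(row["salary"]),
        })
    return good, rejected

def total_salary_by_dept(rows):
    """Filter to active rows, then aggregate salary per department."""
    totals = defaultdict(int)
    for row in rows:
        if row["status"] == "active":     # filter
            totals[row["dept"]] += row["salary"]
    return dict(totals)

good, rejected = clean(raw)
totals = total_salary_by_dept(good)
```

Rejected rows are collected rather than dropped, so they can be reported or corrected later.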
• Performance
  › Symmetric Multiprocessing (SMP)
    • Pipeline processing
    • Multithreaded processing
  › Massively Parallel Processing (MPP)
    • Clustering
    • MapReduce
  › Load balancing
• User friendliness
  › GUI
  › Metadata capture
  › Training time
• Development
  › Reusable components
  › Impact analysis / data lineage
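Pipeline processing and multithreaded processing, listed under Performance above, can be illustrated with Python generators and the standard `concurrent.futures` module. The record layout, stage names, and partition boundaries are assumptions for the sketch; it shows the structure only and is not a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor

def read_records(n):
    """Stage 1: yield records one at a time instead of building a full list."""
    for i in range(n):
        yield {"id": i, "value": i * 10}

def to_cents(records):
    """Stage 2: transform each record as soon as stage 1 produces it."""
    for rec in records:
        yield {**rec, "value_cents": rec["value"] * 100}

# Pipeline processing: the stages are chained, so stage 2 starts working
# before stage 1 has produced the whole data set.
pipelined = list(to_cents(read_records(5)))

def process_partition(bounds):
    """Sum the values of one partition of the key range [start, stop)."""
    start, stop = bounds
    return sum(i * 10 for i in range(start, stop))

# Multithreaded processing: split the key range into partitions and hand
# each one to a worker thread. map() returns results in input order.
partitions = [(0, 250), (250, 500), (500, 750), (750, 1000)]
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(process_partition, partitions))
total = sum(partial_sums)
```

In the standard CPython build, threads mainly help when the work is I/O-bound (reading files, waiting on a database); CPU-bound transformations would more likely be spread across processes or machines.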
• Manageability
  › Team collaboration
  › Transformation repository
  › Metadata repository
  › Development process (Dev -> Test -> Prod)
  › Security
• Runtime
  › Scheduler automation
  › Recovery and restart
  › Workflow
• Others
  › Vendor stability
  › Release cycle
  › Support
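Recovery and restart, listed under Runtime above, can be sketched with a simple checkpoint file. This is one possible approach under stated assumptions, not how any particular tool does it: `run_job`, the checkpoint file name, and the simulated failure are all hypothetical.

```python
import json
import os
import tempfile

def run_job(batches, checkpoint_path, process, fail_on=None):
    """Process batches in order, recording the last completed batch index."""
    done = -1
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = json.load(fh)["last_done"]
    for index, batch in enumerate(batches):
        if index <= done:
            continue                      # already processed in an earlier run
        if index == fail_on:
            raise RuntimeError(f"simulated failure in batch {index}")
        process(batch)
        with open(checkpoint_path, "w") as fh:
            json.dump({"last_done": index}, fh)

processed = []
batches = [[1, 2], [3, 4], [5, 6]]
checkpoint = os.path.join(tempfile.mkdtemp(), "etl_checkpoint.json")

try:
    run_job(batches, checkpoint, processed.extend, fail_on=1)  # first run fails
except RuntimeError:
    pass
run_job(batches, checkpoint, processed.extend)                 # restart resumes
```

The restart skips the batch that already completed and continues from the failed one. Note that processing a batch and writing the checkpoint are not atomic here, so a crash between the two steps could cause a batch to be processed twice.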
Well-Known ETL Tools
• Commercial
  › Ab Initio
  › IBM DataStage
  › Informatica PowerCenter
  › Microsoft SQL Server Integration Services (SSIS)
  › Oracle Data Integrator
  › SAP BusinessObjects Data Integrator
  › SAS Data Integration Studio
• Open-source based
  › Adeptia Integration Suite
  › Apatar
  › CloverETL
  › Pentaho Data Integration (Kettle)
  › Talend Open Studio / Integration Suite
