Data Flow-I
Extraction
By,
Dr. Dipti Patil
Building the logical data map
•Analyzing the source system involves two steps:
1. Data discovery
2. Anomaly detection
Data discovery I
•Identify and examine the data sources
•Organize data modeling sessions to define the data
models and design the mapping details
•Such sessions may not cover every source, so make sure
all data points are collected, including the external and
supporting data points that can be used for reference
•Document each source system, including details such as
purpose, current users, and frequency of updates
•Track the data sources and stay in sync with their
updates. The mechanisms for capturing changes in the
sources should be understood in depth, well in advance
Data discovery II
•Maintain a source system tracking report. It should
include:
–Data mart that the source feeds
–Interface name from the transaction application
–Common term used in business
–Priority of the data
–Purpose of the data
–Technical owner of the data (who generates it)
–Business owner of the data (who uses it)
–DBMS name
–Details of the production system where the data
source resides
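One way to keep such a report consistent is to give every entry the same fields. The sketch below is only an illustration: the class name, field names, and sample values are assumptions made for this example, not part of the slides.

```python
from dataclasses import dataclass, asdict


@dataclass
class SourceSystemEntry:
    """One row of a source system tracking report (field names are illustrative)."""
    data_mart: str          # data mart that the source feeds
    interface_name: str     # interface name from the transaction application
    business_term: str      # common term used in business
    priority: int           # 1 = highest priority
    purpose: str
    technical_owner: str    # who generates the data
    business_owner: str     # who uses the data
    dbms: str
    production_system: str  # where the data source resides


# Hypothetical sample values, not taken from any real system.
entry = SourceSystemEntry(
    data_mart="Sales",
    interface_name="ORDER_ENTRY",
    business_term="Customer orders",
    priority=1,
    purpose="Daily order facts",
    technical_owner="Order processing team",
    business_owner="Sales operations",
    dbms="Oracle",
    production_system="prod-orders-01",
)

report = [entry]
# Sort by priority so the most important sources are reviewed first.
report.sort(key=lambda e: e.priority)
print(asdict(report[0])["interface_name"])
```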
Data discovery III
•Track the system-of-record: the exact source where the
data originates. This helps avoid duplicate and
incomplete data.
•Data derived from one or more data sources should be
tracked individually
•Analyze the source systems to better understand their
content. ER diagrams are the best way to track this, which
may require reverse engineering the systems.
Characteristics to consider here:
–Unique identifiers and keys
–Data types of all columns
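When no ER diagram exists, column types and candidate keys can often be read from the database itself. The sketch below uses an in-memory SQLite database as a stand-in for a real source; the `customer` table and the helper names are assumptions for illustration.

```python
import sqlite3

# In-memory stand-in for a source system; the table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER, name TEXT, joined TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Asha", "2020-01-05"), (2, "Ravi", "2021-03-17"), (2, "Ravi K", "2021-03-17")],
)


def column_types(conn, table):
    """Map column name to declared type using SQLite's table_info pragma."""
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}


def is_unique(conn, table, column):
    """True if the column could serve as a unique identifier."""
    # COUNT(DISTINCT ...) ignores NULLs, so a column with NULLs is reported as non-unique.
    total, distinct = conn.execute(
        f"SELECT COUNT(*), COUNT(DISTINCT {column}) FROM {table}"
    ).fetchone()
    return total == distinct


print(column_types(conn, "customer"))
print(is_unique(conn, "customer", "customer_id"))
```

In this sample, `customer_id` is reported as non-unique because the value 2 appears twice, which is the kind of finding that feeds into anomaly detection.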
Data content analysis and anomaly
detection
•Common anomaly checks cover:
–NULL values
–Date fields
–Numeric fields
–Unique keys
This step also includes collecting the business
rules for the ETL process. These are much more
technical than other business rules in a data
warehouse project. The ETL architect is expected to
translate the user requirements into usable ETL
definitions.
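The four anomaly categories above can be checked in a single pass over extracted rows. This is a minimal sketch under stated assumptions: rows arrive as dictionaries, dates are expected in `YYYY-MM-DD` form, and the function and field names are invented for the example.

```python
from datetime import datetime


def find_anomalies(rows, key, date_field, numeric_field):
    """Return a list of (row_index, problem) tuples for the given rows."""
    problems = []
    seen_keys = set()
    for i, row in enumerate(rows):
        # NULL values: treat None and empty strings as missing.
        for name, value in row.items():
            if value is None or value == "":
                problems.append((i, f"NULL in {name}"))
        # Date fields: must parse as YYYY-MM-DD (assumed format).
        date_value = row.get(date_field)
        if date_value:
            try:
                datetime.strptime(date_value, "%Y-%m-%d")
            except ValueError:
                problems.append((i, f"bad date in {date_field}"))
        # Numeric fields: must be convertible to a number.
        number = row.get(numeric_field)
        if number not in (None, ""):
            try:
                float(number)
            except (TypeError, ValueError):
                problems.append((i, f"non-numeric value in {numeric_field}"))
        # Unique keys: flag any key value seen before.
        key_value = row.get(key)
        if key_value is not None:
            if key_value in seen_keys:
                problems.append((i, f"duplicate key {key_value}"))
            seen_keys.add(key_value)
    return problems


# Hypothetical sample rows.
rows = [
    {"id": 1, "order_date": "2024-01-15", "amount": "19.99"},
    {"id": 2, "order_date": "2024-02-30", "amount": "12.50"},  # impossible date
    {"id": 2, "order_date": "2024-03-01", "amount": "abc"},    # bad number, repeated key
    {"id": 4, "order_date": None, "amount": "5"},              # missing date
]
for problem in find_anomalies(rows, "id", "order_date", "amount"):
    print(problem)
```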
Heterogeneous data sources I
•Challenges in integrating with different data
sources
•Alignment of data points and KPIs
•Conformed dimensions are a cohesive design
that unifies disparate data systems scattered
across the enterprise
•Data sources should be identified during data
profiling, along with the fact and
dimension tables in the data warehouse
Heterogeneous data sources II
•Understanding the source systems is essential to
integrating multiple systems
•Matching algorithms are needed to join data from
multiple sources
•If records collide in the ETL process, survivor
rules must be defined to resolve the collision. These
should be noted after the system-of-record is
established
•Business rules must be identified
•Load the conformed dimensions taking into
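Matching and survivor rules can be illustrated with two small sources. The sketch below is an assumption-laden example, not a prescribed method: it matches on a normalized email address, lets the higher-priority source win a collision, and fills the winner's empty fields from the losing record. The source names and the priority table are hypothetical.

```python
def match_key(record):
    """Matching rule: case-insensitive email with surrounding whitespace removed."""
    return record["email"].strip().lower()


# Hypothetical precedence: the lower number wins (the system-of-record ranks first).
SOURCE_PRIORITY = {"crm": 0, "billing": 1}


def merge_sources(*sources):
    """Merge record lists, applying the survivor rule whenever two records collide."""
    survivors = {}
    for records in sources:
        for rec in records:
            k = match_key(rec)
            current = survivors.get(k)
            if current is None:
                survivors[k] = dict(rec)
                continue
            # Collision: the record from the higher-priority source survives.
            if SOURCE_PRIORITY[rec["source"]] < SOURCE_PRIORITY[current["source"]]:
                winner, loser = rec, current
            else:
                winner, loser = current, rec
            merged = dict(winner)
            # Fill gaps in the survivor from the losing record.
            for field, value in loser.items():
                if merged.get(field) in (None, ""):
                    merged[field] = value
            survivors[k] = merged
    return survivors


# Hypothetical sample records from two sources.
crm = [{"source": "crm", "email": "A@x.com", "name": "Asha", "phone": None}]
billing = [{"source": "billing", "email": " a@x.com", "name": "A. Patel", "phone": "555-0100"}]

result = merge_sources(billing, crm)
print(result["a@x.com"])
```

Here the CRM record survives with its own name, and the phone number it lacks is taken from the billing record.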
Handling multiple data source
platforms - challenges
•The most commonly used connection is via ODBC
(Open Database Connectivity)
•Mainframe sources pose a different set of
integration issues because of their customized
hardware architecture
–Most legacy code on mainframes is in
COBOL
–EBCDIC character sets need to be converted to
ASCII as required
–Data must be transferred across nodes and platforms over
the network
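The EBCDIC conversion can be sketched with Python's built-in codecs. This example assumes code page `cp037` (one common EBCDIC variant); the actual code page depends on the mainframe and has to be confirmed with the source team. Only character fields should be decoded this way, since binary fields such as packed decimals are not text.

```python
# cp037 is an assumption; other EBCDIC code pages exist and differ in some characters.
EBCDIC_CODEC = "cp037"


def ebcdic_to_ascii(raw: bytes) -> bytes:
    """Decode EBCDIC bytes to text, then encode the text as ASCII bytes."""
    text = raw.decode(EBCDIC_CODEC)
    # encode("ascii") raises UnicodeEncodeError for characters outside ASCII,
    # which surfaces unconvertible data instead of silently corrupting it.
    return text.encode("ascii")


# The bytes below spell HELLO in EBCDIC.
raw = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])
print(ebcdic_to_ascii(raw))  # b'HELLO'
```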
Tracking the data changes
•Detect the changes happening on the source
•Pull versus push approach for tracking the
changes
•Sniffing the intermediate logs
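A pull approach can be as simple as comparing the current extract with the previous one. The sketch below is one possible illustration, assuming each row carries a unique key and that snapshots fit in memory; the function names and sample rows are invented for the example. Push approaches and log sniffing depend on the source system and are not shown.

```python
import hashlib


def row_hash(row):
    """Fingerprint a row from its sorted field/value pairs."""
    payload = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def detect_changes(previous, current, key):
    """Compare two snapshots and report inserted, updated, and deleted keys."""
    prev = {r[key]: row_hash(r) for r in previous}
    curr = {r[key]: row_hash(r) for r in current}
    inserted = sorted(k for k in curr if k not in prev)
    deleted = sorted(k for k in prev if k not in curr)
    updated = sorted(k for k in curr if k in prev and curr[k] != prev[k])
    return {"inserted": inserted, "updated": updated, "deleted": deleted}


# Hypothetical snapshots taken on two consecutive extracts.
previous = [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}, {"id": 3, "qty": 1}]
current = [{"id": 1, "qty": 5}, {"id": 2, "qty": 9}, {"id": 4, "qty": 2}]

changes = detect_changes(previous, current, "id")
print(changes)
```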
