Introduction to ETL process
Omid Vahdaty
Assumed knowledge
● ETL = Extract, Transform, Load
● SQL knowledge
● DW (data warehouse) concepts
Concepts
● Dimensions
● Facts
● Aggregate facts
● Data mart
BI vs ETL?
● ETL moves data from one DB to another (typically from source systems into the DW)
○ Tools: Talend, Informatica, SAP BODS, Oracle Data Integrator, Microsoft SSIS
● BI is about:
○ Ad hoc queries
○ Dashboarding
○ Tools: SAP BO, IBM Cognos, Jaspersoft, Tableau, Oracle BI
ETL
● Extract - pull data from the source DB via jobs.
● Transform:
○ Change the format of the data before loading.
○ Clean the data.
○ Remove bad data or fix it.
○ Enforce data integrity.
● Load - load the data into the target system.
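The three steps can be sketched in Python, using the standard-library sqlite3 module as a stand-in for both the source DB and the DW. This is only an illustration under assumptions: the table names (orders_src, orders_dw) and the cleaning rules (trim and upper-case names, strip thousands separators, drop rows that cannot be fixed) are hypothetical, and in practice this logic usually lives in one of the ETL tools listed above.

```python
import sqlite3

def extract(src: sqlite3.Connection) -> list[tuple]:
    # Extract: read raw rows from the hypothetical source table "orders_src"
    return src.execute("SELECT id, customer, amount FROM orders_src").fetchall()

def transform(rows: list[tuple]) -> list[tuple]:
    # Transform: fix the format and remove bad data that cannot be repaired
    clean = []
    for order_id, customer, amount in rows:
        if customer is None or amount is None:
            continue  # missing values: remove the row
        try:
            value = float(str(amount).replace(",", ""))  # "1,200.50" -> 1200.5
        except ValueError:
            continue  # unparseable amount: remove the row
        clean.append((order_id, customer.strip().upper(), value))
    return clean

def load(dw: sqlite3.Connection, rows: list[tuple]) -> int:
    # Load: write the cleaned rows into the hypothetical target table "orders_dw"
    dw.executemany("INSERT INTO orders_dw (id, customer, amount) VALUES (?, ?, ?)", rows)
    dw.commit()
    return len(rows)

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders_src (id INTEGER, customer TEXT, amount TEXT)")
src.executemany("INSERT INTO orders_src VALUES (?, ?, ?)",
                [(1, " alice ", "1,200.50"), (2, "bob", "abc"), (3, None, "10"), (4, "carol", "99")])

dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE orders_dw (id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount REAL NOT NULL)")

loaded = load(dw, transform(extract(src)))
```

Here two of the four source rows are rejected during the transform, which is exactly the kind of difference the count checks later in this deck are meant to surface and explain.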
ETL tool layers
1. Staging - where the extracted data is saved.
2. Integration - where the data is processed and loaded.
3. Access - where the data is queried.
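A minimal sketch of the three layers, again with sqlite3 standing in for the DW. The names stg_sales, int_sales and acc_sales_by_region are hypothetical. Staging keeps the raw extracted text, integration casts types and drops duplicate keys, and the access layer is a view that reporting queries read.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- 1. Staging: extracted rows land here as-is (raw text, duplicates allowed)
CREATE TABLE stg_sales (sale_id TEXT, region TEXT, amount TEXT);
-- 2. Integration: typed, de-duplicated rows
CREATE TABLE int_sales (sale_id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount REAL NOT NULL);
-- 3. Access: a view that reporting/BI queries read
CREATE VIEW acc_sales_by_region AS
    SELECT region, SUM(amount) AS total FROM int_sales GROUP BY region;
""")

# Extracted data, including a duplicate of sale 1
con.executemany("INSERT INTO stg_sales VALUES (?, ?, ?)",
                [("1", "north", "10.0"), ("1", "north", "10.0"), ("2", "south", "5.5"), ("3", "north", "2")])

# Integration step: cast types; INSERT OR IGNORE skips rows whose primary key already exists
con.execute("""
    INSERT OR IGNORE INTO int_sales (sale_id, region, amount)
    SELECT CAST(sale_id AS INTEGER), UPPER(region), CAST(amount AS REAL) FROM stg_sales
""")
con.execute("DELETE FROM stg_sales")  # staging is emptied after each run
con.commit()

report = con.execute("SELECT region, total FROM acc_sales_by_region ORDER BY region").fetchall()
```

Keeping the access layer as a view means BI queries never touch staging data that is still being processed.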
ETL tasks
● Understand the data to be used for reporting
● Review the Data Model
● Source to target mapping
● Data checks on source data
● Packages and schema validation
● Data verification in the target system
● Verification of data transformation calculations and aggregation rules
● Sample data comparison between the source and the target system
● Data integrity and quality checks in the target system
● Performance testing on data
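Two of these tasks, source to target mapping and sample data comparison, fit together naturally: the mapping says how each target column is derived, and a sample of source rows is pushed through it and compared with what actually landed in the target. The sketch below assumes a hypothetical mapping (cust_no, name, country_code) and keys the target rows by customer_id; a real mapping document would come from the data model review.

```python
import random

# Hypothetical source-to-target mapping: target column -> (source column, transformation rule)
MAPPING = {
    "customer_id": ("cust_no", int),
    "full_name": ("name", lambda v: v.strip().title()),
    "country": ("country_code", lambda v: v.strip().upper()),
}

def apply_mapping(source_row: dict) -> dict:
    """Build the expected target row from one source row using the mapping rules."""
    return {tgt: rule(source_row[src]) for tgt, (src, rule) in MAPPING.items()}

def sample_compare(source_rows, target_by_key, key="customer_id", sample_size=100, seed=0):
    """Compare a random sample of source rows with the target; return (expected, actual) pairs that differ."""
    rng = random.Random(seed)
    sample = rng.sample(source_rows, min(sample_size, len(source_rows)))
    mismatches = []
    for row in sample:
        expected = apply_mapping(row)
        actual = target_by_key.get(expected[key])  # None if the row never reached the target
        if actual != expected:
            mismatches.append((expected, actual))
    return mismatches

source = [
    {"cust_no": "7", "name": " ada lovelace ", "country_code": "gb"},
    {"cust_no": "8", "name": "alan turing", "country_code": "gb"},
]
target = {
    7: {"customer_id": 7, "full_name": "Ada Lovelace", "country": "GB"},
    8: {"customer_id": 8, "full_name": "Alan Turing", "country": "US"},  # loaded incorrectly
}
mismatches = sample_compare(source, target)
```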
ETL testing
● Validation of data movement from the source to the target system.
● Verification of the data count in the source and the target system.
● Verifying that data extraction and transformation meet the requirements and expectations.
● Verifying that table relations (joins and keys) are preserved during the transformation.
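A count check and a key-preservation check can be sketched as two small queries. The example below uses sqlite3 and hypothetical tables (src_orders, fact_orders, dim_customer); it also shows why both checks are needed, since the counts match while one fact row points at a customer that is missing from the dimension. Table and column names are interpolated into the SQL, so only trusted identifiers should be passed in.

```python
import sqlite3

def count_check(con, source_table, target_table):
    """Row counts in source and target should match for a 1:1 load."""
    src = con.execute(f"SELECT COUNT(*) FROM {source_table}").fetchone()[0]
    tgt = con.execute(f"SELECT COUNT(*) FROM {target_table}").fetchone()[0]
    return src, tgt, src == tgt

def orphan_keys(con, fact_table, fk, dim_table, pk):
    """Foreign-key values in the fact table with no matching dimension row (broken key relation)."""
    sql = (f"SELECT DISTINCT f.{fk} FROM {fact_table} f "
           f"LEFT JOIN {dim_table} d ON f.{fk} = d.{pk} "
           f"WHERE d.{pk} IS NULL AND f.{fk} IS NOT NULL ORDER BY 1")
    return [row[0] for row in con.execute(sql)]

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE src_orders (order_id INTEGER, cust_id INTEGER);
CREATE TABLE dim_customer (cust_id INTEGER PRIMARY KEY);
CREATE TABLE fact_orders (order_id INTEGER, cust_id INTEGER);
INSERT INTO src_orders VALUES (1, 10), (2, 11), (3, 12);
INSERT INTO dim_customer VALUES (10), (11);
INSERT INTO fact_orders VALUES (1, 10), (2, 11), (3, 12);
""")
counts = count_check(con, "src_orders", "fact_orders")
orphans = orphan_keys(con, "fact_orders", "cust_id", "dim_customer", "cust_id")
```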
Database testing
● Verifying that primary and foreign keys are maintained.
● Verifying that the columns in a table hold valid data values.
● Verifying data accuracy in columns. Example: a number of months column shouldn't have a value greater than 12.
● Verifying missing data in columns: check for null values in columns that should always hold a valid value.
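These database checks can be written as queries that return the offending rows, so an empty result means the check passed. The sketch below uses sqlite3 and a hypothetical sales table with a month_no column standing in for the months example; the table is created without constraints on purpose so that bad rows can get in.

```python
import sqlite3

# Each check returns the keys of offending rows; table and column names are hypothetical.
CHECKS = {
    "duplicate_primary_key":
        "SELECT sale_id FROM sales GROUP BY sale_id HAVING COUNT(*) > 1 ORDER BY sale_id",
    "month_out_of_range":
        "SELECT sale_id FROM sales WHERE month_no NOT BETWEEN 1 AND 12 ORDER BY sale_id",
    "missing_customer":
        "SELECT sale_id FROM sales WHERE customer_name IS NULL OR TRIM(customer_name) = '' ORDER BY sale_id",
}

def run_checks(con):
    """Run every check and collect the offending keys per check name."""
    return {name: [row[0] for row in con.execute(sql)] for name, sql in CHECKS.items()}

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (sale_id INTEGER, month_no INTEGER, customer_name TEXT)")  # no constraints
con.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [(1, 3, "Dana"), (2, 13, "Eli"), (2, 5, None), (3, 12, " ")])
results = run_checks(con)
```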
ETL testing categories
● Source to target
○ Count testing
○ Data validation testing (duplicates, data integrity)
○ Data transformation testing
○ Constraint testing (null, unique, keys, ranges)
● Change/delta testing
● End report testing
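Change/delta testing checks that rows modified in the source since the previous load reached the target with their new values. A minimal sketch, assuming hypothetical src and tgt tables and an ISO-formatted updated_at text column (which compares correctly as a string); SQLite's IS operator is used as a NULL-safe equality.

```python
import sqlite3

def delta_check(con, last_load):
    """Source rows changed after the previous load that are missing or stale in the target."""
    return con.execute("""
        SELECT s.id, s.value FROM src s
        WHERE s.updated_at > ?
          AND NOT EXISTS (SELECT 1 FROM tgt t WHERE t.id = s.id AND t.value IS s.value)
        ORDER BY s.id
    """, (last_load,)).fetchall()

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE src (id INTEGER, value TEXT, updated_at TEXT);
CREATE TABLE tgt (id INTEGER, value TEXT);
INSERT INTO src VALUES (1, 'a', '2024-01-01 00:00:00'),
                       (2, 'b2', '2024-01-02 10:00:00'),
                       (3, 'c', '2024-01-02 11:00:00');
INSERT INTO tgt VALUES (1, 'a'), (2, 'b'), (3, 'c');
""")
stale = delta_check(con, "2024-01-02 00:00:00")
```

Row 2 changed after the last load but the target still holds the old value, so it is reported; row 3 changed and was loaded correctly.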
ETL Challenges
● Data loss during ETL
● Incorrect, incomplete or duplicate data.
● A DW holds historical data, so the data volume can be too large and complex to run full ETL testing in the target system.
● Performance
● Checking critical columns
● Supporting date/time formats and time zone conversion
● Supporting text encodings
● Ignoring headers in CSV files
● Incorrect column count caused by the separator character appearing inside a text field
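Several of these challenges show up together when extracting a CSV file. The sketch below (the file layout and column names are hypothetical) decodes the bytes with an explicit encoding, skips the header, lets Python's csv module handle a quoted field that contains the separator, and normalizes timestamps with different UTC offsets to UTC. The naive split(",") count is included only to show how the separator inside "Smith, John" produces a wrong column count.

```python
import csv
import io
from datetime import datetime, timezone

RAW = ('id,name,created_at\n'
       '1,"Smith, John",2024-03-01T10:00:00+02:00\n'
       '2,Ana,2024-03-01T23:30:00-05:00\n').encode("utf-8")

def parse_rows(raw: bytes, encoding: str = "utf-8"):
    text = raw.decode(encoding)              # decode with an explicit, agreed encoding
    reader = csv.reader(io.StringIO(text))   # csv handles quoted separators such as "Smith, John"
    next(reader)                             # skip the header row
    rows = []
    for rec in reader:
        if len(rec) != 3:
            raise ValueError(f"unexpected column count {len(rec)}: {rec}")
        ts = datetime.fromisoformat(rec[2]).astimezone(timezone.utc)  # normalize time zone to UTC
        rows.append((int(rec[0]), rec[1], ts))
    return rows

# Naive splitting breaks on the comma inside the quoted name
naive_columns = [len(line.split(",")) for line in RAW.decode("utf-8").splitlines()[1:]]
rows = parse_rows(RAW)
```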
Extract validation
● Count check
● Reconcile records with the source data
● Data type check
● Ensure no junk (spam) data is loaded
● Remove duplicate data
● Check that all the keys are in place
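Record reconciliation and a data type check can both be pushed into SQL. A sketch with sqlite3, assuming hypothetical src and stg tables: EXCEPT in each direction lists rows missing from, or extra in, the extracted copy, and SQLite's typeof() reports the storage class of each value. The staging columns are declared without types on purpose, because a declared INTEGER column would silently convert the text '40' into a number.

```python
import sqlite3

def reconcile(con, source, target, cols):
    """Rows present in one table but not the other (EXCEPT compares whole rows)."""
    col_list = ", ".join(cols)
    missing = con.execute(
        f"SELECT {col_list} FROM {source} EXCEPT SELECT {col_list} FROM {target} ORDER BY 1").fetchall()
    extra = con.execute(
        f"SELECT {col_list} FROM {target} EXCEPT SELECT {col_list} FROM {source} ORDER BY 1").fetchall()
    return missing, extra

def type_check(con, table, column, expected):
    """Rows whose value is not stored with the expected SQLite type ('integer', 'real', 'text', ...)."""
    return con.execute(
        f"SELECT rowid, {column} FROM {table} WHERE typeof({column}) <> ? ORDER BY rowid",
        (expected,)).fetchall()

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE src (id, amount);
CREATE TABLE stg (id, amount);
INSERT INTO src VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO stg VALUES (1, 10), (2, 25), (4, '40');
""")
missing, extra = reconcile(con, "src", "stg", ["id", "amount"])
bad_types = type_check(con, "stg", "amount", "integer")
```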
Transform validation
● Data threshold validation check, for example, an age value shouldn't be more than 100.
● Record count check, before and after the transformation logic is applied.
● Data flow validation from the staging area to the intermediate tables.
● Surrogate key check.
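The threshold, record count and surrogate key checks can be combined into one function that returns a list of problems. This is a sketch only: the row layout (dicts with hypothetical sk and age fields) and the age limit of 100 taken from the slide are assumptions, and a real job would read the rows from the staging and intermediate tables.

```python
def transform_checks(before_rows, after_rows, max_age=100):
    """Return a list of transform-validation problems (empty list = all checks passed)."""
    issues = []
    # Record count check: before vs after the transformation logic
    if len(before_rows) != len(after_rows):
        issues.append(f"count mismatch: {len(before_rows)} -> {len(after_rows)}")
    # Threshold check: age must lie between 0 and max_age
    for row in after_rows:
        age = row.get("age")
        if age is None or not 0 <= age <= max_age:
            issues.append(f"age out of range for sk={row.get('sk')}: {age}")
    # Surrogate key check: every row has a key and keys are unique
    keys = [row.get("sk") for row in after_rows]
    if any(k is None for k in keys):
        issues.append("null surrogate key")
    if len(set(keys)) != len(keys):
        issues.append("duplicate surrogate key")
    return issues

before = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
after = [{"sk": 1, "age": 34}, {"sk": 2, "age": 130}, {"sk": 2, "age": 20}]
problems = transform_checks(before, after)
```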
Load verification
● Record count check from the intermediate table to the target system.
● Ensure the key field data is not missing or NULL.
● Check that the aggregate values and calculated measures are loaded into the fact tables.
● Check the modeling views based on the target tables.
● Check that CDC (change data capture) has been applied to the incremental load table.
● Check the data in the dimension tables and the history tables.
● Check that the BI reports based on the loaded fact and dimension tables show the expected results.
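Two of the load checks, missing key fields and correctly loaded aggregates, can be sketched with sqlite3 by recomputing the aggregate from the detail fact table and comparing it with the loaded aggregate fact. The table names and the float tolerance are assumptions. The inner join only compares products present in both tables, so a complete test would also look for products missing from the aggregate.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE fact_sales (sale_id INTEGER, product_key INTEGER, amount REAL);
CREATE TABLE agg_sales_by_product (product_key INTEGER, total_amount REAL);
INSERT INTO fact_sales VALUES (1, 100, 5.0), (2, 100, 7.5), (3, 200, 3.0), (4, NULL, 1.0);
INSERT INTO agg_sales_by_product VALUES (100, 12.5), (200, 4.0);
""")

# Key fields must not be NULL after the load
null_keys = con.execute("SELECT sale_id FROM fact_sales WHERE product_key IS NULL").fetchall()

# Recompute the aggregate from the detail fact and compare it with the loaded aggregate fact
agg_mismatches = con.execute("""
    SELECT a.product_key, a.total_amount, d.total
    FROM agg_sales_by_product a
    JOIN (SELECT product_key, SUM(amount) AS total FROM fact_sales GROUP BY product_key) d
      ON a.product_key = d.product_key
    WHERE ABS(a.total_amount - d.total) > 1e-9
    ORDER BY a.product_key
""").fetchall()
```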
Data duplication validation
● Example:
SELECT Cust_Id, Cust_NAME, Quantity, COUNT(*)
FROM Customer
GROUP BY Cust_Id, Cust_NAME, Quantity
HAVING COUNT(*) > 1;
● Reasons for duplicate data:
○ If no primary key is defined, duplicate rows can be inserted.
○ Incorrect mapping or environmental issues.
○ Manual errors while transferring data from the source to the target system.
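The duplicate query above can be run as-is against a small Customer table in sqlite3. The clean-up step that follows is one possible approach, not the only one. It uses SQLite's implicit rowid to keep one copy of each duplicate group, which works here because the table has no primary key. Other databases need a different row identifier, for example ROW_NUMBER() in a CTE.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Customer (Cust_Id INTEGER, Cust_NAME TEXT, Quantity INTEGER)")  # no primary key
con.executemany("INSERT INTO Customer VALUES (?, ?, ?)",
                [(1, "Acme", 5), (1, "Acme", 5), (2, "Globex", 3), (1, "Acme", 5), (3, "Initech", 1)])

DUPLICATES_SQL = """
SELECT Cust_Id, Cust_NAME, Quantity, COUNT(*)
FROM Customer
GROUP BY Cust_Id, Cust_NAME, Quantity
HAVING COUNT(*) > 1
"""
dups = con.execute(DUPLICATES_SQL).fetchall()

# SQLite-specific clean-up: keep the lowest rowid of each duplicate group
con.execute("""
DELETE FROM Customer
WHERE rowid NOT IN (SELECT MIN(rowid) FROM Customer GROUP BY Cust_Id, Cust_NAME, Quantity)
""")
remaining = con.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
```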
Data Integrity testing
● Number check
● Date check
● Null check
● Precision check
● Invalid characters
● Incorrect upper/lower case
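The integrity checks listed above can be sketched as a per-row validator. All rules here are hypothetical examples: prices may have at most two decimal places, dates use ISO yyyy-mm-dd, product codes may only contain A-Z, 0-9 and "-", and country codes must be upper case. Null values are reported once and skipped by the other checks.

```python
import re
from datetime import date
from decimal import Decimal, InvalidOperation

CODE_PATTERN = re.compile(r"[A-Z0-9-]+")  # hypothetical rule: allowed characters for product codes

def integrity_issues(row: dict) -> list[str]:
    issues = []
    present = {}
    for col in ("code", "price", "order_date", "country"):
        value = row.get(col)
        if value is None or str(value).strip() == "":
            issues.append(f"{col}: null")               # null check
        else:
            present[col] = str(value)
    if "price" in present:                              # number + precision check
        try:
            price = Decimal(present["price"])
            if not price.is_finite():
                issues.append("price: not a number")
            elif price.as_tuple().exponent < -2:
                issues.append("price: more than 2 decimals")
        except InvalidOperation:
            issues.append("price: not a number")
    if "order_date" in present:                         # date check (ISO yyyy-mm-dd)
        try:
            date.fromisoformat(present["order_date"])
        except ValueError:
            issues.append("order_date: invalid date")
    if "code" in present and not CODE_PATTERN.fullmatch(present["code"]):
        issues.append("code: invalid characters")       # invalid character check
    if "country" in present and present["country"] != present["country"].upper():
        issues.append("country: wrong case")            # upper/lower case check
    return issues
```

For example, a row with code "ab#1", price "1.234", order_date "2023-02-30" and country "il" fails the precision, date, character and case checks.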
Detailed use cases for testing:
https://www.tutorialspoint.com/etl_testing/etl_testing_scenarios.htm
Best practices
● Analyze data
● Fix bad data in the source
● Find a compatible ETL tool
● Monitor ETL jobs
● Apply incremental ETL techniques when a timestamp column is available.
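Incremental ETL with a timestamp is often implemented with a "watermark": remember the latest timestamp loaded and, on the next run, extract only rows modified after it. A minimal sketch under assumptions: hypothetical events / events_dw tables, ISO-8601 text timestamps, and sqlite3's INSERT OR REPLACE as the upsert so that updated rows replace their previous version. With a strict ">" comparison, rows that arrive late carrying an already-passed timestamp would be missed. Real jobs often subtract a small overlap window for that reason.

```python
import sqlite3

def incremental_load(src, dw, watermark: str) -> str:
    # Pull only rows modified after the last watermark (ISO-8601 text compares correctly as strings)
    rows = src.execute(
        "SELECT id, value, updated_at FROM events WHERE updated_at > ? ORDER BY updated_at",
        (watermark,)).fetchall()
    # Upsert: a re-delivered or updated row replaces the old version instead of duplicating it
    dw.executemany("INSERT OR REPLACE INTO events_dw (id, value, updated_at) VALUES (?, ?, ?)", rows)
    dw.commit()
    # New watermark = latest timestamp seen (unchanged if nothing new arrived)
    return rows[-1][2] if rows else watermark

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, value TEXT, updated_at TEXT)")
src.executemany("INSERT INTO events VALUES (?, ?, ?)",
                [(1, "a", "2024-01-01T00:00:00"), (2, "b", "2024-01-01T01:00:00")])
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE events_dw (id INTEGER PRIMARY KEY, value TEXT, updated_at TEXT)")

wm = incremental_load(src, dw, "1970-01-01T00:00:00")  # first run loads everything
src.execute("UPDATE events SET value = 'b2', updated_at = '2024-01-02T00:00:00' WHERE id = 2")
src.execute("INSERT INTO events VALUES (3, 'c', '2024-01-02T00:00:01')")
wm = incremental_load(src, dw, wm)                     # second run loads only rows 2 and 3
```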
Courses & books
● https://www.udemy.com/automatingetl/
● https://books.google.co.il/books?id=TCLfzU2ilVkC&pg=PA205&lpg=PA205&dq=example+of+time+series+etl&source=bl&ots=86zwrsmHtF&sig=ssNKHMSph9L2_N_wBI5OVmB1rqg&hl=en&sa=X&redir_esc=y#v=onepage&q=time%20serias&f=false
● http://www.robertomarchetto.com/talend_data_integration_free_book
● Basic time series ETL by Omid:
https://docs.google.com/document/d/1KoFMeFtxXDGiZIswcGS1o8zmp2ZlPGbX0yj6ceQ_zlU/edit?usp=sharing
Exercise: Original table, create, drop, and add new data.
/*
drop table t;
create table t(
i int IDENTITY(1,1) NOT NULL,
d datetime NOT NULL
);
*/
-- assuming unique date values in t, not null
insert into t (d) values (getdate());
SELECT * from t where d>=DATEADD(minute, -1, GETDATE());
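Exercise: Staging table
-- assumes t_staging(i int, d datetime) and t_presentation(i int, d datetime) already exist, without IDENTITY columns
insert into t_staging (i, d) select i, d from t where d >= DATEADD(minute, -1, GETDATE());
-- load only rows not already in presentation, so rerunning the job does not create duplicates
insert into t_presentation (d, i)
select distinct s.d, s.i from t_staging s
where not exists (select 1 from t_presentation p where p.i = s.i);
truncate table t_staging;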
Exercise: presentation
-- count check: once every row of t has been loaded, both counts should match
select count(*) from t_presentation;
select count(*) from t;
select * from t_presentation order by d desc;
Talend
Talend resources
Adding the MS SQL Server JDBC driver:
https://www.talendforge.org/forum/viewtopic.php?id=54068
How to connect two components:
https://www.talendforge.org/forum/viewtopic.php?id=6493
How to create a loop in a job (FYI: right-click the project name to create a project; Job Designs is in the far-left, upper corner):
https://help.talend.com/display/TalendOpenStudioComponentsReferenceGuide62EN/tLoop
Running in parallel:
https://www.talendbyexample.com/talend-job-parallelization-reference.html
Running several queries (such as INSERT INTO and TRUNCATE) in one ETL job:
http://www.vikramtakkar.com/2013/05/example-to-execute-multiple-sql-queries.html

SCADAmetrics Instrumentation for Sensus Water Meters - Core and Main Training...
Jim Mimlitz, P.E.
 
Unit 1 Information Storage and Retrieval
Unit 1 Information Storage and RetrievalUnit 1 Information Storage and Retrieval
Unit 1 Information Storage and Retrieval
KishorMahale5
 
RECENT DEVELOPMENTS IN RING SPINNING.pptx
RECENT DEVELOPMENTS IN RING SPINNING.pptxRECENT DEVELOPMENTS IN RING SPINNING.pptx
RECENT DEVELOPMENTS IN RING SPINNING.pptx
peacesoul123
 
Chlorine and Nitric Acid application, properties, impacts.pptx
Chlorine and Nitric Acid application, properties, impacts.pptxChlorine and Nitric Acid application, properties, impacts.pptx
Chlorine and Nitric Acid application, properties, impacts.pptx
yadavsuyash008
 
Data Visualization in Python of b.tech student.pptx
Data Visualization in Python of b.tech student.pptxData Visualization in Python of b.tech student.pptx
Data Visualization in Python of b.tech student.pptx
TelanganaPakkaFolk
 
Ludo system project report management .pdf
Ludo  system project report management .pdfLudo  system project report management .pdf
Ludo system project report management .pdf
Kamal Acharya
 

Recently uploaded (20)

李易峰祝绪丹做爱视频流出【网芷:ht28.co】可爱学生妹>>>[网趾:ht28.co】]<<<
李易峰祝绪丹做爱视频流出【网芷:ht28.co】可爱学生妹>>>[网趾:ht28.co】]<<<李易峰祝绪丹做爱视频流出【网芷:ht28.co】可爱学生妹>>>[网趾:ht28.co】]<<<
李易峰祝绪丹做爱视频流出【网芷:ht28.co】可爱学生妹>>>[网趾:ht28.co】]<<<
 
1239_2.pdf IS CODE FOR GI PIPE FOR PROCUREMENT
1239_2.pdf IS CODE FOR GI PIPE FOR PROCUREMENT1239_2.pdf IS CODE FOR GI PIPE FOR PROCUREMENT
1239_2.pdf IS CODE FOR GI PIPE FOR PROCUREMENT
 
The world of Technology Management MEM 814.pptx
The world of Technology Management MEM 814.pptxThe world of Technology Management MEM 814.pptx
The world of Technology Management MEM 814.pptx
 
Time-State Analytics: MinneAnalytics 2024 Talk
Time-State Analytics: MinneAnalytics 2024 TalkTime-State Analytics: MinneAnalytics 2024 Talk
Time-State Analytics: MinneAnalytics 2024 Talk
 
Monitoring and reporting of transparent forest data and information under the...
Monitoring and reporting of transparent forest data and information under the...Monitoring and reporting of transparent forest data and information under the...
Monitoring and reporting of transparent forest data and information under the...
 
Ship Repair Occupational Health & Safety.ppt
Ship Repair Occupational Health & Safety.pptShip Repair Occupational Health & Safety.ppt
Ship Repair Occupational Health & Safety.ppt
 
Adv. Digital Signal Processing LAB MANUAL.pdf
Adv. Digital Signal Processing LAB MANUAL.pdfAdv. Digital Signal Processing LAB MANUAL.pdf
Adv. Digital Signal Processing LAB MANUAL.pdf
 
IE-469-Lecture-Notes-3IE-469-Lecture-Notes-3.pptx
IE-469-Lecture-Notes-3IE-469-Lecture-Notes-3.pptxIE-469-Lecture-Notes-3IE-469-Lecture-Notes-3.pptx
IE-469-Lecture-Notes-3IE-469-Lecture-Notes-3.pptx
 
Best Practices of Clothing Businesses in Talavera, Nueva Ecija, A Foundation ...
Best Practices of Clothing Businesses in Talavera, Nueva Ecija, A Foundation ...Best Practices of Clothing Businesses in Talavera, Nueva Ecija, A Foundation ...
Best Practices of Clothing Businesses in Talavera, Nueva Ecija, A Foundation ...
 
Online airline reservation system project report.pdf
Online airline reservation system project report.pdfOnline airline reservation system project report.pdf
Online airline reservation system project report.pdf
 
杨洋李一桐做爱视频流出【网芷:ht28.co】国产国产午夜精华>>>[网趾:ht28.co】]<<<
杨洋李一桐做爱视频流出【网芷:ht28.co】国产国产午夜精华>>>[网趾:ht28.co】]<<<杨洋李一桐做爱视频流出【网芷:ht28.co】国产国产午夜精华>>>[网趾:ht28.co】]<<<
杨洋李一桐做爱视频流出【网芷:ht28.co】国产国产午夜精华>>>[网趾:ht28.co】]<<<
 
21EC63_Module1B.pptx VLSI design 21ec63 MOS TRANSISTOR THEORY
21EC63_Module1B.pptx VLSI design 21ec63 MOS TRANSISTOR THEORY21EC63_Module1B.pptx VLSI design 21ec63 MOS TRANSISTOR THEORY
21EC63_Module1B.pptx VLSI design 21ec63 MOS TRANSISTOR THEORY
 
CONFINED SPACE ENTRY TRAINING FOR OIL INDUSTRY ppt
CONFINED SPACE ENTRY TRAINING FOR OIL INDUSTRY pptCONFINED SPACE ENTRY TRAINING FOR OIL INDUSTRY ppt
CONFINED SPACE ENTRY TRAINING FOR OIL INDUSTRY ppt
 
Metrology Book, Bachelors in Mechanical Engineering
Metrology Book, Bachelors in Mechanical EngineeringMetrology Book, Bachelors in Mechanical Engineering
Metrology Book, Bachelors in Mechanical Engineering
 
SCADAmetrics Instrumentation for Sensus Water Meters - Core and Main Training...
SCADAmetrics Instrumentation for Sensus Water Meters - Core and Main Training...SCADAmetrics Instrumentation for Sensus Water Meters - Core and Main Training...
SCADAmetrics Instrumentation for Sensus Water Meters - Core and Main Training...
 
Unit 1 Information Storage and Retrieval
Unit 1 Information Storage and RetrievalUnit 1 Information Storage and Retrieval
Unit 1 Information Storage and Retrieval
 
RECENT DEVELOPMENTS IN RING SPINNING.pptx
RECENT DEVELOPMENTS IN RING SPINNING.pptxRECENT DEVELOPMENTS IN RING SPINNING.pptx
RECENT DEVELOPMENTS IN RING SPINNING.pptx
 
Chlorine and Nitric Acid application, properties, impacts.pptx
Chlorine and Nitric Acid application, properties, impacts.pptxChlorine and Nitric Acid application, properties, impacts.pptx
Chlorine and Nitric Acid application, properties, impacts.pptx
 
Data Visualization in Python of b.tech student.pptx
Data Visualization in Python of b.tech student.pptxData Visualization in Python of b.tech student.pptx
Data Visualization in Python of b.tech student.pptx
 
Ludo system project report management .pdf
Ludo  system project report management .pdfLudo  system project report management .pdf
Ludo system project report management .pdf
 

Introduction to ETL process

  • 1. Introduction to ETL process Omid Vahdaty
  • 2. Assumptions
    ● ETL = Extract, Transform, Load
    ● SQL knowledge
    ● Data warehouse (DW) concepts
  • 3. Concepts
    ● Dimensions
    ● Facts
    ● Aggregate facts
    ● Data mart
  • 4. BI vs ETL?
    ● ETL moves data from database to database
      ○ Tools: Talend, Informatica, SAP BODS, Oracle Data Integrator, Microsoft SSIS
    ● BI is about
      ○ Ad hoc queries
      ○ Dashboarding
      ○ Tools: SAP BO, IBM Cognos, Jaspersoft, Tableau, Oracle BI
  • 5. ETL
    ● Extract: pull data from the source DB via jobs.
    ● Transform:
      ○ Change the format of the data before loading
      ○ Clean the data
      ○ Remove bad data or fix it
      ○ Enforce data integrity
    ● Load: simply load the transformed data into the target.
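The three steps can be sketched end to end in a few lines. This is a minimal illustration and not how a real ETL tool works internally: an in-memory SQLite database stands in for both source and target, and the tables src_orders and dw_orders, together with their cleaning rules, are hypothetical.

```python
import sqlite3


def demo_connection() -> sqlite3.Connection:
    """Build a throwaway in-memory source and target (hypothetical tables)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_orders (order_id INTEGER, country TEXT, amount TEXT)")
    conn.execute("CREATE TABLE dw_orders (order_id INTEGER PRIMARY KEY, country TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO src_orders VALUES (?, ?, ?)",
        [(1, " il ", "10.5"), (2, None, "3"), (3, "us", None)],
    )
    return conn


def run_etl(conn: sqlite3.Connection) -> int:
    """Extract from src_orders, clean the rows, load them into dw_orders."""
    # Extract: read the raw rows from the source table.
    rows = conn.execute("SELECT order_id, country, amount FROM src_orders").fetchall()

    # Transform: change the format and drop bad data that cannot be fixed.
    clean = []
    for order_id, country, amount in rows:
        if amount is None:  # no amount and no way to repair it: remove the row
            continue
        country = (country or "UNKNOWN").strip().upper()  # fix format, default missing values
        clean.append((order_id, country, round(float(amount), 2)))

    # Load: insert the cleaned rows into the target table.
    conn.executemany("INSERT INTO dw_orders (order_id, country, amount) VALUES (?, ?, ?)", clean)
    conn.commit()
    return len(clean)


if __name__ == "__main__":
    conn = demo_connection()
    print(run_etl(conn))  # 2
    print(conn.execute("SELECT * FROM dw_orders ORDER BY order_id").fetchall())
    # [(1, 'IL', 10.5), (2, 'UNKNOWN', 3.0)]
```

Order 3 is dropped during the transform because its amount is missing; whether to drop, fix, or quarantine such rows is a business decision the sketch simply assumes.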
  • 6. ETL tool layers
    1. Staging: where the extracted data is saved
    2. Integration: where the data is processed and loaded
    3. Access: where the data is queried
  • 7. ETL tasks
    ● Understand the data to be used for reporting
    ● Review the data model
    ● Source-to-target mapping
    ● Data checks on source data
    ● Package and schema validation
    ● Data verification in the target system
    ● Verification of data transformation calculations and aggregation rules
    ● Sample data comparison between the source and the target system
    ● Data integrity and quality checks in the target system
    ● Performance testing on data
  • 8. ETL testing
    ● Validating data movement from the source to the target system
    ● Verifying the data count in the source and the target system
    ● Verifying that data extraction and transformation meet the requirements and expectations
    ● Verifying that table relations (joins and keys) are preserved during the transformation
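Two of these checks, count verification and key preservation, can be expressed as plain SQL queries. The following is a minimal sketch that uses SQLite in place of the real source and target systems. The tables src_sales, dw_sales, and dw_customer are hypothetical. The orphan query assumes that a foreign key value with no matching dimension row means a relation was broken in transit.

```python
import sqlite3


def count_rows(conn: sqlite3.Connection, table: str) -> int:
    # Table names cannot be bound as parameters; only pass trusted names.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def orphan_keys(conn: sqlite3.Connection, fact: str, fk: str, dim: str, pk: str) -> list:
    """Foreign-key values in `fact` that have no matching row in `dim`.

    NULL keys are excluded here; they belong to a separate null check.
    """
    sql = (
        f"SELECT DISTINCT f.{fk} FROM {fact} AS f "
        f"LEFT JOIN {dim} AS d ON f.{fk} = d.{pk} "
        f"WHERE d.{pk} IS NULL AND f.{fk} IS NOT NULL ORDER BY f.{fk}"
    )
    return [row[0] for row in conn.execute(sql)]


def demo_connection() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE src_sales (sale_id INTEGER, customer_id INTEGER);
        CREATE TABLE dw_sales (sale_id INTEGER, customer_id INTEGER);
        CREATE TABLE dw_customer (customer_id INTEGER PRIMARY KEY);
        INSERT INTO src_sales VALUES (1, 1), (2, 2), (3, 3);
        INSERT INTO dw_sales VALUES (1, 1), (2, 2), (3, 3);
        INSERT INTO dw_customer VALUES (1), (2);
    """)
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(count_rows(conn, "src_sales"), count_rows(conn, "dw_sales"))  # 3 3
    print(orphan_keys(conn, "dw_sales", "customer_id", "dw_customer", "customer_id"))  # [3]
```

The counts match, but customer 3 never reached the dimension table. A count check alone does not catch this.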
  • 9. Database testing
    ● Verifying that primary and foreign keys are maintained
    ● Verifying that the columns in a table have valid data values
    ● Verifying data accuracy in columns. Example: a number-of-months column shouldn't have a value greater than 12.
    ● Verifying missing data in columns: check for nulls in columns that should always have a valid value
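Each of these checks can be written as a query that should return no rows on clean data. The following sketch runs them against SQLite. The contracts table and its columns are hypothetical. The months rule assumes month-of-year values from 1 to 12, so it adds a lower bound that the slide does not state.

```python
import sqlite3

# Each check is a query that should return no rows on clean data.
CHECKS = {
    # Slide example, assuming month-of-year values 1..12.
    "months_out_of_range": "SELECT id FROM contracts WHERE months NOT BETWEEN 1 AND 12",
    # Missing data: a column that should always have a value.
    "missing_customer": "SELECT id FROM contracts WHERE customer_id IS NULL",
    # Primary key maintained: the id must be unique.
    "duplicate_id": "SELECT id FROM contracts GROUP BY id HAVING COUNT(*) > 1",
}


def run_checks(conn: sqlite3.Connection) -> dict:
    """Map each failing check to the ids of the offending rows."""
    failures = {}
    for name, sql in CHECKS.items():
        bad = [row[0] for row in conn.execute(sql)]
        if bad:
            failures[name] = bad
    return failures


def demo_connection() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contracts (id INTEGER, customer_id INTEGER, months INTEGER)")
    conn.executemany(
        "INSERT INTO contracts VALUES (?, ?, ?)",
        [(1, 10, 6), (2, None, 12), (3, 11, 13), (3, 12, 1)],
    )
    return conn


if __name__ == "__main__":
    print(run_checks(demo_connection()))
```

Keeping the rules in a dictionary makes it easy to add new checks without changing the runner.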
  • 10. ETL testing categories
    ● Source to target
      ○ Count testing
      ○ Data validation testing (duplicates, data integrity)
      ○ Data transformation
      ○ Constraint testing (null, unique, keys, ranges)
    ● Change/delta testing
    ● End report testing
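Change/delta testing compares two snapshots of the same data. This is one possible way to frame it, not a standard method: classify each business key as inserted, updated, or deleted, and require the target to have changed in exactly the same way as the source. The snapshots are plain dictionaries keyed by a hypothetical business key.

```python
def diff_snapshots(before: dict, after: dict) -> dict:
    """Classify business keys as inserted, updated or deleted between two snapshots."""
    inserted = sorted(after.keys() - before.keys())
    deleted = sorted(before.keys() - after.keys())
    updated = sorted(k for k in before.keys() & after.keys() if before[k] != after[k])
    return {"inserted": inserted, "updated": updated, "deleted": deleted}


def delta_test(src_before: dict, src_after: dict, tgt_before: dict, tgt_after: dict) -> bool:
    """Pass when the target changed in exactly the same way as the source."""
    return diff_snapshots(src_before, src_after) == diff_snapshots(tgt_before, tgt_after)


if __name__ == "__main__":
    yesterday = {1: ("Ann", 10), 2: ("Bob", 20), 3: ("Cy", 30)}
    today = {1: ("Ann", 10), 2: ("Bob", 25), 4: ("Di", 40)}
    print(diff_snapshots(yesterday, today))
    # {'inserted': [4], 'updated': [2], 'deleted': [3]}
```

A DW that keeps history may soft-delete rows instead of removing them. In that case, the "deleted" rule has to be adapted to the history model.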
  • 11. ETL challenges
    ● Data loss during ETL
    ● Incorrect, incomplete, or duplicate data
    ● A DW system contains historical data, so the data volume can be too large and complex to run full ETL testing in the target system
    ● Performance
    ● Checking critical columns
    ● Supporting date/time formats and time zone conversion
    ● Supporting text encodings
    ● Ignoring header rows in CSV files
    ● Incorrect column count when the separator also appears inside a text field
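The last two CSV challenges are easy to reproduce. A naive split on commas breaks a quoted field that contains a comma. Python's standard csv module honours the quoting, and the header row can be skipped explicitly. The sample data below is made up, and the sketch treats a wrong column count as a rejected row.

```python
import csv
import io

RAW = 'id,name,city\n1,"Smith, John",Tel Aviv\n2,Jane Doe,Haifa\n'


def read_csv_rows(text: str, expected_cols: int):
    """Parse CSV text, skip the header row, and separate rows by column count."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # ignore the header row instead of loading it as data
    good, bad = [], []
    for row in reader:
        (good if len(row) == expected_cols else bad).append(row)
    return header, good, bad


if __name__ == "__main__":
    # Naive splitting breaks the quoted field into two columns.
    print(RAW.splitlines()[1].split(","))  # ['1', '"Smith', ' John"', 'Tel Aviv']
    print(read_csv_rows(RAW, expected_cols=3))
    # For a real file, open it with newline="" and an explicit encoding, e.g.
    # open(path, newline="", encoding="utf-8-sig") to also drop a UTF-8 BOM.
```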
  • 12. Extract validation
    ● Count check
    ● Reconcile records with the source data
    ● Data type check
    ● Ensure no spam (junk) data is loaded
    ● Remove duplicate data
    ● Check that all the keys are in place
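One way to reconcile extracted records with the source is a set comparison in both directions using EXCEPT. The sketch below uses SQLite and hypothetical src and stg tables. EXCEPT compares distinct rows, so it complements a count check rather than replacing it.

```python
import sqlite3


def reconcile(conn: sqlite3.Connection, source: str, target: str, cols: list):
    """Rows in source but not target (missing) and in target but not source (extra)."""
    col_list = ", ".join(cols)  # trusted identifiers only
    missing = conn.execute(
        f"SELECT {col_list} FROM {source} EXCEPT SELECT {col_list} FROM {target}"
    ).fetchall()
    extra = conn.execute(
        f"SELECT {col_list} FROM {target} EXCEPT SELECT {col_list} FROM {source}"
    ).fetchall()
    return sorted(missing), sorted(extra)


def demo_connection() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE src (id INTEGER, name TEXT);
        CREATE TABLE stg (id INTEGER, name TEXT);
        INSERT INTO src VALUES (1, 'a'), (2, 'b'), (3, 'c');
        INSERT INTO stg VALUES (1, 'a'), (2, 'B'), (4, 'd');
    """)
    return conn


if __name__ == "__main__":
    missing, extra = reconcile(demo_connection(), "src", "stg", ["id", "name"])
    print(missing)  # [(2, 'b'), (3, 'c')]
    print(extra)    # [(2, 'B'), (4, 'd')]
```

Row 2 appears on both sides because its value changed during extraction. Row 3 was lost, and row 4 should not be in staging at all.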
  • 13. Transform validation
    ● Data threshold validation check; for example, an age value shouldn't be more than 100
    ● Record count check before and after the transformation logic is applied
    ● Data flow validation from the staging area to the intermediate tables
    ● Surrogate key check
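The threshold, record count, and surrogate key checks can each return a single number, where 0 means the check passed. In the following sketch, the tables stg_customer and int_customer and the customer_sk column are hypothetical, and SQLite stands in for the staging and intermediate databases.

```python
import sqlite3


def scalar(conn: sqlite3.Connection, sql: str):
    return conn.execute(sql).fetchone()[0]


def transform_checks(conn: sqlite3.Connection) -> dict:
    """Result per check; 0 means the check passed."""
    return {
        # Threshold check from the slide: age should not exceed 100.
        "age_over_100": scalar(conn, "SELECT COUNT(*) FROM int_customer WHERE age > 100"),
        # Surrogate keys must be present...
        "null_surrogate_key": scalar(
            conn, "SELECT COUNT(*) FROM int_customer WHERE customer_sk IS NULL"),
        # ...and unique (this counts duplicated key values).
        "duplicate_surrogate_key": scalar(
            conn,
            "SELECT COUNT(*) FROM (SELECT customer_sk FROM int_customer "
            "WHERE customer_sk IS NOT NULL GROUP BY customer_sk HAVING COUNT(*) > 1)"),
        # Record count before vs after the transformation; only expected to be 0
        # when the transformation is not meant to filter rows.
        "row_count_difference": scalar(
            conn, "SELECT (SELECT COUNT(*) FROM stg_customer) - (SELECT COUNT(*) FROM int_customer)"),
    }


def demo_connection() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stg_customer (id INTEGER, age INTEGER);
        CREATE TABLE int_customer (customer_sk INTEGER, age INTEGER);
        INSERT INTO stg_customer VALUES (1, 30), (2, 120), (3, 40);
        INSERT INTO int_customer VALUES (1, 30), (1, 120), (NULL, 40);
    """)
    return conn


if __name__ == "__main__":
    print(transform_checks(demo_connection()))
```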
  • 14. Load verification
    ● Record count check from the intermediate table to the target system
    ● Ensure the key field data is not missing or null
    ● Check that the aggregate values and calculated measures are loaded in the fact tables
    ● Check the modeling views based on the target tables
    ● Check that CDC (change data capture) has been applied to the incremental load table
    ● Check the data in the dimension tables and the history tables
    ● Check that the BI reports based on the loaded fact and dimension tables show the expected results
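Two of these load checks are easy to automate. The first looks for null keys in the fact table. The second re-aggregates the intermediate data and compares the result with what was loaded. The following is a minimal sketch with hypothetical int_sales and fact_daily_sales tables in SQLite. It stores amounts as integer cents so that the equality comparison is exact.

```python
import sqlite3


def load_checks(conn: sqlite3.Connection) -> dict:
    """Key and aggregate checks on a hypothetical daily sales fact table."""
    null_keys = conn.execute(
        "SELECT COUNT(*) FROM fact_daily_sales WHERE date_id IS NULL").fetchone()[0]
    # Re-aggregate the intermediate table and compare it with what was loaded.
    mismatches = conn.execute("""
        SELECT i.date_id, i.total, f.total_amount
        FROM (SELECT date_id, SUM(amount) AS total FROM int_sales GROUP BY date_id) AS i
        LEFT JOIN fact_daily_sales AS f ON f.date_id = i.date_id
        WHERE f.total_amount IS NULL OR f.total_amount <> i.total
        ORDER BY i.date_id
    """).fetchall()
    return {"null_keys": null_keys, "aggregate_mismatches": mismatches}


def demo_connection() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE int_sales (date_id TEXT, amount INTEGER);
        CREATE TABLE fact_daily_sales (date_id TEXT, total_amount INTEGER);
        INSERT INTO int_sales VALUES ('2024-01-01', 1000), ('2024-01-01', 500), ('2024-01-02', 700);
        INSERT INTO fact_daily_sales VALUES ('2024-01-01', 1500), ('2024-01-02', 800), (NULL, 300);
    """)
    return conn


if __name__ == "__main__":
    print(load_checks(demo_connection()))
```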
  • 15. Data duplication validation
    ● Example:
        SELECT Cust_Id, Cust_NAME, Quantity, COUNT(*)
        FROM Customer
        GROUP BY Cust_Id, Cust_NAME, Quantity
        HAVING COUNT(*) > 1;
    ● Reasons for duplicate data:
      ○ If no primary key is defined, duplicate values may appear
      ○ Incorrect mapping or environmental issues
      ○ Manual errors while transferring data from the source to the target system
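The duplicate query from the slide runs unchanged on SQLite, so it can be tried on a small table. The cleanup step shown here is only one option. It relies on SQLite's implicit rowid to keep one copy of each duplicate. SQL Server has no rowid, and a ROW_NUMBER()-based delete is the usual equivalent there. The sample rows are made up.

```python
import sqlite3

# The query from the slide, unchanged apart from formatting.
DUPLICATES_SQL = """
    SELECT Cust_Id, Cust_NAME, Quantity, COUNT(*)
    FROM Customer
    GROUP BY Cust_Id, Cust_NAME, Quantity
    HAVING COUNT(*) > 1
"""


def remove_duplicates(conn: sqlite3.Connection) -> None:
    """Keep the first copy of each duplicated row (relies on SQLite's rowid)."""
    conn.execute("""
        DELETE FROM Customer
        WHERE rowid NOT IN (
            SELECT MIN(rowid) FROM Customer GROUP BY Cust_Id, Cust_NAME, Quantity
        )
    """)
    conn.commit()


def demo_connection() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customer (Cust_Id INTEGER, Cust_NAME TEXT, Quantity INTEGER)")
    conn.executemany("INSERT INTO Customer VALUES (?, ?, ?)",
                     [(1, "Ann", 2), (1, "Ann", 2), (2, "Bob", 1)])
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(conn.execute(DUPLICATES_SQL).fetchall())  # [(1, 'Ann', 2, 2)]
    remove_duplicates(conn)
    print(conn.execute(DUPLICATES_SQL).fetchall())  # []
```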
  • 16. Data integrity testing
    ● Number check
    ● Date check
    ● Null check
    ● Precision check
    ● Invalid characters
    ● Incorrect upper/lower case
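The integrity checks listed above can also run row by row before loading. The following sketch covers number, date, null, precision, invalid-character, and case checks for one record. The field names and rules are assumptions made for illustration: ISO dates, at most 2 decimal places, and two-letter upper-case country codes.

```python
import re
from datetime import datetime
from decimal import Decimal, InvalidOperation


def integrity_errors(record: dict) -> list:
    """Return the problems found in one record; an empty list means clean.

    Field names and rules are hypothetical examples of the slide's checks.
    """
    errors = []
    # Null check: required fields must be present and non-empty.
    for field in ("id", "order_date", "price", "country"):
        if record.get(field) in (None, ""):
            errors.append(f"{field}: null")
    # Number check: ids are expected to be digits only.
    if record.get("id") not in (None, "") and not str(record["id"]).isdigit():
        errors.append("id: not a number")
    # Date check: ISO yyyy-mm-dd assumed; strptime rejects dates like 2023-02-30.
    if record.get("order_date"):
        try:
            datetime.strptime(record["order_date"], "%Y-%m-%d")
        except ValueError:
            errors.append("order_date: invalid date")
    # Precision check: prices may have at most 2 decimal places.
    if record.get("price"):
        try:
            price = Decimal(str(record["price"]))
        except InvalidOperation:
            errors.append("price: not a number")
        else:
            if not price.is_finite():
                errors.append("price: not a number")
            elif price.as_tuple().exponent < -2:
                errors.append("price: more than 2 decimal places")
    # Invalid characters and case: a two-letter upper-case country code is assumed.
    country = record.get("country")
    if country:
        if not re.fullmatch(r"[A-Za-z]{2}", country):
            errors.append("country: invalid characters")
        elif not country.isupper():
            errors.append("country: not upper case")
    return errors


if __name__ == "__main__":
    print(integrity_errors({"id": "x", "order_date": "2023-02-30", "price": "1.999", "country": "il"}))
```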
  • 17. Detailed use cases for testing: https://www.tutorialspoint.com/etl_testing/etl_testing_scenarios.htm
  • 18. Best practices
    ● Analyze the data
    ● Fix bad data in the source
    ● Find a compatible ETL tool
    ● Monitor ETL jobs
    ● Apply incremental ETL techniques when a timestamp is available
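A common incremental technique is a "high-water mark": remember the latest timestamp already loaded, and on the next run copy only source rows that are newer. The following minimal sketch uses hypothetical src_events and dw_events tables in SQLite. It assumes that updated_at is stored as ISO-8601 text, so that string comparison matches time order.

```python
import sqlite3


def incremental_load(conn: sqlite3.Connection) -> int:
    """Copy only source rows newer than the last loaded timestamp; return rows copied."""
    # The watermark is the newest timestamp already in the target ('' when empty).
    (watermark,) = conn.execute(
        "SELECT COALESCE(MAX(updated_at), '') FROM dw_events").fetchone()
    cur = conn.execute(
        "INSERT INTO dw_events (id, updated_at) "
        "SELECT id, updated_at FROM src_events WHERE updated_at > ?",
        (watermark,),
    )
    conn.commit()
    return cur.rowcount


def demo_connection() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_events (id INTEGER, updated_at TEXT)")
    conn.execute("CREATE TABLE dw_events (id INTEGER, updated_at TEXT)")
    conn.executemany("INSERT INTO src_events VALUES (?, ?)",
                     [(1, "2024-01-01 10:00:00"), (2, "2024-01-01 11:00:00")])
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(incremental_load(conn))  # 2: first run loads everything
    print(incremental_load(conn))  # 0: nothing newer than the watermark
```

The strict ">" comparison misses rows that arrive later with a timestamp equal to the watermark. Jobs that can receive late rows often reload a small overlap window and deduplicate instead.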
  • 19. Courses & books
    ● https://www.udemy.com/automatingetl/
    ● https://books.google.co.il/books?id=TCLfzU2ilVkC&pg=PA205&lpg=PA205&dq=example+of+time+series+etl&source=bl&ots=86zwrsmHtF&sig=ssNKHMSph9L2_N_wBI5OVmB1rqg&hl=en&sa=X&redir_esc=y#v=onepage&q=time%20serias&f=false
  • 20. Courses & books
    ● http://www.robertomarchetto.com/talend_data_integration_free_book
    ● Basic time series ETL by Omid: https://docs.google.com/document/d/1KoFMeFtxXDGiZIswcGS1o8zmp2ZlPGbX0yj6ceQ_zlU/edit?usp=sharing
  • 21. Exercise: original table (create, drop, and add new data)
      /*
      drop table t;
      create table t (
          i int IDENTITY(1,1) NOT NULL,
          d datetime
      );
      */
      -- assuming unique date values in t, not null
      insert into t (d) values (getdate());
      select * from t where d >= DATEADD(minute, -1, GETDATE());
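  • 22. Exercise: staging table
      -- t_staging and t_presentation are assumed to have columns (i int, d datetime)
      insert into t_staging (i, d)
      select i, d from t where d >= DATEADD(minute, -1, GETDATE());
      -- DISTINCT covers the whole row; NOT EXISTS skips rows an earlier run already loaded
      insert into t_presentation (d, i)
      select distinct s.d, s.i from t_staging s
      where not exists (select 1 from t_presentation p where p.i = s.i);
      truncate table t_staging;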
  • 23. Exercise: presentation
      -- the two counts match only if the staging job has run at least once a minute since t was created
      select count(*) from t_presentation;
      select count(*) from t;
      select * from t_presentation order by d desc;
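The three exercise slides can be tried without SQL Server by porting them to SQLite. The translations are assumptions about equivalent behaviour, not the author's code:
  • IDENTITY becomes INTEGER PRIMARY KEY AUTOINCREMENT.
  • GETDATE() becomes datetime('now'), which is UTC text.
  • DATEADD(minute, -1, ...) becomes datetime('now', '-1 minute').
  • TRUNCATE TABLE becomes DELETE without a WHERE clause.
The NOT EXISTS guard keeps a second run within the same minute from duplicating rows.

```python
import sqlite3

SETUP = """
    CREATE TABLE t (i INTEGER PRIMARY KEY AUTOINCREMENT, d TEXT NOT NULL);
    CREATE TABLE t_staging (i INTEGER, d TEXT);
    CREATE TABLE t_presentation (i INTEGER, d TEXT);
"""


def add_row(conn: sqlite3.Connection) -> None:
    # SQL Server: insert into t (d) values (getdate());
    conn.execute("INSERT INTO t (d) VALUES (datetime('now'))")


def run_job(conn: sqlite3.Connection) -> None:
    # Extract the last minute of rows into staging
    # (SQL Server: DATEADD(minute, -1, GETDATE())).
    conn.execute("INSERT INTO t_staging (i, d) "
                 "SELECT i, d FROM t WHERE d >= datetime('now', '-1 minute')")
    # Load into presentation, skipping rows an earlier run already loaded.
    conn.execute("INSERT INTO t_presentation (i, d) "
                 "SELECT DISTINCT s.i, s.d FROM t_staging AS s "
                 "WHERE NOT EXISTS (SELECT 1 FROM t_presentation AS p WHERE p.i = s.i)")
    # SQLite has no TRUNCATE TABLE; DELETE without WHERE empties the table.
    conn.execute("DELETE FROM t_staging")
    conn.commit()


def counts(conn: sqlite3.Connection) -> tuple:
    # Slide 23: compare the presentation count with the original table count.
    return tuple(conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]
                 for name in ("t_presentation", "t"))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    add_row(conn)
    add_row(conn)
    run_job(conn)
    run_job(conn)  # a second run within the minute adds nothing new
    print(counts(conn))  # (2, 2)
```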
  • 27. Talend sources
    ● Adding the MSSQL JDBC driver: https://www.talendforge.org/forum/viewtopic.php?id=54068
    ● Connecting two components: https://www.talendforge.org/forum/viewtopic.php?id=6493
    ● Creating a loop of a job (FYI: right-click the project name and create a project; Job Designs is in the far-left, upper corner): https://help.talend.com/display/TalendOpenStudioComponentsReferenceGuide62EN/tLoop
    ● Running in parallel: https://www.talendbyexample.com/talend-job-parallelization-reference.html
    ● Running several queries for ETL, such as INSERT INTO and TRUNCATE: http://www.vikramtakkar.com/2013/05/example-to-execute-multiple-sql-queries.html