SlideShare a Scribd company logo
1 of 41
ETL Process in Data Warehouse
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Outline
 ETL
 Ext ract ion
 Transf ormat ion
 Loading
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
ETL Overview
 Ext ract ion Transf ormat ion Loading – ETL
 To get dat a out of t he source and load it int o t he
dat a warehouse – simply a process of copying dat a
f rom one dat abase t o ot her
 Dat a is ext ract ed f rom an OLTP dat abase,
t ransf ormed t o mat ch t he dat a warehouse schema
and loaded int o t he dat a warehouse dat abase
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
 Many dat a warehouses also incorporat e
dat a f rom non-OLTP syst ems such as
t ext f iles, legacy syst ems, and
spreadsheet s; such dat a also requires
ext ract ion, t ransf ormat ion, and loading
 When def ining ETL f or a dat a
warehouse, it is import ant t o t hink of
ETL as a process, not a physical
implement at ion
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
ETL Overview
 ETL is of t en a complex combinat ion of pr ocess
and t echnology t hat consumes a signif icant
por t ion of t he dat a war ehouse development
ef f or t s and requires t he skills of business
analyst s, dat abase designer s, and applicat ion
developers
 I t is not a one t ime event as new dat a is added
t o t he Dat a War ehouse per iodically – mont hly,
daily, hourly
 Because ETL is an int egr al, ongoing, and
recur ring par t of a dat a war ehouse
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
ETL Staging Database
 ETL oper at ions should be perf or med on a
relat ional dat abase server separ at e f rom t he
sour ce dat abases and t he dat a war ehouse
dat abase
 Creat es a logical and physical separ at ion
bet ween t he sour ce syst ems and t he dat a
war ehouse
 Minimizes t he impact of t he int ense per iodic
ETL act ivit y on sour ce and dat a war ehouse
dat abases
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Extraction
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Extraction
 The int egrat ion of all of t he disparat e syst ems across
t he ent erprise is t he real challenge t o get t ing t he
dat a warehouse t o a st at e where it is usable
 Dat a is ext ract ed f rom het erogeneous dat a sources
 Each dat a source has it s dist inct set of
charact erist ics t hat need t o be managed and
int egrat ed int o t he ETL syst em in order t o
ef f ect ively ext ract dat a.
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
 ETL pr ocess needs t o ef f ect ively
int egrat e syst ems t hat have dif f er ent :
 DBMS
 Operat ing Syst ems
 Hardware
 Communicat ion prot ocols
 Need t o have a logical dat a map bef or e
t he physical dat a can be t r ansf or med
 The logical dat a map descr ibes t he
relat ionship bet ween t he ext reme st ar t ing
point s and t he ext r eme ending point s of
your ETL syst em usually present ed in a
Extraction
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
 The cont ent of t he logical dat a mapping
document has been pr oven t o be t he cr it ical
element r equired t o ef f icient ly plan ETL
pr ocesses
 The t able t ype gives us our queue f or t he
ordinal posit ion of our dat a load processes—
f irst dimensions, t hen f act s.
 The pr imary purpose of t his document is t o
pr ovide t he ETL developer wit h a clear -cut
Target Source Transformation
Table
Name
Column Name Data Type Table Name Column Name Data Type
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
 The analysis of t he source syst em
is usually broken int o t wo maj or
phases:
The dat a discovery phase
The anomaly det ect ion phase
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Extraction - Data Discovery Phase
 Dat a Discovery Phase
 key crit erion f or t he success of t he
dat a warehouse is t he cleanliness and
cohesiveness of t he dat a wit hin it
 Once you underst and what t he t arget
needs t o look like, you need t o ident if y
and examine t he dat a sources
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Data Discovery Phase
 It is up to the ETL team to drill down further into the data
requirements to determine each and every source system, table,
and attribute required to load the data warehouse
 Collecting and Documenting Source Systems
 Keeping track of source systems
 Determining the System of Record - Point of originating of data
 Definition of the system-of-record is important because in most
enterprises data is stored redundantly across many different systems.
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Data Content Analysis - Extraction
 Underst anding t he cont ent of t he dat a is
crucial f or det ermining t he best appr oach f or
ret r ieval
- NULL values. An unhandled NULL value can
dest r oy any ETL process. NULL values pose
t he biggest risk when t hey are in f or eign key
columns. J oining t wo or mor e t ables based on
a column t hat cont ains NULL values
 will cause dat a loss! Remember , in a relat ional
dat abase NULL is not equal t o NULL. That is
why t hose j oins f ail. Check f or NULL values in
every f oreign key in t he sour ce dat abase.
When NULL values are present , you must outer
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
 Dur ing t he init ial load, capt ur ing changes t o
dat a cont ent in t he sour ce dat a is
unimpor t ant because you ar e most likely
ext r act ing t he ent ir e dat a source or a pot ion
of it f r om a predet ermined point in t ime.
 Lat er t he abilit y t o capt ur e dat a changes in
t he source syst em inst ant ly becomes priorit y
 The ETL t eam is r esponsible f or capt uring
dat a-cont ent changes dur ing t he increment al
load.
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Determining Changed Data
 Audit Columns - Used by DB and updat ed by
t r iggers
 Audit columns ar e appended t o t he end of
each t able t o st or e t he dat e and t ime a
record was added or modif ied
 You must analyze and t est each of t he
columns t o ensur e t hat it is a reliable source
t o indicat e changed dat a. I f you f ind any
NULL values, you must t o f ind an alt ernat ive
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Pr ocess of Eliminat ion
 Pr ocess of eliminat ion pr eser ves exact ly one
copy of each pr evious ext ract ion in t he
st aging ar ea f or f ut ur e use.
 During t he next run, t he process t akes t he
ent ir e source t able(s) int o t he st aging ar ea
and makes a compar ison against t he ret ained
dat a f rom t he last process.
 Only dif f erences (delt as) are sent t o t he dat a
warehouse.
 Not t he most ef f icient t echnique, but most
reliable f or capt ur ing changed dat a
Determining Changed Data
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
I nit ial and I ncrement al Loads
 Creat e t wo t ables: pr evious load and curr ent
load.
 The init ial pr ocess bulk loads int o t he curr ent
load t able. Since change det ect ion is
ir relevant dur ing t he init ial load, t he dat a
cont inues on t o be t ransf or med and loaded
int o t he ult imat e t arget f act t able.
 When t he pr ocess is complet e, it dr ops t he
pr evious load t able, r enames t he cur rent load
t able t o pr evious load, and creat es an empt y
cur rent load t able. Since none of t hese t asks
involve dat abase logging, t hey are ver y f ast !
Det ermining Changed Dat a
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Transformation
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Transformation
 Main st ep where t he ETL adds value
 Act ually changes dat a and provides
guidance whet her dat a can be used f or
it s int ended purposes
 Perf ormed in st aging area
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Dat a Qualit y paradigm
 Cor r ect
 Unambiguous
 Consist ent
 Complet e
 Dat a qualit y checks are run at 2 places - af t er
ext r act ion and af t er cleaning and conf ir ming
addit ional check are r un at t his point
Transformation
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Transformation - Cleaning
Data
 Anomaly Det ect ion
 Dat a sampling – count (*) of t he rows f or a
depart ment column
 Column Pr oper t y Enf orcement
 Null Values in reqd columns
 Numeric values t hat f all out side of expect ed high
and lows
 Cols whose lengt hs are except ionally short / long
 Cols wit h cert ain values out side of discret e valid
value set s
 Adherence t o a reqd pat t ern/ member of a set of
pat t ern
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Transf ormat ion - Conf irming
 St ruct ure Enf orcement
Tables have proper pr imary and f oreign
keys
Obey r ef erent ial int egrit y
 Dat a and Rule value enf orcement
Simple business r ules
Logical dat a checks
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Staged Data
Cleaning
And
Confirming
Fatal Errors
Stop
Loading
Yes
No
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Loading
 Loading Dimensions
 Loading Fact s
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Loading Dimensions
 Physically built t o have t he minimal set s of
component s
 The primary key is a single f ield cont aining
meaningless unique int eger – Surrogat e Keys
 The DW owns t hese keys and never allows any ot her
ent it y t o assign t hem
 De-normalized f lat t ables – all at t ribut es in a
dimension must t ake on a single value in t he presence
of a dimension primary key.
 Should possess one or more ot her f ields t hat
compose t he nat ural key of t he dimension
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
 The dat a loading module consist s of all t he st eps
required t o administ er slowly changing dimensions
(SCD) and writ e t he dimension t o disk as a physical
t able in t he proper dimensional f ormat wit h correct
primary keys, correct nat ural keys, and f inal
descript ive at t ribut es.
 Creat ing and assigning t he surrogat e keys occur in
t his module.
 The t able is def init ely st aged, since it is t he obj ect
t o be loaded int o t he present at ion syst em of t he dat a
warehouse.
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Loading dimensions
 When DW receives not if icat ion t hat an
exist ing row in dimension has changed it
gives out 3 t ypes of responses
Type 1
Type 2
Type 3
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Type 1 Dimension
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Type 2 Dimension
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Type 3 Dimensions
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Loading facts
 Fact s
Fact t ables hold t he measurement s of
an ent erprise. The relat ionship bet ween
f act t ables and measurement s is
ext remely simple. I f a measurement
exist s, it can be modeled as a f act t able
row. I f a f act t able row exist s, it is a
measurement
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Key Building Process - Facts
 When building a f act t able, t he f inal ETL st ep is
convert ing t he nat ural keys in t he new input records
int o t he correct , cont emporary surrogat e keys
 ETL maint ains a special surrogat e key lookup t able f or
each dimension. This t able is updat ed whenever a new
dimension ent it y is creat ed and whenever a Type 2
change occurs on an exist ing dimension ent it y
 All of t he required lookup t ables should be pinned in
memory so t hat t hey can be randomly accessed as
each incoming f act record present s it s nat ural keys.
This is one of t he reasons f or making t he lookup
t ables separat e f rom t he original dat a warehouse
dimension t ables.
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Key Building Process
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Loading Fact Tables
 Managing I ndexes
Perf or mance Killers at load t ime
Dr op all indexes in pre-load t ime
Segregat e Updat es f r om inser t s
Load updat es
Rebuild indexes
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
 Managing Part it ions
Part it ions allow a t able (and it s indexes) t o
be physically divided int o minitablesf or
administ r at ive pur poses and t o impr ove
query per f or mance
The most common par t it ioning st rat egy on
f act t ables is t o part it ion t he t able by t he
dat e key. Because t he dat e dimension is
pr eloaded and st at ic, you know exact ly what
t he sur rogat e keys are
Need t o part it ion t he f act t able on t he key
t hat j oins t o t he dat e dimension f or t he
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Outwitting the Rollback Log
 The r ollback log, also known as t he redo log, is
invaluable in t ransact ion (OLTP) syst ems. But
in a dat a war ehouse envir onment where all
t ransact ions ar e managed by t he ETL pr ocess,
t he r ollback log is a superf luous f eat ur e t hat
must be dealt wit h t o achieve opt imal load
per f or mance. Reasons why t he dat a
war ehouse does not need rollback logging ar e:
 All dat a is ent ered by a managed process—t he ETL
syst em.
 Dat a is loaded in bulk.
 Dat a can easily be reloaded if a load process f ails.
 Each dat abase management syst em has dif f erent
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
info@quontrasolutions.com
www.quontrasolutions.com Ph. (404)-900-9988
Phone : +1-(404)-900-9988
email: info@quontrasolutions.com
httP://www.quontrasolutions.com

More Related Content

More from QUONTRASOLUTIONS

Business analyst overview by quontra solutions
Business analyst overview by quontra solutionsBusiness analyst overview by quontra solutions
Business analyst overview by quontra solutions
QUONTRASOLUTIONS
 
Business analyst overview by quontra solutions
Business analyst overview by quontra solutionsBusiness analyst overview by quontra solutions
Business analyst overview by quontra solutions
QUONTRASOLUTIONS
 
Saas overview by quontra solutions
Saas overview  by quontra solutionsSaas overview  by quontra solutions
Saas overview by quontra solutions
QUONTRASOLUTIONS
 

More from QUONTRASOLUTIONS (20)

Business analyst overview by quontra solutions
Business analyst overview by quontra solutionsBusiness analyst overview by quontra solutions
Business analyst overview by quontra solutions
 
Business analyst overview by quontra solutions
Business analyst overview by quontra solutionsBusiness analyst overview by quontra solutions
Business analyst overview by quontra solutions
 
Cognos Overview
Cognos Overview Cognos Overview
Cognos Overview
 
Hibernate online training
Hibernate online trainingHibernate online training
Hibernate online training
 
Java j2eeTutorial
Java j2eeTutorialJava j2eeTutorial
Java j2eeTutorial
 
Software Quality Assurance training by QuontraSolutions
Software Quality Assurance training by QuontraSolutionsSoftware Quality Assurance training by QuontraSolutions
Software Quality Assurance training by QuontraSolutions
 
Introduction to software quality assurance by QuontraSolutions
Introduction to software quality assurance by QuontraSolutionsIntroduction to software quality assurance by QuontraSolutions
Introduction to software quality assurance by QuontraSolutions
 
.Net introduction by Quontra Solutions
.Net introduction by Quontra Solutions.Net introduction by Quontra Solutions
.Net introduction by Quontra Solutions
 
Introduction to j2 ee patterns online training class
Introduction to j2 ee patterns online training classIntroduction to j2 ee patterns online training class
Introduction to j2 ee patterns online training class
 
Saas overview by quontra solutions
Saas overview  by quontra solutionsSaas overview  by quontra solutions
Saas overview by quontra solutions
 
Sharepoint taxonomy introduction us
Sharepoint taxonomy introduction   usSharepoint taxonomy introduction   us
Sharepoint taxonomy introduction us
 
Introduction to the sharepoint 2013 userprofile service By Quontra
Introduction to the sharepoint 2013 userprofile service By QuontraIntroduction to the sharepoint 2013 userprofile service By Quontra
Introduction to the sharepoint 2013 userprofile service By Quontra
 
Introduction to SharePoint 2013 REST API
Introduction to SharePoint 2013 REST APIIntroduction to SharePoint 2013 REST API
Introduction to SharePoint 2013 REST API
 
Performance Testing and OBIEE by QuontraSolutions
Performance Testing and OBIEE by QuontraSolutionsPerformance Testing and OBIEE by QuontraSolutions
Performance Testing and OBIEE by QuontraSolutions
 
Obiee introduction building reports by QuontraSolutions
Obiee introduction building reports by QuontraSolutionsObiee introduction building reports by QuontraSolutions
Obiee introduction building reports by QuontraSolutions
 
Sharepoint designer workflow by quontra us
Sharepoint designer workflow by quontra usSharepoint designer workflow by quontra us
Sharepoint designer workflow by quontra us
 
Qa by quontra us
Qa by quontra   usQa by quontra   us
Qa by quontra us
 
MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra
 
Msbi by quontra us
Msbi by quontra usMsbi by quontra us
Msbi by quontra us
 
Introduction to .NET by QuontraSolutions
Introduction to .NET by QuontraSolutionsIntroduction to .NET by QuontraSolutions
Introduction to .NET by QuontraSolutions
 

Recently uploaded

1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
Chris Hunter
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
PECB
 

Recently uploaded (20)

Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 

Cognos etl process by quontra solutions

  • 1. ETL Process in Data Warehouse info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 2. Outline  ETL  Ext ract ion  Transf ormat ion  Loading info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 3. ETL Overview  Ext ract ion Transf ormat ion Loading – ETL  To get dat a out of t he source and load it int o t he dat a warehouse – simply a process of copying dat a f rom one dat abase t o ot her  Dat a is ext ract ed f rom an OLTP dat abase, t ransf ormed t o mat ch t he dat a warehouse schema and loaded int o t he dat a warehouse dat abase info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 4.  Many dat a warehouses also incorporat e dat a f rom non-OLTP syst ems such as t ext f iles, legacy syst ems, and spreadsheet s; such dat a also requires ext ract ion, t ransf ormat ion, and loading  When def ining ETL f or a dat a warehouse, it is import ant t o t hink of ETL as a process, not a physical implement at ion info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 5. ETL Overview  ETL is of t en a complex combinat ion of pr ocess and t echnology t hat consumes a signif icant por t ion of t he dat a war ehouse development ef f or t s and requires t he skills of business analyst s, dat abase designer s, and applicat ion developers  I t is not a one t ime event as new dat a is added t o t he Dat a War ehouse per iodically – mont hly, daily, hourly  Because ETL is an int egr al, ongoing, and recur ring par t of a dat a war ehouse info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 6. ETL Staging Database  ETL oper at ions should be perf or med on a relat ional dat abase server separ at e f rom t he sour ce dat abases and t he dat a war ehouse dat abase  Creat es a logical and physical separ at ion bet ween t he sour ce syst ems and t he dat a war ehouse  Minimizes t he impact of t he int ense per iodic ETL act ivit y on sour ce and dat a war ehouse dat abases info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 8. Extraction  The int egrat ion of all of t he disparat e syst ems across t he ent erprise is t he real challenge t o get t ing t he dat a warehouse t o a st at e where it is usable  Dat a is ext ract ed f rom het erogeneous dat a sources  Each dat a source has it s dist inct set of charact erist ics t hat need t o be managed and int egrat ed int o t he ETL syst em in order t o ef f ect ively ext ract dat a. info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 9.  ETL pr ocess needs t o ef f ect ively int egrat e syst ems t hat have dif f er ent :  DBMS  Operat ing Syst ems  Hardware  Communicat ion prot ocols  Need t o have a logical dat a map bef or e t he physical dat a can be t r ansf or med  The logical dat a map descr ibes t he relat ionship bet ween t he ext reme st ar t ing point s and t he ext r eme ending point s of your ETL syst em usually present ed in a Extraction info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 10.  The cont ent of t he logical dat a mapping document has been pr oven t o be t he cr it ical element r equired t o ef f icient ly plan ETL pr ocesses  The t able t ype gives us our queue f or t he ordinal posit ion of our dat a load processes— f irst dimensions, t hen f act s.  The pr imary purpose of t his document is t o pr ovide t he ETL developer wit h a clear -cut Target Source Transformation Table Name Column Name Data Type Table Name Column Name Data Type info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 11.  The analysis of t he source syst em is usually broken int o t wo maj or phases: The dat a discovery phase The anomaly det ect ion phase info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 12. Extraction - Data Discovery Phase  Dat a Discovery Phase  key crit erion f or t he success of t he dat a warehouse is t he cleanliness and cohesiveness of t he dat a wit hin it  Once you underst and what t he t arget needs t o look like, you need t o ident if y and examine t he dat a sources info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 13. Data Discovery Phase  It is up to the ETL team to drill down further into the data requirements to determine each and every source system, table, and attribute required to load the data warehouse  Collecting and Documenting Source Systems  Keeping track of source systems  Determining the System of Record - Point of originating of data  Definition of the system-of-record is important because in most enterprises data is stored redundantly across many different systems. info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 14. Data Content Analysis - Extraction  Underst anding t he cont ent of t he dat a is crucial f or det ermining t he best appr oach f or ret r ieval - NULL values. An unhandled NULL value can dest r oy any ETL process. NULL values pose t he biggest risk when t hey are in f or eign key columns. J oining t wo or mor e t ables based on a column t hat cont ains NULL values  will cause dat a loss! Remember , in a relat ional dat abase NULL is not equal t o NULL. That is why t hose j oins f ail. Check f or NULL values in every f oreign key in t he sour ce dat abase. When NULL values are present , you must outer info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 15.  Dur ing t he init ial load, capt ur ing changes t o dat a cont ent in t he sour ce dat a is unimpor t ant because you ar e most likely ext r act ing t he ent ir e dat a source or a pot ion of it f r om a predet ermined point in t ime.  Lat er t he abilit y t o capt ur e dat a changes in t he source syst em inst ant ly becomes priorit y  The ETL t eam is r esponsible f or capt uring dat a-cont ent changes dur ing t he increment al load. info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 16. Determining Changed Data  Audit Columns - Used by DB and updat ed by t r iggers  Audit columns ar e appended t o t he end of each t able t o st or e t he dat e and t ime a record was added or modif ied  You must analyze and t est each of t he columns t o ensur e t hat it is a reliable source t o indicat e changed dat a. I f you f ind any NULL values, you must t o f ind an alt ernat ive info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 17. Pr ocess of Eliminat ion  Pr ocess of eliminat ion pr eser ves exact ly one copy of each pr evious ext ract ion in t he st aging ar ea f or f ut ur e use.  During t he next run, t he process t akes t he ent ir e source t able(s) int o t he st aging ar ea and makes a compar ison against t he ret ained dat a f rom t he last process.  Only dif f erences (delt as) are sent t o t he dat a warehouse.  Not t he most ef f icient t echnique, but most reliable f or capt ur ing changed dat a Determining Changed Data info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 18. I nit ial and I ncrement al Loads  Creat e t wo t ables: pr evious load and curr ent load.  The init ial pr ocess bulk loads int o t he curr ent load t able. Since change det ect ion is ir relevant dur ing t he init ial load, t he dat a cont inues on t o be t ransf or med and loaded int o t he ult imat e t arget f act t able.  When t he pr ocess is complet e, it dr ops t he pr evious load t able, r enames t he cur rent load t able t o pr evious load, and creat es an empt y cur rent load t able. Since none of t hese t asks involve dat abase logging, t hey are ver y f ast ! Det ermining Changed Dat a info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 20. Transformation  Main st ep where t he ETL adds value  Act ually changes dat a and provides guidance whet her dat a can be used f or it s int ended purposes  Perf ormed in st aging area info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 21. Dat a Qualit y paradigm  Cor r ect  Unambiguous  Consist ent  Complet e  Dat a qualit y checks are run at 2 places - af t er ext r act ion and af t er cleaning and conf ir ming addit ional check are r un at t his point Transformation info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 22. Transformation - Cleaning Data  Anomaly Det ect ion  Dat a sampling – count (*) of t he rows f or a depart ment column  Column Pr oper t y Enf orcement  Null Values in reqd columns  Numeric values t hat f all out side of expect ed high and lows  Cols whose lengt hs are except ionally short / long  Cols wit h cert ain values out side of discret e valid value set s  Adherence t o a reqd pat t ern/ member of a set of pat t ern info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 24. Transf ormat ion - Conf irming  St ruct ure Enf orcement Tables have proper pr imary and f oreign keys Obey r ef erent ial int egrit y  Dat a and Rule value enf orcement Simple business r ules Logical dat a checks info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 26. Loading  Loading Dimensions  Loading Fact s info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 27. Loading Dimensions  Physically built t o have t he minimal set s of component s  The primary key is a single f ield cont aining meaningless unique int eger – Surrogat e Keys  The DW owns t hese keys and never allows any ot her ent it y t o assign t hem  De-normalized f lat t ables – all at t ribut es in a dimension must t ake on a single value in t he presence of a dimension primary key.  Should possess one or more ot her f ields t hat compose t he nat ural key of t he dimension info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 29.  The dat a loading module consist s of all t he st eps required t o administ er slowly changing dimensions (SCD) and writ e t he dimension t o disk as a physical t able in t he proper dimensional f ormat wit h correct primary keys, correct nat ural keys, and f inal descript ive at t ribut es.  Creat ing and assigning t he surrogat e keys occur in t his module.  The t able is def init ely st aged, since it is t he obj ect t o be loaded int o t he present at ion syst em of t he dat a warehouse. info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 30. Loading dimensions  When DW receives not if icat ion t hat an exist ing row in dimension has changed it gives out 3 t ypes of responses Type 1 Type 2 Type 3 info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 34. Loading facts  Fact s Fact t ables hold t he measurement s of an ent erprise. The relat ionship bet ween f act t ables and measurement s is ext remely simple. I f a measurement exist s, it can be modeled as a f act t able row. I f a f act t able row exist s, it is a measurement info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 35. Key Building Process - Facts  When building a f act t able, t he f inal ETL st ep is convert ing t he nat ural keys in t he new input records int o t he correct , cont emporary surrogat e keys  ETL maint ains a special surrogat e key lookup t able f or each dimension. This t able is updat ed whenever a new dimension ent it y is creat ed and whenever a Type 2 change occurs on an exist ing dimension ent it y  All of t he required lookup t ables should be pinned in memory so t hat t hey can be randomly accessed as each incoming f act record present s it s nat ural keys. This is one of t he reasons f or making t he lookup t ables separat e f rom t he original dat a warehouse dimension t ables. info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 38. Loading Fact Tables  Managing I ndexes Perf or mance Killers at load t ime Dr op all indexes in pre-load t ime Segregat e Updat es f r om inser t s Load updat es Rebuild indexes info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 39.  Managing Part it ions Part it ions allow a t able (and it s indexes) t o be physically divided int o minitablesf or administ r at ive pur poses and t o impr ove query per f or mance The most common par t it ioning st rat egy on f act t ables is t o part it ion t he t able by t he dat e key. Because t he dat e dimension is pr eloaded and st at ic, you know exact ly what t he sur rogat e keys are Need t o part it ion t he f act t able on t he key t hat j oins t o t he dat e dimension f or t he info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 40. Outwitting the Rollback Log  The r ollback log, also known as t he redo log, is invaluable in t ransact ion (OLTP) syst ems. But in a dat a war ehouse envir onment where all t ransact ions ar e managed by t he ETL pr ocess, t he r ollback log is a superf luous f eat ur e t hat must be dealt wit h t o achieve opt imal load per f or mance. Reasons why t he dat a war ehouse does not need rollback logging ar e:  All dat a is ent ered by a managed process—t he ETL syst em.  Dat a is loaded in bulk.  Dat a can easily be reloaded if a load process f ails.  Each dat abase management syst em has dif f erent info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988
  • 41. info@quontrasolutions.com www.quontrasolutions.com Ph. (404)-900-9988 Phone : +1-(404)-900-9988 email: info@quontrasolutions.com httP://www.quontrasolutions.com