Introduction to DataStage
Two Data Warehousing Strategies
• Enterprise-wide warehouse, top down, the Inmon
methodology
• Data mart, bottom up, the Kimball methodology
• When properly executed, both result in an enterprise-wide
data warehouse
The Data Mart Strategy
• The most common approach
• Begins with a single mart and architected marts are added
over time for more subject areas
• Relatively inexpensive and easy to implement
• Can be used as a proof of concept for data warehousing
• Can perpetuate the “silos of information” problem
• Can postpone difficult decisions and activities
• Requires an overall integration plan
The Enterprise-wide Strategy
• A comprehensive warehouse is built initially
• An initial dependent data mart is built using a subset of the data in the
warehouse
• Additional data marts are built using subsets of the data in the warehouse
• Like all complex projects, it is expensive, time consuming, and prone to
failure
• When successful, it results in an integrated, scalable warehouse
Data Sources and Types
• Primarily from legacy, operational systems
• Almost exclusively numerical data at the present time
• External data may be included, often purchased from third-
party sources
• Technology exists for storing unstructured data, and this is
expected to become more important over time
Extraction, Transformation, and Loading
(ETL) Processes
• The “plumbing” work of data warehousing
• Data are moved from source to target databases
• A very costly, time consuming part of data warehousing
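
A minimal sketch of the three ETL steps, assuming an in-memory SQLite database as a stand-in for the target and a hard-coded list as the source. The table name `sales` and the function names are illustrative only, not part of any ETL product.

```python
import sqlite3

def extract(source_rows):
    # Pull raw records from the (hypothetical) source system.
    return list(source_rows)

def transform(rows):
    # Trim whitespace, normalize name casing, and convert amounts to numbers.
    return [(cid, name.strip().title(), float(amount)) for cid, name, amount in rows]

def load(conn, rows):
    # Write the cleaned rows to the target table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer_id INTEGER, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

source = [(1, "  alice SMITH ", "10.50"), (2, "bob jones", "7")]
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(source)))
loaded = conn.execute("SELECT * FROM sales ORDER BY customer_id").fetchall()
```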
Recent Development:
More Frequent Updates
• Updates can be done in bulk and trickle modes
• Business requirements, such as trading partner access to a
Web site, require current data
• For international firms, there is no good time to load the
warehouse
Recent Development: Clickstream Data
• Results from clicks at web sites
• A dialog manager handles user interactions. An ODS
(operational data store in the data staging area) helps to
custom tailor the dialog
• The clickstream data is filtered and parsed and sent to a data
warehouse where it is analyzed
• Software is available to analyze the clickstream data
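
The filter-and-parse step might look roughly like the sketch below. It assumes the web server writes Common Log Format lines and that requests for images and style sheets are noise. The pattern, the ignored suffixes, and the sample lines are assumptions made for illustration.

```python
import re

# Assumed input: Common Log Format lines from a web server.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)
# Assumed noise: requests for page furniture rather than pages.
IGNORED_SUFFIXES = (".gif", ".png", ".jpg", ".css", ".js")

def parse_clicks(lines):
    clicks = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # filter: drop lines that do not parse
        path = match.group("path")
        if path.lower().endswith(IGNORED_SUFFIXES):
            continue  # filter: drop image and style sheet requests
        clicks.append({
            "host": match.group("host"),
            "time": match.group("time"),
            "path": path,
            "status": int(match.group("status")),
        })
    return clicks

sample = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /products.html HTTP/1.0" 200 2326',
    '192.0.2.1 - - [10/Oct/2000:13:55:37 -0700] "GET /logo.gif HTTP/1.0" 200 512',
    'malformed line',
]
clicks = parse_clicks(sample)
```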
Data Extraction
• Often performed by COBOL routines
(not recommended because of high program maintenance
and no automatically generated meta data)
• Sometimes source data is copied to the target database using
the replication capabilities of a standard RDBMS (not
recommended because of “dirty data” in the source systems)
• Increasingly performed by specialized ETL software
Sample ETL Tools
• Teradata Warehouse Builder from Teradata
• DataStage from Ascential Software
• SAS System from SAS Institute
• Power Mart/Power Center from Informatica
• Sagent Solution from Sagent Software
• Hummingbird Genio Suite from Hummingbird
Communications
Reasons for “Dirty” Data
• Dummy Values
• Absence of Data
• Multipurpose Fields
• Cryptic Data
• Contradicting Data
• Inappropriate Use of Address Lines
• Violation of Business Rules
• Reused Primary Keys
• Non-Unique Identifiers
• Data Integration Problems
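
Three of these problems (dummy values, absent data, non-unique identifiers) can be detected mechanically. The sketch below is one possible check; the set of dummy placeholders and the field names are assumptions chosen for illustration.

```python
from collections import Counter

# Assumed placeholders that stand in for "no real value".
DUMMY_VALUES = {"000-000-0000", "N/A", "UNKNOWN", "XXXXX"}

def profile(records, key_field, fields):
    """Return (row index, field, problem) tuples for simple dirty-data checks."""
    issues = []
    key_counts = Counter(r.get(key_field) for r in records)
    for i, record in enumerate(records):
        for field in fields:
            value = record.get(field)
            if value is None or str(value).strip() == "":
                issues.append((i, field, "absent"))
            elif str(value).strip().upper() in DUMMY_VALUES:
                issues.append((i, field, "dummy"))
        if key_counts[record.get(key_field)] > 1:
            issues.append((i, key_field, "non-unique"))
    return issues

records = [
    {"id": 1, "phone": "000-000-0000", "city": "Athens"},
    {"id": 2, "phone": "555-0100", "city": ""},
    {"id": 2, "phone": "555-0101", "city": "Macon"},
]
issues = profile(records, "id", ["phone", "city"])
```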
Data Cleansing
• Source systems contain “dirty data” that must be cleansed
• ETL software contains rudimentary data cleansing capabilities
• Specialized data cleansing software is often used; it is
important for performing name and address correction and
householding functions
• Leading data cleansing vendors include Vality (Integrity),
Harte-Hanks (Trillium), and Firstlogic (i.d.Centric)
Steps in Data Cleansing
• Parsing
• Correcting
• Standardizing
• Matching
• Consolidating
Parsing
• Parsing locates and identifies individual data elements in the
source files and then isolates these data elements in the
target files.
• Examples include parsing the first, middle, and last name;
street number and street name; and city and state.
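
A rough sketch of parsing, assuming simple Western-style names and "number street" addresses. Commercial cleansing tools handle far more variation; the function names here are hypothetical.

```python
import re

def parse_name(full_name):
    # Isolate first, middle, and last name from a single free-text field.
    parts = full_name.split()
    if not parts:
        return {"first": "", "middle": "", "last": ""}
    if len(parts) == 1:
        return {"first": parts[0], "middle": "", "last": ""}
    return {"first": parts[0], "middle": " ".join(parts[1:-1]), "last": parts[-1]}

def parse_street(street):
    # Isolate the street number from the street name.
    match = re.match(r"\s*(\d+)\s+(.+?)\s*$", street)
    if match is None:
        return {"number": "", "name": street.strip()}
    return {"number": match.group(1), "name": match.group(2)}

def parse_city_state(text):
    # Assumes the "City, ST" convention.
    city, _, state = text.partition(",")
    return {"city": city.strip(), "state": state.strip()}
```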
Correcting
• Corrects parsed individual data components using
sophisticated data algorithms and secondary data sources.
• Examples include replacing a vanity address and adding a zip
code.
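
Both examples can be sketched as lookups against secondary data. The two lookup tables below are hypothetical stand-ins (including the made-up city and zip code) for what would normally be a postal reference file.

```python
# Hypothetical secondary data sources; real tools use postal reference files.
ZIP_BY_CITY_STATE = {("Exampleville", "EX"): "00001"}
VANITY_ADDRESSES = {"One Corporate Plaza": "100 Main Street"}

def correct(record):
    fixed = dict(record)  # leave the input record untouched
    # Replace a vanity address with the deliverable street address.
    fixed["street"] = VANITY_ADDRESSES.get(fixed["street"], fixed["street"])
    # Add a missing zip code from the reference table when one is known.
    if not fixed.get("zip"):
        fixed["zip"] = ZIP_BY_CITY_STATE.get((fixed["city"], fixed["state"]), "")
    return fixed

corrected = correct(
    {"street": "One Corporate Plaza", "city": "Exampleville", "state": "EX", "zip": ""}
)
```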
Standardizing
• Standardizing applies conversion routines to transform data
into its preferred (and consistent) format using both standard
and custom business rules.
• Examples include adding a pre name, replacing a nickname,
and using a preferred street name.
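
A small sketch of two such conversion routines, assuming tiny rule tables for nicknames and street suffixes. The tables and function names are illustrative; real rule sets are much larger.

```python
# Assumed rule tables; production rule sets are far larger.
NICKNAMES = {"bob": "Robert", "beth": "Elizabeth", "bill": "William"}
STREET_SUFFIXES = {"st": "Street", "ave": "Avenue", "rd": "Road"}

def standardize_first_name(name):
    # Replace a nickname with its formal form; otherwise just fix casing.
    cleaned = name.strip()
    return NICKNAMES.get(cleaned.lower(), cleaned.title())

def standardize_street(street):
    # Expand suffix abbreviations and apply consistent casing.
    words = []
    for token in street.split():
        key = token.rstrip(".").lower()
        words.append(STREET_SUFFIXES.get(key, token.title()))
    return " ".join(words)
```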
Matching
• Searching and matching records within and across the parsed,
corrected and standardized data based on predefined
business rules to eliminate duplications.
• Examples include identifying similar names and addresses.
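
One simple way to identify similar names and addresses is fuzzy string comparison. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the 0.85 threshold is an assumed business rule, and commercial matching engines use more elaborate techniques.

```python
from difflib import SequenceMatcher
from itertools import combinations

def find_matches(records, threshold=0.85):
    """Return index pairs whose text is similar enough to be likely duplicates."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        similarity = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if similarity >= threshold:
            pairs.append((i, j))
    return pairs

records = [
    "John Smith 12 Main Street",
    "Jon Smith 12 Main Street",
    "Mary Jones 9 Oak Avenue",
]
pairs = find_matches(records)
```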
Consolidating
• Analyzing and identifying relationships between matched
records and consolidating/merging them into ONE
representation.
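
A minimal sketch of merging matched records, assuming the rule "keep the first non-empty value seen for each field". Other survivorship rules (most recent, most trusted source) are equally plausible.

```python
def consolidate(matched_records):
    """Merge matched records into one, keeping the first non-empty value per field."""
    merged = {}
    for record in matched_records:
        for field, value in record.items():
            if value and not merged.get(field):
                merged[field] = value
    return merged

merged = consolidate([
    {"name": "Robert Smith", "phone": "", "zip": "00001"},
    {"name": "Robert Smith", "phone": "555-0100", "zip": ""},
])
```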
Data Staging
• Often used as an interim step between data extraction and
later steps
• Accumulates data from asynchronous sources using native
interfaces, flat files, FTP sessions, or other processes
• At a predefined cutoff time, data in the staging file is
transformed and loaded to the warehouse
• There is usually no end user access to the staging file
• An operational data store may be used for data staging
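
The cutoff behavior can be sketched as follows. `StagingArea` is a hypothetical in-memory stand-in for a staging file or table, and the source names are placeholders.

```python
from datetime import datetime

class StagingArea:
    """In-memory stand-in for a staging file (hypothetical)."""

    def __init__(self):
        self._rows = []

    def accumulate(self, source, record, arrived_at):
        # Sources deliver asynchronously, so keep the arrival time with each record.
        self._rows.append((arrived_at, source, record))

    def release(self, cutoff):
        # Hand over everything that arrived by the cutoff; keep the rest staged.
        ready = [row for row in self._rows if row[0] <= cutoff]
        self._rows = [row for row in self._rows if row[0] > cutoff]
        ready.sort(key=lambda row: row[0])
        return [(source, record) for _, source, record in ready]

staging = StagingArea()
staging.accumulate("flat_file", {"order_id": 1}, datetime(2001, 3, 15, 20, 0))
staging.accumulate("ftp", {"order_id": 2}, datetime(2001, 3, 15, 23, 30))
staging.accumulate("native_interface", {"order_id": 3}, datetime(2001, 3, 16, 1, 0))
batch = staging.release(cutoff=datetime(2001, 3, 16, 0, 0))
```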
Data Transformation
• Transforms the data in accordance with the business rules
and standards that have been established
• Examples include format changes, deduplication, splitting up
fields, replacement of codes, derived values, and aggregates
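
Each of the listed transformations appears once in the sketch below. The field names, the code table, and the date format are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

STATUS_CODES = {"S": "Shipped", "P": "Pending"}  # assumed code table

def transform(rows):
    seen = set()
    result = []
    for row in rows:
        if row["order_id"] in seen:  # deduplication
            continue
        seen.add(row["order_id"])
        first, _, last = row["customer"].partition(" ")  # splitting up a field
        result.append({
            "order_id": row["order_id"],
            "first_name": first,
            "last_name": last,
            # format change: MM/DD/YYYY to ISO 8601
            "order_date": datetime.strptime(row["order_date"], "%m/%d/%Y").date().isoformat(),
            "status": STATUS_CODES.get(row["status"], "Unknown"),  # replacement of codes
            "total": row["quantity"] * row["unit_price"],  # derived value
        })
    return result

def total_by_date(rows):
    # Aggregate: order totals summed per day.
    totals = defaultdict(float)
    for row in rows:
        totals[row["order_date"]] += row["total"]
    return dict(totals)

raw = [
    {"order_id": 1, "customer": "Ann Lee", "order_date": "03/15/2001",
     "status": "S", "quantity": 2, "unit_price": 5.0},
    {"order_id": 1, "customer": "Ann Lee", "order_date": "03/15/2001",
     "status": "S", "quantity": 2, "unit_price": 5.0},
    {"order_id": 2, "customer": "Raj Patel", "order_date": "03/15/2001",
     "status": "P", "quantity": 1, "unit_price": 3.5},
]
clean = transform(raw)
daily = total_by_date(clean)
```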
Data Loading
• Data are physically moved to the data warehouse
• The loading takes place within a “load window”
• The trend is to near real time updates of the data warehouse
as the warehouse is increasingly used for operational
applications
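
A load window is simply a time range in which loading is allowed. The sketch below checks whether a given time falls inside it, including windows that run past midnight. The default 01:00 to 05:00 window is an assumption, not a recommendation.

```python
from datetime import time

def in_load_window(now, start=time(1, 0), end=time(5, 0)):
    """Return True if `now` falls inside the load window [start, end)."""
    if start <= end:
        return start <= now < end
    # The window wraps past midnight, e.g. 22:00 to 04:00.
    return now >= start or now < end
```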
Meta Data
• Data about data
• Needed by both information technology personnel and users
• IT personnel need to know data sources and targets;
database, table and column names; refresh schedules; data
usage measures; etc.
• Users need to know entity/attribute definitions;
reports/query tools available; report distribution information;
help desk contact information, etc.
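
One way to picture meta data is as a catalog entry per warehouse column that serves both audiences. The sketch below is a hypothetical structure; the class, the field names, and the sample entry are assumptions and do not come from any particular meta data tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ColumnMetadata:
    # Technical meta data, mainly for IT personnel
    source_system: str
    source_field: str
    target_table: str
    target_column: str
    refresh_schedule: str
    # Business meta data, mainly for users
    business_definition: str

catalog = [
    ColumnMetadata("ORDERS_LEGACY", "CUST_NM", "dim_customer", "customer_name",
                   "nightly", "Full legal name of the customer"),
]

def lineage(catalog, table, column):
    # Answers the IT question: where does this column come from?
    for entry in catalog:
        if entry.target_table == table and entry.target_column == column:
            return (f"{entry.source_system}.{entry.source_field} -> "
                    f"{entry.target_table}.{entry.target_column}")
    return None

def definition(catalog, table, column):
    # Answers the user question: what does this column mean?
    for entry in catalog:
        if entry.target_table == table and entry.target_column == column:
            return entry.business_definition
    return None
```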
Recent Development: Meta Data Integration
• A growing realization that meta data is critical to data
warehousing success
• Progress is being made on getting vendors to agree on
standards and to incorporate the sharing of meta data among
their tools
• Vendors like Microsoft, Computer Associates, and Oracle
have entered the meta data marketplace with significant
product offerings
Thank You
For more information:
http://vibranttechnologies.co.in/datastage-classes-in-mumbai.html

Infrastructure Challenges in Scaling RAG with Custom AI models
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 

Datastage Introduction To Data Warehousing

  • 3. Two Data Warehousing Strategies • Enterprise-wide warehouse, top down, the Inmon methodology • Data mart, bottom up, the Kimball methodology • When properly executed, both result in an enterprise-wide data warehouse
  • 4. The Data Mart Strategy • The most common approach • Begins with a single mart and architected marts are added over time for more subject areas • Relatively inexpensive and easy to implement • Can be used as a proof of concept for data warehousing • Can perpetuate the “silos of information” problem • Can postpone difficult decisions and activities • Requires an overall integration plan
  • 5. The Enterprise-wide Strategy • A comprehensive warehouse is built initially • An initial dependent data mart is built using a subset of the data in the warehouse • Additional data marts are built using subsets of the data in the warehouse • Like all complex projects, it is expensive, time consuming, and prone to failure • When successful, it results in an integrated, scalable warehouse
  • 6. Data Sources and Types • Primarily from legacy, operational systems • Almost exclusively numerical data at the present time • External data may be included, often purchased from third-party sources • Technology exists for storing unstructured data; expect this to become more important over time
  • 7. Extraction, Transformation, and Loading (ETL) Processes • The “plumbing” work of data warehousing • Data are moved from source to target databases • A very costly, time consuming part of data warehousing
  • 8. Recent Development: More Frequent Updates • Updates can be done in bulk and trickle modes • Business requirements, such as trading partner access to a Web site, require current data • For international firms, there is no good time to load the warehouse
  • 9. Recent Development: Clickstream Data • Results from clicks at web sites • A dialog manager handles user interactions. An ODS (operational data store in the data staging area) helps to custom tailor the dialog • The clickstream data is filtered and parsed and sent to a data warehouse where it is analyzed • Software is available to analyze the clickstream data
  • 10. Data Extraction • Often performed by COBOL routines (not recommended because of high program maintenance and no automatically generated meta data) • Sometimes source data is copied to the target database using the replication capabilities of a standard RDBMS (not recommended because of “dirty data” in the source systems) • Increasingly performed by specialized ETL software
  • 11. Sample ETL Tools • Teradata Warehouse Builder from Teradata • DataStage from Ascential Software • SAS System from SAS Institute • PowerMart/PowerCenter from Informatica • Sagent Solution from Sagent Software • Hummingbird Genio Suite from Hummingbird Communications
  • 12. Reasons for “Dirty” Data • Dummy Values • Absence of Data • Multipurpose Fields • Cryptic Data • Contradicting Data • Inappropriate Use of Address Lines • Violation of Business Rules • Reused Primary Keys • Non-Unique Identifiers • Data Integration Problems
  • 13. Data Cleansing • Source systems contain “dirty data” that must be cleansed • ETL software contains rudimentary data cleansing capabilities • Specialized data cleansing software is often used; it is important for performing name and address correction and householding functions • Leading data cleansing vendors include Vality (Integrity), Harte-Hanks (Trillium), and Firstlogic (i.d.Centric)
  • 14. Steps in Data Cleansing • Parsing • Correcting • Standardizing • Matching • Consolidating
  • 15. Parsing • Parsing locates and identifies individual data elements in the source files and then isolates these data elements in the target files. • Examples include parsing the first, middle, and last name; street number and street name; and city and state.
  • 16. Correcting • Corrects parsed individual data components using sophisticated data algorithms and secondary data sources. • Examples include replacing a vanity address and adding a zip code.
  • 17. Standardizing • Standardizing applies conversion routines to transform data into its preferred (and consistent) format using both standard and custom business rules. • Examples include adding a pre name, replacing a nickname, and using a preferred street name.
  • 18. Matching • Searching and matching records within and across the parsed, corrected and standardized data based on predefined business rules to eliminate duplications. • Examples include identifying similar names and addresses.
  • 19. Consolidating • Analyzing and identifying relationships between matched records and consolidating/merging them into ONE representation.
  • 20. Data Staging • Often used as an interim step between data extraction and later steps • Accumulates data from asynchronous sources using native interfaces, flat files, FTP sessions, or other processes • At a predefined cutoff time, data in the staging file is transformed and loaded to the warehouse • There is usually no end user access to the staging file • An operational data store may be used for data staging
  • 21. Data Transformation • Transforms the data in accordance with the business rules and standards that have been established • Examples include format changes, deduplication, splitting up fields, replacement of codes, derived values, and aggregates
  • 22. Data Loading • Data are physically moved to the data warehouse • The loading takes place within a “load window” • The trend is toward near-real-time updates of the data warehouse as the warehouse is increasingly used for operational applications
  • 23. Meta Data • Data about data • Needed by both information technology personnel and users • IT personnel need to know data sources and targets; database, table and column names; refresh schedules; data usage measures; etc. • Users need to know entity/attribute definitions; reports/query tools available; report distribution information; help desk contact information, etc.
  • 24. Recent Development: Meta Data Integration • A growing realization that meta data is critical to data warehousing success • Progress is being made on getting vendors to agree on standards and to incorporate the sharing of meta data among their tools • Vendors like Microsoft, Computer Associates, and Oracle have entered the meta data marketplace with significant product offerings
  • 25. Thank You! For more information: http://vibranttechnologies.co.in/datastage-classes-in-mumbai.html
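
As a rough illustration of the cleansing steps in slides 14 to 19, the Python sketch below runs parsing, standardizing, matching, and consolidating over a few made-up customer strings. Everything in it is an assumption for illustration only: the `name; address` input layout, the tiny `NICKNAMES` and `STREET_SUFFIXES` tables, and the exact-key matching rule. The correcting step is left out because it depends on secondary data sources, and commercial cleansing tools use far more sophisticated matching than this.

```python
import re
from collections import defaultdict

# Hypothetical reference tables; real tools rely on much larger ones.
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}
STREET_SUFFIXES = {"st": "street", "st.": "street",
                   "ave": "avenue", "ave.": "avenue", "rd": "road"}


def parse(raw):
    """Parsing: isolate first name, last name, street number, street name."""
    name, address = [part.strip() for part in raw.split(";", 1)]
    name_parts = name.split()
    m = re.match(r"(\d+)\s+(.*)", address)
    number, street = (m.group(1), m.group(2)) if m else ("", address)
    return {"first": name_parts[0], "last": name_parts[-1],
            "number": number, "street": street}


def standardize(rec):
    """Standardizing: lower-case, replace nicknames, expand street suffixes."""
    first = rec["first"].lower()
    words = [STREET_SUFFIXES.get(w, w) for w in rec["street"].lower().split()]
    return {"first": NICKNAMES.get(first, first),
            "last": rec["last"].lower(),
            "number": rec["number"],
            "street": " ".join(words)}


def match(records):
    """Matching: group records that agree on every standardized field."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["first"], rec["last"], rec["number"], rec["street"])
        groups[key].append(rec)
    return list(groups.values())


def consolidate(group):
    """Consolidating: merge a matched group into one representation."""
    merged = dict(group[0])
    merged["source_count"] = len(group)
    return merged


def cleanse(raw_rows):
    standardized = [standardize(parse(row)) for row in raw_rows]
    return [consolidate(group) for group in match(standardized)]


if __name__ == "__main__":
    rows = ["Bob Smith; 12 Main St",
            "Robert Smith; 12 main street",
            "Liz Jones; 7 Oak Ave."]
    for customer in cleanse(rows):
        print(customer)
```

The first two rows collapse into a single customer only because standardizing runs before matching, which is why the slides list the steps in that order.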

Editor's Notes

  1. There is still debate over which approach is best.
  2. The key is to have an overall plan, processes, and technologies for integrating the different marts. The marts may be logically rather than physically separate.
  3. Even with the enterprise-wide strategy, the warehouse is developed in phases and each phase should be designed to deliver business value.
  4. It is not unusual to extract data from over 100 source systems. While the technology is available to store structured and unstructured data together, the reality is that warehouse data is almost exclusively structured -- numerical with simple textual identifiers.
  5. ETL tends to be “pick and shovel” work. Most organizations’ data is even worse than imagined.
  6. As data warehousing becomes more critical to decision making and operational processes, the pressure is to have more current data, which leads to trickle updates.
  7. The ODS is used to support the web site dialog -- an operational process -- while the data in the warehouse is analyzed -- to better understand customers and their use of the web site.
  8. It’s changing, but COBOL extracts are still the most common ETL process. There are multiple reasons for this -- the cost of specialized ETL software, in-house programmers who have a good knowledge of the COBOL-based source systems that will be used, and the peculiarities of the source systems that make the use of ETL software difficult.
  9. You might go to the vendors’ web sites to find a good demo to show your students.
  10. Here are a couple of examples: Dummy data -- a clerk enters 999-99-9999 as an SSN rather than asking the customer for theirs. Reused primary keys -- a branch bank is closed. Several years later, a new branch is opened, and the old identifier is used again.
  11. Data cleansing is critical to customer relationship management initiatives.
  12. A good example to use is cleansing customer data. Most students can identify with receiving multiple copies of the same catalog because the company is not doing a good data cleansing job.
  13. The record is broken down into atomic data elements.
  14. External data, such as census data, is often used in this process.
  15. Companies decide on the standards that they want to use.
  16. Commercial data cleansing software often uses AI techniques to match records.
  17. All of the data are now combined in a standard format.
  18. Data staging is used in cleansing, transforming, and integrating the data.
  19. Aggregates, such as sales totals, are often precalculated and stored in the warehouse to speed queries that require summary totals.
  20. Most loads involve only change data rather than a bulk reloading of all of the data in the warehouse.
  21. The importance of meta data is now realized, even though creating it is not glamorous work.
  22. Historically, each vendor had their own meta data solution -- which was incompatible with other vendors’ solutions. This is changing.
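
Notes 19 and 20 (precalculated aggregates, and loads that carry only change data) can be pictured with a small Python sketch using the standard-library `sqlite3` module. The table names `sales` and `sales_by_region`, the columns, and the helper functions are all hypothetical, and an in-memory SQLite database stands in for the warehouse; this is one possible arrangement, not how DataStage or any other ETL tool does it.

```python
import sqlite3


def make_warehouse():
    """Create a toy warehouse: a detail table plus a precalculated aggregate."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.execute(
        "CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total REAL)"
    )
    return conn


def load_changes(conn, staged_rows):
    """Apply only the changed rows from staging, then refresh the aggregate."""
    with conn:  # a single transaction stands in for the load window
        # New sale_ids are inserted; existing ones are overwritten.
        conn.executemany(
            "INSERT OR REPLACE INTO sales (sale_id, region, amount) VALUES (?, ?, ?)",
            staged_rows,
        )
        # Rebuild the summary so queries for totals avoid scanning the detail.
        conn.execute("DELETE FROM sales_by_region")
        conn.execute(
            "INSERT INTO sales_by_region (region, total) "
            "SELECT region, SUM(amount) FROM sales GROUP BY region"
        )


if __name__ == "__main__":
    conn = make_warehouse()
    load_changes(conn, [(1, "east", 100.0), (2, "west", 50.0)])   # initial load
    load_changes(conn, [(2, "west", 75.0), (3, "east", 25.0)])    # change data only
    for row in conn.execute("SELECT region, total FROM sales_by_region ORDER BY region"):
        print(row)
```

Rebuilding the whole aggregate on every load keeps the sketch short; a larger warehouse would more likely update only the affected totals.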