Data Warehouse:
Bill Inmon defined the data warehouse in 1990 in the following way:
"A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of
data in support of management's decision making process".
Bill Inmon's paradigm: The data warehouse is one part of the overall business intelligence
system. An enterprise has one data warehouse, and data marts source their information
from the data warehouse. In the data warehouse, information is stored in third normal
form (3NF).
He defined the terms in the definition as follows:
Subject Oriented:
Data that gives information about a particular subject instead of about a company's
ongoing operations.
Integrated:
Data that is gathered into the data warehouse from a variety of sources and merged into a
coherent whole.
Time-variant:
All data in the data warehouse is identified with a particular time period.
Non-volatile :
Data is stable in a data warehouse. More data is added but data is never removed.
In practice, two of these terms need qualification. A single-subject data warehouse is
typically referred to as a data mart, while data warehouses are generally enterprise in scope.
Data warehouses can also be volatile. Because of the large amount of storage a data
warehouse requires (multi-terabyte data warehouses are not uncommon), only a certain
number of periods of history are kept. For instance, if three years of data are kept,
then every month the oldest month is "rolled off" the database and the newest month is added.
===============================================================
Ralph Kimball provided a much simpler definition: a data warehouse is "a copy of
transaction data specifically structured for query and analysis".
Ralph Kimball's paradigm: The data warehouse is the conglomerate of all data marts within
the enterprise. Information is always stored in the dimensional model.
===============================================================
Steps in building a data warehouse:
• Requirement Gathering
• Physical Environment Setup
• Data Modeling
• ETL
• OLAP Cube Design
• Front End Development
• Performance Tuning
• Quality Assurance
• Rolling out to Production
• Production Maintenance
• Incremental Enhancements
Components of Dimensional Data Model :
Dimension: A category of information. For example, the time dimension.
Attribute: A unique level within a dimension. For example, Month is an attribute in the Time
Dimension.
Hierarchy: The specification of levels that represents the relationship between different attributes
within a dimension. For example, one possible hierarchy in the Time dimension is
Year → Quarter → Month → Day.
Fact Table : A fact table is a table that contains the measures of interest. For example, sales
amount would be such a measure.
A dimensional model includes fact tables and lookup tables. Fact tables connect to
one or more lookup tables, but fact tables do not have direct relationships to one another.
In designing data models for data warehouses / data marts, the most commonly used
schema types are Star Schema and Snowflake Schema.
Star Schema: In the star schema design, a single object (the fact
table) sits in the middle and is radially connected to other
surrounding objects (dimension lookup tables) like a star. A star
schema can be simple or complex. A simple star consists of one fact
table; a complex star can have more than one fact table. Fact tables in
star schema are mostly in third normal form (3NF), but dimensional
tables are in de-normalized second normal form (2NF).
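A minimal sketch of a star schema, assuming Python's built-in sqlite3 module and hypothetical table and column names (fact_sales, dim_store, dim_product are illustrative, not taken from any real system). One fact table sits in the middle, and a query joins it to a dimension table to group a measure by a dimension attribute:

```python
import sqlite3

# In-memory database; all names below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, store_name TEXT, region TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
-- The fact table holds only foreign keys to the dimensions plus the measure.
CREATE TABLE fact_sales (
    store_id     INTEGER REFERENCES dim_store(store_id),
    product_id   INTEGER REFERENCES dim_product(product_id),
    sales_amount REAL
);
INSERT INTO dim_store   VALUES (1, 'Downtown', 'East'), (2, 'Airport', 'West');
INSERT INTO dim_product VALUES (10, 'Pen', 'Stationery'), (11, 'Lamp', 'Home');
INSERT INTO fact_sales  VALUES (1, 10, 5.0), (1, 11, 40.0), (2, 10, 7.5);
""")


def sales_by_region(connection):
    """Join the fact table to one dimension and total the measure per region."""
    rows = connection.execute("""
        SELECT s.region, SUM(f.sales_amount)
        FROM fact_sales f
        JOIN dim_store s ON s.store_id = f.store_id
        GROUP BY s.region
        ORDER BY s.region
    """).fetchall()
    return dict(rows)


totals = sales_by_region(conn)
print(totals)
```

Because region is stored directly on the de-normalized dim_store table, the query needs only one join per dimension it uses.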
Snowflake Schema: The snowflake schema (sometimes called snowflake join
schema) is a more complex schema than the star schema because the tables
which describe the dimensions are normalized.
The main advantage of the snowflake schema is reduced disk storage, since the normalized
lookup tables are smaller. Queries that touch only these smaller tables can run faster,
although queries that need the extra joins are often slower than on a star schema. The main
disadvantage of the snowflake schema is the additional maintenance effort needed due to
the increased number of lookup tables.
Dimensions:
What are the types of dimension tables?
Three commonly cited types of dimensions are:
Conformed Dimensions, Junk Dimensions, Degenerate Dimensions
Conformed Dimension: A dimension that has exactly the same meaning and
content when referred to from different fact tables. A conformed dimension
can be shared by multiple fact tables or multiple data marts. Examples
are the time, customer, and product dimensions.
Junk Dimensions :
Occasionally, there are miscellaneous attributes, such as yes/no attributes or
comment attributes, that don’t fit into tight star schemas. Rather than discarding flag
fields and yes/no attributes, place them in a junk dimension. In addition, you can
handle comment and open-ended text attributes by creating a text-based junk
dimension.
A junk dimension is a convenient grouping of flags and indicators. It's helpful, but
not absolutely required, if there's a positive correlation among the values.
What is a degenerate dimension?
Suppose a fact table stores insurance contracts and one important dimension is
the year signed. The fact table has many columns, such as CUSTOMER_ID and
CONTRACT_ID, and one column YEAR_SIGNED as varchar(4). CUSTOMER_ID is the
foreign key to DIM_CUSTOMER, which holds all the customer data (name, address, ...).
CONTRACT_ID relates to DIM_CONTRACT, which holds all the contract-specific
information. What about YEAR_SIGNED? A DIM_YEAR_SIGNED table would have
one column only, because a year has no other attributes worth storing.
Therefore, we do not create an explicit dimension table, and we call the YEAR_SIGNED
column a degenerate dimension.
A degenerate dimension is a dimension key stored in the fact table that is not connected to
any dimension table, i.e. it corresponds to a dimension table that would have no attributes.
Types of Facts
There are three types of facts:
• Additive: Additive facts are facts that can be summed up through all of the dimensions
in the fact table.
• Semi-Additive: Semi-additive facts are facts that can be summed up for some of the
dimensions in the fact table, but not the others.
• Non-Additive: Non-additive facts are facts that cannot be summed up for any of the
dimensions present in the fact table.
Let us use examples to illustrate each of the three types of facts. The first example assumes that
we are a retailer, and we have a fact table with the following columns:
Date
Store
Product
Sales_Amount
The purpose of this table is to record the sales amount for each product in each store on a daily
basis. Sales_Amount is the fact. In this case, Sales_Amount is an additive fact, because you
can sum up this fact along any of the three dimensions present in the fact table -- date, store, and
product. For example, the sum of Sales_Amount for all 7 days in a week represents the total
sales amount for that week.
Say we are a bank with the following fact table:
Date
Account
Current_Balance
Profit_Margin
The purpose of this table is to record the current balance for each account at the end of each day,
as well as the profit margin for each account for each day. Current_Balance and
Profit_Margin are the facts. Current_Balance is a semi-additive fact, as it makes sense to
add them up for all accounts (what's the total current balance for all accounts in the bank?), but it
does not make sense to add them up through time (adding up all current balances for a given
account for each day of the month does not give us any useful information). Profit_Margin is a
non-additive fact, for it does not make sense to add them up for the account level or the day level.
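The semi-additive behaviour of a balance can be sketched as follows, assuming Python's built-in sqlite3 module and a hypothetical fact_balance table with made-up values. Summing across accounts for one day is meaningful; summing one account across days is not, so a closing (latest-day) balance is used instead:

```python
import sqlite3

# Hypothetical snapshot fact table: one row per account per day.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_balance (day TEXT, account TEXT, current_balance REAL);
INSERT INTO fact_balance VALUES
    ('2024-01-01', 'A', 100.0), ('2024-01-01', 'B', 50.0),
    ('2024-01-02', 'A', 120.0), ('2024-01-02', 'B', 50.0);
""")

# Meaningful: add balances across the account dimension for a single day.
bank_total = conn.execute(
    "SELECT SUM(current_balance) FROM fact_balance WHERE day = '2024-01-02'"
).fetchone()[0]

# Misleading: adding one account's balances across the time dimension.
naive_sum_a = conn.execute(
    "SELECT SUM(current_balance) FROM fact_balance WHERE account = 'A'"
).fetchone()[0]

# Across time, report the balance on the latest day instead of a sum.
closing = dict(conn.execute("""
    SELECT account, current_balance
    FROM fact_balance
    WHERE day = (SELECT MAX(day) FROM fact_balance)
    ORDER BY account
""").fetchall())

print(bank_total, naive_sum_a, closing)
```

The value in naive_sum_a is arithmetically valid but does not correspond to any amount of money the account ever held, which is the point of the semi-additive classification.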
Types of Fact Tables
Based on the above classifications, there are two types of fact tables:
• Cumulative: This type of fact table describes what has happened over a period of time.
For example, this fact table may describe the total sales by product by store by day. The
facts for this type of fact tables are mostly additive facts. The first example presented
here is a cumulative fact table.
• Snapshot: This type of fact table describes the state of things at a particular instant in
time, and usually includes more semi-additive and non-additive facts. The second
example presented here is a snapshot fact table.
===============================================================
What are factless fact tables, and in which scenarios are they used?
Factless fact table: some very useful fact tables do not contain any facts at all. They
record only that an event occurred for a given combination of dimension keys.
Figure 1: A factless fact table for recording student attendance on a daily basis at a college.
The five dimension tables contain rich descriptions of dates, students, courses,
teachers, and facilities. There are no additive, numeric facts.
Such a table can answer questions such as: Which classes were the most heavily attended?
Which classes were the most consistently attended? Which teachers taught the most students?
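A rough sketch of querying a factless fact table, assuming Python's built-in sqlite3 module. The fact_attendance table and its values are hypothetical, and for brevity the descriptive values stand in for keys into the five dimension tables. Since there is no numeric measure, the questions are answered by counting rows:

```python
import sqlite3

# Hypothetical attendance table: each row records that a student attended
# a course on a day. There is no measure column at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_attendance (
    day TEXT, student_id INTEGER, course TEXT, teacher TEXT, facility TEXT
);
INSERT INTO fact_attendance VALUES
    ('2024-03-01', 1, 'Math', 'Lee', 'Room 1'),
    ('2024-03-01', 2, 'Math', 'Lee', 'Room 1'),
    ('2024-03-01', 1, 'Art',  'Kim', 'Room 2'),
    ('2024-03-02', 2, 'Math', 'Lee', 'Room 1');
""")

# Which classes were the most heavily attended? Count attendance events.
attendance_by_course = dict(conn.execute(
    "SELECT course, COUNT(*) FROM fact_attendance GROUP BY course"
).fetchall())

# Which teachers taught the most students? Count distinct students.
students_by_teacher = dict(conn.execute(
    "SELECT teacher, COUNT(DISTINCT student_id) FROM fact_attendance GROUP BY teacher"
).fetchall())

print(attendance_by_course, students_by_teacher)
```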
Tools:
Points to consider when selecting a database platform:
• Scalability: How can the system grow as your data storage needs grow?
• Parallel Processing Support
Popular Relational Databases
• Oracle, Microsoft SQL Server, IBM DB2, Teradata, Sybase, MySQL
Popular OS Platforms
• Linux
• FreeBSD
• Microsoft Windows
ETL Tools :
• IBM WebSphere Information Integration (Ascential DataStage)
• Ab Initio
• Informatica
OLAP Tool Functionalities
1. MOLAP: In this type of OLAP, a cube is aggregated from the relational data
source (data warehouse). When a user generates a report request, the MOLAP tool can
generate the report quickly because all data is already pre-aggregated within the
cube.
2. ROLAP: In this type of OLAP, instead of pre-aggregating everything into a cube,
the ROLAP engine essentially acts as a smart SQL generator. The ROLAP tool
typically comes with a 'Designer' piece, where the data warehouse administrator can
specify the relationship between the relational tables, as well as how dimensions,
attributes, and hierarchies map to the underlying database tables.
Popular Tools
• Business Objects
• Cognos
• Hyperion
• Microsoft Analysis Services
• MicroStrategy
Reporting Tool
• Business Objects (Crystal Reports)
• Cognos
• Actuate
==================================================================
Questions ?
What are MOLAP and ROLAP? What is the difference between them?
MOLAP stands for multidimensional online analytical processing and ROLAP for
relational online analytical processing. In MOLAP, data is
stored in the form of multidimensional cubes. The advantage of
this model is excellent query performance: the cubes are built
for fast data retrieval, and all calculations are pre-generated
when the cube is created, so they can be applied easily while querying data.
In ROLAP, the data is stored in relational databases, and the tool gives
the appearance of traditional OLAP's slicing and dicing functionality.
The advantages of this model are that it can handle a large amount of data
and can leverage all the functionality of the relational database.
In short: MOLAP stores aggregated values in the cube, so query
performance is fast.
ROLAP stores data in relational databases, so each query has
to access the database to retrieve the data, and performance is
slower than with MOLAP. The data size is also larger than in
MOLAP.
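The contrast can be sketched in plain Python. This is a toy model under stated assumptions, not how any OLAP product works: the sales rows, the ALL marker, and the function names are all hypothetical. The "MOLAP" side pre-aggregates every combination of dimension values once, and the "ROLAP" side scans the detail rows on every request:

```python
from itertools import product

ALL = "*"  # hypothetical marker meaning "aggregated over this dimension"
DIMENSIONS = ("year", "region", "product")

sales = [
    {"year": 2023, "region": "East", "product": "Pen", "amount": 5.0},
    {"year": 2023, "region": "West", "product": "Pen", "amount": 7.5},
    {"year": 2024, "region": "East", "product": "Lamp", "amount": 40.0},
]


def build_cube(rows):
    """MOLAP-style: pre-aggregate totals for every mix of value and ALL."""
    cube = {}
    for row in rows:
        # Each row contributes to 2**3 cells of the cube.
        choices = [(row[dim], ALL) for dim in DIMENSIONS]
        for cell in product(*choices):
            cube[cell] = cube.get(cell, 0.0) + row["amount"]
    return cube


def scan_total(rows, **filters):
    """ROLAP-style: compute the answer from detail rows at query time."""
    return sum(
        row["amount"]
        for row in rows
        if all(row[key] == value for key, value in filters.items())
    )


cube = build_cube(sales)
molap_answer = cube[(ALL, "East", ALL)]          # a single dictionary lookup
rolap_answer = scan_total(sales, region="East")  # a full scan of the rows
print(molap_answer, rolap_answer)
```

The cube answers with one lookup but has to be rebuilt when the detail data changes, whereas the scan always reflects the current rows at the cost of reading them each time.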
===============================================================
What is BCP?
BCP stands for Bulk Copy Program.
Two bulk-load plug-ins are automatically installed with DataStage:
1. BCPLoad plug-in: used to bulk load data into a single table in
MS SQL Server.
2. OraBulk plug-in
What is Data Mining?
Data mining is the process of finding correlations or patterns among dozens of fields
in large relational databases.
More generally, data mining (sometimes called data or knowledge discovery) is the
process of analyzing data from different perspectives and summarizing it into useful
information: information that can be used to increase revenue, cut costs, or both.
Analysts look for patterns hidden in the data.
How can one connect two fact tables? Is it possible? How?
Fact tables cannot be connected directly. They are connected through conformed
dimensions, i.e. by means of a dimension key that both fact tables
share. Example: We_site_id.
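One way to combine two fact tables through a shared dimension can be sketched as follows, assuming Python's built-in sqlite3 module and hypothetical tables (dim_product, fact_sales, fact_returns). Each fact table is aggregated to the grain of the shared dimension first, and only then joined, so rows are not multiplied by a direct fact-to-fact join:

```python
import sqlite3

# Two hypothetical fact tables that share the product dimension.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_sales   (product_id INTEGER, units_sold INTEGER);
CREATE TABLE fact_returns (product_id INTEGER, units_returned INTEGER);
INSERT INTO dim_product  VALUES (1, 'Pen'), (2, 'Lamp');
INSERT INTO fact_sales   VALUES (1, 10), (1, 5), (2, 3);
INSERT INTO fact_returns VALUES (1, 2);
""")

# Aggregate each fact table separately, then join both results to the dimension.
rows = conn.execute("""
    SELECT p.product_name,
           COALESCE(s.sold, 0),
           COALESCE(r.returned, 0)
    FROM dim_product p
    LEFT JOIN (SELECT product_id, SUM(units_sold) AS sold
               FROM fact_sales GROUP BY product_id) s
           ON s.product_id = p.product_id
    LEFT JOIN (SELECT product_id, SUM(units_returned) AS returned
               FROM fact_returns GROUP BY product_id) r
           ON r.product_id = p.product_id
    ORDER BY p.product_name
""").fetchall()

print(rows)
```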
When should you use a STAR and when a SNOW-FLAKE schema?
STAR SCHEMA:
1. If PERFORMANCE is the priority, go for the
star schema, since its dimension tables are DE-NORMALIZED.
2. The star schema is usually the best option for end users due to
its simple design and navigation.
SNOW-FLAKE SCHEMA:
The snowflake schema (sometimes called snowflake join
schema) is a more complex schema than the star schema
because the tables which describe the dimensions are
normalized: one dimension table is connected to another
dimension table, and so on.
1. If STORAGE SPACE is the priority, go for the snowflake
schema, since its dimension tables are NORMALIZED.
2. If a dimension is very sparse (i.e. most of the
possible values for the dimension have no data) and/or a
dimension has a very long list of attributes which may be
used in a query, the dimension table may occupy a
significant proportion of the database and snowflaking may
be appropriate.
What is the difference between OLAP, ROLAP, MOLAP and HOLAP?
MOLAP
------
MOLAP (Multidimensional OLAP) provides analysis of data
stored in a multidimensional data cube.
ROLAP
------
ROLAP (Relational OLAP) provides multidimensional analysis
of data stored in a relational
database (RDBMS).
HOLAP
------
HOLAP (Hybrid OLAP) is a combination of ROLAP and MOLAP. It can
provide multidimensional analysis simultaneously of data stored in a
multidimensional database and in a relational database (RDBMS).
DOLAP
-----
DOLAP (Desktop OLAP or Database OLAP) provides multidimensional analysis
locally on the client machine, on data collected from relational or
multidimensional database servers.
What is the difference between an aggregate table and a fact table? How do you
load these two tables?
A fact table can contain millions of records, so retrieving records from it takes
time, whereas an aggregate table contains summarized data from the required tables,
so retrieving data from it takes less time.
Which kind of index is preferred in a DWH?
A bitmap index is usually the best choice.
A B-tree index is suited to columns with unique values (e.g. empid), whereas a
bitmap index is best for columns with few distinct, often-repeated values (e.g. gender M/F).
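A toy illustration of the bitmap idea in plain Python, not a description of how any particular database implements it. The column values and function names are hypothetical. Each distinct value gets one bitmap with a bit per row, so a column with few distinct values needs only a few bitmaps, and predicates on several columns combine with a bitwise AND:

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = {}
    for row_number, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_number)
    return index


def matching_rows(bitmap):
    """Return the row numbers whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Two low-cardinality columns of a hypothetical five-row table.
gender = ["M", "F", "F", "M", "F"]
status = ["active", "active", "closed", "closed", "active"]

gender_index = build_bitmap_index(gender)
status_index = build_bitmap_index(status)

# WHERE gender = 'F' AND status = 'active' becomes one bitwise AND.
rows = matching_rows(gender_index["F"] & status_index["active"])
print(rows)
```

For a unique column such as empid, this layout would need one bitmap per row, which is why a B-tree is the better fit there.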
What are CUBES?
Cubes divide the data into subsets that are defined by dimensions.
Cube: mscsCampaign
  Dimensions: Advertiser, DateHour, Events, Page Group, Site, UserType
  Measures: Count Events, Distinct Users, OrdImpLeaf
Cube: mscsCampaignEvents
  Dimensions: Advertiser, DateHour, Events, Page Group, Site, UserType
  Measures: Count Events, Distinct Users
===============================================================
What are materialized views? How can they be used in a data warehouse to improve
performance?
Materialized views (MVs) are segments similar to tables, in which the output of queries
is stored in the database.
The following is a common query at Acme Bank:
SELECT acc_type, SUM(cleared_bal) totbal
FROM accounts
GROUP BY acc_type;
And the following is an MV, mv_bal, for this query:
CREATE MATERIALIZED VIEW mv_bal
REFRESH ON DEMAND AS
SELECT acc_type, SUM(cleared_bal) totbal
FROM accounts
GROUP BY acc_type;
Now suppose a user wants to get the total of all account balances for the account type 'C'
and issues the following query:
SELECT SUM(cleared_bal)
FROM accounts
WHERE acc_type = 'C';
Because the mv_bal MV already contains the totals by account type, the user could have
gotten this information directly from the MV, by issuing the following:
SELECT totbal
FROM mv_bal
WHERE acc_type = 'C';
This query against the mv_bal MV returns results much more quickly than
the query against the accounts table, because it reads the stored totals instead of
scanning and aggregating the source table.
To keep the data in sync, the MV is refreshed from time to time, either manually or
automatically. There are two ways to refresh data in MVs. In one of them, the MV is
completely wiped clean and then repopulated with data from the source
tables—a process known as complete refresh. In some cases, however, when the
source tables may have changed very little, it is possible to refresh the MV only for
changed records on the source tables—a process known as fast refresh. To
use fast refresh, however, you must have created the MV as fast-refreshable.
Because it updates only changed records, fast refresh is faster than complete
refresh. (See the Oracle Database Data Warehousing Guide for more information on
refreshing MVs.)
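The staleness and complete-refresh behaviour described above can be imitated with Python's built-in sqlite3 module. SQLite has no materialized views, so this is only an emulation under stated assumptions: mv_bal is an ordinary summary table, acc_no is a hypothetical key column, and complete_refresh is a hand-written helper, not an Oracle API. Fast refresh is not shown:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (acc_no INTEGER PRIMARY KEY, acc_type TEXT, cleared_bal REAL);
INSERT INTO accounts VALUES (1, 'C', 100.0), (2, 'C', 250.0), (3, 'S', 75.0);
-- Emulated MV: a plain table holding the stored result of the summary query.
CREATE TABLE mv_bal AS
    SELECT acc_type, SUM(cleared_bal) AS totbal FROM accounts GROUP BY acc_type;
""")


def complete_refresh(connection):
    """Wipe the summary table and repopulate it from the source table."""
    with connection:  # commits on success, rolls back on error
        connection.execute("DELETE FROM mv_bal")
        connection.execute(
            "INSERT INTO mv_bal "
            "SELECT acc_type, SUM(cleared_bal) FROM accounts GROUP BY acc_type"
        )


def totbal(connection, acc_type):
    """Read a stored total instead of aggregating the accounts table."""
    return connection.execute(
        "SELECT totbal FROM mv_bal WHERE acc_type = ?", (acc_type,)
    ).fetchone()[0]


before = totbal(conn, "C")
conn.execute("INSERT INTO accounts VALUES (4, 'C', 50.0)")
stale = totbal(conn, "C")   # the summary has not seen the new row yet
complete_refresh(conn)
after = totbal(conn, "C")   # now in sync with the source table
print(before, stale, after)
```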
A materialized view can be either read-only, updatable, or writeable. Users cannot
perform data manipulation language (DML) statements on read-only materialized views,
but they can perform DML on updatable and writeable materialized views.
===============================================================
What is SQL*Loader and what is it used for?
SQL*Loader is a bulk loader utility used for moving data from external files into the
Oracle database.
Is there a SQL*Unloader to download data to a flat file?
Oracle does not supply any data unload utilities. One workaround is
SQL*Plus: you can use SQL*Plus to select and format your data and then spool
it to a file.
How do you skip unwanted data?
One can skip unwanted header records or continue an interrupted load (for example if you run
out of space) by specifying the "SKIP=n" keyword, where "n" is the number of logical records to
skip.
sqlldr userid=ora_id/ora_passwd control=control_file_name.ctl skip=4
What is data purging ?
Explain Control-M jobs in detail. How are they executed?
What is the difference between a data warehouse and an OLTP application?
Difference between DSS & OLTP?
What is operational data source (ODS)?
What is Snow Flake Schema design in database?
What is ETL process in Data warehousing?
Advantages of de normalized data?
What is the difference between choosing a multidimensional database and a relational
database?
Multidimensional database: OLAP (Online Analytical Processing)
Relational database: OLTP (Online Transaction Processing)
What is the difference between E-R modelling and dimensional modelling? And what
are semi-additive facts?
ER modeling:
- focuses on making data efficient to process (insert, update, delete)
- minimizes data redundancy (ideally to zero)
Dimensional modeling:
- focuses on making data efficient to retrieve
(for example, by reporting and analysis tools)
- allows many data redundancies
- consists of fact and dimension tables
What is the difference between an aggregate table and a materialized view?
Aggregate tables hold pre-computed totals in a hierarchical multidimensional
structure.
A materialized view is a database object that caches a query result in a concrete
table and updates it from the original database tables from time to time.
Aggregate tables are used to speed up query computation, whereas materialized
views speed up data retrieval.
How many clustered indexes can you create for a table in a DWH?
You can have only one clustered index per table.
==========================================================
Views
A view takes the output of a query and makes it appear like a virtual table.
All operations performed on a view will affect data in the base table and so are
subject to the integrity constraints and triggers of the base table.
A View can also be used to improve security by restricting access to a predetermined
set of rows or columns.
One view can be based on another, and a view can also join a view with a table (or
use GROUP BY or UNION).
Read-Only vs Updatable Views
The data dictionary views ALL_UPDATABLE_COLUMNS, DBA_UPDATABLE_COLUMNS, and
USER_UPDATABLE_COLUMNS indicate which view columns are updatable.
An updatable view lets you insert, update, and delete rows in the view and propagate
the changes to the target master table.
In order to be updatable, a view cannot contain any of the following constructs:
a set operator or the DISTINCT operator, an aggregate or analytic function, a GROUP BY, ORDER
BY, CONNECT BY, or START WITH clause, a subquery (or collection expression) in a
SELECT list, or (with some exceptions) a JOIN.
Views that are not updatable can be modified using an INSTEAD OF trigger.
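The INSTEAD OF trigger mechanism can be tried out with Python's built-in sqlite3 module. This is a sketch in SQLite's dialect rather than Oracle's, and the employees table, sales_staff view, and trigger name are hypothetical. In SQLite every view is read-only unless such a trigger exists, which makes the effect easy to see:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL);
INSERT INTO employees VALUES (1, 'Ana', 'Sales', 5000), (2, 'Raj', 'IT', 6000);

-- The view restricts access to some rows and columns of the base table.
CREATE VIEW sales_staff AS
    SELECT emp_id, name FROM employees WHERE dept = 'Sales';

-- The trigger says what an INSERT on the view should do to the base table.
CREATE TRIGGER sales_staff_insert INSTEAD OF INSERT ON sales_staff
BEGIN
    INSERT INTO employees (emp_id, name, dept, salary)
    VALUES (NEW.emp_id, NEW.name, 'Sales', NULL);
END;
""")

# The insert targets the view; the trigger redirects it to the base table.
conn.execute("INSERT INTO sales_staff (emp_id, name) VALUES (3, 'Mia')")

visible = conn.execute(
    "SELECT emp_id, name FROM sales_staff ORDER BY emp_id"
).fetchall()
base_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(visible, base_count)
```

The salary column never appears in the view, so users limited to sales_staff can add staff without seeing or setting salaries.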
Materialized Views
Materialized views are schema objects that can be used to summarize, precompute,
replicate, and distribute data.
The existence of a materialized view is transparent to SQL, but when used for query
rewrite it improves the performance of SQL execution.
MVs are used mainly for performance improvement.
MVs support query rewrite. In short, if an MV is defined as "select region_id, count(*) from sales
group by region_id" and the same query is run against the Oracle database, Oracle
automatically rewrites the query to read from the MV instead of the sales table. In a DW
environment this is a big performance improvement. Some parameters (for example
QUERY_REWRITE_ENABLED) need to be set for this to happen.
MVs can undergo fast refresh. In short, if the fact table has 10 million rows and 500 rows are added,
then by making use of MV logs Oracle performs a fast refresh on the MV with the extra 500 rows
only.
A materialized view provides indirect access to table data by storing the results of a
query in a separate schema object, unlike an ordinary view, which does not take up
any storage space or contain any data.
An updatable materialized view lets you insert, update, and delete rows.
You can define a materialized view on a base table, partitioned table or view and you
can define indexes on a materialized view.
A materialized view can be stored in the same database as its base table(s) or in a
different database.
A materialized view log is a schema object that records changes to a master
table's data so that a materialized view defined on the master table can be refreshed
incrementally.
===================================================
Synonyms
A synonym is an alias for any table, view, materialized view, sequence,
procedure, function, or package.
A public synonym is owned by the user group PUBLIC and every user in a
database can access it.
A private synonym is in the schema of a specific user who has control over its
availability to others.
Synonyms are used to:
- Mask the real name and owner of a schema object
- Provide global (public) access to a schema object
- Provide location transparency for tables, views, or program units of a remote
database.
- Simplify SQL statements for database users
e.g. to query the table PATIENT_REFERRALS with SQL:
SELECT * FROM MySchema.PATIENT_REFERRALS;
CREATE PUBLIC SYNONYM referrals FOR
MySchema.PATIENT_REFERRALS;
After the public synonym is created, you can query with a simple SQL statement:
SELECT * FROM referrals;

An introduction to data warehousingAn introduction to data warehousing
An introduction to data warehousing
 
Data Mining: Concepts and Techniques (3rd ed.) — Chapter _04 olap
Data Mining:  Concepts and Techniques (3rd ed.)— Chapter _04 olapData Mining:  Concepts and Techniques (3rd ed.)— Chapter _04 olap
Data Mining: Concepts and Techniques (3rd ed.) — Chapter _04 olap
 
(Lecture 3) Star Schema.pdf
(Lecture 3) Star Schema.pdf(Lecture 3) Star Schema.pdf
(Lecture 3) Star Schema.pdf
 
Modelado Dimensional 4 etapas.ppt
Modelado Dimensional 4 etapas.pptModelado Dimensional 4 etapas.ppt
Modelado Dimensional 4 etapas.ppt
 
Service Analysis - Microsoft Dynamics CRM 2016 Customer Service
Service Analysis - Microsoft Dynamics CRM 2016 Customer ServiceService Analysis - Microsoft Dynamics CRM 2016 Customer Service
Service Analysis - Microsoft Dynamics CRM 2016 Customer Service
 
Designing the business process dimensional model
Designing the business process dimensional modelDesigning the business process dimensional model
Designing the business process dimensional model
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 
Intro to Data warehousing lecture 15
Intro to Data warehousing   lecture 15Intro to Data warehousing   lecture 15
Intro to Data warehousing lecture 15
 
Dataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra SolutionsDataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra Solutions
 
Data warehouse logical design
Data warehouse logical designData warehouse logical design
Data warehouse logical design
 
Dimensional Modeling
Dimensional ModelingDimensional Modeling
Dimensional Modeling
 
Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Mining
 
My2dw
My2dwMy2dw
My2dw
 
Dwbi Project
Dwbi ProjectDwbi Project
Dwbi Project
 
Date Analysis .pdf
Date Analysis .pdfDate Analysis .pdf
Date Analysis .pdf
 

More from theextraaedge

Coll edge demo_deck_v1.3
Coll edge demo_deck_v1.3Coll edge demo_deck_v1.3
Coll edge demo_deck_v1.3theextraaedge
 
Lean startup simplified for learners, entrepreneurs & practitioners
Lean startup simplified   for learners, entrepreneurs & practitionersLean startup simplified   for learners, entrepreneurs & practitioners
Lean startup simplified for learners, entrepreneurs & practitionerstheextraaedge
 
Lean startup simplified for learners, entrepreneurs & practitioners
Lean startup simplified   for learners, entrepreneurs & practitionersLean startup simplified   for learners, entrepreneurs & practitioners
Lean startup simplified for learners, entrepreneurs & practitionerstheextraaedge
 
Empowering our Engineering colleges
Empowering our Engineering collegesEmpowering our Engineering colleges
Empowering our Engineering collegestheextraaedge
 
Empowering our engineering colleges
Empowering our engineering collegesEmpowering our engineering colleges
Empowering our engineering collegestheextraaedge
 
ExtraAEdge - CollEDGE Product Demo
ExtraAEdge - CollEDGE Product DemoExtraAEdge - CollEDGE Product Demo
ExtraAEdge - CollEDGE Product Demotheextraaedge
 
Business+Applications+-+Practise+paper3
Business+Applications+-+Practise+paper3Business+Applications+-+Practise+paper3
Business+Applications+-+Practise+paper3theextraaedge
 
Peter+Thiels+-+Stanford+Lecture
Peter+Thiels+-+Stanford+LecturePeter+Thiels+-+Stanford+Lecture
Peter+Thiels+-+Stanford+Lecturetheextraaedge
 

More from theextraaedge (20)

test doc
test doctest doc
test doc
 
demo slide
demo slidedemo slide
demo slide
 
Coll edge demo_deck_v1.3
Coll edge demo_deck_v1.3Coll edge demo_deck_v1.3
Coll edge demo_deck_v1.3
 
Lean startup simplified for learners, entrepreneurs & practitioners
Lean startup simplified   for learners, entrepreneurs & practitionersLean startup simplified   for learners, entrepreneurs & practitioners
Lean startup simplified for learners, entrepreneurs & practitioners
 
Lean startup simplified for learners, entrepreneurs & practitioners
Lean startup simplified   for learners, entrepreneurs & practitionersLean startup simplified   for learners, entrepreneurs & practitioners
Lean startup simplified for learners, entrepreneurs & practitioners
 
Empowering our Engineering colleges
Empowering our Engineering collegesEmpowering our Engineering colleges
Empowering our Engineering colleges
 
Empowering our engineering colleges
Empowering our engineering collegesEmpowering our engineering colleges
Empowering our engineering colleges
 
ExtraAEdge - CollEDGE Product Demo
ExtraAEdge - CollEDGE Product DemoExtraAEdge - CollEDGE Product Demo
ExtraAEdge - CollEDGE Product Demo
 
 
Business+Applications+-+Practise+paper3
Business+Applications+-+Practise+paper3Business+Applications+-+Practise+paper3
Business+Applications+-+Practise+paper3
 
Peter+Thiels+-+Stanford+Lecture
Peter+Thiels+-+Stanford+LecturePeter+Thiels+-+Stanford+Lecture
Peter+Thiels+-+Stanford+Lecture
 
letter+review
letter+reviewletter+review
letter+review
 
About+us
About+usAbout+us
About+us
 
letter
letterletter
letter
 
just+dial+proces
just+dial+procesjust+dial+proces
just+dial+proces
 
letter
letterletter
letter
 
innovation
innovationinnovation
innovation
 
advt+list
advt+listadvt+list
advt+list
 
ENTREPRENEUR
ENTREPRENEURENTREPRENEUR
ENTREPRENEUR
 
retest
retestretest
retest
 

Basics of Datawarehousing

  • 1. Datawarehouse :
    Bill Inmon defined the data warehouse in 1990 in the following way: "A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process". He defined the terms in the sentence as follows:
    Bill Inmon's paradigm: The data warehouse is one part of the overall business intelligence system. An enterprise has one data warehouse, and data marts source their information from the data warehouse. In the data warehouse, information is stored in 3rd normal form.
    Subject Oriented: Data that gives information about a particular subject instead of about a company's ongoing operations.
    Integrated: Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.
    Time-variant: All data in the data warehouse is identified with a particular time period.
    Non-volatile: Data is stable in a data warehouse. More data is added but data is never removed.
    However, a single-subject data warehouse is typically referred to as a data mart, while data warehouses are generally enterprise in scope.
    Also, data warehouses can be volatile. Due to the large amount of storage required for a data warehouse (multi-terabyte data warehouses are not uncommon), only a certain number of periods of history are kept in the warehouse. For instance, if three years of data are decided on and loaded into the warehouse, every month the oldest month will be "rolled off" the database, and the newest month added.
    ===============================================================
    Ralph Kimball provided a much simpler definition of a data warehouse: a data warehouse is "a copy of transaction data specifically structured for query and analysis".
    Ralph Kimball's paradigm: The data warehouse is the conglomerate of all data marts within the enterprise. Information is always stored in the dimensional model.
    ===============================================================
  • 2. Steps :
    • Requirement Gathering
    • Physical Environment Setup
    • Data Modeling
    • ETL
    • OLAP Cube Design
    • Front End Development
    • Performance Tuning
    • Quality Assurance
    • Rolling out to Production
    • Production Maintenance
    • Incremental Enhancements
    Components of Dimensional Data Model :
    Dimension: A category of information. For example, the time dimension.
    Attribute: A unique level within a dimension. For example, Month is an attribute in the Time dimension.
    Hierarchy: The specification of levels that represents the relationship between different attributes within a dimension. For example, one possible hierarchy in the Time dimension is Year → Quarter → Month → Day.
    Fact Table: A fact table is a table that contains the measures of interest. For example, sales amount would be such a measure.
    A dimensional model includes fact tables and lookup tables. Fact tables connect to one or more lookup tables, but fact tables do not have direct relationships to one another.
    In designing data models for data warehouses / data marts, the most commonly used schema types are Star Schema and Snowflake Schema.
    Star Schema: In the star schema design, a single object (the fact table) sits in the middle and is radially connected to other surrounding objects (dimension lookup tables) like a star. A star schema can be simple or complex. A simple star consists of one fact table; a complex star can have more than one fact table. Fact tables in a star schema are mostly in third normal form (3NF), but dimension tables are in de-normalized second normal form (2NF).
    Snowflake Schema: The snowflake schema (sometimes called snowflake join schema) is a more complex schema than the star schema because the tables which describe the dimensions are normalized. The main advantage of the snowflake schema is the improvement in query performance due to minimized disk storage requirements and joining smaller lookup tables. The main disadvantage of the snowflake schema is the additional maintenance effort needed due to the increased number of lookup tables.
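The star schema described above can be sketched with a small in-memory database. This is only an illustration, assuming Python's standard sqlite3 module; the table names (fact_sales, dim_date, dim_store), the columns and the sample rows are hypothetical and do not come from the slides. It shows one fact table joined to a dimension lookup table and rolled up along the Year → Quarter level of the Time hierarchy.

```python
import sqlite3


def build_star_schema():
    """Create a tiny star schema: one fact table, two dimension lookup tables.

    All table names, columns and rows are hypothetical sample data.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,
            year INTEGER, quarter TEXT, month TEXT, day INTEGER
        );
        CREATE TABLE dim_store (
            store_key INTEGER PRIMARY KEY,
            store_name TEXT, region TEXT
        );
        -- The fact table holds the measure plus one key per dimension.
        CREATE TABLE fact_sales (
            date_key INTEGER REFERENCES dim_date(date_key),
            store_key INTEGER REFERENCES dim_store(store_key),
            sales_amount REAL
        );
        INSERT INTO dim_date VALUES
            (1, 2024, 'Q1', 'Jan', 15),
            (2, 2024, 'Q1', 'Feb', 10),
            (3, 2024, 'Q2', 'Apr', 5);
        INSERT INTO dim_store VALUES
            (1, 'Downtown', 'East'),
            (2, 'Airport', 'West');
        INSERT INTO fact_sales VALUES
            (1, 1, 100.0), (1, 2, 50.0), (2, 1, 70.0), (3, 2, 30.0);
    """)
    return conn


def sales_by_quarter(conn):
    """Roll the measure up to the Quarter level of the Time hierarchy."""
    return conn.execute("""
        SELECT d.year, d.quarter, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        GROUP BY d.year, d.quarter
        ORDER BY d.year, d.quarter
    """).fetchall()


if __name__ == "__main__":
    for row in sales_by_quarter(build_star_schema()):
        print(row)
```

In a snowflake variant of the same sketch, dim_date would itself be split into normalized month and quarter lookup tables, so the same report would need additional joins.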
  • 3. Dimensions : What are the types of dimension tables?
    There are three types of dimensions: Conformed Dimensions, Junk Dimensions and Degenerate Dimensions.
    Conformed Dimension: A dimension that has exactly the same meaning and content when being referred to from different fact tables. A conformed dimension is something that can be shared by multiple fact tables or multiple data marts. Some examples are the time dimension, customer dimension and product dimension.
    Junk Dimensions: Occasionally, there are miscellaneous attributes, such as yes/no attributes or comment attributes, that don't fit into tight star schemas. Rather than discarding flag fields and yes/no attributes, place them in a junk dimension. In addition, you can handle comment and open-ended text attributes by creating a text-based junk dimension. A junk dimension is a convenient grouping of flags and indicators. It's helpful, but not absolutely required, if there's a positive correlation among the values.
    What is a degenerate dimension?
    I have a fact table that stores insurance contracts, and one important dimension is the year signed. So the fact table has many columns, like CUSTOMER_ID, CONTRACT_ID, etc., and one column YEAR_SIGNED as varchar(4). CUSTOMER_ID is the foreign key column to DIM_CUSTOMER with all the customer data (name, address, ...). CONTRACT_ID relates to DIM_CONTRACT with all the contract-specific information. And YEAR_SIGNED? Should I really have a DIM_YEAR_SIGNED that will have one column only? What other attributes should a year have? Therefore, we do not create an explicit dimension table, and we call that YEAR_SIGNED column a degenerate dimension.
    A degenerate dimension is a dimension key stored in the fact table that is not connected to any dimension table, i.e. it corresponds to a dimension table that has no attributes.
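The degenerate dimension in the insurance example can be illustrated with a minimal sketch, assuming Python's standard sqlite3 module. CUSTOMER_ID, CONTRACT_ID, YEAR_SIGNED and DIM_CUSTOMER are the names used above; the fact table name FACT_CONTRACT, the PREMIUM measure and the sample rows are hypothetical additions for illustration. The point is that YEAR_SIGNED is grouped on directly, with no join to a dimension table.

```python
import sqlite3


def build_contract_mart():
    """Fact table with a degenerate dimension (YEAR_SIGNED).

    FACT_CONTRACT, PREMIUM and the rows below are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE DIM_CUSTOMER (
            CUSTOMER_ID INTEGER PRIMARY KEY,
            NAME TEXT
        );
        CREATE TABLE FACT_CONTRACT (
            CUSTOMER_ID INTEGER REFERENCES DIM_CUSTOMER(CUSTOMER_ID),
            CONTRACT_ID INTEGER,
            YEAR_SIGNED VARCHAR(4),  -- degenerate dimension: no DIM_YEAR_SIGNED
            PREMIUM REAL             -- hypothetical measure
        );
        INSERT INTO DIM_CUSTOMER VALUES (1, 'Alice'), (2, 'Bob');
        INSERT INTO FACT_CONTRACT VALUES
            (1, 101, '2021', 500.0),
            (1, 102, '2022', 300.0),
            (2, 103, '2022', 200.0);
    """)
    return conn


def contracts_per_year(conn):
    """Group on the degenerate dimension without joining any dimension table."""
    return conn.execute("""
        SELECT YEAR_SIGNED, COUNT(*), SUM(PREMIUM)
        FROM FACT_CONTRACT
        GROUP BY YEAR_SIGNED
        ORDER BY YEAR_SIGNED
    """).fetchall()


if __name__ == "__main__":
    for row in contracts_per_year(build_contract_mart()):
        print(row)
```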
  • 4. Types of Facts
    There are three types of facts:
    • Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact table.
    • Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.
    • Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
    Let us use examples to illustrate each of the three types of facts. The first example assumes that we are a retailer, and we have a fact table with the following columns:
    Date, Store, Product, Sales_Amount
    The purpose of this table is to record the sales amount for each product in each store on a daily basis. Sales_Amount is the fact. In this case, Sales_Amount is an additive fact, because you can sum up this fact along any of the three dimensions present in the fact table: date, store, and product. For example, the sum of Sales_Amount for all 7 days in a week represents the total sales amount for that week.
    Say we are a bank with the following fact table:
    Date, Account, Current_Balance, Profit_Margin
    The purpose of this table is to record the current balance for each account at the end of each day, as well as the profit margin for each account for each day. Current_Balance and Profit_Margin are the facts. Current_Balance is a semi-additive fact, as it makes sense to add it up for all accounts (what's the total current balance for all accounts in the bank?), but it does not make sense to add it up through time (adding up all current balances for a given account for each day of the month does not give us any useful information). Profit_Margin is a non-additive fact, for it does not make sense to add it up at the account level or the day level.
    Types of Fact Tables
    Based on the above classifications, there are two types of fact tables:
  • 5. • Cumulative: This type of fact table describes what has happened over a period of time. For example, this fact table may describe the total sales by product by store by day. The facts for this type of fact table are mostly additive facts. The first example presented here is a cumulative fact table.
    • Snapshot: This type of fact table describes the state of things at a particular instant in time, and usually includes more semi-additive and non-additive facts. The second example presented here is a snapshot fact table.
    ==================================================================
    What are factless fact tables, and in which scenario will you use such kinds of fact tables?
    Factless Fact: Some very useful fact tables don't have any facts at all.
    FIGURE 1 -- A factless fact table for recording student attendance on a daily basis at a college. The five dimension tables contain rich descriptions of dates, students, courses, teachers, and facilities. There are no additive, numeric facts.
    Such a table can answer questions like: Which classes were the most heavily attended? Which classes were the most consistently attended? Which teachers taught the most students?
    Tools :
    • Scalability: How can the system grow as your data storage needs grow?
    • Parallel Processing Support:
  • 6. Popular Relational Databases
    • Oracle, Microsoft SQL Server, IBM DB2, Teradata, Sybase, MySQL
    Popular OS Platforms
    • Linux
    • FreeBSD
    • Microsoft
    ETL Tools :
    • IBM WebSphere Information Integration (Ascential DataStage)
    • Ab Initio
    • Informatica
    OLAP Tool Functionalities
    1. MOLAP: In this type of OLAP, a cube is aggregated from the relational data source (data warehouse). When a user generates a report request, the MOLAP tool can generate the report quickly because all data is already pre-aggregated within the cube.
    2. ROLAP: In this type of OLAP, instead of pre-aggregating everything into a cube, the ROLAP engine essentially acts as a smart SQL generator. The ROLAP tool typically comes with a 'Designer' piece, where the data warehouse administrator can specify the relationship between the relational tables, as well as how dimensions, attributes, and hierarchies map to the underlying database tables.
    Popular Tools
    • Business Objects
    • Cognos
    • Hyperion
    • Microsoft Analysis Services
    • MicroStrategy
    Reporting Tool
    • Business Objects (Crystal Reports)
    • Cognos
    • Actuate
    ==================================================================
    Questions ?
    What are MOLAP and ROLAP? What is the difference between them?
  • 7. They stand for multidimensional online analytical processing and relational online analytical processing.
    In MOLAP, data is stored in the form of multidimensional cubes. The advantage of this mode is that it provides excellent query performance, and the cubes are built for fast data retrieval. All calculations are pre-generated when the cube is created and can be easily applied while querying data.
    In ROLAP, the data is stored in relational databases; this model gives the appearance of traditional OLAP's slicing and dicing functionality. The advantage of this model is that it can handle a large amount of data and can leverage all the functionalities of the relational database.
    MOLAP has aggregated values stored in the cube. Since the data is aggregated, query performance is fast. ROLAP has data stored in relational databases. Here the query has to access the database to retrieve the data every time, so performance is slow when compared to MOLAP. Size is larger than MOLAP.
    ===============================================================
    What is BCP?
    Bulk Copy Program. Two plugins are automatically installed with DataStage:
    1. BCPLoad plugin: used to bulk load data into a single table in MS SQL Server.
    2. OraBulk plugin
    What is Data Mining?
    Data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Analysts look for patterns hidden in data.
    How can one connect two fact tables? Is it possible? How?
    Fact tables are connected by conformed dimensions. Fact tables cannot be connected directly, so we connect them by means of a shared dimension. Example: We_site_id.
    When should you use a STAR and when a SNOW-FLAKE schema?
    STAR SCHEMA:-
    1. If PERFORMANCE is the priority then go for star schema, since here dimension tables are DE-NORMALIZED.
  • 8. 2. Usually star schema is the best option for end users due to its simple design and navigation.
    The snowflake schema (sometimes called snowflake join schema) is a more complex schema than the star schema because the tables which describe the dimensions are normalized. In a snowflake schema, one dimension table is connected to another dimension table, and so on.
    If a dimension is very sparse (i.e. most of the possible values for the dimension have no data) and/or a dimension has a very long list of attributes which may be used in a query, the dimension table may occupy a significant proportion of the database and snowflaking may be appropriate.
    SNOW-FLAKE SCHEMA:- If MEMORY SPACE is the priority then go for snowflake schema, since here dimension tables are NORMALIZED.
    What is the difference between OLAP, ROLAP, MOLAP and HOLAP?
    MOLAP ------ MOLAP (Multidimensional OLAP) provides the analysis of data stored in a multi-dimensional data cube.
    ROLAP ------ ROLAP stands for Relational Online Analytical Processing, which provides multidimensional analysis of data stored in a relational database (RDBMS).
    HOLAP ------ HOLAP (Hybrid OLAP), a combination of both ROLAP and MOLAP, can provide multidimensional analysis simultaneously of data stored in a multidimensional database and in a relational database (RDBMS).
    DOLAP ----- DOLAP (Desktop OLAP or Database OLAP) provides multidimensional analysis locally on the client machine on the data collected from relational or multidimensional database servers.
    What is the difference between an aggregate table and a fact table? How do you load these two tables?
    Fact tables contain millions of records, and retrieving the records from a fact table takes time, whereas an aggregate table contains limited data from all the required tables, so retrieving the data from it takes less time.
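The relationship between a fact table and an aggregate table can be sketched as follows. This is a hedged illustration using Python's standard sqlite3 module; the table names (fact_sales, agg_sales_month), columns and sample rows are hypothetical. The aggregate table is loaded from the fact table with a GROUP BY, so a monthly report reads fewer, pre-summed rows yet returns the same total.

```python
import sqlite3


def build_fact_and_aggregate():
    """Load a daily-grain fact table, then derive a monthly aggregate table.

    Names and rows are hypothetical sample data.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE fact_sales (sale_date TEXT, product TEXT, sales_amount REAL);
        INSERT INTO fact_sales VALUES
            ('2024-01-05', 'widget', 10.0),
            ('2024-01-20', 'widget', 15.0),
            ('2024-01-20', 'gadget', 5.0),
            ('2024-02-03', 'widget', 20.0);
        -- The aggregate table is populated from the fact table, not from the sources.
        CREATE TABLE agg_sales_month AS
            SELECT substr(sale_date, 1, 7) AS sale_month,
                   product,
                   SUM(sales_amount) AS total_amount
            FROM fact_sales
            GROUP BY substr(sale_date, 1, 7), product;
    """)
    return conn


def month_total_from_fact(conn, month):
    """Scan the detailed fact rows for one month (format 'YYYY-MM')."""
    return conn.execute(
        "SELECT SUM(sales_amount) FROM fact_sales WHERE substr(sale_date, 1, 7) = ?",
        (month,),
    ).fetchone()[0]


def month_total_from_aggregate(conn, month):
    """Read the pre-summed rows for the same month."""
    return conn.execute(
        "SELECT SUM(total_amount) FROM agg_sales_month WHERE sale_month = ?",
        (month,),
    ).fetchone()[0]


def row_count(conn, table):
    """Count rows; `table` must be one of the two fixed names used above."""
    if table not in ("fact_sales", "agg_sales_month"):
        raise ValueError("unknown table")
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


if __name__ == "__main__":
    conn = build_fact_and_aggregate()
    print(month_total_from_fact(conn, "2024-01"))
    print(month_total_from_aggregate(conn, "2024-01"))
```

The trade-off is that the aggregate table must be reloaded whenever the fact table changes, otherwise the two totals drift apart.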
  • 9. Which kind of index is preferred in DWH?
    A bitmap index is generally the best one, because a B-tree index is suited for columns with unique values (e.g. empid), while a bitmap index is best for columns with repeated values (e.g. gender m/f).
    What are CUBES?
    The cubes divide the data into subsets that are defined by dimensions.
    Cube Dimensions Measures mscsCampaign Advertiser DateHour Events Page Group Site UserType Count EventsDistinct Users OrdImpLeaf mscsCampaignEvents Advertiser DateHour Events Page Group Site UserType Count EventsDistinct Users
    ===============================================================
    What are materialized views? How can they be used in a data warehouse to increase performance?
    MVs are segments similar to tables, in which the output of queries is stored in the database. The following is a common query at Acme Bank:
    SELECT acc_type, SUM(cleared_bal) totbal FROM accounts GROUP BY acc_type;
    And the following is an MV, mv_bal, for this query:
    CREATE MATERIALIZED VIEW mv_bal REFRESH ON DEMAND AS SELECT acc_type, SUM(cleared_bal) totbal FROM accounts GROUP BY acc_type;
  • 10. Now suppose a user wants to get the total of all account balances for the account type 'C' and issues the following query: SELECT SUM(cleared_bal) FROM accounts WHERE acc_type = 'C'; Because the mv_bal MV already contains the totals by account type, the user could have gotten this information directly from the MV, by issuing the following: SELECT totbal FROM mv_bal WHERE acc_type = 'C'; This query against the mv_bal MV would have returned results much more quickly than the query against the accounts table. Running a query against the MV will be faster than running the original query, because querying the MV does not query the source tables. To keep the data in sync, the MV is refreshed from time to time, either manually or automatically. There are two ways to refresh data in MVs. In one of them, the MV is completely wiped clean and then repopulated with data from the source tables—a process known as complete refresh. In some cases, however, when the source tables may have changed very little, it is possible to refresh the MV only for changed records on the source tables—a process known as fast refresh. To use fast refresh, however, you must have created the MV as fast-refreshable. Because it updates only changed records, fast refresh is faster than complete refresh. (See the Oracle Database Data Warehousing Guide for more information on refreshing MVs.) A materialized view can be either read-only, updatable, or writeable. Users cannot perform data manipulation language (DML) statements on read-only materialized views, but they can perform DML on updatable and writeable materialized views. =============================================================== What is SQL*Loader and what is it used for? SQL*Loader is a bulk loader utility used for moving data from external files into the Oracle database.
  • 11. Is there a SQL*Unloader to download data to a flat file?
    Oracle does not supply any data unload utilities. Here is a workaround: using SQL*Plus, you can select and format your data and then spool it to a file.
    Skipping unwanted data?
    One can skip unwanted header records or continue an interrupted load (for example if you run out of space) by specifying the "SKIP=n" keyword. "n" specifies the number of logical rows to skip.
    sqlldr userid=ora_id/ora_passwd control=control_file_name.ctl skip=4
    What is data purging?
    Explain Control-M jobs in detail. How do you execute them?
    What is the difference between a W/H and an OLTP application?
    Difference between DSS & OLTP?
    What is an operational data source (ODS)?
    What is Snow Flake Schema design in a database?
    What is the ETL process in data warehousing?
    Advantages of de-normalized data?
    What is the difference between choosing a multidimensional database and a relational database?
    Multidimensional database: OLAP (Online Analytical Processing)
    Relational database: OLTP (Online Transaction Processing)
  • 12. What is the difference between E-R modelling and dimensional modelling? And what are semi-additive facts?
    ER modeling:
    - focused on how data will be efficient for processing (insert, update, delete)
    - minimizes data redundancies (ideally to zero)
    Dimensional:
    - focused on how data will be efficient for retrieval (for example, by report and analysis tools)
    - many data redundancies
    - consists of fact and dimension tables
    What is the difference between an aggregate table and a materialized view?
    Aggregate tables are pre-computed totals in the form of a hierarchical multidimensional structure. A materialized view is a database object which caches the query result in a concrete table and updates it from the original database table from time to time. Aggregate tables are used to speed up query computation, whereas materialized views speed up data retrieval.
    How many clustered indexes can you create for a table in DWH?
    You can have only one clustered index per table.
    ==========================================================
  • 13. Views
    A view takes the output of a query and makes it appear like a virtual table. All operations performed on a view will affect data in the base table and so are subject to the integrity constraints and triggers of the base table. A view can also be used to improve security by restricting access to a predetermined set of rows or columns. One view can be based on another, and a view can also JOIN a view with a table (GROUP BY or UNION).
    Read-Only vs Updatable Views
    The data dictionary views ALL_UPDATABLE_COLUMNS, DBA_UPDATABLE_COLUMNS, and USER_UPDATABLE_COLUMNS indicate which view columns are updatable. An updatable view lets you insert, update, and delete rows in the view and propagate the changes to the target master table. In order to be updatable, a view cannot contain any of the following constructs: SET or DISTINCT operators, an aggregate or analytic function, a GROUP BY, ORDER BY, CONNECT BY, or START WITH clause, a subquery (or collection expression) in a SELECT list, or finally (with some exceptions) a JOIN. Views that are not updatable can be modified using an INSTEAD OF trigger.
    Materialized Views
    Materialized views are schema objects that can be used to summarize, precompute, replicate, and distribute data. The existence of a materialized view is transparent to SQL, but when used for query rewrites it will improve the performance of SQL execution. MVs are used more for performance improvement.
    MVs help query rewrite. In short, if you have an MV defined as "select * from sales group by region_id" and you have a query "select * from sales group by region_id" fired on the Oracle DB, Oracle will automatically rewrite the query and refer it to the MV instead of the Sales table. In a DW environment this is a big performance improvement. There are some parameters which need to be set for this to happen.
  • 14. An MV can undergo fast refresh. In short, if I have 10 million rows in the fact table and I add 500 rows, then by making use of MV logs Oracle will do a fast refresh on the MV with the extra 500 rows only.
    A materialized view provides indirect access to table data by storing the results of a query in a separate schema object, unlike an ordinary view, which does not take up any storage space or contain any data.
    An updatable materialized view lets you insert, update, and delete. You can define a materialized view on a base table, partitioned table or view, and you can define indexes on a materialized view. A materialized view can be stored in the same database as its base table(s) or in a different database.
    A materialized view log is a schema object that records changes to a master table's data so that a materialized view defined on the master table can be refreshed incrementally.
    ===================================================
    Synonyms
    A synonym is an alias for any table, view, materialized view, sequence, procedure, function, or package. A public synonym is owned by the user group PUBLIC and every user in a database can access it. A private synonym is in the schema of a specific user who has control over its availability to others.
    Synonyms are used to:
    - Mask the real name and owner of a schema object
    - Provide global (public) access to a schema object
    - Provide location transparency for tables, views, or program units of a remote database
    - Simplify SQL statements for database users
    e.g. to query the table PATIENT_REFERRALS with SQL:
  • 15. SELECT * FROM MySchema.PATIENT_REFERRALS; CREATE PUBLIC SYNONYM referrals FOR MySchema.PATIENT_REFERRALS; After the public synonym is created, you can query with a simple SQL statement: SELECT * FROM referrals;