The document discusses data warehousing and describes its key characteristics and components. It defines a data warehouse as a copy of transaction data structured for querying and reporting to support strategic decision making. It outlines the stages of constructing a data warehouse including extraction, integration, and dimensional analysis to design the data warehouse database.
1. Data Warehousing
Data Warehousing
Lectures based on material from
Phil Trinder (HW)
Monica Farrow
email : M.Farrow@hw.ac.uk
02/08/13 Data Warehousing 1
2. Data Warehouse
Two definitions:
“A data warehouse is a copy of transaction data
specifically structured for querying and reporting.”
Data Warehousing Information Center
http://www.dwinfocenter.org/defined.html
A data warehouse is a specialised database to support
strategic decision making
Decision making involves:
Analysing the problem, e.g.
Why are my sales not meeting my targets?
What products are not meeting their targets?
What are the trends for the failing products?
Generating alternative solutions, evaluating them, and choosing
the best
3. Decision Support Systems
These are used by management to make strategic
or policy decisions
They have existed for a long time
Characteristics
Aimed at loosely specified problems
Combine models and analytical approaches with
data retrieval
Good usability for non-specialist use
Flexible: to support multiple decision-making
approaches
4. A wine club example
100,000 members, 2000 wines, 150 suppliers,
750,000 orders per year
Systems : storage technology
Member administration : indexed sequential files
Stock control: relational database
Order processing: relational database
Despatch: proprietary database
5. Wine Club Operational Schema
[Entity-relationship diagram. Entities: Member, Supplier, Wine, MemberOrder, Stock, OrderItem. Relationship labels: places, supplies, in, on, is for.]
6. Wine Club Questions
Competitors have moved in. Is our market share
falling?
What products are increasing/decreasing in
popularity?
Which products are seasonal?
Which members place regular orders?
Are some products more popular in certain parts
of the country?
Which members concentrate on particular
products?
7. Strategic vs Operational Issues
Strategic*: planning and policy making, long term
and broad brush, higher levels of management,
e.g.
When to launch a new product?
What would be the effect of closing the Edinburgh
branch?
Operational: day-to-day running of business.
Detailed and immediate, lower levels of
management, e.g.
Which items are out of stock?
What is the status of order 34522?
*Here, ‘strategic’ is in the management context, not executive
8. Motivation for data warehousing
Operational data is not suitable to guide strategic
decisions
Some of the data is not relevant
Data may be archived regularly once it is no
longer regularly required
Need to examine trends
What is happening over time?
Queries over time may significantly affect the
speed of operational processing
Solution: record sales on a regular basis, separate
from the operational system, and analyse them
This is the start of a warehouse
9. Data warehouse characteristics
Subject-oriented e.g. sales
Non-volatile – no alteration to records once they
are added
Whereas in operational processing, records will
frequently be updated (e.g. alteration to prices,
quantity etc)
Integrated: data from multiple (operational)
sources is accumulated in an integrated format
E.g. wine club has >1 operational db
Time variant: data is recorded against time to
allow trend analysis
10. Data warehouse characteristics continued
Records are extracted to make future querying
easy. Therefore
There is likely to be some data duplication,
including storage of derived data (data obtained
from calculations and aggregations)
There will be fewer joins and more indexes than in a
well-designed operational database.
The data warehouse will be larger than the
corresponding operational database
Data in operational databases will be archived
periodically, whereas a data warehouse keeps data
for years to allow trend analysis.
11. Warehouse construction
Now we have a look at each stage in warehouse
construction:
[Diagram. Stages shown: Source 1 ... Source n, Extraction, Integration, DBMS, Aggregate Navigators, Presentation.]
12. Extraction
Retrieve data from all data sources: files,
databases etc
The process to extract data will be an add-on to
the existing operational system. For example,
Day-end extraction run
When a sale is recorded, this triggers extraction of
the sale data
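A day-end extraction run could be sketched as follows. This is only an illustration under stated assumptions: the `order_item` table, its columns and the sample rows are hypothetical, and an in-memory SQLite database stands in for the operational order-processing system.

```python
import sqlite3

def extract_day(operational, day):
    """Return the order rows recorded on `day` (an ISO date string)."""
    return operational.execute(
        "SELECT order_id, member_code, wine_code, qty, order_date "
        "FROM order_item WHERE order_date = ? ORDER BY order_id",
        (day,),
    ).fetchall()

# Hypothetical operational table with a few sample rows.
operational = sqlite3.connect(":memory:")
operational.execute(
    "CREATE TABLE order_item (order_id INTEGER, member_code INTEGER, "
    "wine_code INTEGER, qty INTEGER, order_date TEXT)"
)
operational.executemany(
    "INSERT INTO order_item VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 100, 6, "2013-02-07"),
        (2, 11, 101, 12, "2013-02-08"),
        (3, 10, 101, 3, "2013-02-08"),
    ],
)

# The day-end run copies only that day's rows into a staging area.
staging = extract_day(operational, "2013-02-08")
print(staging)
```

The extracted rows are held in a separate staging list so that later integration steps do not touch the operational system again.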
13. Integration
When data is extracted from different sources,
integration may be required:
Format Integration, similar to type mismatch
Examples: gender
‘male’, ‘female’
‘M’, ‘F’
0 and 1
Semantic integration: does a word have the same
meaning in all the data being integrated?
Example – a ‘sale’ means:
order processing: order received
stock control: extracted items from physical warehouse
despatch: goods shipped
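Format integration for the gender example might look like the sketch below. The target codes 'M' and 'F', and in particular which of 0 and 1 means male, are assumptions made for illustration; the slide does not specify them. Semantic integration (what counts as a 'sale') cannot be solved by a lookup like this and needs a business decision.

```python
# Assumed mapping from each source's encoding to one warehouse code.
# Which of 0/1 means male is an assumption; the slide does not say.
GENDER_CODES = {
    "male": "M", "female": "F",
    "m": "M", "f": "F",
    "0": "M", "1": "F",
}

def normalise_gender(value):
    """Map a source-specific gender value to 'M' or 'F'; raise on unknown input."""
    key = str(value).strip().lower()
    if key not in GENDER_CODES:
        raise ValueError(f"unrecognised gender value: {value!r}")
    return GENDER_CODES[key]

print([normalise_gender(v) for v in ["male", "F", 0, 1]])
```

Raising an error on unknown values, rather than guessing, keeps bad source data out of the warehouse.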
14. Data Warehouse design: dimensional analysis
Dimensional analysis is used to identify the
requirements of the warehouse
What are the aspects of the data that are
strategically important? e.g.
Member
Product - wine
Time (always)
We don’t know in advance exactly what the
queries will be!
15. 3 dimensions example
[Figure: a data cube with three axes. MEMBER: Smith, Jones, Bloggs. TIME: Q3 2007, Q4 2007, Q1 2008. PRODUCT: Macon, Chablis, Merlot, Chardonnay.]
16. Star Schema
A star schema is one of the simplest designs for a
data warehouse.
A central fact table, containing all the main
information, is the centre of the star
Smaller dimension tables, containing look-up
information for attributes in the fact table, at the
points.
[Diagram: the SALES central fact table at the centre of the star, with the Wine, Member and Time dimension tables at the points.]
17. Star Schema Design for DB
Wine (dimension): winecode, winename, vintage, description, price
Member (dimension): membercode, membername, memberAddress
Time (dimension): timecode, date, periodno, quarterno, year
SALES (central fact table): winecode, membercode, timecode, quantity, cost
18. Warehouse Database
Centre of star schema becomes a relation: the fact table –
numeric facts and foreign keys
Sales(membercode, winecode, timecode, qty, itemcost)
Each dimension becomes a relation: a dimension table
Member(membercode, membername, memberaddress)
Wine(winecode, winename, vintage, description, price)
There is ALWAYS a time dimension table
This includes period and quarter details, since they are
frequently used in queries
Time(timecode, date, periodno, quarterno, year)
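The relations above could be declared as in the following sketch, which uses an in-memory SQLite database. The column types and the foreign-key clauses are assumptions, since the slides give only column names; the wine's name column is called winename, as in the star schema design slide.

```python
import sqlite3

STAR_SCHEMA = """
CREATE TABLE member (
    membercode    INTEGER PRIMARY KEY,
    membername    TEXT,
    memberaddress TEXT
);
CREATE TABLE wine (
    winecode    INTEGER PRIMARY KEY,
    winename    TEXT,
    vintage     INTEGER,
    description TEXT,
    price       REAL
);
CREATE TABLE time (
    timecode  INTEGER PRIMARY KEY,
    date      TEXT,
    periodno  INTEGER,
    quarterno INTEGER,
    year      INTEGER
);
-- Fact table: numeric facts plus one foreign key per dimension.
CREATE TABLE sales (
    membercode INTEGER REFERENCES member(membercode),
    winecode   INTEGER REFERENCES wine(winecode),
    timecode   INTEGER REFERENCES time(timecode),
    qty        INTEGER,
    itemcost   REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(STAR_SCHEMA)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```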
19. Using the Warehouse
The strategic questions can now be investigated
using data extracted by SQL queries
For example, to discover which wines have
increasing and decreasing sales, we can retrieve
a table giving the total sales for each wine against
time:
SELECT w.winename, t.periodno, SUM(s.qty)
FROM sales s, wine w, time t
WHERE s.winecode = w.winecode
AND s.timecode = t.timecode
GROUP BY w.winename, t.periodno
ORDER BY w.winename, t.periodno
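To see what such a query returns, here is a small runnable sketch against an in-memory SQLite database. The tables are cut down to the columns the query needs, the sample rows are invented, and the period column is spelled periodno as in the Time relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wine  (winecode INTEGER PRIMARY KEY, winename TEXT);
CREATE TABLE time  (timecode INTEGER PRIMARY KEY, periodno INTEGER);
CREATE TABLE sales (winecode INTEGER, timecode INTEGER, qty INTEGER);
INSERT INTO wine  VALUES (1, 'Chablis'), (2, 'Merlot');
INSERT INTO time  VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO sales VALUES (1, 1, 5), (1, 2, 7), (1, 3, 20), (2, 1, 9), (2, 3, 4);
""")

TREND_QUERY = """
SELECT w.winename, t.periodno, SUM(s.qty)
FROM sales s, wine w, time t
WHERE s.winecode = w.winecode
  AND s.timecode = t.timecode
GROUP BY w.winename, t.periodno
ORDER BY w.winename, t.periodno
"""

# One row per wine per period: (winename, periodno, total quantity).
rows = conn.execute(TREND_QUERY).fetchall()
for row in rows:
    print(row)
```

In this invented data Chablis rises from 12 to 20 bottles between periods 1 and 2, while Merlot falls from 9 to 4.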
20. Indexes
Usually a lot of indexes will be created, to make
queries more efficient
An index helps speed up retrieval.
A column that is frequently referred to in the
WHERE clause is a potential candidate for
indexing.
Diagrams of the 2 most commonly used indexes
in data warehousing are shown on the next slides:
Indexes may be based on the B-Tree
Also bitmap indexes are widely used
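The effect of indexing a column used in the WHERE clause can be observed with a sketch like this one. It uses SQLite, whose ordinary indexes are B-trees; the index name, the sample data and the query are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (membercode INTEGER, winecode INTEGER, "
    "timecode INTEGER, qty INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(m, m % 50, m % 12, 1) for m in range(1000)],
)

def plan(sql):
    """Return SQLite's query plan for `sql` as one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT SUM(qty) FROM sales WHERE winecode = 7"
before = plan(query)   # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_sales_winecode ON sales (winecode)")
after = plan(query)    # the plan now names the new index
print(before)
print(after)
```

The exact wording of the plan text differs between SQLite versions, so the sketch only relies on whether the index name appears in it.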
22. Bitmap indexes
Bitmap indexes
An example on the next slide
For each value of a domain, there is a bitmap
identifying the row Ids of satisfying tuples
1 if a match, 0 otherwise
Usually applied to attributes with a sparse domain (few distinct values)
In Oracle, <100 distinct values
E.g. bitmaps for all tuples with sex = male and for
sex=female
Updating a bitmap takes a lot of time, so use for
tables with hardly any updates, inserts, deletes
Ideal for data warehousing
23. Bitmap indexes example
The first table is a table about Sailors
The second table shows a bitmap index for the rating
attribute, assuming values are only from 1-3
There is a row in the bitmap index for each row in the
Sailor table
Column headings in the index are the values in the rating
column
SAILORS                      Bitmap index (rating)
Id   Rating   etc            1   2   3
22   1        Other data     1   0   0
23   2        Other data     0   1   0
31   3        Other data     0   0   1
35   1        Other data     1   0   0
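The bitmap index in this example can be reproduced with a few lines of Python. This is a teaching sketch using plain lists of 0s and 1s, not how a DBMS stores bitmaps internally; the function names are invented for illustration.

```python
def build_bitmap_index(rows, column):
    """Return {value: [0/1 per row]} for one column of a list of dict rows."""
    values = sorted({row[column] for row in rows})
    return {v: [1 if row[column] == v else 0 for row in rows] for v in values}

# The Sailors table from the slide (other columns omitted).
sailors = [
    {"id": 22, "rating": 1},
    {"id": 23, "rating": 2},
    {"id": 31, "rating": 3},
    {"id": 35, "rating": 1},
]
index = build_bitmap_index(sailors, "rating")

def ids_matching(bitmap):
    """Translate a bitmap back into the ids of the matching rows."""
    return [row["id"] for row, bit in zip(sailors, bitmap) if bit]

# rating = 1 OR rating = 3: combine the two bitmaps with a bitwise OR.
either = [a | b for a, b in zip(index[1], index[3])]
print(index)
print(ids_matching(either))
```

Reading the bitmaps column by column gives the same 1/0 pattern as the index table above, and combining conditions becomes a cheap bitwise operation.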
24. Materialised views and Aggregation
Data warehouses grow continuously, and may
become very large indeed
Problems: the time to compute a query and the
size of the result can be very large indeed
Solution: materialised views and aggregation
A materialised view is a stored pre-computed
table, used to prevent frequent use of time-
consuming joins and calculations
25. Aggregates
Basic idea: sacrifice detail to reduce the size of the data
Store precomputed tables at a useful level of detail,
consisting of commonly used sums, counts etc.
Must be carefully selected, e.g.
Sales to each member of each wine summed for each
quarter
Sales of each wine summed for each quarter or for each month
Levels of aggregation
None (i.e. detail)
Light (e.g. monthly)
High (e.g. quarterly)
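A highly aggregated table of sales per wine per quarter could be precomputed as sketched below. SQLite has no materialised views, so the sketch stores the result with CREATE TABLE ... AS SELECT instead; the table name sales_by_wine_quarter and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time  (timecode INTEGER PRIMARY KEY, quarterno INTEGER, year INTEGER);
CREATE TABLE sales (winecode INTEGER, timecode INTEGER, qty INTEGER);
INSERT INTO time  VALUES (1, 3, 2007), (2, 3, 2007), (3, 4, 2007);
INSERT INTO sales VALUES (1, 1, 5), (1, 2, 7), (1, 3, 20), (2, 2, 9);

-- Precomputed aggregate: one row per wine per quarter.
CREATE TABLE sales_by_wine_quarter AS
SELECT s.winecode, t.year, t.quarterno, SUM(s.qty) AS total_qty
FROM sales s JOIN time t ON s.timecode = t.timecode
GROUP BY s.winecode, t.year, t.quarterno;
""")

# Later queries read the small aggregate instead of joining the fact table.
rows = conn.execute(
    "SELECT winecode, year, quarterno, total_qty FROM sales_by_wine_quarter "
    "ORDER BY winecode, year, quarterno"
).fetchall()
print(rows)
```

The aggregate has three rows where the fact table has four; the member and day-level detail has been given up in exchange for the smaller size.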
26. Aggregate navigator
An aggregate navigator uses information about
available aggregates to automatically rewrite
queries to use them
It also records aggregate usage, so that unused
aggregates can be removed
It can suggest useful new aggregates
E.g. a frequent query is based on the number of
wines sold per month in a range of price bands.
This is suggested as a new aggregate
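The core idea of an aggregate navigator can be illustrated with a toy table chooser. Everything here is hypothetical: the catalogue of aggregate tables, the column sets, and the function name are invented for the sketch, and a real navigator would also rewrite the SQL text itself.

```python
from collections import Counter

# Hypothetical catalogue: table name -> columns a query may group or filter by
# (after joining the dimension tables). Ordered from most to least summarised;
# the detail fact table comes last so it is used only when nothing else fits.
AGGREGATES = [
    ("sales_by_wine_quarter", {"winecode", "quarterno", "year"}),
    ("sales_by_wine_month", {"winecode", "periodno", "quarterno", "year"}),
    ("sales", {"membercode", "winecode", "timecode", "periodno", "quarterno", "year"}),
]
usage = Counter()

def choose_table(needed_columns):
    """Pick the most summarised table that covers the query and record the hit."""
    for name, columns in AGGREGATES:
        if set(needed_columns) <= columns:
            usage[name] += 1
            return name
    raise LookupError(f"no table covers {sorted(needed_columns)}")

first = choose_table({"winecode", "year"})   # answered from the quarterly aggregate
second = choose_table({"membercode"})        # needs the detail fact table

# Aggregates that were never chosen are candidates for removal.
unused = [name for name, _ in AGGREGATES if usage[name] == 0]
print(first, second, unused)
```

Recording usage per table is what lets the navigator report unused aggregates; suggesting new aggregates would need a log of the queries that fell through to the detail table.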
27. Presentation requirements
Must be easy to use
Visualise the results of queries in many ways
e.g. charts, graphs, scatter diagrams etc
Make good use of colour and dimensions: 2D,
2.5D, 3D, animation
[Figure: example of a 2.5D graph]
Have analysis tools: statistical and curve fitting
For example the product sales trend table
would be plotted as a graph
28. OLAP
OnLine Analytical Processing (OLAP) uses multidimensional
analysis of the data
Allows users to get summaries and find answers
to known questions
What is the average profit month by month?
If we increased sales by 10%, what would the
effect be?
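Multidimensional summarising can be sketched as a roll-up over a small cube. The cell values below are invented, the dimension names follow the earlier three-dimension example, and the 10% what-if is applied naively to the totals purely for illustration.

```python
from collections import defaultdict

# Hypothetical cube cells: (member, product, quarter) -> bottles sold.
cube = {
    ("Smith", "Chablis", "Q3 2007"): 6,
    ("Smith", "Merlot", "Q4 2007"): 12,
    ("Jones", "Chablis", "Q4 2007"): 3,
    ("Bloggs", "Merlot", "Q4 2007"): 6,
}
DIMENSIONS = ("member", "product", "time")

def roll_up(cube, keep):
    """Sum the cube over every dimension not named in `keep`."""
    positions = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, qty in cube.items():
        totals[tuple(key[p] for p in positions)] += qty
    return dict(totals)

by_product = roll_up(cube, ["product"])
by_product_time = roll_up(cube, ["product", "time"])

# "If we increased sales by 10%": scale each per-product total.
what_if = {key: round(total * 1.1, 1) for key, total in by_product.items()}
print(by_product)
print(by_product_time)
print(what_if)
```

Keeping fewer dimensions gives a coarser summary; keeping more of them drills back down towards the detail.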
29. Data mining
Data mining is the extraction of hidden predictive
information from large databases
E.g. what’s likely to happen to sales next March
and why?
The actual techniques for data mining are not
covered in this course.
Data mining is usually based on the data in a data
warehouse, and ideally data mining tools are
integrated with the data warehouse.
Data Mining provides the Enterprise with
intelligence and Data Warehousing provides the
Enterprise with a memory.
30. Summary
A data warehouse is a specialised database to
enable efficient and straightforward production of
reports to support strategic decision making.
It contains a copy of the operational data, often
integrated from >1 source. Records, once added,
are not altered. The central fact table in a star
schema design will be very large.
31. Discussion/Exercise
A company sells garden trees from several stores located
around the country. People visit the store, and buy trees.
The names of the customers are always recorded, and
many customers place repeat orders.
The company would like to set up a data warehouse so
that they can analyse details such as
Frequency of sales per customer
Which store has the best sales, ranked by month
Top selling tree by month
Etc etc
Create a suitable star schema, inventing appropriate
attributes