1. Information & Knowledge Management - Class 3
Data-warehousing
Marielba Zacarias
Assistant Professor, DEEI
FCT I, Office 2.69, Ext. 7749
mzacaria@ualg.pt
http://w3.ualg.pt/~mzacaria
3. Data Warehousing
Data collection for analysis and reporting tasks
Historical data
Stored in an environment separate from operational data
Structured differently from operational databases
4. Why
Operational and analytical data have different requirements in terms of
usage (frequency, response time)
hardware
software
structure
7. The “architected” environment
Four levels of data: operational, atomic (dw), departmental (“data-marts”), individual
Operational: detailed, daily, current value, high access probability, application oriented
Atomic (dw): more granular, temporal, integrated, subject oriented, summarized
Departmental (“data-marts”): derived, some primitive; typical of Marketing, Engineering, Production, Accounting
Individual: temporary, ad-hoc, heuristic, non-repetitive, oriented to PC or workstations
8. Type of questions
Operational: J. Jones, 123 Main St., Credit - AA
Question: What is Jones's credit?
Atomic (dw): J. Jones, 1986-87, 456 High St., Credit - B; 1987-89, 456 High St., Credit - A; 1989 - present, 123 Main St., Credit - AA
Question: What is Jones's credit history?
Departmental: Jan - 4101, Feb - 4209, Mar - 4175, Apr - 4215
Question: What are the monthly sales?
Individual: customers since 1982 with balances > 5,000 and credit >= B
Question: Which client types are in the analysis?
9. Architected Environment
Production Environment
Operational Environment
Analytical Environment
10. Data-warehouse design
Requirement Gathering
Physical Environment Setup
Data Modeling
ETL
OLAP Cube Design
Front End Development
Report Development
Performance Tuning
Query Optimization
Quality Assurance
Rolling out to Production
Production Maintenance
Incremental Enhancements
11. Requirements Gathering
Take into account users
Executives with little time and little knowledge of technical terms
Interviews, JAD sessions
User Reporting/Analysis Requirements
Hardware, training requirements
Data source identification
Concrete project plan
12. Physical Environment Setup
Set up servers, DBMS and databases, ETL, OLAP cubes and reporting services
Create three environments: development, testing, production
13. Data-modeling
Depends on initial data source identification
Conceptual, logical and physical data modeling
Should be related to the information architecture!
14. Data Modeling
Dimensional Approach
Transactional data is partitioned into facts and dimensions
Facts
numeric transaction data
products ordered, price
Dimensions
provide context for facts
order date, customer name, product number, location info, salesperson
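As a minimal sketch of the partitioning described above, the following splits a flat order record into its numeric facts and its contextual dimensions. The record layout, the field names and the sample values are hypothetical, chosen only to mirror the examples on the slide.

```python
# Hypothetical flat order records, as an operational system might export them.
orders = [
    {"order_date": "2024-01-05", "customer_name": "J. Jones",
     "product_number": "P-100", "location": "Faro", "salesperson": "A. Silva",
     "quantity": 3, "price": 10.0},
    {"order_date": "2024-01-06", "customer_name": "M. Smith",
     "product_number": "P-200", "location": "Lisboa", "salesperson": "B. Costa",
     "quantity": 1, "price": 25.0},
]

# Assumed classification: numeric measures are facts, everything else is context.
FACT_FIELDS = ("quantity", "price")
DIMENSION_FIELDS = ("order_date", "customer_name", "product_number",
                    "location", "salesperson")


def split_record(record):
    """Return (facts, dimensions) for one flat transaction record."""
    facts = {name: record[name] for name in FACT_FIELDS}
    dimensions = {name: record[name] for name in DIMENSION_FIELDS}
    return facts, dimensions


first_facts, first_dimensions = split_record(orders[0])
```

In a real warehouse the dimension values would normally be replaced by keys into dimension tables; the sketch only shows which attributes fall on which side.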
15. Dimensional Approaches
Star
Fact table (typically a transaction)
Dimensions (context of the transaction)
Snowflake
Some dimensions are linked to the fact table indirectly, through other dimension tables
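The two layouts can be illustrated with a small in-memory SQLite database. This is only a sketch under assumed names: the tables `fact_sales`, `dim_customer`, `dim_product` and `dim_category` and their rows are hypothetical. The customer dimension joins directly to the fact table (star), while the category is reached through the product dimension (snowflake).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    -- Snowflake: the product dimension points to a further dimension table.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        product_name TEXT,
        category_id INTEGER REFERENCES dim_category(category_id));
    -- Fact table: one row per sale, numeric measures plus dimension keys.
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        quantity INTEGER,
        price REAL);
""")
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                 [(1, "J. Jones"), (2, "M. Smith")])
conn.executemany("INSERT INTO dim_category VALUES (?, ?)",
                 [(1, "Hardware"), (2, "Software")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Keyboard", 1), (2, "Editor", 2)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 2, 30.0), (2, 1, 2, 1, 50.0), (3, 2, 1, 1, 30.0)])

# Star-style query: the dimension joins directly to the fact table.
revenue_by_customer = conn.execute("""
    SELECT c.customer_name, SUM(f.quantity * f.price)
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_id = f.customer_id
    GROUP BY c.customer_name ORDER BY c.customer_name
""").fetchall()

# Snowflake-style query: the category is reached through the product dimension.
revenue_by_category = conn.execute("""
    SELECT g.category_name, SUM(f.quantity * f.price)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_category g ON g.category_id = p.category_id
    GROUP BY g.category_name ORDER BY g.category_name
""").fetchall()
```

The snowflake query needs one more join than the star query, which is the usual trade-off between normalized dimensions and query simplicity.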
21. OLAP Cube Design
Specification of detailed reporting needs in terms of the multi-dimensional structure previously defined (star or snowflake), but regarded as an n-dimensional cube
star/snowflake schemas and cubes represent much the same structure
cubes are more appropriate for non-IT users
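One way to picture the cube view is to pre-aggregate a measure over every combination of dimensions, so that any slice can be read off directly. The sketch below does this in plain Python; the dimension names, the measure and the sample rows are hypothetical, and a real OLAP engine would store and index these aggregates very differently.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows already joined with their dimension values.
rows = [
    {"region": "North", "product": "A", "year": 2023, "sales": 100},
    {"region": "North", "product": "B", "year": 2023, "sales": 50},
    {"region": "South", "product": "A", "year": 2024, "sales": 70},
]
DIMENSIONS = ("region", "product", "year")


def build_cube(rows, dimensions, measure):
    """Sum `measure` for every subset of `dimensions`.

    Keys are tuples of (dimension, value) pairs; the empty tuple holds
    the grand total.
    """
    cube = defaultdict(float)
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for row in rows:
                key = tuple((d, row[d]) for d in dims)
                cube[key] += row[measure]
    return dict(cube)


cube = build_cube(rows, DIMENSIONS, "sales")
total_sales = cube[()]
north_sales = cube[(("region", "North"),)]
north_product_a = cube[(("region", "North"), ("product", "A"))]
```

Each lookup corresponds to a question a non-IT user might ask, such as "sales in the North region" or "sales of product A in the North region", without writing a join.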
29. SQL Server Integration Examples II
Qualitative data
Description term                  ActionId
team meeting                      18
hr distribution                   19
project list                      19
team meeting                      19
hr distribution                   26
project list                      26
claims application                27
claims application                28
cards application maintenance     29
claims application integration    30
hr distribution                   31
project list                      31
claims application                34
claims application                35
hr distribution                   36
project list                      36
31. Front-end development
Front-ends range from
in-house development with scripting languages such as PHP, ASP, or Perl
to off-the-shelf products such as Crystal Reports or higher-end products such as Actuate
OLAP vendors also offer front-ends of their own
32. Report Development
Derived from requirements
Main point of contact between the data-warehouse and users
User customization
Report delivery (web, e-mail, SMS, file formats)
Access privileges
34. Query Optimization
Understand how your DBMS executes queries
Store intermediate results in temporary tables
Query Optimization tips
Use indexes
Partition tables (vertically and horizontally)
De-normalize (fewer joins)
Server Tuning
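The first two points above, understanding how the DBMS executes a query and using indexes, can be tried out with SQLite, which reports its plan through `EXPLAIN QUERY PLAN`. This is a sketch only: the `sales` table, its columns and the index name are hypothetical, and other DBMSs expose their plans through different commands.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for `sql` as one string (last column is the detail)."""
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in plan_rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North" if i % 2 else "South", float(i)) for i in range(1000)],
)

sql = "SELECT SUM(amount) FROM sales WHERE region = ?"

# Without an index the filter requires a full scan of the table.
plan_before = query_plan(conn, sql, ("North",))

# With an index on the filtered column the plan can search the index instead.
conn.execute("CREATE INDEX idx_sales_region ON sales (region)")
plan_after = query_plan(conn, sql, ("North",))

print("before:", plan_before)
print("after: ", plan_after)
```

Comparing the two plans before and after a change is a cheap way to confirm that an index is actually being used rather than assumed to be.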
35. Quality Assurance
Test plan with quality criteria for data
Critical success factor
Often overlooked
Performed by people who know the business data, not necessarily data-warehouses
Resistance
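A test plan with quality criteria can be kept as a list of named rules that are run against loaded data. The sketch below is one possible shape for such checks; the criteria, the field names and the sample rows are hypothetical and would in practice come from the people who know the business data.

```python
def run_quality_checks(rows, criteria):
    """Return, for each named criterion, the indexes of the rows that fail it."""
    report = {}
    for name, is_valid in criteria.items():
        report[name] = [i for i, row in enumerate(rows) if not is_valid(row)]
    return report


# Hypothetical rows as they might arrive in the warehouse after loading.
sample_rows = [
    {"customer_name": "J. Jones", "amount": 120.0},
    {"customer_name": "", "amount": 80.0},
    {"customer_name": "M. Smith", "amount": -5.0},
]

# Assumed quality criteria, each expressed as a predicate on one row.
criteria = {
    "customer name is present": lambda row: bool(row["customer_name"].strip()),
    "amount is not negative": lambda row: row["amount"] >= 0,
}

report = run_quality_checks(sample_rows, criteria)
for name, failing in report.items():
    print(name, "-> failing rows:", failing)
```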
36. Rolling to production
Seems easy, but...
Putting everyone online may take a full
week in some cases
Online access can be as simple as
sending a link by e-mail
37. Production Maintenance
Backup and recovery processes
Crisis Management
Monitoring end-user usage
Capture runaway queries before the whole system slows down
Measure usage for ROI calculations and future enhancements
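Both monitoring goals can be sketched over a simple query log: flagging queries that have run longer than a threshold, and counting usage per user. The log format, the field names and the 600-second threshold are assumptions for illustration; a real DBMS exposes this information through its own monitoring views.

```python
from collections import Counter


def find_runaway_queries(queries, max_seconds):
    """Return the ids of queries that have run longer than `max_seconds`."""
    return [q["query_id"] for q in queries if q["elapsed_seconds"] > max_seconds]


def usage_per_user(queries):
    """Count queries per user, e.g. as input for ROI calculations."""
    return Counter(q["user"] for q in queries)


# Hypothetical monitoring snapshot.
query_log = [
    {"query_id": 1, "user": "marketing", "elapsed_seconds": 4},
    {"query_id": 2, "user": "accounting", "elapsed_seconds": 950},
    {"query_id": 3, "user": "marketing", "elapsed_seconds": 12},
]

runaways = find_runaway_queries(query_log, max_seconds=600)
usage = usage_per_user(query_log)
```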
38. Incremental enhancements
Small changes, such as changing the original geographical designations
e.g. a company may add new sales regions
No matter how simple, never make them directly in the production environment
43. Tools for unstructured information management
Content Management Systems
Record Management Systems
Digital Image Management Systems
Digital Asset Management Systems
Digital Imaging Systems