Data Warehousing
By-
Sushant Hatwar
Vipul Dhavade
Shashank Shukla
Vigneshwaar P.
Ketki Morwal
Data warehouse
• A data warehouse is a system for storing, analyzing,
and reporting on data.
• Central database that includes information from several
different sources.
• Keeps current as well as historical data.
“Data Warehouse is a subject oriented, integrated, time-
variant and non-volatile collection of data in support of
management’s decision making process.”
– W. H. Inmon
[Diagram: the four characteristics of a data warehouse: subject-oriented, integrated, time-variant, non-volatile]
Data Warehouse—Subject-Oriented
• Organized around major subjects, such as customer,
product, sales
• Focuses on the modeling and analysis of data for
decision makers, not on daily operations or
transaction processing
• Provides a simple and concise view around particular
subject issues by excluding data that are not useful in
the decision support process
Data Warehouse—Integrated
• Constructed by integrating multiple, heterogeneous data
sources
– relational databases, flat files, on-line transaction records
• Data cleaning and data integration techniques are applied.
– Ensure consistency in naming conventions, encoding
structures, attribute measures, etc. among different data
sources
• E.g., Hotel price: currency, tax, breakfast covered, etc.
– When data is moved to the warehouse, it is converted
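The hotel-price example can be sketched as below. This is only an illustration under stated assumptions: the record layout, the exchange rates, the tax rate and the breakfast price are made-up values, and `HotelPrice` and `to_warehouse_price` are hypothetical names, not part of any real system.

```python
from dataclasses import dataclass

# Hypothetical fixed exchange rates to USD (illustrative values only).
RATES_TO_USD = {"USD": 1.0, "EUR": 1.10}


@dataclass(frozen=True)
class HotelPrice:
    """One price record as delivered by a source system."""
    amount: float
    currency: str
    tax_included: bool
    breakfast_included: bool


def to_warehouse_price(p: HotelPrice, tax_rate: float = 0.10,
                       breakfast_usd: float = 15.0) -> float:
    """Convert to one assumed convention: USD, tax included, no breakfast."""
    usd = p.amount * RATES_TO_USD[p.currency]
    if not p.tax_included:
        usd *= 1 + tax_rate          # add the assumed tax
    if p.breakfast_included:
        usd -= breakfast_usd         # strip the assumed breakfast charge
    return round(usd, 2)


source_a = HotelPrice(100.0, "EUR", tax_included=False, breakfast_included=True)
source_b = HotelPrice(120.0, "USD", tax_included=True, breakfast_included=False)
converted = [to_warehouse_price(source_a), to_warehouse_price(source_b)]
```

After conversion both prices follow the same convention, so they can be compared directly in the warehouse.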
Data Warehouse—Time Variant
• The time horizon for the data warehouse is significantly longer
than that of operational systems
– Operational database: current value data
– Data warehouse data: provides information from a historical
perspective (e.g., past 5-10 years)
• Every key structure in the data warehouse
– Contains an element of time, explicitly or implicitly
– But the key of operational data may or may not contain “time
element”
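A minimal sketch of the difference in key structure, assuming a hypothetical customer-balance example (the names `operational`, `warehouse` and `record_balance` are illustrative only): the operational store is keyed by customer and holds only the current value, while the warehouse key carries an explicit time element.

```python
from datetime import date

# Operational store: keyed by customer only, holds the current value.
operational = {}

# Warehouse store: the key includes a time element (the snapshot date).
warehouse = {}


def record_balance(customer_id: str, balance: float, as_of: date) -> None:
    operational[customer_id] = balance           # overwrites the old value
    warehouse[(customer_id, as_of)] = balance    # keeps every snapshot


record_balance("c1", 100.0, date(2023, 1, 1))
record_balance("c1", 250.0, date(2024, 1, 1))

# The historical perspective is only available from the warehouse.
history = sorted((d, b) for (cid, d), b in warehouse.items() if cid == "c1")
```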
Data Warehouse—Nonvolatile
• A physically separate store of data transformed from the
operational environment
• Operational update of data does not occur in the data
warehouse environment
– Does not require transaction processing, recovery, and
concurrency control mechanisms
– Requires only two data-access operations: initial loading of
data and access of data
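The two operations can be illustrated with a small append-only store. This is a sketch under assumptions, not a real warehouse interface: the class name `NonvolatileStore` and its methods `load` and `query` are hypothetical.

```python
class NonvolatileStore:
    """Append-only store exposing just two operations: load and access."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Append a batch of rows; existing rows are never modified."""
        batch = [dict(r) for r in rows]   # copy so callers cannot mutate stored rows
        self._rows.extend(batch)
        return len(batch)

    def query(self, predicate=lambda row: True):
        """Return copies of the rows matching the predicate (read-only access)."""
        return [dict(r) for r in self._rows if predicate(r)]


store = NonvolatileStore()
store.load([{"sku": "A", "qty": 2}, {"sku": "B", "qty": 5}])
big_orders = store.query(lambda r: r["qty"] > 3)
```

Because there is no update or delete method, the store needs no concurrency control for in-place changes, which mirrors the point made in the bullets above.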
Data Warehouse Architecture
[Diagram: Data Sources (System A, System B, System C, System D) → ETL (Extract, Transform, Load) → Data Store (The Data Warehouse) → Data Access (Business Model, Self Serve) → Presentation (Prompted Views, Dashboards, Scorecards, Ad-Hoc Reporting)]
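The extract, transform and load stages named in the architecture can be sketched with Python's standard `sqlite3` module. Everything specific here is an assumption for illustration: the two source systems, their field names, and the `sales` table are hypothetical.

```python
import sqlite3

# Hypothetical extracts from two source systems with different conventions.
SYSTEM_A = [{"cust": "Ann", "amount_cents": 1250}]
SYSTEM_B = [{"customer_name": "BOB", "amount": "7.50"}]


def extract():
    """Yield (source name, raw record) pairs from each source system."""
    for rec in SYSTEM_A:
        yield "A", rec
    for rec in SYSTEM_B:
        yield "B", rec


def transform(source, rec):
    """Map a raw record onto the warehouse's (customer, amount) convention."""
    if source == "A":
        return rec["cust"].lower(), rec["amount_cents"] / 100
    return rec["customer_name"].lower(), float(rec["amount"])


def load(conn, rows):
    """Insert the transformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, [transform(src, rec) for src, rec in extract()])

# Data access: a simple ad-hoc report over the loaded data.
report = conn.execute(
    "SELECT customer, amount FROM sales ORDER BY customer").fetchall()
```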
Applications
Industry            Application
Finance             Credit card analysis
Insurance           Claims, fraud analysis
Telecommunication   Call record analysis
Transport           Logistics management
Consumer goods      Promotion analysis
Advantages
• Enhances end-user access to a wide variety of data.
• Increases data consistency.
• Increases productivity and decreases computing costs.
• Combines data from different sources in one place.
• Provides an infrastructure that could support changes to data
and replication of the changed data back into the operational
systems.
Disadvantages
• Extracting, cleaning and loading data can be
time-consuming.
• Compatibility problems with systems already in place,
e.g., transaction processing systems.
• End-users must be trained, and some may still end up not
using the data warehouse.
• Security could develop into a serious issue, especially if the
data warehouse is web accessible.
Thank You
