Data Warehouse: Basics

  • 1. Data Warehouse: An Introduction. Lecture 2. Dept of MCA, NIT, Durgapur. September 6, 2012
  • 2. Data, data everywhere, yet... I can't find the data I need: data is scattered over the network, in many versions with subtle differences. I can't get the data I need: an expert is needed to get it. I can't understand the data I found: the available data is poorly documented. I can't use the data I found: the results are unexpected, and the data needs to be transformed from one form to another.
  • 3. What Do We Need? A single, complete and consistent store of data, obtained from a variety of different sources and made available to end users in a way they can understand and use, in a business context/subject [Barry Devlin]. This leads toward business analysis.
  • 4. Subject Orientation: Organized around major subjects, such as customer, product and sales. Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing. Provides a simple and concise view of particular subject issues by excluding data that are not useful in the decision support process.
  • 5. What Are the Analytical Needs? Which are our lowest/highest margin customers? Who are my customers and what products are they buying? What is the most effective distribution channel? What product promotions have the biggest impact on revenue? Which customers are most likely to go to the competition? What impact will new products/services have on revenue and margins?
  • 6. Decision Support System: Used to manage and control the business. Data is historical or point-in-time. Optimized for inquiry rather than update. Use of the system is loosely defined and can be ad hoc. Used by managers and end users to understand the business and make judgements.
  • 7. Evolution of Decision Support: 1960s: batch reports; information was hard to find and analyze, and reports were inflexible and expensive, since every new request had to be reprogrammed. 1970s: terminal-based DSS and EIS (executive information systems). 1980s: desktop data access and analysis tools (query tools, spreadsheets, GUIs); easy to use, but with access only to operational databases. 1990s: data warehousing with integrated OLAP engines and tools, to meet the analytical needs of the business.
  • 8. What Are the Users Saying? Data should be integrated across the enterprise. Summary data has real value to the organization. Historical data holds the key to understanding data over time. What-if capabilities are required.
  • 9. Do We Need a Separate Process? Data warehousing is a technique for assembling and managing data from various sources for the purpose of answering business questions, thus enabling decisions that were not previously possible. The result is a decision support database maintained separately from the organization's operational database.
  • 10. Traditional RDBMS Used for OLTP: Database systems have traditionally been used for OLTP: clerical data processing tasks; detailed, up-to-date data; structured, repetitive tasks; reading/updating a few records at a time; isolation, recovery and integrity are critical; normalization is mandatory. We will call these operational databases.
  • 11. Decision Support Database: Defined in many different ways, but not rigorously. A decision support database is one that is maintained separately from the organization's operational database. It supports information processing by providing a solid platform of consolidated, historical data for analysis.
  • 12. Some Common Terms: Operational databases: Operational databases are detail-oriented databases defined to meet the needs of sometimes very complex processes in a company. This detailed view is reflected in the arrangement of data in the database. The data is highly normalized to avoid data redundancy and complex maintenance. OLTP: On-Line Transaction Processing (OLTP) describes the way data is processed by an end user or a computer system. It is detail-oriented and highly repetitive, with massive numbers of updates and changes to the data by end users. It is also very often described as the use of computers to run the ongoing operation of a business.
  • 13. Some Common Terms (cont.): Data warehouse: A data warehouse collects, organizes, and makes data available for the purpose of analysis, giving management the ability to access and analyze information about its business. This type of data can be called "informational data". The systems used to work with informational data are referred to as OLAP (On-Line Analytical Processing). We will call it an informational database.
  • 14. Some Common Terms (cont.): Operational versus informational databases. The major difference between operational and informational databases is the update frequency: 1. On operational databases, a high number of transactions take place every hour. The database is always "up to date" and represents a snapshot of the current business situation, commonly referred to as a point in time. 2. Informational databases are usually stable over a period of time, representing the situation at a specific point in time in the past; this is referred to as historical data.
  • 15. Some Common Terms (cont.): OLAP: On-Line Analytical Processing (OLAP) is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user. OLAP is implemented in a multi-user client/server mode and offers consistently rapid response to queries, regardless of database size and complexity. OLAP helps the user synthesize enterprise information through comparative, personalized viewing, as well as through analysis of historical and projected data in various "what-if" data model scenarios. This is achieved through the use of an OLAP server.
  • 16. OLTP vs. Data Warehouse
    OLTP                                      Warehouse (OLAP)
    Application oriented                      Subject oriented
    Used to run the business                  Used to analyze the business
    Clerical user                             Manager/analyst
    Detailed data                             Summarized and refined data
    Current, up-to-date data                  Snapshot data
    Isolated data                             Integrated data
    Repetitive access by small transactions   Ad hoc access using large queries
    Read/update access                        Mostly read access (batch update)
  • 17. Some Common Terms (cont.): Metadata, a definition
    Metadata is the kind of information that describes the data stored in a database and includes such information as:
    • A description of the tables and fields in the data warehouse, including data types and the range of acceptable values.
    • A similar description of the tables and fields in the source databases, with a mapping of fields from the source to the warehouse.
    • A description of how the data has been transformed, including formulae, formatting, currency conversion, and time aggregation.
    • Any other information that is needed to support and manage the operation of the data warehouse.
  • 18. Some Common Terms (cont.): Data mart: A data mart contains a subset of corporate data that is of value to a specific business unit, department, or set of users. This subset consists of historical, summarized, and possibly detailed data captured from transaction processing systems or from an enterprise data warehouse. It is important to realize that a data mart is defined by the functional scope of its users, not by the size of the data mart database. Most data marts today involve less than 100 GB of data; some are larger, and data marts are expected to grow rapidly in size as their usage increases. Data mining: Data mining is the process of extracting valid, useful, previously unknown, and comprehensible information from data and using it to make business decisions.
  • 19. Problem in General-Purpose SQL
    Let the database schemas be as follows:
    1. Product (P_ID, P_NAME, P_DESC);
    2. Sales (R_NO, P_ID, Q_ID, AMOUNT);
    3. Time (Q_ID, Q_DESC);
    Say the organization needs to generate the following report:
    Product   4Q96 Sales   4Q97 Sales
    XYZ       57           66
    ABC       29           24
    PQR       115          89
  • 20. Problem in SQL (cont.)
    The SQL needed to display the fourth-quarter 1996 sales may be as follows:
    SELECT Product.P_NAME, SUM(Sales.AMOUNT) FROM Sales, Product, Time WHERE . . . Time.Q_ID = '4Q96' AND Product.P_NAME IN ('XYZ', 'ABC', 'PQR') GROUP BY Product.P_NAME
    If one expands the time constraint to include both quarters, as follows:
    WHERE . . . Time.Q_ID IN ('4Q96', '4Q97')
    then the SUM expression adds up the sales from both quarters, which we do not want. Plain SQL offers no direct way to place the two quarters side by side in separate columns. Hence a general-purpose SQL engine falls short for queries like the one above.
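To see the problem from slides 19 and 20 concretely, the sketch below builds the three tables in an in-memory SQLite database using Python's built-in sqlite3 module. The join conditions elided on the slide are filled in here as an assumption (`Sales.P_ID = Product.P_ID` and `Sales.Q_ID = Time.Q_ID`), and each product gets one illustrative sales row per quarter carrying the report's figures. The second query, conditional aggregation with `CASE`, is not from the slides; it is one hand-written workaround, shown to illustrate how much pivoting logic each such report would otherwise need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (P_ID INTEGER PRIMARY KEY, P_NAME TEXT, P_DESC TEXT);
    CREATE TABLE Time    (Q_ID TEXT PRIMARY KEY, Q_DESC TEXT);
    CREATE TABLE Sales   (R_NO INTEGER PRIMARY KEY, P_ID INTEGER, Q_ID TEXT, AMOUNT INTEGER);
""")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)",
                 [(1, "XYZ", ""), (2, "ABC", ""), (3, "PQR", "")])
conn.executemany("INSERT INTO Time VALUES (?, ?)",
                 [("4Q96", "Fourth quarter 1996"), ("4Q97", "Fourth quarter 1997")])
# Illustrative rows only: one sale per product and quarter, using the report's figures.
conn.executemany("INSERT INTO Sales (P_ID, Q_ID, AMOUNT) VALUES (?, ?, ?)",
                 [(1, "4Q96", 57), (1, "4Q97", 66), (2, "4Q96", 29),
                  (2, "4Q97", 24), (3, "4Q96", 115), (3, "4Q97", 89)])

# Assumed join conditions (elided as ". . ." on the slide).
JOINS = "Sales.P_ID = Product.P_ID AND Sales.Q_ID = Time.Q_ID"

# Naive query: widening the quarter constraint merges both quarters into one sum.
naive = conn.execute(f"""
    SELECT Product.P_NAME, SUM(Sales.AMOUNT)
    FROM Sales, Product, Time
    WHERE {JOINS} AND Time.Q_ID IN ('4Q96', '4Q97')
      AND Product.P_NAME IN ('XYZ', 'ABC', 'PQR')
    GROUP BY Product.P_NAME
""").fetchall()

# Hand-written workaround (not from the slides): one CASE per report column.
side_by_side = conn.execute(f"""
    SELECT Product.P_NAME,
           SUM(CASE WHEN Time.Q_ID = '4Q96' THEN Sales.AMOUNT ELSE 0 END) AS sales_4q96,
           SUM(CASE WHEN Time.Q_ID = '4Q97' THEN Sales.AMOUNT ELSE 0 END) AS sales_4q97
    FROM Sales, Product, Time
    WHERE {JOINS} AND Time.Q_ID IN ('4Q96', '4Q97')
      AND Product.P_NAME IN ('XYZ', 'ABC', 'PQR')
    GROUP BY Product.P_NAME
""").fetchall()

print(sorted(naive))         # [('ABC', 53), ('PQR', 204), ('XYZ', 123)]
print(sorted(side_by_side))  # [('ABC', 29, 24), ('PQR', 115, 89), ('XYZ', 57, 66)]
```

The naive result is exactly the unwanted sum the slide describes (for example 57 + 66 = 123 for XYZ). The workaround reproduces the report, but every new comparison column means editing the query by hand.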
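Slide 10 characterizes OLTP work as short, structured transactions that read/update a few records, where recovery and integrity are critical. The sketch below shows that profile with Python's built-in sqlite3 module; the `account` table, its balances and the `transfer` function are hypothetical illustrations, not part of the lecture.

```python
import sqlite3

# Hypothetical operational table: one current row per account.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE account (
                    acc_id  INTEGER PRIMARY KEY,
                    balance INTEGER NOT NULL CHECK (balance >= 0))""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500), (2, 100)])
conn.commit()

def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """A typical OLTP unit of work: update two records as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance - ? WHERE acc_id = ?",
                         (amount, src))
            conn.execute("UPDATE account SET balance = balance + ? WHERE acc_id = ?",
                         (amount, dst))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the whole transfer was undone.
        return False

ok = transfer(conn, 1, 2, 200)           # succeeds
overdraft = transfer(conn, 2, 1, 1000)   # would make account 2 negative, rolled back
print(ok, overdraft)  # True False
print(conn.execute("SELECT acc_id, balance FROM account ORDER BY acc_id").fetchall())
# [(1, 300), (2, 300)]
```

Each call touches only a couple of rows and leaves the table fully up to date, which is the opposite of the large, read-mostly queries a warehouse serves.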
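Slide 14 contrasts an operational database, which always holds the current situation, with an informational one that keeps stable states from past points in time. A minimal sketch in plain Python, assuming a hypothetical stock table and snapshot dates: the operational dictionary is overwritten in place, while the informational list only ever appends dated copies.

```python
from datetime import date

# Operational view: hypothetical stock levels, overwritten in place.
operational_stock = {"XYZ": 40, "ABC": 15}

# Informational view: dated snapshots, appended and never overwritten.
snapshots: list[tuple[date, str, int]] = []

def take_snapshot(as_of: date) -> None:
    """Copy the current operational state into the history, tagged with its date."""
    for product, qty in operational_stock.items():
        snapshots.append((as_of, product, qty))

def stock_as_of(as_of: date) -> dict[str, int]:
    """Read back the state recorded at one past point in time."""
    return {p: q for d, p, q in snapshots if d == as_of}

take_snapshot(date(2012, 8, 31))
operational_stock["XYZ"] = 25      # later transactions update the live rows
operational_stock["ABC"] += 5
take_snapshot(date(2012, 9, 30))

print(operational_stock)                # {'XYZ': 25, 'ABC': 20}
print(stock_as_of(date(2012, 8, 31)))   # {'XYZ': 40, 'ABC': 15}
```

After the updates, the operational store can no longer answer "what was the stock on 31 August?", while the snapshot history still can.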
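The "wide variety of possible views" in slide 15 can be pictured as aggregating the same facts along whichever dimensions the user picks. The sketch below is a toy in-memory roll-up in plain Python, not how an OLAP server is actually built; the fact rows, the `region` dimension and the `view` function are hypothetical (the product/quarter totals are chosen to match the report on slide 19).

```python
from collections import defaultdict

# Hypothetical fact rows: (product, quarter, region, amount).
FACTS = [
    ("XYZ", "4Q96", "East", 30), ("XYZ", "4Q96", "West", 27),
    ("XYZ", "4Q97", "East", 40), ("XYZ", "4Q97", "West", 26),
    ("ABC", "4Q96", "East", 29), ("ABC", "4Q97", "West", 24),
]
DIMS = ("product", "quarter", "region")

def view(facts, *dims):
    """Sum the amount over every combination of the chosen dimensions (a roll-up)."""
    idx = [DIMS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[3]
    return dict(totals)

print(view(FACTS, "product"))            # {('XYZ',): 123, ('ABC',): 53}
print(view(FACTS, "product", "quarter"))
print(view(FACTS))                       # {(): 176}, the grand total
```

Choosing fewer dimensions rolls the data up; choosing more drills down. An OLAP server precomputes or indexes such aggregates so these switches stay fast regardless of data size.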
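Slide 17 lists what metadata records: warehouse tables and fields with types and acceptable ranges, the source fields they map from, and the transformation applied. One way to hold that is a small catalog of records; the `ColumnMetadata` class, the table and column names, and the transformation texts below are all hypothetical, chosen only to mirror the slide's list.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass(frozen=True)
class ColumnMetadata:
    """One hypothetical metadata entry describing a warehouse column."""
    warehouse_table: str
    warehouse_column: str
    data_type: str
    valid_range: tuple[float, float] | None  # acceptable values, if bounded
    source_table: str                         # where the data comes from
    source_column: str
    transformation: str                       # formula, conversion or aggregation

    def accepts(self, value: float) -> bool:
        """Check a value against the documented acceptable range."""
        if self.valid_range is None:
            return True
        low, high = self.valid_range
        return low <= value <= high

CATALOG = [
    ColumnMetadata("SALES_FACT", "AMOUNT_USD", "DECIMAL(12,2)", (0, 1e9),
                   "ORDERS", "ORDER_VALUE", "ORDER_VALUE * exchange rate to USD"),
    ColumnMetadata("SALES_FACT", "QUARTER", "CHAR(4)", None,
                   "ORDERS", "ORDER_DATE", "ORDER_DATE aggregated to quarter, e.g. '4Q96'"),
]

def lineage(table: str, column: str) -> str:
    """Describe where a warehouse column comes from and how it was transformed."""
    for m in CATALOG:
        if (m.warehouse_table, m.warehouse_column) == (table, column):
            return (f"{m.source_table}.{m.source_column} -> {table}.{column} "
                    f"via {m.transformation}")
    raise KeyError(f"no metadata for {table}.{column}")

print(lineage("SALES_FACT", "AMOUNT_USD"))
```

Keeping source mapping and transformation next to the type and range lets load jobs validate incoming values and lets analysts trace a figure back to its origin.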
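Slide 18 defines a data mart by the functional scope of its users, not by its size. The sketch below uses Python's sqlite3 module to carve a subset out of a hypothetical enterprise `sales_fact` table for a hypothetical East-region sales team, summarized to the grain that team reports on. Materializing the mart as a separate table is an assumption; a view or a separate database would serve the same purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical enterprise warehouse table covering every region and product.
conn.execute("CREATE TABLE sales_fact (region TEXT, product TEXT, quarter TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
    ("East", "XYZ", "4Q96", 30), ("West", "XYZ", "4Q96", 27),
    ("East", "ABC", "4Q96", 29), ("West", "ABC", "4Q97", 24),
])

# The mart keeps only what its users need: East rows, summed per product and quarter.
conn.execute("""
    CREATE TABLE east_sales_mart AS
    SELECT product, quarter, SUM(amount) AS amount
    FROM sales_fact
    WHERE region = 'East'
    GROUP BY product, quarter
""")
print(conn.execute("SELECT * FROM east_sales_mart ORDER BY product").fetchall())
# [('ABC', '4Q96', 29), ('XYZ', '4Q96', 30)]
```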