Data Warehouse
                               An Introduction

                                   Lecture - 2


Dept of MCA, NIT, Durgapur.           September 6, 2012   1
Data, Data everywhere yet ...
     I can't find the data I need
          data is scattered over the network
          many versions, subtle differences

     I can't get the data I need
          need an expert to get the data

     I can't understand the data I found
          available data is poorly documented

     I can't use the data I found
          results are unexpected
          data needs to be transformed from one form to another




What Do We Need?
     A single, complete and consistent store of data obtained from a variety
     of different sources made available to end users in a way they can
     understand and use, in a business context / subject.

                                   [Barry Devlin]



 Leads towards Business Analysis




Subject Orientation
      • Organized around major subjects, such as customer, product, sales.

      • Focusing on the modeling and analysis of data for decision makers,
        not on daily operations or transaction processing.

      • Provide a simple and concise view around particular subject issues,
        by excluding data that are not useful in the decision support process.

What Are Analytical Needs?
      Which are our lowest/highest margin customers?
      Who are my customers and what products are they buying?
      What is the most effective distribution channel?
      What product promotions have the biggest impact on revenue?
      Which customers are most likely to go to the competition?
      What impact will new products/services have on revenue and margins?
Decision Support System
                  Used to manage and control business
                  Data is historical or point-in-time
                  Optimized for inquiry rather than update
                  Use of the system is loosely defined and can
                  be ad-hoc
                  Used by managers and end-users to
                  understand the business and make
                  judgements




Evolution of Decision Support
          60's: Batch reports
                hard to find and analyze information
                inflexible and expensive, reprogram every request

          70's: Terminal-based DSS and EIS

          80's: Desktop data access and analysis tools
                query tools, spreadsheets, GUIs
                easy to use, but access only operational db

          90's: Data warehousing with integrated OLAP engines and tools
                to meet the analytical needs of the business

What are the users saying...

           Data should be integrated across the enterprise
           Summary data has real value to the organization
           Historical data holds the key to understanding data over time
           What-if capabilities are required




Need Separate Process?

     Technique for assembling and managing data from various sources
     for the purpose of answering business questions, thus enabling
     decisions that were not previously possible.

     A decision support database maintained separately from the
     organization's operational database




Traditional RDBMS used for OLTP
                  Database Systems have been used traditionally
                  for OLTP
                        clerical data processing tasks
                        detailed, up to date data
                        structured repetitive tasks
                        read/update a few records
                        isolation, recovery and integrity are critical
                        Normalization is mandatory



                  We will call these operational databases.
Decision Support Database
      • Defined in many different ways, but not rigorously.
           - A decision support database that is maintained separately
             from the organization's operational database
           - Support information processing by providing a solid platform
             of consolidated, historical data for analysis.

Some Common Terms
     Operational databases: Operational databases are detail oriented
     databases defined to meet the needs of sometimes very complex
     processes in a company. This detailed view is reflected in the data
     arrangement in the database. The data is highly normalized to avoid data
     redundancy and "complex-maintenance".


     OLTP: On-Line Transaction Processing (OLTP) describes the way data
     is processed by an end user or a computer system. It is detail oriented,
     highly repetitive with massive amounts of updates and changes of the
     data by the end user. It is also very often described as the use of
     computers to run the on-going operation of a business.

Some Common Terms (Cont.)

          Data warehouse: A data warehouse collects, organizes, and makes
          data available for the purpose of analysis, giving management the
          ability to access and analyze information about its business. This type
          of data can be called "informational data". The systems used to work
          with informational data are referred to as OLAP (On-Line Analytical
          Processing).


          We will call it an informational database.




Some Common Terms (Cont.)

          Operational versus informational databases
          The major difference between operational and informational databases is the
          update frequency:
          1. On operational databases a high number of transactions take place every
          hour. The database is always "up to date" and represents a snapshot of
          the current business situation, commonly referred to as point-in-time data.

          2. Informational databases are usually stable over a period of time and
          represent the situation at a specific point in time in the past; this is
          referred to as historical data.
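
The difference in update behaviour can be sketched with a small, hypothetical example. The table names (`account`, `account_snapshot`), the dates and the balances below are assumptions made only for illustration and do not come from the lecture. The operational table is overwritten in place, while the informational table receives a dated copy at each load and is never updated afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Operational table: one row per account, overwritten by every transaction.
cur.execute("CREATE TABLE account (acc_id INTEGER PRIMARY KEY, balance INTEGER)")
# Informational table: one row per account per snapshot date, never updated.
cur.execute(
    "CREATE TABLE account_snapshot (snap_date TEXT, acc_id INTEGER, balance INTEGER)"
)

cur.execute("INSERT INTO account VALUES (1, 100)")


def take_snapshot(snap_date):
    """Copy the current operational state into the informational table."""
    cur.execute(
        "INSERT INTO account_snapshot SELECT ?, acc_id, balance FROM account",
        (snap_date,),
    )


take_snapshot("2012-08-31")

# A later transaction changes the operational row in place.
cur.execute("UPDATE account SET balance = balance - 30 WHERE acc_id = 1")
take_snapshot("2012-09-30")

current = cur.execute("SELECT balance FROM account WHERE acc_id = 1").fetchone()[0]
history = cur.execute(
    "SELECT snap_date, balance FROM account_snapshot ORDER BY snap_date"
).fetchall()

print(current)  # only the latest value survives in the operational table
print(history)  # every loaded state is kept in the informational table
```

The operational table can only answer "what is the balance now", whereas the snapshot table can also answer "what was it at the end of August".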
Some Common Terms (Cont.)

          OLAP: On-Line Analytical Processing (OLAP) is a category of software
          technology that enables analysts, managers and executives to gain insight into
          data through fast, consistent, interactive access to a wide variety of possible
          views of information that has been transformed from raw data to reflect the real
          dimensionality of the enterprise as understood by the user.

          OLAP is implemented in a multi-user client/server mode and offers
          consistently rapid response to queries, regardless of database size and
          complexity. OLAP helps the user synthesize enterprise information through
          comparative, personalized viewing, as well as through analysis of historical
          and projected data in various "what-if" data model scenarios. This is achieved
          through use of an OLAP Server.



OLTP vs. Data Warehouse
     OLTP                                       Warehouse (OLAP)
     Application oriented                       Subject oriented
     Used to run business                       Used to analyze business
     Clerical user                              Manager/analyst
     Detailed data                              Summarized and refined
     Current, up to date                        Snapshot data
     Isolated data                              Integrated data
     Repetitive access by small transactions    Ad-hoc access using large queries
     Read/update access                         Mostly read access (batch update)
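
The last two rows of the comparison can be illustrated with a minimal sketch. The `orders` table and its values are hypothetical, and both statements run against one in-memory table purely to keep the example short; in practice the analytical query would run against the separately maintained warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount INTEGER)"
)
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "East", 40), (2, "West", 25), (3, "East", 10), (4, "West", 5)],
)

# OLTP style: a short transaction that reads and updates a single record.
with conn:
    cur.execute("UPDATE orders SET amount = 45 WHERE order_id = 1")

# Warehouse (OLAP) style: a read-only query that scans and summarizes many rows.
summary = cur.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(summary)
```

The update touches one row identified by its key, while the summary has to read every row, which is one reason the two workloads are usually kept apart.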

Some Common Terms (Cont.)

          Metadata: a definition

          Metadata is the kind of information that describes the data stored in a
          database and includes such information as:

          • A description of tables and fields in the data warehouse, including data
            types and the range of acceptable values.

          • A similar description of tables and fields in the source databases, with a
            mapping of fields from the source to the warehouse.

          • A description of how the data has been transformed, including formulae,
            formatting, currency conversion, and time aggregation.

          • Any other information that is needed to support and manage the operation
            of the data warehouse.
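
As a rough illustration of the items above, one metadata entry could be modelled as a small record. The class `FieldMetadata`, the helper `is_acceptable` and all table and field names below are hypothetical; real warehouse tools keep such information in their own repository formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMetadata:
    """Describes one warehouse field and where its values come from."""

    warehouse_table: str
    warehouse_field: str
    data_type: str
    valid_range: tuple  # (minimum, maximum) of acceptable values
    source_table: str
    source_field: str
    transformation: str  # plain-language description of the conversion


# Hypothetical entry: a monthly sales figure derived from daily source rows.
monthly_sales = FieldMetadata(
    warehouse_table="sales_monthly",
    warehouse_field="amount",
    data_type="INTEGER",
    valid_range=(0, 1_000_000),
    source_table="daily_sales",
    source_field="sale_value",
    transformation="convert to a common currency, then sum daily rows per month",
)


def is_acceptable(meta, value):
    """Check a value against the range recorded in the metadata."""
    low, high = meta.valid_range
    return low <= value <= high


print(monthly_sales.source_table, "->", monthly_sales.warehouse_table)
print(is_acceptable(monthly_sales, 500))
```

The record covers the first three bullet points: the warehouse description, the source-to-warehouse mapping, and the transformation applied on the way in.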


Some Common Terms (Cont.)

     Data mart: A data mart contains a subset of corporate data that is of
     value to a specific business unit, department, or set of users. This subset
     consists of historical, summarized, and possibly detailed data captured
     from transaction processing systems, or from an enterprise data
     warehouse. It is important to realize that a data mart is defined by the
     functional scope of its users, and not by the size of the data mart
     database. Most data marts today involve less than 100 GB of data; some
     are larger, and data marts are expected to grow rapidly in size as their
     usage increases.

     Data mining: Data mining is the process of extracting valid, useful,
     previously unknown, and comprehensible information from data and using
     it to make business decisions.
Problem in General Purpose SQL
            Consider the following set of database schemas:
            1. Product (P_ID, P_NAME, P_DESC);
            2. Sales (R_NO, P_ID, Q_ID, AMOUNT);
            3. Time (Q_ID, Q_DESC);


            Say the organization needs to generate a report as follows:

              Product         4Q96 Sales        4Q97 Sales
              XYZ                  57                66
              ABC                  29                24
              PQR                 115                89


Problem in SQL (Cont.)


       The SQL needed to display the fourth quarter 1996 sales may be as
       follows:


       SELECT Product.P_NAME, SUM(Sales.AMOUNT)
       FROM Sales, Product, Time
       WHERE . . . Time.Q_ID = '4Q96'
       AND Product.P_NAME IN ('XYZ', 'ABC', 'PQR')
       GROUP BY Product.P_NAME

       If one expands the Time constraint to include both quarters, as follows:

       WHERE . . . Time.Q_ID IN ('4Q96', '4Q97')

       then the SUM expression adds up the sales from both quarters, which
       we do not want: the report needs one column per quarter, and this
       form of query offers no way to keep the two apart.

          Hence a general-purpose SQL engine handles queries like the one
          above poorly.
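
The effect can be reproduced with a short sketch using Python's built-in `sqlite3` module. Several things here are assumptions rather than part of the lecture: the join conditions standing in for the elided part of the WHERE clause, the one-row-per-product-per-quarter sample data (taken from the figures in the report), the column types, and the helper name `sales_by_product`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Schemas from the lecture; column types are assumed. "Time" is quoted
# because some SQL engines treat it as a keyword.
cur.execute("CREATE TABLE Product (P_ID INTEGER, P_NAME TEXT, P_DESC TEXT)")
cur.execute("CREATE TABLE Sales (R_NO INTEGER, P_ID INTEGER, Q_ID TEXT, AMOUNT INTEGER)")
cur.execute('CREATE TABLE "Time" (Q_ID TEXT, Q_DESC TEXT)')

cur.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [(1, "XYZ", ""), (2, "ABC", ""), (3, "PQR", "")],
)
cur.executemany(
    'INSERT INTO "Time" VALUES (?, ?)',
    [("4Q96", "Fourth quarter 1996"), ("4Q97", "Fourth quarter 1997")],
)
cur.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [
        (1, 1, "4Q96", 57), (2, 1, "4Q97", 66),
        (3, 2, "4Q96", 29), (4, 2, "4Q97", 24),
        (5, 3, "4Q96", 115), (6, 3, "4Q97", 89),
    ],
)


def sales_by_product(quarters):
    """Run the lecture's query for the given list of quarter ids."""
    placeholders = ", ".join("?" for _ in quarters)
    sql = f"""
        SELECT Product.P_NAME, SUM(Sales.AMOUNT)
        FROM Sales, Product, "Time"
        WHERE Sales.P_ID = Product.P_ID      -- assumed join condition
          AND Sales.Q_ID = "Time".Q_ID       -- assumed join condition
          AND "Time".Q_ID IN ({placeholders})
          AND Product.P_NAME IN ('XYZ', 'ABC', 'PQR')
        GROUP BY Product.P_NAME
        ORDER BY Product.P_NAME
    """
    return cur.execute(sql, list(quarters)).fetchall()


one_quarter = sales_by_product(["4Q96"])
both_quarters = sales_by_product(["4Q96", "4Q97"])

print(one_quarter)    # one figure per product for 4Q96
print(both_quarters)  # still one figure per product: the two quarters merged
```

With both quarters in the filter, each product still gets a single number, the combined total, rather than the two side-by-side columns the report asks for.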

Dept of MCA, NIT, Durgapur.                September 6, 2012                     20

More Related Content

What's hot

Putting customer insight into practice, Peter Gadsdon, Lewisham Council
Putting customer insight into practice, Peter Gadsdon, Lewisham CouncilPutting customer insight into practice, Peter Gadsdon, Lewisham Council
Putting customer insight into practice, Peter Gadsdon, Lewisham Council
localinsight
Ā 
NINtec corporate presentation
NINtec corporate presentationNINtec corporate presentation
NINtec corporate presentation
NINtec
Ā 
Advocate Consulting - Enterprise Communications
Advocate Consulting - Enterprise CommunicationsAdvocate Consulting - Enterprise Communications
Advocate Consulting - Enterprise Communications
Advocate Consulting
Ā 
QServ Corporation Sap BI Brochure
QServ Corporation Sap BI BrochureQServ Corporation Sap BI Brochure
QServ Corporation Sap BI Brochure
Manisha Sangwan
Ā 
2ST.net Corporate Overview 2012
2ST.net Corporate Overview 20122ST.net Corporate Overview 2012
2ST.net Corporate Overview 2012
chohl
Ā 
Cost Reduction Guide Issue 6 IT
Cost Reduction Guide Issue 6 ITCost Reduction Guide Issue 6 IT
Cost Reduction Guide Issue 6 IT
ymw15
Ā 

What's hot (19)

Customer Contact Solutions
Customer Contact SolutionsCustomer Contact Solutions
Customer Contact Solutions
Ā 
Automated loan processing
Automated loan processingAutomated loan processing
Automated loan processing
Ā 
Putting customer insight into practice, Peter Gadsdon, Lewisham Council
Putting customer insight into practice, Peter Gadsdon, Lewisham CouncilPutting customer insight into practice, Peter Gadsdon, Lewisham Council
Putting customer insight into practice, Peter Gadsdon, Lewisham Council
Ā 
GE Healthcare - HP Case Study
GE Healthcare - HP Case StudyGE Healthcare - HP Case Study
GE Healthcare - HP Case Study
Ā 
NINtec corporate presentation
NINtec corporate presentationNINtec corporate presentation
NINtec corporate presentation
Ā 
Advocate Consulting - Enterprise Communications
Advocate Consulting - Enterprise CommunicationsAdvocate Consulting - Enterprise Communications
Advocate Consulting - Enterprise Communications
Ā 
QServ Corporation Sap BI Brochure
QServ Corporation Sap BI BrochureQServ Corporation Sap BI Brochure
QServ Corporation Sap BI Brochure
Ā 
ā€œA Practitionerā€™s Viewā€ on the latest trends and information on BI/ DW techno...
ā€œA Practitionerā€™s Viewā€ on the latest trends and information on BI/ DW techno...ā€œA Practitionerā€™s Viewā€ on the latest trends and information on BI/ DW techno...
ā€œA Practitionerā€™s Viewā€ on the latest trends and information on BI/ DW techno...
Ā 
QServ Retail Analytics Offering
QServ Retail Analytics OfferingQServ Retail Analytics Offering
QServ Retail Analytics Offering
Ā 
QServ Retail Analytics Offering
QServ Retail Analytics OfferingQServ Retail Analytics Offering
QServ Retail Analytics Offering
Ā 
IBM Business Analytics and Optimization - Introduktion till Prediktiv Analys
IBM Business Analytics and Optimization - Introduktion till Prediktiv AnalysIBM Business Analytics and Optimization - Introduktion till Prediktiv Analys
IBM Business Analytics and Optimization - Introduktion till Prediktiv Analys
Ā 
Iscram09 Grant Ppr248 Mixed Rational Naturalistic Ds Final Slides 090506
Iscram09 Grant Ppr248 Mixed Rational Naturalistic Ds Final Slides 090506Iscram09 Grant Ppr248 Mixed Rational Naturalistic Ds Final Slides 090506
Iscram09 Grant Ppr248 Mixed Rational Naturalistic Ds Final Slides 090506
Ā 
2ST.net Corporate Overview 2012
2ST.net Corporate Overview 20122ST.net Corporate Overview 2012
2ST.net Corporate Overview 2012
Ā 
Probabilistic Soft Logic
Probabilistic Soft LogicProbabilistic Soft Logic
Probabilistic Soft Logic
Ā 
IBM Information Management - Optimera er verksamhet och ƶka kundnyttan med nƤ...
IBM Information Management - Optimera er verksamhet och ƶka kundnyttan med nƤ...IBM Information Management - Optimera er verksamhet och ƶka kundnyttan med nƤ...
IBM Information Management - Optimera er verksamhet och ƶka kundnyttan med nƤ...
Ā 
121211 depfac ulb_master_presentation_v5_1
121211 depfac ulb_master_presentation_v5_1121211 depfac ulb_master_presentation_v5_1
121211 depfac ulb_master_presentation_v5_1
Ā 
LucidEra Introduction
LucidEra IntroductionLucidEra Introduction
LucidEra Introduction
Ā 
Make Money with Big Data (TCELab)
Make Money with Big Data (TCELab)Make Money with Big Data (TCELab)
Make Money with Big Data (TCELab)
Ā 
Cost Reduction Guide Issue 6 IT
Cost Reduction Guide Issue 6 ITCost Reduction Guide Issue 6 IT
Cost Reduction Guide Issue 6 IT
Ā 

Viewers also liked

13. factoreo
13. factoreo13. factoreo
13. factoreo
SALINAS
Ā 
Hw geography why is georgraphy important part 1 map
Hw geography why is georgraphy important part 1 mapHw geography why is georgraphy important part 1 map
Hw geography why is georgraphy important part 1 map
paulsturtivant
Ā 
C
CC
C
audfar
Ā 
Dritte Welt ErnƤhrung
Dritte Welt ErnƤhrungDritte Welt ErnƤhrung
Dritte Welt ErnƤhrung
alfred10
Ā 
373 Std. Ind. 3
373 Std. Ind. 3373 Std. Ind. 3
373 Std. Ind. 3
guest2f381e
Ā 
Subiaco Oval Business Strategy
Subiaco Oval Business Strategy Subiaco Oval Business Strategy
Subiaco Oval Business Strategy
ISFM Australasia
Ā 
Install wordpress offline
Install wordpress offlineInstall wordpress offline
Install wordpress offline
Iim Dadut
Ā 
Intl 2pp general flyer a4 dec2013 web
Intl 2pp general flyer a4 dec2013 webIntl 2pp general flyer a4 dec2013 web
Intl 2pp general flyer a4 dec2013 web
Thieu Nguyen
Ā 

Viewers also liked (20)

Taller ViquipĆØdia al Museu del Disseny
Taller ViquipĆØdia al Museu del DissenyTaller ViquipĆØdia al Museu del Disseny
Taller ViquipĆØdia al Museu del Disseny
Ā 
Advertising
AdvertisingAdvertising
Advertising
Ā 
13. factoreo
13. factoreo13. factoreo
13. factoreo
Ā 
Hw geography why is georgraphy important part 1 map
Hw geography why is georgraphy important part 1 mapHw geography why is georgraphy important part 1 map
Hw geography why is georgraphy important part 1 map
Ā 
Using Manual about Ad900 Operating Car Tool
Using Manual about Ad900 Operating Car ToolUsing Manual about Ad900 Operating Car Tool
Using Manual about Ad900 Operating Car Tool
Ā 
Tenis
TenisTenis
Tenis
Ā 
C
CC
C
Ā 
Dritte Welt ErnƤhrung
Dritte Welt ErnƤhrungDritte Welt ErnƤhrung
Dritte Welt ErnƤhrung
Ā 
The Magnificient 7 Review 6
The Magnificient 7   Review 6The Magnificient 7   Review 6
The Magnificient 7 Review 6
Ā 
Power Point Presention
Power Point PresentionPower Point Presention
Power Point Presention
Ā 
Werbewoche-on-Buzzer
Werbewoche-on-BuzzerWerbewoche-on-Buzzer
Werbewoche-on-Buzzer
Ā 
Evidencias unidad 2
Evidencias unidad 2Evidencias unidad 2
Evidencias unidad 2
Ā 
Arumanis Rainbow
Arumanis RainbowArumanis Rainbow
Arumanis Rainbow
Ā 
373 Std. Ind. 3
373 Std. Ind. 3373 Std. Ind. 3
373 Std. Ind. 3
Ā 
Mediaki Solutions - Advanced Solutions for Tourism & Travel Industry
Mediaki Solutions - Advanced Solutions for Tourism & Travel IndustryMediaki Solutions - Advanced Solutions for Tourism & Travel Industry
Mediaki Solutions - Advanced Solutions for Tourism & Travel Industry
Ā 
Subiaco Oval Business Strategy
Subiaco Oval Business Strategy Subiaco Oval Business Strategy
Subiaco Oval Business Strategy
Ā 
Moving Checklist
Moving ChecklistMoving Checklist
Moving Checklist
Ā 
Emm3103
Emm3103Emm3103
Emm3103
Ā 
Install wordpress offline
Install wordpress offlineInstall wordpress offline
Install wordpress offline
Ā 
Intl 2pp general flyer a4 dec2013 web
Intl 2pp general flyer a4 dec2013 webIntl 2pp general flyer a4 dec2013 web
Intl 2pp general flyer a4 dec2013 web
Ā 

Similar to Dw

Data mining & warehousing
Data mining & warehousingData mining & warehousing
Data mining & warehousing
Samoneh Dashti
Ā 
Krithi talk-impact
Krithi talk-impactKrithi talk-impact
Krithi talk-impact
Karan7755
Ā 
Getting to Global Spend Visibility_Nestle
Getting to Global Spend Visibility_NestleGetting to Global Spend Visibility_Nestle
Getting to Global Spend Visibility_Nestle
Zycus
Ā 
Intersection of Business Intelligence and CRM vsr12
Intersection of Business Intelligence and CRM vsr12Intersection of Business Intelligence and CRM vsr12
Intersection of Business Intelligence and CRM vsr12
David J Rosenthal
Ā 
Why mTAB?
Why mTAB?Why mTAB?
Why mTAB?
Brad Hontz
Ā 
Mobile Analytics
Mobile AnalyticsMobile Analytics
Mobile Analytics
arunvanlvanoor
Ā 
Monetizing data - An Evening with Eight of Chicago's Data Product Management...
Monetizing data  - An Evening with Eight of Chicago's Data Product Management...Monetizing data  - An Evening with Eight of Chicago's Data Product Management...
Monetizing data - An Evening with Eight of Chicago's Data Product Management...
Randy Horton
Ā 

Similar to Dw (20)

Data mining & warehousing
Data mining & warehousingData mining & warehousing
Data mining & warehousing
Ā 
Krithi talk-impact
Krithi talk-impactKrithi talk-impact
Krithi talk-impact
Ā 
Leverage IBM Business Analytics with PMSquare
Leverage IBM Business Analytics with PMSquareLeverage IBM Business Analytics with PMSquare
Leverage IBM Business Analytics with PMSquare
Ā 
OLAP Release 13082012
OLAP Release 13082012OLAP Release 13082012
OLAP Release 13082012
Ā 
Introduction To Msbi By Yasir
Introduction To Msbi By YasirIntroduction To Msbi By Yasir
Introduction To Msbi By Yasir
Ā 
Seven building blocks for MDM
Seven building blocks for MDMSeven building blocks for MDM
Seven building blocks for MDM
Ā 
How to make data actionable for business
How to make data actionable for businessHow to make data actionable for business
How to make data actionable for business
Ā 
Getting to Global Spend Visibility_Nestle
Getting to Global Spend Visibility_NestleGetting to Global Spend Visibility_Nestle
Getting to Global Spend Visibility_Nestle
Ā 
Predictive Analytics with IBM Cognos 10
Predictive Analytics with IBM Cognos 10Predictive Analytics with IBM Cognos 10
Predictive Analytics with IBM Cognos 10
Ā 
Business Intelligence: The Definitive Guide
Business Intelligence: The Definitive GuideBusiness Intelligence: The Definitive Guide
Business Intelligence: The Definitive Guide
Ā 
iClaims SWOT
iClaims SWOTiClaims SWOT
iClaims SWOT
Ā 
Decision Engineering Pass conference presentation 2014
Decision Engineering Pass conference presentation 2014Decision Engineering Pass conference presentation 2014
Decision Engineering Pass conference presentation 2014
Ā 
SharePoint MoneyBall: The Art of Winning the SharePoint Metrics Game by Susan...
SharePoint MoneyBall: The Art of Winning the SharePoint Metrics Game by Susan...SharePoint MoneyBall: The Art of Winning the SharePoint Metrics Game by Susan...
SharePoint MoneyBall: The Art of Winning the SharePoint Metrics Game by Susan...
Ā 
Intersection of Business Intelligence and CRM vsr12
Intersection of Business Intelligence and CRM vsr12Intersection of Business Intelligence and CRM vsr12
Intersection of Business Intelligence and CRM vsr12
Ā 
Why mTAB?
Why mTAB?Why mTAB?
Why mTAB?
Ā 
Mobile Analytics
Mobile AnalyticsMobile Analytics
Mobile Analytics
Ā 
Improve Efficiency & Reduce Costs through BI in Fertilizer Sector
Improve Efficiency & Reduce Costs through BI in Fertilizer SectorImprove Efficiency & Reduce Costs through BI in Fertilizer Sector
Improve Efficiency & Reduce Costs through BI in Fertilizer Sector
Ā 
Data warehousing
Data warehousingData warehousing
Data warehousing
Ā 
Monetizing data - An Evening with Eight of Chicago's Data Product Management...
Monetizing data  - An Evening with Eight of Chicago's Data Product Management...Monetizing data  - An Evening with Eight of Chicago's Data Product Management...
Monetizing data - An Evening with Eight of Chicago's Data Product Management...
Ā 
[Webinar] High Speed Retail Analytics
[Webinar] High Speed Retail Analytics[Webinar] High Speed Retail Analytics
[Webinar] High Speed Retail Analytics
Ā 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
Ā 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
Ā 

Recently uploaded (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
Ā 
Navi Mumbai Call Girls šŸ„° 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls šŸ„° 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls šŸ„° 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls šŸ„° 8617370543 Service Offer VIP Hot Model
Ā 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Ā 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
Ā 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Ā 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
Ā 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
Ā 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Ā 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
Ā 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Ā 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
Ā 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff

Dw

  • 1. Data Warehouse: An Introduction. Lecture 2. Dept of MCA, NIT, Durgapur. September 6, 2012
  • 2. Data, data everywhere, yet ... I can't find the data I need: data is scattered over the network; many versions, subtle differences. I can't get the data I need: need an expert to get the data. I can't understand the data I found: available data is poorly documented. I can't use the data I found: results are unexpected; data needs to be transformed from one form to another.
  • 3. What Do We Need? A single, complete and consistent store of data obtained from a variety of different sources, made available to end users in a way they can understand and use, in a business context / subject. [Barry Devlin] This leads towards business analysis.
  • 4. Subject Orientation. Organized around major subjects, such as customer, product, sales. Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing. Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
  • 5. What Are Analytical Needs? Which are our lowest/highest margin customers? Who are my customers and what products are they buying? What is the most effective distribution channel? What product promotions have the biggest impact on revenue? Which customers are most likely to go to the competition? What impact will new products/services have on revenue and margins?
  • 6. Decision Support System. Used to manage and control business. Data is historical or point-in-time. Optimized for inquiry rather than update. Use of the system is loosely defined and can be ad hoc. Used by managers and end users to understand the business and make judgements.
  • 7. Evolution of Decision Support. 60's: Batch reports; hard to find and analyze information; inflexible and expensive, reprogram every request. 70's: Terminal-based DSS and EIS. 80's: Desktop data access and analysis tools; query tools, spreadsheets, GUIs; easy to use, but access only the operational db. 90's: Data warehousing with integrated OLAP engines and tools, to meet the analytical needs of the business.
  • 8. What are the users saying... Data should be integrated across the enterprise. Summary data has real value to the organization. Historical data holds the key to understanding data over time. What-if capabilities are required.
  • 9. Need a Separate Process? A technique for assembling and managing data from various sources for the purpose of answering business questions, thus enabling decisions that were not previously possible. A decision support database maintained separately from the organization's operational database.
  • 10. Traditional RDBMS used for OLTP. Database systems have traditionally been used for OLTP: clerical data processing tasks; detailed, up-to-date data; structured, repetitive tasks; read/update a few records; isolation, recovery and integrity are critical; normalization is mandatory. We will call these operational databases.
  • 11. Decision Support Database. Defined in many different ways, but not rigorously. A decision support database that is maintained separately from the organization's operational database. Supports information processing by providing a solid platform of consolidated, historical data for analysis.
  • 12. Some Common Terms. Operational databases: Operational databases are detail-oriented databases defined to meet the needs of sometimes very complex processes in a company. This detailed view is reflected in the data arrangement in the database. The data is highly normalized to avoid data redundancy and "complex maintenance". OLTP: On-Line Transaction Processing (OLTP) describes the way data is processed by an end user or a computer system. It is detail oriented and highly repetitive, with massive amounts of updates and changes of the data by the end user. It is also very often described as the use of computers to run the ongoing operation of a business.
  • 13. Some Common Terms (cont.). Data warehouse: A data warehouse collects, organizes, and makes data available for the purpose of analysis, giving management the ability to access and analyze information about its business. This type of data can be called "informational data". The systems used to work with informational data are referred to as OLAP (On-Line Analytical Processing). We will call it the informational database.
  • 14. Some Common Terms (cont.). Operational versus informational databases: The major difference between operational and informational databases is the update frequency. 1. On operational databases a high number of transactions take place every hour. The database is always "up to date", and it represents a snapshot of the current business situation, more commonly referred to as point-in-time data. 2. Informational databases are usually stable over a period of time, representing the situation at a specific point in the past; this can be called historical data.
  • 15. Some Common Terms (cont.). OLAP: On-Line Analytical Processing (OLAP) is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user. OLAP is implemented in a multi-user client/server mode and offers consistently rapid response to queries, regardless of database size and complexity. OLAP helps the user synthesize enterprise information through comparative, personalized viewing, as well as through analysis of historical and projected data in various "what-if" data model scenarios. This is achieved through use of an OLAP server.
  • 16. OLTP vs. Data Warehouse
      OLTP                                      Warehouse (OLAP)
      Application oriented                      Subject oriented
      Used to run business                      Used to analyze business
      Clerical user                             Manager/analyst
      Detailed data                             Summarized and refined
      Current, up to date                       Snapshot data
      Isolated data                             Integrated data
      Repetitive access by small transactions   Ad-hoc access using large queries
      Read/update access                        Mostly read access (batch update)
  • 17. Some Common Terms (cont.). Metadata: a definition. Metadata is the kind of information that describes the data stored in a database and includes such information as:
      • A description of tables and fields in the data warehouse, including data types and the range of acceptable values.
      • A similar description of tables and fields in the source databases, with a mapping of fields from the source to the warehouse.
      • A description of how the data has been transformed, including formulae, formatting, currency conversion, and time aggregation.
      • Any other information that is needed to support and manage the operation of the data warehouse.
  • 18. Some Common Terms (cont.). Data mart: A data mart contains a subset of corporate data that is of value to a specific business unit, department, or set of users. This subset consists of historical, summarized, and possibly detailed data captured from transaction processing systems or from an enterprise data warehouse. It is important to realize that a data mart is defined by the functional scope of its users, and not by the size of the data mart database. Most data marts today involve less than 100 GB of data; some are larger, and it is expected that as data mart usage increases they will rapidly increase in size. Data mining: Data mining is the process of extracting valid, useful, previously unknown, and comprehensible information from data and using it to make business decisions.
  • 19. Problem in General Purpose SQL. Let the database schemas be as follows:
      1. Product (P_ID, P_NAME, P_DESC);
      2. Sales (R_NO, P_ID, Q_ID, AMOUNT);
      3. Time (Q_ID, Q_DESC);
    Say the organization needs to generate a report as follows:
      Product   4Q96 Sales   4Q97 Sales
      XYZ       57           66
      ABC       29           24
      PQR       115          89
  • 20. Problem in SQL (cont.). The SQL needed to display the fourth quarter 1996 sales may be as follows:
      SELECT Product.P_NAME, SUM(Sales.AMOUNT)
      FROM Sales, Product, Time
      WHERE . . . Time.Q_ID = '4Q96'
        AND Product.P_NAME IN ('XYZ', 'ABC', 'PQR')
      GROUP BY Product.P_NAME
    If one expands the Time constraint to include both quarters, as follows:
      WHERE . . . Time.Q_ID IN ('4Q96', '4Q97')
    then the SUM expression adds up the sales from both quarters, which we do not want. Plain SQL of this form offers no simple alternative, so a general-purpose SQL engine falls short for queries like the one above.
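
The behaviour described on slides 19 and 20 can be reproduced with a small sketch using Python's built-in sqlite3 module. The table and column names follow slide 19. Everything else is an assumption made for illustration: the join conditions (the slides elide them as "WHERE . . ."), the one-row-per-product-per-quarter sample data chosen to match the report's totals, and the final CASE-based query, which is a common workaround that does not appear in the slides.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create the three tables from slide 19 and load illustrative rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE Product (P_ID INTEGER PRIMARY KEY, P_NAME TEXT, P_DESC TEXT);
        CREATE TABLE Time (Q_ID TEXT PRIMARY KEY, Q_DESC TEXT);
        CREATE TABLE Sales (R_NO INTEGER PRIMARY KEY, P_ID INTEGER,
                            Q_ID TEXT, AMOUNT INTEGER);
        INSERT INTO Product VALUES (1, 'XYZ', ''), (2, 'ABC', ''), (3, 'PQR', '');
        INSERT INTO Time VALUES ('4Q96', 'Fourth quarter 1996'),
                                ('4Q97', 'Fourth quarter 1997');
        -- Assumed sample data: one sale per product per quarter,
        -- chosen so the totals match the report on slide 19.
        INSERT INTO Sales VALUES
            (1, 1, '4Q96', 57),  (2, 1, '4Q97', 66),
            (3, 2, '4Q96', 29),  (4, 2, '4Q97', 24),
            (5, 3, '4Q96', 115), (6, 3, '4Q97', 89);
        """
    )
    return con


# Assumed join conditions; the slides do not spell them out.
JOINS = """
    FROM Sales
    JOIN Product ON Product.P_ID = Sales.P_ID
    JOIN Time ON Time.Q_ID = Sales.Q_ID
"""

# Slide 20's widened filter: both quarters collapse into one sum per product.
BOTH_QUARTERS = f"""
    SELECT Product.P_NAME, SUM(Sales.AMOUNT)
    {JOINS}
    WHERE Time.Q_ID IN ('4Q96', '4Q97')
    GROUP BY Product.P_NAME
    ORDER BY Product.P_NAME
"""

# Not from the slides: conditional aggregation, one CASE column per quarter.
PIVOTED = f"""
    SELECT Product.P_NAME,
           SUM(CASE WHEN Time.Q_ID = '4Q96' THEN Sales.AMOUNT ELSE 0 END),
           SUM(CASE WHEN Time.Q_ID = '4Q97' THEN Sales.AMOUNT ELSE 0 END)
    {JOINS}
    WHERE Time.Q_ID IN ('4Q96', '4Q97')
    GROUP BY Product.P_NAME
    ORDER BY Product.P_NAME
"""

if __name__ == "__main__":
    con = build_db()
    print(con.execute(BOTH_QUARTERS).fetchall())  # one combined total per product
    print(con.execute(PIVOTED).fetchall())        # one column per quarter
```

With this data the first query returns a single combined figure per product (for example 123 for XYZ), which is the unwanted result the slide describes. The CASE-based query does produce the two-column report, but every additional quarter needs another hand-written column, which is arguably the kind of friction the slide is pointing at.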
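
The access-pattern contrast in the table on slide 16 can also be sketched in a few lines. This is a minimal illustration only: the `orders` table, its columns and its rows are hypothetical, and a single in-memory SQLite database stands in for what would normally be two separate systems (an operational database and a warehouse).

```python
import sqlite3

# Hypothetical table and rows, used only to contrast the two access styles.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, "
    "region TEXT, amount INTEGER)"
)
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "alice", "east", 100), (2, "bob", "west", 250), (3, "carol", "east", 75)],
)


def oltp_update_order(con: sqlite3.Connection, order_id: int, new_amount: int) -> None:
    """OLTP style: a small transaction that updates a single record."""
    with con:  # commits on success, rolls back if an exception is raised
        con.execute(
            "UPDATE orders SET amount = ? WHERE order_id = ?",
            (new_amount, order_id),
        )


def warehouse_sales_by_region(con: sqlite3.Connection) -> list:
    """Warehouse style: a read-only query that scans and summarizes many rows."""
    return con.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
```

The first function touches one row and needs transactional guarantees; the second reads everything and returns summarized data, matching the "mostly read access" column of the table.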
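
As an illustration of the metadata items listed on slide 17, one warehouse field could be described by a record like the one below. This is a sketch under stated assumptions: the class name `FieldMetadata`, its attributes, and the example table and column names are all hypothetical and are not taken from the slides or from any particular metadata tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldMetadata:
    """Hypothetical metadata record for one warehouse field."""

    table: str                           # warehouse table holding the field
    name: str                            # field name in the warehouse
    data_type: str                       # declared data type
    min_value: Optional[float] = None    # lower bound of acceptable values
    max_value: Optional[float] = None    # upper bound of acceptable values
    source_field: Optional[str] = None   # mapping back to the source database
    transformation: str = ""             # how the source value was transformed

    def accepts(self, value: float) -> bool:
        """Check a value against the documented range of acceptable values."""
        if self.min_value is not None and value < self.min_value:
            return False
        if self.max_value is not None and value > self.max_value:
            return False
        return True


# Example record; every name and value here is made up for illustration.
amount = FieldMetadata(
    table="sales_fact",
    name="amount_usd",
    data_type="DECIMAL(12,2)",
    min_value=0,
    source_field="orders.total_local",
    transformation="currency conversion to USD, aggregated per quarter",
)
```

The record covers the first three items on the slide: the warehouse-side description with its value range, the source-to-warehouse mapping, and the transformation applied.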