Data Warehousing and OLAP
Lecture 2/DMBI/IKI83403T/MTI/UI
Yudho Giri Sucahyo, Ph.D, CISA (yudho@cs.ui.ac.id)
Faculty of Computer Science, University of Indonesia

Objectives
    Motivation: Why data warehouse?
    What is a data warehouse?
    Why separate DW?
    Conceptual modeling of DW
    Data Mart
    Data Warehousing Architectures
    Data Warehouse Development
    Data Warehouse Vendors
    Real-time DW




Motivation: Why data warehouse?
    Construction of data warehouses (DW) involves data cleaning and data integration, an important preprocessing step for data mining (DM).
    DW provide OLAP for the interactive analysis of multidimensional data, which facilitates effective DM.
    Data mining functions can be integrated with OLAP operations to enhance interactive mining of knowledge.
    DW will provide an effective platform for DM.
    While DWs are not required for DM, DW store massive amounts of data that can be used for DM. [DO]

What is a data warehouse? [JH]
    Defined in many different ways, but not rigorously.
      A decision support database that is maintained separately from the organization’s operational database (ODB).
      Supports information processing by providing a solid platform of consolidated, historical data for analysis.
    “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” (W. H. Inmon)
    Case Study 2: Continental Airlines flies high with its real-time data warehouse
What is a data warehouse? [ET]
    Data warehouse
      A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format.
    Characteristics
      Subject oriented, Integrated, Time Variant, Non-volatile
      Web-based, Relational/multidimensional, Client/server, Real-time
      Include metadata
    Data warehousing
      Process of constructing and using data warehouses.
      Requires data integration, data cleaning, and data consolidation.

Subject Oriented
    Organized around major subjects, such as customer, product, sales.
    Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
    Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.




Integrated
    Integrate multiple, heterogeneous data sources
      Relational databases, flat files, on-line transaction records
    Data cleaning and data integration techniques are applied
      Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
        E.g., Hotel price: currency, tax, breakfast covered, etc.
      When data is moved to the warehouse, it is converted.

Time Variant
    The time horizon for the data warehouse is significantly longer than that of operational systems.
      Operational database: current value data.
      Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years)
    Every key structure in the data warehouse
      Contains an element of time, explicitly or implicitly
      But the key of operational data may or may not contain “time element”.
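The hotel price example on the Integrated slide can be sketched as a small conversion step. This is only an illustration: the `SourcePrice` record, the exchange rates, the 10% tax rate, and the choice of "USD, tax included" as the warehouse convention are all assumptions, and the breakfast attribute is left out to keep the sketch short.

```python
from dataclasses import dataclass

# Hypothetical exchange rates and tax rate, for illustration only.
RATES_TO_USD = {"USD": 1.0, "IDR": 0.0001}
TAX_RATE = 0.10


@dataclass
class SourcePrice:
    """One hotel price as a source system happens to record it."""
    hotel: str
    amount: float
    currency: str
    tax_included: bool


def to_warehouse_price(p: SourcePrice) -> float:
    """Convert to the assumed warehouse convention: USD, tax included."""
    usd = p.amount * RATES_TO_USD[p.currency]
    if not p.tax_included:
        usd *= 1 + TAX_RATE
    return round(usd, 2)


# Two sources describe the same price under different conventions.
price_a = to_warehouse_price(SourcePrice("Hotel A", 100.0, "USD", False))
price_b = to_warehouse_price(SourcePrice("Hotel B", 1_100_000.0, "IDR", True))
```

Once both records are converted, they can be compared directly, which is the point of converting data as it is moved to the warehouse.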
Non-volatile
    A physically separate store of data transformed from the operational environment.
    Operational update of data does not occur in the data warehouse environment.
      Does not require transaction processing, recovery, and concurrency control mechanisms
      Requires only two operations in data accessing: initial loading of data and access of data.

Data Warehouse vs. Heterogeneous DBMS
    Traditional heterogeneous DB integration:
      Build wrappers/mediators on top of multiple, heterogeneous databases.
      Ex: IBM Data Joiner, Informix DataBlade
    Query-driven approach:
      When a query is posed to a client site, a metadata dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer set.
      Complex information filtering and integration processes compete for resources.
      Inefficient and potentially expensive for frequent queries, especially for queries requiring aggregations.




Data Warehouse vs. Heterogeneous DBMS (2)
    Using DW: update-driven approach
      Information from multiple, heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis.
    Unlike OLTP, DW do not contain the most current information.
    DW brings high performance to the integrated heterogeneous DB system since data are copied, preprocessed, integrated, annotated, summarized, and restructured into one data store.
    Query processing in DW does not interfere with the processing at local sources.
    DW can store and integrate historical information and support complex multidimensional queries.

DW vs. ODB
    Major task of ODB: OLTP
      Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
    DW serve data analysis and decision making: OLAP
    Distinct Features (OLTP vs. OLAP)
      User and system orientation: customer vs. market
      Data contents: current, detailed vs. historical, consolidated
      Database design: ER + application vs. star + subject
      View: current, local vs. evolutionary, integrated
      Access patterns: update vs. read-only but complex queries
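The contrast between the query-driven (wrapper/mediator) approach and the update-driven (warehouse) approach can be sketched in a few lines. Everything here is hypothetical: the two sources, their field names, and the helper functions are made up for illustration and do not correspond to any product named on the slides.

```python
# Two hypothetical heterogeneous sources with different field names.
SOURCE_A = [{"cust": "Ann", "amt": 120}, {"cust": "Bob", "amt": 80}]
SOURCE_B = [{"customer_name": "Ann", "total": 50}]


def wrap_a():
    """Wrapper: expose source A as (customer, amount) pairs."""
    return [(r["cust"], r["amt"]) for r in SOURCE_A]


def wrap_b():
    """Wrapper: expose source B as (customer, amount) pairs."""
    return [(r["customer_name"], r["total"]) for r in SOURCE_B]


def total_query_driven(customer):
    """Mediator: query every source at request time and merge the answers."""
    return sum(amt for wrap in (wrap_a, wrap_b)
               for name, amt in wrap() if name == customer)


def build_warehouse():
    """Update-driven: integrate and summarize once, in advance."""
    totals = {}
    for wrap in (wrap_a, wrap_b):
        for name, amt in wrap():
            totals[name] = totals.get(name, 0) + amt
    return totals


WAREHOUSE = build_warehouse()


def total_update_driven(customer):
    """Answer from the precomputed store without touching the sources."""
    return WAREHOUSE.get(customer, 0)
```

If a new row is appended to `SOURCE_A` after `WAREHOUSE` is built, `total_query_driven` sees it and `total_update_driven` does not until the next load. That is the trade-off named on the slide: the warehouse answers without interfering with the sources, but does not hold the most current information.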
OLTP vs OLAP
                     OLTP                         OLAP
users                Clerk, IT professional       Knowledge worker
function             day-to-day operations        decision support
DB design            application-oriented         subject-oriented
data                 current, up-to-date,         historical, summarized,
                     detailed, flat relational,   multidimensional,
                     isolated                     integrated, consolidated
usage                repetitive                   ad-hoc
access               read/write,                  lots of scans
                     index/hash on prim. key
unit of work         short, simple transaction    complex query
# records accessed   tens                         millions
# users              thousands                    hundreds
DB size              100MB-GB                     100GB-TB
metric               transaction throughput       query throughput, response

Why Separate DW?
    High performance for both systems:
      DBMS (tuned for OLTP): access methods, indexing, concurrency control, recovery
      Warehouse (tuned for OLAP): complex OLAP queries, computation of large groups of data at summarized levels, multidimensional view, consolidation.
    Processing OLAP queries in operational databases would degrade the performance of operational tasks.
    In ODB, concurrency control and recovery mechanisms (locking, logging) are required to ensure the consistency and robustness of transactions.
    OLAP: read-only access. No need for concurrency control and recovery.
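The two access patterns in the OLTP vs OLAP comparison can be shown side by side with Python's built-in `sqlite3` module. This is a toy sketch: the `orders` table and its rows are invented, and a real warehouse would not share one database with the operational system, which is the point of the "Why Separate DW?" slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
    "customer TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "Ann", 2003, 100.0), (2, "Bob", 2003, 250.0),
     (3, "Ann", 2004, 75.0), (4, "Cid", 2004, 300.0)],
)

# OLTP-style unit of work: a short read/write transaction that touches
# one record, located through the primary key.
with conn:
    conn.execute(
        "UPDATE orders SET amount = amount + 25 WHERE order_id = ?", (3,)
    )
one_row = conn.execute(
    "SELECT amount FROM orders WHERE order_id = 3"
).fetchone()

# OLAP-style unit of work: a read-only query that scans many records
# and returns a summary.
by_year = conn.execute(
    "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()
```

With four rows the difference is invisible, but the shape of the work is the same at scale: the update needs locking and logging around a handful of records, while the aggregate needs neither and instead reads a large share of the table.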




Why Separate DW? (2)
    Different functions and different data:
      Missing data: Decision support requires historical data which operational DBs do not typically maintain. So, data in ODB is usually far from complete for decision making.
      Data consolidation: Decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources. ODB contain detailed raw data (transactions) which need to be consolidated before analysis.
      Data quality: Different sources typically use inconsistent data representations, codes and formats which have to be reconciled.

Conceptual Modeling of DW
    Data Cube:
      see TSBD Lecture Notes on Visualization of Data Cubes
    Modeling data warehouses: dimensions & measurements
      Star schema: A single object (fact table) in the middle connected to a number of objects (dimension tables, one for each dimension).
      Snowflake schema: A refinement of star schema where the dimensional hierarchy is represented explicitly by normalizing the dimension tables.
      Fact constellations: Multiple fact tables share dimension tables. Also known as galaxy schema.
Example of Star Schema
    Sales Fact Table
      Dimension keys: Date, Product, Store, Customer
      Measurements: unit_sales, dollar_sales, Yen_sales
    Dimension tables
      Date: Day, Month, Year
      Product: ProductNo, ProdName, ProdDesc, Category, QOH
      Store: StoreID, City, State, Country, Region
      Cust: CustId, CustName, CustCity, CustCountry
    Potential redundancy: Bandung and Bogor are both in Jawa Barat (West Java).

Snowflake Schema
    Sales Fact Table
      Dimension keys: Date, Product, Store, Customer
      Measurements: unit_sales, dollar_sales, Yen_sales
    Normalized dimension tables
      Date: Day, Month; Month: Month, Year; Year: Year
      Store: StoreID, City; City: City, State; State: State, Country; Country: Country, Region
    Other dimension tables
      Product: ProductNo, ProdName, ProdDesc, Category, QOH
      Cust: CustId, CustName, CustCity, CustCountry
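A reduced version of the star schema can be written down with Python's built-in `sqlite3` module. This is a sketch under assumptions: the snake_case table and column names, the sample rows, and the omission of the Date and Cust dimensions are all choices made here for brevity, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    product_no INTEGER PRIMARY KEY, prod_name TEXT, category TEXT);
CREATE TABLE store_dim (
    store_id INTEGER PRIMARY KEY, city TEXT, state TEXT, country TEXT);
-- Fact table in the middle: one foreign key per dimension, plus measurements.
CREATE TABLE sales_fact (
    product_no INTEGER REFERENCES product_dim(product_no),
    store_id   INTEGER REFERENCES store_dim(store_id),
    unit_sales INTEGER,
    dollar_sales REAL);
""")
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                 [(1, "TV", "Electronics"), (2, "PC", "Electronics")])
# The state and country are repeated for every city: the redundancy that
# a snowflake schema would remove by normalizing the dimension table.
conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?, ?)",
                 [(10, "Bandung", "Jawa Barat", "Indonesia"),
                  (11, "Bogor", "Jawa Barat", "Indonesia")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                 [(1, 10, 3, 900.0), (2, 10, 1, 700.0), (1, 11, 2, 600.0)])

# A typical star join: fact table joined to one dimension, then grouped
# by a dimension attribute.
sales_by_state = conn.execute("""
    SELECT s.state, SUM(f.dollar_sales)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_id = f.store_id
    GROUP BY s.state
""").fetchall()
```

In the snowflake variant, `store_dim` would keep only `store_id` and `city`, with separate city, state, and country tables. That removes the repeated "Jawa Barat" value but adds joins to the query.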




View of Warehouses and Hierarchies
    [Figure: screenshots of a warehouse browsing tool]
      Importing data
      Table browsing
      Dimension creation
      Dimension browsing
      Cube building
      Cube browsing

Data Cube
    [Figure: three-dimensional data cube]
      Date: 1Qtr, 2Qtr, 3Qtr, 4Qtr, sum
      Product: TV, PC, VCR, sum
      Country: U.S.A., Canada, Mexico, sum
      Highlighted cell: total annual sales of TV in U.S.A.
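The "sum" entries on each axis of the data cube can be computed by aggregating every fact into all combinations of kept and summed-out dimensions. The sketch below uses invented unit counts and a hypothetical `build_cube` helper. It materializes the full cube, which is only practical for small examples.

```python
from itertools import product

# Hypothetical facts: (product, quarter, country) -> units sold.
FACTS = {
    ("TV", "1Qtr", "U.S.A."): 10,
    ("TV", "2Qtr", "U.S.A."): 20,
    ("TV", "1Qtr", "Canada"): 5,
    ("PC", "1Qtr", "U.S.A."): 7,
}
ALL = "sum"  # marker for a dimension that has been aggregated away


def build_cube(facts):
    """Return every cell of the cube, including the 'sum' margins."""
    cube = {}
    for key, value in facts.items():
        # Each fact contributes to 2^n cells: every choice of which
        # dimensions to keep and which to replace by ALL.
        for mask in product((False, True), repeat=len(key)):
            cell = tuple(ALL if hide else k for k, hide in zip(key, mask))
            cube[cell] = cube.get(cell, 0) + value
    return cube


CUBE = build_cube(FACTS)
tv_usa_annual = CUBE[("TV", ALL, "U.S.A.")]  # total annual sales of TV in U.S.A.
grand_total = CUBE[(ALL, ALL, ALL)]
```

The cell `("TV", "sum", "U.S.A.")` corresponds to the highlighted cell in the figure: the quarter dimension is summed out while product and country stay fixed.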
Data Cube
    [Figure: data cube browser]
      Visualization
      OLAP capabilities
      Interactive manipulation

Typical OLAP Operations
    Roll up (drill-up): summarize data
      by climbing up hierarchy or by dimension reduction
    Drill down (roll down): reverse of roll-up
      from higher level summary to lower level summary or detailed data, or introducing new dimensions
    Slice and dice:
      project and select
    Pivot (rotate):
      reorient the cube, visualization, 3D to series of 2D planes.
    Other operations
      drill across: involving (across) more than one fact table.
      drill through: through the bottom level to its back-end relational tables.
    More info:
      www.knowledgecenters.org, www.olapreport.com, www.olapcouncil.org
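The typical OLAP operations can be illustrated on a tiny in-memory cube. This is a minimal sketch with invented cells and hypothetical helper names (`roll_up`, `slice_`, `dice`, `pivot`). OLAP servers implement these operations very differently, over precomputed aggregates.

```python
from collections import defaultdict

# Hypothetical cells: (item, city, country, quarter) -> dollar sales.
DIMS = ("item", "city", "country", "quarter")
CELLS = {
    ("TV", "Chicago", "U.S.A.", "1Qtr"): 100,
    ("TV", "New York", "U.S.A.", "1Qtr"): 150,
    ("TV", "Toronto", "Canada", "2Qtr"): 80,
    ("PC", "Chicago", "U.S.A.", "2Qtr"): 200,
}


def roll_up(cells, keep):
    """Summarize to the dimensions in `keep`; dropping `city` while
    keeping `country` climbs the location hierarchy."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)


def slice_(cells, dim, value):
    """Select on a single dimension."""
    i = DIMS.index(dim)
    return {k: v for k, v in cells.items() if k[i] == value}


def dice(cells, **allowed):
    """Select on several dimensions at once; each keyword is a set of values."""
    return {k: v for k, v in cells.items()
            if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())}


def pivot(table2d):
    """Rotate a two-dimensional result by swapping its axes."""
    return {(b, a): v for (a, b), v in table2d.items()}


by_country = roll_up(CELLS, ("item", "country"))
first_quarter = slice_(CELLS, "quarter", "1Qtr")
tv_in_usa = dice(CELLS, item={"TV"}, country={"U.S.A."})
rotated = pivot(by_country)
```

Drill down is the reverse direction: going from `by_country` back to the city-level entries in `CELLS`. It is only possible because the detailed cells were kept.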




Data Mart
    DW collects information about subjects that span the entire organization, such as customers, products, sales, assets, and personnel. Its scope is enterprise-wide.
    For DW, fact constellation schema is commonly used since it can model multiple, interrelated subjects.
    A data mart is a subset of a DW that focuses on a particular subject. Its scope is department-wide. Typically, a data mart consists of a single subject area (e.g. marketing, operations).
    For data marts, star or snowflake schemas are commonly used since both are geared towards modeling single subjects, although the star schema is more popular.

Data Mart
    A data mart can be either dependent or independent.
    A dependent data mart is a subset that is created directly from the DW.
      Consistent data model
      Providing quality data
      DW must be constructed first
      Ensures that the user is viewing the same version of the data that is accessed by all other data warehouse users
    An independent data mart is a small warehouse designed for a department, and its source is not an enterprise data warehouse (EDW).
Data Warehousing Process Overview
    The major components of a data warehousing process
      Data sources
        Legacy systems, external data providers (e.g. BPS), OLTP, ERP systems
      Data extraction
      Data loading
      Comprehensive database
      Metadata
      Middleware tools




Data Warehousing Architectures
    [Figures: data warehousing architecture diagrams, slides 27-30]




Data Integration and the ETL Process
    Various integration technologies:
      Enterprise Application Integration (EAI)
        A technology that provides a vehicle for pushing data from source systems into a data warehouse
        Integrates application functionality and is focused on sharing functionality across systems
        Traditionally, API. Nowadays, SOA (web services).
      Enterprise Information Integration (EII)
        An evolving tool space that promises real-time data integration from a variety of sources, such as relational databases, Web services, and multidimensional databases
        A mechanism for pulling data from source systems to satisfy a request for information.

Data Integration and the ETL Process
    ETL
      Consumes 60-70% of the time in a data-centric project.
      Extraction: Reading data from one or more databases
      Transformation: Converting the extracted data from its previous form into the form in which it needs to be so that it can be placed into a DW
      Load: Putting the data into the DW
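The three ETL steps can be sketched with the Python standard library. The source extract, its `DD/MM/YYYY` date format, the `sales` target table, and the cleaning rule (skip rows without a store name) are all assumptions made for this illustration. Production ETL tools add scheduling, logging, and error handling that are omitted here.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical source extract; the last row has no store name.
SOURCE_CSV = (
    "store,sale_date,amount\n"
    "Bandung,31/01/2004,100.50\n"
    "Bogor,01/02/2004,80\n"
    ",02/02/2004,10\n"
)


def extract(text):
    """Extraction: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: clean the rows and convert them to the target form."""
    clean = []
    for r in rows:
        if not r["store"]:  # data cleaning: drop rows with no store
            continue
        iso_date = datetime.strptime(r["sale_date"], "%d/%m/%Y").date().isoformat()
        clean.append((r["store"], iso_date, float(r["amount"])))
    return clean


def load(conn, rows):
    """Load: put the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (store TEXT, sale_date TEXT, amount REAL)"
    )
    with conn:  # commit all rows together
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT store, sale_date, amount FROM sales ORDER BY store"
).fetchall()
```

Keeping the three steps as separate functions mirrors the slide: each can be tested and replaced independently, for example by swapping the CSV reader for a database query.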
Data Warehouse Development
    Direct benefits
      Allowing end users to perform extensive analysis in numerous ways
      A consolidated view of corporate data (i.e., a single version of the truth)
      Better and more timely information
      Enhanced system performance. DW frees production processing because some operational system reporting requirements are moved to DSS
      Simplification of data access

Data Warehouse Development
    Some best practices for implementing a DW (Weir, 2002):
      Project must fit with corporate strategy and business objectives
      There must be complete buy-in to the project by executives, managers, and users
      It is important to manage user expectations about the completed project
      The data warehouse must be built incrementally
      Build in adaptability
      Managed by both IT and business professionals
      Develop a business/supplier relationship
      Only load data that have been cleansed and are of a quality understood by the organization
      Do not overlook training requirements
      Be politically aware




Data Warehouse Vendors
 Computer Associates
 DataMirror
 Data Advantage Group
 Dell Computer
 Embarcadero Technologies
 Business Objects
 HP
 Hummingbird
 Hyperion
 IBM
 Informatica
 Microsoft
 Oracle
 SAS
 Siemens
 Sybase
 Teradata
 Please visit:
    Data Warehousing Institute (tdwi.com)
    DM Review (dmreview.com)

 Six guidelines to consider when developing a vendor list:
 1.   Financial strength
 2.   ERP linkages
 3.   Qualified consultants
 4.   Market share
 5.   Industry experience
 6.   Established partnerships
Real-time DW
     Traditionally, a DW is updated on a weekly basis.
     This is unsuitable for some businesses.
     Real-time (active) data warehousing:
     the process of loading and providing data via a data warehouse as they become available
     Levels of data warehouses:
     1.   Reports what happened
     2.   Some analysis occurs
     3.   Provides prediction capabilities
     4.   Operationalization
     5.   Becomes capable of making events happen
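
The difference between a periodic load and real-time (active) loading can be sketched as follows. This is an illustrative toy, assuming an in-memory SQLite database as a stand-in for the warehouse; the table `sales_fact`, its columns, and the helpers `load_batch`, `load_event`, and `total_sales` are hypothetical names, not part of the lecture.

```python
import sqlite3

# In-memory stand-in for a warehouse; table and column names are hypothetical.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_fact (sale_id INTEGER PRIMARY KEY, amount REAL)"
)


def load_batch(rows):
    """Traditional style: rows accumulate and are loaded in one periodic batch."""
    with warehouse:  # commits the transaction on success
        warehouse.executemany("INSERT INTO sales_fact VALUES (?, ?)", rows)


def load_event(row):
    """Real-time (active) style: a row is loaded as soon as it becomes available."""
    with warehouse:
        warehouse.execute("INSERT INTO sales_fact VALUES (?, ?)", row)


def total_sales():
    """Query the warehouse for the current total of all loaded sales."""
    query = "SELECT COALESCE(SUM(amount), 0) FROM sales_fact"
    return warehouse.execute(query).fetchone()[0]


# Batch: the pending rows are invisible to queries until the scheduled load runs.
pending = [(1, 100.0), (2, 250.0)]
before_batch = total_sales()
load_batch(pending)
after_batch = total_sales()

# Real-time: the total reflects the new event immediately after it arrives.
load_event((3, 50.0))
after_event = total_sales()

print(before_batch, after_batch, after_event)
```

In the batch case the analyst's view lags behind the business by up to one load interval; in the real-time case the lag shrinks to the time it takes to load a single event.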




From DW to DM [JH]
Three kinds of data warehouse applications
 Information processing
    supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphs
 Analytical processing
    multidimensional analysis of data warehouse data
    supports basic OLAP operations, slice-dice, drilling, pivoting
 Data mining
    knowledge discovery from hidden patterns
    supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools.
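
The basic OLAP operations named above (slice, dice, drilling, pivoting) can be illustrated on a tiny fact table. This is a minimal sketch in plain Python, assuming a hypothetical sales cube with the dimensions year, region, and product; the data and the helper names (`roll_up`, `slice_`, `dice`, `pivot`) are invented for illustration and do not come from [JH].

```python
from collections import defaultdict

# Tiny hypothetical fact table: (year, region, product, sales amount).
facts = [
    (2006, "East", "TV", 100),
    (2006, "East", "PC", 150),
    (2006, "West", "TV", 80),
    (2007, "East", "TV", 120),
    (2007, "West", "PC", 200),
]
YEAR, REGION, PRODUCT, SALES = range(4)  # column positions in each row


def roll_up(rows, *dims):
    """Aggregate sales over the given dimensions (fewer dims = coarser view)."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[SALES]
    return dict(totals)


def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    return [row for row in rows if row[dim] == value]


def dice(rows, allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [
        row for row in rows
        if all(row[d] in values for d, values in allowed.items())
    ]


def pivot(rows, row_dim, col_dim):
    """Pivot: arrange totals as a cross-tab of row_dim by col_dim."""
    table = defaultdict(dict)
    for (r, c), total in roll_up(rows, row_dim, col_dim).items():
        table[r][c] = total
    return dict(table)


by_year = roll_up(facts, YEAR)                 # coarse view
by_year_region = roll_up(facts, YEAR, REGION)  # drill-down adds a dimension
east_only = slice_(facts, REGION, "East")
tv_2006 = dice(facts, {YEAR: {2006}, PRODUCT: {"TV"}})
crosstab = pivot(facts, REGION, YEAR)

print(by_year)
print(crosstab)
```

Drilling down and rolling up are the same aggregation run at different levels of detail, which is why one `roll_up` helper covers both here; a real OLAP server would typically precompute many of these aggregates rather than scan the facts each time.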
References
 [JH] Jiawei Han and Micheline Kamber, Data Mining:
 Concepts and Techniques, Morgan Kaufmann, 2001.
 [ET] Efraim Turban et al., Decision Support and Business
 Intelligence Systems, Pearson, 2007.
 [DO] David Olson and Yong Shi, Introduction to Business
 Data Mining, McGraw-Hill, 2007.





02. Data Warehouse and OLAP

  • 1. Objectives Motivation: Why data warehouse? What is a data warehouse? Why separate DW? y p Conceptual modeling of DW Data Mart Data Warehousing Architectures Data Warehousing and OLAP Data Warehouse Development Lecture 2/DMBI/IKI83403T/MTI/UI Data Warehouse Vendors Yudho Giri Sucahyo, Ph.D, CISA (yudho@cs.ui.ac.id) Real-time DW R l Faculty of Computer Science, University of Indonesia 2 Motivation: Why data warehouse? What is a data warehouse? [JH] Construction of data warehouses (DW) involves data Defined in many different ways, but not rigorously. cleaning and data integration important A decision support database that is maintained separately preprocessing step for data mining (DM). from the organization’s ODB. DW provide OLAP for the interactive analysis of Support information processing by providing a solid platform of consolidated, historical data for analysis. multidimensional data, which facilitates effective DM. , “A data warehouse is a subject-oriented, integrated, Data mining functions can be integrated with OLAP time-variant, and nonvolatile collection of data in operations to enhance interactive mining of knowledge. support of management’s decision-making process.” — DW will provide an effective platform for DM. W. H. Inmon While DW Whil DWs are not requirements to do DM, DW store t i t t d DM t Case Study 2: Continental Airlines flies high with its massive amounts of data that can be uses for DM. [DO] real-time data warehouse 3 4
  • 2. What is a data warehouse? [ET]
         - Data warehouse: A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format.
         - Characteristics
             - Subject oriented, Integrated, Time Variant, Non-volatile
             - Web-based, Relational/multidimensional, Client/server, Real-time
             - Include metadata
         - Data warehousing: Process of constructing and using data warehouses. Requires data integration, data cleaning, and data consolidation.

       Subject Oriented
         - Organized around major subjects, such as customer, product, sales.
         - Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
         - Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.

       Integrated
         - Integrate multiple, heterogeneous data sources
             - Relational databases, flat-files, on-line transaction records
         - Data cleaning and data integration techniques are applied
             - Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
             - E.g., Hotel price: currency, tax, breakfast covered, etc.
             - When data is moved to the warehouse, it is converted.

       Time Variant
         - The time horizon for the data warehouse is significantly longer than that of operational systems.
             - Operational database: current value data.
             - Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years)
         - Every key structure in the data warehouse
             - Contains an element of time, explicitly or implicitly
             - But the key of operational data may or may not contain "time element".
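The conversion step on the "Integrated" slide (hotel prices that differ in currency, tax, and whether breakfast is covered) can be illustrated with a small Python sketch. Everything concrete here is an assumption made for illustration and does not come from the slides: the exchange rates, the 10% tax rate, the flat breakfast cost, the order in which tax and breakfast are applied, and the names `HotelPrice` and `to_warehouse_format`.

```python
from dataclasses import dataclass

# Hypothetical exchange rates to USD; a real warehouse would read these
# from a maintained reference table.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.25, "IDR": 0.0001}


@dataclass
class HotelPrice:
    """One price record as it might arrive from a source system."""
    hotel: str
    amount: float
    currency: str
    tax_included: bool
    breakfast_included: bool


def to_warehouse_format(p, tax_rate=0.10, breakfast_usd=5.0):
    """Convert a source record to one agreed convention:
    USD, tax included, breakfast included (all assumed conventions)."""
    usd = p.amount * RATES_TO_USD[p.currency]
    if not p.tax_included:
        usd *= 1 + tax_rate      # add the assumed tax
    if not p.breakfast_included:
        usd += breakfast_usd     # add the assumed breakfast cost
    return {"hotel": p.hotel, "price_usd": round(usd, 2)}


source_a = HotelPrice("Hotel A", 100.0, "USD", True, True)
source_b = HotelPrice("Hotel B", 1_000_000, "IDR", False, False)
converted = [to_warehouse_format(source_a), to_warehouse_format(source_b)]
```

After conversion both records carry the same attribute (`price_usd`) with the same meaning, which is what makes them comparable inside the warehouse.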
  • 3. Non-volatile
         - A physically separate store of data transformed from the operational environment.
         - Operational update of data does not occur in the data warehouse environment.
             - Does not require transaction processing, recovery, and concurrency control mechanisms
             - Requires only two operations in data accessing: initial loading of data and access of data.

       Data Warehouse vs. Heterogeneous DBMS
         - Traditional heterogeneous DB integration:
             - Build wrappers/mediators on top of multiple, heterogeneous databases. Ex: IBM Data Joiner, Informix DataBlade
         - Query-driven approach:
             - When a query is posed to a client site, a metadata dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer set.
             - Complex information filtering and integration processes compete for resources.
             - Inefficient and potentially expensive for frequent queries, especially for queries requiring aggregations.

       Data Warehouse vs. Heterogeneous DBMS (2)
         - Using DW: update-driven approach
             - Information from multiple, heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis.
             - Unlike OLTP, DW do not contain the most current information.
             - DW brings high performance to the integrated heterogeneous DB system since data are copied, preprocessed, integrated, annotated, summarized, and restructured into one data store.
             - Query processing in DW does not interfere with the processing at local sources
             - DW can store and integrate historical information and support complex multidimensional queries.

       DW vs. ODB
         - Major task of ODB: OLTP
             - Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
         - DW serve data analysis and decision making: OLAP
         - Distinct Features (OLTP vs. OLAP)
             - User and system orientation: customer vs. market
             - Data contents: current, detailed vs. historical, consolidated
             - Database design: ER + application vs. star + subject
             - View: current, local vs. evolutionary, integrated
             - Access patterns: update vs. read-only but complex queries
  • 4. OLTP vs OLAP

                              OLTP                               OLAP
       users                  clerk, IT professional             knowledge worker
       function               day to day operations              decision support
       DB design              application-oriented               subject-oriented
       data                   current, up-to-date, detailed,     historical, summarized, multidimensional,
                              flat relational, isolated          integrated, consolidated
       usage                  repetitive                         ad-hoc
       access                 read/write,                        lots of scans
                              index/hash on prim. key
       unit of work           short, simple transaction          complex query
       # records accessed     tens                               millions
       # users                thousands                          hundreds
       DB size                100MB-GB                           100GB-TB
       metric                 transaction throughput             query throughput, response

       Why Separate DW?
         - High performance for both systems:
             - DBMS, tuned for OLTP: access methods, indexing, concurrency control, recovery
             - Warehouse, tuned for OLAP: complex OLAP queries, computation of large groups of data at summarized levels, multidimensional view, consolidation.
         - Processing OLAP queries in operational databases would degrade the performance of operational tasks.
         - In ODB, concurrency control and recovery mechanisms (locking, logging) are required to ensure the consistency and robustness of transactions.
         - OLAP: read only access. No need for concurrency control and recovery.

       Why Separate DW? (2)
         - Different functions and different data:
             - missing data: Decision support requires historical data which operational DBs do not typically maintain. So, data in ODB is usually far from complete for decision making.
             - data consolidation: DS requires consolidation (aggregation, summarization) of data from heterogeneous sources. ODB contain detailed raw data (transactions) which need to be consolidated before analysis.
             - data quality: different sources typically use inconsistent data representations, codes and formats which have to be reconciled.

       Conceptual Modeling of DW
         - Data Cube: see TSBD Lecture Notes on Visualization of Data Cubes
         - Modeling data warehouses: dimensions & measurements
             - Star schema: A single object (fact table) in the middle connected to a number of objects (dimension tables, one for each dimension).
             - Snowflake schema: A refinement of star schema where the dimensional hierarchy is represented explicitly by normalizing the dimension tables.
             - Fact constellations: Multiple fact tables share dimension tables. Also known as galaxy schema
  • 5. Example of Star Schema
         - Sales Fact Table: Date, Product, Store, Customer; measurements: unit_sales, dollar_sales, Yen_sales
         - Date dimension: Date, Month, Year
         - Product dimension: ProductNo, ProdName, ProdDesc, Category, QOH
         - Store dimension: StoreID, City, State, Country, Region
         - Cust dimension: CustId, CustName, CustCity, CustCountry
         - Potential redundancy: Bandung and Bogor are both in Jawa Barat (West Java)

       Snowflake Schema
         - Sales Fact Table: Date, Product, Store, Customer; measurements: unit_sales, dollar_sales, Yen_sales
         - Date hierarchy, normalized: Day (Date, Month), Month (Month, Year), Year (Year)
         - Product dimension: ProductNo, ProdName, ProdDesc, Category, QOH
         - Store hierarchy, normalized: Store (StoreID, City), City (City, State), State (State, Country), Country (Country, Region)
         - Cust dimension: CustId, CustName, CustCity, CustCountry

       View of Warehouses and Hierarchies
         - Importing data
         - Table browsing
         - Dimension creation
         - Dimension browsing
         - Cube building
         - Cube browsing

       Data Cube
         - Dimensions: Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum), Country (U.S.A., Canada, Mexico, sum)
         - Example cell: total annual sales of TV in U.S.A.
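As a rough illustration of the star schema example, the following Python sketch builds a cut-down version in an in-memory SQLite database and runs one typical aggregate query. It is a minimal sketch under stated assumptions: only three of the four dimensions are modelled (the Cust dimension is left out for brevity), the table names `date_dim`, `product_dim`, `store_dim`, and `sales_fact` are made up for this example, and all rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One fact table in the middle, one table per dimension around it.
conn.executescript("""
CREATE TABLE date_dim    (Date TEXT PRIMARY KEY, Month TEXT, Year INTEGER);
CREATE TABLE product_dim (ProductNo INTEGER PRIMARY KEY, ProdName TEXT, Category TEXT);
CREATE TABLE store_dim   (StoreID INTEGER PRIMARY KEY, City TEXT, State TEXT, Country TEXT);
CREATE TABLE sales_fact  (
    Date         TEXT    REFERENCES date_dim(Date),
    ProductNo    INTEGER REFERENCES product_dim(ProductNo),
    StoreID      INTEGER REFERENCES store_dim(StoreID),
    unit_sales   INTEGER,
    dollar_sales REAL
);
""")

# Invented sample rows. Note that State and Country repeat for Bandung and
# Bogor: this is the redundancy a snowflake schema would normalize away.
conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?)",
                 [("2007-01-05", "2007-01", 2007), ("2007-02-10", "2007-02", 2007)])
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                 [(10, "TV", "Electronics"), (11, "PC", "Electronics")])
conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?, ?)",
                 [(1, "Bandung", "Jawa Barat", "Indonesia"),
                  (2, "Bogor", "Jawa Barat", "Indonesia"),
                  (3, "Surabaya", "Jawa Timur", "Indonesia")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
                 [("2007-01-05", 10, 1, 2, 400.0),
                  ("2007-01-05", 11, 2, 1, 700.0),
                  ("2007-02-10", 10, 3, 3, 600.0)])

# A typical star join: aggregate a measurement grouped by dimension attributes.
rows = conn.execute("""
    SELECT s.State, p.Category, SUM(f.dollar_sales)
    FROM sales_fact f
    JOIN store_dim s   ON f.StoreID = s.StoreID
    JOIN product_dim p ON f.ProductNo = p.ProductNo
    GROUP BY s.State, p.Category
    ORDER BY s.State, p.Category
""").fetchall()
```

Each query touches the fact table once and joins outward to whichever dimensions it needs, which is why the star layout suits read-mostly analytical workloads.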
  • 6. Data Cube
         - Visualization
         - OLAP capabilities
         - Interactive manipulation

       Typical OLAP Operations
         - Roll up (drill-up): summarize data by climbing up hierarchy or by dimension reduction
         - Drill down (roll down): reverse of roll-up; from higher level summary to lower level summary or detailed data, or introducing new dimensions
         - Slice and dice: project and select
         - Pivot (rotate): reorient the cube, visualization, 3D to series of 2D planes.
         - Other operations
             - drill across: involving (across) more than one fact table.
             - drill through: through the bottom level of the cube to its back-end relational tables.
         - More info: www.knowledgecenters.org, www.olapreport.com, www.olapcouncil.org

       Data Mart
         - DW collects information about subjects that span the entire organization, such as customers, products, sales, assets, and personnel. Its scope is enterprise-wide.
             - For DW, fact constellation schema is commonly used since it can model multiple, interrelated subjects.
         - Data Mart is a subset of a DW that focuses on a particular subject. Its scope is department-wide. Typically, a data mart consists of a single subject area (e.g. marketing, operations).
             - For Data Mart, star or snowflake schema are commonly used since both are geared towards modeling single subjects, although the star schema is more popular.

       Data Mart (2)
         - A data mart can be either dependent or independent.
         - A dependent data mart is a subset that is created directly from the DW.
             - Consistent data model
             - Providing quality data
             - DW must be constructed first
             - Ensures that the user is viewing the same version of the data that are accessed by all other data warehouse users
         - An independent data mart is a small warehouse designed for a department, and its source is not an EDW.
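The roll-up and slice/dice operations listed under "Typical OLAP Operations" can be sketched on a toy cube in plain Python. This is only an illustration, not how an OLAP server implements them: the cube is a dictionary keyed by (quarter, product, country), the sales figures are invented, and the helper names `roll_up`, `dice`, and `slice_` are hypothetical. Roll-up is shown in its dimension-reduction form; drill-down is simply the reverse step back to the more detailed cells.

```python
from collections import defaultdict

DIMS = ("quarter", "product", "country")

# Invented sales figures keyed by (quarter, product, country).
cube = {
    ("1Qtr", "TV", "U.S.A."): 100,
    ("1Qtr", "PC", "U.S.A."): 150,
    ("2Qtr", "TV", "U.S.A."): 120,
    ("1Qtr", "TV", "Canada"): 40,
    ("2Qtr", "PC", "Mexico"): 30,
}


def roll_up(cells, keep):
    """Roll up by dimension reduction: sum over every dimension not in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)


def dice(cells, **allowed):
    """Dice: keep only cells whose coordinates fall in the allowed value sets."""
    return {
        key: value
        for key, value in cells.items()
        if all(key[DIMS.index(d)] in values for d, values in allowed.items())
    }


def slice_(cells, dim, value):
    """Slice: select on a single dimension value."""
    return dice(cells, **{dim: {value}})


annual = roll_up(cube, ["product", "country"])   # drop the quarter dimension
grand_total = roll_up(cube, [])                  # drop every dimension
canada = slice_(cube, "country", "Canada")
tv_first_quarter = dice(cube, quarter={"1Qtr"}, product={"TV"})
```

With this sample data, `annual[("TV", "U.S.A.")]` is the kind of "total annual sales of TV in U.S.A." cell that the data cube slide points at.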
  • 7. Data Warehousing Process Overview
         - The major components of a data warehousing process
             - Data sources: legacy systems, external data providers (e.g. BPS), OLTP, ERP systems
             - Data extraction
             - Data loading
             - Comprehensive database
             - Metadata
             - Middleware tools

       Data Warehousing Process Overview (2) [figure]

       Data Warehousing Architectures [figures]
  • 8. Data Warehousing Architectures [figures]

       Data Integration and the ETL Process
         - Various integration technologies:
             - Enterprise Application Integration (EAI)
                 - A technology that provides a vehicle for pushing data from source systems into a data warehouse
                 - Integrating application functionality and is focused on sharing functionality across systems
                 - Traditionally, API. Nowadays, SOA (web services).
             - Enterprise Information Integration (EII)
                 - An evolving tool space that promises real-time data integration from a variety of sources, such as relational databases, Web services, and multidimensional databases
                 - A mechanism for pulling data from source systems to satisfy a request for information.

       Data Integration and the ETL Process (2)
         - ETL
             - 60-70% of the time in a data-centric project.
             - Extraction: Reading data from one or more databases
             - Transformation: Converting the extracted data from its previous form into the form in which it needs to be so that it can be placed into a DW
             - Load: Putting the data into the DW
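The three ETL steps can be sketched end to end with the Python standard library. This is a minimal sketch under stated assumptions, not a production pipeline: the source is a small invented CSV export standing in for an operational database, the cleaning rules (trim and title-case names, upper-case currency codes, skip rows with a missing amount) are assumed for illustration, and the table name `orders_dw` is hypothetical.

```python
import csv
import io
import sqlite3

# Invented operational export with inconsistent formatting.
SOURCE_CSV = """order_id,customer,amount,currency
1, alice ,100,usd
2,BOB,250,USD
3,carol,,usd
"""


def extract(text):
    """Extraction: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: convert rows into the form the warehouse expects."""
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # assumed rule: rows with a missing amount are not loaded
        clean.append((
            int(row["order_id"]),
            row["customer"].strip().title(),
            float(row["amount"]),
            row["currency"].strip().upper(),
        ))
    return clean


def load(conn, rows):
    """Load: put the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders_dw "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders_dw VALUES (?, ?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(SOURCE_CSV)))
loaded = warehouse.execute("SELECT * FROM orders_dw ORDER BY order_id").fetchall()
```

Keeping the three steps as separate functions mirrors the slide's split and makes it easy to swap the source or the target without touching the cleaning rules.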
  • 9. Data Warehouse Development
         - Direct benefits
             - Allowing end users to perform extensive analysis in numerous ways
             - A consolidated view of corporate data (i.e. a single version of the truth)
             - Better and more timely information
             - Enhanced system performance. DW frees production processing because some operational system reporting requirements are moved to DSS
             - Simplification of data access

       Data Warehouse Development (2)
         - Some best practices for implementing a DW (Weir, 2002):
             - Project must fit with corporate strategy and business objectives
             - There must be complete buy-in to the project by executives, managers, and users
             - It is important to manage user expectations about the completed project
             - The data warehouse must be built incrementally
             - Build in adaptability
             - Managed by both IT and business professionals
             - Develop a business/supplier relationship
             - Only load data that have been cleansed and are of a quality understood by the organization
             - Do not overlook training requirements
             - Be politically aware

       Data Warehouse Vendors
         - Six guidelines to consider when developing a vendor list:
             1. Financial strength
             2. ERP linkages
             3. Qualified consultants
             4. Market share
             5. Industry experience
             6. Established partnerships

       Data Warehouse Vendors (2)
         - Computer Associates, DataMirror, Data Advantage Group, Dell Computer, Embarcadero Technologies, Business Objects, HP, Hummingbird, Hyperion, IBM, Informatica, Microsoft, Oracle, SAS, Siemens, Sybase, Teradata
         - Please visit: Data Warehousing Institute (tdwi.com), DM Review (dmreview.com)
  • 10. Real-time DW
         - Traditionally, updated on a weekly basis. Unsuitable for some businesses.
         - Real-time (active) data warehousing
             - The process of loading and providing data via a data warehouse as they become available
         - Levels of data warehouses:
             1. Reports what happened
             2. Some analysis occurs
             3. Provides prediction capabilities
             4. Operationalization
             5. Becomes capable of making events happen

       Real-time DW [figures]

       From DW to DM [JH]
         - Three kinds of data warehouse applications
             - Information processing
                 - supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphs
             - Analytical processing
                 - multidimensional analysis of data warehouse data
                 - supports basic OLAP operations, slice-dice, drilling, pivoting
             - Data mining
                 - knowledge discovery from hidden patterns
                 - supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools.
  • 11. References
         - [JH] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001.
         - [ET] Efraim Turban et al., Decision Support and Business Intelligence Systems, Pearson, 2007.
         - [DO] David Olson and Yong Shi, Introduction to Business Data Mining, McGraw-Hill, 2007.