SlideShare a Scribd company logo
1 of 22
Oracle: Data Warehouse Design
Characteristics of a Data Warehouse

  A data warehouse is a database designed for
   querying, reporting, and analysis.
  A data warehouse contains historical data
   derived from transaction data.
  Data warehouses separate analysis workload
   from transaction workload.
  A data warehouse is primarily
   an analytical tool.
Comparing OLTP and Data Warehouses
          OLTP                        Data Warehouse



   Many              Joins            Some


   Comparatively   Data accessed by   Large
   lower           queries            amount


   Normalized       Duplicated data   Denormalized
   DBMS                               DBMS

                   Derived data
   Rare            and                Common
                   aggregates
Data Warehouse Architectures

                                                                     Analysis
Operational
systems
                                  Metadata                Sales


                                                        Purchasing
                        Materialized
                                             Raw data
              Staging   views
              area                                                        Reporting
                                                          Inventory




 Flat files                                                       Data mining
Data Warehouse Design
• Key data warehouse design considerations:
  – Identify the specific data content.
  – Recognize the critical relationships within and
    between groups of data.
  – Define the system environment
    supporting your data warehouse.
  – Identify the required data
    transformations.
  – Calculate the frequency at which
    the data must be refreshed.
Logical Design
– A logical design is conceptual and
  abstract.
– Entity-relationship (ER) modeling
  is useful in identifying logical
  information requirements.
   • An entity represents a chunk of data.
   • The properties of entities are known as attributes.
   • The links between entities and attributes are known
     as relationships.
– Dimensional modeling is a specialized
  type of ER modeling useful in data warehouse
  design.
Oracle Warehouse Builder
– Oracle Database provides tools to implement
  the ETL process.
   • Oracle Warehouse Builder is a tool to help in this
     process.
– Oracle Warehouse Builder generates the
  following types of code:
   •   SQL data definition language (DDL) scripts
   •   PL/SQL programs
   •   SQL*Loader control files
   •   XML Processing Description Language (XPDL)
   •   ABAP code (used to extract data from SAP systems)
Data Warehousing Schemas
– Objects can be arranged in data warehousing
  schema models in a variety of ways:
   •   Star schema
   •   Snowflake schema
   •   Third normal form (3NF) schema
   •   Hybrid schemas
– The source data model and user
  requirements should steer the data
  warehouse schema.
– Implementation of the logical model may
  require changes to enable you to adapt it to
  your physical system.
Schema Characteristics
– Star schema
   • Characterized by one or more large fact tables and a
     number of much smaller dimension tables
   • Each dimension table joined to the fact table using a
     primary key to foreign key join
– Snowflake schema
   • Dimension data grouped into multiple tables instead
     of one large table
   • Increased number of dimension tables, requiring
     more foreign key joins
– Third normal form (3NF) schema
   • A classical relational-database model that minimizes
     data redundancy through normalization
Data Warehousing Objects
– Fact tables
   • Fact tables are the large tables that store business
     measurements.
– Dimension tables
   • A dimension is a structure composed of one or more
     hierarchies that categorizes data.
   • Unique identifiers are specified for one distinct
     record in a dimension table.
– Relationships
   • Relationships guarantee
     integrity of business
     information.
Fact Tables
– A fact table must be defined for each star schema.
– Fact tables are the large tables that store business
  measurements.
– A fact table contains either detail-level or
  aggregated facts.
– A fact table usually contains facts with the same
  level of aggregation.
– The primary key of the fact table is
  usually a composite key made up
  of all its foreign keys.
Dimensions and Hierarchies
                                   CUSTOMERS dimension
– A dimension is a structure       hierarchy (by level)
  composed of one or more
  hierarchies that categorizes data. REGION
– Dimensional attributes help to
  describe the dimensional value.         SUBREGION

– Dimension data is collected at the
  lowest level of detail and aggregatedCOUNTRY
  into higher level totals.
– Hierarchies are structures that use STATE
  ordered levels to organize data.
– In a hierarchy, each level is           CITY

  connected to the levels above and
  below it.                               CUSTOMER
Dimensions and Hierarchies

 PRODUCTS                             CUSTOMERS
 #prod_id         Unique identifier   #cust_id
                    Fact table        cust_last_name
                                       cust_city
                                       cust_state_province
   Relationship    SALES
                   cust_id
                   prod_id                     Hierarchy




 TIMES                                  CHANNELS
                  PROMOTIONS

Dimension table                        Dimension table
                  Dimension table
Physical Design
  Logical             Physical (Tablespaces)


Entities           Tables                      Indexes



                                               Materialized
Relationships          Integrity
                                               views
                       constraints
                     - Primary key
                     - Foreign key
Attributes           - Not null                Dimensions



Unique
identifiers        Columns
Data Warehouse Physical Structures

• Tables and partitioned tables
  – Partitioned tables enable you to split
    large data volumes into smaller,
    more manageable pieces.
  – Expect performance benefits from:
     • Partition pruning
     • Intelligent parallel processing
  – Compressed tables offer scaleup opportunities
    for read-only operations.
  – Table compression saves disk space.
Data Warehouse Physical Structures

  – Views:
     • Are tailored presentations of data contained in one
       or more tables or views
     • Do not require any space in the database
  – Materialized views:
     • Are query results that have been stored in advance
     • (Like indexes) are used transparently and improve
       performance
  – Integrity constraints:
     • Are used in data warehouses for query rewrite
  – Dimensions:
     • Are containers of logical relationships and do not
       require any space in the database
Managing Large Volumes of Data
• Work smarter in your data warehouse:
  –   Partitioning
  –   Bitmap indexes/Star transformation
  –   Data compression
  –   Query rewrite
• Work harder in your data warehouse:
  – Parallelism for all operations
       • DBA tasks, such as loading, index creation, table
         creation, data modification, backup and recovery
       • End-user operations, such as queries
       • Unbounded scalability: Real Application Clusters
I/O Performance in Data Warehouses

  – I/O is typically the primary determinant of data
    warehouse performance.
  – Data warehouse storage configurations should be
    chosen by I/O bandwidth, not storage capacity.
  – Every component of the I/O
    subsystem should provide
    enough bandwidth:
     • Disks
     • I/O channels
     • I/O adapters
  – In data warehouses, maximizing
    sequential I/O throughput is critical.
I/O Scalability
Parallel execution:
     – Reduces response time for data-intensive operations on large
       databases
     – Benefits systems with the following characteristics:
          • Multiprocessors, clusters, or massively parallel systems
          • Sufficient I/O bandwidth
          • Sufficient memory to support memory-intensive processes such
            as sorts, hashing, and I/O buffers

                                Query servers
                                                               Coordinator
 Data on disk          Scan                     Sort Q1

                       Scan                     Sort Q2
                                                               Dispatch
                                                               work
                       Scan                     Sort Q3

                       Scan                     Sort Q4
                     Scanners          Sorters (Aggregators)
I/O Scalability

• Automatic Storage Management (ASM)
  – Configuring storage for a DB depends on many
    variables:
     •   Which data to put on which disk
     •   Logical unit number (LUN) configurations
     •   DB types and workloads; data warehouse, OLTP, DSS
     •   Trade-offs between available options
  – ASM provides solutions to storage issues
    encountered in data warehouses.
I/O Scalability

• Automatic Storage Management: Overview
  – Portable and high-performance
    cluster file system                Application
  – Manages Oracle database files
  – Data spread across disks                Database
    to balance load                  File
  – Integrated mirroring across      system
                                                     ASM
    disks                            Volume
                                     manager
  – Solves many storage
    management challenges           Operating system
Visit more self help tutorials

• Pick a tutorial of your choice and browse
  through it at your own pace.
• The tutorials section is free, self-guiding and
  will not involve any additional support.
• Visit us at www.dataminingtools.net

More Related Content

What's hot

Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...
Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...
Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...Simplilearn
 
Oracle architecture ppt
Oracle architecture pptOracle architecture ppt
Oracle architecture pptDeepak Shetty
 
美团数据平台之Kafka应用实践和优化
美团数据平台之Kafka应用实践和优化美团数据平台之Kafka应用实践和优化
美团数据平台之Kafka应用实践和优化confluent
 
Real-Time Streaming: Intro to Amazon Kinesis
Real-Time Streaming: Intro to Amazon KinesisReal-Time Streaming: Intro to Amazon Kinesis
Real-Time Streaming: Intro to Amazon KinesisAmazon Web Services
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingDataWorks Summit
 
Application Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and FutureApplication Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and FutureVARUN SAXENA
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & DeltaDatabricks
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on AzureTrivadis
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudOntotext
 
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry confluent
 
PySpark Best Practices
PySpark Best PracticesPySpark Best Practices
PySpark Best PracticesCloudera, Inc.
 
Building Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueBuilding Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueAmazon Web Services
 
Predicting Flights with Azure Databricks
Predicting Flights with Azure DatabricksPredicting Flights with Azure Databricks
Predicting Flights with Azure DatabricksSarah Dutkiewicz
 
Azure Data Factory
Azure Data FactoryAzure Data Factory
Azure Data FactoryHARIHARAN R
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Amazon Web Services
 
Learn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideLearn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideWhizlabs
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureJames Serra
 

What's hot (20)

Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...
Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...
Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...
 
Oracle architecture ppt
Oracle architecture pptOracle architecture ppt
Oracle architecture ppt
 
美团数据平台之Kafka应用实践和优化
美团数据平台之Kafka应用实践和优化美团数据平台之Kafka应用实践和优化
美团数据平台之Kafka应用实践和优化
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
Real-Time Streaming: Intro to Amazon Kinesis
Real-Time Streaming: Intro to Amazon KinesisReal-Time Streaming: Intro to Amazon Kinesis
Real-Time Streaming: Intro to Amazon Kinesis
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data Processing
 
Application Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and FutureApplication Timeline Server - Past, Present and Future
Application Timeline Server - Past, Present and Future
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on Azure
 
Protecting Your Data in AWS
Protecting Your Data in AWS Protecting Your Data in AWS
Protecting Your Data in AWS
 
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the CloudFirst Steps in Semantic Data Modelling and Search & Analytics in the Cloud
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
 
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
 
PySpark Best Practices
PySpark Best PracticesPySpark Best Practices
PySpark Best Practices
 
Building Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueBuilding Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS Glue
 
Predicting Flights with Azure Databricks
Predicting Flights with Azure DatabricksPredicting Flights with Azure Databricks
Predicting Flights with Azure Databricks
 
Azure Data Factory
Azure Data FactoryAzure Data Factory
Azure Data Factory
 
Data streaming fundamentals
Data streaming fundamentalsData streaming fundamentals
Data streaming fundamentals
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
 
Learn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideLearn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive Guide
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 

Viewers also liked

Oracle 11g data warehouse introdution
Oracle 11g data warehouse introdutionOracle 11g data warehouse introdution
Oracle 11g data warehouse introdutionAditya Trivedi
 
business analysis-Data warehousing
business analysis-Data warehousingbusiness analysis-Data warehousing
business analysis-Data warehousingDhilsath Fathima
 
multiparty access control
multiparty access controlmultiparty access control
multiparty access controlLevin Sibi
 
Multiparty Access Control For Online Social Networks : Model and Mechanisms.
Multiparty Access Control For Online Social Networks : Model and Mechanisms.Multiparty Access Control For Online Social Networks : Model and Mechanisms.
Multiparty Access Control For Online Social Networks : Model and Mechanisms.Kiran K.V.S.
 
Data warehousing labs maunal
Data warehousing labs maunalData warehousing labs maunal
Data warehousing labs maunalEducation
 
Agile Data Warehouse Design for Big Data Presentation
Agile Data Warehouse Design for Big Data PresentationAgile Data Warehouse Design for Big Data Presentation
Agile Data Warehouse Design for Big Data PresentationVishal Kumar
 
Oratoria E RetóRica Latinas
Oratoria E RetóRica LatinasOratoria E RetóRica Latinas
Oratoria E RetóRica Latinaslara
 
Powerpoint paragraaf 5.3/5.4
Powerpoint paragraaf 5.3/5.4 Powerpoint paragraaf 5.3/5.4
Powerpoint paragraaf 5.3/5.4 guestaa9e6a
 
Bind How To
Bind How ToBind How To
Bind How Tocntlinux
 
MS SQL SERVER: Microsoft sequence clustering and association rules
MS SQL SERVER: Microsoft sequence clustering and association rulesMS SQL SERVER: Microsoft sequence clustering and association rules
MS SQL SERVER: Microsoft sequence clustering and association rulesDataminingTools Inc
 
MS SQL SERVER: Programming sql server data mining
MS SQL SERVER: Programming sql server data miningMS SQL SERVER: Programming sql server data mining
MS SQL SERVER: Programming sql server data miningDataminingTools Inc
 

Viewers also liked (20)

Oracle 11g data warehouse introdution
Oracle 11g data warehouse introdutionOracle 11g data warehouse introdution
Oracle 11g data warehouse introdution
 
Module Owb Basics
Module Owb BasicsModule Owb Basics
Module Owb Basics
 
Module Owb Process Flows
Module Owb Process FlowsModule Owb Process Flows
Module Owb Process Flows
 
Module Owb Lifecycle
Module Owb LifecycleModule Owb Lifecycle
Module Owb Lifecycle
 
business analysis-Data warehousing
business analysis-Data warehousingbusiness analysis-Data warehousing
business analysis-Data warehousing
 
multiparty access control
multiparty access controlmultiparty access control
multiparty access control
 
Multiparty Access Control For Online Social Networks : Model and Mechanisms.
Multiparty Access Control For Online Social Networks : Model and Mechanisms.Multiparty Access Control For Online Social Networks : Model and Mechanisms.
Multiparty Access Control For Online Social Networks : Model and Mechanisms.
 
Data warehousing labs maunal
Data warehousing labs maunalData warehousing labs maunal
Data warehousing labs maunal
 
Agile Data Warehouse Design for Big Data Presentation
Agile Data Warehouse Design for Big Data PresentationAgile Data Warehouse Design for Big Data Presentation
Agile Data Warehouse Design for Big Data Presentation
 
LISP:Object System Lisp
LISP:Object System LispLISP:Object System Lisp
LISP:Object System Lisp
 
LISP: Scope and extent in lisp
LISP: Scope and extent in lispLISP: Scope and extent in lisp
LISP: Scope and extent in lisp
 
Oratoria E RetóRica Latinas
Oratoria E RetóRica LatinasOratoria E RetóRica Latinas
Oratoria E RetóRica Latinas
 
How To Make Pb J
How To Make Pb JHow To Make Pb J
How To Make Pb J
 
SPSS: File Managment
SPSS: File ManagmentSPSS: File Managment
SPSS: File Managment
 
Powerpoint paragraaf 5.3/5.4
Powerpoint paragraaf 5.3/5.4 Powerpoint paragraaf 5.3/5.4
Powerpoint paragraaf 5.3/5.4
 
Bind How To
Bind How ToBind How To
Bind How To
 
BI: Open Source
BI: Open SourceBI: Open Source
BI: Open Source
 
MS SQL SERVER: Microsoft sequence clustering and association rules
MS SQL SERVER: Microsoft sequence clustering and association rulesMS SQL SERVER: Microsoft sequence clustering and association rules
MS SQL SERVER: Microsoft sequence clustering and association rules
 
Data Applied:Forecast
Data Applied:ForecastData Applied:Forecast
Data Applied:Forecast
 
MS SQL SERVER: Programming sql server data mining
MS SQL SERVER: Programming sql server data miningMS SQL SERVER: Programming sql server data mining
MS SQL SERVER: Programming sql server data mining
 

Similar to Oracle: DW Design

Relational
RelationalRelational
Relationaldieover
 
Oracle: Fundamental Of Dw
Oracle: Fundamental Of DwOracle: Fundamental Of Dw
Oracle: Fundamental Of Dworacle content
 
Management information system database management
Management information system database managementManagement information system database management
Management information system database managementOnline
 
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Michael Rys
 
Data Science Machine Lerning Bigdat.pptx
Data Science Machine Lerning Bigdat.pptxData Science Machine Lerning Bigdat.pptx
Data Science Machine Lerning Bigdat.pptxPriyadarshini648418
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Martin Bém
 
Data warehouse introduction
Data warehouse introductionData warehouse introduction
Data warehouse introductionMurli Jha
 
Business Intelligence Data Analytics June 28 2012 Icpas V4 Final 20120625 8am
Business Intelligence  Data Analytics June 28 2012 Icpas V4  Final 20120625 8amBusiness Intelligence  Data Analytics June 28 2012 Icpas V4  Final 20120625 8am
Business Intelligence Data Analytics June 28 2012 Icpas V4 Final 20120625 8amBarrett Peterson
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
Datawarehousing & DSS
Datawarehousing & DSSDatawarehousing & DSS
Datawarehousing & DSSDeepali Raut
 
Processing Big Data
Processing Big DataProcessing Big Data
Processing Big Datacwensel
 
Oracle Database 12c - Features for Big Data
Oracle Database 12c - Features for Big DataOracle Database 12c - Features for Big Data
Oracle Database 12c - Features for Big DataAbishek V S
 

Similar to Oracle: DW Design (20)

Relational
RelationalRelational
Relational
 
Oracle: Fundamental Of DW
Oracle: Fundamental Of DWOracle: Fundamental Of DW
Oracle: Fundamental Of DW
 
Oracle: Fundamental Of Dw
Oracle: Fundamental Of DwOracle: Fundamental Of Dw
Oracle: Fundamental Of Dw
 
DBMS
DBMSDBMS
DBMS
 
Management information system database management
Management information system database managementManagement information system database management
Management information system database management
 
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
 
data warehousing
data warehousingdata warehousing
data warehousing
 
Lecture3.ppt
Lecture3.pptLecture3.ppt
Lecture3.ppt
 
Data Science Machine Lerning Bigdat.pptx
Data Science Machine Lerning Bigdat.pptxData Science Machine Lerning Bigdat.pptx
Data Science Machine Lerning Bigdat.pptx
 
The BI Sandbox
The BI SandboxThe BI Sandbox
The BI Sandbox
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
Data warehouse introduction
Data warehouse introductionData warehouse introduction
Data warehouse introduction
 
Business Intelligence Data Analytics June 28 2012 Icpas V4 Final 20120625 8am
Business Intelligence  Data Analytics June 28 2012 Icpas V4  Final 20120625 8amBusiness Intelligence  Data Analytics June 28 2012 Icpas V4  Final 20120625 8am
Business Intelligence Data Analytics June 28 2012 Icpas V4 Final 20120625 8am
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Datawarehousing & DSS
Datawarehousing & DSSDatawarehousing & DSS
Datawarehousing & DSS
 
Processing Big Data
Processing Big DataProcessing Big Data
Processing Big Data
 
(Dbms) class 1 & 2 (Presentation)
(Dbms) class 1 & 2 (Presentation)(Dbms) class 1 & 2 (Presentation)
(Dbms) class 1 & 2 (Presentation)
 
Oracle Database 12c - Features for Big Data
Oracle Database 12c - Features for Big DataOracle Database 12c - Features for Big Data
Oracle Database 12c - Features for Big Data
 
Computing 7
Computing 7Computing 7
Computing 7
 
Unit 3 part i Data mining
Unit 3 part i Data miningUnit 3 part i Data mining
Unit 3 part i Data mining
 

More from DataminingTools Inc

AI: Introduction to artificial intelligence
AI: Introduction to artificial intelligenceAI: Introduction to artificial intelligence
AI: Introduction to artificial intelligenceDataminingTools Inc
 
Data Mining: Text and web mining
Data Mining: Text and web miningData Mining: Text and web mining
Data Mining: Text and web miningDataminingTools Inc
 
Data Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence dataData Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence dataDataminingTools Inc
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsDataminingTools Inc
 
Data Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisData Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisDataminingTools Inc
 
Data warehouse and olap technology
Data warehouse and olap technologyData warehouse and olap technology
Data warehouse and olap technologyDataminingTools Inc
 

More from DataminingTools Inc (20)

Terminology Machine Learning
Terminology Machine LearningTerminology Machine Learning
Terminology Machine Learning
 
Techniques Machine Learning
Techniques Machine LearningTechniques Machine Learning
Techniques Machine Learning
 
Machine learning Introduction
Machine learning IntroductionMachine learning Introduction
Machine learning Introduction
 
Areas of machine leanring
Areas of machine leanringAreas of machine leanring
Areas of machine leanring
 
AI: Planning and AI
AI: Planning and AIAI: Planning and AI
AI: Planning and AI
 
AI: Logic in AI 2
AI: Logic in AI 2AI: Logic in AI 2
AI: Logic in AI 2
 
AI: Logic in AI
AI: Logic in AIAI: Logic in AI
AI: Logic in AI
 
AI: Learning in AI 2
AI: Learning in AI 2AI: Learning in AI 2
AI: Learning in AI 2
 
AI: Learning in AI
AI: Learning in AI AI: Learning in AI
AI: Learning in AI
 
AI: Introduction to artificial intelligence
AI: Introduction to artificial intelligenceAI: Introduction to artificial intelligence
AI: Introduction to artificial intelligence
 
AI: Belief Networks
AI: Belief NetworksAI: Belief Networks
AI: Belief Networks
 
AI: AI & Searching
AI: AI & SearchingAI: AI & Searching
AI: AI & Searching
 
AI: AI & Problem Solving
AI: AI & Problem SolvingAI: AI & Problem Solving
AI: AI & Problem Solving
 
Data Mining: Text and web mining
Data Mining: Text and web miningData Mining: Text and web mining
Data Mining: Text and web mining
 
Data Mining: Outlier analysis
Data Mining: Outlier analysisData Mining: Outlier analysis
Data Mining: Outlier analysis
 
Data Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence dataData Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence data
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlations
 
Data Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisData Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysis
 
Data warehouse and olap technology
Data warehouse and olap technologyData warehouse and olap technology
Data warehouse and olap technology
 
Data Mining: Data processing
Data Mining: Data processingData Mining: Data processing
Data Mining: Data processing
 

Recently uploaded

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 

Recently uploaded (20)

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

Oracle: DW Design

  • 2. Characteristics of a Data Warehouse A data warehouse is a database designed for querying, reporting, and analysis. A data warehouse contains historical data derived from transaction data. Data warehouses separate analysis workload from transaction workload. A data warehouse is primarily an analytical tool.
  • 3. Comparing OLTP and Data Warehouses OLTP Data Warehouse Many Joins Some Comparatively Data accessed by Large lower queries amount Normalized Duplicated data Denormalized DBMS DBMS Derived data Rare and Common aggregates
  • 4. Data Warehouse Architectures Analysis Operational systems Metadata Sales Purchasing Materialized Raw data Staging views area Reporting Inventory Flat files Data mining
  • 5. Data Warehouse Design • Key data warehouse design considerations: – Identify the specific data content. – Recognize the critical relationships within and between groups of data. – Define the system environment supporting your data warehouse. – Identify the required data transformations. – Calculate the frequency at which the data must be refreshed.
  • 6. Logical Design – A logical design is conceptual and abstract. – Entity-relationship (ER) modeling is useful in identifying logical information requirements. • An entity represents a chunk of data. • The properties of entities are known as attributes. • The links between entities and attributes are known as relationships. – Dimensional modeling is a specialized type of ER modeling useful in data warehouse design.
  • 7. Oracle Warehouse Builder – Oracle Database provides tools to implement the ETL process. • Oracle Warehouse Builder is a tool to help in this process. – Oracle Warehouse Builder generates the following types of code: • SQL data definition language (DDL) scripts • PL/SQL programs • SQL*Loader control files • XML Processing Description Language (XPDL) • ABAP code (used to extract data from SAP systems)
  • 8. Data Warehousing Schemas – Objects can be arranged in data warehousing schema models in a variety of ways: • Star schema • Snowflake schema • Third normal form (3NF) schema • Hybrid schemas – The source data model and user requirements should steer the data warehouse schema. – Implementation of the logical model may require changes to enable you to adapt it to your physical system.
  • 9. Schema Characteristics – Star schema • Characterized by one or more large fact tables and a number of much smaller dimension tables • Each dimension table joined to the fact table using a primary key to foreign key join – Snowflake schema • Dimension data grouped into multiple tables instead of one large table • Increased number of dimension tables, requiring more foreign key joins – Third normal form (3NF) schema • A classical relational-database model that minimizes data redundancy through normalization
  • 10. Data Warehousing Objects – Fact tables • Fact tables are the large tables that store business measurements. – Dimension tables • A dimension is a structure composed of one or more hierarchies that categorizes data. • Unique identifiers are specified for one distinct record in a dimension table. – Relationships • Relationships guarantee integrity of business information.
  • 11. Fact Tables – A fact table must be defined for each star schema. – Fact tables are the large tables that store business measurements. – A fact table contains either detail-level or aggregated facts. – A fact table usually contains facts with the same level of aggregation. – The primary key of the fact table is usually a composite key made up of all its foreign keys.
  • 12. Dimensions and Hierarchies CUSTOMERS dimension – A dimension is a structure hierarchy (by level) composed of one or more hierarchies that categorizes data. REGION – Dimensional attributes help to describe the dimensional value. SUBREGION – Dimension data is collected at the lowest level of detail and aggregatedCOUNTRY into higher level totals. – Hierarchies are structures that use STATE ordered levels to organize data. – In a hierarchy, each level is CITY connected to the levels above and below it. CUSTOMER
  • 13. Dimensions and Hierarchies PRODUCTS CUSTOMERS #prod_id Unique identifier #cust_id Fact table cust_last_name cust_city cust_state_province Relationship SALES cust_id prod_id Hierarchy TIMES CHANNELS PROMOTIONS Dimension table Dimension table Dimension table
  • 14. Physical Design Logical Physical (Tablespaces) Entities Tables Indexes Materialized Relationships Integrity views constraints - Primary key - Foreign key Attributes - Not null Dimensions Unique identifiers Columns
  • 15. Data Warehouse Physical Structures • Tables and partitioned tables – Partitioned tables enable you to split large data volumes into smaller, more manageable pieces. – Expect performance benefits from: • Partition pruning • Intelligent parallel processing – Compressed tables offer scaleup opportunities for read-only operations. – Table compression saves disk space.
  • 16. Data Warehouse Physical Structures – Views: • Are tailored presentations of data contained in one or more tables or views • Do not require any space in the database – Materialized views: • Are query results that have been stored in advance • (Like indexes) are used transparently and improve performance – Integrity constraints: • Are used in data warehouses for query rewrite – Dimensions: • Are containers of logical relationships and do not require any space in the database
  • 17. Managing Large Volumes of Data • Work smarter in your data warehouse: – Partitioning – Bitmap indexes/Star transformation – Data compression – Query rewrite • Work harder in your data warehouse: – Parallelism for all operations • DBA tasks, such as loading, index creation, table creation, data modification, backup and recovery • End-user operations, such as queries • Unbounded scalability: Real Application Clusters
  • 18. I/O Performance in Data Warehouses – I/O is typically the primary determinant of data warehouse performance. – Data warehouse storage configurations should be chosen by I/O bandwidth, not storage capacity. – Every component of the I/O subsystem should provide enough bandwidth: • Disks • I/O channels • I/O adapters – In data warehouses, maximizing sequential I/O throughput is critical.
  • 19. I/O Scalability Parallel execution: – Reduces response time for data-intensive operations on large databases – Benefits systems with the following characteristics: • Multiprocessors, clusters, or massively parallel systems • Sufficient I/O bandwidth • Sufficient memory to support memory-intensive processes such as sorts, hashing, and I/O buffers Query servers Coordinator Data on disk Scan Sort Q1 Scan Sort Q2 Dispatch work Scan Sort Q3 Scan Sort Q4 Scanners Sorters (Aggregators)
  • 20. I/O Scalability • Automatic Storage Management (ASM) – Configuring storage for a DB depends on many variables: • Which data to put on which disk • Logical unit number (LUN) configurations • DB types and workloads; data warehouse, OLTP, DSS • Trade-offs between available options – ASM provides solutions to storage issues encountered in data warehouses.
  • 21. I/O Scalability • Automatic Storage Management: Overview – Portable and high-performance cluster file system Application – Manages Oracle database files – Data spread across disks Database to balance load File – Integrated mirroring across system ASM disks Volume manager – Solves many storage management challenges Operating system
  • 22. Visit more self help tutorials • Pick a tutorial of your choice and browse through it at your own pace. • The tutorials section is free, self-guiding and will not involve any additional support. • Visit us at www.dataminingtools.net