20IT501 – Data Warehousing and
Data Mining
III Year / VI Semester
OBJECTIVES
Be familiar with the concepts of data warehouse
Know the fundamentals of data mining
Understand the importance of association rule mining
Understand the techniques of classification and
clustering
Be aware of the recent trends of data mining
UNIT I- DATA WAREHOUSING
Data Warehouse – Data warehousing Components – Building a
Data warehouse – Mapping the Data Warehouse to a
Multiprocessor Architecture – DBMS Schemas for Decision
Support. Business Analysis: Reporting and Query tools and
Applications - Online Analytical Processing (OLAP) – Need –
Multidimensional Data Model – OLAP Guidelines -
Categories of OLAP Tools.
Data Warehouse
A data warehouse is a subject-oriented, integrated,
time-variant and non-volatile collection of data
in support of management's decision-making
process.
Data Warehouse
Subject Oriented:
 It provides information organized around a subject rather than around the
organization's ongoing operations.
 These subjects can be product, customers, suppliers, sales,
revenue, etc.
 A data warehouse does not focus on ongoing operations;
rather, it focuses on the modeling and analysis of
data for decision making.
Data Warehouse
Integrated:
 A data warehouse is constructed by integrating
data from heterogeneous sources such as relational
databases, flat files, etc.
 This integration enhances the effective analysis of
data.
Data Warehouse
Time Variant:
 The data collected in a data warehouse is
identified with a particular time period.
 The data in a data warehouse provides information
from the historical point of view.
Data Warehouse
Non-volatile
Non-volatile means the previous data is not erased
when new data is added.
 A data warehouse is kept separate from the operational
database, and therefore frequent changes in the operational
database are not reflected in the data warehouse.
Data Warehouse - Terminologies
 Current detail data: It is organized along
subject lines (customer profile data, sales data).
 Data Mart: Summarized departmental data,
customized to suit the needs of the
particular department that owns the data.
Data Warehouse - Terminologies
 Drill-down: To perform business analysis in a top-
down fashion.
 Metadata: Data about data. It contains the location and
description of warehouse system components; the names,
definitions, structure and content of the data warehouse;
and end-user views.
Data Warehouse
Operational data vs. informational data:
 Content: operational data holds current values; informational data is summarized and derived.
 Organization: operational data is organized by application; informational data by subject.
 Stability: operational data is dynamic; informational data is static until refreshed.
 Data structure: operational data is optimized for transactions; informational data is optimized for complex queries.
 Access: operational data is accessed frequently (high); informational data medium to low.
 Response time: operational data sub-second (<1 s) to 2-3 s; informational data several seconds to minutes.
Data Warehouse Components
Data Warehouse Components
1. Data Warehouse Database:
 It is based on a relational database management system server
that functions as the central repository for informational data.
 This repository is surrounded by a number of key components
designed to make the whole environment functional, manageable and
accessible both by the operational systems that source data into
the warehouse and by end-user query and analysis tools.
Data Warehouse Components
2. Sourcing, Acquisition, Clean up, and
Transformation Tools:
 They perform conversions, summarization, key
changes, structural changes and condensation.
 The data transformation is required so that the
information can be used by decision support tools.
Data Warehouse Components
 Sourcing, Acquisition, Clean up, and
Transformation Tools – Functionalities:
 Removing unwanted data from operational databases
 Converting to common data names and attributes
 Calculating summaries and derived data
 Establishing defaults for missing data
 Accommodating source-data definition changes
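A minimal Python sketch of how such a tool might apply the steps listed above to a batch of operational records. The field names, rename map, and default values are assumptions for illustration only, not from any specific product.

```python
# Illustrative clean-up/transformation sketch; field names, the rename map,
# and the defaults are assumptions for this example only.
RENAME = {"cust_nm": "customer_name", "amt": "amount"}   # common data names
DEFAULTS = {"region": "UNKNOWN"}                          # defaults for missing data
UNWANTED = {"temp_flag"}                                  # unwanted operational fields

def transform(record: dict) -> dict:
    """Drop unwanted fields, rename to common names, fill missing values."""
    cleaned = {RENAME.get(k, k): v for k, v in record.items() if k not in UNWANTED}
    for field, default in DEFAULTS.items():
        cleaned.setdefault(field, default)
    return cleaned

def summarize(records: list[dict]) -> dict:
    """Calculate a simple derived summary (total amount per region)."""
    totals: dict = {}
    for r in map(transform, records):
        totals[r["region"]] = totals.get(r["region"], 0) + r["amount"]
    return totals

source = [
    {"cust_nm": "A", "amt": 100, "region": "South", "temp_flag": 1},
    {"cust_nm": "B", "amt": 250},  # missing region -> default applied
]
print(summarize(source))  # {'South': 100, 'UNKNOWN': 250}
```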
Data Warehouse Components
 Sourcing, Acquisition, Clean up, and
Transformation Tools – Issues:
 Data heterogeneity: It refers to the different ways
the data is defined and used in different modules.
Data Warehouse Components
Meta data:
 It is data about data. It is used for maintaining,
managing and using the data warehouse.
 Types:
 Technical Meta data
 Business Meta data
Data Warehouse Components
 Meta data - Technical Meta data:
Information about data stores
Transformation descriptions
The rules used to perform clean up, and data enhancement
Data mapping operations
Access authorization, backup history, archive history, information
delivery history, data acquisition history, data access, etc.
Data Warehouse Components
Meta data – Business Meta data:
 It describes the information stored in the data warehouse
in terms that users understand.
 Subject areas and information object types, including queries,
reports, images, video and audio clips, etc.
 Internet home pages
 Information related to the information delivery system
Data Warehouse Components
 Meta data:
 Metadata helps users to understand the content and find
the data.
Metadata is stored in a separate data store known as the
information directory or metadata repository.
It helps to integrate, maintain and view the contents of the
data warehouse.
Data Warehouse Components
 Meta data – Characteristics:
 It is the gateway to the data warehouse environment
 It supports easy distribution and replication of content for high
performance and availability
 It should be searchable by business-oriented keywords
 It should support the sharing of information
 It should support and provide an interface to other applications
Data Warehouse Components
Access Tools:
Data query and reporting tools
Application development tools
Executive info system tools (EIS)
OLAP tools
Data mining tools
Data Warehouse Components
Data Marts:
 Departmental subsets that focus on selected subjects.
They are often independent and used by a dedicated user group.
They are used for rapid delivery of enhanced decision-support
functionality to end users.
Data Warehouse Components
 Data warehouse admin and management:
 Security and priority management
 Monitoring updates from multiple sources
 Data quality checks
 Managing and updating meta data
 Auditing and reporting data warehouse usage and status
 Replicating, subsetting and distributing data
 Backup and recovery
Data Warehouse Components
Information delivery system
It enables users to subscribe to data warehouse
information and have it delivered to one or more destinations
according to a specified scheduling algorithm.
Building a Data Warehouse -
Reason
There are two reasons why organizations consider data
warehousing a critical need:
1. Business factors:
 Decisions need to be made quickly and correctly, using all
available data.
 Users are business domain experts, not computer
professionals.
Building a Data Warehouse -
Reason
2. Technological factors:
 The price of computer processing power continues to
decline.
IT infrastructure is changing rapidly: its capacity is
increasing and its cost is decreasing, which makes building a
data warehouse easier.
Network bandwidth is increasing.
Building a Data Warehouse
1. Business considerations:
Approaches
 Top - Down Approach
 Bottom - Up Approach
Building a Data Warehouse
 Approaches: Top - Down Approach
 The aim is to build a centralized repository to house corporate-
wide business data.
 This repository is called Enterprise Data Warehouse
(EDW).
 The data in the EDW is stored in a normalized form in
order to avoid redundancy.
Building a Data Warehouse
 Approaches: Top - Down Approach
 The data in the EDW is stored at the most detailed level.
Building the EDW at the most detailed level provides
Flexibility to be used by multiple departments.
Flexibility to cater for future requirements.
Building a Data Warehouse
 Approaches: Top - Down Approach
 We should implement the top-down approach when
The business has complete clarity on the data warehouse
requirements for all or most subject areas.
The business is ready to invest considerable time and money.
Building a Data Warehouse
 Approaches: Top - Down Approach –
Advantages:
 The data is reliable and consistent across subject areas,
which is very important.
Building a Data Warehouse
 Approaches: Top - Down Approach –
Disadvantages:
 It requires more time and a larger initial investment.
The business has to wait for the EDW and then the data
marts to be built before it can access its reports.
Building a Data Warehouse
 Approaches: Bottom Up Approach
 It is an incremental approach to build a data
warehouse.
 Here we build the data marts separately at
different points of time as and when the specific
subject area requirements are clear.
Building a Data Warehouse
 Approaches: Bottom Up Approach
 The data marts are integrated or combined
together to form a data warehouse.
 Separate data marts are combined through the use
of conformed dimensions and conformed facts.
Building a Data Warehouse
 Approaches: Bottom Up Approach
 We should implement the bottom up approach
when
 We have initial cost and time constraints.
The complete warehouse requirements are not clear; we
have clarity on only one data mart.
Building a Data Warehouse
 Approaches: Bottom Up Approach -
Advantages
 Data marts do not require high initial costs and have a
faster implementation time.
 Hence the business can start using the marts much
earlier than with the top-down approach.
Building a Data Warehouse
 Approaches: Bottom Up Approach -
Disadvantages
 It stores data in a denormalized format; hence
there is high space usage for detailed data.
 There is a tendency not to keep detailed data.
Building a Data Warehouse
2. Design considerations:
Most successful data warehouses that meet these requirements
have these common characteristics:
 Are based on a dimensional model
 Contain historical and current data
 Include both detailed and summarized data
 Consolidate disparate data from multiple sources while retaining
consistency
Building a Data Warehouse
 A data warehouse is difficult to build for the
following reasons:
 Heterogeneity of data sources
 Use of historical data
 Growing nature of the database
Building a Data Warehouse
In addition to general considerations, the following specific points are relevant to
data warehouse design:
 Data content
 The content and structure of the data warehouse are reflected in its data
model.
 The data model is the template that describes how information will be
organized within the integrated warehouse framework.
 The data warehouse data must be detailed data. It must be formatted,
cleaned up and transformed to fit the warehouse data model.
Building a Data Warehouse
Meta data
It defines the location and contents of data in the
warehouse.
Meta data is searchable by users to find definitions
or subject areas.
Building a Data Warehouse
Data Distribution
 The data can be distributed based on the subject
area, location (geographical region), or time
(current, month, year).
Building a Data Warehouse
 Tools
 These tools provide facilities for defining the
transformation and cleanup rules, data movement,
end user query, reporting and data analysis.
Building a Data Warehouse
 Design steps
 Choosing the subject matter
 Deciding what a fact table represents
 Identifying and conforming the dimensions
 Choosing the facts
 Storing pre-calculations in the fact table
 Rounding out the dimension tables
 Choosing the duration of the database
 The need to track slowly changing dimensions
 Deciding the query priorities and query models
Building a Data Warehouse
3. Implementation considerations:
The following logical steps are needed to implement a data
warehouse:
 Collect and analyze business requirements
 Create a data model and a physical design
 Define data sources
 Choose the database Technology and platform
 Extract the data from operational database, transform it, clean it
up and load it into the warehouse
Building a Data Warehouse
 Implementation considerations
Choose database access and reporting tools
Choose database connectivity software
Choose data analysis and presentation software
Update the data warehouse
Building a Data Warehouse
 Access Tools
Simple tabular form data
Ranking data
Multivariable data
Time series data
Graphing, charting and pivoting data
Building a Data Warehouse
 Access Tools
Statistical analysis data
Predefined repeatable queries
Ad hoc user specified queries
Reporting and analysis data
Building a Data Warehouse
 Data extraction, clean up, transformation and
migration – Selection Criteria:
 Timeliness of data delivery to the warehouse
 The tool must be able to identify the particular
data so that it can be read by the conversion tool.
 The tool must have the capability to merge data from
multiple data stores.
Building a Data Warehouse
 User levels - Casual users:
They are most comfortable retrieving information
from the warehouse in predefined formats and running pre-
existing queries and reports.
These users do not need tools that allow for building
standard and ad hoc reports.
Building a Data Warehouse
 User levels - Power Users:
These users can use predefined as well as user-defined
queries to create simple and ad hoc reports.
These users can engage in drill down operations.
These users may have the experience of using
reporting and query tools.
Building a Data Warehouse
 User levels - Expert users:
These users tend to create their own complex
queries and perform standard analysis on the info
they retrieve.
These users have the knowledge about the use of
query and report tools
Building a Data Warehouse
 Benefits of data warehousing:
Locating the right information
Presentation of information
Testing of hypothesis
Discovery of information
Sharing the analysis
Building a Data Warehouse
 Tangible benefits (quantifiable / measurable): These
include
Improvement in product inventory
Decrease in production costs
Improvement in selection of target markets
Enhancement of asset and liability management
Building a Data Warehouse
 Intangible benefits (not easy to quantify): These
include
Improved productivity by keeping all data in a
single location and eliminating re-keying of data
Reduced redundant processing
Enhanced customer relations
Mapping the Data Warehouse to
a Multiprocessor Architecture
 Parallel Query Processing:
The accessing and processing of portions of the database by
individual threads in parallel can greatly improve the
performance of a query.
 Speed up: The ability to execute the same request on the
same amount of data in less time.
 Scale up: The ability to obtain the same performance on
the same kind of requests as the database size increases.
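The two measures above can be written as simple ratios. The notation below is a conventional formulation consistent with these definitions, not taken from a specific source.

```latex
% Conventional definitions of the two measures above.
\[
\text{Speedup} = \frac{T_{1}}{T_{N}}
\qquad
\text{Scaleup} = \frac{T_{\text{small system, small job}}}{T_{\text{N-times larger system, N-times larger job}}}
\]
% Linear speedup: N processors run the same request N times faster (Speedup = N).
% Linear scaleup: N processors handle an N-times larger database in the same
% elapsed time (Scaleup = 1).
```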
Mapping the Data Warehouse to a
Multiprocessor Architecture
 Types of parallelism:
 Inter-query parallelism: Different server threads
or processes handle multiple requests at the same time.
 Intra-query parallelism: This form of parallelism
decomposes a serial SQL query into lower-level
operations such as scan, join and sort. These lower-level
operations are then executed concurrently, in parallel.
Mapping the Data Warehouse to a
Multiprocessor Architecture
 Types of parallelism: Intra query parallelism can
be done in either of two ways:
 Horizontal parallelism: The database is partitioned
across multiple disks, and parallel
processing occurs within a specific task that is
performed concurrently on different processors against
different sets of data.
Mapping the Data Warehouse to a
Multiprocessor Architecture
 Types of parallelism: Intra query parallelism can
be done in either of two ways:
 Vertical parallelism: This occurs among different
tasks. All query components such as scan, join and sort
are executed in parallel in a pipelined fashion. In other
words, the output of one task becomes the input of
another task.
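A small Python sketch of the two forms of intra-query parallelism described above. Threads are used only to illustrate the structure (a real DBMS would use separate processes or processors), and the row data, column names and partition count are invented for the example.

```python
# Hedged sketch of horizontal vs. vertical (pipelined) intra-query parallelism,
# using only the standard library; rows, columns and partitioning are illustrative.
from concurrent.futures import ThreadPoolExecutor
from queue import Queue
from threading import Thread

rows = [{"region": r, "amount": i} for i, r in enumerate(["N", "S", "E", "W"] * 250)]

# Horizontal parallelism: the same scan/filter task runs concurrently
# on different partitions of the data (here, 4 slices of the row list).
def scan_partition(partition):
    return sum(row["amount"] for row in partition if row["region"] == "N")

partitions = [rows[i::4] for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    horizontal_total = sum(pool.map(scan_partition, partitions))

# Vertical parallelism: scan and aggregate run as separate pipelined stages;
# the scan stage's output queue is the aggregate stage's input.
q: Queue = Queue()

def scan_stage():
    for row in rows:
        if row["region"] == "N":
            q.put(row["amount"])
    q.put(None)  # end-of-stream marker

def aggregate_stage(result):
    total = 0
    while (item := q.get()) is not None:
        total += item
    result.append(total)

result: list = []
producer = Thread(target=scan_stage)
consumer = Thread(target=aggregate_stage, args=(result,))
producer.start(); consumer.start()
producer.join(); consumer.join()
assert horizontal_total == result[0]  # both plans compute the same answer
```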
Mapping the Data Warehouse to a
Multiprocessor Architecture
 Data partitioning
 Data partitioning is the key component for effective
parallel execution of database operations.
It spreads data from database tables across multiple
disks so that I/O operations such as reads and writes can
be performed in parallel.
Partitioning can be done randomly or intelligently.
Mapping the Data Warehouse to a
Multiprocessor Architecture
 Data partitioning
 Random partitioning includes random data striping
across multiple disks on a single server. Another
option for random partitioning is round-robin
partitioning, in which each record is placed on the next
disk assigned to the database.
 Since the DBMS does not know where each record resides,
every disk may have to be searched to retrieve it.
Mapping the Data Warehouse to a
Multiprocessor Architecture
 Data partitioning
 Intelligent partitioning assumes that the DBMS knows
where a specific record is located and does not waste
time searching for it across all disks. The main
intelligent partitioning methods are:
 Hash partitioning, key-range partitioning, schema
partitioning and user-defined partitioning.
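A minimal Python sketch contrasting round-robin (random) partitioning with hash and key-range (intelligent) partitioning. The record layout, key names and number of "disks" are assumptions made for the illustration.

```python
# Hedged sketch of round-robin, hash, and key-range partitioning across
# four hypothetical disks; record layout and key names are illustrative.
from itertools import cycle

records = [{"cust_id": i, "name": f"cust{i}"} for i in range(20)]
NUM_DISKS = 4

# Round-robin (random-style): each record goes to the next disk in turn,
# so the DBMS cannot later predict where a given record lives.
rr_disks = [[] for _ in range(NUM_DISKS)]
for disk, rec in zip(cycle(range(NUM_DISKS)), records):
    rr_disks[disk].append(rec)

# Hash partitioning (intelligent): the disk is a function of the key,
# so a lookup by cust_id touches exactly one disk.
hash_disks = [[] for _ in range(NUM_DISKS)]
for rec in records:
    hash_disks[hash(rec["cust_id"]) % NUM_DISKS].append(rec)

# Key-range partitioning (intelligent): contiguous key ranges per disk.
range_disks = [[] for _ in range(NUM_DISKS)]
for rec in records:
    range_disks[min(rec["cust_id"] // 5, NUM_DISKS - 1)].append(rec)

def lookup_hash(cust_id):
    """Only one disk needs to be searched under hash partitioning."""
    disk = hash(cust_id) % NUM_DISKS
    return next(r for r in hash_disks[disk] if r["cust_id"] == cust_id)

print(lookup_hash(7))
```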
Mapping the Data Warehouse to a
Multiprocessor Architecture
 Database architectures for parallel processing:
Shared memory (shared everything) architecture
Shared disk architecture
Shared nothing architecture
Mapping the Data Warehouse to a
Multiprocessor Architecture
 Shared memory or shared everything Architecture:
 Multiple PUs share memory.
 Each PU has full access to all shared memory through a common
bus.
 Communication between nodes occurs via shared memory.
 Performance is limited by the bandwidth of the memory bus.
 It provides a consistent single-system image to the user.
Mapping the Data Warehouse to a
Multiprocessor Architecture
 Shared memory or shared everything
Architecture:
Mapping the Data Warehouse to a
Multiprocessor Architecture
 Shared memory or shared everything
Architecture:
 Threads provide better resource utilization and faster
context switching, thus providing for better scalability.
 At the same time, threads that are too tightly coupled
with the OS may limit RDBMS portability.
 Scalability of shared-memory architectures is limited.
Mapping the Data Warehouse to a
Multiprocessor Architecture
 Shared disk architecture:
Mapping the Data Warehouse to a
Multiprocessor Architecture
 Shared disk architecture:
 Each node consists of one or more PUs and associated
memory.
Memory is not shared between nodes.
Communication occurs over a common high-speed bus.
Each node has access to the same disks and other resources.
Mapping the Data Warehouse to a
Multiprocessor Architecture
 Shared disk architecture:
 It implements the concept of shared ownership of the
entire database between RDBMS servers, each of which is
running on a node of a distributed memory system.
 Each RDBMS server can read, write, update, and remove
data from the same shared database, which requires
the system to implement a form of distributed lock
manager (DLM).
Mapping the Data Warehouse to a
Multiprocessor Architecture
 Shared disk architecture:
 Since the memory is not shared among the nodes, each
node has its own data cache.
Cache consistency must be maintained across the nodes
and a lock manager is needed to maintain the consistency.
 There is additional overhead in maintaining the locks and
ensuring that the data caches are consistent
Mapping the Data Warehouse to a
Multiprocessor Architecture
 Shared disk architecture – Advantages:
 Shared disk systems permit high availability. All data
is accessible even if one node dies.
 These systems have the concept of one database,
which is an advantage over shared nothing systems.
 Shared disk systems provide for incremental growth.
Mapping the Data Warehouse to a
Multiprocessor Architecture
 Shared disk architecture – Disadvantages:
Inter-node synchronization is required, involving DLM
overhead and greater dependency on high-speed
interconnect.
If the workload is not partitioned well, there may be high
synchronization overhead.
There is operating-system overhead in running the shared disk
software.
Mapping the Data Warehouse to a
Multiprocessor Architecture
 Shared Nothing Architecture
Mapping the Data Warehouse to a
Multiprocessor Architecture
 Shared Nothing Architecture:
 Shared nothing systems are typically loosely coupled.
In shared nothing systems only one CPU is connected
to a given disk.
If a table or database is located on that disk, access
depends entirely on the PU which owns it.
Mapping the Data Warehouse to a
Multiprocessor Architecture
 Shared Nothing Architecture – Advantages:
 Shared nothing systems provide for incremental
growth.
 Failure is local: if one node fails, the others stay
up.
Mapping the Data Warehouse to a
Multiprocessor Architecture
 Shared Nothing Architecture – Disadvantages:
 More coordination is required.
 More overhead is required for a process working
on a disk belonging to another node.
Mapping the Data Warehouse to a
Multiprocessor Architecture
 Parallel DBMS Vendors – Oracle:
Architecture: shared disk architecture
Data partition: Key range, hash, round robin
Parallel operations: hash joins, scan and sort
Mapping the Data Warehouse to a
Multiprocessor Architecture
 Parallel DBMS Vendors – Informix:
Architecture: Shared memory, shared disk and
shared nothing models
Data partition: round robin, hash, schema, key
range and user defined
Parallel operations: INSERT, UPDATE, DELETE
Mapping the Data Warehouse to a
Multiprocessor Architecture
 Parallel DBMS Vendors – IBM:
 Architecture: Shared nothing models
 Data partition: hash
 Parallel operations: INSERT, UPDATE,
DELETE, load, recovery, index creation, backup,
table reorganization
Mapping the Data Warehouse to a
Multiprocessor Architecture
 Parallel DBMS Vendors – SYBASE:
Architecture: Shared nothing models
Data partition: hash, key range, Schema
Parallel operations: Horizontal and vertical
parallelism
DBMS Schemas for Decision
Support
A data cube allows data to be modeled and
viewed in multiple dimensions. It is defined by
dimensions and facts.
 In general terms, dimensions are the perspectives
or entities with respect to which an organization
wants to keep records.
DBMS Schemas for Decision
Support
 Example:
Consider a sales data warehouse that keeps records of the
store's sales with respect to the dimensions time, item,
branch, and location.
 Each dimension may have a table associated with it, called
a dimension table, which further describes the dimension.
A dimension table for item may contain the attributes item
name, brand, and type.
DBMS Schemas for Decision
Support
 Example:
A multidimensional data model is typically organized
around a central theme, like sales, for instance. This theme
is represented by a fact table.
Facts are numerical measures.
Examples of facts for a sales data warehouse include
dollars sold (sales amount in dollars), units sold (number of
units sold), and amount budgeted.
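As a small illustration of the example above, the fact rows and a cube view of the dollars-sold measure can be sketched with pandas (assumed installed). The sample rows and figures are invented for the example.

```python
# Hedged sketch of the sales example above; the rows and numbers are invented.
import pandas as pd

fact = pd.DataFrame([
    # time   item            branch  location   dollars_sold  units_sold
    ("Q1", "home theater", "B1", "Chennai", 605, 82),
    ("Q1", "computer",     "B1", "Vellore", 825, 14),
    ("Q2", "home theater", "B2", "Chennai", 680, 95),
    ("Q2", "computer",     "B2", "Vellore", 952, 31),
], columns=["time", "item", "branch", "location", "dollars_sold", "units_sold"])

# A multidimensional view (time x item x location) of the dollars_sold measure:
cube = fact.pivot_table(index="time", columns=["item", "location"],
                        values="dollars_sold", aggfunc="sum", fill_value=0)
print(cube)
```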
DBMS Schemas for Decision
Support
 Example
DBMS Schemas for Decision
Support
 Star schema
 The most popular data model for a data warehouse is the
multidimensional model.
 The data warehouse contains a large central table (the fact table) containing
the bulk of the data, with no redundancy, and
 a set of smaller associated tables (dimension tables), one for each
dimension.
 The dimension tables are displayed in a radial pattern around the central
fact table.
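A minimal star-schema sketch in Python using the standard-library sqlite3 module. The table and column names follow the sales example in this unit but are otherwise illustrative assumptions, not taken from a real warehouse.

```python
# Minimal star-schema sketch using SQLite; names follow the sales example
# in this unit but are otherwise illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time    (time_key     INTEGER PRIMARY KEY, day TEXT, month TEXT, quarter TEXT, year INTEGER);
CREATE TABLE dim_item    (item_key     INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE dim_branch  (branch_key   INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
CREATE TABLE dim_location(location_key INTEGER PRIMARY KEY, street TEXT, city TEXT, state TEXT, country TEXT);

-- Central fact table: one foreign key per dimension plus the numeric measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES dim_time(time_key),
    item_key     INTEGER REFERENCES dim_item(item_key),
    branch_key   INTEGER REFERENCES dim_branch(branch_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    dollars_sold REAL,
    units_sold   INTEGER
);
""")

# A typical star-schema query joins the fact table with a few dimension tables:
query = """
SELECT t.quarter, l.city, SUM(f.dollars_sold)
FROM sales_fact f
JOIN dim_time t     ON f.time_key = t.time_key
JOIN dim_location l ON f.location_key = l.location_key
GROUP BY t.quarter, l.city;
"""
print(conn.execute(query).fetchall())  # empty until rows are loaded
```

If a second fact table (say, shipping) reused these same dimension tables, the result would be the fact constellation (galaxy) schema described later in this unit.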
DBMS Schemas for Decision
Support
 Star schema
DBMS Schemas for Decision
Support
 Star schema – Characteristics:
Simple structure -> easy-to-understand schema
Good query effectiveness -> only a small number of tables to
join
Relatively long load time for dimension tables ->
denormalization and redundant data can make
the tables large.
DBMS Schemas for Decision
Support
 Snowflake schema:
 The snowflake schema is a variant of the star schema
model, where some dimension tables are normalized,
thereby further splitting the data into additional tables.
 The resulting schema graph forms a shape similar to a
snowflake.
DBMS Schemas for Decision
Support
 Snowflake schema:
DBMS Schemas for Decision
Support
 Snowflake schema:
 The major difference between the snowflake and star
schema models is that the dimension tables of the
snowflake model may be kept in normalized form to
reduce redundancies.
 Such a table is easy to maintain and saves storage
space.
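Continuing the same illustrative SQLite sketch, normalizing one dimension shows the snowflake idea. The split of supplier attributes out of the item dimension is an assumption made for the example.

```python
# Snowflake variant of the star-schema sketch above: the item dimension is
# normalized by splitting supplier attributes into their own table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_supplier (
    supplier_key  INTEGER PRIMARY KEY,
    supplier_name TEXT,
    supplier_type TEXT
);

-- dim_item no longer repeats supplier attributes; it references dim_supplier.
CREATE TABLE dim_item (
    item_key     INTEGER PRIMARY KEY,
    item_name    TEXT,
    brand        TEXT,
    type         TEXT,
    supplier_key INTEGER REFERENCES dim_supplier(supplier_key)
);
""")
# Queries that need supplier attributes now take one extra join
# (fact -> dim_item -> dim_supplier): the usual trade-off of snowflaking.
```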
DBMS Schemas for Decision
Support
Fact constellation
 Sophisticated applications may require multiple
fact tables to share dimension tables.
 This kind of schema can be viewed as a collection
of stars, and hence is called a galaxy schema or a
fact constellation.
DBMS Schemas for Decision
Support
Fact constellation
Business Analysis: Reporting and
Query Tools and Applications
 The principal purpose of data warehousing is to
provide information to business users for strategic
decision making.
 These users interact with the data warehouse using
front-end tools, or by getting the required information
through the information delivery system.
Business Analysis: Reporting and
Query Tools and Applications
 Tool Categories
 Reporting - Companies generate regular operational reports. Example: calculating
and printing paychecks.
 Managed query - Makes it possible for knowledge workers to access corporate
data without IS intervention.
 Executive information systems - Allow developers to build customized,
graphical decision-support applications.
 OLAP - A natural way to view corporate data.
 Data mining - Uses a variety of statistical and artificial-intelligence
algorithms to analyze the correlation of variables.
Business Analysis: Reporting and
Query Tools and Applications
 Cognos Impromptu:
 Impromptu is an interactive database reporting tool.
 It allows Power Users to query data without
programming knowledge.
 When using the Impromptu tool, no data is written or
changed in the database. It is only capable of reading
the data
Business Analysis: Reporting and
Query Tools and Applications
 Cognos Impromptu – Features:
 Interactive reporting capability
Enterprise-wide scalability
Superior user interface
Fastest time to result
Lowest cost of ownership
Business Analysis: Reporting and
Query Tools and Applications
 Catalogs:
 Impromptu stores metadata in subject related folders
 This metadata will be used to develop a query for a report.
The metadata set is stored in a file called a catalog.
 The catalog does not contain any data.
It just contains information about connecting to the
database and the fields that will be accessible for reports.
Business Analysis: Reporting and
Query Tools and Applications
 Catalogs contains:
 Folders—meaningful groups of information representing columns
from one or more tables
 Columns—individual data elements that can appear in one or more
folders
 Calculations—expressions used to compute required values from
existing data
 Conditions—used to filter information so that only a certain type of
information is displayed
Business Analysis: Reporting and
Query Tools and Applications
Prompts—pre-defined selection criteria prompts
that users can include in reports they create
Other components, such as metadata, a logical
database name, join information, and user classes
Business Analysis: Reporting and
Query Tools and Applications
 Catalogs – Uses:
view, run, and print reports
export reports to other applications
disconnect from and connect to the database
create reports
change the contents of the catalog
Business Analysis: Reporting and
Query Tools and Applications
 Reports:
 Reports are created by choosing fields from the
catalog folders. This process builds a SQL
(Structured Query Language) statement behind the
scenes.
 No SQL knowledge is required to use Impromptu.
Business Analysis: Reporting and
Query Tools and Applications
 Reports:
 The data in the report may be formatted, sorted and/or
grouped as needed.
 Titles, dates, headers and footers and other standard text
formatting features (italics, bolding, and font size) are also
available.
 Once the desired layout is obtained, the report can be
saved to a report file.
Business Analysis: Reporting and
Query Tools and Applications
 Frame-Based Reporting
 Frames are the building blocks of all Impromptu reports and
templates.
 They may contain report objects, such as data, text, pictures, and
charts.
 There are no limits to the number of frames that you can place
within an individual report or template. You can nest frames
within other frames to group report objects within a report.
Business Analysis: Reporting and
Query Tools and Applications
 Frame-Based Reporting
 Form frame: An empty form frame appears.
 List frame: An empty list frame appears.
 Text frame: The flashing I-beam appears where you can begin
inserting text.
 Picture frame: The Source tab (Picture Properties dialog box)
appears. You can use this tab to select the image to include in the
frame.
Business Analysis: Reporting and
Query Tools and Applications
 Frame-Based Reporting
Chart frame: The Data tab (Chart Properties dialog box)
appears. You can use this tab to select the data item to
include in the chart.
OLE Object: The Insert Object dialog box appears where
you can locate and select the file you want to insert, or you
can create a new object using the software listed in the
Object Type box.
Business Analysis: Reporting and
Query Tools and Applications
 Impromptu features
 Unified query and reporting interface: It unifies both
query and reporting in a single user interface.
 Object-oriented architecture: It enables inheritance-
based administration so that more than 1,000 users can be
accommodated as easily as a single user.
Business Analysis: Reporting and
Query Tools and Applications
 Impromptu features
Scalability: It scales from a single user to over
1,000 users.
Security and control: Security is based on user
profiles and their classes.
 Data presented in a business context: It presents
information using the terminology of the business.
Business Analysis: Reporting and
Query Tools and Applications
 Impromptu features
Over 70 predefined report templates: Users can
simply supply the data to create an interactive report.
Frame-based reporting: It offers a number of objects to
create a user-designed report.
Business-relevant reporting: It can be used to generate
business-relevant reports through filters, preconditions and
calculations.
Online Analytical Processing
(OLAP)
 It uses database tables (fact and dimension tables) to
enable multidimensional viewing, analysis and
querying of large amounts of data.
 Online Analytical Processing (OLAP) applications and
tools are those that are designed to ask complex
queries of large multidimensional collections of data.
Online Analytical Processing
(OLAP)
Need:
 The need arises from the multidimensional nature of the business
problem.
 These problems are characterized by retrieving a very
large number of records, which can reach gigabytes and
terabytes, and summarizing this data into
information that can be used by business analysts.
Online Analytical Processing
(OLAP)
The Multidimensional Data Model:
 Because OLAP is on-line, it must provide answers quickly;
analysts pose iterative queries during interactive
sessions.
 A natural way to think of the multidimensional data model
is to view the data as a cube.
Online Analytical Processing
(OLAP)
 The Multidimensional Data Model – Operations:
 Roll-up: The roll-up operation (also called the drill-up
operation by some vendors) performs aggregation on a
data cube, either by climbing up a concept hierarchy
for a dimension or by dimension reduction.
Online Analytical Processing
(OLAP)
 The Multidimensional Data Model –
Operations:
Online Analytical Processing
(OLAP)
 The Multidimensional Data Model – Operations:
 Drill-down: Drill-down is the reverse of roll-up. It
navigates from less detailed data to more detailed data.
 Drill-down can be realized by either stepping down a
concept hierarchy for a dimension or introducing
additional dimensions.
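A small pandas sketch of roll-up and drill-down on sales data like the example used earlier in this unit. The location hierarchy (city below country) and the figures are invented for the illustration.

```python
# Roll-up and drill-down sketched with pandas groupby; data is invented.
import pandas as pd

sales = pd.DataFrame({
    "country": ["India", "India", "India", "India"],
    "city":    ["Chennai", "Chennai", "Vellore", "Vellore"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "dollars_sold": [605, 680, 825, 952],
})

# Detailed level: totals per city and quarter.
by_city = sales.groupby(["city", "quarter"])["dollars_sold"].sum()

# Roll-up: climb the location hierarchy from city to country
# (equivalently, drop the city dimension and aggregate).
rolled_up = sales.groupby(["country", "quarter"])["dollars_sold"].sum()

# Drill-down: the reverse move, from the country level back to city detail.
drilled_down = by_city

print(rolled_up)
print(drilled_down)
```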
Online Analytical Processing
(OLAP)
 The Multidimensional Data Model –
Operations:
Online Analytical Processing
(OLAP)
 The Multidimensional Data Model –
Operations:
 Slice and dice: The slice operation performs a
selection on one dimension of the given cube,
resulting in a subcube.
Online Analytical Processing
(OLAP)
 The Multidimensional Data Model –
Operations:
 The dice operation defines a subcube by
performing a selection on two or more dimensions.
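A short pandas sketch of slice and dice as selections on the same kind of invented sales data; the column names and values are assumptions for the example.

```python
# Slice and dice sketched as selections on invented sales data.
import pandas as pd

sales = pd.DataFrame({
    "city":    ["Chennai", "Chennai", "Vellore", "Vellore"],
    "item":    ["computer", "phone", "computer", "phone"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "dollars_sold": [605, 680, 825, 952],
})

# Slice: a selection on ONE dimension (quarter = 'Q1') gives a subcube.
slice_q1 = sales[sales["quarter"] == "Q1"]

# Dice: a selection on TWO OR MORE dimensions (quarter and item).
dice = sales[(sales["quarter"] == "Q1") & (sales["item"] == "computer")]

print(slice_q1)
print(dice)
```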
Online Analytical Processing
(OLAP)
 The Multidimensional Data Model –
Operations:
 Pivot (also called rotate) is a visualization
operation that rotates the data axes in view to
provide an alternative data presentation.
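A brief pandas sketch of the pivot (rotate) operation: the same data is presented with rows and columns swapped. The sample values are invented.

```python
# Pivot (rotate) sketched with pandas: swap which dimensions label the rows
# and columns of the presented view; data is an invented sample.
import pandas as pd

sales = pd.DataFrame({
    "city":    ["Chennai", "Chennai", "Vellore", "Vellore"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "dollars_sold": [605, 680, 825, 952],
})

view = sales.pivot_table(index="city", columns="quarter", values="dollars_sold")
pivoted = view.T  # rotate: quarters now label the rows, cities the columns
print(view)
print(pivoted)
```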
OLAP vs OLTP
 Source of data: OLTP - operational data (OLTP systems are the original source of the data); OLAP - consolidated data that comes from the various OLTP databases.
 Purpose of data: OLTP - to control and run fundamental business tasks; OLAP - to help with planning, problem solving, and decision support.
 What the data reveals: OLTP - a snapshot of ongoing business processes; OLAP - multi-dimensional views of various kinds of business activities.
 Queries: OLTP - relatively standardized and simple queries returning relatively few records; OLAP - often complex queries involving aggregations.
OLAP vs OLTP
 Processing speed: OLTP - typically very fast; OLAP - depends on the amount of data involved.
 Space requirements: OLTP - can be relatively small if historical data is archived; OLAP - larger, due to aggregation structures and historical data.
 Database design: OLTP - highly normalized, with many tables; OLAP - typically de-normalized, with fewer tables, using star and/or snowflake schemas.
 Backup and recovery: OLTP - data loss is likely to entail significant monetary loss and legal liability; OLAP - instead of regular backups, some environments may simply reload the OLTP data as a recovery method.
Online Analytical Processing
(OLAP)
 Categories of OLAP Tools – MOLAP:
 This is the more traditional way of OLAP analysis.
In MOLAP, data is stored in a multidimensional cube.
The storage is not in a relational database but in
proprietary formats.
That is, data is stored in array-based structures.
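A hedged sketch of the array-based storage idea using NumPy (assumed installed): the cube is a dense array indexed by dimension positions rather than a set of relational rows. The dimension values and figures are invented.

```python
# MOLAP-style array storage sketched with NumPy; values are invented.
import numpy as np

times, items, locations = ["Q1", "Q2"], ["computer", "phone"], ["Chennai", "Vellore"]
cube = np.zeros((len(times), len(items), len(locations)))

# Loading a cell is direct array indexing rather than a table insert.
cube[times.index("Q1"), items.index("computer"), locations.index("Chennai")] = 605
cube[times.index("Q2"), items.index("phone"), locations.index("Vellore")] = 952

# Aggregations become reductions along an axis, e.g. roll up over locations:
print(cube.sum(axis=2))  # totals per (time, item)
```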
Online Analytical Processing
(OLAP)
 Categories of OLAP Tools – ROLAP
This methodology relies on manipulating the data
stored in the relational database to give the appearance
of traditional OLAP slicing-and-dicing functionality.
In essence, each slicing-and-dicing action is
equivalent to adding a "WHERE" clause to the SQL
statement. The data is stored in relational tables.
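A minimal Python sketch of the ROLAP idea above: each slice or dice predicate becomes another WHERE condition on a relational fact table. The table and column names (sales_fact, quarter, city) are illustrative assumptions, not from a real product.

```python
# Hedged ROLAP sketch: slice/dice filters translate into WHERE conditions.
def rolap_query(measures, filters):
    """Build a parameterized SQL string for the given slice/dice filters."""
    select = ", ".join(measures)
    where = " AND ".join(f"{col} = :{col}" for col in filters)
    sql = f"SELECT {select} FROM sales_fact"
    if where:
        sql += f" WHERE {where}"
    return sql, filters

# Slicing on one dimension (quarter), then dicing on two (quarter and city):
print(rolap_query(["SUM(dollars_sold)"], {"quarter": "Q1"}))
print(rolap_query(["SUM(dollars_sold)"], {"quarter": "Q1", "city": "Chennai"}))
```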
Online Analytical Processing
(OLAP)
 Categories of OLAP Tools – HOLAP (MQE: Managed Query
Environment)
 HOLAP technologies attempt to combine the advantages of MOLAP
and ROLAP.
 For summary-type information, HOLAP leverages cube technology for
faster performance.
 It stores only the indexes and aggregations in the multidimensional
form while the rest of the data is stored in the relational database.
20IT501_DWDM_PPT_Unit_I.ppt

More Related Content

Similar to 20IT501_DWDM_PPT_Unit_I.ppt

Data warehouse concepts
Data warehouse conceptsData warehouse concepts
Data warehouse conceptsobieefans
 
11667 Bitt I 2008 Lect4
11667 Bitt I 2008 Lect411667 Bitt I 2008 Lect4
11667 Bitt I 2008 Lect4ambujm
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSINGKing Julian
 
Data warehouse
Data warehouseData warehouse
Data warehouseRajThakuri
 
Chapter 2-data-warehousingppt2517 vero
Chapter 2-data-warehousingppt2517 veroChapter 2-data-warehousingppt2517 vero
Chapter 2-data-warehousingppt2517 veroangshuman2387
 
Informatica and datawarehouse Material
Informatica and datawarehouse MaterialInformatica and datawarehouse Material
Informatica and datawarehouse Materialobieefans
 
Dwdm unit 1-2016-Data ingarehousing
Dwdm unit 1-2016-Data ingarehousingDwdm unit 1-2016-Data ingarehousing
Dwdm unit 1-2016-Data ingarehousingDhilsath Fathima
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing conceptspcherukumalla
 
Data Mining & Data Warehousing
Data Mining & Data WarehousingData Mining & Data Warehousing
Data Mining & Data WarehousingAAKANKSHA JAIN
 
Dataware housing
Dataware housingDataware housing
Dataware housingwork
 
Data warehouse-dimensional-modeling-and-design
Data warehouse-dimensional-modeling-and-designData warehouse-dimensional-modeling-and-design
Data warehouse-dimensional-modeling-and-designSarita Kataria
 
DWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxDWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxSalehaMariyam
 

Similar to 20IT501_DWDM_PPT_Unit_I.ppt (20)

Data warehouse concepts
Data warehouse conceptsData warehouse concepts
Data warehouse concepts
 
11667 Bitt I 2008 Lect4
11667 Bitt I 2008 Lect411667 Bitt I 2008 Lect4
11667 Bitt I 2008 Lect4
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
Unit 3 part 2
Unit  3 part 2Unit  3 part 2
Unit 3 part 2
 
DW 101
DW 101DW 101
DW 101
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
DATA WAREHOUSING.2.pptx
DATA WAREHOUSING.2.pptxDATA WAREHOUSING.2.pptx
DATA WAREHOUSING.2.pptx
 
Chapter 2-data-warehousingppt2517 vero
Chapter 2-data-warehousingppt2517 veroChapter 2-data-warehousingppt2517 vero
Chapter 2-data-warehousingppt2517 vero
 
Informatica and datawarehouse Material
Informatica and datawarehouse MaterialInformatica and datawarehouse Material
Informatica and datawarehouse Material
 
Dwdm unit 1-2016-Data ingarehousing
Dwdm unit 1-2016-Data ingarehousingDwdm unit 1-2016-Data ingarehousing
Dwdm unit 1-2016-Data ingarehousing
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing concepts
 
Data mining notes
Data mining notesData mining notes
Data mining notes
 
Data Mining & Data Warehousing
Data Mining & Data WarehousingData Mining & Data Warehousing
Data Mining & Data Warehousing
 
Dataware housing
Dataware housingDataware housing
Dataware housing
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
Data wirehouse
Data wirehouseData wirehouse
Data wirehouse
 
Data warehouse-dimensional-modeling-and-design
Data warehouse-dimensional-modeling-and-designData warehouse-dimensional-modeling-and-design
Data warehouse-dimensional-modeling-and-design
 
DWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxDWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptx
 
Chapter 2
Chapter 2Chapter 2
Chapter 2
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 

More from SamPrem3

20IT501_DWDM_U5.ppt
20IT501_DWDM_U5.ppt20IT501_DWDM_U5.ppt
20IT501_DWDM_U5.pptSamPrem3
 
20IT501_DWDM_PPT_Unit_IV.ppt
20IT501_DWDM_PPT_Unit_IV.ppt20IT501_DWDM_PPT_Unit_IV.ppt
20IT501_DWDM_PPT_Unit_IV.pptSamPrem3
 
20IT501_DWDM_U3.ppt
20IT501_DWDM_U3.ppt20IT501_DWDM_U3.ppt
20IT501_DWDM_U3.pptSamPrem3
 
20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.ppt20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.pptSamPrem3
 
UNIT V DIS.pptx
UNIT V DIS.pptxUNIT V DIS.pptx
UNIT V DIS.pptxSamPrem3
 
UNIT IV DIS.pptx
UNIT IV DIS.pptxUNIT IV DIS.pptx
UNIT IV DIS.pptxSamPrem3
 
UNIT III DIS.pptx
UNIT III DIS.pptxUNIT III DIS.pptx
UNIT III DIS.pptxSamPrem3
 
UNIT II DIS.pptx
UNIT II DIS.pptxUNIT II DIS.pptx
UNIT II DIS.pptxSamPrem3
 
UNIT I DIS.pptx
UNIT I DIS.pptxUNIT I DIS.pptx
UNIT I DIS.pptxSamPrem3
 

More from SamPrem3 (9)

20IT501_DWDM_U5.ppt
20IT501_DWDM_U5.ppt20IT501_DWDM_U5.ppt
20IT501_DWDM_U5.ppt
 
20IT501_DWDM_PPT_Unit_IV.ppt
20IT501_DWDM_PPT_Unit_IV.ppt20IT501_DWDM_PPT_Unit_IV.ppt
20IT501_DWDM_PPT_Unit_IV.ppt
 
20IT501_DWDM_U3.ppt
20IT501_DWDM_U3.ppt20IT501_DWDM_U3.ppt
20IT501_DWDM_U3.ppt
 
20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.ppt20IT501_DWDM_PPT_Unit_II.ppt
20IT501_DWDM_PPT_Unit_II.ppt
 
UNIT V DIS.pptx
UNIT V DIS.pptxUNIT V DIS.pptx
UNIT V DIS.pptx
 
UNIT IV DIS.pptx
UNIT IV DIS.pptxUNIT IV DIS.pptx
UNIT IV DIS.pptx
 
UNIT III DIS.pptx
UNIT III DIS.pptxUNIT III DIS.pptx
UNIT III DIS.pptx
 
UNIT II DIS.pptx
UNIT II DIS.pptxUNIT II DIS.pptx
UNIT II DIS.pptx
 
UNIT I DIS.pptx
UNIT I DIS.pptxUNIT I DIS.pptx
UNIT I DIS.pptx
 

Recently uploaded

Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
DATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage exampleDATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage examplePragyanshuParadkar1
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
 
Effects of rheological properties on mixing
Effects of rheological properties on mixingEffects of rheological properties on mixing
Effects of rheological properties on mixingviprabot1
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
EduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIEduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIkoyaldeepu123
 

Recently uploaded (20)

Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
DATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage exampleDATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage example
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 
Effects of rheological properties on mixing
Effects of rheological properties on mixingEffects of rheological properties on mixing
Effects of rheological properties on mixing
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
EduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIEduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AI
 

20IT501_DWDM_PPT_Unit_I.ppt

  • 1. 20IT501 – Data Warehousing and Data Mining III Year / VI Semester
  • 2. OBJECTIVES Be familiar with the concepts of data warehouse Know the fundamentals of data mining Understand the importance of association rule mining Understand the techniques of classification and clustering Be aware of the recent trends of data mining
  • 3. UNIT I- DATA WAREHOUSING Data Warehouse – Data warehousing Components – Building a Data warehouse – Mapping the Data Warehouse to a Multiprocessor Architecture – DBMS Schemas for Decision Support. Business Analysis: Reporting and Query tools and Applications - Online Analytical Processing (OLAP) – Need – Multidimensional Data Model – OLAP Guidelines - Categories of OLAP Tools.
  • 4. Data Warehouse A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process.
  • 5. Data Warehouse Subject Oriented:  it provides information around a subject rather than the organization's ongoing operations.  These subjects can be product, customers, suppliers, sales, revenue, etc.  A data warehouse does not focus on the ongoing operations, rather it focuses on modeling and analysis of data for decision making.
  • 6. Data Warehouse Integrated:  A data warehouse is constructed by integrating data from heterogeneous sources such as relational databases, flat files, etc.  This integration enhances the effective analysis of data.
  • 7. Data Warehouse Time Variant:  The data collected in a data warehouse is identified with a particular time period.  The data in a data warehouse provides information from the historical point of view.
  • 8. Data Warehouse Non-volatile Non-volatile means the previous data is not erased when new data is added to it.  A data warehouse is kept separate from the operational database and therefore frequent changes in operational database is not reflected in the data warehouse.
  • 9. Data Warehouse - Terminologies  Current detail data: It is organized along subject lines(customer profile data, sales data)  Data Mart: Summarized departmental data and is customized to suit the needs of a particular department that owns the data.
  • 10. Data Warehouse - Terminologies  Drill-down: To perform business analysis in a top- down fashion.  Metadata: Data about data. It contains the location and description of warehouse system components; names, definition, structure and content of the data warehouse and end-users views.
  • 11. Data Warehouse Operational Data Information Data Content Current Values Summarized, Derived Organization Application Subject Stability Dynamic Static until refreshed Data Structure Optimized for transactions Optimized for complex queries Access High Medium to low Response time Sub second (<1s) to 2-3s Several seconds to minutes
  • 13. Data Warehouse Components 1.Data Warehouse Database:  It is based on a relational database management system server that functions as the central repository for informational data.  This repository is surrounded by a number of key components designed to make the entire functional, manageable and accessible by both the operational systems that source data into the warehouse and by end-user query and analysis tool.
  • 14. Data Warehouse Components 2. Sourcing, Acquisition, Clean up, and Transformation Tools:  They perform conversions, summarization, key changes, structural changes and condensation.  The data transformation is required so that the information can be used by decision support tools.
  • 15. Data Warehouse Components  Sourcing, Acquisition, Clean up, and Transformation Tools – Functionalities:  To remove unwanted data from operational db  Converting to common data names and attributes  Calculating summaries and derived data  Establishing defaults for missing data  Accommodating source data definition changes
  • 16. Data Warehouse Components  Sourcing, Acquisition, Clean up, and Transformation Tools – Issues:  Data heterogeneity: It refers to the different way the data is defined and used in different modules.
  • 17. Data Warehouse Components Meta data:  It is data about data. It is used for maintaining, managing and using the data warehouse.  Types:  Technical Meta data  Business Meta data
  • 18. Data Warehouse Components  Meta data - Technical Meta data: Information about data stores Transformation descriptions The rules used to perform clean up, and data enhancement Data mapping operations Access authorization, backup history, archive history, info delivery history, data acquisition history, data access etc.,
  • 19. Data Warehouse Components Meta data – Business Meta data:  It contains information that stored in data warehouse to users.  Subject areas, and info object type including queries, reports, images, video, audio clips etc.  Internet home pages  Information related to info delivery system
  • 20. Data Warehouse Components  Meta data:  Meta data helps the users to understand content and find the data. Meta data are stored in a separate data stores which is known as informational directory or Meta data repository. It helps to integrate, maintain and view the contents of the data warehouse
  • 21. Data Warehouse Components  Meta data – Characteristics:  It is the gateway to the data warehouse environment  It supports easy distribution and replication of content for high performance and availability  It should be searchable by business oriented key words  It should support the sharing of information  It should support and provide interface to other applications
  • 22. Data Warehouse Components Access Tools: Data query and reporting tools Application development tools Executive info system tools (EIS) OLAP tools Data mining tools
  • 23. Data Warehouse Components Data Marts:  Departmental subsets that focus on selected subjects. They are independent used by dedicated user group. They are used for rapid delivery of enhanced decision support functionality to end users.
  • 24. Data Warehouse Components  Data warehouse admin and management:  Security and priority management  Monitoring updates from multiple sources  Data quality checks  Managing and updating meta data  Auditing and reporting data warehouse usage and status  Replicating, sub setting and distributing data  Backup and recovery
  • 25. Data Warehouse Components Information delivery system It is used to enable the process of subscribing for data warehouse info. Delivery to one or more destinations according to specified scheduling algorithm
  • 26. Building a Data Warehouse - Reason There are two reasons why organizations consider data warehousing a critical need 1.Business Factor:  Decisions need to be made quickly and correctly, using all available data.  Users are business domain expert, not computer professionals
  • 27. Building a Data Warehouse - Reason 2.Technological factor:  The price of computer processing speed continues to decline. IT infrastructure is changing rapidly. Its capacity is increasing and cost is decreasing so that building a data warehouse is easy. Network bandwidth is increasing.
  • 28. Building a Data Warehouse 1. Business considerations: Approaches  Top - Down Approach  Bottom - Up Approach
  • 29. Building a Data Warehouse  Approaches: Top - Down Approach  To build a centralized repository to house corporate wide business data.  This repository is called Enterprise Data Warehouse (EDW).  The data in the EDW is stored in a normalized form in order to avoid redundancy.
  • 30. Building a Data Warehouse  Approaches: Top - Down Approach  The data in the EDW is stored at the most detail level. The reason to build the EDW on the most detail level is to leverage Flexibility to be used by multiple departments. Flexibility to cater for future requirements.
  • 31. Building a Data Warehouse  Approaches: Top - Down Approach  We should implement the top-down approach when The business has complete clarity on all or multiple subject areas data warehouse requirements. The business is ready to invest considerable time and money.
  • 32. Building a Data Warehouse  Approaches: Top - Down Approach – Advantages:  This is very important for the data to be reliable.  Consistent across subject areas.
• 33. Building a Data Warehouse  Approaches: Top - Down Approach – Disadvantages:  It requires more time and a larger initial investment.  The business has to wait for the EDW to be implemented, followed by building the data marts, before it can access its reports.
  • 34. Building a Data Warehouse  Approaches: Bottom Up Approach  It is an incremental approach to build a data warehouse.  Here we build the data marts separately at different points of time as and when the specific subject area requirements are clear.
  • 35. Building a Data Warehouse  Approaches: Bottom Up Approach  The data marts are integrated or combined together to form a data warehouse.  Separate data marts are combined through the use of conformed dimensions and conformed facts.
• 36. Building a Data Warehouse  Approaches: Bottom Up Approach  We should implement the bottom-up approach when:  we have initial cost and time constraints;  the complete warehouse requirements are not clear;  we have clarity on only one data mart.
  • 37. Building a Data Warehouse  Approaches: Bottom Up Approach - Advantages  They do not require high initial costs and have a faster implementation time;  Hence the business can start using the marts much earlier as compared to the top-down approach.
• 38. Building a Data Warehouse  Approaches: Bottom Up Approach - Disadvantages  It stores data in denormalized format, hence there can be high space usage for detailed data.  There is a tendency not to keep detailed data.
  • 39. Building a Data Warehouse 2. Design considerations: Most successful data warehouses that meet these requirements have these common characteristics:  Are based on a dimensional model  Contain historical and current data  Include both detailed and summarized data  Consolidate disparate data from multiple sources while retaining consistency
• 40. Building a Data Warehouse  A data warehouse is difficult to build due to the following reasons:  Heterogeneity of data sources  Use of historical data  Growing nature of the database
• 41. Building a Data Warehouse  General considerations: the following specific points are relevant to data warehouse design.  Data content  The content and structure of the data warehouse are reflected in its data model.  The data model is the template that describes how information will be organized within the integrated warehouse framework.  The data warehouse data must be detailed data. It must be formatted, cleaned up and transformed to fit the warehouse data model.
  • 42. Building a Data Warehouse Meta data It defines the location and contents of data in the warehouse. Meta data is searchable by users to find definitions or subject areas.
  • 43. Building a Data Warehouse Data Distribution  The data can be distributed based on the subject area, location (geographical region), or time (current, month, year).
  • 44. Building a Data Warehouse  Tools  These tools provide facilities for defining the transformation and cleanup rules, data movement, end user query, reporting and data analysis.
• 45. Building a Data Warehouse  Design steps  Choosing the subject matter  Deciding what a fact table represents  Identifying and conforming the dimensions  Choosing the facts  Storing pre-calculations in the fact table  Rounding out the dimension table  Choosing the duration of the database  The need to track slowly changing dimensions  Deciding the query priorities and query models
  • 46. Building a Data Warehouse 3. Implementation considerations: The following logical steps needed to implement a data warehouse:  Collect and analyze business requirements  Create a data model and a physical design  Define data sources  Choose the database Technology and platform  Extract the data from operational database, transform it, clean it up and load it into the warehouse
• 47. Building a Data Warehouse  Implementation considerations (continued):  choose database access and reporting tools;  choose database connectivity software;  choose data analysis and presentation software;  update the data warehouse;  choose access tools.
• 48. Building a Data Warehouse  Access Tools:  simple tabular form data;  ranking data;  multivariable data;  time series data;  graphing, charting and pivoting data.
• 49. Building a Data Warehouse  Access Tools:  statistical analysis data;  predefined repeatable queries;  ad hoc user-specified queries;  reporting and analysis data.
• 50. Building a Data Warehouse  Data extraction, clean up, transformation and migration – Selection criteria:  Timeliness of data delivery to the warehouse.  The tool must have the ability to identify the particular data in the source environments that can be read by the conversion tool.  The tool must have the capability to merge data from multiple data stores.
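To make the extract / clean-up / load sequence above concrete, here is a minimal ETL sketch in Python using only the standard library; the table names, columns and cleaning rules are invented for illustration and do not come from the slides.

```python
# A minimal ETL sketch (hypothetical table and column names) showing the
# extract / transform-and-clean / load steps, using only the standard
# library. A real warehouse load would use a dedicated ETL tool.
import sqlite3

source = sqlite3.connect(":memory:")      # stands in for an operational database
warehouse = sqlite3.connect(":memory:")   # stands in for the warehouse database

# --- set up a toy operational table (the extraction source) ---
source.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount TEXT, ts TEXT)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                   [(1, " Alice ", "120.50", "2024-01-05"),
                    (2, "bob",     "80",     "2024-01-06"),
                    (3, "Carol",   None,     "2024-01-07")])

# --- extract ---
rows = source.execute("SELECT id, customer, amount, ts FROM orders").fetchall()

# --- transform / clean up: trim names, standardize case, drop bad amounts ---
clean = [(oid, cust.strip().title(), float(amt), ts)
         for oid, cust, amt, ts in rows if amt is not None]

# --- load into the warehouse table ---
warehouse.execute("CREATE TABLE sales_fact (order_id INTEGER, customer TEXT, amount REAL, order_date TEXT)")
warehouse.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", clean)
warehouse.commit()

print(warehouse.execute("SELECT COUNT(*), SUM(amount) FROM sales_fact").fetchone())
```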
• 51. Building a Data Warehouse  User levels - Casual users: They are most comfortable retrieving information from the warehouse in predefined formats and running pre-existing queries and reports. These users do not need tools that allow for building standard and ad hoc reports.
• 52. Building a Data Warehouse  User levels - Power users: These users can use predefined as well as user-defined queries to create simple and ad hoc reports. They can engage in drill-down operations. These users may have experience using reporting and query tools.
  • 53. Building a Data Warehouse  User levels - Expert users: These users tend to create their own complex queries and perform standard analysis on the info they retrieve. These users have the knowledge about the use of query and report tools
• 54. Building a Data Warehouse  Benefits of data warehousing:  locating the right information;  presentation of information;  testing of hypotheses;  discovery of information;  sharing the analysis.
• 55. Building a Data Warehouse  Tangible benefits (quantifiable / measurable) include:  improvement in product inventory;  decrement in production cost;  improvement in selection of target markets;  enhancement in asset and liability management.
• 56. Building a Data Warehouse  Intangible benefits (not easy to quantify) include:  improvement in productivity by keeping all data in a single location and eliminating rekeying of data;  reduced redundant processing;  enhanced customer relations.
• 57. Mapping the Data Warehouse to a Multiprocessor Architecture  Parallel Query Processing: the accessing and processing of portions of the database by individual threads in parallel can greatly improve the performance of a query.  Speed up: the ability to execute the same request on the same amount of data in less time.  Scale up: the ability to obtain the same performance on the same request as the database size increases.
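Stated as formulas (using the commonly cited definitions, not taken from the slides), speed-up and scale-up can be written as:

```latex
\[
\text{Speed-up} = \frac{T_{\text{one processor}}}{T_{\text{N processors}}}
\qquad
\text{Scale-up} = \frac{T_{\text{small system, small workload}}}{T_{\text{N-times-larger system, N-times-larger workload}}}
\]
```

Linear speed-up means N processors answer the same query N times faster; linear scale-up means response time stays constant when the data volume and the hardware grow in the same proportion.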
• 58. Mapping the Data Warehouse to a Multiprocessor Architecture  Types of parallelism:  Inter-query parallelism: different server threads or processes handle multiple requests at the same time.  Intra-query parallelism: this form of parallelism decomposes the serial SQL query into lower-level operations such as scan, join and sort; these lower-level operations are then executed concurrently, in parallel.
• 59. Mapping the Data Warehouse to a Multiprocessor Architecture  Types of parallelism: intra-query parallelism can be done in either of two ways.  Horizontal parallelism: the database is partitioned across multiple disks, and parallel processing occurs within a specific task that is performed concurrently on different processors against different sets of data.
• 60. Mapping the Data Warehouse to a Multiprocessor Architecture  Types of parallelism: intra-query parallelism can be done in either of two ways.  Vertical parallelism: this occurs among different tasks. All query components such as scan, join and sort are executed in parallel in a pipelined fashion; in other words, the output of one task becomes the input of another task.
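The following sketch illustrates the pipelining idea behind vertical parallelism: each operator consumes the output of the previous one. Python generators run the stages lazily on a single thread, whereas a parallel DBMS would run each stage on its own processor; the operator and column names here are illustrative.

```python
# Conceptual sketch of vertical (pipelined) parallelism: the output of one
# operator feeds the next. In a real parallel DBMS each stage would run on
# its own processor or thread.

def scan(table):
    """Scan operator: stream rows from a table."""
    for row in table:
        yield row

def select(rows, predicate):
    """Selection operator: keep only rows matching the predicate."""
    for row in rows:
        if predicate(row):
            yield row

def aggregate(rows, key, value):
    """Aggregation operator: sum `value` grouped by `key`."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row[value]
    return totals

sales = [{"region": "North", "amount": 100},
         {"region": "South", "amount": 250},
         {"region": "North", "amount": 75}]

# scan -> select -> aggregate, wired as a pipeline
result = aggregate(select(scan(sales), lambda r: r["amount"] > 50), "region", "amount")
print(result)   # {'North': 175, 'South': 250}
```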
• 61. Mapping the Data Warehouse to a Multiprocessor Architecture  Data partitioning  Data partitioning is the key component for effective parallel execution of database operations. It spreads data from database tables across multiple disks so that I/O operations such as read and write can be performed in parallel.  Partitioning can be done randomly or intelligently.
• 62. Mapping the Data Warehouse to a Multiprocessor Architecture  Data partitioning  Random partitioning includes random data striping across multiple disks on a single server.  Another option for random partitioning is round-robin partitioning, in which each record is placed on the next disk assigned to the database.  Since the DBMS does not know where each record resides, all partitions may have to be scanned to locate a specific record.
• 63. Mapping the Data Warehouse to a Multiprocessor Architecture  Data partitioning  Intelligent partitioning assumes that the DBMS knows where a specific record is located and does not waste time searching for it across all disks.  The various intelligent partitioning schemes include: hash partitioning, key range partitioning, schema partitioning and user-defined partitioning.
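As a rough illustration of the partitioning strategies just listed (round-robin, hash and key range), the sketch below distributes a handful of records across four "disks"; the record layout and disk count are assumptions made for the example.

```python
# Hypothetical sketch of round-robin, hash and key-range partitioning of
# records across disks (lists stand in for the disks).
from itertools import cycle

records = [{"id": i, "region": r}
           for i, r in enumerate(["North", "South", "East", "West"] * 3)]
NUM_DISKS = 4

# Round-robin: each record goes to the next disk in turn. The DBMS cannot
# predict where a given record lives, so lookups may scan every disk.
rr_disks = [[] for _ in range(NUM_DISKS)]
for rec, disk in zip(records, cycle(range(NUM_DISKS))):
    rr_disks[disk].append(rec)

# Hash partitioning: disk chosen from a hash of the partitioning key, so the
# DBMS can recompute the hash and go straight to the right disk.
hash_disks = [[] for _ in range(NUM_DISKS)]
for rec in records:
    hash_disks[hash(rec["id"]) % NUM_DISKS].append(rec)

# Key-range partitioning: contiguous key ranges map to disks (ids 0-2 on
# disk 0, 3-5 on disk 1, ...), which suits range queries.
range_disks = [[] for _ in range(NUM_DISKS)]
for rec in records:
    range_disks[min(rec["id"] // 3, NUM_DISKS - 1)].append(rec)

print([len(d) for d in rr_disks],
      [len(d) for d in hash_disks],
      [len(d) for d in range_disks])
```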
• 64. Mapping the Data Warehouse to a Multiprocessor Architecture  Database architectures for parallel processing:  shared-memory (shared-everything) architecture;  shared-disk architecture;  shared-nothing architecture.
• 65. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared memory or shared everything architecture:  Multiple PUs share memory.  Each PU has full access to all shared memory through a common bus.  Communication between nodes occurs via shared memory.  Performance is limited by the bandwidth of the memory bus.  It provides a consistent single-system image to the user.
  • 66. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared memory or shared everything Architecture:
  • 67. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared memory or shared everything Architecture:  Threads provide better resource utilization and faster context switching, thus providing for better scalability.  At the same time, threads that are too tightly coupled with the OS may limit RDBMS portability.  Scalability of shared-memory architectures is limited.
  • 68. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared disk architecture:
  • 69. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared disk architecture:  Each node consists of one or more PUs and associated memory. Memory is not shared between nodes. Communication occurs over a common high-speed bus. Each node has access to the same disks and other resources.
• 70. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared disk architecture:  It implements the concept of shared ownership of the entire database between RDBMS servers, each of which is running on a node of a distributed memory system.  Each RDBMS server can read, write, update, and remove data from the same shared database, which requires the system to implement a form of distributed lock manager (DLM).
  • 71. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared disk architecture:  Since the memory is not shared among the nodes, each node has its own data cache. Cache consistency must be maintained across the nodes and a lock manager is needed to maintain the consistency.  There is additional overhead in maintaining the locks and ensuring that the data caches are consistent
  • 72. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared disk architecture – Advantages:  Shared disk systems permit high availability. All data is accessible even if one node dies.  These systems have the concept of one database, which is an advantage over shared nothing systems.  Shared disk systems provide for incremental growth.
• 73. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared disk architecture – Disadvantages:  Inter-node synchronization is required, involving DLM overhead and greater dependency on the high-speed interconnect.  If the workload is not partitioned well, there may be high synchronization overhead.  There is operating system overhead of running the shared disk software.
  • 74. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared Nothing Architecture
  • 75. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared Nothing Architecture:  Shared nothing systems are typically loosely coupled. In shared nothing systems only one CPU is connected to a given disk. If a table or database is located on that disk, access depends entirely on the PU which owns it.
  • 76. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared Nothing Architecture – Advantages:  Shared nothing systems provide for incremental growth.  Failure is local: if one node fails, the others stay up.
• 77. Mapping the Data Warehouse to a Multiprocessor Architecture  Shared Nothing Architecture – Disadvantages:  More coordination is required.  More overhead is required for a process working on a disk belonging to another node.
• 78. Mapping the Data Warehouse to a Multiprocessor Architecture  Parallel DBMS Vendors – Oracle:  Architecture: shared disk;  Data partitioning: key range, hash, round robin;  Parallel operations: hash joins, scan and sort.
• 79. Mapping the Data Warehouse to a Multiprocessor Architecture  Parallel DBMS Vendors – Informix:  Architecture: shared memory, shared disk and shared nothing models;  Data partitioning: round robin, hash, schema, key range and user defined;  Parallel operations: INSERT, UPDATE, DELETE.
• 80. Mapping the Data Warehouse to a Multiprocessor Architecture  Parallel DBMS Vendors – IBM:  Architecture: shared nothing model;  Data partitioning: hash;  Parallel operations: INSERT, UPDATE, DELETE, load, recovery, index creation, backup, table reorganization.
• 81. Mapping the Data Warehouse to a Multiprocessor Architecture  Parallel DBMS Vendors – SYBASE:  Architecture: shared nothing model;  Data partitioning: hash, key range, schema;  Parallel operations: horizontal and vertical parallelism.
  • 82. DBMS Schemas for Decision Support A data cube allows data to be modeled and viewed in multiple dimensions. It is defined by dimensions and facts.  In general terms, dimensions are the perspectives or entities with respect to which an organization wants to keep records.
• 83. DBMS Schemas for Decision Support  Example: a sales data warehouse keeps records of the store’s sales with respect to the dimensions time, item, branch, and location.  Each dimension may have a table associated with it, called a dimension table, which further describes the dimension. A dimension table for item may contain the attributes item name, brand, and type.
  • 84. DBMS Schemas for Decision Support  Example: A multidimensional data model is typically organized around a central theme, like sales, for instance. This theme is represented by a fact table. Facts are numerical measures. Examples of facts for a sales data warehouse include dollars sold (sales amount in dollars), units sold (number of units sold), and amount budgeted.
  • 85. DBMS Schemas for Decision Support  Example
• 86. DBMS Schemas for Decision Support  Star schema  The most popular data model for a data warehouse is a multidimensional model.  The data warehouse contains a large central table (fact table) containing the bulk of the data, with no redundancy, and  a set of smaller associated tables (dimension tables), one for each dimension.  The dimension tables are displayed in a radial pattern around the central fact table.
  • 87. DBMS Schemas for Decision Support  Star schema
• 88. DBMS Schemas for Decision Support  Star schema – Characteristics:  Simple structure -> easy-to-understand schema.  Great query effectiveness -> small number of tables to join.  Relatively long loading time for dimension tables -> de-normalization and redundant data can make the tables large.
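A minimal star schema for the sales example can be sketched as SQL DDL (run here through sqlite3); the table and column names follow the time/item/branch/location example used earlier but are otherwise illustrative.

```python
# Star-schema sketch: one central fact table referencing denormalized
# dimension tables for time, item, branch and location.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, day INTEGER, month INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT, state TEXT, country TEXT);

CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL,      -- facts: numerical measures
    units_sold   INTEGER
);
""")

# A typical star-join query: the fact table in the middle, few joins.
query = """
SELECT t.quarter, l.city, SUM(f.dollars_sold)
FROM sales_fact f
JOIN time_dim t     ON f.time_key = t.time_key
JOIN location_dim l ON f.location_key = l.location_key
GROUP BY t.quarter, l.city;
"""
print(db.execute(query).fetchall())   # empty list until rows are loaded
```

Because every dimension attribute lives in a single denormalized dimension table, a typical report needs only one join per dimension, which is what gives the star schema its query effectiveness.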
  • 89. DBMS Schemas for Decision Support  Snowflake schema:  The snowflake schema is a variant of the star schema model, where some dimension tables are normalized, thereby further splitting the data into additional tables.  The resulting schema graph forms a shape similar to a snowflake.
  • 90. DBMS Schemas for Decision Support  Snowflake schema:
  • 91. DBMS Schemas for Decision Support  Snowflake schema:  The major difference between the snowflake and star schema models is that the dimension tables of the snowflake model may be kept in normalized form to reduce redundancies.  Such a table is easy to maintain and saves storage space.
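Continuing the same example, a snowflaked version of the location dimension might split the city attributes into their own normalized table; the names below are illustrative.

```python
# Snowflake-schema sketch: the city details are split out into a normalized
# table, and location_dim keeps only a foreign key to it.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE city_dim (
    city_key INTEGER PRIMARY KEY,
    city     TEXT,
    state    TEXT,
    country  TEXT
);
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY,
    street       TEXT,
    city_key     INTEGER REFERENCES city_dim(city_key)  -- no repeated city/state/country
);
""")
```

The trade-off matches the slide above: less redundancy and storage, at the cost of an extra join when querying location attributes.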
  • 92. DBMS Schemas for Decision Support Fact constellation  Sophisticated applications may require multiple fact tables to share dimension tables.  This kind of schema can be viewed as a collection of stars, and hence is called a galaxy schema or a fact constellation.
  • 93. DBMS Schemas for Decision Support Fact constellation
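Extending the star-schema sketch above (and reusing its `db` connection and dimension tables), a fact constellation simply adds a second fact table that shares those dimensions; the shipping example below follows the classic sales/shipping illustration and its column names are assumptions.

```python
# Fact-constellation (galaxy) sketch: a second fact table, shipping_fact,
# shares the time, item and location dimension tables already defined for
# sales_fact in the star-schema sketch above.
db.executescript("""
CREATE TABLE shipping_fact (
    time_key      INTEGER REFERENCES time_dim(time_key),
    item_key      INTEGER REFERENCES item_dim(item_key),
    from_location INTEGER REFERENCES location_dim(location_key),
    to_location   INTEGER REFERENCES location_dim(location_key),
    dollars_cost  REAL,
    units_shipped INTEGER
);
""")
```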
  • 94. Business Analysis: Reporting and Query Tools and Applications  The principal purpose of data warehousing is to provide information to business users for strategic decision making.  These users interact with the data warehouse using front-end tools, or by getting the required information through the information delivery system.
  • 95. Business Analysis: Reporting and Query Tools and Applications  Tool Categories  Reporting - Companies generate regular operational reports. Ex: Calculating and printing paychecks.  Managed query - make it possible for knowledge workers to access corporate data without IS intervention  Executive information systems - It allows developers to build customized, graphical decision support applications  OLAP - A natural way to view corporate data  Data Mining - It uses a variety of statistical and artificial-intelligence algorithms to analyze the correlation of variables
  • 96. Business Analysis: Reporting and Query Tools and Applications  Cognos Impromptu:  Impromptu is an interactive database reporting tool.  It allows Power Users to query data without programming knowledge.  When using the Impromptu tool, no data is written or changed in the database. It is only capable of reading the data
• 97. Business Analysis: Reporting and Query Tools and Applications  Cognos Impromptu – Features:  interactive reporting capability;  enterprise-wide scalability;  superior user interface;  fastest time to result;  lowest cost of ownership.
  • 98. Business Analysis: Reporting and Query Tools and Applications  Catalogs:  Impromptu stores metadata in subject related folders  This metadata will be used to develop a query for a report. The metadata set is stored in a file called a catalog.  The catalog does not contain any data. It just contains information about connecting to the database and the fields that will be accessible for reports.
  • 99. Business Analysis: Reporting and Query Tools and Applications  Catalogs contains:  Folders—meaningful groups of information representing columns from one or more tables  Columns—individual data elements that can appear in one or more folders  Calculations—expressions used to compute required values from existing data  Conditions—used to filter information so that only a certain type of information is displayed
  • 100. Business Analysis: Reporting and Query Tools and Applications Prompts—pre-defined selection criteria prompts that users can include in reports they create Other components, such as metadata, a logical database name, join information, and user classes
• 101. Business Analysis: Reporting and Query Tools and Applications  Catalogs – Uses:  view, run, and print reports;  export reports to other applications;  disconnect from and connect to the database;  create reports;  change the contents of the catalog.
  • 102. Business Analysis: Reporting and Query Tools and Applications  Reports:  Reports are created by choosing fields from the catalog folders. This process will build a SQL (Structured Query Language) statement behind the scene.  No SQL knowledge is required to use Impromptu.
  • 103. Business Analysis: Reporting and Query Tools and Applications  Reports:  The data in the report may be formatted, sorted and/or grouped as needed.  Titles, dates, headers and footers and other standard text formatting features (italics, bolding, and font size) are also available.  Once the desired layout is obtained, the report can be saved to a report file.
  • 104. Business Analysis: Reporting and Query Tools and Applications  Frame-Based Reporting  Frames are the building blocks of all Impromptu reports and templates.  They may contain report objects, such as data, text, pictures, and charts.  There are no limits to the number of frames that you can place within an individual report or template. You can nest frames within other frames to group report objects within a report.
  • 105. Business Analysis: Reporting and Query Tools and Applications  Frame-Based Reporting  Form frame: An empty form frame appears.  List frame: An empty list frame appears.  Text frame: The flashing I-beam appears where you can begin inserting text.  Picture frame: The Source tab (Picture Properties dialog box) appears. You can use this tab to select the image to include in the frame.
  • 106. Business Analysis: Reporting and Query Tools and Applications  Frame-Based Reporting Chart frame: The Data tab (Chart Properties dialog box) appears. You can use this tab to select the data item to include in the chart. OLE Object: The Insert Object dialog box appears where you can locate and select the file you want to insert, or you can create a new object using the software listed in the Object Type box.
• 107. Business Analysis: Reporting and Query Tools and Applications  Impromptu features  Unified query and reporting interface: it unifies both query and reporting in a single user interface.  Object-oriented architecture: it enables inheritance-based administration so that more than 1,000 users can be accommodated as easily as a single user.
• 108. Business Analysis: Reporting and Query Tools and Applications  Impromptu features  Scalability: it scales from a single user to 1,000 users.  Security and control: security is based on user profiles and their classes.  Data presented in a business context: it presents information using the terminology of the business.
• 109. Business Analysis: Reporting and Query Tools and Applications  Impromptu features  Over 70 predefined report templates: users can simply supply the data to create an interactive report.  Frame-based reporting: it offers a number of objects to create a user-designed report.  Business-relevant reporting: it can be used to generate business-relevant reports through filters, pre-conditions and calculations.
• 110. Online Analytical Processing (OLAP)  It uses database tables (fact and dimension tables) to enable multidimensional viewing, analysis and querying of large amounts of data.  Online Analytical Processing (OLAP) applications and tools are those that are designed to ask complex queries of large multidimensional collections of data.
• 111. Online Analytical Processing (OLAP)  Need:  It arises from the multidimensional nature of the business problem.  These problems are characterized by retrieving a very large number of records (which can reach gigabytes and terabytes) and summarizing this data into a form of information that can be used by business analysts.
• 112. Online Analytical Processing (OLAP)  The Multidimensional Data Model:  Because OLAP is on-line, it must provide answers quickly; analysts pose iterative queries during interactive sessions.  The multidimensional data model views the data as a cube.
  • 113. Online Analytical Processing (OLAP)  The Multidimensional Data Model – Operations:  Roll-up: The roll-up operation (also called the drill-up operation by some vendors) performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction.
  • 114. Online Analytical Processing (OLAP)  The Multidimensional Data Model – Operations:
  • 115. Online Analytical Processing (OLAP)  The Multidimensional Data Model – Operations:  Drill-down: Drill-down is the reverse of roll-up. It navigates from less detailed data to more detailed data.  Drill-down can be realized by either stepping down a concept hierarchy for a dimension or introducing additional dimensions.
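A small pandas sketch of roll-up and drill-down on a toy sales cube (all column names and values are invented for the example):

```python
# Illustrative roll-up / drill-down using pandas on a tiny sales table.
import pandas as pd

sales = pd.DataFrame({
    "country": ["Canada", "Canada", "USA", "USA"],
    "city":    ["Toronto", "Vancouver", "Chicago", "New York"],
    "quarter": ["Q1", "Q1", "Q1", "Q2"],
    "dollars_sold": [100, 150, 200, 250],
})

# Roll-up: climb the location hierarchy from city to country
# (aggregation over a coarser level of the concept hierarchy).
rollup = sales.groupby(["country", "quarter"])["dollars_sold"].sum()

# Drill-down: go back to the more detailed city level
# (the reverse operation, exposing finer-grained data).
drilldown = sales.groupby(["country", "city", "quarter"])["dollars_sold"].sum()

print(rollup, drilldown, sep="\n\n")
```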
  • 116. Online Analytical Processing (OLAP)  The Multidimensional Data Model – Operations:
  • 117. Online Analytical Processing (OLAP)  The Multidimensional Data Model – Operations:  Slice and dice: The slice operation performs a selection on one dimension of the given cube, resulting in a subcube.
  • 118. Online Analytical Processing (OLAP)  The Multidimensional Data Model – Operations:  The dice operation defines a subcube by performing a selection on two or more dimensions.
  • 119. Online Analytical Processing (OLAP)  The Multidimensional Data Model – Operations:  Pivot (also called rotate) is a visualization operation that rotates the data axes in view to provide an alternative data presentation.
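In the same spirit, a pandas sketch of slice, dice and pivot on a toy sales table (column names and values are invented):

```python
# Illustrative slice, dice and pivot using pandas.
import pandas as pd

sales = pd.DataFrame({
    "country": ["Canada", "Canada", "USA", "USA"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "item":    ["phone", "laptop", "phone", "laptop"],
    "dollars_sold": [100, 150, 200, 250],
})

# Slice: a selection on ONE dimension, yielding a subcube.
slice_q1 = sales[sales["quarter"] == "Q1"]

# Dice: a selection on TWO OR MORE dimensions.
dice = sales[(sales["quarter"] == "Q1") & (sales["country"] == "USA")]

# Pivot (rotate): re-orient the axes for an alternative presentation,
# e.g. countries as rows and quarters as columns.
pivoted = sales.pivot_table(index="country", columns="quarter",
                            values="dollars_sold", aggfunc="sum")

print(slice_q1, dice, pivoted, sep="\n\n")
```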
• 120. OLAP vs OLTP
Source of data: OLTP - operational data; OLTP systems are the original source of the data. OLAP - consolidated data; OLAP data comes from the various OLTP databases.
Purpose of data: OLTP - to control and run fundamental business tasks. OLAP - to help with planning, problem solving, and decision support.
What the data reveals: OLTP - a snapshot of ongoing business processes. OLAP - multi-dimensional views of various kinds of business activities.
Queries: OLTP - relatively standardized and simple queries returning relatively few records. OLAP - often complex queries involving aggregations.
• 121. OLAP vs OLTP
Processing speed: OLTP - typically very fast. OLAP - depends on the amount of data involved.
Space requirements: OLTP - can be relatively small if historical data is archived. OLAP - larger, due to the existence of aggregation structures and history data.
Database design: OLTP - highly normalized with many tables. OLAP - typically de-normalized with fewer tables; uses star and/or snowflake schemas.
Backup and recovery: OLTP - data loss is likely to entail significant monetary loss and legal liability. OLAP - instead of regular backups, some environments may simply reload the OLTP data as a recovery method.
• 122. Online Analytical Processing (OLAP)  Categories of OLAP Tools – MOLAP:  This is the more traditional way of OLAP analysis. In MOLAP, data is stored in a multidimensional cube. The storage is not in the relational database, but in proprietary formats; that is, data is stored in array-based structures.
• 124. Online Analytical Processing (OLAP)  Categories of OLAP Tools – ROLAP  This methodology relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is equivalent to adding a "WHERE" clause to the SQL statement.  Data is stored in relational tables.
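A small sketch of the ROLAP idea, with sqlite3 standing in for the relational warehouse and invented table/column names: each slice or dice gesture in the front-end tool becomes an extra predicate in the WHERE clause of the generated SQL.

```python
# ROLAP sketch: slicing and dicing are translated into WHERE predicates on
# relational tables.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales_fact (quarter TEXT, country TEXT, dollars_sold REAL)")
db.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)",
               [("Q1", "USA", 200), ("Q1", "Canada", 100), ("Q2", "USA", 250)])

base_query = "SELECT quarter, country, SUM(dollars_sold) FROM sales_fact"

# "Slicing" on quarter = Q1 simply appends one WHERE predicate:
slice_sql = base_query + " WHERE quarter = 'Q1' GROUP BY quarter, country"

# "Dicing" on quarter and country appends one predicate per dimension:
dice_sql = base_query + " WHERE quarter = 'Q1' AND country = 'USA' GROUP BY quarter, country"

print(db.execute(slice_sql).fetchall())
print(db.execute(dice_sql).fetchall())
```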
  • 126. Online Analytical Processing (OLAP)  Categories of OLAP Tools – HOLAP (MQE: Managed Query Environment)  HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP.  For summary-type information, HOLAP leverages cube technology for faster performance.  It stores only the indexes and aggregations in the multidimensional form while the rest of the data is stored in the relational database.