This document discusses OLTP and OLAP systems. It defines OLTP as online transaction processing systems that are used for transactional applications involving data entry, storage and retrieval of small transactions. Examples provided are point-of-sale systems. It then defines OLAP as online analytical processing systems that are used for analysis of large datasets through multidimensional views of aggregated data to help with business intelligence questions. Key differences between the two types of systems are highlighted such as OLTP involving small transactions and OLAP involving aggregated data analysis.
OLTP systems are used to manage transactional data for day-to-day business operations like sales, inventory, payroll. They are designed to allow for quick insertion, retrieval, and updating of large volumes of individual records. While efficient for simple queries, OLTP systems are not suited for complex analyses needed for strategic decision making.
The document provides an overview of business intelligence (BI) including definitions, typical architectures, and key concepts. It describes how data is extracted from operational systems via ETL processes and loaded into data warehouses to support OLAP and business analytics. Different data modeling approaches are covered, including star schemas, snowflake schemas, and fact constellations. Dimensional modeling techniques are outlined to transform enterprise data models into structures optimized for analysis and reporting.
Traditional Data-warehousing / BI overviewNagaraj Yerram
Business intelligence (BI) refers to technologies that collect, analyze, and present business data to support decision-making. A traditional BI architecture extracts data from source systems, transforms it using ETL processes, and loads it into a data warehouse optimized for analysis (OLAP). Dimensional modeling techniques structure data warehouses into fact and dimension tables arranged in star or snowflake schemas to enable analysis of key business metrics over time and across different dimensions like product or location. This facilitates interactive exploration and reporting on historical, current, and predictive business insights for strategic planning and opportunities.
MSBI online training offered by Quontra Solutions with special features having Extensive Training will be in both MSBI Online Training and Placement. We help you in resume preparation and conducting Mock Interviews.
Emphasis is given on important topics that were required and mostly used in real time projects. Quontra Solutions is an Online Training Leader when it comes to high-end effective and efficient IT Training. We have always been and still are focusing on the key aspect which is providing utmost effective and competent training to both students and professionals who are eager to enrich their technical skills.
In computing, a data warehouse (DW, DWH), or an enterprise data warehouse (EDW), is a database used for reporting (1) and data analysis (2). Integrating data from one or more disparate sources creates a central repository of data, a data warehouse (DW). Data warehouses store current and historical data and are used for creating trending reports for senior management reporting such as annual and quarterly comparisons.
This document discusses online analytical processing (OLAP) and related concepts. It defines data mining, data warehousing, OLTP, and OLAP. It explains that a data warehouse integrates data from multiple sources and stores historical data for analysis. OLAP allows users to easily extract and view data from different perspectives. The document also discusses OLAP cube operations like slicing, dicing, drilling, and pivoting. It describes different OLAP architectures like MOLAP, ROLAP, and HOLAP and data warehouse schemas and architecture.
This document is about Data Warehouse Tools such as:
OLAP (On – line Analytical Processing)
OLTP (On – Line Transaction Processing)
Business Intelligence
Driving Force
Data Mart
Meta Data
The document discusses data warehousing and how it helps businesses make informed decisions by integrating data from various sources into a centralized repository. A data warehouse allows businesses to analyze historical data to understand trends, identify high/low performing customers, determine which products or promotions are most effective, and more. It provides a single, complete view of business data that users can easily access and understand to gain insights.
OLTP systems are used to manage transactional data for day-to-day business operations like sales, inventory, payroll. They are designed to allow for quick insertion, retrieval, and updating of large volumes of individual records. While efficient for simple queries, OLTP systems are not suited for complex analyses needed for strategic decision making.
The document provides an overview of business intelligence (BI) including definitions, typical architectures, and key concepts. It describes how data is extracted from operational systems via ETL processes and loaded into data warehouses to support OLAP and business analytics. Different data modeling approaches are covered, including star schemas, snowflake schemas, and fact constellations. Dimensional modeling techniques are outlined to transform enterprise data models into structures optimized for analysis and reporting.
Traditional Data-warehousing / BI overviewNagaraj Yerram
Business intelligence (BI) refers to technologies that collect, analyze, and present business data to support decision-making. A traditional BI architecture extracts data from source systems, transforms it using ETL processes, and loads it into a data warehouse optimized for analysis (OLAP). Dimensional modeling techniques structure data warehouses into fact and dimension tables arranged in star or snowflake schemas to enable analysis of key business metrics over time and across different dimensions like product or location. This facilitates interactive exploration and reporting on historical, current, and predictive business insights for strategic planning and opportunities.
MSBI online training offered by Quontra Solutions with special features having Extensive Training will be in both MSBI Online Training and Placement. We help you in resume preparation and conducting Mock Interviews.
Emphasis is given on important topics that were required and mostly used in real time projects. Quontra Solutions is an Online Training Leader when it comes to high-end effective and efficient IT Training. We have always been and still are focusing on the key aspect which is providing utmost effective and competent training to both students and professionals who are eager to enrich their technical skills.
In computing, a data warehouse (DW, DWH), or an enterprise data warehouse (EDW), is a database used for reporting (1) and data analysis (2). Integrating data from one or more disparate sources creates a central repository of data, a data warehouse (DW). Data warehouses store current and historical data and are used for creating trending reports for senior management reporting such as annual and quarterly comparisons.
This document discusses online analytical processing (OLAP) and related concepts. It defines data mining, data warehousing, OLTP, and OLAP. It explains that a data warehouse integrates data from multiple sources and stores historical data for analysis. OLAP allows users to easily extract and view data from different perspectives. The document also discusses OLAP cube operations like slicing, dicing, drilling, and pivoting. It describes different OLAP architectures like MOLAP, ROLAP, and HOLAP and data warehouse schemas and architecture.
This document is about Data Warehouse Tools such as:
OLAP (On – line Analytical Processing)
OLTP (On – Line Transaction Processing)
Business Intelligence
Driving Force
Data Mart
Meta Data
The document discusses data warehousing and how it helps businesses make informed decisions by integrating data from various sources into a centralized repository. A data warehouse allows businesses to analyze historical data to understand trends, identify high/low performing customers, determine which products or promotions are most effective, and more. It provides a single, complete view of business data that users can easily access and understand to gain insights.
This document discusses OLAP (Online Analytical Processing) and how it compares to OLTP (Online Transactional Processing). It defines OLAP as software for performing multidimensional analysis on large volumes of data from a data warehouse or data mart. Key points:
- OLAP is optimized for analysis and complex queries, while OLTP is optimized for processing high volumes of transactions.
- OLAP uses multidimensional data models (cubes) to organize and analyze data across multiple dimensions like time, products, locations. This allows for fast analysis on aggregated data.
- A hypercube model can represent data with more than three dimensions by displaying the data across multiple tables and pages.
1. Transaction processing systems (TPS) are computerized systems that perform and record routine daily transactions necessary to operate a business. They serve the operational level of an organization.
2. TPS process input data from transactions and update stored data. They produce output reports and ensure accuracy and integrity of transaction data.
3. The main functions of TPS are input, processing, output, and storage. They capture transaction data, validate and store the information, process the transactions, and generate reports.
OLTP systems are used for operational tasks like processing transactions, while OLAP systems are used for analysis of historical data extracted from OLTP systems. OLAP systems allow for complex queries and reporting on aggregated and multidimensional views of the data. Both systems are complementary, with OLTP housing and processing the source transactional data and OLAP leveraging that data for planning, problem solving, and decision making.
Online transaction processing (OLTP) systems facilitate transaction-oriented data entry and retrieval applications in real-time. OLTP provides simplicity and efficiency for businesses by reducing paper trails and enabling faster financial forecasting. Effective OLTP requires support for distributed transactions across networks and platforms using client/server architectures and transaction management software. It involves gathering input, processing it, and updating existing information. Concurrency control protocols like locking and timestamps are used to manage concurrent transactions and avoid problems such as lost updates, dirty reads, and phantoms.
Online transaction processing (OLTP) systems facilitate transaction-oriented applications like data entry and retrieval. An example is an automated teller machine (ATM). OLTP systems aim to provide immediate responses to user requests through simplicity, efficiency, and reduced paperwork. New OLTP software uses client/server processing and transaction brokering to support transactions spanning networks and companies. Maintaining high performance for large numbers of concurrent updates requires sophisticated transaction management and database optimization.
This document provides an introduction and overview of a course on data warehousing and data mining. It discusses the need for data warehousing to help organizations make intelligent decisions based on analyzing large amounts of integrated historical data. A data warehouse combines data from multiple sources and stores it in a way that supports ad-hoc querying. It allows knowledge workers to analyze patterns and trends in the data to address business questions. The document contrasts this with traditional transaction systems and management information systems.
This document provides an overview of a course on data warehousing, data mining, and decision support. It discusses what data warehousing is, how it differs from operational transaction processing systems, and the processes involved like data extraction, transformation, loading and refreshing the warehouse. It also covers warehouse architecture, design considerations, and multidimensional data modeling. Examples from Walmart's data warehouse implementation are provided to illustrate real-world warehouse concepts and capabilities.
Gulabs Ppt On Data Warehousing And Mininggulab sharma
The document provides an overview of data warehousing, decision support, and OLAP. It discusses how a data warehouse can integrate data from various operational sources to provide a single point of access for analysis. It also compares the differences between operational databases designed for transactions versus data warehouses designed for analytics and decision making. Key points covered include data extraction, transformation and loading into the warehouse, as well as refresh strategies to propagate changes from source systems.
This document provides an overview of Hyperion and Essbase. It discusses how raw data is transformed into information through data warehousing processes like extracting, transforming, and loading data. It then explains what an OLTP system is and how Essbase provides multi-dimensional analysis capabilities. Key features of Essbase like dimensions, facts, aggregation, and its architecture are summarized. Finally, the document outlines the typical lifecycle of building and maintaining an Essbase database application.
This document provides an overview of data warehousing. It defines a data warehouse as a subject-oriented, integrated, time-variant and non-volatile collection of data that helps analysts make informed decisions. Key points include:
- A data warehouse contains consolidated historical data from multiple sources to help executives analyze business trends and make strategic decisions.
- Features include being subject-oriented around topics like products or customers, integrating data from different systems, only including time-specific data, and not overwriting previous data when new info is added.
- Online analytical processing (OLAP) allows users to easily extract and analyze data from different perspectives across multiple database systems.
- Data warehouses differ from operational transaction systems which
The document discusses data warehousing concepts including:
1) A data warehouse is a subject-oriented, integrated, and non-volatile collection of data used for decision making. It stores historical and current data from multiple sources.
2) The architecture of a data warehouse is typically three-tiered, with an operational data tier, data warehouse/data mart tier for storage, and client access tier. OLAP servers allow analysis of stored data.
3) ROLAP and MOLAP refer to relational and multidimensional approaches for OLAP. ROLAP dynamically generates data cubes from relational databases, while MOLAP pre-calculates and stores aggregated data in multidimensional structures.
This document discusses decision support, data warehousing, and online analytical processing (OLAP). It defines these terms and compares online transaction processing (OLTP) to OLAP. It describes the evolution of decision support from batch reports to integrated data warehouses. The benefits of separating data warehouses from operational databases are outlined. Common architectures and the design/operational process are summarized.
ERP systems integrate key functions like accounting, manufacturing, sales, and inventory management into a single system. ERP helps companies better manage resources like employees, materials, and machines to produce quality products for customers. It provides software to aid processes like sales orders, purchasing, production scheduling, and accounting. ERP gives companies tools to improve delivery performance, reduce inventory costs, and increase profits when implemented properly.
The document discusses advances in database querying and summarizes key topics including data warehousing, online analytical processing (OLAP), and data mining. It describes how data warehouses integrate data from various sources to enable decision making, and how OLAP tools allow users to analyze aggregated data and model "what-if" scenarios. The document also covers data transformation techniques used to build the data warehouse.
The document provides an overview of SAP BODS (SAP Business Object Data Services), an ETL tool for data integration and management. It discusses key aspects of SAP BODS including its architecture, components, objects, tools and functions. The architecture has source, integration and presentation layers, with a staging area for data extraction, transformation and loading. Key components are the Data Services Designer, Data Services Management Console and Repository Manager. Objects include reusable and single-purpose objects stored in a repository. Tools support ETL processes like extraction, transformation and loading of data.
Business intelligence (BI) refers to processes, technologies, and tools used to transform data into insights. BI allows organizations to gather data from various sources, analyze it, and use the insights to make better strategic, tactical, and operational decisions. Key components of BI include data warehousing, online transaction processing, and online analytical processing. The goal of BI is to provide the right information to the right decision-makers at the right time through dashboards, reports, forecasts, and other visualizations.
This document discusses transaction processing systems (TPS). It defines a TPS as an information system that captures and processes data from daily business transactions like deposits, payments, orders or reservations. A TPS has several functions including processing transactions, outputting information, and accepting user inputs. It discusses the differences between batch processing, which collects and stores data to update databases later, and real-time processing, which immediately processes transactions. Key features of TPS include rapid response, reliability, inflexibility, and controlled processing. TPS must pass the ACID test of atomicity, consistency, isolation and durability to qualify. The document outlines the five stages of transaction processing: data entry, processing, database maintenance, document/report
What is Business Intelligence?
Why Business Intelligence?
Data analysis problems
Data Warehouse (DW) introduction
DW topics
Multidimensional modeling
ETL
Performance optimization
Essential Skills for Family Assessment - Marital and Family Therapy and Couns...PsychoTech Services
A proprietary approach developed by bringing together the best of learning theories from Psychology, design principles from the world of visualization, and pedagogical methods from over a decade of training experience, that enables you to: Learn better, faster!
This document discusses OLAP (Online Analytical Processing) and how it compares to OLTP (Online Transactional Processing). It defines OLAP as software for performing multidimensional analysis on large volumes of data from a data warehouse or data mart. Key points:
- OLAP is optimized for analysis and complex queries, while OLTP is optimized for processing high volumes of transactions.
- OLAP uses multidimensional data models (cubes) to organize and analyze data across multiple dimensions like time, products, locations. This allows for fast analysis on aggregated data.
- A hypercube model can represent data with more than three dimensions by displaying the data across multiple tables and pages.
1. Transaction processing systems (TPS) are computerized systems that perform and record routine daily transactions necessary to operate a business. They serve the operational level of an organization.
2. TPS process input data from transactions and update stored data. They produce output reports and ensure accuracy and integrity of transaction data.
3. The main functions of TPS are input, processing, output, and storage. They capture transaction data, validate and store the information, process the transactions, and generate reports.
OLTP systems are used for operational tasks like processing transactions, while OLAP systems are used for analysis of historical data extracted from OLTP systems. OLAP systems allow for complex queries and reporting on aggregated and multidimensional views of the data. Both systems are complementary, with OLTP housing and processing the source transactional data and OLAP leveraging that data for planning, problem solving, and decision making.
Online transaction processing (OLTP) systems facilitate transaction-oriented data entry and retrieval applications in real-time. OLTP provides simplicity and efficiency for businesses by reducing paper trails and enabling faster financial forecasting. Effective OLTP requires support for distributed transactions across networks and platforms using client/server architectures and transaction management software. It involves gathering input, processing it, and updating existing information. Concurrency control protocols like locking and timestamps are used to manage concurrent transactions and avoid problems such as lost updates, dirty reads, and phantoms.
Online transaction processing (OLTP) systems facilitate transaction-oriented applications like data entry and retrieval. An example is an automated teller machine (ATM). OLTP systems aim to provide immediate responses to user requests through simplicity, efficiency, and reduced paperwork. New OLTP software uses client/server processing and transaction brokering to support transactions spanning networks and companies. Maintaining high performance for large numbers of concurrent updates requires sophisticated transaction management and database optimization.
This document provides an introduction and overview of a course on data warehousing and data mining. It discusses the need for data warehousing to help organizations make intelligent decisions based on analyzing large amounts of integrated historical data. A data warehouse combines data from multiple sources and stores it in a way that supports ad-hoc querying. It allows knowledge workers to analyze patterns and trends in the data to address business questions. The document contrasts this with traditional transaction systems and management information systems.
This document provides an overview of a course on data warehousing, data mining, and decision support. It discusses what data warehousing is, how it differs from operational transaction processing systems, and the processes involved like data extraction, transformation, loading and refreshing the warehouse. It also covers warehouse architecture, design considerations, and multidimensional data modeling. Examples from Walmart's data warehouse implementation are provided to illustrate real-world warehouse concepts and capabilities.
Gulabs Ppt On Data Warehousing And Mininggulab sharma
The document provides an overview of data warehousing, decision support, and OLAP. It discusses how a data warehouse can integrate data from various operational sources to provide a single point of access for analysis. It also compares the differences between operational databases designed for transactions versus data warehouses designed for analytics and decision making. Key points covered include data extraction, transformation and loading into the warehouse, as well as refresh strategies to propagate changes from source systems.
This document provides an overview of Hyperion and Essbase. It discusses how raw data is transformed into information through data warehousing processes like extracting, transforming, and loading data. It then explains what an OLTP system is and how Essbase provides multi-dimensional analysis capabilities. Key features of Essbase like dimensions, facts, aggregation, and its architecture are summarized. Finally, the document outlines the typical lifecycle of building and maintaining an Essbase database application.
This document provides an overview of data warehousing. It defines a data warehouse as a subject-oriented, integrated, time-variant and non-volatile collection of data that helps analysts make informed decisions. Key points include:
- A data warehouse contains consolidated historical data from multiple sources to help executives analyze business trends and make strategic decisions.
- Features include being subject-oriented around topics like products or customers, integrating data from different systems, only including time-specific data, and not overwriting previous data when new info is added.
- Online analytical processing (OLAP) allows users to easily extract and analyze data from different perspectives across multiple database systems.
- Data warehouses differ from operational transaction systems which
The document discusses data warehousing concepts including:
1) A data warehouse is a subject-oriented, integrated, and non-volatile collection of data used for decision making. It stores historical and current data from multiple sources.
2) The architecture of a data warehouse is typically three-tiered, with an operational data tier, data warehouse/data mart tier for storage, and client access tier. OLAP servers allow analysis of stored data.
3) ROLAP and MOLAP refer to relational and multidimensional approaches for OLAP. ROLAP dynamically generates data cubes from relational databases, while MOLAP pre-calculates and stores aggregated data in multidimensional structures.
This document discusses decision support, data warehousing, and online analytical processing (OLAP). It defines these terms and compares online transaction processing (OLTP) to OLAP. It describes the evolution of decision support from batch reports to integrated data warehouses. The benefits of separating data warehouses from operational databases are outlined. Common architectures and the design/operational process are summarized.
ERP systems integrate key functions like accounting, manufacturing, sales, and inventory management into a single system. ERP helps companies better manage resources like employees, materials, and machines to produce quality products for customers. It provides software to aid processes like sales orders, purchasing, production scheduling, and accounting. ERP gives companies tools to improve delivery performance, reduce inventory costs, and increase profits when implemented properly.
The document discusses advances in database querying and summarizes key topics including data warehousing, online analytical processing (OLAP), and data mining. It describes how data warehouses integrate data from various sources to enable decision making, and how OLAP tools allow users to analyze aggregated data and model "what-if" scenarios. The document also covers data transformation techniques used to build the data warehouse.
The document provides an overview of SAP BODS (SAP Business Object Data Services), an ETL tool for data integration and management. It discusses key aspects of SAP BODS including its architecture, components, objects, tools and functions. The architecture has source, integration and presentation layers, with a staging area for data extraction, transformation and loading. Key components are the Data Services Designer, Data Services Management Console and Repository Manager. Objects include reusable and single-purpose objects stored in a repository. Tools support ETL processes like extraction, transformation and loading of data.
Business intelligence (BI) refers to processes, technologies, and tools used to transform data into insights. BI allows organizations to gather data from various sources, analyze it, and use the insights to make better strategic, tactical, and operational decisions. Key components of BI include data warehousing, online transaction processing, and online analytical processing. The goal of BI is to provide the right information to the right decision-makers at the right time through dashboards, reports, forecasts, and other visualizations.
This document discusses transaction processing systems (TPS). It defines a TPS as an information system that captures and processes data from daily business transactions like deposits, payments, orders or reservations. A TPS has several functions including processing transactions, outputting information, and accepting user inputs. It discusses the differences between batch processing, which collects and stores data to update databases later, and real-time processing, which immediately processes transactions. Key features of TPS include rapid response, reliability, inflexibility, and controlled processing. TPS must pass the ACID test of atomicity, consistency, isolation and durability to qualify. The document outlines the five stages of transaction processing: data entry, processing, database maintenance, document/report
What is Business Intelligence?
Why Business Intelligence?
Data analysis problems
Data Warehouse (DW) introduction
DW topics
Multidimensional modeling
ETL
Performance optimization
Essential Skills for Family Assessment - Marital and Family Therapy and Couns...PsychoTech Services
A proprietary approach developed by bringing together the best of learning theories from Psychology, design principles from the world of visualization, and pedagogical methods from over a decade of training experience, that enables you to: Learn better, faster!
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Marlon Dumas
This webinar discusses the limitations of traditional approaches for business process simulation based on had-crafted model with restrictive assumptions. It shows how process mining techniques can be assembled together to discover high-fidelity digital twins of end-to-end processes from event data.
06-20-2024-AI Camp Meetup-Unstructured Data and Vector DatabasesTimothy Spann
Tech Talk: Unstructured Data and Vector Databases
Speaker: Tim Spann (Zilliz)
Abstract: In this session, I will discuss the unstructured data and the world of vector databases, we will see how they different from traditional databases. In which cases you need one and in which you probably don’t. I will also go over Similarity Search, where do you get vectors from and an example of a Vector Database Architecture. Wrapping up with an overview of Milvus.
Introduction
Unstructured data, vector databases, traditional databases, similarity search
Vectors
Where, What, How, Why Vectors? We’ll cover a Vector Database Architecture
Introducing Milvus
What drives Milvus' Emergence as the most widely adopted vector database
Hi Unstructured Data Friends!
I hope this video had all the unstructured data processing, AI and Vector Database demo you needed for now. If not, there’s a ton more linked below.
My source code is available here
https://github.com/tspannhw/
Let me know in the comments if you liked what you saw, how I can improve and what should I show next? Thanks, hope to see you soon at a Meetup in Princeton, Philadelphia, New York City or here in the Youtube Matrix.
Get Milvused!
https://milvus.io/
Read my Newsletter every week!
https://github.com/tspannhw/FLiPStackWeekly/blob/main/141-10June2024.md
For more cool Unstructured Data, AI and Vector Database videos check out the Milvus vector database videos here
https://www.youtube.com/@MilvusVectorDatabase/videos
Unstructured Data Meetups -
https://www.meetup.com/unstructured-data-meetup-new-york/
https://lu.ma/calendar/manage/cal-VNT79trvj0jS8S7
https://www.meetup.com/pro/unstructureddata/
https://zilliz.com/community/unstructured-data-meetup
https://zilliz.com/event
Twitter/X: https://x.com/milvusio https://x.com/paasdev
LinkedIn: https://www.linkedin.com/company/zilliz/ https://www.linkedin.com/in/timothyspann/
GitHub: https://github.com/milvus-io/milvus https://github.com/tspannhw
Invitation to join Discord: https://discord.com/invite/FjCMmaJng6
Blogs: https://milvusio.medium.com/ https://www.opensourcevectordb.cloud/ https://medium.com/@tspann
https://www.meetup.com/unstructured-data-meetup-new-york/events/301383476/?slug=unstructured-data-meetup-new-york&eventId=301383476
https://www.aicamp.ai/event/eventdetails/W2024062014
Do People Really Know Their Fertility Intentions? Correspondence between Sel...Xiao Xu
Fertility intention data from surveys often serve as a crucial component in modeling fertility behaviors. Yet, the persistent gap between stated intentions and actual fertility decisions, coupled with the prevalence of uncertain responses, has cast doubt on the overall utility of intentions and sparked controversies about their nature. In this study, we use survey data from a representative sample of Dutch women. With the help of open-ended questions (OEQs) on fertility and Natural Language Processing (NLP) methods, we are able to conduct an in-depth analysis of fertility narratives. Specifically, we annotate the (expert) perceived fertility intentions of respondents and compare them to their self-reported intentions from the survey. Through this analysis, we aim to reveal the disparities between self-reported intentions and the narratives. Furthermore, by applying neural topic modeling methods, we could uncover which topics and characteristics are more prevalent among respondents who exhibit a significant discrepancy between their stated intentions and their probable future behavior, as reflected in their narratives.
1. “Fundamentals of Business Analytics”
RN Prasad and Seema Acharya
Copyright 2011 Wiley India Pvt. Ltd. All rights reserved.
Chapter 3
Introduction to
OLTP and OLAP
2. Content of this presentation has been
taken from Book
“Fundamentals of Business
Analytics”
RN Prasad and Seema Acharya
Published by Wiley India Pvt. Ltd.
Published by Wiley India Pvt. Ltd.
and it will always be the copyright of the
authors of the book and publisher only.
3. OLTP Understanding
Online Transaction Processing
Consider a point-of-sale (POS) system in a supermarket store. You have picked
a bar of chocolate and await your chance in the queue for getting it billed. The
cashier scans the chocolate bar's bar code. Consequent to the scanning of the bar
code, some activities take place in the background —
the database is accessed;
the price and product information is retrieved and displayed on the computer screen;
the cashier feeds in the quantity purchased;
the cashier feeds in the quantity purchased;
the application then computes the total, generates the bill, and prints it. You pay the cash and
leave.
The application has just added a record of your purchase in its database. This
was an On-Line Transaction Processing (OLTP) system designed to support on-
line transactions and query processing.
In other words, the POS of the supermarket store was an OLTP system.
4. OLTP Understanding
OLTP systems refer to a class of systems that manage transaction-oriented
applications.
These applications are mainly concerned with the entry, storage, and retrieval
of data.
They are designed to cover most of the day-to-day operations of an
organization such as purchasing, inventory, manufacturing, payroll, accounting,
etc.
OLTP systems are characterized by a large number of short on-line
transactions such as INSERT (a record of final purchase by a customer was
transactions such as INSERT (a record of final purchase by a customer was
added to the database), UPDATE (the price of a product has been raised from
Rs10 to Rs10.5), and DELETE (a product has gone out of demand and therefore
the store removes it from the shelf as well as from its database).
Almost all industries today (including airlines, mail-order, supermarkets,
banking, etc.) use OLTP systems to record transactional data. The data captured
by OLTP systems is usually stored in commercial relational databases. For
example, the database of a supermarket store consists of the following tables to
store the data about its transactions, products, employees, inventory supplies,
Like Transactions, ProductMaster, EmployeeDetails,
InventorySupplies,Suppliers, etc.
5. Online Transactional Processing
(OLTP)
Traditional database application is focused on
Online Transactional Processing (OLTP),
– Short, simple queries and frequent updates
involving a relatively small number of tuples e.g.,
recording sales at cash-registers, selling airline
recording sales at cash-registers, selling airline
tickets.
6. OLTP
(ONLINE TRANSACTION PROCESSING SYSTEM)
• Used for transaction oriented applications
• Used by lower level employee
• Quick updates and retrievals
• Quick updates and retrievals
• Many users accessing the same data
• Users are not technical persons
• Response rate is very fast
• Single transaction (one application) at a time
7. OLTP
(ONLINE TRANSACTION PROCESSING SYSTEM)
• Stores routine data
• Follows client server model
• Applications
– Banks
– Retail stores
– Airline reservation
9. TRANSACTIONS
TRANSACTIONS
• Single event that changes something
• Different types of transactions
– Customer orders
– Receipts
– Invoices
– Invoices
– Payments
• Processing of transactions include storage and
editing of data
– When transaction is completed then the records of an
organization are changed
12. OLTP Segmentation
• They can be segmented into:
– Real-time Transaction Processing
– Batch Processing
13. Real-time Transaction processing
• Multiple users can fetch the information
• Very fast response rate
• Transactions processed immediately
• Everything is processed in real time
• Everything is processed in real time
14. Batch Processing
• Where information is required in batch
• Offline access to information
• Presorting (sequence) is applied
• Takes time to process information
• Takes time to process information
Day 1 Day 2 Day 3
Day
30
. . . . . . . . . .
Monthly
purchase of
Retail Store
15. Characteristics of OLTP Model
• Online connectivity
• LAN,WAN
• Availability
• Availability
– Available 24 hours a day
• Response rate
– Rapid response rate
– Load balancing by prioritizing the transactions
16. Characteristics of OLTP Model
• Cost
– Cost of transactions is less
• Update facility
– Less lock periods
– Less lock periods
– Instant updates
– Use the full potential of hardware and software
17. Limitations of Relational Models
• Create and maintain large number of tables
for the voluminous data
• For new functionalities, new tables are added
• Unstructured data cannot be stored in
• Unstructured data cannot be stored in
relational databases
• Very difficult to manage the data with
common denominator (keys)
18. Answer a Quick Question
According to your understanding,
what are some of the queries that OLTP systems can process?
what are some of the queries that OLTP systems can process?
19. • Search for a particular customer’s record.
• Retrieve the product description and unit price of a particular product.
• Filter all products with a unit price equal to or above Rs. 25.
• Filter all products supplied by a particular supplier.
• Search and display the record of a particular supplier.
Queries that an OLTP System can Process
20. Advantages and Challenges of an OLTP System
Advantages of an OLTP System
• Simplicity – It is designed typically for use by clerks, cashiers, clients, etc.
• Efficiency – It allows its users to read, write and delete data quickly.
• Fast query processing – It responds to user actions immediately and also
supports transaction processing on demand.
Challenges of an OLTP System
Challenges of an OLTP System
• Security – An OLTP system requires concurrency control (locking) and
recovery mechanisms (logging).
• OLTP system data content not suitable for decision making – A typical
OLTP system manages the current data within an enterprise/organization.
This current data is far too detailed to be easily used for decision making.
21. • The super market store is deciding on introducing a new product. The key
questions they are debating are: “Which product should they introduce?”
and “Should it be specific to a few customer segments?”
• The super market store is looking at offering some discount on their year-
end sale. The questions here are: “How much discount should they offer?”
The Queries that OLTP Cannot Answer
end sale. The questions here are: “How much discount should they offer?”
and “Should it be different discounts for different customer segments?”
• The supermarket is looking at rewarding its most consistent salesperson.
The question here is:“How to zero in on its most consistent salesperson
(consistent on several parameters)?All the queries stated above have more
to do with analysis than simple reporting”
• Ideally these queries are not meant to be solved by an OLTP system.
22. OLAP - Online Analytical Processing
OLAP differs from traditional databases in the way data is conceptualized
and stored.
In OLAP data is held in the dimensional form rather than the relational
form.
OLAP’s life blood is multi-dimensional data.
OLAP tools are based on the multi-dimensional data model. The multi-
dimensional data model views data in the form of a data cube.
Online Analytical Processing (OLAP) is a technology that is used to
Online Analytical Processing (OLAP) is a technology that is used to
organize large business databases and support business intelligence.
OLAP databases are divided into one or more cubes. The cubes are
designed in such a way that creating and viewing reports become easy.
OLAP databases are divided into one or more cubes, and each cube is
organized and designed by a cube administrator to fit the way that you
retrieve and analyze data so that it is easier to create and use the PivotTable
reports and PivotChart reports that you need.
23. OLAP (Online Analytical Processing)
• OLAP is a category of software that allows users to analyze
information from multiple database systems at the same time. It
is a technology that enables analysts to extract and view business
data from different points of view
• Analysts frequently need to group, aggregate and join data.
These operations in relational databases are resource intensive.
With OLAP, data can be pre-calculated and pre-aggregated,
making analysis faster.
making analysis faster.
• Provides multidimensional view of data
• Used for analysis of data
• Data can be viewed from different perspectives
• Determine why data appears the way it does
• Drill down approach is used to further dig down deep into the
data
24. OLAP - Example
Let us consider the data of a supermarket store, “AllGoods” store, for the
year “2001”.
This data as captured by the OLTP system is under the following column
headings: Section, Product-CategoryName, YearQuarter, and SalesAmount.
We have a total of 32 records/rows.
The Section column can have one value from amongst “Men”, “Women”,
“Kid”, and “Infant”.
The ProductCategory Name column can have either the value
The ProductCategory Name column can have either the value
“Accessories” or the value “Clothing”.
The YearQuarter column can have one value from amongst “Q1”, “Q2”,
“Q3”, and “Q4”.
The SalesAmount column record the sales figures for each Section,
ProductCategory Name, and Year Quarter.
26. Characteristics of OLAP
• Multidimensional analysis
• Support for complex queries
• Advanced database support
– Support large databases
– Access different data sources
– Access aggregated data and detailed data
27. Characteristics of OLAP
• Easy-to-use End-user interface
– Easy to use graphical interfaces
– Familiar interfaces with previous data analysis
tools
tools
• Client-Server Architecture
– Provides flexibility
– Can be used on different computers
– More machines can be added
28. One Dimensional
Consider the table shown in the earlier slide - It displays “AllGoods” store’s
sales data by Section, which is one-dimensional .
Figure 3.4 shows data in two dimensions (horizontal and vertical), in
OLAP it is considered to be one dimension as we are looking at the
SalesAmount from one particular perspective, i.e. by Section.
Table 3.5 presents the sales data of the “AllGoods” stores by
ProductCategoryName. This data is again in one dimension
as we are looking at the SalesAmount from one particular
perspective, ie.ProductCategoryName.
Table 3.6 presents the“AllGoods” sales data by yet another
dimension, i.e. YearQuarter. However, this data is yet another
example of one-dimensional data as we are looking at the
SalesAmount from one particular perspective, i.e. by
YearQuarter.
29. Two Dimensional
One-dimensional data was easy. What if, the requirement was to view Company’s data
by calendar quarters and product categories? Here, two-dimensional data comes into
play. The two-dimensional depiction of data allows one the liberty to think about
dimensions as a kind of coordinate system.
Table 3.7 gives you a clear idea of the two-dimensional data. In this table, two
dimensions (YearQuarters and ProductCategoryName) have been combined.
In Table 3.7, data has been plotted along two dimensions as we can now look at the
SalesAmount from two perspectives, i.e. by YearQuarter and ProductCategoryName. The
calendar quarters have been listed along the vertical axis and the product categories have been
listed across the horizontal axis. Each unique pair of values of these two dimensions
corresponds to a single point of SalesAmount data. For example, the Accessories sales for Q2
add up to $9680.00 whereas the Clothing sales for the same quarter total up to $12366.00.
Their sales figures correspond to a single point of SalesAmount data, i.e. $22046.
30. Three Dimensional
What if the company’s analyst wishes to view the data — all of it — along all the three
dimensions (Year-Quarter, ProductCategoryName, and Section) and all on the same table
at the same time? For this theanalyst needs a three-dimensional view of data as arranged
in Table 3.8. In this table, one can now look atthe data by all the three dimensions/
perspectives, i.e. Section, ProductCategoryName, YearQuarter. If theanalyst wants to
look for the section which recorded maximum Accessories sales in Q2, then by giving
aquick glance to Table 3.8, he can conclude that it is the Kid section.
31. Can we go beyond Three Dimensional?
Well, if the question is “Can you go beyond the third dimension?” the answer is
YES!
If at all there is any constraint, it is because of the limits of your software. But if
the question is “Should you go beyond the third dimension?” we will say it is
entirely on what data has been captured by your operational transactional systems
and what kind of queries you wish your OLAP system to respond to.
Now that we understand multi-dimensional data, it is time to look at the
functionalities and characteristics of an OLAP system. OLAP systems are
characterized by a low volume of transactions that involve very complex queries.
characterized by a low volume of transactions that involve very complex queries.
Some typical applications of OLAP are: budgeting, sales forecasting, sales
reporting, business process manage
Example: Assume a financial analyst reports that the sales by the company have
gone up. The next question is “Which Section is most responsible for this
increase?” The answer to this question is usually followed by a barrage of
questions such as “Which store in this Section is most responsible for the
increase?” or “Which particular product category or categories registered the
maximum incréase?” The answers to these are provided by multidimensional
analysis or OLAP;
32. Can we go beyond Three Dimensional?
Let us go back to our example of a company’s
(“AllGoods”) sales data viewed along three dimensions:
Section, ProductCategoryName, and YearQuarter.
Given below are a set of queries, related to example,
that a typical OLAP system is capable of responding to:
•What will be the future sales trend for “Accessories” in the “Kid’s” Section?
•
•Given the customers buying pattern, will it be profitable to launch product
“XYZ” in the “Kid's” Section?
• What impact will a 5% increase in the price of produces have on the
customers?
33. Advantages of an OLAP System
• Multi-dimensional data representation.
• Consistency of information.
• “What if ” analysis.
• Provides a single platform for all information and business needs –
• Provides a single platform for all information and business needs –
planning, budgeting, forecasting, reporting and analysis.
• Fast and interactive ad hoc exploration.
34. Answer a Quick Question
According to your understanding,
what are some of the queries that OLAP systems can process?
35. OLTP OLAP
Online Transaction Processing Online Analytical Processing
Focus Data in Data out
Source of data Operational/Transactional Data Data extracted from various
operational data sources,
transformed and loaded into the
data warehouse
Purpose of data Manage (control and execute) basic
business tasks
Assists in planning, budgeting,
forecasting and decision making
Data contents Current data. Far too detailed – not
suitable for decision making
Historical data. Has support for
summarization and aggregation.
OLTP vs. OLAP
suitable for decision making summarization and aggregation.
Stores and manages data at
various levels of granularity,
thereby suitable for decision
making
Inserts and updates Very frequent updates and inserts Periodic updates to refresh the
data warehouse
Queries Simple queries, often returning fewer
records
Often complex queries involving
aggregations
Processing speed Usually returns fast Queries usually take a long time
(several hours) to execute and
return
Access Field level access Typically aggregated access to
data of business interest
36. OLTP OLAP
Online Transaction Processing Online Analytical Processing
Database Design Typically normalized tables. OLTP
system adopts ER (Entity Relationship)
model
Typically de-normalized tables; uses
star or snowflake schema
Operations Read/Write Mostly read
Backup and Recovery Regular backups of operational data are
mandatory. Requires concurrency control
(locking) and recovery mechanisms
(logging)
Instead of regular backups, data
warehouse is refreshed periodically
using data from operational data
sources
OLTP vs. OLAP
Joins Many Few
Derived data and aggregates Rare Common
Data Structures Complex Multi-dimensional
Few Sample Queries Search & locate student(s)
Print student scores
Filter students above 90% marks
Which courses have productivity
impact on-the-job?
How much training is needed on
future technologies for non-
linear growth in BI?
Why consider investing in DSS
experience lab?
37. Sr.No. Data Warehouse (OLAP) Operational Database (OLTP)
1 Involves historical processing of information. Involves day-to-day processing.
2 OLAP systems are used by knowledge workers such as executives,
managers and analysts.
OLTP systems are used by clerks, DBAs, or
database professionals.
3 Useful in analyzing the business. Useful in running the business.
4 It focuses on Information out. It focuses on Data in.
5 Based on Star Schema, Snowflake, Schema and Fact Constellation
Schema.
Based on Entity Relationship Model.
7 Provides summarized and consolidated data. Provides primitive and highly detailed
7 Provides summarized and consolidated data. Provides primitive and highly detailed
data.
8 Provides summarized and multidimensional view of data. Provides detailed and flat relational view
of data.
9 Number or users is in hundreds. Number of users is in thousands.
10 Number of records accessed is in millions. Number of records accessed is in tens.
11 Database size is from 100 GB to 1 TB Database size is from 100 MB to 1 GB.
12 Highly flexible. Provides high performance.
41. Decision Support
• Information technology to help the
knowledge worker (executive, manager,
analyst) make faster & better decisions
– “What were the sales volumes by region and product category for
the last year?”
CS 336 41
– “How did the share price of comp. manufacturers correlate with
quarterly profits over the past 10 years?”
– “Which orders should we fill to maximize revenues?”
• On-line analytical processing (OLAP) is an
element of decision support systems (DSS)
42. Three-Tier Decision Support Systems
• Warehouse database server
– Almost always a relational DBMS, rarely flat files
• OLAP servers
– Relational OLAP (ROLAP): extended relational DBMS that maps
operations on multidimensional data to standard relational
operators
CS 336 42
operators
– Multidimensional OLAP (MOLAP): special-purpose server that
directly implements multidimensional data and operations
• Clients
– Query and reporting tools
– Analysis tools
– Data mining tools
43. The Complete Decision Support
System
Information Sources Data Warehouse
Server
(Tier 1)
OLAP Servers
(Tier 2)
Clients
(Tier 3)
Semistructured
Sources
Data
Warehouse
e.g., MOLAP
Analysis
serve
CS 336 43
Operational
DB’s
extract
transform
load
refresh
etc.
Data Marts
Warehouse
e.g., ROLAP
serve
Query/Reporting
Data Mining
serve
serve
44. Data Warehouse vs. Data Marts
• Enterprise warehouse: collects all information about
subjects (customers,products,sales,assets,
personnel) that span the entire organization
– Requires extensive business modeling (may take years to design
and build)
• Data Marts: Departmental subsets that focus on selected
CS 336 44
• Data Marts: Departmental subsets that focus on selected
subjects
– Marketing data mart: customer, product, sales
– Faster roll out, but complex integration in the long run
• Virtual warehouse: views over operational dbs
– Materialize sel. summary views for efficient query processing
– Easy to build but require excess capability on operat. db servers
45. Approaches to OLAP Servers
• Relational DBMS as Warehouse Servers
• Two possibilities for OLAP servers
• (1) Relational OLAP (ROLAP)
– Relational and specialized relational DBMS to
CS 336 45
Relational and specialized relational DBMS to
store and manage warehouse data
– OLAP middleware to support missing pieces
• (2) Multidimensional OLAP (MOLAP)
– Array-based storage structures
– Direct access to array data structures
46. OLAP Server: Query Engine
Requirements
• Aggregates (maintenance and querying)
– Decide what to precompute and when
• Query language to support multidimensional
operations
CS 336 46
operations
– Standard SQL falls short
• Scalable query processing
– Data intensive and data selective queries
47. OLAP for Decision Support
• OLAP = Online Analytical Processing
• Support (almost) ad-hoc querying for business analyst
• Think in terms of spreadsheets
– View sales data by geography, time, or product
• Extend spreadsheet analysis model to work with
warehouse data
CS 336 47
warehouse data
– Large data sets
– Semantically enriched to understand business terms
– Combine interactive queries with reporting functions
• Multidimensional view of data is the foundation of
OLAP
– Data model, operations, etc.
49. Multi-Dimensional Data
• Measures - numerical data being tracked
• Dimensions - business parameters that define a
transaction
• Example: Analyst may want to view sales data
(measure) by geography, by time, and by product
(dimensions)
CS 336 49
(dimensions)
• Dimensional modeling is a technique for
structuring data around the business concepts
• ER models describe “entities” and “relationships”
• Dimensional models describe “measures” and
“dimensions”
50. The Multi-Dimensional Model
“Sales by product line over the past six months”
“Sales by store between 1990 and 1995”
Store Info
Numerical Measures
Key columns joining fact table
to dimension tables
CS 336 50
Prod Code Time Code Store Code Sales Qty
Product Info
Time Info
. . .
Fact table for
measures
Dimension tables
51. Dimensional Modeling
• Dimensions are organized into hierarchies
– E.g., Time dimension: days weeks quarters
– E.g., Product dimension: product product line brand
• Dimensions have attributes
CS 336 51
• Dimensions have attributes
52. Dimension Hierarchies
Store Dimension Product Dimension
Region
Total
Manufacturer
Total
CS 336 52
District
Region
Brand
Manufacturer
Stores Products
53. ROLAP: Dimensional Modeling
Using Relational DBMS
• Special schema design: star, snowflake
• Special indexes: bitmap, multi-table join
• Special tuning: maximize query throughput
• Proven technology (relational model,
CS 336 53
• Proven technology (relational model,
DBMS), tend to outperform specialized
MDDB especially on large data sets
• Products
– IBM DB2, Oracle, Sybase IQ, RedBrick, Informix
54. MOLAP: Dimensional Modeling
Using the Multi Dimensional Model
• MDDB: a special-purpose data model
• Facts stored in multi-dimensional arrays
• Dimensions used to index array
CS 336 54
• Sometimes on top of relational DB
• Products
– Pilot, Arbor Essbase, Gentia
58. The “Classic” Star Schema
A single fact table, with
detail and summary data
Fact table primary key has
only one key column per
dimension
PERIOD KEY
Store Dimension Time Dimension
Product Dimension
STORE KEY
PRODUCT KEY
PERIOD KEY
Dollars
Units
Price
Period Desc
Year
Quarter
Month
Day
Current Flag
Resolution
Fact Table
PRODUCT KEY
Store Description
City
State
District ID
District Desc.
Region_ID
Region Desc.
Regional Mgr.
Level
STORE KEY
CS 336 58
dimension
Each key is generated
Each dimension is a single
table, highly denormalized
Benefits: Easy to understand, easy to define hierarchies, reduces # of physical joins,
low maintenance, very simple metadata
Drawbacks: Summary data in the fact table yields poorer performance for summary
levels, huge dimension tables a problem
Resolution
Sequence
PRODUCT KEY
Level
Product Desc.
Brand
Color
Size
Manufacturer
Level
59. The “Classic” Star Schema
The biggest drawback: dimension
tables must carry a level indicator for
every record and every query must
use it. In the example below, without
the level constraint, keys for all stores
in the NORTH region, including
aggregates for region and district will
PERIOD KEY
Store Dimension Time Dimension
Product Dimension
STORE KEY
PRODUCT KEY
PERIOD KEY
Dollars
Units
Price
Period Desc
Year
Quarter
Month
Day
Current Flag
Resolution
Sequence
Fact Table
PRODUCT KEY
Store Description
City
State
District ID
District Desc.
Region_ID
Region Desc.
Regional Mgr.
Level
Product Desc.
STORE KEY
CS 336 59
aggregates for region and district will
be pulled from the fact table, resulting
in error.
Example:
Select A.STORE_KEY, A.PERIOD_KEY, A.dollars from
Fact_Table A
where A.STORE_KEY in (select STORE_KEY
from Store_Dimension B
where region = “North” and Level = 2)
and etc...
Level is needed
whenever aggregates
are stored with detail
facts.
Product Desc.
Brand
Color
Size
Manufacturer
Level
60. The “Level” Problem
• Level is a problem because because it causes
potential for error. If the query builder, human
or program, forgets about it, perfectly
reasonable looking WRONG answers can occur.
CS 336 60
reasonable looking WRONG answers can occur.
• One alternative: the FACT CONSTELLATION
model...
61. The “Fact Constellation” Schema
PERIOD KEY
Store Dimension Time Dimension
Product Dimension
STORE KEY
PRODUCT KEY
PERIOD KEY
Dollars
Units
Price
Period Desc
Year
Quarter
Month
Day
Current Flag
Fact Table
Store Description
City
State
District ID
District Desc.
Region_ID
Region Desc.
Regional Mgr.
STORE KEY
CS 336 61
Dollars
Units
Price
District Fact Table
District_ID
PRODUCT_KEY
PERIOD_KEY
Dollars
Units
Price
Region Fact Table
Region_ID
PRODUCT_KEY
PERIOD_KEY
Product Dimension Current Flag
Sequence
PRODUCT KEY
Regional Mgr.
Product Desc.
Brand
Color
Size
Manufacturer
62. The “Fact Constellation” Schema
In the Fact Constellations,
aggregate tables are
created
separately from the detail,
therefor
it is impossible to pick up,
District Fact Table
District_ID
PRODUCT_KEY
Region Fact Table
Region_ID
PERIOD KEY
Store Dimension Time Dimension
Product Dimension
STORE KEY
PRODUCT KEY
PERIOD KEY
Dollars
Units
Price
Period Desc
Year
Quarter
Month
Day
Current Flag
Sequence
Fact Table
PRODUCT KEY
Store Description
City
State
District ID
District Desc.
Region_ID
Region Desc.
Regional Mgr.
Product Desc.
Brand
Color
Size
Manufacturer
STORE KEY
CS 336 62
it is impossible to pick up,
for
example, Store detail when
querying
the District Fact Table.
Major Advantage: No need for the “Level” indicator in the dimension tables,
since no aggregated data is stored with lower-level detail
Disadvantage: Dimension tables are still very large in some cases, which can
slow performance; front-end must be able to detect existence of aggregate
facts, which requires more extensive metadata
Dollars
Units
Price
PRODUCT_KEY
PERIOD_KEY
Dollars
Units
Price
Region_ID
PRODUCT_KEY
PERIOD_KEY
Manufacturer
63. Another Alternative to “Level”
• Fact Constellation is a good alternative to the
Star, but when dimensions have very high
cardinality, the sub-selects in the dimension
tables can be a source of delay.
CS 336 63
tables can be a source of delay.
• An alternative is to normalize the dimension
tables by attribute level, with each smaller
dimension table pointing to an appropriate
aggregated fact table, the “Snowflake Schema”
...
64. The “Snowflake” Schema
STORE KEY
Store Dimension
Store Description
City
State
District ID
District Desc.
Region_ID
District_ID
District Desc.
Region_ID
Region_ID
Region Desc.
Regional Mgr.
CS 336 64
Region_ID
Region Desc.
Regional Mgr.
STORE KEY
PRODUCT KEY
PERIOD KEY
Dollars
Units
Price
Store Fact Table
Dollars
Units
Price
District Fact Table
District_ID
PRODUCT_KEY
PERIOD_KEY Dollars
Units
Price
RegionFact Table
Region_ID
PRODUCT_KEY
PERIOD_KEY
65. The “Snowflake” Schema
• No LEVEL in dimension tables
• Dimension tables are normalized by
decomposing at the attribute level
• Each dimension table has one key for
each level of the dimensionís hierarchy
• The lowest level key joins the
STORE KEY
Store Dimension
Store Description
City
State
District ID
District Desc.
Region_ID
Region Desc.
Regional Mgr.
District_ID
District Desc.
Region_ID
Region_ID
Region Desc.
Regional Mgr.
STORE KEY
PRODUCT KEY
Store Fact Table District Fact Table
District_ID
PRODUCT_KEY
PERIOD_KEY Dollars
RegionFact Table
Region_ID
PRODUCT_KEY
PERIOD_KEY
CS 336 65
• The lowest level key joins the
dimension table to both the fact table
and the lower level attribute table
How does it work? The best way is for the query to be built by understanding which
summary levels exist, and finding the proper snowflaked attribute tables,
constraining there for keys, then selecting from the fact table.
PRODUCT KEY
PERIOD KEY
Dollars
Units
Price
Dollars
Units
Price
PERIOD_KEY Dollars
Units
Price
PERIOD_KEY
66. The “Snowflake” Schema
• Additional features: The original Store
Dimension table, completely de-
normalized, is kept intact, since certain
queries can benefit by its all-
encompassing content.
• In practice, start with a Star Schema
STORE KEY
Store Dimension
Store Description
City
State
District ID
District Desc.
Region_ID
Region Desc.
Regional Mgr.
District_ID
District Desc.
Region_ID
Region_ID
Region Desc.
Regional Mgr.
STORE KEY
PRODUCT KEY
Store Fact Table District Fact Table
District_ID
PRODUCT_KEY
PERIOD_KEY Dollars
RegionFact Table
Region_ID
PRODUCT_KEY
PERIOD_KEY
CS 336 66
• In practice, start with a Star Schema
and create the “snowflakes” with
queries. This eliminates the need to
create separate extracts for each table,
and referential integrity is inherited
from the dimension table.
Advantage: Best performance when queries involve aggregation
Disadvantage: Complicated maintenance and metadata, explosion in the number
of tables in the database
PRODUCT KEY
PERIOD KEY
Dollars
Units
Price
Dollars
Units
Price
PERIOD_KEY Dollars
Units
Price
PERIOD_KEY
67. Advantages of ROLAP
Dimensional Modeling
• Define complex, multi-dimensional data with
simple model
• Reduces the number of joins a query has to
process
CS 336 67
process
• Allows the data warehouse to evolve with rel.
low maintenance
• HOWEVER! Star schema and relational DBMS
are not the magic solution
– Query optimization is still problematic
68. Aggregates
sale prodId storeId date amt
Add up amounts for day 1
In SQL: SELECT sum(amt) FROM SALE
WHERE date = 1
CS 336 68
sale prodId storeId date amt
p1 s1 1 12
p2 s1 1 11
p1 s3 1 50
p2 s2 1 8
p1 s1 2 44
p1 s2 2 4
81
69. Aggregates
Add up amounts by day
In SQL: SELECT date, sum(amt) FROM SALE
GROUP BY date
CS 336 69
ans date sum
1 81
2 48
sale prodId storeId date amt
p1 s1 1 12
p2 s1 1 11
p1 s3 1 50
p2 s2 1 8
p1 s1 2 44
p1 s2 2 4
70. Another Example
Add up amounts by day, product
In SQL: SELECT date, sum(amt) FROM SALE
GROUP BY date, prodId
sale prodId date amt
sale prodId storeId date amt
CS 336 70
sale prodId date amt
p1 1 62
p2 1 19
p1 2 48
drill-down
rollup
p1 s1 1 12
p2 s1 1 11
p1 s3 1 50
p2 s2 1 8
p1 s1 2 44
p1 s2 2 4
71. Aggregates
• Operators: sum, count, max, min,
median, ave
• “Having” clause
• Using dimension hierarchy
CS 336 71
• Using dimension hierarchy
– average by region (within store)
– maximum by month (within date)
79. Aggregation Using Hierarchies
store
region
day 2 s1 s2 s3
p1 44 4
p2 s1 s2 s3
p1 12 50
p2 11 8
day 1
CS 336 79
region A region B
p1 56 54
p2 11 8
country
(store s1 in Region A;
stores s2, s3 in Region B)
p2 11 8
80. Slicing
day 2 s1 s2 s3
p1 44 4
p2 s1 s2 s3
p1 12 50
p2 11 8
day 1
CS 336 80
s1 s2 s3
p1 12 50
p2 11 8
TIME = day 1
81. Products
d1 d2
Store s1 Electronics $5.2
Toys $1.9
Clothing $2.3
Cosmetics $1.1
Store s2 Electronics $8.9
Toys $0.75
Clothing $4.6
Cosmetics $1.5
Sales
($ millions)
Time
Slicing &
Pivoting
CS 336 81
Products
Store s1 Store s2
Store s1 Electronics $5.2 $8.9
Toys $1.9 $0.75
Clothing $2.3 $4.6
Cosmetics $1.1 $1.5
Store s2 Electronics
Toys
Clothing
($ millions)
d1
Sales
82. Summary of Operations
• Aggregation (roll-up)
– aggregate (summarize) data to the next higher dimension
element
– e.g., total sales by city, year total sales by region, year
• Navigation to detailed data (drill-down)
• Selection (slice) defines a subcube
CS 336 82
• Selection (slice) defines a subcube
– e.g., sales where city =‘Gainesville’ and date = ‘1/15/90’
• Calculation and ranking
– e.g., top 3% of cities by average income
• Visualization operations (e.g., Pivot)
• Time functions
– e.g., time average
83. Query & Analysis Tools
• Query Building
• Report Writers (comparisons, growth, graphs,…)
• Spreadsheet Systems
• Web Interfaces
CS 336 83
• Web Interfaces
• Data Mining
85. MOLAP
• Uses multidimensional approach to solve a
problem
• Directly stores the information in cubes
• Used in SSAS (SQL Server Analysis Services)
• Used in SSAS (SQL Server Analysis Services)
86. ROLAP
• Relational databases are used to store the
data
• Translates OLAP queries to appropriate SQL
statements
statements
• Data created by OLTP is directly used
87. Do it Exercise
Study the Data models for OLTP and OLAP systems
Hint: ER modeling, Star and Snowflake Schema
Hint: ER modeling, Star and Snowflake Schema
89. OLAP:example
Example: find what kinds of cars are popular?
sales(make, color, size, num_sold) (slightly summarized data)
where make can be Toyota, Nissan, Holden, Ford etc
colors are white, red, silver
size can be small, medium, large.
Attributes such as num_sold are called measure attributes, since they can be
used to measure some value, and can be aggregated.
Attributes like make, color, size are called dimension attributes, since they
define the dimensions on which measure attributes are viewed.
Data that can be modeled as dimension attributes and measure attributes are
called multi-dimensional data.
91. Cross Tabs and Data Cubes
OLAP systems allow analyst to view different summaries of the data.
The following table can be derived from
sales(make, color, size, num_sold)
WHITE RED SILVER TOTAL
TOYOTA 8 35 10 53
Cross-tab or pivot table make color num_sold
Toyota white 8
Toyota red 35
Toyota silver 10
Toyota all 53
Relational representation
TOYOTA 8 35 10 53
NISSAN 20 10 5 35
HOLDEN 14 7 28 49
FORD 20 2 5 27
TOTAL 62 54 48 164
Toyota all 53
Nissan white 20
Nissan red 10
Nissan silver 5
Nissan all 35
Holden white 14
Holden red 7
Holden silver 28
Holden all 49
Ford white 20
Ford red 2
Ford silver 5
Ford all 27
all white 62
all red 54
all silver 48
all all 164
92. Data Cubes
The generalization of a cross tab, which is 2-dimensional, to n
dimensions can be visualized as a n-dimensional cube, called
the data cube.
white
red
silver
all
color
93. MOLAP vs ROLAP
OLAP systems can use multi-dimensional array to store data cubes, called
multidimensional OLAP systems (MOLAP) .
Alternatively, they can stored data as relations in relational databases, called
relational OLAP systems (ROLAP).
94. ROLAP
The main relation, which relates dimensions to measures, is called the fact table.
e.g., sales(prod_id, date, shop_id, num_sold)
Very large, accumulation of facts such as sales
Each dimension can have additional attributes and an associated dimensional
table.
E.g., product(prod_id, price, color)
prod_id is a foreign key of sales
shops(shop_id, location, manager)
shops(shop_id, location, manager)
Dimension data are smaller, generally static
95. The Star Schema
In a ROLAP system, relations are often stored with star schemas
A star schema consists of the fact table and one or more dimension tables.
Dimension tables are usually not normalized, why?
A typical query often involves a join of the fact table and the dimension tables.
prod_id
sales
prod_id
date
shop_id
num_sold
prod_id
Price
color
shop_id
Location
manager
99. OLAP Queries
A common operation is to aggregate a measure over one or more dimensions, e.g.,
find total/average sales for a product.
find total sales in each city/state/month etc
find top 2 products by total sales
Roll-up: moving from finer granularity to coarser granularity by means of
aggregation.
E.g., given total sales for each city, find total sales for each state.
Drill-down: The inverse of roll-up
Pivoting: aggregate on selected dimensions
Slicing and dicing:
E.g., from the data cube find the cross-tab on Model and Color for medium
cars . The cross-tab can be viewed as a slice of the data cube.
100. Query Processing Issues
Expensive aggregations are common
Pre-compute all aggregates? Maybe infeasible!
Materialized views can help.
Which views to materialize?
given a query and some materialized views, can we use the views to answer
the query? How?
How frequently should we refresh the views to make them consistent with the
How frequently should we refresh the views to make them consistent with the
underlying tables?
What indexes should one use?
101. SQL:1999 Extended Aggregations*
Example 1
Select make, color, size, sum(number) from sales
group by cube(make, color, size)
Calculates 8 groupings:
(make, color, size), (make, color), (make, size), …., ().
Example 2
Select make, color, sum(number) from sales
Group by rollup(make, color, size)
Calculates 4 groupings:
(make, color, size), (make, color), (make), ().
105. Should OLAP be Performed Directly
on Operational Databases?
• OLTP systems support multiple concurrent transactions. Therefore the
OLTP systems have support for concurrency control (locking) and recovery
mechanisms (logging).
• An OLAP system on the other hand requires mostly a read only access to
data records for summarization and aggregation. If concurrency control and
data records for summarization and aggregation. If concurrency control and
recovery mechanisms are applied for such OLAP operations, it will
severely impact the throughput of an OLAP system.
106. OLAP Operations on Multi-dimensional Data
• Slice
• Dice
• Roll-up
• Drill down
• Drill through
• Drill across
• Drill across
• Pivot/Rotate
107. Do It Exercise
Hands on practice on the various OLAP operations on multi-dimensional data.
Hint: Provide the participants with a sample data sheet (Excel sheet) and
ask them to demonstrate their understanding of the various OLAP
ask them to demonstrate their understanding of the various OLAP
operations on multi-dimensional data.
108. Data Warehouse
A repository of information gathered from multiple sources, stored under a unified
schema, usually at a single site .
Data may be augmented with additional attributes, such as timestamp, and
summary information.
Data are stored for a long time, permitting access to historical data.
Interactive response times expected for complex queries; ad-hoc updates
uncommon.
uncommon.
109. Building Data Warehouse
Issues:
– Semantic integration: When getting data from
multiple sources, must eliminate mismatches, e.g.,
different currencies.
– Heterogeneous sources: must access data from a
– Heterogeneous sources: must access data from a
variety of source formats.
– Load, refresh, purge: Must load data, periodically
refresh it, and purge too old or useless data
– Metadata management: Must keep track of
source, loading time, etc.
110. Elements of data warehouse EIS/DSS
Apps
Data
4
Operational Data
Data
Replication &
Cleansing
Data
Metadata
Informational
Database
Information
Directory
1
2
3
111. Elements of data warehouse
Data Replication Manager
copying & distribution of data across databases
• data that needs to be copied, source/destination, frequency, data
transforms
• refresh copy entire source, propagate changes only
all external data is transformed & cleansed before adding to warehouse
Informational Database
database that stores data copied from multiple sources by data replication
database that stores data copied from multiple sources by data replication
manager
Information Directory
metadata manager - collects metadata from databases on network
EIS/DSS tools
SQL based query tools
some vendors use extended SQL
112. Query/Reporting tools
Formulate queries without (extended) SQL or other languages
Result displayed as table, graph, report,
Spreadsheet systems
Web interfaces
Vendor-specific tools
Oracle Discoverer:
• http://www.oracle.com/tools/disc/index.html
• http://www.oracle.com/tools/disc/index.html
113. Column stores
A recently proposed data storage method that
allows more efficient aggregation queries in
data warehouses
stores data as columns rather than as rows.
See http://en.wikipedia.org/wiki/Column-
oriented_DBMS.
115. Answer a Quick Question
Will using BI/Analytics in conjunction with ERP systems prove
advantageous to the enterprise? Why?
116. Leveraging ERP Data Using Analytics
ERP provides several business benefits, here we enumerate the top three:
1. Consistency and reliability of data across the various units of the
organization.
2. Streamlining the transactional process.
3. A few basic reports to serve the operational (day-to-day) needs.
In short ERP systems are adept at capturing, storing and moving the data
across the various units smoothly.
It is however inept at serving the analytical and reporting needs of the
organization.