Database Systems:
Design, Implementation, and
Management
Tenth Edition
Chapter 12
Distributed Database Management
Systems
The Evolution of Distributed Database
Management Systems
• Distributed database management system
(DDBMS)
– Governs storage and processing of logically related
data over interconnected computer systems
– Both data and processing functions are distributed
among several sites
• 1970s - Centralized database required that
corporate data be stored in a single central site
– Usually a mainframe computer
– Data access via dumb terminals
Database Systems, 10th Edition 2
• Centralized databases were not responsive to the need
for faster response times and quick access to information
• Approving and developing new applications was a slow
process
The Evolution of Distributed Database
Management Systems
• Social and technological changes led to change
• Businesses went global; competition was now in
cyberspace not next door
• Customer demands and market needs required Web-
based services
• Rapid development of low-cost, smart mobile devices
increased the demand for complex and fast networks to
interconnect them – cloud-based services
• Multiple types of data (voice, image, video, music)
which are geographically distributed must be managed
The Evolution of Distributed Database
Management Systems
• As a result, businesses had to react quickly to
remain competitive. This required:
• Rapid ad hoc data access, which became crucial in
the quick-response decision-making environment
• Distributed data access to support
geographically dispersed business units
The Evolution of Distributed Database
Management Systems
• The following factors strongly influenced the shape of the
response
• Acceptance of the Internet as the platform for data access
and distribution
• The mobile wireless revolution
• Created high demand for data access
• Use of “applications as a service”
• Company data stored on central servers but applications are
deployed “in the cloud”
• Increased focus on mobile BI
• Use of social networks increases need for on-the-spot
decision making
The Evolution of Distributed Database
Management Systems
• The distributed database is especially desirable because
centralized database management is subject to problems such
as:
• Performance degradation as remote locations and distances
increase
• High cost to maintain and operate
• Reliability issues with a single site and need for data
replication
• Scalability problems due to a single location (space, power
consumption, etc.)
• Organizational rigidity imposed by the database – might not
be able to support flexibility and agility required by modern
global organizations
The Evolution of Distributed Database
Management Systems
Distributed Processing and Distributed
Databases
• Distributed processing
– Database’s logical processing is shared among two or
more physically independent sites connected through
a network
Distributed Processing and Distributed
Databases
• Distributed database
– Stores logically related database over two or more physically
independent sites
– Database composed of database fragments
• Located at different sites and can be replicated among various
sites
Distributed Processing and Distributed
Databases
• Distributed processing does not require a distributed
database, but a distributed database requires
distributed processing
• Distributed processing may be based on a single
database located on a single computer
• For the management of distributed data to occur,
copies or parts of the database processing functions
must be distributed to all data storage sites
• Both distributed processing and distributed databases
require a network of interconnected components
Characteristics of Distributed Database
Management Systems
• Application interface to interact with the end user,
application programs and other DBMSs within the
distributed database
• Validation to analyze data requests for syntax correctness
• Transformation to decompose complex requests into
atomic data request components
• Query optimization to find the best access strategy
• Mapping to determine the data location of local and
remote fragments
• I/O interface to read or write data from or to permanent
local storage
Characteristics of Distributed Database
Management Systems (cont’d.)
• Formatting to prepare the data for presentation to the end user
or to an application
• Security to provide data privacy at both local and remote
databases
• Backup and recovery to ensure the availability and
recoverability of the database in case of failure
• DB administration features for the DBA
• Concurrency control to manage simultaneous data access and
to ensure data consistency across database fragments in the
DDBMS
• Transaction management to ensure the data move from one
consistent state to another
Characteristics of Distributed Database
Management Systems (cont’d.)
• Must perform all the functions of a centralized
DBMS
• Must handle all necessary functions imposed
by distribution of data and processing
– Must perform these additional functions
transparently to the end user
• The single logical database consists of two database fragments,
A1 and A2, located at sites 1 and 2
• All users “see” and query the database as if it were a local
database
• The fact that there are fragments is completely transparent to
the user
DDBMS Components
• Must include (at least) the following
components:
– Computer workstations/remote devices
– Network hardware and software that reside
in each device or workstation to interact and
exchange data
– Communications media that carry data from
one site to another
DDBMS Components (cont’d.)
– Transaction processor (TP), a.k.a. application
processor or transaction manager
• Software component found in each computer that
receives and processes the application’s remote
and local data requests
– Data processor (DP) or data manager
• Software component residing on each computer
that stores and retrieves data located at the site
• May be a centralized DBMS
DDBMS Components (cont’d.)
• The communication among the TPs and DPs is
made possible through protocols which determine
how the DDBMS will
– Interface with the network to transport data and
commands between the DPs and TPs
– Synchronize all data received from DPs and route
retrieved data to appropriate TPs
– Ensure common DB functions in a distributed system,
e.g., data security, transaction management,
concurrency control, data partitioning and
synchronization, and data backup and recovery
Levels of Data
and Process Distribution
• Current systems classified by how process
distribution and data distribution are supported
Single-Site Processing,
Single-Site Data
• All processing is done on single CPU or host computer
(mainframe, midrange, or PC)
• All data are stored on host computer’s local disk
• Processing cannot be done on end user’s side of system
• Typical of most mainframe and midrange computer
DBMSs
• DBMS is located on host computer, which is accessed by
dumb terminals connected to it
– The TP and DP functions are embedded within the DBMS on
the host computer
– DBMS usually runs under a time-sharing, multitasking OS
Multiple-Site Processing,
Single-Site Data
• Multiple processes run on different computers
sharing single data repository
• MPSD scenario requires network file server
running conventional applications
– Accessed through LAN
• Example: many multiuser accounting applications running
under a personal computer network
Multiple-Site Processing,
Single-Site Data
• The TP on each workstation acts only as a redirector to route
all network data requests to the file server
• The end user sees the file server as just another hard drive
• The end user must make a direct reference to the file
server to access remote data
– All record- and file-locking are performed at the end-user
location
• All data selection, search, and update take place at the
workstation
– Entire files travel through the network for processing at the
workstation, which increases network traffic, slows response
time, and increases communication costs
Multiple-Site Processing,
Single-Site Data
• Suppose the file server stores a CUSTOMER table
containing 100,000 data rows, 50 of which have
balances greater than $1,000
• The SQL command
SELECT * FROM CUSTOMER WHERE CUST_BALANCE > 1000
causes all 100,000 rows to travel to the end-user workstation
• A variation of MPSD is client/server architecture
– All DB processing is done at the server site, reducing
network traffic
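The row counts in the CUSTOMER example can be made concrete with a small simulation. This is only a sketch under stated assumptions, not how any particular file server or DBMS works: the rows are generated data, and `file_server_query` and `client_server_query` are hypothetical helpers that simply count how many rows would cross the network in each approach.

```python
# Generated stand-in for the CUSTOMER table: 100,000 rows,
# 50 of which have balances greater than $1,000.
CUSTOMER = [
    {"cust_id": i, "cust_balance": 2000.0 if i < 50 else 100.0}
    for i in range(100_000)
]


def file_server_query(table, min_balance):
    """File server (MPSD): every row is shipped; the workstation filters."""
    shipped = list(table)  # the entire file travels over the network
    result = [row for row in shipped if row["cust_balance"] > min_balance]
    return result, len(shipped)


def client_server_query(table, min_balance):
    """Client/server: the server filters; only matching rows are shipped."""
    result = [row for row in table if row["cust_balance"] > min_balance]
    return result, len(result)


fs_rows, fs_shipped = file_server_query(CUSTOMER, 1000)
cs_rows, cs_shipped = client_server_query(CUSTOMER, 1000)
print(fs_shipped, cs_shipped)  # 100000 50
```

Both calls return the same 50 rows; only the number of rows sent over the network differs.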
Multiple-Site Processing,
Multiple-Site Data
• Fully distributed database management system
• Support for multiple data processors and
transaction processors at multiple sites
• Classified as either homogeneous or
heterogeneous
• Homogeneous DDBMSs
– Integrate multiple instances of the same DBMS
over a network
Multiple-Site Processing,
Multiple-Site Data (cont’d.)
• Heterogeneous DDBMSs
– Integrate different types of centralized DBMSs
over a network but all support the same data
model
• Fully heterogeneous DDBMSs
– Support different DBMSs
– Support different data models (relational,
hierarchical, or network)
– Different computer systems, such as
mainframes and microcomputers
Distributed Database Transparency Features
• Allow end user to feel like database’s only user
• Features include:
– Distribution transparency
– Transaction transparency
– Failure transparency
– Performance transparency
– Heterogeneity transparency
• Distribution Transparency
– Allows management of physically dispersed
database as if centralized
– The user does not need to know
• That the table’s rows and columns are split
vertically or horizontally and stored among multiple
sites
• That the data are geographically dispersed among
multiple sites
• That the data are replicated among multiple sites
Distributed Database Transparency Features
• Transaction Transparency
– Allows a transaction to update data at more than one
network site
– Ensures that the transaction will be either entirely
completed or aborted in order to maintain database
integrity
• Failure Transparency
– Ensures that the system will continue to operate in the
event of a node or network failure
– Functions that were lost will be picked up by another
network node
Distributed Database Transparency Features
• Performance Transparency
– Allows the system to perform as if it were a centralized
DBMS
• No performance degradation due to use of a network or
platform differences
• System will find the most cost-effective path to access remote
data
• System will increase performance capacity without affecting
overall performance when adding more TP or DP nodes
• Heterogeneity Transparency
– Allows the integration of several different local DBMSs under
a common global schema
• DDBMS translates the data requests from the global schema to
the local DBMS schema
Distributed Database Transparency Features
Distribution Transparency
• Allows management of physically dispersed database as if
centralized
• Three levels of distribution transparency:
– Fragmentation transparency
• End user does not need to know that a DB is partitioned
– SELECT * FROM EMPLOYEE WHERE…
– Location transparency
• Must specify the database fragment names but not the
location
– SELECT * FROM E1 WHERE … UNION
– Local mapping transparency
• Must specify fragment name and location
– SELECT * FROM E1 “NODE” NY WHERE … UNION
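The three levels differ only in how much the requester has to spell out. The sketch below models that with plain Python dictionaries standing in for the nodes and the distributed catalog. Fragment names E1 and E2 and node NY follow the slide; node ATL, the rows, and all function names are hypothetical.

```python
# Hypothetical nodes, each storing one fragment of EMPLOYEE.
NODES = {
    "NY": {"E1": [{"emp_name": "Ada", "emp_dob": "1975-03-02"}]},
    "ATL": {"E2": [{"emp_name": "Raj", "emp_dob": "1990-11-20"}]},
}
# Hypothetical catalog entries: table -> fragments, fragment -> node.
FRAGMENTS = {"EMPLOYEE": ["E1", "E2"]}
LOCATIONS = {"E1": "NY", "E2": "ATL"}


def select_local_mapping(fragment, node):
    """Local mapping transparency: caller names the fragment AND its node."""
    return list(NODES[node][fragment])


def select_location_transparent(fragment):
    """Location transparency: caller names the fragment; the catalog finds the node."""
    return select_local_mapping(fragment, LOCATIONS[fragment])


def select_fragmentation_transparent(table):
    """Fragmentation transparency: caller names only the table."""
    rows = []
    for fragment in FRAGMENTS[table]:
        rows.extend(select_location_transparent(fragment))
    return rows


# The same rows, requested at each of the three levels.
level_1 = select_fragmentation_transparent("EMPLOYEE")
level_2 = select_location_transparent("E1") + select_location_transparent("E2")
level_3 = select_local_mapping("E1", "NY") + select_local_mapping("E2", "ATL")
```

Each level returns the same rows. The higher the transparency, the more of the lookup is done from the catalog rather than by the requester.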
Distribution Transparency
• Supported by a distributed data dictionary (DDD)
or distributed data catalog (DDC)
– Contains the description of the entire database as
seen by the DBA
– It is distributed and replicated at the network nodes
– The database description, known as the distributed
global schema, is the common database schema
used by local TPs to translate user requests into
subqueries that will be processed by different DPs
Transaction Transparency
• Ensures database transactions will maintain
distributed database’s integrity and consistency
• Ensures a transaction is completed only when all
database sites involved complete their part
• Distributed database systems require complex
mechanisms to manage transactions and
ensure consistency and integrity
Distributed Requests and Distributed
Transactions
• Remote request: single SQL statement accesses
data from single remote database
– The SQL statement can reference data only at one
remote site
Distributed Requests and Distributed
Transactions
• Remote transaction: composed of several requests, accesses
data at single remote site
– Updates PRODUCT and INVOICE tables at site B
– Remote transaction is sent to B and executed there
– Transaction can reference only one remote DP
– Each SQL statement can reference only one remote DP and the
entire transaction can reference and be executed at only one
remote DP
Distributed Requests and Distributed
Transactions
• Distributed transaction: requests data from several different
remote sites on network
– Each single request can reference only one local or remote DP
site
– The transaction as a whole can reference multiple DP sites
because each request can reference a different site
Distributed Requests and Distributed
Transactions
• Distributed request: single SQL statement references data at
several DP sites
– A DB can be partitioned into several fragments
– Fragmentation transparency: reference one or more of those
fragments with only one request
Distributed Requests and Distributed
Transactions
• A single request can reference a physically
partitioned table
– CUSTOMER table is divided into two fragments C1 and
C2 located at sites B and C
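The four categories can be told apart by counting sites per statement and per transaction. The function below is a rough illustration of that rule of thumb. `classify` and its input format (one set of DP site names per SQL statement) are assumptions made for this sketch.

```python
def classify(requests):
    """Classify a unit of work.

    `requests` is a list with one entry per SQL statement; each entry is
    the set of DP sites that statement references.
    """
    all_sites = set().union(*requests)
    if any(len(sites) > 1 for sites in requests):
        return "distributed request"      # one statement spans several sites
    if len(all_sites) > 1:
        return "distributed transaction"  # several statements, different sites
    if len(requests) > 1:
        return "remote transaction"       # several statements, one site
    return "remote request"               # one statement, one site


examples = {
    "remote request": [{"B"}],
    "remote transaction": [{"B"}, {"B"}],
    "distributed transaction": [{"B"}, {"C"}],
    "distributed request": [{"B", "C"}],
}
for expected, requests in examples.items():
    print(expected, "->", classify(requests))
```

The last example corresponds to the CUSTOMER table split into fragments C1 and C2 at sites B and C, read by a single statement.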
Distributed Concurrency Control
• Concurrency control is important in distributed
environment
– Multisite, multiple-process operations are more likely to
create data inconsistencies and deadlocked transactions
• Suppose a transaction updates data at three DP
sites
– The first two DP sites complete the transaction and
commit the data at each local DP
– The third DP cannot commit the transaction but the
first two sites cannot be rolled back since they were
committed. This results in an inconsistent database
Two-Phase Commit Protocol
• Distributed databases make it possible for a
transaction to access data at several sites
• 2PC guarantees that if a portion of a transaction
cannot be committed, all changes made at the
other sites will be undone
– Final COMMIT is issued after all sites have
committed their parts of the transaction
– Requires that each DP’s transaction log entry be
written before the database fragment is updated
Two-Phase Commit Protocol
• DO-UNDO-REDO protocol with write-ahead protocol
– DO performs the operation and records the “before” and
“after” values in the transaction log
– UNDO reverses an operation using the log entries
written by the DO portion of the sequence
– REDO redoes an operation, using the log entries written
by the DO portion
• Requires a write-ahead protocol where the log entry is
written to permanent storage before the actual
operation takes place
• 2PC defines the operations between the coordinator
(transaction initiator) and one or more subordinates
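A toy key-value store is enough to show how DO, UNDO, and REDO relate to the log. This is a minimal sketch, not a real recovery manager: `LoggedStore` is a hypothetical class, an in-memory list stands in for the log on permanent storage, and stored values are assumed never to be `None`.

```python
class LoggedStore:
    """Toy key-value store with a DO-UNDO-REDO log (illustrative only)."""

    def __init__(self):
        self.data = {}
        self.log = []  # stands in for the transaction log on permanent storage

    def do(self, key, new_value):
        # Write-ahead: the "before" and "after" values are logged first,
        # and only then is the operation carried out.
        self.log.append(
            {"key": key, "before": self.data.get(key), "after": new_value}
        )
        self.data[key] = new_value

    def undo(self):
        # Reverse every logged operation, newest first, using "before" values.
        for entry in reversed(self.log):
            if entry["before"] is None:
                self.data.pop(entry["key"], None)  # key did not exist before
            else:
                self.data[entry["key"]] = entry["before"]

    def redo(self):
        # Reapply every logged operation, oldest first, using "after" values.
        for entry in self.log:
            self.data[entry["key"]] = entry["after"]


store = LoggedStore()
store.do("qty", 10)
store.do("qty", 7)
store.undo()
after_undo = dict(store.data)  # back to the empty starting state
store.redo()
after_redo = dict(store.data)  # the logged changes are applied again
```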
Two-Phase Commit Protocol
• Phase 1: preparation
– The coordinator sends a PREPARE TO COMMIT
message to all subordinates
– The subordinates receive the message, write the
transaction log using the write-ahead protocol, and send
an acknowledgment message (YES/PREPARED TO
COMMIT or NO/NOT PREPARED) to the coordinator
– The coordinator makes sure all nodes are ready to
commit, or it aborts the action
Two-Phase Commit Protocol
• Phase 2: the final COMMIT
– The coordinator broadcasts a COMMIT to all
subordinates and waits for replies
– Each subordinate receives the COMMIT and then
updates the database using the DO protocol
– The subordinates reply with a COMMITTED or NOT
COMMITTED message to the coordinator
– If one or more subordinates do not commit, the
coordinator sends an ABORT message and the
subordinates UNDO all changes
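The two phases can be walked through with an in-memory simulation. The sketch makes several simplifying assumptions: each subordinate holds a single value, messages are plain method calls, and `fail_on` is a hypothetical switch used only to force a failure. The names `Subordinate` and `two_phase_commit` are illustrative.

```python
class Subordinate:
    """One DP site holding a single value (toy model)."""

    def __init__(self, name, value, fail_on=None):
        self.name = name
        self.value = value
        self.fail_on = fail_on  # None, "prepare", or "commit" (hypothetical)
        self.log = {}           # tx_id -> (before, after); stands in for the log

    def prepare(self, tx_id, new_value):
        if self.fail_on == "prepare":
            return False                             # NO / NOT PREPARED
        self.log[tx_id] = (self.value, new_value)    # write-ahead log entry
        return True                                  # YES / PREPARED TO COMMIT

    def commit(self, tx_id):
        if self.fail_on == "commit":
            return False                             # NOT COMMITTED
        self.value = self.log[tx_id][1]              # DO: apply "after" value
        return True                                  # COMMITTED

    def undo(self, tx_id):
        if tx_id in self.log:
            self.value = self.log.pop(tx_id)[0]      # restore "before" value


def two_phase_commit(tx_id, subordinates, new_value):
    """Coordinator logic: returns "COMMITTED" or "ABORTED"."""
    # Phase 1: preparation. Every subordinate must answer PREPARED.
    votes = [s.prepare(tx_id, new_value) for s in subordinates]
    if all(votes):
        # Phase 2: final COMMIT. Every subordinate must answer COMMITTED.
        acks = [s.commit(tx_id) for s in subordinates]
        if all(acks):
            return "COMMITTED"
    # Any NO vote or NOT COMMITTED reply: ABORT, and all sites UNDO.
    for s in subordinates:
        s.undo(tx_id)
    return "ABORTED"


good = [Subordinate(n, 100) for n in ("A", "B", "C")]
outcome_ok = two_phase_commit("T1", good, 90)

bad = [Subordinate("A", 100), Subordinate("B", 100),
       Subordinate("C", 100, fail_on="commit")]
outcome_fail = two_phase_commit("T2", bad, 90)
```

In the second run, sites A and B apply the change before site C reports NOT COMMITTED. The abort path then restores their "before" values from the log, so all three sites end at 100 again.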
Performance and Failure Transparency
• Performance transparency
– Allows a DDBMS to perform as if it were a centralized
database; no performance degradation
• Failure transparency
– System will continue to operate in the case of a node or
network failure
• Query optimization
– Minimize the total cost associated with the execution of
a request (CPU, communication, I/O)
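As a rough illustration of cost-based selection, the sketch below sums the three cost components for a few candidate access strategies and picks the cheapest. The strategies and their cost figures are made up for the example; a real optimizer would estimate them from statistics.

```python
# Hypothetical cost estimates (arbitrary units) for one request.
PLANS = {
    "ship whole table to requesting site": {"cpu": 2.0, "communication": 40.0, "io": 10.0},
    "filter at remote site, ship result": {"cpu": 3.0, "communication": 1.0, "io": 10.0},
    "read local replica": {"cpu": 2.5, "communication": 0.0, "io": 12.0},
}


def total_cost(plan):
    """Total cost = CPU cost + communication cost + I/O cost."""
    return plan["cpu"] + plan["communication"] + plan["io"]


best_plan = min(PLANS, key=lambda name: total_cost(PLANS[name]))
print(best_plan, total_cost(PLANS[best_plan]))
```

With these made-up numbers, filtering at the remote site wins with a total of 14.0, narrowly ahead of the local replica at 14.5.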
Performance and Failure Transparency
• In a DDBMS, transactions are distributed among
multiple nodes. Determining what data are being used
becomes more complex
– Data distribution: determine which fragment to access,
create multiple data requests to the chosen DPs,
combine the responses and present the data to the
application
– Data Replication: data may be replicated at several
different sites making the access problem even more
complex as all copies must be consistent
• Replica transparency - DDBMS’s ability to hide multiple
copies of data from the user
Performance and Failure Transparency
• Network and node availability
– The response time associated with remote sites cannot be
easily predetermined because some nodes finish their part
of the query in less time than others and network path
performance varies because of bandwidth and traffic loads
– The DDBMS must consider
• Network latency
– Delay imposed by the amount of time required for a data
packet to make a round trip from point A to point B
• Network partitioning
– Delay imposed when nodes become suddenly
unavailable due to a network failure
Distributed Database Design
• Data fragmentation
– How to partition database into fragments
• Data replication
– Which fragments to replicate
• Data allocation
– Where to locate those fragments and replicas
Data Fragmentation
• Breaks single object into two or more segments
or fragments
• Each fragment can be stored at any site over
computer network
• Information stored in distributed data catalog
(DDC)
– Accessed by TP to process user requests
Data Fragmentation Strategies
• Horizontal fragmentation
– Division of a relation into subsets (fragments) of tuples
(rows)
– Each fragment is stored at a different node and each
fragment has unique rows
• Vertical fragmentation
– Division of a relation into attribute (column) subsets
– Each fragment is stored at a different node and each
fragment has unique columns with the exception of the
key column which is common to all fragments
• Mixed fragmentation
– Combination of horizontal and vertical strategies
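The three strategies can be sketched on an in-memory table. The rows and every column name except CUS_STATE are invented for the example, and `horizontal_fragments` and `vertical_fragment` are hypothetical helpers, not part of any DBMS.

```python
from collections import defaultdict

# Made-up CUSTOMER rows; cus_num is the key column.
CUSTOMER = [
    {"cus_num": 10, "cus_name": "Customer A", "cus_state": "TN", "cus_limit": 3500.0, "cus_bal": 2700.0},
    {"cus_num": 11, "cus_name": "Customer B", "cus_state": "FL", "cus_limit": 6000.0, "cus_bal": 1200.0},
    {"cus_num": 12, "cus_name": "Customer C", "cus_state": "GA", "cus_limit": 4000.0, "cus_bal": 3500.0},
    {"cus_num": 13, "cus_name": "Customer D", "cus_state": "TN", "cus_limit": 6000.0, "cus_bal": 5890.0},
]


def horizontal_fragments(rows, column):
    """Split rows into fragments by the value of one column (unique rows each)."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[row[column]].append(row)
    return dict(fragments)


def vertical_fragment(rows, key, columns):
    """Keep the key column plus a subset of the other columns."""
    return [{name: row[name] for name in (key, *columns)} for row in rows]


# Horizontal: one fragment per state.
by_state = horizontal_fragments(CUSTOMER, "cus_state")

# Vertical: one fragment per department, both carrying the key column.
service = vertical_fragment(CUSTOMER, "cus_num", ["cus_name", "cus_state"])
collections_dept = vertical_fragment(CUSTOMER, "cus_num", ["cus_limit", "cus_bal"])

# Mixed: first horizontal (TN rows only), then vertical.
tn_service = vertical_fragment(by_state["TN"], "cus_num", ["cus_name", "cus_state"])

# The key column lets the vertical fragments be joined back together.
limits_by_key = {row["cus_num"]: row for row in collections_dept}
rejoined = [{**row, **limits_by_key[row["cus_num"]]} for row in service]
```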
Data Fragmentation Strategies
• Horizontal fragmentation based on CUS_STATE
Data Fragmentation Strategies
• Vertical fragmentation based on use by service and
collections departments
• Both require the same key column and have the same
number of rows
Data Fragmentation Strategies
• Mixed fragmentation based on location as well as use by
service and collections departments
Data Replication
• Data copies stored at multiple sites served by
computer network
• Fragment copies stored at several sites to serve
specific information requirements
– Enhance data availability and response time
– Reduce communication and total query costs
• Mutual consistency rule: all copies of data
fragments must be identical
Data Replication
• Styles of replication
– Push replication: after a data update, the
originating DP node sends the changes to the
replica nodes to ensure that data are immediately
updated
• Decreases data availability due to the latency
involved in ensuring data consistency at all nodes
– Pull replication: after a data update, the originating
DP sends “messages” to the replica nodes to notify
them of a change. The replica nodes decide when
to apply the updates to their local fragment
• Could have temporary data inconsistencies
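The difference between the two styles shows up in when a replica's copy changes. In this simplified sketch the pull-style notice carries the new value itself, which is an assumption made to keep the example short; `Replica` and `OriginNode` are hypothetical names.

```python
class Replica:
    """A replica node with a local copy and a queue of unapplied notices."""

    def __init__(self):
        self.data = {}
        self.pending = []  # change notices received but not yet applied

    def apply_pending(self):
        # Pull style: the replica decides when to call this.
        for key, value in self.pending:
            self.data[key] = value
        self.pending.clear()


class OriginNode:
    """The originating DP node; `style` is "push" or "pull"."""

    def __init__(self, replicas, style):
        self.data = {}
        self.replicas = replicas
        self.style = style

    def update(self, key, value):
        self.data[key] = value
        for replica in self.replicas:
            if self.style == "push":
                replica.data[key] = value             # applied immediately
            else:
                replica.pending.append((key, value))  # replica is only notified


pushed, pulled = Replica(), Replica()
OriginNode([pushed], "push").update("qty", 5)
OriginNode([pulled], "pull").update("qty", 5)

stale_value = pulled.data.get("qty")  # replica has not applied the change yet
pulled.apply_pending()
fresh_value = pulled.data.get("qty")
```

Between the update and `apply_pending()`, the pull replica is temporarily inconsistent with the origin.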
Data Replication
• Fully replicated database
– Stores multiple copies of each database fragment at multiple
sites
– Can be impractical due to amount of overhead
• Partially replicated database
– Stores multiple copies of some database fragments at multiple
sites
• Unreplicated database
– Stores each database fragment at single site
– No duplicate database fragments
• Data replication is influenced by several factors
– Database size
– Usage frequency
– Cost: performance, overhead
Data Allocation
• Deciding where to locate data
– Allocation is closely related to the way a
database is fragmented or divided
– Centralized data allocation
• Entire database is stored at one site
– Partitioned data allocation
• Database is divided into several disjointed parts
(fragments) and stored at several sites
– Replicated data allocation
• Copies of one or more database fragments are
stored at several sites
The CAP Theorem
• Initials CAP stand for three desirable properties
– Consistency
– Availability
– Partition tolerance (similar to failure transparency)
• When dealing with highly distributed systems, some
companies forfeit consistency and isolation to achieve
higher availability
• This has led to a new type of DDBMS in which data are
basically available, soft state, eventually consistent
(BASE)
– Data changes are not immediate but propagate slowly
through the system until all replicas are eventually consistent
C. J. Date’s Twelve Commandments for
Distributed Databases
Database Systems:
Design, Implementation, and
Management
Tenth Edition
Chapter 15
Database Administration and Security
Objectives
In this chapter, students will learn:
• That data are a valuable business asset
requiring careful management
• How a database plays a critical role in an
organization
• That the introduction of a DBMS has important
technological, managerial, and cultural
consequences for an organization
Objectives (cont’d.)
• What the database administrator’s managerial
and technical roles are
• About data security, database security, and the
information security framework
• About several database administration tools
and strategies
• How various technical tasks of database
administration are performed with Oracle
Data as a Corporate Asset
• Data:
– Valuable asset that requires careful
management
– Valuable resource that translates into
information
• Accurate, timely information triggers actions
that enhance company’s position and generate
wealth
Data as a Corporate Asset (cont’d.)
• Dirty data
– Data that suffer from inaccuracies and
inconsistencies
– Threat to organizations
Data as a Corporate Asset (cont’d.)
• Data quality
– Comprehensive approach to ensuring the
accuracy, validity, and timeliness of the data
• Data profiling software
– Consists of programs that gather statistics and
analyze existing data sources
• Master data management (MDM) software
– Helps prevent dirty data by coordinating
common data across multiple systems.
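A very small example of the kind of statistics a data profiling program might gather is sketched below. The rows are invented, `profile` is a hypothetical helper, and the column is assumed to hold strings. Commercial profiling tools do far more than this.

```python
# Made-up rows; None and "" both count as missing values here.
ROWS = [
    {"cus_code": 1, "cus_state": "TN"},
    {"cus_code": 2, "cus_state": "tn"},
    {"cus_code": 3, "cus_state": ""},
    {"cus_code": 4, "cus_state": None},
    {"cus_code": 5, "cus_state": "FL"},
]


def profile(rows, column):
    """Gather simple statistics about one string column."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "distinct_ignoring_case": len({v.upper() for v in present}),
    }


stats = profile(ROWS, "cus_state")
print(stats)
```

Here the gap between `distinct` (3) and `distinct_ignoring_case` (2) hints at inconsistent capitalization, one common form of dirty data.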
The Need for and Role of Databases
in an Organization
• Database’s predominant role is to support
managerial decision making at all levels
• DBMS facilitates:
– Interpretation and presentation of data
– Distribution of data and information
– Preservation and monitoring of data
– Control over data duplication and use
• Three levels to organization management:
– Top, middle, operational
Introduction of a Database:
Special Considerations
• Introduction of a DBMS is likely to have a
profound impact
– Might be positive or negative, depending on how
it is administered
• Three aspects to DBMS introduction:
– Technological
– Managerial
– Cultural
• One role of DBA department is to educate end
users about system uses and benefits
The Evolution of the Database
Administration Function
• Data administration has its roots in the old,
decentralized world of the file system
• Advent of DBMS produced new level of data
management sophistication
– Data processing (DP) department evolved into information
systems (IS) department
• Data management became increasingly
complex
– Development of database administrator (DBA)
function
The Database Environment’s
Human Component
• Even most carefully crafted database system
cannot operate without human component
• Effective data administration requires both
technical and managerial skills
• Data administrator (DA) must set data administration goals
• DBA is focal point for data/user interaction
• Need for diverse mix of skills
The DBA’s Managerial Role
• DBA responsible for:
– Coordinating, monitoring, allocating resources
• Resources include people and data
– Defining goals and formulating strategic plans
• Interacts with end user by providing data and
information
• Enforces policies, standards, procedures
The DBA’s Managerial Role (cont’d.)
• Manages security, privacy, integrity
• Ensures data can be fully recovered
– In large organizations, database security officer
(DSO) responsible for disaster management
• Ensures data is distributed appropriately
– Makes it easy for authorized end users to
access the database
The DBA’s Technical Role
• Evaluates, selects, and installs DBMS and
related utilities
• Designs and implements databases and
applications
• Tests and evaluates databases and
applications
The DBA’s Technical Role (cont’d.)
• Operates DBMS, utilities, and applications
• Trains and supports users
• Maintains DBMS, utilities, and applications
The DBA’s Role in the Cloud
• Cloud services provide:
– DBMS installation and updates
– Server/network management
– Backup and recovery operations
• DBA’s managerial role is largely unchanged
Security
• Securing data entails securing overall
information system architecture
• Confidentiality: data protected against
unauthorized access
• Integrity: keep data consistent and free of
errors or anomalies
• Availability: accessibility of data by authorized
users for authorized purposes
Security Policies
• Database security officer secures the system
and the data
– Works with the database administrator
• Security policy: collection of standards, policies,
procedures to guarantee security
– Ensures auditing and compliance
• Security audit process identifies security
vulnerabilities
– Identifies measures to protect the system
Security Vulnerabilities
• Security vulnerability: weakness in a system
component
– Could allow unauthorized access or cause
service disruptions
• Security threat: imminent security violation
– Could occur at any time
• Security breach yields a database whose
integrity is either:
– Preserved
– Corrupted
Database Security
• Refers to the use of DBMS features and other
measures to comply with security requirements
• DBA secures DBMS from installation through
operation and maintenance
• Authorization management
– User access management
– View definition
– DBMS access control
– DBMS usage monitoring
Database Administration Tools
• Data dictionary
• CASE tools
The Data Dictionary
• Two main types of data dictionaries:
– Integrated
– Standalone
• Active data dictionary is automatically updated
by the DBMS with every database access
• Passive data dictionary requires running a
batch process
• Main function: store description of all objects
that interact with database
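The idea of a dictionary that the DBMS keeps current on its own can be illustrated with SQLite's system catalog. This is a sketch under assumptions: `sqlite_master` stands in for a full data dictionary, and the CUSTOMER table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def catalog_names(connection):
    """Return the object names currently recorded in SQLite's catalog."""
    rows = connection.execute("SELECT name FROM sqlite_master")
    return {name for (name,) in rows}

before = catalog_names(conn)

# Hypothetical objects; the catalog entries are written by the DBMS itself,
# with no separate batch process.
conn.execute("CREATE TABLE customer (cus_code INTEGER PRIMARY KEY, cus_lname TEXT)")
conn.execute("CREATE INDEX cus_lname_ndx ON customer (cus_lname)")

after = catalog_names(conn)
print(before, after)
```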
The Data Dictionary (cont’d.)
• Data dictionary that includes data external to
DBMS becomes flexible tool
– Enables use and allocation of all of an
organization’s information
• Metadata is often the basis for monitoring
database use
– Also for assigning access rights to users
• DBA uses data dictionary to support data
analysis and design
CASE Tools
• Computer-aided systems engineering
– Automated framework for SDLC
– Structured methodologies and powerful
graphical interfaces
• Front-end CASE tools provide support for
planning, analysis, and design phases
• Back-end CASE tools provide support for
coding and implementation phases
CASE Tools (cont’d.)
• Typical CASE tool has five components
– Graphics for diagrams
– Screen painters and report generators
– Integrated repository
– Analysis segment
– Program documentation generator
Developing a Data
Administration Strategy
• Information engineering (IE) translates strategic
goals into data and applications
• Information systems architecture (ISA) is the
output of IE process
• Implementing IE is a costly process
– Provides a framework that includes use of
computerized, automated, and integrated tools
• Success of information systems strategy
depends on critical success factors
– Managerial, technological, and corporate culture
The DBA at Work: Using Oracle for
Database Administration
• Technical tasks handled by the DBA in a
specific DBMS:
– Creating and expanding database storage
structures
– Managing database objects
– Managing end-user database environment
– Customizing database initialization parameters
• All DBMS vendors provide programs to perform
database administrative tasks
Oracle Database Administration Tools
The Default Login
• Must connect to the database to perform
administrative tasks
– Username with administrative privileges
• Oracle automatically creates SYSTEM and SYS
user IDs with administrative privileges
• Define preferred credentials by clicking on
Preferences link, then Preferred Credentials
• Username and passwords are database-
specific
Ensuring that the RDBMS Starts
Automatically
• DBA ensures database access is automatically
started when the computer is turned on
• A service is a Windows system name for a
special program that runs automatically
– Part of the operating system
• Database instance: separate location in
memory reserved to run the database
– May have several databases running in memory
at the same time
Creating Tablespaces and Datafiles
• Database composed of one or more
tablespaces
• Tablespace is a logical storage space
– Physically stored in one or more datafiles
• Datafile physically stores the database’s data
– Each datafile can reside in a different directory
on the hard disk
• Database has 1:M relationship with tablespaces
• Tablespace has 1:M relationship with datafiles
Managing the Database Objects:
Tables, Views, Triggers, and
Procedures
• Database object: any object created by end
users
• Schema: logical section of the database that
belongs to a given user
– Schema identified by a username
– Within the schema, users create their own tables
and other objects
• Normally, users are authorized to access only
the objects that belong to their own schemas
Managing Users and
Establishing Security
• User: uniquely identifiable object
– Allows a given person to log on to the database
• Role: a named collection of database access
privileges
– Authorizes a user to connect to the database
and use system resources
• Profile: named collection of settings
– Controls how much of a resource a given user
can use
Customizing the Database
Initialization Parameters
• Fine-tuning requires modification of database
configuration parameters
– Some are changed in real time using SQL
– Some affect database instance
– Others affect entire RDBMS and all instances
• Initialization parameters reserve resources
used by the database at run time
• After modifying parameters, may need to restart
the database
Summary
• Data management is a critical activity for any
organization
– Data should be treated as a corporate asset
• DBMS is the most commonly used electronic
tool for corporate data management
• DBMS has impact on organization’s managerial,
technological, and cultural framework
• Data administration function evolved from
centralized electronic data processing
Summary (cont’d.)
• Database administrator (DBA) is responsible for
managing corporate database
• Broader data management activity is handled by
data administrator (DA)
• DA is more managerially oriented than the more
technically oriented DBA
– DA function is DBMS-independent
– DBA function is more DBMS-dependent
• When there is no DA, DBA executes all DA
functions
Summary (cont’d.)
• Managerial services of DBA function:
– Supporting end-user community
– Defining and enforcing policies, procedures, and
standards for database function
– Ensuring data security, privacy, and integrity
– Providing data backup and recovery services
– Monitoring distribution and use of data in
database
Summary (cont’d.)
• Technical role of DBA:
– Evaluating, selecting, and installing DBMS
– Designing and implementing databases and
applications
– Testing and evaluating databases and
applications
– Operating DBMS, utilities, and applications
– Training and supporting users
– Maintaining DBMS, utilities, and applications
Summary (cont’d.)
• Security: ensures confidentiality, integrity,
availability of information system and data
• Security policy: collection of standards, policies,
and practices
• Security vulnerability: weakness in system
component
• Information engineering guides development of
data administration strategy
• CASE tools and data dictionaries translate
strategic plans to operational plans
Database Systems:
Design, Implementation, and
Management
Tenth Edition
Chapter 14
Database Connectivity and Web
Technologies
Objectives
In this chapter, you will learn:
• About various database connectivity
technologies
• How Web-to-database middleware is used to
integrate databases with the Internet
• About Web browser plug-ins and extensions
• What services are provided by Web application
servers
Objectives (cont’d.)
• What Extensible Markup Language (XML) is
and why it is important for Web database
development
• About cloud computing and how it enables the
database-as-a-service model
Database Connectivity
• Mechanisms by which application programs
connect and communicate with data sources
– Also known as database middleware
• Data repository:
– Also known as a data source
– Represents the data management application
• Used to store data generated by an application
program
• ODBC, OLE-DB, ADO.NET: the backbone of
MS Universal Data Access (UDA) architecture
Native SQL Connectivity
• Connection interface provided by database
vendors
– Unique to each vendor
• Example: Oracle RDBMS
– Must install and configure Oracle’s SQL*Net
interface in client computer
• Interfaces optimized for particular vendor’s
DBMS
– Maintenance is a burden for the programmer
ODBC, DAO, and RDO
• Open Database Connectivity (ODBC)
– Microsoft’s implementation of a superset of SQL
Access Group Call Level Interface (CLI)
– Widely supported database connectivity
interface
– Any Windows application can access relational
data sources
– Uses SQL via standard application programming
interface (API)
ODBC, DAO, and RDO (cont’d.)
• Data Access Objects (DAO)
– Object-oriented API
• Accesses MS Access, MS FoxPro, and dBase
databases from Visual Basic programs
– Provided an optimized interface that exposed
functionality of Jet data engine to programmers
– DAO interface can also be used to access other
relational style data sources
ODBC, DAO, and RDO (cont’d.)
• Remote Data Objects (RDO)
– Higher-level object-oriented application interface
used to access remote database servers
– Uses lower-level DAO and ODBC for direct
access to databases
– Optimized to deal with server-based databases,
such as MS SQL Server, Oracle, and DB2
• Implemented as shared code dynamically
linked to Windows via dynamic-link libraries
ODBC, DAO, and RDO (cont’d.)
• Basic ODBC architecture has three main
components:
– High-level ODBC API through which application
programs access ODBC functionality
– Driver manager that is in charge of managing all
database connections
– ODBC driver that communicates directly to
DBMS
OLE-DB
• Object Linking and Embedding for Database
• Database middleware that adds object-oriented
functionality for access to data
• Series of COM objects provides low-level
database connectivity for applications
• Functionality divided into two types of objects:
– Consumers
– Providers
OLE-DB (cont’d.)
• OLE-DB did not provide support for scripting
languages
• ActiveX Data Objects (ADO) provides high-level
application-oriented interface to interact with
OLE-DB, DAO, and RDO
• ADO provides unified interface to access data
from any programming language that uses the
underlying OLE-DB objects
ADO.NET
• Data access component of Microsoft’s .NET
application development framework
• Two new features for development of
distributed applications:
– DataSet is disconnected memory-resident
representation of database
– DataSet is internally stored in XML format
• Data in DataSet made persistent as XML
documents
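The disconnected, XML-persisted DataSet idea can be sketched outside .NET. The following is an analogy in Python, not ADO.NET code; the PRODUCT table, the element names, and the use of sqlite3 and ElementTree are assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Load the rows, then disconnect from the data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (p_code TEXT, p_price REAL)")
conn.execute("INSERT INTO product VALUES ('A1', 9.95)")
cursor = conn.execute("SELECT p_code, p_price FROM product")
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
conn.close()  # from here on, the data lives only in memory

# Persist the in-memory copy as an XML document.
root = ET.Element("dataset")
for row in rows:
    record = ET.SubElement(root, "product")
    for column, value in zip(columns, row):
        ET.SubElement(record, column).text = str(value)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```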
ADO.NET (cont’d.)
• Specific objects manipulate data in data source
– Connection
– Command
– DataReader
– DataAdapter
– DataSet
– DataTable
Java Database Connectivity (JDBC)
• Java is an object-oriented programming
language
– Runs on top of Web browser software
• Advantages of JDBC:
– Company can leverage existing technology and
personnel training
– Allows direct access to database server or
access via database middleware
– Provides a way to connect to databases through
an ODBC driver
Database Internet Connectivity
• Web database connectivity allows new
innovative services that:
– Permit rapid response by bringing new services
and products to market quickly
– Increase customer satisfaction through creation
of Web-based support services
– Allow anywhere, anytime data access using
mobile smart devices via the Internet
– Yield fast and effective information
dissemination through universal access
Web-to-Database Middleware:
Server-Side Extensions
• Web server is the main hub through which
Internet services are accessed
• Dynamic Web pages are at the heart of current
generation Web sites
• Server-side extension: a program that interacts
directly with the Web server
– Also known as Web-to-database middleware
• Middleware must be well integrated
Web Server Interfaces
• Two well-defined Web server interfaces:
– Common Gateway Interface (CGI)
– Application Programming Interface (API)
• Disadvantage of CGI scripts:
– Loading external script decreases system
performance
– Language and method used to create script also
decrease performance
• API is more efficient than CGI
– API is treated as part of Web server program
The Web Browser
• Software that lets users navigate the Web
• Located in client computer
• Interprets HTML code received from Web
server
• Presents different page components in
standard way
• Web is a stateless system: Web server does
not know the status of any clients
Client-Side Extensions
• Add functionality to Web browser
• Three general types:
– Plug-ins
– Java and JavaScript
– ActiveX and VBScript
Client-Side Extensions (cont’d.)
• Plug-in: an external application automatically
invoked by the browser when needed
• Java and JavaScript: embedded in Web page
– Downloaded with the Web page and activated
by an event
• ActiveX and VBScript: embedded in Web page
– Downloaded with page and activated by event
– Oriented to Windows applications
Web Application Servers
• Middleware application that expands the
functionality of Web servers
– Links them to a wide range of services
• Some uses of Web application servers:
– Connect to and query database from Web page
– Create dynamic Web search pages
– Enforce referential integrity
• Some features of Web application servers:
– Security and user authentication
– Access to multiple services
Web Database Development
• Process of interfacing databases with the Web
browser
• Code examples
– ColdFusion
– PHP
Extensible Markup Language (XML)
• Companies use Internet to create new systems
that integrate their data
– Increase efficiency and reduce costs
• Electronic commerce enables organizations to
market to millions of users
• Most e-commerce transactions take place
between businesses
• HTML Web pages display in the browser
– Tags describe how something looks on the page
Extensible Markup Language (XML)
(cont’d.)
• Extensible Markup Language (XML)
– Metalanguage to represent and manipulate data
elements
– Facilitates exchange of structured documents
over the Web
– Allows definition of new tags
• Case sensitive
• Must be well-formed and properly nested
• Comments indicated with <!-- and -->
• XML and xml prefixes reserved for XML tags only
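The case-sensitivity and nesting rules can be checked with any XML parser. A minimal sketch using Python's standard ElementTree parser follows; the tag names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Well-formed: tags match in case, nest properly, and the comment uses <!-- -->.
well_formed = (
    "<ProductList><!-- a comment -->"
    "<Product><P_CODE>23109-HB</P_CODE></Product></ProductList>"
)
# XML is case sensitive: <Product> cannot be closed by </product>.
mismatched_case = "<ProductList><Product></product></ProductList>"
# Improperly nested: <P_CODE> must be closed before <Product>.
badly_nested = "<ProductList><Product><P_CODE></Product></P_CODE></ProductList>"

def is_well_formed(text):
    """Return True when the parser accepts the document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

results = [is_well_formed(doc) for doc in (well_formed, mismatched_case, badly_nested)]
print(results)
```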
Document Type Definitions (DTD)
and XML Schemas
• Document Type Definition (DTD)
– File with .dtd extension that describes elements
– Provides composition of database’s logical
model
– Defines the syntax rules or valid tags for each
type of XML document
• Companies engaging in e-commerce
transaction must develop and share DTDs
• DTD referenced from inside XML document
Document Type Definitions (DTD)
and XML Schemas (cont’d.)
• XML schema
– Advanced data definition language
– Describes the structure of XML data documents
• Advantage of XML schema:
– More closely maps to database terminology and
features
• XML schema definition (XSD) file uses syntax
similar to XML document
XML Presentation
• XML separates data structure from presentation
and processing
• Extensible Style Language (XSL) displays XML
data
– Defines the rules by which XML data are
formatted and displayed
– Two parts:
• Extensible Style Language Transformations
(XSLT)
• XSL style sheets
XML Applications
• B2B exchanges
• Legacy systems integration
• Web page development
• Database support
• Database meta-dictionaries
• XML databases
• XML services
Cloud Computing Services
• Cloud computing
– “A computing model for enabling ubiquitous,
convenient, on-demand network access to a
shared pool of configurable computer resources
that can be rapidly provisioned and released
with minimal management effort or service
provider interaction.”
– Potential to become a “game changer”
Cloud Implementation Types
• Public cloud
• Private cloud
• Community cloud
Characteristics of Cloud Services
• Ubiquitous access via Internet technologies.
• Shared infrastructure
• Lower costs and variable pricing
• Flexible and scalable services
• Dynamic provisioning
• Service orientation
• Managed operations
Types of Cloud Services
• Software as a Service (SaaS)
• Platform as a Service (PaaS)
• Infrastructure as a Service (IaaS)
Cloud Services: Advantages and
Disadvantages
SQL Data Services
• Cloud computing data management service
• Provides relational data management to
companies of any size
• Avoids high cost of personnel/maintenance
• Leverages Internet to provide:
– Hosted data management
– Standard protocols
– A common programming interface
• Could assist businesses with limited information
technology resources
Summary
• Database connectivity:
– Ways in which programs connect and
communicate with data repositories
• Database connectivity software known as
database middleware
• Data repository also known as data source
– Represents data management application used to
store data generated by the program
• Microsoft interfaces are dominant players
– ODBC, OLE-DB, ADO.NET
Summary (cont’d.)
• Microsoft’s Universal Data Access (UDA)
architecture
– Collection of technologies to access any type of
data source using common interface
• Native database connectivity: interface
provided by database vendor
– ODBC is Microsoft's implementation of SQL
Access Group Call Level Interface
• Allows any Windows application to access
relational data sources using SQL
Summary (cont’d.)
• OLE-DB adds object-oriented functionality for
access to data
• ActiveX Data Objects provide interface with
OLE-DB, DAO, and RDO
• ADO.NET is data access component of
Microsoft .NET framework
• Java Database Connectivity (JDBC) interfaces
Java applications with data sources
Summary (cont’d.)
• Database access through the Web uses
middleware
• On client side of Web browser, use plug-ins,
Java and JavaScript, ActiveX, and VBScript
• On server side, middleware expands
functionality of Web servers
– Links them to wide range of services
• XML provides semantics to share structured
documents across the Web
– Produces description and representation of data
Summary (cont’d.)
• Cloud computing
– Computing model that provides ubiquitous, on-
demand access to a shared pool of configurable
resources that can be rapidly provisioned
• SQL data services (SDS)
– Cloud computing-based data management
service that provides relational data storage,
ubiquitous access, and local management to
companies of all sizes
Database Systems: Design,
Implementation, and
Management
Eighth Edition
Chapter 11
Database Performance Tuning and
Query Optimization
Database Systems, 8th Edition 2
Objectives
• In this chapter, you will learn:
– Basic database performance-tuning concepts
– How a DBMS processes SQL queries
– About the importance of indexes in query processing
– About the types of decisions the query optimizer has
to make
– Some common practices used to write efficient SQL
code
– How to formulate queries and tune the DBMS for
optimal performance
– Performance tuning in SQL Server 2005
11.1 Database Performance-Tuning Concepts
• Goal of database performance is to execute
queries as fast as possible
• Database performance tuning
– Set of activities and procedures designed to
reduce response time of database system
• All factors must operate at optimum level with
minimal bottlenecks
• Good database performance starts with
good database design
Performance Tuning: Client and Server
• Client side
– Generate SQL query that returns correct answer
in least amount of time
• Using minimum amount of resources at server
– SQL performance tuning
• Server side
– DBMS environment configured to respond to
clients’ requests as fast as possible
• Optimum use of existing resources
– DBMS performance tuning
DBMS Architecture
• All data in database are stored in data files
• Data files
– Automatically expand in predefined increments
known as extents
– Grouped in file groups or table spaces
• Table space or file group:
– Logical grouping of several data files that store
data with similar characteristics
Basic DBMS architecture
DBMS Architecture (continued)
• Data cache or buffer cache: shared, reserved
memory area
– Stores most recently accessed data blocks in RAM
• SQL cache or procedure cache: stores most
recently executed SQL statements
– Also PL/SQL procedures, including triggers and
functions
• DBMS retrieves data from permanent storage and
places it in RAM
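How a data cache saves physical reads can be sketched with a toy model. The least-recently-used eviction policy, the class name, and the block identifiers are assumptions for illustration; real DBMS buffer managers are more elaborate.

```python
from collections import OrderedDict

class DataCache:
    """Toy data cache holding the most recently accessed blocks in memory."""

    def __init__(self, capacity, read_block_from_disk):
        self.capacity = capacity
        self.read_block_from_disk = read_block_from_disk
        self.blocks = OrderedDict()
        self.disk_reads = 0

    def get(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return self.blocks[block_id]
        self.disk_reads += 1  # cache miss: one physical I/O
        data = self.read_block_from_disk(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data

cache = DataCache(capacity=2, read_block_from_disk=lambda block_id: f"block-{block_id}")
for block_id in [1, 2, 1, 3, 1]:
    cache.get(block_id)
print(cache.disk_reads)  # 3 physical reads for 5 requests
```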
DBMS Architecture (continued)
• Input/output request: low-level data access
operation to/from computer devices, such as
memory, hard disks, videos, and printers
• Accessing data in the data cache is faster than accessing data in data files
– DBMS does not wait for hard disk to retrieve data
• Majority of performance-tuning activities focus on
minimizing I/O operations
• Typical DBMS processes:
– Listener, User, Scheduler, Lock manager, Optimizer
Database Statistics
• Measurements about database objects and available
resources
– Tables, Indexes, Number of processors used,
Processor speed, Temporary space available
• Used to make critical decisions about improving query
processing efficiency
• Can be gathered manually by DBA or automatically by
DBMS
– UPDATE STATISTICS table_name [index_name]
– Auto-Update and Auto-Create Statistics option
• Database Properties -> Auto Update Statistics
• Database Properties -> Auto Create Statistics
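Manual statistics gathering can be tried in a small sandbox. The sketch below uses SQLite, where the command is `ANALYZE` rather than SQL Server's `UPDATE STATISTICS`; the table, index, and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cus_code INTEGER PRIMARY KEY, cus_state TEXT)")
conn.execute("CREATE INDEX state_ndx ON customer (cus_state)")
conn.executemany(
    "INSERT INTO customer (cus_state) VALUES (?)",
    [("FL",), ("FL",), ("TN",), ("GA",)],
)

# Gather statistics on demand; SQLite stores them in the sqlite_stat1 table,
# where the query planner can read them.
conn.execute("ANALYZE")
stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
print(stats)
```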
Ch08: dbcc show_statistics (customer, PK__CUSTOMER__24927208)
Ch08: dbcc show_statistics (customer, CUS_UI1)
Supplement: SQL Server 2005
11.2 Query Processing
• DBMS processes queries in three phases
– Parsing
• DBMS parses the query and chooses the most
efficient access/execution plan
– Execution
• DBMS executes the query using chosen
execution plan
– Fetching
• DBMS fetches the data and sends the result back
to the client
Query Processing
SQL Parsing Phase
• Break down query into smaller units
• Transform original SQL query into slightly
different version of original SQL code
– Fully equivalent
• Optimized query results are always the same as
original query
– More efficient
• Optimized query will almost always execute faster
than original query
SQL Parsing Phase (continued)
• Query optimizer analyzes SQL query and finds most
efficient way to access data
– Validated for syntax compliance
– Validated against data dictionary
• Tables, column names are correct
• User has proper access rights
– Analyzed and decomposed into more atomic components
– Optimized through transforming into a fully equivalent but
more efficient SQL query
– Prepared for execution by determining the execution or
access plan
SQL Parsing Phase (continued)
• Access plans are DBMS-specific
– Translate client’s SQL query into series of
complex I/O operations
– Required to read the data from the physical data
files and generate result set
• DBMS checks if access plan already exists for
query in SQL cache
• DBMS reuses the access plan to save time
• If not, optimizer evaluates various plans
– Chosen plan placed in SQL cache
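The reuse of access plans can be sketched as a lookup keyed by the statement text. The class, the placeholder optimizer, and the plan representation are hypothetical; a real SQL cache also handles invalidation and memory limits.

```python
class SqlCache:
    """Toy SQL cache mapping statement text to a previously chosen access plan."""

    def __init__(self, optimizer):
        self.optimizer = optimizer  # function: SQL text -> access plan
        self.plans = {}
        self.optimizer_calls = 0

    def get_plan(self, sql):
        if sql not in self.plans:  # no cached plan: the optimizer must evaluate plans
            self.optimizer_calls += 1
            self.plans[sql] = self.optimizer(sql)
        return self.plans[sql]  # otherwise the cached plan is reused

def toy_optimizer(sql):
    # Placeholder "plan"; a real optimizer would compare several candidates.
    return f"PLAN FOR: {sql}"

cache = SqlCache(toy_optimizer)
query = "SELECT CUS_NAME FROM CUSTOMER WHERE CUS_STATE = 'FL'"
first = cache.get_plan(query)
second = cache.get_plan(query)
print(cache.optimizer_calls)  # the optimizer ran once for two requests
```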
SQL Execution and Fetching Phase
• All I/O operations indicated in access plan are
executed
– Locks acquired
– Data retrieved and placed in data cache
– Transaction management commands processed
• Rows of resulting query result set are returned to
client
• DBMS may use temporary table space to store
temporary data
– The server may send only the first 100 rows of 9000 rows
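Fetching a result set in batches can be sketched from the client side. SQLite runs in-process, so there is no real client/server exchange here; the INVOICE table is hypothetical and the batch size simply mirrors the 100-of-9000 example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (inv_number INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO invoice VALUES (?)", [(n,) for n in range(1, 9001)])

cursor = conn.execute("SELECT inv_number FROM invoice ORDER BY inv_number")
first_batch = cursor.fetchmany(100)  # only the first 100 of the 9000 rows
remaining = cursor.fetchall()        # the rest is fetched only when asked for
print(len(first_batch), len(remaining))
```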
Query Processing Bottlenecks
• Delay introduced in the processing of an I/O
operation that slows the system
– CPU
– RAM
– Hard disk
– Network
– Application code
After entering the SQL statement, do not run the query yet; click the
Display Estimated Execution Plan button on the toolbar:
11.3 Indexes and Query Optimization
• Indexes
– Crucial in speeding up data access
– Facilitate searching, sorting, and using
aggregate functions as well as join operations
– Ordered set of values that contains index key
and pointers
• More efficient to use index to access table than
to scan all rows in table sequentially
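The difference between a sequential scan and an index lookup shows up in the plan a DBMS reports. The sketch below uses SQLite's `EXPLAIN QUERY PLAN`; the table and index names are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cus_code INTEGER PRIMARY KEY, cus_name TEXT, cus_state TEXT)"
)
query = "SELECT cus_name FROM customer WHERE cus_state = 'FL'"

def plan_details(connection, sql):
    """Return the human-readable plan text SQLite reports for a query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

plan_without_index = plan_details(conn, query)  # full table scan
conn.execute("CREATE INDEX state_index ON customer (cus_state)")
plan_with_index = plan_details(conn, query)     # lookup through state_index
print(plan_without_index)
print(plan_with_index)
```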
Indexes and Query Optimization
• Data sparsity: number of different values a column
could possibly have
• Indexes implemented using (textbook p. 453):
– Hash indexes
– B-tree indexes: most common index type. Used in tables in
which column values repeat a small number of times. The
leaves contain pointers to records. It is self-balancing.
– Bitmap indexes: use a bit array (0s and 1s) to represent the existence of a value
• DBMSs determine best type of index to use
– Ex: CUST_LNAME with B-tree and REGION_CODE with
Bitmap indexes
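A bitmap index keeps one string of 0/1 bits per distinct column value. A minimal sketch follows, assuming six hypothetical REGION_CODE values and using Python integers as the bit arrays.

```python
# Hypothetical REGION_CODE values for six rows, in row order.
region_codes = ["NE", "SW", "NE", "SE", "SW", "NE"]

# One bitmap per distinct value: bit i is 1 when row i holds that value.
bitmaps = {}
for row_number, code in enumerate(region_codes):
    bitmaps[code] = bitmaps.get(code, 0) | (1 << row_number)

def matching_rows(bitmap):
    """Return the row numbers whose bit is set in the bitmap."""
    return [i for i in range(len(region_codes)) if (bitmap >> i) & 1]

ne_rows = matching_rows(bitmaps["NE"])
# Conditions combine with cheap bitwise operations (here: NE OR SE).
ne_or_se_rows = matching_rows(bitmaps["NE"] | bitmaps["SE"])
print(ne_rows, ne_or_se_rows)
```

With only three distinct values, three short bitmaps cover the whole column, which is why low-sparsity columns such as REGION_CODE suit this index type.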
B-tree and bitmap index representation
Index Representation for the
CUSTOMER table
SELECT CUS_NAME
FROM CUSTOMER
WHERE CUS_STATE = 'FL';
Requires only 5 accesses to STATE_INDEX,
5 accesses to CUSTOMER
11.4 Optimizer Choices
• Rule-based optimizer
– Preset rules and points
– Rules assign a fixed cost to each operation
• Cost-based optimizer
– Algorithms based on statistics about objects
being accessed
– Adds up processing cost, I/O costs, resource
costs to derive total cost
Example
SELECT P_CODE, P_DESCRIPT, P_PRICE, V_NAME,
V_STATE
FROM PRODUCT P, VENDOR V
WHERE P.V_CODE=V.V_CODE
AND V.V_STATE = 'FL';
• With the following database statistics:
– The PRODUCT table has 7000 rows
– The VENDOR table has 300 rows
– 10 vendors come from Florida
– 1000 products come from vendors in Florida
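These statistics are enough to compare two candidate access plans by hand. The sketch below assumes a simplified cost model in which every row read counts as one I/O operation and every product matches exactly one vendor; the two plans themselves are illustrative assumptions.

```python
# Statistics from the slide.
product_rows = 7000
vendor_rows = 300
florida_vendors = 10
florida_products = 1000  # size of the final result under either plan

# Plan A: build the Cartesian product first, then filter.
a1_cost = product_rows + vendor_rows       # read both tables
a1_result = product_rows * vendor_rows     # 2,100,000 rows
a2_cost = a1_result                        # keep rows with matching V_CODE
a2_result = product_rows                   # one vendor per product (assumed)
a3_cost = a2_result                        # keep rows with V_STATE = 'FL'
plan_a_cost = a1_cost + a2_cost + a3_cost

# Plan B: filter VENDOR on V_STATE = 'FL' first, then join.
b1_cost = vendor_rows                      # read VENDOR once
b1_result = florida_vendors                # 10 rows survive
b2_cost = product_rows + b1_result         # read PRODUCT and the 10 vendors
b2_result = product_rows * b1_result       # 70,000 rows
b3_cost = b2_result                        # keep rows with matching V_CODE
plan_b_cost = b1_cost + b2_cost + b3_cost

print(plan_a_cost, plan_b_cost)
```

Under these assumptions, filtering early cuts the estimated I/O from 2,114,300 to 77,310 operations, which is the kind of difference a cost-based optimizer looks for.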
Example
• Assume the PRODUCT table has the index
PQOH_NDX on the P_QOH attribute
SELECT MIN(P_QOH) FROM PRODUCT
could be resolved by reading only the first entry in
the PQOH_NDX index
Using Hints to Affect Optimizer Choices
• Optimizer might not choose best plan
• Makes decisions based on existing statistics
– Statistics may be old
– Optimizer might then make less efficient choices
• Optimizer hints: special instructions for the
optimizer embedded in the SQL command text
Oracle version
For MS SQL Server syntax, see:
http://msdn.microsoft.com/en-us/library/ms187713.aspx
SQL Server Query Hints Example
select o.customerid,companyname
from orders as o inner MERGE join customers as c
on o.customerid = c.customerid
select o.customerid,companyname
from orders as o inner HASH join customers as c
on o.customerid = c.customerid
select o.customerid,companyname
from orders as o inner LOOP join customers as c
on o.customerid = c.customerid
select city, count(*)
from customers
group by city
OPTION (HASH GROUP)
Database Systems:
Design, Implementation, and
Management
Tenth Edition
Chapter 10
Transaction Management
and Concurrency Control
Objectives
• In this chapter, you will learn:
– About database transactions and their properties
– What concurrency control is and what role it
plays in maintaining the database’s integrity
– What locking methods are and how they work
Objectives (cont’d.)
– How time stamping methods are used for
concurrency control
– How optimistic methods are used for
90. concurrency control
– How database recovery management is used to
maintain database integrity
What Is a Transaction?
• Logical unit of work that must be either entirely
completed or aborted
• Successful transaction changes database from
one consistent state to another
– One in which all data integrity constraints are
satisfied
• Most real-world database transactions are
formed by two or more database requests
– Equivalent of a single SQL statement in an
application program or transaction
Evaluating Transaction Results
• Not all transactions update database
• SQL code represents a transaction because
database was accessed
• Improper or incomplete transactions can have
devastating effect on database integrity
– Some DBMSs provide means by which user can
define enforceable constraints
– Other integrity rules are enforced automatically
by the DBMS
Figure 9.2
Transaction Properties
• Atomicity
– All operations of a transaction must be
completed; if not, the transaction is aborted
• Consistency
– Permanence of database’s consistent state
• Isolation
– Data used during transaction cannot be used by
second transaction until the first is completed
Transaction Properties (cont’d.)
• Durability
– Once transactions are committed, they cannot
be undone
• Serializability
– Concurrent execution of several transactions
yields consistent results
• Multiuser databases are subject to multiple
concurrent transactions
Transaction Management with SQL
• ANSI has defined standards that govern SQL
database transactions
• Transaction support is provided by two SQL
statements: COMMIT and ROLLBACK
• Transaction sequence must continue until:
– COMMIT statement is reached
– ROLLBACK statement is reached
– End of program is reached
– Program is abnormally terminated
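The COMMIT and ROLLBACK behavior above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a prescribed pattern: the account table, its CHECK constraint, and the transfer and balances helpers are assumptions made up for the example.

```python
import sqlite3

# Hypothetical ACCOUNT table, used only for this illustration.
# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# BEGIN / COMMIT / ROLLBACK statements below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one unit of work.

    Returns True if the transaction committed, False if it was rolled back.
    """
    conn.execute("BEGIN")
    try:
        # Credit first, debit second: if the debit violates the CHECK
        # constraint, the ROLLBACK must also undo the credit.
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?",
            (amount, dst),
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?",
            (amount, src),
        )
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")
        return False
    conn.execute("COMMIT")
    return True


def balances(conn):
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

A transfer that would overdraw the source account leaves both balances unchanged, because the already-applied credit is rolled back together with the failed debit.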
The Transaction Log
• Transaction log stores:
– A record for the beginning of transaction
– For each transaction component:
• Type of operation being performed (update,
delete, insert)
• Names of objects affected by transaction
• “Before” and “after” values for updated fields
• Pointers to previous and next transaction log
entries for the same transaction
– Ending (COMMIT) of the transaction
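One way to picture the log contents listed above is as a chain of records per transaction. The sketch below is an assumption for illustration only: the LogRecord field names and the append helper are hypothetical, and a real DBMS stores its log in its own internal format.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class LogRecord:
    record_id: int                    # position of this entry in the log
    trx_num: int                      # transaction the entry belongs to
    prev_ptr: Optional[int]           # previous entry of the same transaction
    next_ptr: Optional[int]           # next entry of the same transaction
    operation: str                    # START, UPDATE, INSERT, DELETE, COMMIT
    table: Optional[str] = None       # object affected
    row_id: Optional[str] = None
    attribute: Optional[str] = None
    before: Any = None                # value before the change
    after: Any = None                 # value after the change


def append(log, trx_num, operation, **fields):
    """Append a record and link it to the transaction's previous record."""
    prev = next((r for r in reversed(log) if r.trx_num == trx_num), None)
    rec = LogRecord(
        record_id=len(log) + 1,
        trx_num=trx_num,
        prev_ptr=prev.record_id if prev else None,
        next_ptr=None,
        operation=operation,
        **fields,
    )
    if prev:
        prev.next_ptr = rec.record_id
    log.append(rec)
    return rec
```

For example, appending START, one UPDATE with before and after values, and COMMIT for the same transaction number yields three records whose prev_ptr and next_ptr fields form a doubly linked chain.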
Concurrency Control
• Coordination of simultaneous transaction
execution in a multiprocessing database
• Objective is to ensure serializability of
transactions in a multiuser environment
• Three main problems:
– Lost updates
– Uncommitted data
– Inconsistent retrievals
Lost Updates
• Lost update problem:
– Two concurrent transactions update same data
element
– One of the updates is lost
• Overwritten by the other transaction
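The lost update problem can be reproduced with a small simulation. This is a sketch under stated assumptions: the run function, the step names READ, ADD, and WRITE, and the quantities are invented for the example, and each transaction works on a private copy of the value it read.

```python
def run(schedule, initial):
    """Execute (transaction, action, value) steps against one shared item.

    READ copies the stored value into the transaction's private workspace,
    ADD changes only that private copy, WRITE stores the copy back.
    """
    stored = initial
    local = {}
    for trx, action, value in schedule:
        if action == "READ":
            local[trx] = stored
        elif action == "ADD":
            local[trx] += value
        elif action == "WRITE":
            stored = local[trx]
    return stored


# T1 adds 100 units, T2 removes 30 units; the item starts at 35.
serial = [
    ("T1", "READ", None), ("T1", "ADD", 100), ("T1", "WRITE", None),
    ("T2", "READ", None), ("T2", "ADD", -30), ("T2", "WRITE", None),
]
interleaved = [
    ("T1", "READ", None), ("T2", "READ", None),
    ("T1", "ADD", 100), ("T2", "ADD", -30),
    ("T1", "WRITE", None), ("T2", "WRITE", None),
]
```

Run serially, the item ends at 105. In the interleaved schedule both transactions read 35, so T2's write of 5 overwrites T1's write of 135 and T1's update is lost.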
Uncommitted Data
• Uncommitted data phenomenon:
– Two transactions are executed concurrently
– First transaction rolled back after second already
accessed uncommitted data
Inconsistent Retrievals
• Inconsistent retrievals:
– First transaction accesses data
– Second transaction alters the data
– First transaction accesses the data again
• Transaction might read some data before they
are changed and other data after changed
• Yields inconsistent results
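A short simulation shows how an aggregate can mix values from before and after another transaction's change. The function name, the item names, and the quantities are assumptions chosen for illustration.

```python
def summed_total(quantities, move, after_reads):
    """Sum all quantities while a second transaction moves stock.

    `move` is (source, destination, amount). The move is applied, as if it
    had committed, after the reader has already read `after_reads` items.
    """
    data = dict(quantities)
    src, dst, amount = move
    total = 0
    for n, key in enumerate(sorted(data)):
        if n == after_reads:
            data[src] -= amount
            data[dst] += amount
        total += data[key]
    return total


stock = {"A": 50, "B": 20, "C": 30}   # true total is 100 before and after
```

If the move of 10 units from A to C lands after the reader has read A but before it reads C, the reader counts those 10 units twice and reports 110, a total that never existed in the database.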
The Scheduler
• Special DBMS program
– Purpose is to establish order of operations within
which concurrent transactions are executed
• Interleaves execution of database operations:
– Ensures serializability
– Ensures isolation
• Serializable schedule
– Interleaved execution of transactions yields
same results as serial execution
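A common sufficient test for a serializable schedule is conflict serializability: build a precedence graph from conflicting operations and check that it has no cycle. The sketch below assumes that test, which is stricter than the "same results" definition above, and uses a made-up schedule format of (transaction, operation, item) tuples.

```python
def is_conflict_serializable(schedule):
    """schedule: list of (transaction, op, item) with op 'R' or 'W'.

    Two operations conflict when they come from different transactions,
    touch the same item, and at least one of them is a write.
    """
    edges = set()
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "W" in (op1, op2):
                edges.add((t1, t2))   # t1 must precede t2

    # The graph is acyclic if we can keep removing transactions that have
    # no incoming edge from a transaction still in the graph.
    nodes = {t for t, _, _ in schedule}
    while nodes:
        free = {
            n for n in nodes
            if not any(b == n and a in nodes for a, b in edges)
        }
        if not free:
            return False              # every remaining node is in a cycle
        nodes -= free
    return True
```

The lost-update interleaving R1(x), R2(x), W1(x), W2(x) produces edges in both directions between T1 and T2, so the function returns False for it, while the serial order returns True.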
Concurrency Control
with Locking Methods
• Lock
– Guarantees exclusive use of a data item to a
current transaction
– Required to prevent another transaction from
reading inconsistent data
– Pessimistic locking
• Use of locks based on the assumption that conflict
between transactions is likely
– Lock manager
• Responsible for assigning and policing the locks
used by transactions
Lock Granularity
• Indicates level of lock use
• Locking can take place at following levels:
– Database
– Table
– Page
– Row
– Field (attribute)
Lock Granularity (cont’d.)
• Database-level lock
– Entire database is locked
• Table-level lock
– Entire table is locked
• Page-level lock
– Entire disk page is locked
Lock Granularity (cont’d.)
• Row-level lock
– Allows concurrent transactions to access
different rows of same table
• Even if rows are located on same page
• Field-level lock
– Allows concurrent transactions to access same
row
• Requires use of different fields (attributes) within
the row
Lock Types
• Binary lock
– Two states: locked (1) or unlocked (0)
• Exclusive lock
– Access is specifically reserved for transaction
that locked object
– Must be used when potential for conflict exists
• Shared lock
– Concurrent transactions are granted read
access on basis of a common lock
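The shared/exclusive rules can be sketched as a tiny lock table. This is an illustrative assumption, not how any particular DBMS implements its lock manager: the LockTable class, its method names, and the 'S' and 'X' mode codes are invented here, and a denied request simply returns False instead of queueing.

```python
class LockTable:
    def __init__(self):
        self._held = {}   # item -> (mode, set of holding transactions)

    def request(self, trx, item, mode):
        """Try to lock `item` in mode 'S' (shared) or 'X' (exclusive).

        Returns True if the lock is granted, False if it conflicts.
        """
        if item not in self._held:
            self._held[item] = (mode, {trx})
            return True
        held_mode, holders = self._held[item]
        if holders == {trx}:
            # The sole holder may re-request, or upgrade from S to X.
            if mode == "X":
                self._held[item] = ("X", holders)
            return True
        if mode == "S" and held_mode == "S":
            holders.add(trx)          # readers share a common lock
            return True
        return False                  # any combination involving X conflicts

    def release(self, trx, item):
        """Drop trx's lock on item; remove the entry when nobody holds it."""
        _, holders = self._held.get(item, (None, set()))
        holders.discard(trx)
        if not holders:
            self._held.pop(item, None)
```

Two readers can hold a shared lock on the same item at once, but an exclusive request is refused until every other holder has released.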
Two-Phase Locking
to Ensure Serializability
• Defines how transactions acquire and
relinquish locks
• Guarantees serializability, but does not prevent
deadlocks
– Growing phase
• Transaction acquires all required locks without
unlocking any data
– Shrinking phase
• Transaction releases all locks and cannot obtain
any new lock
Two-Phase Locking
to Ensure Serializability (cont’d.)
• Governed by the following rules:
– Two transactions cannot have conflicting locks
– No unlock operation can precede a lock
operation in the same transaction
– No data are affected until all locks are obtained
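The growing and shrinking phases can be checked mechanically for a single transaction. The sketch below assumes a made-up representation in which a transaction is a list of (action, item) pairs; only LOCK and UNLOCK actions matter for the check.

```python
def obeys_two_phase_locking(operations):
    """Return True if no LOCK follows an UNLOCK in the same transaction."""
    shrinking = False
    for action, _item in operations:
        if action == "UNLOCK":
            shrinking = True          # the shrinking phase has begun
        elif action == "LOCK" and shrinking:
            return False              # acquiring after releasing breaks 2PL
    return True


two_phase = [("LOCK", "A"), ("LOCK", "B"), ("WRITE", "A"),
             ("UNLOCK", "A"), ("UNLOCK", "B")]
not_two_phase = [("LOCK", "A"), ("UNLOCK", "A"),
                 ("LOCK", "B"), ("UNLOCK", "B")]
```

The first sequence acquires everything before releasing anything, so it passes. The second releases A and then locks B, which violates the protocol.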
Deadlocks
• Condition that occurs when two transactions
wait for each other to unlock data
• Possible only if one of the transactions wants to
obtain an exclusive lock on a data item
– No deadlock condition can exist among shared
locks
Deadlocks (cont’d.)
• Three techniques to control deadlock:
– Prevention
– Detection
– Avoidance
• Choice of deadlock control method depends on
database environment
– Low probability of deadlock; detection
recommended
– High probability; prevention recommended
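Deadlock detection is often described in terms of a wait-for graph, where a cycle means a deadlock. The following is a minimal sketch under that assumption; the waits_for dictionary format and the find_deadlock name are hypothetical.

```python
def find_deadlock(waits_for):
    """waits_for maps a transaction to the set of transactions it waits on.

    Returns a list of transactions forming a cycle, or None if there is
    no deadlock. Uses a depth-first search that tracks the current path.
    """
    path, done = [], set()

    def visit(trx):
        if trx in done:
            return None
        if trx in path:
            return path[path.index(trx):]   # cycle found on current path
        path.append(trx)
        for other in sorted(waits_for.get(trx, ())):
            cycle = visit(other)
            if cycle:
                return cycle
        path.pop()
        done.add(trx)
        return None

    for trx in sorted(waits_for):
        cycle = visit(trx)
        if cycle:
            return cycle
    return None
```

Once a cycle is found, a detection scheme would pick one transaction in it as the victim, roll it back, and let the others continue.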
Concurrency Control
with Time Stamping Methods
• Assigns global unique time stamp to each
transaction
• Produces explicit order in which transactions
are submitted to DBMS
• Uniqueness
– Ensures that no equal time stamp values can
exist
• Monotonicity
– Ensures that time stamp values always increase
Wait/Die and Wound/Wait Schemes
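• Wait/die
– Older transaction requesting a lock waits for the
younger one to finish
– Younger transaction requesting a lock is rolled
back (dies) and rescheduled with the same time stamp
• Wound/wait
– Older transaction requesting a lock rolls back
(wounds) the younger transaction and reschedules it
– Younger transaction requesting a lock waits for the
older one to finish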
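The two schemes differ only in what happens for each age ordering, which a small decision function makes explicit. The function name and the returned strings are assumptions for illustration; a smaller time stamp means an older transaction.

```python
def resolve(scheme, requester_ts, holder_ts):
    """Decide what happens when the requester asks for a lock the holder owns."""
    requester_is_older = requester_ts < holder_ts
    if scheme == "wait/die":
        if requester_is_older:
            return "requester waits"
        return "requester is rolled back (dies)"
    if scheme == "wound/wait":
        if requester_is_older:
            return "holder is rolled back (wounded)"
        return "requester waits"
    raise ValueError(f"unknown scheme: {scheme}")
```

In both schemes the younger transaction is the one rolled back, so an old transaction is never starved by a stream of newer ones.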
Concurrency Control
with Optimistic Methods
• Optimistic approach
– Based on assumption that majority of database
operations do not conflict
– Does not require locking or time stamping
techniques
– Transaction is executed without restrictions until
it is committed
– Phases: read, validation, and write
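The read, validation, and write phases can be sketched with per-item version counters. Version checking is one possible validation rule, assumed here for illustration; the Store and OptimisticTransaction classes are hypothetical.

```python
class Store:
    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}   # bumped on every write


class OptimisticTransaction:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}   # item -> version seen at first read
        self.writes = {}          # private copies of updated values

    def read(self, key):
        """Read phase: read from the private copy or the database."""
        if key in self.writes:
            return self.writes[key]
        self.read_versions.setdefault(key, self.store.version[key])
        return self.store.data[key]

    def write(self, key, value):
        """Read phase: updates go only to the private copy."""
        self.writes[key] = value

    def commit(self):
        # Validation phase: fail if anything we read has changed since.
        for key, seen in self.read_versions.items():
            if self.store.version[key] != seen:
                return False
        # Write phase: apply the private copies to the database.
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.version.get(key, 0) + 1
        return True
```

If two transactions read the same item and both try to update it, the first to commit succeeds and the second fails validation, after which it would be restarted.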
Database Recovery Management
• Restores database to previous consistent state
• Based on atomic transaction property
– All portions of transaction are treated as single
logical unit of work
– All operations are applied and completed to
produce consistent database
• If transaction operation cannot be completed:
– Transaction aborted
– Changes to database are rolled back
Transaction Recovery
• Write-ahead-log protocol: ensures transaction
logs are written before data is updated
• Redundant transaction logs: ensure physical
disk failure will not impair ability to recover
• Buffers: temporary storage areas in primary
memory
• Checkpoints: operations in which DBMS writes
all its updated buffers to disk
Transaction Recovery (cont’d.)
• Deferred-write technique
– Only transaction log is updated
• Recovery process: identify last checkpoint
– If transaction committed before checkpoint:
• Do nothing
– If transaction committed after checkpoint:
• Use transaction log to redo the transaction
– If transaction had ROLLBACK operation:
• Do nothing
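Deferred-write recovery only ever needs to redo, because uncommitted work never reached the database. The sketch below assumes a simplified log of (transaction, operation, item, after value) tuples covering the period since the last checkpoint; the function name and log format are made up for the example.

```python
def recover_deferred(db_at_checkpoint, log_after_checkpoint):
    """Rebuild the database state after a failure under deferred write."""
    committed = {
        trx for trx, op, *_ in log_after_checkpoint if op == "COMMIT"
    }
    db = dict(db_at_checkpoint)
    for trx, op, key, after in log_after_checkpoint:
        # Redo committed updates with their "after" values; updates of
        # rolled-back or still-active transactions are simply skipped.
        if op == "UPDATE" and trx in committed:
            db[key] = after
    return db
```

With a log in which T1 updated x and committed while T2 updated y and never committed, recovery applies only T1's change.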
Transaction Recovery (cont’d.)
• Write-through technique
– Database is immediately updated by transaction
operations during transaction’s execution
• Recovery process: identify last checkpoint
– If transaction committed before checkpoint:
• Do nothing
– If transaction committed after last checkpoint:
• DBMS redoes the transaction using “after” values
– If transaction had ROLLBACK or was left active:
• DBMS undoes the transaction using “before” values,
because its updates were already written to the database
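Under write-through, updates reach the database before COMMIT, so recovery may have to undo as well as redo. The sketch below assumes a simplified log of (transaction, operation, item, before value, after value) tuples since the last checkpoint, and assumes locking kept committed and uncommitted writes to the same item from interleaving in a harmful order; the names are hypothetical.

```python
def recover_write_through(db_at_failure, log_after_checkpoint):
    """Rebuild a consistent state after a failure under write-through."""
    committed = {
        trx for trx, op, *_ in log_after_checkpoint if op == "COMMIT"
    }
    db = dict(db_at_failure)
    # Undo: walk the log backwards and restore "before" values for
    # transactions that rolled back or were still active at the failure.
    for trx, op, key, before, _after in reversed(log_after_checkpoint):
        if op == "UPDATE" and trx not in committed:
            db[key] = before
    # Redo: walk the log forwards and reapply "after" values for
    # transactions that committed after the checkpoint.
    for trx, op, key, _before, after in log_after_checkpoint:
        if op == "UPDATE" and trx in committed:
            db[key] = after
    return db
```

Compared with deferred write, the extra undo pass is the price of updating the database immediately during the transaction's execution.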
Summary
• Transaction: sequence of database operations
that access database
– Logical unit of work
• No portion of transaction can exist by itself
– Five main properties: atomicity, consistency,
isolation, durability, and serializability
• COMMIT saves changes to disk
• ROLLBACK restores previous database state
• SQL transactions are formed by several SQL
statements or database requests
Summary (cont’d.)
• Transaction log keeps track of all transactions
that modify database
• Concurrency control coordinates simultaneous
execution of transactions
• Scheduler establishes order in which
concurrent transaction operations are executed
• Lock guarantees unique access to a data item
by transaction
• Two types of locks: binary locks and
shared/exclusive locks
Summary (cont’d.)
• Serializability of schedules is guaranteed
through the use of two-phase locking
• Deadlock: when two or more transactions wait
indefinitely for each other to release lock
• Three deadlock control techniques: prevention,
detection, and avoidance
• Time stamping methods assign unique time
stamp to each transaction
– Schedules execution of conflicting transactions
in time stamp order
Summary (cont’d.)
• Optimistic methods assume the majority of
database transactions do not conflict
– Transactions are executed concurrently, using
private copies of the data
• Database recovery restores database from
given state to previous consistent state
Database Systems: Design,
Implementation, and
Management
Tenth Edition
Chapter 13
Business Intelligence and Data
Warehouses
Objectives
In this chapter, you will learn:
• How business intelligence provides a
comprehensive business decision support
framework
• About business intelligence architecture, its
evolution, and reporting styles
• About the relationship and differences between
operational data and decision support data
• What a data warehouse is and how to prepare
data for one
Objectives (cont’d.)
• What star schemas are and how they are
constructed
• About data analytics, data mining, and
predictive analytics
• About online analytical processing (OLAP)
• How SQL extensions are used to support
OLAP-type data manipulations
The Need for Data Analysis
• Managers track daily transactions to evaluate
how the business is performing
• Strategies should be developed to meet
organizational goals using operational
databases
• Data analysis provides information about short-
term tactical evaluations and strategies
Business Intelligence
• Comprehensive, cohesive, integrated tools and
processes
– Capture, collect, integrate, store, and analyze
data
– Generate information to support business
decision making
• Framework that allows a business to transform:
– Data into information
– Information into knowledge
– Knowledge into wisdom
Business Intelligence Architecture
• Composed of data, people, processes,
technology, and management of components
• Focuses on strategic and tactical use of
information
• Key performance indicators (KPI)
– Measurements that assess company’s
effectiveness or success in reaching goals
• Multiple tools from different vendors can be
integrated into a single BI framework
Business Intelligence Benefits
• Main goal: improved decision making
• Other benefits
– Integrating architecture
– Common user interface for data reporting and
analysis
– Common data repository fosters single version
of company data
– Improved organizational performance
Business Intelligence Evolution
Business Intelligence Technology
Trends
• Data storage improvements
• Business intelligence appliances
• Business intelligence as a service
• Big Data analytics
• Personal analytics
Decision Support Data
• BI effectiveness depends on quality of data
gathered at operational level
• Operational data seldom well-suited for
decision support tasks
• Data must be reformatted to be useful for
business intelligence
Operational Data vs.
Decision Support Data
• Operational data
– Mostly stored in relational database
– Optimized to support transactions representing
daily operations
• Decision support data differs from operational
data in three main areas:
– Time span
– Granularity
– Dimensionality
Decision Support
Database Requirements
• Specialized DBMS tailored to provide fast
answers to complex queries
• Three main requirements
– Database schema
– Data extraction and loading
– Database size
Decision Support
Database Requirements (cont’d.)
• Database schema
– Complex data representations
– Aggregated and summarized data
– Queries extract multidimensional time slices
• Data extraction and loading
– Supports different data sources
• Flat files
• Hierarchical, network, and relational databases
• Multiple vendors
– Checking for inconsistent data
Decision Support
Database Requirements (cont’d.)
• Database size
– In 2005, Wal-Mart had 260 terabytes of data in
its data warehouses
– DBMS must support very large databases
(VLDBs)
The Data Warehouse
• Integrated, subject-oriented, time-variant, and
nonvolatile collection of data
– Provides support for decision making
• Usually a read-only database optimized for data
analysis and query processing
• Requires time, money, and considerable
managerial effort to create
Data Marts
• Small, single-subject data warehouse subset
• More manageable data set than data
warehouse
• Provides decision support to small group of
people
• Typically lower cost and lower implementation
time than data warehouse
Twelve Rules That Define
a Data Warehouse
Star Schemas
• Data-modeling technique
– Maps multidimensional decision support data
into relational database
• Creates near equivalent of multidimensional
database schema from relational data
• Easily implemented model for multidimensional
data analysis while preserving relational
structures
• Four components: facts, dimensions, attributes,
and attribute hierarchies
Facts
• Numeric measurements that represent specific
business aspect or activity
– Normally stored in fact table that is center of star
schema
• Fact table contains facts linked through their
dimensions
• Metrics are facts computed at run time
Dimensions
• Qualifying characteristics that provide additional
perspectives to a given fact
• Decision support data almost always viewed in
relation to other data
• Study facts via dimensions
• Dimensions stored in dimension tables
Attributes
• Used to search, filter, and classify facts
• Dimensions provide descriptions of facts
through their attributes
• No mathematical limit to the number of
dimensions
• Slice and dice: focus on slices of the data cube
for more detailed analysis
Attribute Hierarchies
• Provide top-down data organization
• Two purposes:
– Aggregation
– Drill-down/roll-up data analysis
• Determine how the data are extracted and
represented
• Stored in the DBMS’s data dictionary
• Used by OLAP tool to access warehouse
properly
Star Schema Representation
• Facts and dimensions represented in physical
tables in data warehouse database
• Many fact rows related to each dimension row
– Primary key of fact table is a composite primary
key
– Fact table primary key formed by combining
foreign keys pointing to dimension tables
• Dimension tables are smaller than fact tables
• Each dimension record is related to thousands
of fact records
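A star schema of this shape can be sketched in SQLite through Python's sqlite3 module. The table names, columns, and sample rows below are assumptions invented for the example; note the fact table's composite primary key built from the foreign keys to the dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: small, descriptive attributes (hypothetical names).
CREATE TABLE dim_time (
    time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);

-- Fact table: numeric measurements, keyed by its dimensions.
CREATE TABLE fact_sales (
    time_id    INTEGER REFERENCES dim_time(time_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    units      INTEGER,
    revenue    INTEGER,
    PRIMARY KEY (time_id, product_id));

INSERT INTO dim_time VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO dim_product VALUES
    (1, 'Hammer', 'Tools'), (2, 'Saw', 'Tools'), (3, 'Paint', 'Supplies');
INSERT INTO fact_sales VALUES
    (1, 1, 10, 100), (1, 3, 5, 75), (2, 1, 4, 40), (2, 2, 2, 60);
""")

# Study the revenue fact through the product dimension's category attribute.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

Swapping the join to dim_time and grouping by month would view the same facts through a different dimension without touching the fact table.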
Performance-Improving Techniques
for the Star Schema
• Four techniques to optimize data warehouse
design:
– Normalizing dimensional tables
– Maintaining multiple fact tables to represent
different aggregation levels
– Denormalizing fact tables
– Partitioning and replicating tables
Performance-Improving Techniques
for the Star Schema (cont’d.)
• Dimension tables normalized to:
– Achieve semantic simplicity
– Facilitate end-user navigation through the
dimensions
• Denormalizing fact tables improves data access
performance and saves data storage space
• Partitioning splits table into subsets of rows or
columns
• Replication makes copy of table and places it in
different location
Data Analytics
• Subset of BI functionality
• Encompasses a wide range of mathematical,
statistical, and modeling techniques
– Purpose of extracting knowledge from data
• Tools can be grouped into two separate areas:
– Explanatory analytics
– Predictive analytics
Data Mining
• Data-mining tools do the following:
– Analyze data
– Uncover problems or opportunities hidden in
data relationships
– Form computer models based on their findings
– Use models to predict business behavior
• Runs in two modes
– Guided
– Automated
Predictive Analytics
• Employs mathematical and statistical
algorithms, neural networks, artificial
intelligence, and other advanced modeling tools
• Create actionable predictive models based on
available data
• Models are used in areas such as:
– Customer relationships, customer service,
customer retention, fraud detection, targeted
marketing, and optimized pricing
Online Analytical Processing
• Three main characteristics:
– Multidimensional data analysis techniques
– Advanced database support
– Easy-to-use end-user interfaces
Multidimensional Data Analysis
Techniques
• Data are processed and viewed as part of a
multidimensional structure
• Augmented by the following functions:
– Advanced data presentation functions
– Advanced data aggregation, consolidation, and
classification functions
– Advanced computational functions
– Advanced data modeling functions
Advanced Database Support
• Advanced data access features include:
– Access to many different kinds of DBMSs, flat
files, and internal and external data sources
– Access to aggregated data warehouse data
– Advanced data navigation
– Rapid and consistent query response times
– Maps end-user requests to appropriate data
source and to proper data access language
– Support for very large databases
Easy-to-Use End-User Interface
• Advanced OLAP features are more useful when
access is simple
• Many interface features are “borrowed” from
previous generations of data analysis tools
– Already familiar to end users
– Makes OLAP easily accepted and readily used
OLAP Architecture
• Three main architectural components:
– Graphical user interface (GUI)
– Analytical processing logic
– Data-processing logic
OLAP Architecture (cont’d.)
• Designed to use both operational and data
warehouse data
• In most implementations, data warehouse and
OLAP are interrelated and complementary
• OLAP systems merge data warehouse and
data mart approaches
Relational OLAP
• Relational online analytical processing
(ROLAP) provides the following extensions:
– Multidimensional data schema support within the
RDBMS
– Data access language and query performance
optimized for multidimensional data
– Support for very large databases (VLDBs)
Multidimensional OLAP
• Multidimensional online analytical processing
(MOLAP) extends OLAP functionality to
multidimensional database management
systems (MDBMSs)
– MDBMS end users visualize stored data as a 3D
data cube
– Data cubes can grow to n dimensions, becoming
hypercubes
– To speed access, data cubes are held in
memory in a cube cache
Relational vs. Multidimensional OLAP
• Selection of one or the other depends on
evaluator’s vantage point
• Proper evaluation must include supported
hardware, compatibility with DBMS, etc.
• ROLAP and MOLAP vendors working toward
integration within unified framework
• Relational databases use star schema design
to handle multidimensional data
SQL Extensions for OLAP
• Proliferation of OLAP tools fostered
development of SQL extensions
• Many innovations have become part of
standard SQL
• All SQL commands will work in data warehouse
as expected
• Most queries include many data groupings and
aggregations over multiple columns
The ROLLUP Extension
• Used with GROUP BY clause to generate
aggregates by different dimensions
• GROUP BY generates only one aggregate for
each new value combination of attributes
• ROLLUP extension enables subtotal for each
column listed except for the last one
– Last column gets grand total
• Order of column list important
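What GROUP BY ROLLUP computes can be emulated in a few lines of Python, which also shows why column order matters. This is an emulation for illustration, not SQL: the rollup function, the row format, and the sample data are assumptions, and None stands in for the NULL that marks a rolled-up column.

```python
from collections import defaultdict


def rollup(rows, columns, measure):
    """Sum `measure` for every prefix of `columns`, from full detail
    down to the empty prefix (the grand total)."""
    totals = defaultdict(int)
    for row in rows:
        for depth in range(len(columns), -1, -1):
            kept = tuple(row[c] for c in columns[:depth])
            key = kept + (None,) * (len(columns) - depth)
            totals[key] += row[measure]
    return dict(totals)


sales = [
    {"vendor": "V1", "product": "P1", "units": 10},
    {"vendor": "V1", "product": "P2", "units": 5},
    {"vendor": "V2", "product": "P1", "units": 7},
]
```

Calling rollup(sales, ["vendor", "product"], "units") yields per-vendor subtotals and a grand total but no per-product subtotals; reversing the column list would give per-product subtotals instead.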
The CUBE Extension
• CUBE extension used with GROUP BY clause
to generate aggregates by listed columns
– Includes the last column
• Enables subtotal for each column in addition to
grand total for last column
– Useful when you want to compute all possible
subtotals within groupings
• Cross-tabulations are good candidates for
application of CUBE extension
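GROUP BY CUBE aggregates over every subset of the listed columns, which can be emulated in Python as follows. The cube function, the row format, and the sample data are assumptions for illustration, and None stands in for the NULL that marks a rolled-up column.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, columns, measure):
    """Sum `measure` for every subset of `columns` (2**n groupings)."""
    subsets = [
        set(combo)
        for size in range(len(columns) + 1)
        for combo in combinations(columns, size)
    ]
    totals = defaultdict(int)
    for row in rows:
        for kept in subsets:
            key = tuple(row[c] if c in kept else None for c in columns)
            totals[key] += row[measure]
    return dict(totals)


cube_sales = [
    {"vendor": "V1", "product": "P1", "units": 10},
    {"vendor": "V1", "product": "P2", "units": 5},
    {"vendor": "V2", "product": "P1", "units": 7},
]
```

For two columns the result holds the detail rows, subtotals per vendor, subtotals per product, and the grand total, which is exactly the content of a cross-tabulation with row and column totals.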
Materialized Views
• A dynamic table that contains SQL query
command to generate rows
– Also contains the actual rows
• Created the first time query is run and summary
rows are stored in table
• Automatically updated when base tables are
updated
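SQLite has no materialized views, so the idea is emulated here with a stored summary table kept current by a trigger. This is a sketch under that assumption, handling inserts only; the table, trigger, and function names are made up, and a DBMS with native materialized views would handle the refresh itself.

```python
import sqlite3

mv_conn = sqlite3.connect(":memory:")
mv_conn.executescript("""
CREATE TABLE sales (product TEXT, units INTEGER);
INSERT INTO sales VALUES ('P1', 10), ('P2', 5), ('P1', 7);

-- Store the summary query's result rows (the "materialized" part).
CREATE TABLE sales_summary AS
    SELECT product, SUM(units) AS total_units
    FROM sales
    GROUP BY product;

-- Keep the stored rows in step with the base table on every insert.
CREATE TRIGGER sales_summary_refresh AFTER INSERT ON sales
BEGIN
    UPDATE sales_summary
       SET total_units = total_units + NEW.units
     WHERE product = NEW.product;
    INSERT INTO sales_summary (product, total_units)
        SELECT NEW.product, NEW.units
        WHERE NOT EXISTS (
            SELECT 1 FROM sales_summary WHERE product = NEW.product);
END;
""")


def summary():
    """Read the stored summary rows without re-running the aggregation."""
    return mv_conn.execute(
        "SELECT product, total_units FROM sales_summary ORDER BY product"
    ).fetchall()
```

Queries against sales_summary read precomputed rows instead of aggregating the base table again, which is the performance benefit a materialized view is meant to provide.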
Summary
• Business intelligence generates information
used to support decision making
• BI covers a range of technologies, applications,
and functionalities
• Decision support systems were the precursor of
current generation BI systems
• Operational data not suited for decision support
Summary (cont’d.)
• Data warehouse provides support for decision
making
– Usually read-only
– Optimized for data analysis, query processing
• Star schema is a data-modeling technique
– Maps multidimensional decision support data
into a relational database
• Star schema has four components:
– Facts, dimensions, attributes, and attribute
hierarchies
Summary (cont’d.)
• Data analytics
– Provides advanced data analysis tools to extract
knowledge from business data
• Data mining
– Automates the analysis of operational data to
find previously unknown data characteristics,
relationships, dependencies, and trends
• Predictive analytics
– Uses information generated in the data-mining
phase to create advanced predictive models
Summary (cont’d.)
• Online analytical processing (OLAP)
– Advanced data analysis environment that
supports decision making, business modeling,
and operations research
• SQL has been enhanced with extensions that
support OLAP-type processing and data
generation