Database Management Systems

    Prof. Weining Zhang
    Dept. of Computer Science
    University of Texas at San Antonio

Overview
    We study the internals of DBMSs
        Principles of relational DBMSs
         • Emphasis on query & transaction processing techniques
        Advanced database systems & applications
         • OODBMS, XML databases, data warehousing, OLAP, data mining
    Course work includes
        Homework, 2 midterm exams, no final exam
        Programming assignments in Java


                                                                       W. Zhang                   Introduction                       2




Teaching Staff
    Instructor: Prof. Weining Zhang
        Office: SB 4.01.19
        Phone: 458-5557
        Email: wzhang@cs.utsa.edu
        Office hours: MW 5:00-6:00 pm, T 4:00-5:00 pm, and by appointment

Communication
    Web page: http://www.cs.utsa.edu/~wzhang/cs5443/home
        Contains everything about the course: syllabus, announcements, assignments, project, lecture notes, etc.
    You should check the course web pages regularly.
    Mailing list: 5443@cs.utsa.edu
        Include your CS email address; you may need to forward emails to your regular email address








Textbooks
    Required textbook:
        Database Management Systems, 3rd ed., by Ramakrishnan & Gehrke
    Recommended textbooks:
        Principles of Distributed Database Systems, by M. Ozsu & P. Valduriez
        Database Systems: The Complete Book, by Garcia-Molina, Ullman & Widom
        Database System Concepts, 5th ed., by Silberschatz, Korth & Sudarshan

Other Textbooks
    Fundamentals of Database Systems, 5th ed., by Elmasri & Navathe
    Other database books in the Main Library


Prerequisite
    CS3743 or equivalent, or extensive experience with databases & DB applications
    Strong Java programming skills
    Data structures, algorithms, OO programming, etc.
    Mathematics including logic, sets, algebra, …

Grading
    Programming assignments 20%
    Homework 20%
    Midterm I 25%
    Midterm II 25%
    Intangibles 10%








Programming Assignments
    Implement several components of a simple DBMS called Minibase (Java version), such as
        Buffer Manager
        Heap File
        Hash-based Index
        Relational operators
        Query processing
    Work in groups of 2
    Programming in Java, on Linux or Windows; Eclipse IDE recommended

Introduction to Database Systems
    A database system consists of
        Database management system: the software
        Databases: the data
    A DBMS needs to provide
        persistent data storage
        a declarative query language for efficient data retrieval
        shared access to data by different applications
        data security
        data integrity …





An RDBMS Architecture
    [Figure: layered architecture of an RDBMS; components recovered from the diagram]
        Clients: Web forms, Application front end, SQL interface (issue SQL commands)
        Query Evaluation Engine: Parser, Optimizer, Plan Executor, Operator Evaluator
        File & Access Methods
        Buffer Manager & Disk Manager
        Concurrency Control: Transaction Manager, Lock Manager
        Recovery Manager
        Stored data: Index files, Data files, System catalog

Storage Management
    Data is stored on disks and processed in main memory
    Since disk I/Os are costly, search structures, such as indexes, must be used to achieve efficient data access
    DBMS components that manage different types of storage include
        Disk Manager: manages pages on the disk drive
        Buffer Manager: manages pages in the main-memory buffer
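To make the Buffer Manager's role on the Storage Management slide concrete, here is a minimal sketch of a buffer pool that keeps a fixed number of pages in memory and evicts the least recently used one. This is not Minibase's API: the class name `BufferPoolSketch`, the 4096-byte page size, the LRU policy, and the simulated disk read are all assumptions made for illustration, and pinning and dirty-page write-back are left out.

```java
import java.util.LinkedHashMap;

/** Toy buffer pool (hypothetical, not Minibase): caches pages and evicts the least recently used. */
class BufferPoolSketch {
    static final int PAGE_SIZE = 4096;   // assumed page size in bytes

    private final int capacity;          // number of frames in the pool
    private final LinkedHashMap<Integer, byte[]> frames;
    private int diskReads = 0;           // counts simulated disk I/Os

    BufferPoolSketch(int capacity) {
        this.capacity = capacity;
        // accessOrder = true: iteration order runs from least to most recently used
        this.frames = new LinkedHashMap<>(16, 0.75f, true);
    }

    /** Returns the page, going to "disk" only when it is not already buffered. */
    byte[] getPage(int pageId) {
        byte[] page = frames.get(pageId);   // a hit also marks the page as recently used
        if (page != null) {
            return page;
        }
        if (frames.size() >= capacity) {
            // Pool is full: the first key in iteration order is the LRU victim.
            Integer victim = frames.keySet().iterator().next();
            frames.remove(victim);
        }
        page = readFromDisk(pageId);
        frames.put(pageId, page);
        return page;
    }

    /** Stand-in for the disk manager: a real system would read the page from a file. */
    private byte[] readFromDisk(int pageId) {
        diskReads++;
        return new byte[PAGE_SIZE];
    }

    int diskReads() {
        return diskReads;
    }

    public static void main(String[] args) {
        BufferPoolSketch pool = new BufferPoolSketch(2);
        pool.getPage(1);   // miss
        pool.getPage(2);   // miss
        pool.getPage(1);   // hit
        pool.getPage(3);   // miss, evicts page 2
        pool.getPage(1);   // hit
        System.out.println("disk reads: " + pool.diskReads());
    }
}
```

Five page requests cost only three simulated disk reads here, which is the point of buffering: repeated accesses to a hot page are served from memory.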
File Organization
    Data records are logically organized in files and physically stored on disk pages
    File organization must consider the format and size of data records
    In addition to simple files of raw data, a DBMS also maintains search structures, such as
        Ordering
        Hashing
        Indexing
    to reduce access costs

Query Processing
    A DBMS evaluates declarative queries by executing an optimal query plan that is expressed using relational algebra operations.
    A DBMS must evaluate algebraic operations efficiently.
    The algorithms and the costs of relational algebra operations, such as selection and join, depend critically on
        types of query conditions
        specifics of file organizations

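The claim that selection cost depends on file organization can be illustrated with a back-of-the-envelope calculation. The sketch below compares an equality selection on an unordered (heap) file against the same selection on a file sorted on the search attribute. The cost formulas are simplified assumptions (worst-case full scan versus roughly log2 of the page count for binary search), and the class and method names are hypothetical.

```java
/** Rough I/O cost of an equality selection under two file organizations (simplified model). */
class SelectionCostSketch {

    /** Worst case for a heap file: every page may have to be read. */
    static long heapScanCost(long pages) {
        return pages;
    }

    /** Approximate cost of binary search over a sorted file: ceil(log2(pages)) page reads. */
    static long sortedFileCost(long pages) {
        if (pages <= 1) {
            return pages;   // 0 reads for an empty file, 1 for a single page
        }
        // For pages >= 2, ceil(log2(pages)) equals the bit length of (pages - 1).
        return 64 - Long.numberOfLeadingZeros(pages - 1);
    }

    public static void main(String[] args) {
        long pages = 1000;
        System.out.println("heap scan:     " + heapScanCost(pages) + " page reads");
        System.out.println("binary search: " + sortedFileCost(pages) + " page reads");
    }
}
```

For a 1000-page file the model gives 1000 page reads for the scan and 10 for the binary search, which is why the same query condition can call for very different algorithms depending on how the file is organized.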




Query Optimization
    For ease of use, query languages are declarative.
    The system must figure out an efficient evaluation plan
    The goal is to answer a query with as few disk I/Os as possible
    The system uses statistics of the data & heuristics to decide how to process the query

Transaction Processing
    A transaction models the execution of a database application, which typically updates the data in databases.
    Transaction management must deal with concurrent transactions and possible system failures.




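As a rough illustration of the Query Optimization goal of minimizing disk I/O, the sketch below estimates the cost of two join plans from page-count statistics and picks the cheaper one. The cost formulas are the usual textbook estimates for page-oriented and block nested loops joins; the class name, method names, and the sample statistics are assumptions for illustration, and a real optimizer considers many more plans and statistics.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy cost-based plan choice between two nested loops join variants (hypothetical names). */
class PlanChoiceSketch {

    /** Page-oriented nested loops: scan the inner relation once per outer page. */
    static long pageNestedLoops(long outerPages, long innerPages) {
        return outerPages + outerPages * innerPages;
    }

    /** Block nested loops: scan the inner relation once per block of outer pages. */
    static long blockNestedLoops(long outerPages, long innerPages, long bufferPages) {
        if (bufferPages < 3) {
            throw new IllegalArgumentException("need at least 3 buffer pages");
        }
        long blockSize = bufferPages - 2;   // one page reserved for the inner input, one for output
        long outerBlocks = (outerPages + blockSize - 1) / blockSize;   // ceiling division
        return outerPages + outerBlocks * innerPages;
    }

    /** Returns the name of the plan with the lowest estimated I/O cost. */
    static String cheapestPlan(long outerPages, long innerPages, long bufferPages) {
        Map<String, Long> costs = new LinkedHashMap<>();
        costs.put("page nested loops", pageNestedLoops(outerPages, innerPages));
        costs.put("block nested loops", blockNestedLoops(outerPages, innerPages, bufferPages));

        String best = null;
        long bestCost = Long.MAX_VALUE;
        for (Map.Entry<String, Long> e : costs.entrySet()) {
            if (e.getValue() < bestCost) {
                best = e.getKey();
                bestCost = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumed statistics: 1000 outer pages, 500 inner pages, 102 buffer pages.
        System.out.println("page NL:  " + pageNestedLoops(1000, 500));
        System.out.println("block NL: " + blockNestedLoops(1000, 500, 102));
        System.out.println("chosen:   " + cheapestPlan(1000, 500, 102));
    }
}
```

With these statistics the estimates are 501,000 versus 6,000 page I/Os, so the same declarative query can differ in cost by almost two orders of magnitude depending on the plan.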




Recovery
    The recovery manager protects data integrity in case of a system crash.
    The system guarantees that either all operations of a transaction or none of them are performed, and that updates made by completed transactions are persistent.

Concurrency Control
    Concurrent execution of application programs is essential for good DBMS performance.
        Need to keep the CPU busy while performing I/O operations (frequent & relatively slow).
    Interleaving actions of different user programs can lead to inconsistency: e.g., a check is cleared while the account balance is being computed.
    The concurrency control subsystem ensures such problems don’t arise: users can pretend they are using a single-user system.

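The check-clearing example on the Concurrency Control slide can be imitated in plain Java. In the sketch below, one thread moves money between two accounts while another repeatedly sums the balances; a single lock makes each transfer and each sum behave as if it ran alone. This is only an analogy under stated assumptions: a real DBMS uses a lock manager with finer-grained locks, and the class `AccountsSketch` and its methods are hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Toy illustration of isolation: transfers and balance sums never interleave (hypothetical class). */
class AccountsSketch {
    private final long[] balances;
    private final ReentrantLock lock = new ReentrantLock();

    AccountsSketch(long... initial) {
        this.balances = initial.clone();
    }

    /** Moves money between accounts as one indivisible step (no overdraft checks). */
    void transfer(int from, int to, long amount) {
        lock.lock();
        try {
            balances[from] -= amount;
            balances[to] += amount;
        } finally {
            lock.unlock();
        }
    }

    /** Sums all balances; holding the lock means it never sees a half-finished transfer. */
    long total() {
        lock.lock();
        try {
            long sum = 0;
            for (long b : balances) {
                sum += b;
            }
            return sum;
        } finally {
            lock.unlock();
        }
    }

    /** Runs transfers in one thread while summing in another; returns true if every sum was correct. */
    static boolean demo() {
        AccountsSketch bank = new AccountsSketch(10_000, 0);
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                bank.transfer(0, 1, 1);
            }
        });
        writer.start();
        boolean consistent = true;
        while (writer.isAlive()) {
            if (bank.total() != 10_000) {
                consistent = false;
            }
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consistent && bank.total() == 10_000;
    }

    public static void main(String[] args) {
        System.out.println("every observed total was consistent: " + demo());
    }
}
```

Without the lock in `total()`, the summing thread could read one balance before a transfer and the other after it, which is the kind of inconsistency the slide describes.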
Advanced Hashing & Indexing
    Relational DBMSs support hashing & B+ tree indexing
    New DBMSs & DB applications need more sophisticated search structures
        Hashing with variable-size hash tables or multiple keys
        Indexes for spatial, multidimensional data (common in multimedia DBSs, data warehousing, OLAP, …)

Distributed DBMS
    Modern corporations have data, control, & applications distributed globally
    Multiple databases at geographically dispersed locations need to cooperate to answer queries with distributed data
    Concurrent transaction processing and recovery are still major issues








Parallel DBMS
    Both centralized and distributed databases may use multiple processors to evaluate queries
    Parallel system architecture requires new algorithms for query evaluation and optimization
    Performance concerns include
        Ability to scale up
        Ability to speed up

XML & Semistructured DBMS
    Data in RDB, OODB, & ORDB are structured (with rigid schemas)
    Data on the Web (and other applications) are semistructured
        HTML, XML, Text, …
    Need new concepts and techniques
        Data model, query language
        Query processing & optimization
        Storage management
        Update, transaction processing, CC, …






Data Warehousing & OLAP
    Corporations need to put all available data to use when making vital business decisions
    Need technology to integrate data from all sources and keep it up to date
    Need advanced tools to analyze, summarize, and view data in various ways
    Issues:
        Data cube model
        OLAP operations
        Query processing, indexing, views, …

Data Mining
    Data contains important patterns useful for making sound business decisions
    Databases need tools to discover knowledge embedded in data
        Associations
        Clusters
        Classifications
    Useful for business trend analysis, fraud detection, diagnosis, market prediction, …

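To give a flavor of the "Associations" item on the Data Mining slide, the sketch below computes the two standard measures of an association rule, support and confidence, over a handful of made-up market baskets. The basket data, class name, and method names are assumptions for illustration; real mining algorithms avoid rescanning the data for every candidate rule.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Support and confidence of an association rule over toy basket data (hypothetical example). */
class AssociationSketch {

    /** Fraction of baskets that contain every item in the given set. */
    static double support(List<Set<String>> baskets, Set<String> items) {
        long hits = baskets.stream().filter(b -> b.containsAll(items)).count();
        return (double) hits / baskets.size();
    }

    /** Confidence of the rule lhs -> rhs: support(lhs and rhs together) / support(lhs). */
    static double confidence(List<Set<String>> baskets, Set<String> lhs, Set<String> rhs) {
        Set<String> both = new HashSet<>(lhs);
        both.addAll(rhs);
        // Note: the result is NaN if no basket contains lhs.
        return support(baskets, both) / support(baskets, lhs);
    }

    public static void main(String[] args) {
        // Made-up transactions, one set of items per basket.
        List<Set<String>> baskets = List.of(
                Set.of("bread", "milk"),
                Set.of("bread", "butter"),
                Set.of("bread", "milk", "butter"),
                Set.of("milk"));

        System.out.println("support(bread, milk)     = "
                + support(baskets, Set.of("bread", "milk")));
        System.out.println("confidence(bread -> milk) = "
                + confidence(baskets, Set.of("bread"), Set.of("milk")));
    }
}
```

Here bread and milk appear together in 2 of 4 baskets (support 0.5), and 2 of the 3 baskets containing bread also contain milk (confidence about 0.67).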
Topics
    Relational algebra and calculus
    Storage & File Management
        Disk manager, buffer manager
        Indexing, hashing
    Query Evaluation & Optimization
        Access methods, selection, joins, etc.
        Query optimization methods
    Transaction Processing
        Crash Recovery
        Concurrency Control

Topics (cont.)
    Distributed Database Systems
        Database design
        Query processing & optimization
        Concurrency control & recovery
    Parallel Database Systems
    XML databases
    Data Warehousing and OLAP
    Data Mining, …

