Database Management Systems
Prof. Weining Zhang
Dept. of Computer Science
University of Texas at San Antonio

Overview
• We study the internals of DBMSs:
  - Principles of relational DBMSs, with emphasis on query & transaction processing techniques
  - Advanced database systems & applications: OODBMS, XML databases, data warehousing, OLAP, data mining
• Course work includes:
  - Homework, 2 midterm exams, no final exam
  - Programming assignments in Java

W. Zhang, Introduction

Teaching Staff
• Instructor: Prof. Weining Zhang
• Office: SB 4.01.19
• Phone: 458-5557
• Email: firstname.lastname@example.org
• Office hours: MW 5:00–6:00 pm, T 4:00–5:00 pm, and by appointment

Communication
• Web page: http://www.cs.utsa.edu/~wzhang/cs5443/home
  - Contains everything about the course: syllabus, announcements, assignments, project, lecture notes, etc.
  - You should check the course web pages regularly.
• Mailing list: email@example.com
  - Include your CS email address; you may need to forward emails to your regular email address.

Textbooks
• Required textbook: Fundamentals of Database Systems, 5th ed., by Elmasri & Navathe
• Recommended textbook: Database Systems: The Complete Book, by Garcia-Molina, Ullman & Widom

Other Textbooks
• Database Management Systems, 3rd ed., by Ramakrishnan & Gehrke
• Principles of Distributed Database Systems, by M. Ozsu & P. Valduriez
• Database System Concepts, 5th ed., by Silberschatz, Korth & Sudarshan
• Other database books in the Main Library
Prerequisite
• CS3743 or equivalent, or extensive experience with databases & DB applications
• Strong Java programming skills
• Data structures, algorithms, OO programming, etc.
• Mathematics, including logic, sets, algebra, …

Grading
  Programming assignments   20%
  Homework                  20%
  Midterm I                 25%
  Midterm II                25%
  Intangibles               10%

Programming Assignments
• Implement several components of a simple DBMS called Minibase (Java version), such as:
  - Buffer Manager
  - Heap File
  - Hash-based Index
  - Relational operators
  - Query processing
• Work in groups of 2
• Programming in Java, on Linux or Windows; the Eclipse IDE is recommended

Introduction to Database Systems
• A database system consists of:
  - Database management system: the software
  - Databases: the data
• A DBMS needs to provide:
  - persistent data storage
  - a declarative query language for efficient data retrieval
  - shared access to data by different applications
  - data security
  - data integrity
  - …

An RDBMS Architecture
[Figure: RDBMS architecture. Web forms, application front ends, and the SQL interface send SQL commands to the Query Evaluation Engine (Parser, Optimizer, Plan Executor, Operator Evaluator), which sits on top of File & Access Methods and the Buffer Manager & Disk Manager. The Transaction Manager and Lock Manager (Concurrency Control) and the Recovery Manager work alongside. The DBMS stores index files, the system catalog, and data files.]

Storage Management
• Data is stored on disks and processed in main memory.
• Since disk I/Os are costly, search structures, such as indexes, must be used to achieve efficient data access.
• DBMS components that manage different types of storage include:
  - Disk Manager: manages pages on the disk drive
  - Buffer Manager: manages pages in the main-memory buffer
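The Storage Management slide splits storage handling between a Disk Manager and a Buffer Manager, and the Buffer Manager is among the Minibase components listed for the programming assignments. Below is a minimal sketch of how a buffer manager might pin pages into a fixed number of main-memory frames, assuming a least-recently-used (LRU) replacement policy and a toy in-memory map standing in for the disk. The class and method names (`ToyDisk`, `ToyBufferManager`, `pin`, `unpin`) are hypothetical and are not Minibase's actual API.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the Disk Manager: an in-memory map from page id to page bytes.
// It counts reads and writes so the effect of buffering is visible.
class ToyDisk {
    static final int PAGE_SIZE = 16;
    final Map<Integer, byte[]> pages = new HashMap<>();
    int reads = 0;
    int writes = 0;

    byte[] read(int pageId) {
        reads++;
        // Pages never written read back as zero-filled.
        return pages.getOrDefault(pageId, new byte[PAGE_SIZE]).clone();
    }

    void write(int pageId, byte[] data) {
        writes++;
        pages.put(pageId, data.clone());
    }
}

// One main-memory frame holding a copy of a disk page.
class Frame {
    final int pageId;
    final byte[] data;
    int pinCount = 0;      // number of callers currently using the page
    boolean dirty = false; // modified since it was read from disk?

    Frame(int pageId, byte[] data) {
        this.pageId = pageId;
        this.data = data;
    }
}

// Hypothetical buffer manager with a fixed number of frames and LRU replacement.
class ToyBufferManager {
    private final int capacity;
    private final ToyDisk disk;
    // accessOrder = true: iteration runs from least to most recently accessed frame.
    private final LinkedHashMap<Integer, Frame> frames = new LinkedHashMap<>(16, 0.75f, true);

    ToyBufferManager(int capacity, ToyDisk disk) {
        this.capacity = capacity;
        this.disk = disk;
    }

    // Returns the page's bytes, reading from disk only on a buffer miss.
    byte[] pin(int pageId) {
        Frame f = frames.get(pageId); // get() also marks the frame as recently used
        if (f == null) {
            if (frames.size() == capacity) {
                evictOne();
            }
            f = new Frame(pageId, disk.read(pageId));
            frames.put(pageId, f);
        }
        f.pinCount++;
        return f.data;
    }

    // The caller is done with the page; dirty = true if it modified the bytes.
    void unpin(int pageId, boolean dirty) {
        Frame f = frames.get(pageId);
        if (f == null || f.pinCount == 0) {
            throw new IllegalStateException("page " + pageId + " is not pinned");
        }
        f.pinCount--;
        f.dirty |= dirty;
    }

    // Evicts the least recently used unpinned frame, writing it back first if dirty.
    private void evictOne() {
        Iterator<Frame> it = frames.values().iterator();
        while (it.hasNext()) {
            Frame victim = it.next();
            if (victim.pinCount == 0) {
                if (victim.dirty) {
                    disk.write(victim.pageId, victim.data);
                }
                it.remove();
                return;
            }
        }
        throw new IllegalStateException("all frames are pinned");
    }
}
```

With two frames, pinning and unpinning pages 1, 2, 1, 3, 4 in turn (marking page 1 dirty the first time) costs four disk reads and one write: the second request for page 1 is a buffer hit, clean page 2 is dropped when page 3 arrives, and dirty page 1 is written back when page 4 arrives. Pinned frames are never chosen as victims, which is why `unpin` must be called once per `pin`.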
File Organization
• Data records are logically organized in files and physically stored on disk pages.
• File organization must consider the format and size of data records.
• In addition to simple files of raw data, a DBMS also maintains search structures to reduce access costs, such as:
  - Ordering
  - Hashing
  - Indexing

Query Processing
• A DBMS evaluates a declarative query by executing an optimal query plan expressed using relational algebra operations.
• A DBMS must evaluate algebraic operations efficiently.
• The algorithms and costs of relational algebra operations, such as selection and join, depend critically on:
  - the types of query conditions
  - the specifics of file organizations

Query Optimization
• For ease of use, query languages are declarative.
• The system must figure out an efficient evaluation plan.
  - The goal is to answer a query with as few disk I/Os as possible.
• The system uses statistics of the data & heuristics to decide how to process the query.

Transaction Processing
• A transaction models the execution of a database application, which typically updates the data in databases.
• Transaction management must deal with concurrent transactions and possible system failures.

Recovery
• The recovery manager protects data integrity in case of a system crash.
• The system guarantees that either all operations of a transaction are performed or none of them are, and that updates made by completed transactions are persistent.

Concurrency Control
• Concurrent execution of application programs is essential for good DBMS performance.
  - The CPU must be kept busy while I/O operations (frequent & relatively slow) are performed.
• Interleaving the actions of different user programs can lead to inconsistency: e.g., a check is cleared while the account balance is being computed.
• The concurrency control subsystem ensures such problems don't arise: users can pretend they are using a single-user system.
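The Concurrency Control slide's example (a check cleared while a balance is being computed) can be made concrete with a minimal, deterministic simulation of two transactions over hypothetical accounts A and B. Each transaction is a list of steps, and a schedule says whose step runs next; no real threads are involved. The class name, account names, and amounts are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InterleavingSketch {
    // One step of a transaction: it may read/write the shared database
    // and the transaction's own local variables.
    interface Step {
        void run(Map<String, Integer> db, Map<String, Integer> local);
    }

    // T1 (hypothetical transfer): move 100 from account A to account B in two writes.
    static final List<Step> T1 = List.of(
            (db, local) -> db.put("A", db.get("A") - 100),
            (db, local) -> db.put("B", db.get("B") + 100));

    // T2 (hypothetical balance report): read A, then read B and report A + B.
    static final List<Step> T2 = List.of(
            (db, local) -> local.put("a", db.get("A")),
            (db, local) -> local.put("total", local.get("a") + db.get("B")));

    // schedule[i] is 1 or 2: which transaction executes its next step.
    // Returns the total reported by T2; A + B is 800 both before and after T1.
    static int reportedTotal(int[] schedule) {
        Map<String, Integer> db = new HashMap<>(Map.of("A", 500, "B", 300));
        Map<String, Integer> local1 = new HashMap<>();
        Map<String, Integer> local2 = new HashMap<>();
        int next1 = 0;
        int next2 = 0;
        for (int t : schedule) {
            if (t == 1) {
                T1.get(next1++).run(db, local1);
            } else {
                T2.get(next2++).run(db, local2);
            }
        }
        return local2.get("total");
    }
}
```

Both serial schedules, `{1, 1, 2, 2}` and `{2, 2, 1, 1}`, report 800. The interleaving `{1, 2, 2, 1}` reports 700, because T2 sees A after the debit but B before the credit; `{2, 1, 1, 2}` reports 900 for the mirror-image reason. A concurrency control subsystem (for example, one using locking) aims to admit only schedules whose effect matches some serial order.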
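The Query Processing slide says the algorithm and cost of a join depend critically on the type of query condition. One way to see this is to contrast two standard join algorithms, sketched below over in-memory lists: a nested-loop join accepts any condition but compares every pair of tuples, while a hash join needs an equality condition and makes one pass over each input. The class `JoinSketch`, its method names, and the `int[]` tuple representation are hypothetical simplifications; a real DBMS works on disk pages and measures cost in I/Os rather than comparisons.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

class JoinSketch {
    // Tuples are modeled as int[]; a join result is the concatenation of two tuples.
    static int[] concat(int[] left, int[] right) {
        int[] out = new int[left.length + right.length];
        System.arraycopy(left, 0, out, 0, left.length);
        System.arraycopy(right, 0, out, left.length, right.length);
        return out;
    }

    // Nested-loop join: compares every pair, so it works for ANY join condition
    // (e.g. r.a < s.b), at the price of |outer| * |inner| comparisons.
    static List<int[]> nestedLoopJoin(List<int[]> outer, List<int[]> inner,
                                      BiPredicate<int[], int[]> cond) {
        List<int[]> out = new ArrayList<>();
        for (int[] r : outer) {
            for (int[] s : inner) {
                if (cond.test(r, s)) {
                    out.add(concat(r, s));
                }
            }
        }
        return out;
    }

    // Hash join: only valid for the equality condition build[buildCol] == probe[probeCol].
    // Build a hash table on one input, then probe it once per tuple of the other input.
    static List<int[]> hashJoin(List<int[]> build, int buildCol,
                                List<int[]> probe, int probeCol) {
        Map<Integer, List<int[]>> table = new HashMap<>();
        for (int[] r : build) {
            table.computeIfAbsent(r[buildCol], k -> new ArrayList<>()).add(r);
        }
        List<int[]> out = new ArrayList<>();
        for (int[] s : probe) {
            for (int[] r : table.getOrDefault(s[probeCol], List.of())) {
                out.add(concat(r, s));
            }
        }
        return out;
    }
}
```

For example, with hypothetical relations Dept(deptId, budget) = {(10, 500), (30, 700)} and Emp(empId, deptId) = {(1, 10), (2, 20), (3, 10)}, both methods find the same two matches for Emp.deptId = Dept.deptId. The condition Emp.deptId < Dept.deptId (three matching pairs) can only be handled by the nested-loop version, since a hash table groups tuples by equal keys and says nothing about ordering.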
Advanced Hashing & Indexing
• Relational DBMSs support hashing & B+ tree indexing.
• New DBMSs & DB applications need more sophisticated search structures:
  - Hashing with a variable-size hash table or multiple keys
  - Indexes for spatial, multidimensional data (common in multimedia DBSs, data warehousing, OLAP, …)

Distributed DBMS
• Modern corporations have data, control, & applications distributed globally.
• Multiple databases at geographically dispersed locations need to cooperate to answer queries over distributed data.
• Concurrent transaction processing and recovery are still major issues.

Parallel DBMS
• Both centralized and distributed databases may use multiple processors to evaluate queries.
• Parallel system architectures require new algorithms for query evaluation and optimization.
• Performance concerns include:
  - Ability to scale up (handle a proportionally larger workload in the same time when processors are added)
  - Ability to speed up (finish the same workload faster when processors are added)

XML & Semistructured DBMS
• Data in RDBs, OODBs, & ORDBs are structured (with rigid schemas).
• Data on the Web (and in other applications) are semistructured: HTML, XML, text, …
• Need new concepts and techniques:
  - Data model, query language
  - Query processing & optimization
  - Storage management
  - Update, transaction processing, concurrency control (CC), …

Data Warehousing & OLAP
• Corporations need to put all available data to use when making vital business decisions.
• Need technology to integrate data from all sources and keep it up to date.
• Need advanced tools to analyze, summarize, and view data in various ways.
• Issues:
  - Data cube model
  - OLAP operations
  - Query processing, indexing, views, …

Data Mining
• Data contains important patterns useful for making sound business decisions.
• Databases need tools to discover the knowledge embedded in data:
  - Associations
  - Clusters
  - Classifications
• Useful for business trend analysis, fraud detection, diagnosis, market prediction, …
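The Data Mining slide lists associations among the patterns to discover. Association rules X => Y are commonly scored by support (how often X and Y occur together) and confidence (how often Y occurs given X). The sketch below computes both over a list of market-basket transactions; these standard definitions are assumed here rather than taken from the slides, and the class name and sample items are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AssociationSketch {
    // support(X) = fraction of transactions that contain every item of X.
    static double support(List<Set<String>> transactions, Set<String> items) {
        long hits = transactions.stream().filter(t -> t.containsAll(items)).count();
        return (double) hits / transactions.size();
    }

    // confidence(X => Y) = support(X ∪ Y) / support(X).
    // Yields NaN if X never occurs (0.0 / 0.0).
    static double confidence(List<Set<String>> transactions, Set<String> x, Set<String> y) {
        Set<String> xy = new HashSet<>(x);
        xy.addAll(y);
        return support(transactions, xy) / support(transactions, x);
    }
}
```

For the baskets {bread, milk}, {bread, butter}, {bread, milk, butter}, and {milk}, the rule bread => milk has support 2/4 = 0.5 and confidence 0.5 / 0.75, about 0.667. A mining algorithm would search for all rules whose support and confidence exceed user-chosen thresholds; this sketch only scores a single given rule.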
Topics
• Relational algebra and calculus
• Storage & File Management
  - Disk manager, buffer manager
  - Indexing, hashing
• Query Evaluation & Optimization
  - Access methods, selection, joins, etc.
  - Query optimization methods
• Transaction Processing
  - Crash Recovery
  - Concurrency Control

Topics (cont.)
• Distributed Database Systems
  - Database design
  - Query processing & optimization
  - Concurrency control & recovery
• Parallel Database Systems
• XML databases
• Data Warehousing and OLAP
• Data Mining, …