HTML Injection Attacks: Impact and Mitigation Strategies
01 intro
Database Management Systems
Prof. Weining Zhang
Dept. of Computer Science
University of Texas at San Antonio
1. Overview
We study the internals of DBMSs
• Principles of relational DBMSs
• Emphasis on query & transaction processing techniques
Advanced database systems & applications
• OODBMS, XML databases, data warehousing, OLAP, data mining
Course work includes
• Homework, 2 midterm exams, no final exam
• Programming assignments in Java
W. Zhang Introduction 2
Teaching Staff
Instructor: Prof. Weining Zhang
Office: SB 4.01.19
Phone: 458-5557
Email: wzhang@cs.utsa.edu
Office hours: MW 5:00 – 6:00 pm, T 4:00 – 5:00 pm, and by appointment
Communication
Web page: http://www.cs.utsa.edu/~wzhang/cs5443/home
• Contains everything about the course: syllabus, announcements, assignments, project, lecture notes, etc.
• You should check the course web pages regularly.
Mailing list: 5443@cs.utsa.edu
• Include your CS email address; you may need to forward emails to your regular email address
Textbooks
Required textbook:
• Database Management Systems, 3rd ed., by Ramakrishnan & Gehrke
Recommended textbooks:
• Principles of Distributed Database Systems, by M. Ozsu & P. Valduriez
• Database Systems: The Complete Book, by Garcia-Molina, Ullman & Widom
• Database System Concepts, 5th ed., by Silberschatz, Korth & Sudarshan
Other Textbooks
Fundamentals of Database Systems, 5th ed., by Elmasri & Navathe
Other database books in the Main Library
2. Prerequisite
CS3743 or equivalent, or extensive experience with databases & DB applications
Strong Java programming skills
Data structures, algorithms, OO programming, etc.
Mathematics including logic, sets, algebra, …
Grading
Programming assignments 20%
Homework 20%
Midterm I 25%
Midterm II 25%
Intangibles 10%
Programming Assignments
Implement several components of a simple DBMS called Minibase (Java version), such as
• Buffer Manager
• Heap File
• Hash-based Index
• Relational operators
• Query processing
Work in groups of 2
Programming in Java, on Linux or Windows; the Eclipse IDE is recommended
Introduction to Database Systems
A database system consists of
• Database management system: the software
• Databases: the data
A DBMS needs to provide
• persistent data storage
• declarative query language for efficient data retrieval
• shared access to data by different applications
• data security
• data integrity …
An RDBMS Architecture
[Figure: layered RDBMS architecture. Web forms, application front ends, and the SQL interface send SQL commands to the query evaluation engine (parser, optimizer, plan executor, operator evaluator), which sits on top of file & access methods, the buffer manager, and the disk manager. Concurrency control (transaction manager, lock manager) and the recovery manager work alongside these layers. The DBMS stores index files, data files, and the system catalog on disk.]
Storage Management
Data is stored on disks and processed in main memory
Since disk I/Os are costly, search structures, such as indexes, must be used to achieve efficient data access
DBMS components that manage different types of storage include
• Disk Manager: manages pages on the disk drive
• Buffer Manager: manages pages in the main memory buffer
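The split between the disk manager and the buffer manager can be illustrated with a minimal sketch. It assumes a least-recently-used replacement policy and a 4096-byte page size; both are assumptions made for illustration, and the class `ToyBufferPool` and its methods are hypothetical names, not the Minibase API. Pinning and dirty-page write-back, which a real buffer manager needs, are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy buffer pool: keeps a few pages in memory and evicts the least recently used one. */
class ToyBufferPool {
    private final int capacity;                 // number of in-memory frames
    private final Map<Integer, byte[]> frames;  // pageId -> page contents
    private int diskReads = 0;                  // how often we had to go to "disk"

    ToyBufferPool(int capacity) {
        this.capacity = capacity;
        // accessOrder = true: iteration runs from least to most recently used.
        this.frames = new LinkedHashMap<>(16, 0.75f, true);
    }

    /** Returns the page, reading it from "disk" only on a miss. */
    byte[] getPage(int pageId) {
        byte[] page = frames.get(pageId);       // a hit also refreshes recency
        if (page != null) {
            return page;
        }
        if (frames.size() >= capacity) {
            // Evict the least recently used page (first key in access order).
            Integer victim = frames.keySet().iterator().next();
            frames.remove(victim);
        }
        page = readFromDisk(pageId);
        frames.put(pageId, page);
        return page;
    }

    /** Stand-in for the disk manager; a real one would read a block from a file. */
    private byte[] readFromDisk(int pageId) {
        diskReads++;
        return new byte[4096];                  // assumed page size
    }

    int diskReads() { return diskReads; }

    boolean isCached(int pageId) { return frames.containsKey(pageId); }

    public static void main(String[] args) {
        ToyBufferPool pool = new ToyBufferPool(2);
        pool.getPage(1);
        pool.getPage(2);
        pool.getPage(1);   // hit: no disk read
        pool.getPage(3);   // miss: evicts page 2
        System.out.println("disk reads = " + pool.diskReads());
    }
}
```

Repeated requests for a cached page cost no disk I/O, which is the reason the buffer manager sits between the access methods and the disk manager.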
3. File Organization
Data records are logically organized in files and physically stored on disk pages
File organization must consider the format and size of data records
In addition to simple files of raw data, a DBMS also maintains search structures, such as
• Ordering
• Hashing
• Indexing
to reduce access costs
Query Processing
A DBMS evaluates declarative queries by executing an optimal query plan that is expressed using relational algebraic operations.
A DBMS must evaluate algebraic operations efficiently.
The algorithms and the costs of relational algebraic operations, such as selection and join, depend critically on
• types of query condition
• specifics of file organizations
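How the cost of a selection depends on the file organization can be seen in a small sketch. It counts page reads for an equality selection on a heap file, once by scanning every page and once through a hash index. The class `ToyHeapFile` and its methods are hypothetical, and the index is kept in memory and treated as free to probe, which is a simplification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy heap file of integer keys, split into pages, with a hash index on the key. */
class ToyHeapFile {
    private final List<int[]> pages = new ArrayList<>();
    // key -> numbers of the pages that contain at least one record with that key
    private final Map<Integer, List<Integer>> hashIndex = new HashMap<>();
    int pageReads = 0;

    ToyHeapFile(int[] keys, int recordsPerPage) {
        for (int start = 0; start < keys.length; start += recordsPerPage) {
            int end = Math.min(start + recordsPerPage, keys.length);
            int[] page = Arrays.copyOfRange(keys, start, end);
            int pageNo = pages.size();
            pages.add(page);
            for (int key : page) {
                List<Integer> pageList = hashIndex.computeIfAbsent(key, k -> new ArrayList<>());
                // Record each page at most once per key.
                if (pageList.isEmpty() || pageList.get(pageList.size() - 1) != pageNo) {
                    pageList.add(pageNo);
                }
            }
        }
    }

    private int[] readPage(int pageNo) {
        pageReads++;
        return pages.get(pageNo);
    }

    /** Equality selection by scanning every page of the file. */
    int countByScan(int key) {
        int matches = 0;
        for (int p = 0; p < pages.size(); p++) {
            for (int k : readPage(p)) {
                if (k == key) matches++;
            }
        }
        return matches;
    }

    /** Equality selection that reads only the pages the hash index points to. */
    int countByIndex(int key) {
        int matches = 0;
        for (int p : hashIndex.getOrDefault(key, Collections.emptyList())) {
            for (int k : readPage(p)) {
                if (k == key) matches++;
            }
        }
        return matches;
    }
}
```

With 100 distinct keys stored 10 per page, the scan reads all 10 pages while the index lookup reads one, for the same answer.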
Query Optimization
For ease of use, query languages are declarative.
The system must figure out an efficient evaluation plan
• The goal is to answer a query with as few disk I/Os as possible
• The system uses statistics of the data & heuristics to decide how to process the query
Transaction Processing
A transaction models the execution of a database application, which typically updates the data in databases.
Transaction management must deal with concurrent transactions and possible system failures.
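To make the Query Optimization goal of "as few disk I/Os as possible" concrete, here is a rough sketch of a cost-based choice between two plans for a selection. The cost formulas are common textbook-style approximations chosen for illustration (a full scan reads every page; an unclustered index scan pays the index height plus one page read per matching record), and `ToyPlanChooser` with its parameters is hypothetical.

```java
/** Picks between a full file scan and an unclustered index scan by estimated page I/Os. */
class ToyPlanChooser {

    /** A full scan reads every page of the file once. */
    static long scanCost(long numPages) {
        return numPages;
    }

    /** Rough estimate: descend the index, then one page read per matching record. */
    static long indexCost(long numRecords, double selectivity, int indexHeight) {
        long matching = (long) Math.ceil(numRecords * selectivity);
        return indexHeight + matching;
    }

    /** Returns the name of the plan with the lower estimated I/O cost. */
    static String choosePlan(long numPages, long numRecords, double selectivity, int indexHeight) {
        return indexCost(numRecords, selectivity, indexHeight) < scanCost(numPages)
                ? "index scan"
                : "file scan";
    }

    public static void main(String[] args) {
        // Statistics assumed for illustration: 1000 pages, 100000 records, index height 3.
        System.out.println(choosePlan(1000, 100_000, 0.001, 3));
        System.out.println(choosePlan(1000, 100_000, 0.1, 3));
    }
}
```

The selectivity estimate comes from statistics about the data, so the same query can get a different plan as the data changes.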
Recovery Concurrency Control
The recovery manager protects data integrity in case of Concurrent execution of application programs is
system crash. essential for good DBMS performance.
The system guarantees that either all operations of a Need to keep CPU busy while performing I/O operations
(frequent & relatively slow).
transaction or none of them are performed, and updates
made by completed transactions are persistent. Interleaving actions of different user programs can lead
to inconsistency: e.g., check is cleared while account
balance is being computed.
Concurrency control subsystem ensures such problems
don’t arise: users can pretend they are using a single-
user system.
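The interleaving problem can be sketched with plain Java threads. In this minimal illustration each update is a read-modify-write on an account balance, and an exclusive lock keeps two updates from interleaving. It uses `java.util.concurrent.locks.ReentrantLock` as a stand-in for a DBMS lock manager; the classes `ToyAccount` and `ToyConcurrencyDemo` are hypothetical, and a real DBMS locks database objects for the duration of a transaction rather than a single field update.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Account whose balance is only read or changed while holding an exclusive lock. */
class ToyAccount {
    private final ReentrantLock lock = new ReentrantLock();
    private long balance;

    ToyAccount(long initial) {
        this.balance = initial;
    }

    /** Read-modify-write under the lock, so concurrent deposits cannot interleave. */
    void deposit(long amount) {
        lock.lock();
        try {
            balance = balance + amount;
        } finally {
            lock.unlock();
        }
    }

    long balance() {
        lock.lock();
        try {
            return balance;
        } finally {
            lock.unlock();
        }
    }
}

class ToyConcurrencyDemo {
    /** Starts `threads` workers that each deposit 1 `perThread` times; returns the final balance. */
    static long run(int threads, int perThread) {
        ToyAccount account = new ToyAccount(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    account.deposit(1);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return account.balance();
    }

    public static void main(String[] args) {
        System.out.println("final balance = " + run(4, 10_000));
    }
}
```

Without the lock, two threads could both read the same old balance and one deposit would be lost; with it, every worker sees the effect of the others as if they had run one after another.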
4. Advanced Hashing & Indexing
Relational DBMSs support hashing & B+ tree indexing
New DBMSs & DB applications need more sophisticated search structures
• Hashing with variable size hash table or multiple keys
• Indexes for spatial, multidimensional data (common in multimedia DBSs, data warehousing, OLAP, …)
Distributed DBMS
Modern corporations have data, control, & applications distributed globally
Multiple databases at geographically dispersed locations need to cooperate to answer queries with distributed data
Concurrent transaction processing and recovery are still major issues
Parallel DBMS
Both centralized and distributed databases may use multiple processors to evaluate queries
Parallel system architecture requires new algorithms for query evaluation and optimization
Performance concerns include
• Ability to scale up
• Ability to speed up
XML & Semistructured DBMS
Data in RDB, OODB, & ORDB are structured (with rigid schemas)
Data on the Web (and other applications) are semistructured
• HTML, XML, Text, …
Need new concepts and techniques
• Data model, query language
• Query processing & optimization
• Storage management
• Update, transaction processing, CC, …
Data Warehousing & OLAP
Corporations need to put all available data into use when making vital business decisions
Need to have technology to integrate data from all sources, and keep them up to date
Need advanced tools to analyze, summarize, and view data in various ways
Issues:
• Data cube model
• OLAP operations
• Query processing, indexing, views, …
Data Mining
Data contains important patterns useful for making sound business decisions
Databases need tools to discover knowledge embedded in data
• Associations
• Clusters
• Classifications
Useful for business trend analysis, fraud detection, diagnosis, market prediction, …
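As one small example of the association patterns mentioned under Data Mining, the sketch below computes the two standard measures of an association rule over a list of transactions: support (the fraction of transactions containing all the items) and confidence (how often the right-hand side appears when the left-hand side does). The class `ToyAssociation` and the shopping-basket data are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Support and confidence of association rules over a list of item sets. */
class ToyAssociation {

    /** Fraction of transactions that contain every item in `items`. */
    static double support(List<Set<String>> transactions, Set<String> items) {
        long hits = transactions.stream().filter(t -> t.containsAll(items)).count();
        return (double) hits / transactions.size();
    }

    /** Confidence of the rule lhs -> rhs: support(lhs union rhs) / support(lhs). */
    static double confidence(List<Set<String>> transactions, Set<String> lhs, Set<String> rhs) {
        Set<String> both = new HashSet<>(lhs);
        both.addAll(rhs);
        return support(transactions, both) / support(transactions, lhs);
    }

    public static void main(String[] args) {
        // Hypothetical baskets, for illustration only.
        List<Set<String>> baskets = Arrays.asList(
                new HashSet<String>(Arrays.asList("bread", "milk")),
                new HashSet<String>(Arrays.asList("bread", "butter")),
                new HashSet<String>(Arrays.asList("bread", "milk", "butter")),
                new HashSet<String>(Arrays.asList("milk")));
        Set<String> bread = new HashSet<String>(Arrays.asList("bread"));
        Set<String> milk = new HashSet<String>(Arrays.asList("milk"));
        System.out.println("confidence(bread -> milk) = " + confidence(baskets, bread, milk));
    }
}
```

Mining tools search for rules whose support and confidence exceed chosen thresholds; this sketch only evaluates a rule that is already given.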
5. Topics
Relational algebra and calculus
Storage & File Management
• Disk manager, buffer manager
• Indexing, hashing
Query Evaluation & Optimization
• Access methods, selection, joins, etc.
• Query optimization methods
Transaction Processing
• Crash Recovery
• Concurrency Control
Topics (cont.)
Distributed Database Systems
• Database design
• Query processing & optimization
• Concurrency control & recovery
Parallel Database Systems
XML databases
Data Warehousing and OLAP
Data Mining, …