ch1.ppt
  • Transcript

    • 1. Chapter 1: Introduction
      • Purpose of Database Systems
      • View of Data
      • Data Models
      • Data Definition Language
      • Data Manipulation Language
      • Transaction Management
      • Storage Management
      • Database Administrator
      • Database Users
      • Overall System Structure
    • 2. DATABASE DEFINITION
      • A database represents some aspect of the real world, sometimes called the mini-world or the Universe of Discourse (UoD).
      • A database is a logically coherent collection of data with some inherent meaning.
      • A random assortment of data cannot correctly be referred to as a database.
      • A database is designed, built, and populated with data for a specific purpose. It has an intended group of users and some preconceived applications in which these users are interested.
    • 3. What Is a DBMS?
      • A database is a very large, integrated collection of data.
      • Models real-world enterprise.
        • Entities (e.g., students, courses)
        • Relationships (e.g., Madonna is taking CS564)
      • A Database Management System (DBMS) is a software package designed to store and manage databases.
    • 4. Database Management System (DBMS)
      • Collection of interrelated data
      • Set of programs to access the data
      • DBMS contains information about a particular enterprise
      • DBMS provides an environment that is both convenient and efficient to use.
      • Database Applications:
        • Banking: all transactions
        • Airlines: reservations, schedules
        • Universities: registration, grades
        • Sales: customers, products, purchases
        • Manufacturing: production, inventory, orders, supply chain
        • Human resources: employee records, salaries, tax deductions
      • Databases touch all aspects of our lives
    • 5. Purpose of Database System
      • In the early days, database applications were built on top of file systems
      • Drawbacks of using file systems to store data:
        • Data redundancy and inconsistency
          • Multiple file formats, duplication of information in different files
        • Difficulty in accessing data
          • Need to write a new program to carry out each new task
        • Data isolation — multiple files and formats
        • Integrity problems
          • Integrity constraints (e.g. account balance > 0) become part of program code
          • Hard to add new constraints or change existing ones
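The integrity point above can be sketched with a declared constraint. This is a minimal illustration, assuming Python's built-in sqlite3 module; the `account` table, its column names, and the helper `try_insert` are hypothetical. The rule "balance > 0" is stated once in the schema, so no program has to repeat it.

```python
import sqlite3

def try_insert(conn, account_number, balance):
    """Return True if the row was accepted, False if the constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO account (account_number, balance) VALUES (?, ?)",
                (account_number, balance),
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
# The constraint lives in the schema, not in application code.
conn.execute(
    "CREATE TABLE account ("
    " account_number TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance > 0))"
)

accepted = try_insert(conn, "A-101", 500)   # satisfies the constraint
rejected = try_insert(conn, "A-102", -20)   # violates balance > 0
row_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```

Changing the rule then means changing one table definition rather than hunting through every program that writes balances.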
    • 6. Purpose of Database Systems (Cont.)
      • Drawbacks of using file systems (cont.)
        • Atomicity of updates
          • Failures may leave database in an inconsistent state with partial updates carried out
          • E.g. transfer of funds from one account to another should either complete or not happen at all
        • Concurrent access by multiple users
          • Concurrent access is needed for performance
          • Uncontrolled concurrent accesses can lead to inconsistencies
            • E.g. two people reading a balance and updating it at the same time
        • Security problems
      • Database systems offer solutions to all the above problems
    • 7. Why Use a DBMS?
      • Separation of the Data definition and the Program
      • Abstraction into a simple model
      • Data independence and efficient access.
      • Reduced application development time – ad-hoc queries
      • Data integrity and security.
      • Uniform data administration.
      • Concurrent access, recovery from crashes.
      • Support for multiple different views
    • 8. Why Study Databases??
      • Shift from computation to information
        • at the “low end”: scramble to webspace (a mess!)
        • at the “high end”: scientific applications
      • Datasets increasing in diversity and volume.
        • Digital libraries, interactive video, Human Genome project, EOS project
        • ... need for DBMS exploding
      • DBMS encompasses most of CS
        • OS, languages, theory, “A”I, multimedia, logic
    • 9. Levels of Abstraction
      • Many views, a single conceptual (logical) schema, and a physical schema.
        • Views describe how users see the data.
        • Conceptual schema defines logical structure
        • Physical schema describes the files and indexes used.
      • Schemas are defined using DDL; data is modified/queried using DML.
      [Figure: View 1, View 2, View 3; Conceptual Schema; Physical Schema]
    • 10. View of Data An architecture for a database system
    • 11. Levels of Abstraction
      • Physical level describes how a record (e.g., customer) is stored.
      • Logical level: describes data stored in database, and the relationships among the data.
      • type customer = record name : string; street : string; city : string; end;
      • View level: application programs hide details of data types. Views can also hide information (e.g., salary) for security purposes.
    • 12. Instances and Schemas
      • Similar to types and variables in programming languages
      • Schema – the logical structure of the database
        • e.g., the database consists of information about a set of customers and accounts and the relationship between them
        • Analogous to type information of a variable in a program
        • Physical schema : database design at the physical level
        • Logical schema : database design at the logical level
      • Instance – the actual content of the database at a particular point in time
        • Analogous to the value of a variable
    • 13. Example of a database schema
    • 14. Example of a database Instance
    • 15. Example: University Database
      • Conceptual schema:
        • Students(sid: string, name: string, login: string, age: integer, gpa: real)
        • Courses(cid: string, cname:string, credits:integer)
        • Enrolled(sid:string, cid:string, grade:string)
      • Physical schema:
        • Relations stored as unordered files.
        • Index on first column of Students.
      • External Schema (View):
        • Course_info(cid:string,enrollment:integer)
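The three schema levels on this slide could be set up roughly as follows. This is a sketch assuming Python's built-in sqlite3 module; the definition of `Course_info` as a per-course count over `Enrolled` is an assumption (the slide gives only its columns), and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual schema: the three relations from the slide.
-- In SQLite, the PRIMARY KEY on sid also creates an index on that first column.
CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, login TEXT, age INTEGER, gpa REAL);
CREATE TABLE Courses  (cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

-- External schema: a view exposing only per-course enrollment counts (assumed definition).
CREATE VIEW Course_info AS
    SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid;
""")

# Invented sample rows.
with conn:
    conn.executemany(
        "INSERT INTO Enrolled VALUES (?, ?, ?)",
        [("s1", "CS564", "A"), ("s2", "CS564", "B"), ("s1", "CS101", "A")],
    )

course_info = conn.execute(
    "SELECT cid, enrollment FROM Course_info ORDER BY cid"
).fetchall()
```

A user of `Course_info` sees enrollment totals but never individual student ids or grades.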
    • 16. Physical (Storage) schema decisions
      • Mapping of entities to files (OS files)
      • Data representation and encoding (compression)
      • Access methods (Direct, Hashing, Indexed)
      • Which indexes to maintain
      • Clustering of records
      • OS/DBMS issues (buffer management)
    • 17. External (View) schema decisions
      • Which entities to present/filter
      • Data representation and encoding (compression)
      • Programming language dependent issues
      • Changes to names, order of attributes
      • Derived (computed) fields and joined tables
    • 18. Sample Database Instance
    • 19. Two views derived from the example database
    • 20. Data Independence
      • Physical Data Independence – the ability to modify the physical schema without changing the application programs
        • Applications depend on the logical schema
        • DBA may change physical level (tuning) without affecting applications
        • The DBMS automatically makes the required adjustments, and application programs are not changed (queries may need to be recompiled and optimized…)
      • Logical Data Independence – the ability to modify the logical schema without changing the application programs
        • Applications depend on the logical schema via the Views
        • Can be supported on a limited basis only (if view is not affected)
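Physical data independence can be illustrated with a small sketch, assuming Python's built-in sqlite3 module and an invented `Students` table: an index is added (a physical-level change) and the application's query text stays exactly the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL)")
with conn:
    conn.executemany(
        "INSERT INTO Students VALUES (?, ?, ?)",
        [("s1", "Ann", 3.9), ("s2", "Bob", 3.1), ("s3", "Cy", 3.7)],  # invented rows
    )

# The application only ever knows this query against the logical schema.
QUERY = "SELECT name FROM Students WHERE gpa > 3.5 ORDER BY name"

before = conn.execute(QUERY).fetchall()

# Physical-level tuning: add an index. The query above is not edited.
conn.execute("CREATE INDEX idx_students_gpa ON Students (gpa)")

after = conn.execute(QUERY).fetchall()
```

Whether the index is actually used is decided by the query optimizer; the application only observes that the answer is unchanged.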
    • 21. Data Models
      • A collection of tools for describing
        • data
        • data relationships
        • data semantics
        • data constraints
      • Entity-Relationship model
      • Relational model
      • Other models:
        • object-oriented model
        • semi-structured data models (XML)
        • Older models: network model and hierarchical model
    • 22. Data Models
      • A data model is a collection of concepts for describing data.
      • A schema is a description of a particular collection of data, using a given data model.
      • The relational model of data is the most widely used model today.
        • Main concept: relation, basically a table with rows and columns.
        • Every relation has a schema, which describes the columns, or fields.
    • 23. Entity-Relationship Model
      • Example of schema in the entity-relationship model
    • 24. Entity Relationship Model (Cont.)
      • E-R model of real world
        • Entities (objects)
          • E.g. customers, accounts, bank branch
        • Relationships between entities
          • E.g. Account A-101 is held by customer Johnson
          • Relationship set depositor associates customers with accounts
      • Widely used for database design
        • Database design in E-R model usually converted to design in the relational model (coming up next) which is used for storage and processing
    • 25. Relational Model
      • Example of tabular data in the relational model
      customer-name  customer-id   customer-street  customer-city  account-number
      Johnson        192-83-7465   Alma             Palo Alto      A-101
      Smith          019-28-3746   North            Rye            A-215
      Johnson        192-83-7465   Alma             Palo Alto      A-201
      Jones          321-12-3123   Main             Harrison       A-217
      Smith          019-28-3746   North            Rye            A-201
      (The column headings are the attributes.)
    • 26. A Sample Relational Database
    • 27. Data Definition Language (DDL)
      • Specification notation for defining the database schema
        • E.g. create table account ( account-number char (10), balance integer )
      • DDL compiler generates a set of tables stored in a data dictionary
      • Data dictionary contains metadata (i.e., data about data)
        • database schema
        • Data storage and definition language
          • language in which the storage structure and access methods used by the database system are specified
          • Usually an extension of the data definition language
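As one concrete instance of a data dictionary, SQLite keeps its schema metadata in a catalog table that can itself be queried. The sketch below assumes Python's built-in sqlite3 module; the column name `account_number` uses an underscore because a hyphen is not valid in an unquoted SQL identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL statement modeled on the slide's example.
conn.execute("CREATE TABLE account (account_number CHAR(10), balance INTEGER)")

# sqlite_master is SQLite's catalog: one row of metadata per table, index, or view.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info yields one row per column; the name is in position 1.
column_names = [row[1] for row in conn.execute("PRAGMA table_info(account)")]
```

Other systems expose the same idea under different names, so the catalog queries here are specific to SQLite.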
    • 28. Data Manipulation Language (DML)
      • Language for accessing and manipulating the data organized by the appropriate data model
        • A declarative DML is also known as query language
      • Two classes of languages
        • Procedural – user specifies what data is required and how to get those data (DML)
        • Nonprocedural – user specifies what data is required without specifying how to get those data (Query language)
      • SQL is the most widely used query language
    • 29. SQL
      • SQL: widely used non-procedural language
        • E.g. find the name of the customer with customer-id 192-83-7465:
          select customer.customer-name
          from customer
          where customer.customer-id = '192-83-7465'
        • E.g. find the balances of all accounts held by the customer with customer-id 192-83-7465:
          select account.balance
          from depositor, account
          where depositor.customer-id = '192-83-7465' and
                depositor.account-number = account.account-number
      • Application programs generally access databases through one of
        • Language extensions to allow embedded SQL
        • Application program interface (e.g. ODBC/JDBC) which allows SQL queries to be sent to a database
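The two queries on this slide can be sent through an application program interface. The sketch below uses Python's built-in sqlite3 module in place of ODBC/JDBC (a substitution, not what the slide names); identifiers use underscores instead of hyphens, and the balances are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer  (customer_id TEXT, customer_name TEXT);
CREATE TABLE account   (account_number TEXT, balance INTEGER);
CREATE TABLE depositor (customer_id TEXT, account_number TEXT);
INSERT INTO customer  VALUES ('192-83-7465', 'Johnson');
INSERT INTO account   VALUES ('A-101', 500), ('A-201', 900);
INSERT INTO depositor VALUES ('192-83-7465', 'A-101'), ('192-83-7465', 'A-201');
""")

cid = "192-83-7465"

# Query 1: name of the customer with the given id.
names = [
    row[0]
    for row in conn.execute(
        "SELECT customer.customer_name FROM customer "
        "WHERE customer.customer_id = ?",
        (cid,),
    )
]

# Query 2: balances of all accounts held by that customer (join via depositor).
balances = sorted(
    row[0]
    for row in conn.execute(
        "SELECT account.balance FROM depositor, account "
        "WHERE depositor.customer_id = ? "
        "AND depositor.account_number = account.account_number",
        (cid,),
    )
)
```

Both queries say what is wanted and leave the access path to the system, which is the non-procedural style the slide describes.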
    • 30. Database Users
      • Users are differentiated by the way they expect to interact with the system
      • Application programmers – interact with system through DML calls
      • Sophisticated users – form requests in a database query language
      • Specialized users – write specialized database applications that do not fit into the traditional data processing framework
      • Naïve users – invoke one of the permanent application programs that have been written previously
        • E.g. people accessing database over the web, bank tellers, clerical staff
    • 31. Database Administrator
      • Coordinates all the activities of the database system; the database administrator has a good understanding of the enterprise’s information resources and needs.
      • Database administrator's duties include:
        • Schema definition
        • Storage structure and access method definition
        • Schema and physical organization modification
        • Granting user authority to access the database
        • Specifying integrity constraints
        • Acting as liaison with users
        • Monitoring performance and responding to changes in requirements
    • 32. Structure of a DBMS
      • A typical DBMS has a layered architecture.
      • The figure does not show the concurrency control and recovery components.
      • This is one of several possible architectures; each system has its own variations.
      [Figure: DBMS layers, top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Figure note: these layers must consider concurrency control and recovery.]
    • 33. THE TRANSACTION CONCEPT
      • Transfer money from account A to account B:
        • Begin Transaction
        • SUBTRACT 100 FROM A
        • ADD 100 TO B
        • End Transaction
      • Abort, Commit, Rollback
      • CRASH!
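The transfer on this slide can be sketched as an all-or-nothing unit. This is a minimal illustration assuming Python's built-in sqlite3 module; the `account` table, the starting balances, and the `crash_midway` flag are hypothetical, and the "crash" is only a raised exception, not a real system failure.

```python
import sqlite3

def transfer(conn, src, dst, amount, crash_midway=False):
    """Move amount from src to dst atomically; a failure undoes both updates."""
    try:
        with conn:  # commits on normal exit, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            if crash_midway:
                raise RuntimeError("simulated crash between the two updates")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
        return True
    except RuntimeError:
        return False

def balances(conn):
    """Return the current balances as a dict of name -> balance."""
    return dict(conn.execute("SELECT name, balance FROM account"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
with conn:  # commit the setup rows so a later rollback cannot undo them
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 200)])

failed = transfer(conn, "A", "B", 100, crash_midway=True)
after_crash = balances(conn)    # the subtraction was rolled back
ok = transfer(conn, "A", "B", 100)
after_commit = balances(conn)   # both updates are visible together
```

The total across the two accounts is the same before and after either call, which is the consistency property the transaction is meant to protect.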
    • 34. The concurrency concept
      • AGENT 1: READ #SEATS; #SEATS = #SEATS - 1; WRITE #SEATS
      • AGENT 2: READ #SEATS; #SEATS = #SEATS - 1; WRITE #SEATS
      • LOST UPDATE
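The lost update on this slide can be reproduced deterministically. The sketch below is plain Python with no real concurrency: the interleaving is written out by hand, and the function names and the starting value of 10 seats are invented for illustration.

```python
def run_interleaved(seats):
    """Both agents read before either writes, so one decrement is lost."""
    store = {"seats": seats}
    agent1_copy = store["seats"]        # AGENT 1: READ #SEATS
    agent2_copy = store["seats"]        # AGENT 2: READ #SEATS
    store["seats"] = agent1_copy - 1    # AGENT 1: WRITE #SEATS
    store["seats"] = agent2_copy - 1    # AGENT 2: WRITE #SEATS (overwrites agent 1)
    return store["seats"]

def run_serial(seats):
    """Each agent finishes its read-modify-write before the next one starts."""
    store = {"seats": seats}
    for _agent in ("AGENT 1", "AGENT 2"):
        local = store["seats"]
        store["seats"] = local - 1
    return store["seats"]

lost = run_interleaved(10)    # two seats sold, count drops by only one
correct = run_serial(10)      # two seats sold, count drops by two
```

Concurrency control exists to make every permitted interleaving behave like the serial version.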
    • 35. Overall System Structure
    • 36. Storage Management
      • Storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.
      • The storage manager is responsible for the following tasks:
        • interaction with the file manager
        • efficient storing, retrieving and updating of data
    • 37. Concurrency Control
      • Concurrent execution of user programs is essential for good DBMS performance.
        • Because disk accesses are frequent and relatively slow, it is important to keep the CPU busy by working on several user programs concurrently.
      • Interleaving actions of different user programs can lead to inconsistency: e.g., check is cleared while account balance is being computed.
      • DBMS ensures such problems don’t arise: users can pretend they are using a single-user system.
    • 38. Transaction Management
      • A transaction is a collection of operations that performs a single logical function in a database application
      • Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.
      • Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database.
    • 39. Transaction: An Execution of a DB Program
      • Key concept is transaction, which is an atomic sequence of database actions (reads/writes).
      • Each transaction, executed completely, must leave the DB in a consistent state if DB is consistent when the transaction begins.
        • Users can specify some simple integrity constraints on the data, and the DBMS will enforce these constraints.
        • Beyond this, the DBMS does not really understand the semantics of the data. (e.g., it does not understand how the interest on a bank account is computed).
        • Thus, ensuring that a transaction (run alone) preserves consistency is ultimately the user’s responsibility!
    • 40. Scheduling Concurrent Transactions
      • DBMS ensures that execution of {T1, ... , Tn} is equivalent to some serial execution T1’ ... Tn’.
        • Before reading/writing an object, a transaction requests a lock on the object, and waits until the DBMS gives it the lock. All locks are released at the end of the transaction. (Strict 2PL locking protocol.)
        • Idea: If an action of Ti (say, writing X) affects Tj (which perhaps reads X), one of them, say Ti, will obtain the lock on X first and Tj is forced to wait until Ti completes; this effectively orders the transactions.
        • What if Tj already has a lock on Y and Ti later requests a lock on Y? (Deadlock!) Ti or Tj is aborted and restarted!
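The locking behaviour described above can be sketched as a toy lock table. This is a heavily simplified illustration, not how any particular DBMS implements Strict 2PL: the `LockManager` class is hypothetical, it supports exclusive locks only, it has no wait queues, and it merely reports a deadlock, leaving the choice of which transaction to abort to the caller.

```python
class LockManager:
    """Toy exclusive-lock table with wait-for cycle detection (hypothetical sketch)."""

    def __init__(self):
        self.owner = {}      # object -> transaction currently holding its lock
        self.waits_for = {}  # blocked transaction -> transaction it is waiting on

    def request(self, txn, obj):
        """Return 'granted', 'wait', or 'deadlock' for txn asking to lock obj."""
        holder = self.owner.get(obj)
        if holder is None or holder == txn:
            self.owner[obj] = txn
            return "granted"
        # Follow the wait-for chain from the holder; reaching txn means a cycle.
        current = holder
        while current is not None:
            if current == txn:
                return "deadlock"
            current = self.waits_for.get(current)
        self.waits_for[txn] = holder
        return "wait"

    def release_all(self, txn):
        """Strict 2PL: all of a transaction's locks are released together at its end."""
        for obj in [o for o, t in self.owner.items() if t == txn]:
            del self.owner[obj]
        self.waits_for.pop(txn, None)
        # Transactions that were waiting on txn are no longer blocked by it.
        for waiter in [w for w, t in self.waits_for.items() if t == txn]:
            del self.waits_for[waiter]

lm = LockManager()
r1 = lm.request("Ti", "X")   # Ti locks X
r2 = lm.request("Tj", "Y")   # Tj locks Y
r3 = lm.request("Tj", "X")   # Tj must wait for Ti
r4 = lm.request("Ti", "Y")   # Ti would wait for Tj, which waits for Ti
lm.release_all("Ti")         # abort Ti, releasing its locks
r5 = lm.request("Tj", "X")   # Tj can now proceed
```

After the abort, Ti would be restarted from the beginning and would request its locks again.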
    • 41. The importance of the Data Dictionary
      • Contains all definitions: DDL (logical schema), Views definition, Physical schema definitions including Indexing and clustering information, Integrity constraints, security rules, stored procedures (SQL)
      • Essential for query parsing and optimization
      • Contains other important documentation and programs (regulations, standards, codes, etc.)
      • There are companies that sell Data Dictionary tools as a separate product!
    • 42. DATABASE UTILITIES
      • Logical Design and Data-Dictionary Tools
      • Loading
      • Physical Design and File reorganization
      • Backup / Restore / Recovery
      • Performance Monitoring and Tuning
    • 43. Application Architectures
      • Two-tier architecture: E.g. client programs using ODBC/JDBC to communicate with a database
      • Three-tier architecture: E.g. web-based applications, and applications built using “middleware”
    • 44. DBMS TYPES
      • Hierarchical: prehistoric (IMS)
      • Network: historic (IDMS, ADABAS); led to object-oriented
      • Relational: current, 95% of the market (Oracle, Informix, SQL Server, Progress, IBM DB2, etc.)
      • Object-oriented: current; a lot of hoo-ha but a very narrow market, mainly CAD and engineering (Objectivity, Versant, Jasmine)
      • Object-relational: current/future (SQL3, Informix UDO, Oracle-9, IBM DB2)
    • 45. Database systems: a brief time line
      • Pre-1960s
        • 1945: Magnetic tapes developed (the first medium to allow searching).
        • 1957: First commercial computer installed.
        • 1959: McGee proposed the notion of generalized access to electronically stored data.
      • The 60s
        • 1961: The first generalized DBMS, GE's Integrated Data Store (IDS), designed by Bachman.
      • The 70s: database technology experienced rapid growth.
        • 1970: The relational model is developed by Ted Codd, an IBM research fellow.
        • 1971: CODASYL Database Task Group Report.
        • 1975: ACM Special Interest Group on Management of Data organized the first SIGMOD international conference.
        • 1976: Entity-relationship (ER) model introduced by Chen.
      • The 80s: DBMSs developed for personal computers (DBASE, PARADOX, etc.).
        • 1983: ANSI/SPARC survey revealed that more than 100 relational systems had been implemented by the beginning of the 80s.
    • 46. Database systems: a brief time line
      • 1985: Preliminary SQL standard published. Business world influenced by “Fourth Generation Languages”.
      • Trends in the ’80s: extendable database systems: object-oriented DBMSs, client-server architecture for distributed databases.
      • The ’90s
        • Demand for extending DBMS capabilities to meet new applications.
        • Emergence of commercial object-oriented DBMSs.
        • Demand for developing applications that use data from a variety of sources.
        • Demand for exploiting massively parallel processors (MPPs).
        • Total victory by the relational model.
        • SQL 3
        • Object-relational systems.
    • 47. Databases make these folks happy ...
      • End users and DBMS vendors
      • DB application programmers
        • E.g. smart webmasters
      • Database administrator (DBA)
        • Designs logical /physical schemas
        • Handles security and authorization
        • Data availability, crash recovery
        • Database tuning as needs evolve
      Must understand how a DBMS works!
    • 48. Summary
      • DBMS used to maintain, query large datasets.
      • Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security.
      • Levels of abstraction give data independence.
      • A DBMS typically has a layered architecture.
      • DBAs hold responsible jobs and are well-paid!
      • DBMS R&D is one of the broadest, most exciting areas in CS.