Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)
This lecture is part of an Introduction to Databases course given at the Vrije Universiteit Brussel.



Usage Rights

© All Rights Reserved

Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR) Presentation Transcript

  • 1. Introduction to Databases Introduction and Conceptual Modelling Prof. Beat Signer Department of Computer Science Vrije Universiteit Brussel http://www.beatsigner.com 2 December 2005
  • 2. Course Organisation
      - Prof. Beat Signer, Vrije Universiteit Brussel, 10 G 731d, +32 2 629 12 39, bsigner@vub.ac.be
      - Reinout Roels, Vrije Universiteit Brussel, 10 F 730, +32 2 629 37 53, rroels@vub.ac.be
      February 14, 2014 Beat Signer - Department of Computer Science - bsigner@vub.ac.be 2
  • 3. Course Information
      - Course book
          - Database System Concepts (Sixth Edition), Abraham Silberschatz, Henry Korth and S. Sudarshan, McGraw-Hill, 2010
          - additional information from the book is available online
              - http://highered.mcgraw-hill.com/sites/0073523321
      - Course information (lecture slides, exercises, …) available on PointCarré
          - http://pointcarre.vub.ac.be/index.php?application=weblcms&go=course_viewer&course=2321
  • 4. Exercises
      - Course content is going to be applied in the exercise sessions
      - Weekly exercise sessions
          - starting on February 20
              - group 1: computer room E.1.4, Thursday 11:00-13:00
              - group 2: computer room E.1.7, Thursday 14:00-16:00
          - assistant: Reinout Roels
      - Additional content may be covered in exercise sessions
          - exam covers content of lectures and exercises
  • 5. Exam
      - Written closed book exam in Dutch / English
          - covers content of lectures, specific book chapters and exercises
  • 6. Course Overview
      1. Introduction
          - overview
          - conceptual modelling and ER model
      2. Extended ER Model and other Modelling Languages
      3. Relational Model and Relational Algebra
      4. Relational Database Design
          - reduction
          - functional dependencies and normalisation
      5. Structured Query Language (SQL)
      6. Advanced SQL
  • 7. Course Overview …
      7. DBMS Architectures and Features
          - DBMS components
          - client-server architecture
          - parallelisation and distribution
      8. Storage Management
      9. Access Methods
          - indexing and hashing
      10. Query Processing and Optimisation
      11. Transaction Management
          - transactions
          - concurrency and recovery
  • 8. Course Overview …
      12. Object and Object-Relational Databases
      13. Future Trends and Review
  • 9. Databases in Action
      - Human resources
          - course registration, student grades, employee records, salary information, tax information, ...
          - e.g. PointCarré
              - course registration
      - Online shops
          - product information, customer data, order data, ...
          - e.g. Amazon
              - hundreds of millions of customers
              - more than 50 terabytes of data
  • 10. Databases in Action ...
      - Reservation systems
          - book flights from multiple airlines, hotel rooms etc.
          - e.g. Amadeus systems
              - Global Distribution System (GDS) founded by Lufthansa, Air France and other partners
      - Banking and trading
          - customer data, account information, transactions, ...
          - e.g. London Stock Exchange
              - almost 1 million trades per day
  • 11. Databases in Action ...
      - Libraries
          - index for traditional paper-based libraries as well as digital libraries
          - e.g. Open Library project
              - over 23 million indexed books [openlibrary.org]
      - Digital archives
          - persistently store various types of digital media
          - e.g. Internet Archive project
              - access to more than 150 billion archived web pages [archive.org]
  • 12. Databases in Action ...
      - Scientific databases
          - sensor data, classifications (e.g. human genome) as well as data from simulations
          - e.g. LHC Computing Grid
              - LHC experiments at CERN
              - 15 petabytes of data per year
      - Geographic Information Systems (GIS)
          - store raster (bitmap) or vector data representing real world objects
          - geospatial query language
  • 13. Databases in Action ...
      - Embedded databases in cars, airplanes etc.
          - manage configurations and store sensor data
          - e.g. db4o object database used in BMW's Car IT system
      - Many everyday devices contain databases
          - TVs, washing machines, mobile phones, ...
          - e.g. Android phones with SQLite database
  • 14. Databases in Action ...
      - Databases in the WISE research lab (VUB)
          - database-driven cross-media publishing
          - database extensions for hypermedia services
          - personal information management
          - data visualisation
              - e.g. ArtVis
          - human-information interaction
          - paper-digital interfaces
  • 15. Databases in Action ...
      - Databases touch all aspects of our daily life!
      - Numerous large database software companies
          - e.g. Oracle was the 3rd largest software company in 2011
  • 16. Basic Terminology
      - Database
          - collection of logically related data
          - database schema describes the database design
              - format and relationships between stored data (often rather static)
          - collection of data stored in a database at a given time is called an instance of the database
      - Database Management Systems (DBMS)
          - tools (programs) to efficiently store, maintain and retrieve information from a database
              - support for creating, reading, updating and deleting data (CRUD operations)
              - data definition language (DDL) to define the database schema
              - data manipulation language (DML) to query and update the data
                  - often declarative fourth generation languages (4GLs) such as SQL
              - data access control, transactions and concurrency control
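The DDL/DML split and the four CRUD operations listed above can be sketched with Python's built-in `sqlite3` module. This is only a minimal illustration: the `employees` table and its columns are assumptions made for the example, not a schema taken from the course.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# DDL: define the database schema (hypothetical table for illustration).
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# DML, create: insert two rows.
conn.executemany(
    "INSERT INTO employees (id, name) VALUES (?, ?)",
    [(1234, "Beat Signer"), (1576, "Lode Hoste")],
)

# DML, read: a declarative query states what is wanted, not how to fetch it.
names = [row[0] for row in conn.execute("SELECT name FROM employees ORDER BY id")]

# DML, update and delete.
conn.execute("UPDATE employees SET name = ? WHERE id = ?", ("L. Hoste", 1576))
conn.execute("DELETE FROM employees WHERE id = ?", (1234,))

remaining = conn.execute("SELECT id, name FROM employees").fetchall()
```

The schema (the `CREATE TABLE` statement) stays fixed while the instance (the rows) changes with every DML statement.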
  • 17. File-Processing System
      - Why should we not just use multiple files in a file system to store our data?
      - There are various disadvantages of such an approach
          - data redundancy and inconsistency
              - different file formats over time
              - duplication of information in different files
          - limited data access
              - we have to write new programs to carry out new tasks
              - data cannot be retrieved in a convenient and efficient manner
          - data isolation
              - data may be distributed over different files without a common format
          - integrity
              - integrity constraints (e.g. balance > 0) are hidden in the program code and not explicitly stated and checked
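As a contrast to constraints hidden in program code, the following sketch states the slide's example rule (balance > 0) once, in the schema, and lets the DBMS check it. It uses Python's built-in `sqlite3` module; the `accounts` table and the amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The integrity constraint is declared explicitly as part of the schema.
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance > 0))"
)
conn.execute("INSERT INTO accounts VALUES (1, 100)")

# Any program that tries to violate the rule is stopped by the database.
try:
    conn.execute("UPDATE accounts SET balance = balance - 150 WHERE id = 1")
    violation_rejected = False
except sqlite3.IntegrityError:
    violation_rejected = True

balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
```

Because the check lives in the database, every application accessing `accounts` is subject to it without repeating the rule.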
  • 18. File-Processing System …
      - missing atomic operations
          - system failures and crashes may leave the data in an inconsistent state (e.g. only parts of an operation have been carried out)
          - example: transfer of money from one account to another account
      - concurrent update anomalies
          - concurrent updates may leave the data in an inconsistent state
          - example: two programs simultaneously removing money from a single account
      - limited security control
          - difficult to give a user only access to parts of a file
      - DBMSs offer solutions to all these problems
          - concepts and algorithms to solve the problems with file-processing systems
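The money transfer example can be sketched as a single transaction, again with Python's built-in `sqlite3` module. The table, the account numbers and the non-negative balance rule are assumptions for the illustration. The credit is deliberately executed before the debit so that a failing debit shows the earlier credit being undone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move money between two accounts as one atomic unit of work."""
    try:
        # The connection as a context manager commits if the block succeeds
        # and rolls back every statement in it if an exception is raised.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict of account id to balance."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))
```

A call such as `transfer(conn, 1, 2, 500)` fails on the debit, and the already executed credit is rolled back, so no money appears from nowhere.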
  • 19. General Database Architecture
      [Figure: components of a DBMS. Labels: Programmers, Users, DB Admins; Application Programs, Queries, Database Schema; DML Preprocessor, Query Compiler, DDL Compiler, Program Object Code; Authorisation Control, Catalogue Manager, Integrity Checker, Command Processor, Query Optimiser, Transaction Manager, Scheduler, Buffer Manager, Recovery Manager; Data Manager, Database Manager, Access Methods, System Buffers, File Manager; Data, Indices and System Catalogue]
      Based on 'Components of a DBMS', Database Systems, T. Connolly and C. Begg, Addison-Wesley 2010
  • 20. Data Abstraction
      [Figure: View 1 ... View n, Logical Level, Physical Level]
      - Physical level
          - physical schema describes how the data is stored (complex low-level data structures)
      - Logical level
          - logical schema describes what data is stored
              - simple structures: attribute names, data types and relationships between data
          - implementation of simple structures might be based on complex physical-level structures but the user of the logical level should not be aware of that
              - physical data independence
      - View level
          - subschemas provide only access to parts of the database
              - reduce complexity and introduce security
          - multiple views might be defined for a single database
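A small sketch of the view level and of physical data independence, using Python's built-in `sqlite3` module. The `employees` table, the `salary` column and the view name `employee_directory` are hypothetical and only serve the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: what data is stored.
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.execute("INSERT INTO employees VALUES (1234, 'Beat Signer', 1000)")

# View level: a subschema that exposes only part of the database.
conn.execute("CREATE VIEW employee_directory AS SELECT id, name FROM employees")

# Physical level: adding an index changes how data is accessed on disk,
# but queries against the table or the view do not have to change.
conn.execute("CREATE INDEX idx_employees_name ON employees (name)")

cursor = conn.execute("SELECT * FROM employee_directory")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
```

A user who is only granted the view never sees the `salary` column, which is the complexity and security benefit mentioned on the slide.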
  • 21. Duality of the Database Schema
      [Figure: the Database Schema, linked by "describes" to the Application World (Application Concepts) and to the Computer World (Database Concepts)]
  • 22. Data Models and History of DBMSs
      [Figures: Punched Card, Magnetic Tape]
      - Punched cards used since 1725
          - Hollerith cards later used by IBM for data processing
      - 1950s: Data processing with magnetic tapes as storage
          - only sequential access to data
          - reading from one or multiple tapes and writing to a new tape
          - sometimes combined with input from punched cards etc.
  • 23. Data Models and History of DBMSs ...
      [Figures: IBM 350 Disk Storage Unit, Hard Disk]
      - 1960s: Widespread use of hard disks
          - direct access (random access) to data
          - opened possibilities for new Navigational DBMSs
              - IBM's IMS (1968), hierarchical database
              - Integrated Data Store (IDS), network database
  • 24. Data Models and History of DBMSs ...
      - A data model is a collection of conceptual tools
          - describes data, data relationships, data semantics and constraints
      - Hierarchical model
          - data organised in a tree structure
          - used in early mainframe DBMS
              - e.g. IBM's Information Management System (IMS)
          - XML documents also described by a hierarchical model
      - Network model
          - generalised graph structure
          - two main constructs
              - records contain fields and sets define relationships between records
          - navigational operations
              - follow the relationship from one record to another record
  • 25. Data Models and History of DBMSs ...
      - Relational model
          - collection of tables (relations) containing records
          - described in a paper by Edgar F. Codd in 1970
      - 1970s: Relational DBMS
          - IBM's System R (1974) "based" on Codd's paper; SQL added later
      - Entity-Relationship (ER) model
          - representation of basic objects (entities) and their relationships
          - widely used in conceptual database design
      - Object-based data model
          - introduces object identity, encapsulation and methods
      - 1980s: Object Databases
          - initial work on object databases
  • 26. Data Models and History of DBMSs ...
      - 1990s: Web Interfaces to Databases
          - databases deployed much more extensively
      - Semistructured data model
          - no clear separation between data and the schema ("self-describing" data)
          - individual data items of the same type may have different attributes
          - XML is widely used to represent semistructured data
      - 2000s: XML and XQuery
          - relational databases often still form the core
      - Later 2000s: Extremely large-scale distributed DBMSs
          - BigTable or Hadoop and HBase
          - "NoSQL databases"
  • 27. Database Design
      - Conceptual Design
          - define an abstract conceptual application model containing the main domain concepts
              - interact with domain experts to get the requirements
          - describe the entities (with attributes) and their relationships
          - specify the functional requirements (operations)
              - ensure that operations can be realised based on the conceptual model
      - Database implementation based on conceptual model
          - logical design phase
              - mapping of the conceptual schema to the implementation data model
                  - e.g. reduction from the ER model to the relational data model
              - define the logical database schema
          - physical design phase
              - define the physical database layout based on the logical database schema
  • 28. Database Design ...
      - Two major database design pitfalls have to be avoided
          - we should avoid any redundancy where information is repeated at multiple places since this might lead to inconsistent data
              - e.g. a lecture management system where a student's name is stored for each lecture they are attending instead of storing it in a separate student entity
          - a database design may be incomplete and not enable the representation of certain aspects of the application domain
              - e.g. in a shopping application where the customer information is stored as part of an order we cannot enter new customer data without having an order
      - There is often more than one "good design"
          - e.g. when do we model something as a relationship and when as a separate entity?
          - modelling is a challenging task that requires a combination of engineering skills and "good taste"
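The redundancy pitfall from the lecture management example can be made concrete with a short sketch based on Python's built-in `sqlite3` module. All table names, student names and lecture titles are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Redundant design: the student's name is repeated for every lecture.
conn.execute(
    "CREATE TABLE attendance_redundant (student_id INTEGER, student_name TEXT, lecture TEXT)"
)
conn.executemany(
    "INSERT INTO attendance_redundant VALUES (?, ?, ?)",
    [(1, "Alice", "Databases"), (1, "Alice", "Algorithms")],
)
# A name change that reaches only one of the copies leaves inconsistent data.
conn.execute(
    "UPDATE attendance_redundant SET student_name = 'Alicia' WHERE lecture = 'Databases'"
)
names_redundant = {
    row[0]
    for row in conn.execute(
        "SELECT student_name FROM attendance_redundant WHERE student_id = 1"
    )
}

# Separate student entity: the name is stored exactly once.
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE attendance (student_id INTEGER REFERENCES students (id), lecture TEXT)")
conn.execute("INSERT INTO students VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO attendance VALUES (?, ?)", [(1, "Databases"), (1, "Algorithms")]
)
conn.execute("UPDATE students SET name = 'Alicia' WHERE id = 1")
names_separate = {
    row[0]
    for row in conn.execute(
        "SELECT s.name FROM attendance a JOIN students s ON s.id = a.student_id"
    )
}
```

In the redundant design the same student ends up with two different names, while the design with a separate student entity cannot get into that state.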
  • 29. Entity-Relationship (ER) Model
      [Figure: Peter Chen]
      - Conceptual model based on a set of entities and relationships
      - An entity is a "thing" or "object" that can be distinguished from other objects
      - A relationship describes an association between multiple entities
      - "Introduced" and formalised by Peter Chen
          - P. Chen, The Entity-Relationship Model - Toward a Unified View of Data, ACM Transactions on Database Systems 1 (1), March 1976
  • 30. Entities
      - An entity represents a distinguishable object
          - e.g. specific person, car or company
      - An entity is described by a number of attributes
          - has to be uniquely identifiable by its attributes (ovals)
      - An entity set is a set of entities with the same type
          - the extension of an entity set (rectangle) consists of its entities
          - an entity may belong to multiple entity sets
      - note that we will use a slightly different notation than in the book!
      [Figure: entity set Employees with attributes id and name; example entities: 1234 Beat Signer, 1576 Lode Hoste, 3212 William Van Woensel]
  • 31. Attributes
      - The set of permitted attribute values is called the domain or value set
          - entity instances can be described by a set of (name, value) pairs
          - e.g. {(id, 1576), (name, Lode Hoste)}
      - The ER model supports the following attribute types
          - simple attributes
          - composite attributes
              - hierarchy of sub-attributes
          - single-valued attributes
          - multivalued attributes
              - optional lower and upper bounds
          - derived attributes
              - computed via relationships or other attribute values
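The attribute types listed above can be mirrored in a small Python sketch. This is one possible mapping, not part of the ER model itself: the class names, the birthday and the address values are invented, and only the id and name come from the slide's example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass(frozen=True)
class Address:
    """Composite attribute: a hierarchy of sub-attributes."""
    street: str
    city: str


@dataclass
class Employee:
    id: int            # simple, single-valued attribute
    name: str          # simple, single-valued attribute
    birthday: date
    address: Address   # composite attribute
    phones: List[str] = field(default_factory=list)  # multivalued attribute (0..*)

    def age(self, today: date) -> int:
        """Derived attribute: computed from birthday rather than stored."""
        had_birthday = (today.month, today.day) >= (self.birthday.month, self.birthday.day)
        return today.year - self.birthday.year - (0 if had_birthday else 1)


# Hypothetical instance; birthday and address are made up for the example.
employee = Employee(
    id=1576,
    name="Lode Hoste",
    birthday=date(1990, 5, 20),
    address=Address(street="Pleinlaan 2", city="Brussels"),
    phones=["+32 2 000 00 00"],
)
```

Storing `birthday` and deriving `age` avoids a value that would otherwise become stale every year.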
  • 32. Attributes ...
      - A multivalued attribute is represented by a double ellipse
      - Derived attributes are indicated by dashed ellipses
      - address is an example of a composite attribute
      [Figure: entity set Employees with attributes id, name, birthday, phone, age, #offices and address (street, city), connected via LocatedAt (0..1, 0..*) to Offices]
  • 33. Keys
      - An entity's attribute values must uniquely identify the entity
      - A subset of attributes that uniquely identifies an entity is called a superkey
      - A minimal superkey without any unnecessary attributes is called a candidate key
      - The primary key is one of the candidate keys chosen by the database designer for unique entity identification
          - in the ER model, the primary key is indicated by underlining its attributes
          - the value of a primary key should change very rarely
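The relation between superkeys and candidate keys can be explored with a short Python sketch over a sample extension of an entity set. Note the assumption: a key is a constraint on all possible instances, so checking one sample can only show that an attribute set is consistent with the data seen so far. The function names and the sample rows are hypothetical.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, attributes):
    """Minimal superkeys of the sample, found from small to large subsets."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # A superset of an already found key is a superkey but not minimal.
            if any(set(key) <= set(attrs) for key in keys):
                continue
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys


# Invented sample data: two people share a name, two share an office.
sample = [
    {"id": 1, "name": "Ann", "office": "A1"},
    {"id": 2, "name": "Ann", "office": "B2"},
    {"id": 3, "name": "Bob", "office": "A1"},
]
keys = candidate_keys(sample, ["id", "name", "office"])
```

For this sample both `("id",)` and `("name", "office")` are candidate keys; a designer would typically pick `id` as the primary key because its value rarely changes.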
  • 34. Relationships
      - A relationship is an association between multiple entities
      - A relationship set (diamond) is a set of relationships of the same type
          - e.g. LocatedAt
      [Figure: LocatedAt relationships between the Employees 1234 Beat Signer, 1576 Lode Hoste and 3212 William Van Woensel and the Offices 10F718, 10F705, 10F721 and 10F703; ER diagram with Employees (id, name), LocatedAt and Offices (name, address)]
  • 35. Relationships ...
      - We can have binary or n-ary relationship sets
          - {(e1, e2, ..., en) | e1 ∈ E1, e2 ∈ E2, ..., en ∈ En}
      - Each relationship instance in an ER schema represents an association between the involved entities
      - The role defines an entity's function in a relationship
          - has to be explicitly defined if the same entity set participates more than once in a relationship set (recursive relationship)
      - A relationship may contain descriptive attributes
      - A relationship instance must be uniquely identifiable by its entities (without any descriptive attributes)
          - i.e. a relationship set cannot contain two relationship instances that only differ in their descriptive attributes
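The rule that a relationship instance is identified by its entities alone can be sketched in Python by keying a dictionary on the tuple of participating entities. The helper `relate`, the descriptive attribute `since` and the pairing of employee 1234 with office 10F718 are assumptions made for the illustration.

```python
# Entity sets, represented here by their identifying values.
employees = {1234, 1576, 3212}
offices = {"10F718", "10F705", "10F721", "10F703"}

# Relationship set LocatedAt: maps each tuple of participating entities
# to the descriptive attributes of that relationship instance.
located_at = {}


def relate(relationship, entity_sets, entities, **descriptive):
    """Add one relationship instance, identified only by its entities."""
    for entity, entity_set in zip(entities, entity_sets):
        if entity not in entity_set:
            raise KeyError(f"unknown entity: {entity!r}")
    if entities in relationship:
        # Two instances may not differ only in their descriptive attributes.
        raise ValueError(f"duplicate relationship instance: {entities!r}")
    relationship[entities] = descriptive


relate(located_at, (employees, offices), (1234, "10F718"), since=2009)
```

Trying to add `(1234, "10F718")` a second time with a different `since` value raises a `ValueError`, because the descriptive attribute is not part of the instance's identity.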
  • 36. Relationship with Roles and Attributes
      [Figure: relationship set WorksFor on Employees with roles boss and employee and descriptive attribute duration; Employees (id, name) connected via LocatedAt to Offices (name, address)]
  • 37. Example of a 3-ary Relationship
      [Figure: relationship set WorksFor between Employees (id, name), Companies (name, address) and Durations (from, to)]
  • 38. Cardinality Constraints
      - A relationship can be one-to-one, one-to-many, many-to-one or many-to-many
      - An arrow indicates a to-one relationship
          - cardinality constraints may also be expressed by numbers
              - e.g. 0..*, 1..*, 0..1, 1..1, 2..5
          - a double line or 1..* indicates a total participation constraint
      [Figure: example relationship sets MarriedTo (Men, Women), Teaches (Teachers, Courses) and StarsIn (Filmstars, Films)]
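Numeric cardinality constraints such as 0..1 or 1..* can be checked against a set of relationship instances with a few lines of Python. In this sketch a bound states how many relationship instances each entity of one entity set participates in; notations differ on which side of the diagram that number is written, so the reading used here is an assumption. The function `satisfies` and the sample data are hypothetical.

```python
from collections import Counter


def satisfies(pairs, entities, position, lower, upper=None):
    """Check that every entity occurs between lower and upper times
    at the given position of the relationship tuples (upper=None means *)."""
    counts = Counter(pair[position] for pair in pairs)
    for entity in entities:
        occurrences = counts[entity]
        if occurrences < lower:
            return False
        if upper is not None and occurrences > upper:
            return False
    return True


# Invented instances of a Teaches relationship set.
teachers = {"T1", "T2", "T3"}
courses = {"C1", "C2", "C3"}
teaches = {("T1", "C1"), ("T1", "C2"), ("T2", "C3")}

each_course_has_one_teacher = satisfies(teaches, courses, 1, 1, 1)    # 1..1
every_teacher_teaches = satisfies(teaches, teachers, 0, 1)            # 1..*
teachers_may_be_idle = satisfies(teaches, teachers, 0, 0)             # 0..*
```

The 1..* check fails because teacher `T3` teaches no course, which is exactly a violated total participation constraint.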
  • 39. Weak Entity Sets
      - An entity set with a primary key is called a strong entity set
      - A weak entity set (double rectangle) does not have enough attributes to form a primary key
      [Figure: Cinemas (id, name) connected via Offers to Seats (number, colour)]
  • 40. Weak Entity Sets ...
      - A weak entity set must be associated with an identifying entity set via an identifying relationship (double diamond)
          - a weak entity set is existence dependent on an identifying entity set
              - can also participate in other non-identifying relationships
          - a weak entity set must relate to the identifying entity set via a total participation constraint and each weak entity instance can only be related to one identifying entity instance
          - a discriminator or partial key (underlined dashed attributes) uniquely identifies a weak entity relative to a strong entity
      - In some cases a weak entity may also be expressed as a multivalued composite attribute
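One common way to realise a weak entity set in a relational database is a composite primary key made of the identifying entity's key plus the discriminator. The sketch below does this for the Cinemas and Seats example with Python's built-in `sqlite3` module. Treating `number` as the discriminator, the cascading delete and the cinema names are assumptions for the illustration, not something the slides prescribe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE cinemas (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE seats (
        cinema_id INTEGER NOT NULL REFERENCES cinemas (id) ON DELETE CASCADE,
        number    INTEGER NOT NULL,   -- discriminator (partial key)
        colour    TEXT,
        PRIMARY KEY (cinema_id, number)
    )
    """
)
conn.executemany("INSERT INTO cinemas VALUES (?, ?)", [(1, "Odeon"), (2, "Rex")])
# The same seat number may exist in different cinemas.
conn.executemany("INSERT INTO seats VALUES (?, ?, ?)", [(1, 1, "red"), (2, 1, "blue")])

# Total participation: a seat cannot exist without its cinema.
try:
    conn.execute("INSERT INTO seats VALUES (3, 1, 'green')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Existence dependency: removing a cinema removes its seats.
conn.execute("DELETE FROM cinemas WHERE id = 1")
remaining_seats = conn.execute("SELECT cinema_id, number FROM seats").fetchall()
```

The seat number alone is not unique, but together with the cinema's id it identifies a seat, which matches the role of the discriminator described above.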
  • 41. ER Design Issues
      - When do we model something as an attribute and when as an entity set?
          - there is no general answer and the choice depends on the specific application domain to be modelled
      - When do we model something as a relationship set and when as an entity set?
          - a relationship set often corresponds to an action between entities
      - In general we should try to avoid higher level n-ary relationship sets (3-ary relationship sets should be the maximum and even these should be used carefully)
      - There should be no redundant attributes
  • 42. Homework
      - Study the following two chapters of the Database System Concepts book
          - chapter 1
              - Introduction
          - chapter 7 (sections 7.1-7.5 and 7.7)
              - Database Design and the ER Model
  • 43. Exercise 1
      - Conceptual modelling
          - Entity-Relationship (ER) model
  • 44. References
      - A. Silberschatz, H. Korth and S. Sudarshan, Database System Concepts (Sixth Edition), McGraw-Hill, 2010
      - P. Chen, The Entity-Relationship Model - Toward a Unified View of Data, ACM Transactions on Database Systems 1 (1), March 1976
      - WISE Lab
          - http://wise.vub.ac.be
      - ArtVis Project
          - http://wise.vub.ac.be/content/artvis-exploringinformation-through-advanced-visualisation-techniques
  • 45. Next Lecture Extended ER Model and other Modelling Languages 2 December 2005