Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

This lecture is part of an Introduction to Databases course given at the Vrije Universiteit Brussel.

Published in: Education, Technology


  1. Introduction to Databases - Introduction and Conceptual Modelling
      Prof. Beat Signer, Department of Computer Science, Vrije Universiteit Brussel
      http://www.beatsigner.com
      2 December 2005
  2. Course Organisation
      - Prof. Beat Signer, Vrije Universiteit Brussel, G.10.731d, +32 2 629 12 39, bsigner@vub.ac.be
      - Reinout Roels, Vrije Universiteit Brussel, G.10.730f, +32 2 629 37 53, rroels@vub.ac.be
      Slide footer: Beat Signer - Department of Computer Science - bsigner@vub.ac.be, February 17, 2017
  3. Course Information
      - Course book
        - Database System Concepts (Sixth Edition), Abraham Silberschatz, Henry Korth and S. Sudarshan, McGraw-Hill, 2010
        - additional information from the book is available online
          - http://highered.mcgraw-hill.com/sites/0073523321
      - Course information (lecture slides, exercises, …) available on PointCarré
        - http://pointcarre.vub.ac.be/index.php?application=weblcms&go=course_viewer&course=2321
  4. Exercises
      - Course content is going to be applied in the exercise sessions
      - Weekly exercise sessions
        - starting on February 19
          - group 1: computer room E.1.5, Thursday 11:00-13:00
          - group 2: computer room E.1.7, Thursday 14:00-16:00
          - group 3: computer room TBA, Wednesday 10:00-12:00
        - assistant: Reinout Roels
      - Additional content may be covered in exercise sessions
        - exam covers content of lectures and exercises
  5. Exam
      - Written closed book exam in Dutch / English
        - covers content of lectures and exercises
  6. Course Overview
      1. Introduction
         - overview
         - conceptual modelling and ER model
      2. Extended ER Model and other Modelling Languages
      3. Relational Model and Relational Algebra
      4. Relational Database Design
         - reduction
         - functional dependencies and normalisation
      5. Structured Query Language (SQL)
      6. Advanced SQL
  7. Course Overview …
      7. DBMS Architectures and Features
         - DBMS components
         - client-server architecture
         - parallelisation and distribution
      8. Storage Management
      9. Access Methods
         - indexing and hashing
      10. Query Processing and Optimisation
      11. Transaction Management
         - transactions
         - concurrency and recovery
  8. Course Overview …
      12. NoSQL Databases
      13. Current Trends and Review
  9. Databases in Action
      - Online shops
        - product information, customer data, order data, ...
        - e.g. Amazon
          - hundreds of millions of customers
          - more than 50 terabytes of data
      - Human resources
        - course registration, student grades, employee records, salary information, tax information, ...
        - e.g. PointCarré
          - course registration
  10. Databases in Action ...
      - Banking and trading
        - customer data, account information, transactions, ...
        - e.g. London Stock Exchange
          - almost 1 million trades per day
      - Reservation systems
        - book flights from multiple airlines, hotel rooms etc.
        - e.g. Amadeus systems
          - Global Distribution System (GDS) founded by Lufthansa, Air France and other partners
  11. Databases in Action ...
      - Digital archives
        - persistently store various types of digital media
        - e.g. Internet Archive project
          - access to more than 450 billion archived web pages (http://archive.org)
      - Libraries
        - index for traditional paper-based libraries as well as digital libraries
        - e.g. Open Library project
          - over 23 million indexed books (http://openlibrary.org)
  12. Databases in Action ...
      - Geographic Information Systems (GIS)
        - store raster (bitmap) or vector data representing real world objects
        - geospatial query language
      - Scientific databases
        - sensor data, classifications (e.g. human genome) as well as data from simulations
        - e.g. LHC Computing Grid
          - LHC experiments at CERN
          - 15 petabytes of data per year
  13. Databases in Action ...
      - Many everyday devices contain databases
        - TVs, washing machines, mobile phones, ...
        - e.g. Android phones with SQLite database
      - Embedded databases in cars, airplanes etc.
        - manage configurations and store sensor data
        - e.g. db4o object database was used in BMW's Car IT system
  14. Databases in Action ...
      - Databases in the WISE research lab (VUB)
        - database-driven cross-media publishing
        - database extensions for hypermedia services
        - personal information management (PIM)
        - data visualisation
          - e.g. ArtVis
        - human-information interaction
        - paper-digital interfaces
  15. Databases in Action ...
      - Databases touch all aspects of our daily life!
      - Numerous large database software companies
        - e.g. Oracle is the 2nd largest software company in 2017
      - Databases form an important part of product lines of Microsoft (SQL Server), IBM (DB2), …
  16. Basic Terminology
      - Database
        - collection of logically related data
        - database schema describes the database design (blueprint)
          - format and relationships between stored data (often rather static)
        - collection of data stored in a database at a given time is called an instance of the database
      - Database Management Systems (DBMS)
        - tools (programs) to efficiently store, maintain and retrieve information from a database
          - support for creating, reading, updating and deleting data (CRUD operations)
          - data definition language (DDL) to define the database schema
          - data manipulation language (DML) to query and update the data
            • often declarative fourth generation languages (4GLs) such as SQL
          - data access control, transactions and concurrency control
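
The DDL/DML split and the four CRUD operations can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names (employees, id, name) and the sample rows are assumptions made for the example, not something prescribed by the lecture.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# DDL: define the schema (the rather static "blueprint").
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# DML: the rows present at a given time form the database instance.
conn.execute("INSERT INTO employees (id, name) VALUES (?, ?)", (1234, "Beat Signer"))  # create
conn.execute("INSERT INTO employees (id, name) VALUES (?, ?)", (1576, "Lode Hoste"))   # create
conn.execute("UPDATE employees SET name = ? WHERE id = ?", ("L. Hoste", 1576))         # update
conn.execute("DELETE FROM employees WHERE id = ?", (1234,))                            # delete
rows = conn.execute("SELECT id, name FROM employees ORDER BY id").fetchall()           # read
print(rows)
```

The schema is declared once, while the instance changes with every insert, update and delete.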
  17. File Processing System
      - Why should we not just use multiple files in a file system to store our data?
      - There are various disadvantages of such an approach
        - data redundancy and inconsistency
          - different file formats over time
          - duplication of information in different files
        - limited data access
          - we have to write new programs to carry out new tasks
          - data cannot be retrieved in a convenient and efficient manner
        - data isolation
          - data may be distributed over different files without a common format
        - integrity
          - integrity constraints (e.g. balance > 0) are hidden in the program code and not explicitly stated and checked
  18. File Processing System …
        - missing atomic operations
          - system failures and crashes may leave the data in an inconsistent state (e.g. only parts of an operation have been carried out)
          - example: transfer of money from one account to another account
          - we later discuss transaction management as a solution
        - concurrent update anomalies
          - concurrent updates may leave the data in an inconsistent state
          - example: two programs simultaneously removing money from a single account
          - we later discuss scheduling as a solution
        - limited security control
          - difficult to give a user only access to parts of a file
      - DBMSs offer solutions to all these problems
        - concepts and algorithms to solve the problems with file processing systems
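
Two of the problems listed for file processing systems, hidden integrity constraints and missing atomic operations, can be illustrated with a small sketch using Python's built-in sqlite3 module. The accounts table, the transfer helper and the amounts are hypothetical and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The integrity constraint is stated once in the schema instead of being
# hidden in application code.
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance > 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates happen or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)         # valid transfer
rejected = transfer(conn, 1, 2, 500)  # would violate balance > 0 and is rolled back
balances = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
print(ok, rejected, balances)
```

In the rejected transfer the credit to the destination account has already been applied when the debit fails, so the rollback is what keeps the data consistent.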
  19. Database Management System (DBMS)
      [Figure: components of a DBMS, based on 'Components of a DBMS', Database Systems, T. Connolly and C. Begg, Addison-Wesley 2010]
      - actors: Programmers, Users, DB Admins
      - inputs: Application Programs, Queries, Database Schema
      - other labels in the figure: DML Preprocessor, Query Compiler, DDL Compiler, Program Object Code, Command Processor, Query Optimiser, Authorisation Control, Integrity Checker, Transaction Manager, Scheduler, Recovery Manager, Buffer Manager, Catalogue Manager, Access Methods, File Manager, System Buffers, Database Manager, Data Manager, DBMS
      - storage: Data, Indices and System Catalogue
  20. Data Abstraction
      - DBMS provides abstract view of data
        - hide some details of how data is stored
      - Physical level
        - physical schema describes how the data is stored (complex low-level data structures)
      - Logical level
        - logical schema describes what data is stored
          - simple structures: attribute names, data types and relationships between data
          - implementation of simple structures might be based on complex physical-level structures but the user of the logical level should not be aware of that
        - physical data independence
      - View level
        - subschemas provide only access to parts of the database
          - reduce complexity and introduce security
      [Figure: View 1 ... View n on top of the Logical Level, which sits on top of the Physical Level]
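
The three levels can be hinted at with a short sqlite3 sketch. The employees table, the salary column, the employee_directory view and the index name are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: what data is stored (attribute names and types).
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.execute("INSERT INTO employees VALUES (1576, 'Lode Hoste', 4000)")  # made-up salary

# View level: a subschema that exposes only part of the data (no salary).
conn.execute("CREATE VIEW employee_directory AS SELECT id, name FROM employees")

# Physical level: adding an index changes how data is accessed, not what
# queries against the logical schema or the view return.
before = conn.execute("SELECT * FROM employee_directory").fetchall()
conn.execute("CREATE INDEX idx_employees_name ON employees (name)")
after = conn.execute("SELECT * FROM employee_directory").fetchall()
print(before, after)
```

That the query result is unaffected by the index is a small instance of physical data independence.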
  21. Duality of the Database Schema
      [Figure: the Database Schema describes both the Application Concepts (Application World) and the Database Concepts (Computer World)]
  22. Data Models and History of DBMSs
      - Punched cards used since 1725 (textile looms)
        - Hollerith cards later used by IBM for data processing
      - 1950s: Data processing with magnetic tapes as storage
        - only sequential access to data
        - reading from one or multiple tapes and writing to a new tape
        - sometimes combined with input from punched cards etc.
      [Figures: Punched Card, Magnetic Tape]
  23. Data Models and History of DBMSs ...
      - 1960s: Widespread use of hard disks
        - direct access (random access) to data
        - opened possibilities for new Navigational DBMSs
          - IBM's IMS (1968), hierarchical database
          - Integrated Data Store (IDS), network database
      [Figures: IBM 350 Disk Storage Unit, Hard Disk]
  24. Data Models and History of DBMSs ...
      - A data model is a collection of conceptual tools
        - describes data, data relationships, data semantics and constraints
      - Hierarchical model
        - data organised in a tree structure
        - used in early mainframe DBMS
          - e.g. IBM's Information Management System (IMS)
        - XML documents also described by a hierarchical model
      - Network model
        - generalised graph structure
        - two main constructs
          - records contain fields and sets define relationships between records
        - navigational operations
          - follow the relationship from one record to another record
  25. Data Models and History of DBMSs ...
      - Relational model
        - collection of tables (relations) containing records
        - described in a paper by Edgar F. Codd in 1970
      - 1970s: Relational DBMS
        - IBM's System R (1974) "based" on Codd's paper; SQL added later
      - Entity-Relationship (ER) model
        - representation of basic objects (entities) and their relationships
        - widely used in conceptual database design
      - Object-based data model
        - introduces object identity, encapsulation and methods
      - 1980s: Object Databases
        - seminal work on object databases
  26. Data Models and History of DBMSs ...
      - 1990s: Web Interfaces to Databases
        - databases deployed much more extensively
      - Semistructured data model
        - no clear separation between data and the schema ("self-describing" data)
        - individual data items of the same type may have different attributes
        - XML is widely used to represent semistructured data
      - 2000s: XML and XQuery
        - relational databases often still form the core
      - Later 2000s: Extremely large-scale distributed DBMS
        - BigTable or Hadoop and HBase
        - "NoSQL databases"
  27. Database Design
      - Conceptual Design
        - define an abstract conceptual application model containing the main domain concepts
          - interact with domain experts to get the requirements
        - describe the entities (with attributes) and their relationships
          - e.g. via ER model
        - specify the functional requirements (operations)
          - ensure that operations can be realised based on the conceptual model
      - Database implementation based on conceptual model
        - logical design phase
          - mapping of the conceptual schema to the implementation data model
            • e.g. reduction from the ER model to the relational data model
          - define the logical database schema
        - physical design phase
          - define the physical database layout based on the logical database schema
  28. Database Design ...
      - Two major database design pitfalls have to be avoided
        - we should avoid any redundancy where information is repeated at multiple places since this might lead to inconsistent data
          - e.g. a lecture management system where a student's name is stored for each lecture they are attending instead of storing it in a separate student entity
        - a database design may be incomplete and not enable the representation of certain aspects of the application domain
          - e.g. in a shopping application where the customer information is stored as part of an order we cannot enter new customer data without having an order
      - There is often more than one "good design"
        - e.g. when do we model something as a relationship and when as a separate entity?
        - modelling is a challenging task that requires a combination of engineering skills and "good taste"
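
The redundancy pitfall from the lecture management example can be made concrete with plain Python data structures. The student and lecture values below are made up for the illustration.

```python
# Redundant design: the student's name is repeated for every lecture attended.
attendance_redundant = [
    {"student_id": 1, "student_name": "Ann", "lecture": "Databases"},
    {"student_id": 1, "student_name": "Ann", "lecture": "Algorithms"},
]

# Updating only one copy leaves the data inconsistent.
attendance_redundant[0]["student_name"] = "Anna"
names_for_student_1 = {
    row["student_name"] for row in attendance_redundant if row["student_id"] == 1
}
print(names_for_student_1)  # two different names for the same student

# Separate student entity: the name is stored exactly once.
students = {1: {"name": "Ann"}}
attendance = [(1, "Databases"), (1, "Algorithms")]
students[1]["name"] = "Anna"  # one update, no second copy that could disagree
```

With the separate student entity there is simply no second place where the old name could survive.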
  29. Entity-Relationship (ER) Model
      - Conceptual model based on a set of entities and relationships
      - An entity is a "thing" or "object" that can be distinguished from other objects
      - A relationship describes an association between multiple entities
      - Introduced and formalised by Peter Chen
        - P. Chen, The Entity-Relationship Model - Toward a Unified View of Data, ACM Transactions on Database Systems 1 (1), March 1976
          - one of the most cited and influential papers in Computer Science
      [Figure: Peter Chen]
  30. Entities
      - An entity represents a distinguishable object
        - e.g. specific person, car or company
      - An entity is described by a number of attributes
        - has to be uniquely identifiable by its attributes (ovals)
      - An entity set is a set of entities with the same type
        - the extension of the entity set (rectangle) consists of its entities
      [Figure: entity set Employees with attributes id and name; its extension contains (1234, Beat Signer), (1576, Lode Hoste) and (3212, William Van Woensel)]
      Note that we will use a slightly different notation than in the book!
  31. Attributes
      - The set of permitted attribute values is called the domain or value set
        - entity instances can be described by a set of (name, value) pairs
        - e.g. {(id, 1576), (name, Lode Hoste)}
      - The ER model supports the following attribute types
        - simple attributes
        - composite attributes
          - hierarchy of sub-attributes
        - multivalued attributes
          - optional lower and upper bounds
        - derived attributes
          - computed via relationships or other attribute values
  32. Attributes ...
      - A multivalued attribute is represented by a double ellipse
      - Derived attributes are indicated by dashed ellipses
      - address is an example of a composite attribute
      [Figure: Employees (id, name, birthday, phone, age, #offices, address with sub-attributes street and city) connected via LocatedAt to Offices, annotated with the cardinalities 0..1 and 0..*]
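
The four attribute types can be sketched with Python dataclasses. Which attributes of the figure are multivalued or derived cannot be recovered from the transcript, so treating phone numbers as multivalued and age as derived from birthday is an assumption, and the birthday and address values are invented for the example.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Address:  # composite attribute: a hierarchy of sub-attributes
    street: str
    city: str


@dataclass
class Employee:
    id: int            # simple attribute
    name: str          # simple attribute
    birthday: date     # simple attribute
    address: Address   # composite attribute
    phones: list = field(default_factory=list)  # multivalued attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from birthday, never stored."""
        had_birthday = (today.month, today.day) >= (self.birthday.month, self.birthday.day)
        return today.year - self.birthday.year - (0 if had_birthday else 1)


# Hypothetical values, for illustration only.
e = Employee(1576, "Lode Hoste", date(1990, 5, 20), Address("Example Street 1", "Brussels"))
e.phones.append("+32 2 000 00 00")
age_on_lecture_day = e.age(date(2017, 2, 17))
print(age_on_lecture_day)
```

Keeping age as a method rather than a stored field mirrors the idea that a derived attribute is always computed from other values.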
  33. Keys
      - An entity's attribute values must uniquely identify the entity
      - A subset of attributes that uniquely identifies an entity is called a superkey
      - A minimal superkey without any unnecessary attributes is called a candidate key
      - The primary key is one of the candidate keys chosen by the database designer for unique entity identification
        - in the ER model, the primary key is highlighted by the set of underlined attributes
        - the value of a primary key should change very rarely
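
The difference between a superkey and a candidate key can be checked mechanically on a concrete set of entities. Note the limitation of this sketch: keys are a property of the schema, so a check on one sample instance can only refute a key, never prove it. The helper names is_superkey and candidate_keys and the office attribute values are hypothetical.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def candidate_keys(rows, attributes):
    """Minimal superkeys: superkeys that contain no smaller superkey."""
    found = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            contains_smaller_key = any(set(k) <= set(attrs) for k in found)
            if not contains_smaller_key and is_superkey(rows, attrs):
                found.append(attrs)
    return found


employees = [
    {"id": 1234, "name": "Beat Signer", "office": "A"},
    {"id": 1576, "name": "Lode Hoste", "office": "A"},
    {"id": 3212, "name": "William Van Woensel", "office": "B"},
]
keys = candidate_keys(employees, ("id", "name", "office"))
print(keys)
```

In this sample both id and name come out as candidate keys, which shows why the designer's choice matters: names are unique here only by accident, so id is the sensible primary key.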
  34. Relationships
      - A relationship is an association between multiple entities
      - A relationship set (diamond) is a set of relationships of the same type
        - e.g. LocatedAt
      [Figure: relationship set LocatedAt between Employees (id, name) and Offices (name, address); the Employees extension contains (1234, Beat Signer), (1576, Reinout Roels) and (3212, Sandra Trullemans), the Offices extension contains 10F716, 10G731e, 10G731d and 10F703]
  35. Relationships ...
      - We can have binary or n-ary relationship sets
        - $\{(e_1, e_2, \ldots, e_n) \mid e_1 \in E_1, e_2 \in E_2, \ldots, e_n \in E_n\}$
      - Each relationship instance in an ER schema represents an association between the involved entities
      - The role defines an entity's function in a relationship
        - has to be explicitly defined if the same entity set participates more than once in a relationship set (recursive relationship)
      - A relationship may contain descriptive attributes
      - A relationship instance must be uniquely identifiable by its entities (without any descriptive attributes)
        - i.e. a relationship set cannot contain two relationship instances that only differ in their descriptive attributes
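
A relationship set as a set of tuples over the participating entity sets, together with the rule that instances are identified by their entities alone, can be sketched as follows. The pairing of employee 1234 with office 10G731d, the descriptive attribute since and the relate helper are assumptions made for this example.

```python
employees = {1234, 1576, 3212}   # entity set E1, entities identified by id
offices = {"10F716", "10G731d"}  # entity set E2, entities identified by name

# Relationship set LocatedAt: maps each (employee, office) tuple to its
# descriptive attributes. Using the tuple as the key means two instances
# can never differ only in their descriptive attributes.
located_at = {}


def relate(employee_id, office_name, **descriptive):
    """Add one relationship instance (e1, e2) with e1 in E1 and e2 in E2."""
    if employee_id not in employees or office_name not in offices:
        raise ValueError("both participants must exist in their entity sets")
    key = (employee_id, office_name)
    if key in located_at:
        raise ValueError("relationship instance already exists")
    located_at[key] = descriptive


relate(1234, "10G731d", since=2015)
try:
    relate(1234, "10G731d", since=2016)  # same entities, different attribute
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
print(located_at, duplicate_rejected)
```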
  36. Relationship with Roles and Attributes
      [Figure: Employees (id, name) connected via LocatedAt to Offices (name, address); recursive relationship set WorksFor on Employees with the roles boss and employee and the descriptive attribute duration]
  37. Example of a 3-ary Relationship
      [Figure: relationship set WorksFor between Employees (id, name), Companies (name, address) and Durations (from, to)]
  38. Cardinality Constraints
      - A relationship can be one-to-one, one-to-many, many-to-one or many-to-many
      - An arrow indicates a to-one relationship
        - cardinality constraints may also be expressed by numbers
          - e.g. 0..*, 1..*, 0..1, 1..1, 2..5
        - a double line or 1..* indicates a total participation constraint
      [Figure: three example relationship sets: MarriedTo between Men and Women, Teaches between Teachers and Courses, StarsIn between Filmstars and Films]
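
Numeric cardinality constraints such as 0..1 or 1..* can be read as bounds on how often each entity appears in the relationship set. The sketch below checks such bounds for a binary relationship set. The function name satisfies_cardinality and the teacher and course identifiers are hypothetical, and the direction of the constraints in the Teaches example is an assumption.

```python
from collections import Counter


def satisfies_cardinality(pairs, side, lower, upper=None, entities=()):
    """Check that every entity on `side` (0 or 1) occurs in lower..upper pairs.

    `entities` lists the full entity set, so that entities which do not
    participate at all are also checked against the lower bound.
    An upper bound of None stands for *.
    """
    counts = Counter(pair[side] for pair in pairs)
    for entity in set(entities) | set(counts):
        n = counts.get(entity, 0)
        if n < lower or (upper is not None and n > upper):
            return False
    return True


teaches = [("t1", "c1"), ("t1", "c2"), ("t2", "c3")]  # (teacher, course)

# 1..1 on the course side: every course has exactly one teacher.
each_course_one_teacher = satisfies_cardinality(teaches, side=1, lower=1, upper=1)
# 0..1 on the teacher side: violated, because t1 teaches two courses.
each_teacher_at_most_one = satisfies_cardinality(teaches, side=0, lower=0, upper=1)
# 1..* with a course c4 that nobody teaches: total participation is violated.
total_participation = satisfies_cardinality(
    teaches, side=1, lower=1, entities={"c1", "c2", "c3", "c4"}
)
print(each_course_one_teacher, each_teacher_at_most_one, total_participation)
```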
  39. Weak Entity Sets
      - An entity set with a primary key is called a strong entity set
      - A weak entity set (double rectangle) does not have enough attributes to form a primary key
  40. Weak Entity Sets ...
      - A weak entity set must be associated with an identifying entity set via an identifying relationship (double diamond)
        - a weak entity set is existence dependent on an identifying entity set
          - can also participate in other non-identifying relationships
        - a weak entity set must relate to the identifying entity set via a total participation constraint and each weak entity instance can only be related to one identifying entity instance
        - a discriminator or partial key (underlined dashed attributes) uniquely identifies a weak entity relative to a strong entity
      - In some cases a weak entity may also be expressed as a multivalued composite attribute
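
One possible way to see existence dependence and the discriminator at work is a relational rendering in SQLite, where the weak entity's key combines the identifying entity's key with the discriminator. The buildings and rooms tables are hypothetical, and how an ER schema is actually reduced to relations is a topic for a later lecture, so this is only a sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

# Strong (identifying) entity set.
conn.execute("CREATE TABLE buildings (name TEXT PRIMARY KEY)")
# Weak entity set: room_number is only a discriminator; the full key also
# needs the key of the identifying building.
conn.execute(
    "CREATE TABLE rooms ("
    " building TEXT NOT NULL REFERENCES buildings(name) ON DELETE CASCADE,"
    " room_number TEXT NOT NULL,"
    " PRIMARY KEY (building, room_number))"
)
conn.execute("INSERT INTO buildings VALUES ('F'), ('G')")
# The same discriminator value may occur in different buildings.
conn.executemany("INSERT INTO rooms VALUES (?, ?)", [("F", "101"), ("G", "101")])
conn.commit()

# A room without an identifying building is rejected (total participation).
try:
    conn.execute("INSERT INTO rooms VALUES ('X', '101')")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False
conn.rollback()

# Existence dependence: deleting the building removes its rooms.
conn.execute("DELETE FROM buildings WHERE name = 'F'")
remaining = conn.execute("SELECT building, room_number FROM rooms").fetchall()
print(orphan_allowed, remaining)
```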
  41. Homework
      - Study the following two chapters of the Database System Concepts book
        - chapter 1
          - Introduction
        - chapter 7
          - sections 7.1-7.5 and 7.7
          - Database Design and the ER Model
  42. Exercise 1
      - Conceptual modelling
        - Entity-Relationship (ER) model
  43. References
      - A. Silberschatz, H. Korth and S. Sudarshan, Database System Concepts (Sixth Edition), McGraw-Hill, 2010
      - P. Chen, The Entity-Relationship Model - Toward a Unified View of Data, ACM Transactions on Database Systems 1 (1), March 1976
      - WISE Lab
        - http://wise.vub.ac.be
      - ArtVis Project
        - http://wise.vub.ac.be/content/artvis-exploring-information-through-advanced-visualisation-techniques
  44. Next Lecture
      Extended ER Model and other Modelling Languages
