Modern Database Systems
CS-E5040
Aristides Gionis
Michael Mathioudakis
T.A.: Orestis Kostakis
Spring 2016
Modern Database Systems - Lecture 00

Slides for course "Modern Database Systems", taught at Aalto University during Spring 2016.

Published in: Education

  1. Modern Database Systems
     CS-E5040
     Aristides Gionis, Michael Mathioudakis
     T.A.: Orestis Kostakis
     Spring 2016
  2. what is a Database Management System (DBMS)?
     what is a database?
       a collection of data
     what is a database management system?
       ... a.k.a. ‘database system’
       software to store, access, administer a database
       not just a collection of files
       • provides mechanism to query the data
       • transfers data between main memory and secondary storage (disk)
       • enables concurrent access, offers guarantees for data consistency
       • provides crash recovery mechanisms
       • provides security and access control
  3. why use a DBMS?
     discuss
  4. why use a DBMS?
     • separate logical from physical data organization
     • efficient data access
     • guarantee data integrity and security
     • reduce application development time
     • data administration
  5. why study database systems?
     discuss
  6. why study database systems?
     to manage data efficiently
  7. consider the following task…
     data records that contain information about products viewed or purchased from an online store
     task: for each pair of Games products, count the number of customers that have purchased both

     | Product          | Category | Customer   | Date       | Price | Action   | other... |
     |------------------|----------|------------|------------|-------|----------|----------|
     | Portal 2         | Games    | Michael M. | 12/01/2015 | 10€   | Purchase | ...      |
     | FLWR Plant Food  | Garden   | Aris G.    | 19/02/2015 | 32€   | View     |          |
     | Chase the Rabbit | Games    | Michael M. | 23/04/2015 | 1€    | View     |          |
     | Portal 2         | Games    | Orestis K. | 13/05/2015 | 10€   | Purchase |          |
     | ...              |          |            |            |       |          |          |

     > what challenges does case B pose compared to case A?
     hint: limited main memory, disk access, distributed setting
     case A: 10,000 records (0.5MB per record, 5GB total disk space), 10GB of main memory
     case B: 10,000,000 records (~5TB total disk space), stored across 100 nodes (50GB per node), 10GB of main memory per node
  8. the main message
     to manage data efficiently
     • minimize expensive operations, e.g., disk access
     • parallelize computation
  9. why study database systems?
     to manage data efficiently ...
     ... from different roles
     • develop database systems that match application requirements
     • use database systems efficiently
       ○ … knowing how a DBMS
         ■ stores data,
         ■ processes queries,
         ■ and accesses data
       ○ … allows us to
         ✓ organize data appropriately
         ✓ design efficient algorithms to process the data
     • combine existing database systems to match requirements
     • large variety of data and applications
     • “one size fits none” - Michael Stonebraker
  10. the relational database system
      [diagram: layers of a relational database system]
      database (data stored on disk)
      DBMS
        • query optimization & execution
        • relational operators
        • files and access methods
        • buffer (memory) management
        • disk space management
      query interface
      database design
      application, database user
      diagram labels: introductory course (relational dbms); our course (relational dbms); our course (non-relational dbms)
  11. previously on ‘database systems’...
      • relational data model
        ○ relation, attribute, tuple, schema, domain, keys
      • relational algebra
        ○ projection, selection
        ○ cartesian product, natural joins, theta joins, outer joins
        ○ renaming, constraints
      • structured query language (SQL)
      • schema design
        ○ functional dependencies, normalization
      • applications
        ○ embedded sql, drivers
  12. modern database systems
      beyond the typical relational DBMS setting...
      • different data models
        semi-structured data, unstructured text, graphs
      • operations at massive scale
        big data platforms & map-reduce paradigm, hadoop and spark, cloud computing
      • tailored performance
        key-value stores, column-stores, in-memory databases, streaming systems
  13. about this course
      familiarize ourselves with modern database systems
      principles and practice
      • database models: data, queries, and computation
      • algorithms - simple queries (e.g., joins) to complex algorithms
      • experience with real technologies
      emphasis is on understanding of core issues…
      essentially: the cost of algorithms for different database models and settings
      you can use what you learn here to:
      • select a database system that fits the demands of your application...
        … based on supported data model, functionality, optimizations, scalability
      • design your database to fit the needs of your application
        … e.g., by building appropriate index structures
      • write fast algorithms to process your data…
        … and estimate their running time
      • adapt your knowledge to the database system you’ll be using 5 years from now
  14. syllabus
      part 1: relational database systems (Jan 22 & Jan 29 - Michael)
        topics: relational model (SQL), indexing (b+ trees, hash tables), join algorithms, query optimization
        technology: MySQL
      part 2: semi-structured data (Feb 5 - Aris)
        topics: semi-structured data abstraction, representation, search, indexing and pipeline aggregation
        technology: MongoDB
      part 3: unstructured text and information retrieval (Feb 12 & Feb 26 - Aris)
        topics: querying text data, inverted indexes, compression, ranking and evaluation, rank aggregation
        technology: Lucene
      part 4a: big data platforms - mapreduce (Mar 4 & Mar 11 - Michael)
        topics: mapreduce paradigm, algorithms in mapreduce
        technology: Hadoop
      part 4b: big data platforms - graph databases (Mar 18 & Apr 1 - Aris)
        topics: the pregel paradigm, algorithms on pregel (pagerank, centrality)
        technology: hadoop giraph, spark graphx
  15. logistics
      instructors: Aristides Gionis, Michael Mathioudakis
      teaching assistant: Orestis Kostakis
      lectures: Friday 10-12, Room T3, weeks 2 - 6, 8 - 11, 13
      office hours: by appointment, starting on Monday January 25th
      curriculum
      • slides and course notes
      • no single textbook, but slides will provide references for further study
      announcements
      • follow course website on mycourses.aalto.fi
      when you send email
      • aristides.gionis / michael.mathioudakis / orestis.kostakis @ aalto.fi
      • subject: [ModernDB] your topic
      programming assignments
      • we’ll provide instances of VirtualBox, ready for use
      • you can use campus labs or own laptop
      • access to CSC, if needed
  16. workload & grading scheme
      ● 3 assignments + exam
      ● 25% each
      ● need to have at least 50% on each
        ○ i.e., cannot skip some of the course
      ● assignments:
        ○ pen & paper
          ■ based on slides + references
        ○ programming
          ■ real-world tools, e.g., MySQL, MongoDB, Spark
          ■ will provide tutorials
  17. that’s all for now!
      questions?
      next week
      • relational model and SQL
      • indexing
      • access cost analysis
      what to do until then (if you want)
      • SQL
  18. Credits
      for some of these slides, we used material from
      • “Database Systems: The Complete Book”, by Garcia-Molina, Ullman, Widom
      • “Database Management Systems”, by Ramakrishnan and Gehrke
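
The task on slide 7 (for each pair of Games products, count the customers that purchased both) can be sketched for case A, where all records fit in main memory. This is only an illustration, not the course's solution: the function name `count_copurchased_pairs`, the four-field record layout, and the two extra records marked "made up" are assumptions added here.

```python
from collections import Counter
from itertools import combinations


# Assumed record layout: (product, category, customer, action).
# The slide's table has more columns (date, price, ...); they are not needed here.
def count_copurchased_pairs(records, category="Games"):
    """For each pair of products in `category`, count customers who purchased both."""
    purchased = {}  # customer -> set of distinct products purchased in the category
    for product, cat, customer, action in records:
        if cat == category and action == "Purchase":
            purchased.setdefault(customer, set()).add(product)

    pair_counts = Counter()
    for products in purchased.values():
        # sorted() gives every unordered pair a single canonical key
        for pair in combinations(sorted(products), 2):
            pair_counts[pair] += 1
    return pair_counts


# The four rows shown on the slide, plus two made-up rows so that
# at least one pair shows up in the result.
records = [
    ("Portal 2", "Games", "Michael M.", "Purchase"),
    ("FLWR Plant Food", "Garden", "Aris G.", "View"),
    ("Chase the Rabbit", "Games", "Michael M.", "View"),
    ("Portal 2", "Games", "Orestis K.", "Purchase"),
    ("Chase the Rabbit", "Games", "Michael M.", "Purchase"),  # made up
    ("Chase the Rabbit", "Games", "Orestis K.", "Purchase"),  # made up
]
pair_counts = count_copurchased_pairs(records)
```

A single pass builds the per-customer product sets, so each record is read once. In case B the per-customer sets no longer fit on one machine, which is where the slide's hint about disk access and the distributed setting comes in.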

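Slide 8's advice to parallelize computation can be illustrated on the same task, under the assumption that purchase records are partitioned by customer. The sketch below simulates the nodes sequentially in one process; `partition_of`, `local_pair_counts`, `partitioned_pair_counts`, and the sample data are hypothetical names and values, not part of the slides.

```python
import zlib
from collections import Counter
from itertools import combinations


def partition_of(customer, num_nodes):
    # crc32 is stable across runs; Python's built-in hash() of str is not.
    return zlib.crc32(customer.encode("utf-8")) % num_nodes


def local_pair_counts(purchases):
    """Work done on one node; `purchases` is an iterable of (customer, product)."""
    by_customer = {}
    for customer, product in purchases:
        by_customer.setdefault(customer, set()).add(product)
    counts = Counter()
    for products in by_customer.values():
        counts.update(combinations(sorted(products), 2))
    return counts


def partitioned_pair_counts(purchases, num_nodes):
    # Route every record to the node that owns its customer.
    nodes = [[] for _ in range(num_nodes)]
    for customer, product in purchases:
        nodes[partition_of(customer, num_nodes)].append((customer, product))
    # Nodes are processed one after another here; a real system would run
    # them in parallel and then sum the partial counts.
    total = Counter()
    for node_records in nodes:
        total.update(local_pair_counts(node_records))
    return total


# Made-up purchases of Games products.
purchases = [
    ("Michael M.", "Portal 2"),
    ("Michael M.", "Chase the Rabbit"),
    ("Orestis K.", "Portal 2"),
    ("Orestis K.", "Chase the Rabbit"),
    ("Aris G.", "Portal 2"),
]
result = partitioned_pair_counts(purchases, num_nodes=4)
```

Partitioning by customer keeps all purchases of one customer on one node, so the per-node counts can simply be added and the result does not depend on the number of nodes.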