2. About the Instructor
- Please call me J (just one letter, no period).
- Office hours: 5:00 – 6:00 on the day of the class, in this room.
- In my spare time:
  - President, Early Stage IT – a cloud-based consulting firm
  - Co-founder and CTO, ConnectScholar – a cloud-based web service
  - Co-chair of the Software and Services SIG at TiE-Boston
- In the past:
  - Director of Software Engineering, Fidelity Investments
  - Software Architect, Computervision Corp
  - Prof. of EE @ WPI
3. A SQL test
Table R has columns a and b (the slide shows sample values 20, 30, 40, …).
Explain the difference between
  SELECT b FROM R WHERE a<10 OR a>=10;
and
  SELECT b FROM R;
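One way to explore this question is to run both queries against a tiny table. The sketch below uses Python's built-in sqlite3 module; the three rows are made up for illustration (they are not the values from the slide), and one of them deliberately has a NULL in column a.

```python
import sqlite3

# In-memory database with a hypothetical instance of R(a, b).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO R VALUES (?, ?)",
    [(5, 20), (15, 30), (None, 40)],  # None is stored as SQL NULL
)

# For the NULL row, both comparisons evaluate to unknown, so the WHERE
# clause does not accept the row.
filtered = conn.execute(
    "SELECT b FROM R WHERE a < 10 OR a >= 10 ORDER BY b"
).fetchall()
unfiltered = conn.execute("SELECT b FROM R ORDER BY b").fetchall()

print(filtered)    # [(20,), (30,)]
print(unfiltered)  # [(20,), (30,), (40,)]
```

With no NULLs in column a, the two queries return the same rows; the difference only shows up when a can be NULL.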
4. Another SQL test
Explain the difference between
  SELECT a FROM R, S WHERE R.b = S.b;
and
  SELECT a FROM R WHERE b IN (SELECT b FROM S);
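A small experiment can make the difference visible. The sketch below assumes R has columns (a, b) and S has a single column b; the schemas and rows are hypothetical, chosen so that a value of b appears twice in S.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (a INTEGER, b INTEGER)")
conn.execute("CREATE TABLE S (b INTEGER)")
conn.executemany("INSERT INTO R VALUES (?, ?)", [(1, 10), (2, 20)])
conn.executemany("INSERT INTO S VALUES (?)", [(10,), (10,), (30,)])

# The join emits one output row per matching (R, S) pair.
join_rows = conn.execute("SELECT a FROM R, S WHERE R.b = S.b").fetchall()

# The IN form tests membership, so each row of R appears at most once.
in_rows = conn.execute(
    "SELECT a FROM R WHERE b IN (SELECT b FROM S)"
).fetchall()

print(join_rows)  # [(1,), (1,)]
print(in_rows)    # [(1,)]
```

The join repeats a row of R once for every matching row of S, while the subquery form does not. If S also had a column named a, the first query would additionally be ambiguous.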
5. About CS 542
- CS 542 will
  - Build on database concepts you already know
  - Provide you tools for separating hype from reality
  - Help you develop skills in evaluating the tradeoffs involved in using and/or creating a database
- CS 542 may
  - Train you to read technical journals and apply them
- CS 542 will not
  - Cover the intricacies of SQL programming
  - Spend much effort on
    - Dynamic SQL
    - Stored Procedures
    - Interfaces with application programming languages
    - Connectors, e.g., JDBC, ODBC
6. What’s so fun about databases?
- Traditional database courses talked about
  - Employee records
  - Bank records
- Now we talk about
  - Web search
  - Data mining
  - The collective intelligence of tweets
  - Scientific and medical databases
7. How much data can a database hold?
- The biggest OLTP databases
  - 2001: 1.1 – 10.3 TB
  - 2003: 9.1 – 29.2 TB
  - 2005: 17.7 – 100.4 TB
  - 2010: ~2.5 PB
- The trend will continue
- Very large databases bring new, unique challenges
- CS 542 is about the challenges of big databases
8. DBMS Architecture
- Applications can be in any programming language
- The DBMS presents a programmatic interface to the applications
  - Typically SQL
  - Classic SQL (without recursive queries or procedural extensions) is not a Turing-complete programming language
  - Every such SQL statement is guaranteed to complete
9. Databases are a strategic asset
- The value of a company is defined by its data, for example: LinkedIn, Facebook, eBay, Amazon, Google
- They who have the data have the power
  - Some examples?
- The power of the data comes from
  - Its quality
  - Its consistency
  - Its ease of retrieval
  - What else?
- CS 542 is about creating & enhancing the power of data
10. Course Plan
- Course Plan
- Course Policies and Grading Rubric
- Other materials:
  - Prof. Shivnath Babu, Duke Univ.
  - Prof. Ullman, Stanford Univ.
  - Prof. Ramakrishnan, U. of Wisconsin
  - Published papers, cited in the notes
11. Computing Resource Options
- On your laptop
  - Download MySQL from the web
  - Eclipse IDE if desired
  - Microsoft Access or SQL Server
- WPI Computer Science Resources
  - MySQL or Oracle
- Amazon AWS
  - $100 credit per student; send me an email to get an authorization code
  - For use with RDS or MapReduce
- Google App Engine
  - BigTable is available under the name DataStore, free up to a limit
13. Overview of Data Models
- A data model describes the structure of the data, the operations on it, and any constraints on that structure
14. Basics of the Relational Model
- A Table is referred to as a Relation
- Each Row is a tuple; each Column is an attribute
- Each attribute is constrained to be a specific type
  - May also have value constraints
  - May also have uniqueness constraints
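The constraints on attributes can be sketched with a small table definition. The Student table, its columns, and the sample rows below are hypothetical, and SQLite is used only because it ships with Python. Note that SQLite enforces declared column types loosely, so the example focuses on the value (CHECK) and uniqueness (UNIQUE) constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE,                 -- uniqueness constraint
        gpa   REAL CHECK (gpa BETWEEN 0.0 AND 4.0)  -- value constraint
    )
""")
conn.execute("INSERT INTO Student VALUES (1, 'ann@example.edu', 3.7)")

def try_insert(row):
    """Attempt an insert; report whether the DBMS accepted the tuple."""
    try:
        conn.execute("INSERT INTO Student VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

dup_email = try_insert((2, "ann@example.edu", 3.0))  # violates UNIQUE
bad_gpa = try_insert((3, "bob@example.edu", 5.0))    # violates CHECK
valid = try_insert((4, "cy@example.edu", 2.9))

print(dup_email, bad_gpa, valid)  # rejected rejected ok
```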
15. More on Relations
- A Relation is a set, not a list
  - Order of tuples is irrelevant
- It’s common to add/modify/delete rows
  - Not so common to add/delete columns
- When you modify a relation, the old version is replaced by the new version
  - At any time, the relation only has “the current instance”
  - Almost impossible to get the state back to prior versions
  - Why is this so hard?
16. Keys of Relations
- A key uniquely identifies a tuple
  - Single-column keys
  - Multi-column keys
- A relation can have more than one key (more than one way to identify a tuple)
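As an illustration of single-column and multi-column keys on the same relation, the sketch below declares a hypothetical Course table with two keys: the pair (dept, number) and the single column crn. The table, column names, and rows are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Course (
        dept   TEXT    NOT NULL,
        number INTEGER NOT NULL,
        title  TEXT,
        crn    INTEGER NOT NULL UNIQUE,  -- single-column key
        PRIMARY KEY (dept, number)       -- multi-column key
    )
""")

def try_insert(row):
    """Attempt an insert; report whether a key constraint blocked it."""
    try:
        conn.execute("INSERT INTO Course VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    try_insert(("CS", 542, "Databases", 1001)),  # first tuple
    try_insert(("CS", 542, "Duplicate", 1002)),  # same (dept, number)
    try_insert(("EE", 542, "Circuits", 1001)),   # same crn
    try_insert(("EE", 542, "Circuits", 1003)),   # both keys are fresh
]
print(results)  # ['ok', 'rejected', 'rejected', 'ok']
```

Either key on its own is enough to identify a tuple, which is why a duplicate in either one is refused.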
17. Defining a Relational Schema in SQL
- Data Definition Language
  - Equivalent of declaration statements in C or Java
  - Look these up for the database of your choice:
    - CREATE TABLE
    - DROP TABLE
    - ALTER TABLE
- Data Manipulation Language
  - Equivalent of programming constructs
  - Will be covered next week
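The three DDL statements can be tried end to end in a few lines. This is a minimal sketch using SQLite through Python's sqlite3 module; the Employee table and its columns are hypothetical, and the exact ALTER TABLE capabilities differ between database products, so check the manual for the database of your choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def columns(table):
    """Column names of a table, read from SQLite's PRAGMA table_info."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

def tables():
    """Names of all tables currently defined in this database."""
    query = "SELECT name FROM sqlite_master WHERE type = 'table'"
    return [row[0] for row in conn.execute(query)]

# CREATE TABLE declares the schema.
conn.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
after_create = columns("Employee")

# ALTER TABLE changes the schema of an existing relation.
conn.execute("ALTER TABLE Employee ADD COLUMN salary REAL")
after_alter = columns("Employee")

# DROP TABLE removes the relation and its schema.
conn.execute("DROP TABLE Employee")
after_drop = tables()

print(after_create)  # ['id', 'name']
print(after_alter)   # ['id', 'name', 'salary']
print(after_drop)    # []
```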
22. The Algebra of Data Manipulation (p1)
- Set Operations on Relations
  - Union, Intersection, Difference
  - The tuples must have the same schema
- Subsetting: selection, projection
  - $\sigma_C(R)$: selection of R for condition C yields a subset of rows
  - $\pi_{A_1, A_2, \dots, A_n}(R)$: projection of R for attributes $A_1, A_2, \dots, A_n$ yields a subset of columns
- Quasi-multiplication operators – also known as JOIN
- Renaming of tables or their attributes
  - $\rho_{a/b}(R)$: rename column b in R to a
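The operators on this slide can be sketched in a few lines of Python by modelling a relation as a set of tuples, where each tuple is a frozenset of (attribute, value) pairs. The helper names (relation, select, project, rename) and the sample data are assumptions for illustration, not part of any library.

```python
def relation(*rows):
    """Build a relation (a set of tuples) from dictionaries."""
    return {frozenset(row.items()) for row in rows}

def select(rel, cond):
    """Selection: keep the tuples for which cond(tuple-as-dict) is true."""
    return {row for row in rel if cond(dict(row))}

def project(rel, attrs):
    """Projection: keep only the named attributes; duplicates collapse."""
    return {frozenset((k, v) for k, v in row if k in attrs) for row in rel}

def rename(rel, new, old):
    """Renaming: attribute `old` becomes `new` in every tuple."""
    return {frozenset((new if k == old else k, v) for k, v in row) for row in rel}

R = relation({"a": 5, "b": 20}, {"a": 15, "b": 30}, {"a": 25, "b": 30})
S = relation({"a": 15, "b": 30}, {"a": 7, "b": 40})

small = select(R, lambda t: t["a"] < 10)  # subset of rows
b_values = project(R, {"b"})              # subset of columns
union = R | S                             # set operations need the
intersection = R & S                      # same schema on both sides
difference = R - S
renamed = rename(S, "c", "b")

print(len(small), len(b_values), len(union), len(intersection), len(difference))
# 1 2 4 1 2
```

Because a relation here is a true set, projecting R onto b yields two tuples, not three. SQL keeps duplicates unless DISTINCT is requested, so this sketch follows the algebra's set semantics, not SQL's default behavior.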
23. The Algebra of Data Manipulation (p2)
- See treatment in Wikipedia.
- Focus on natural-joins, θ-joins, semi-joins and outer-joins
24. The Algebra of Data Manipulation (p3)
- Relational Algebra allows you to combine the primitives
  - SELECT b FROM R WHERE a<10 OR a >= 10;
    $\pi_b(\sigma_{a < 10 \text{ OR } a \ge 10}(R))$
  - SELECT a FROM R WHERE b IN (SELECT b FROM S);
    $\pi_a(\sigma_{b \text{ IN } \pi_b(S)}(R))$
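To see how the primitives compose, the sketch below evaluates the second expression step by step with Python set comprehensions and compares it with what SQLite returns for the corresponding query. The sample instances of R(a, b) and S(b) are hypothetical.

```python
import sqlite3

R = {(1, 10), (2, 20), (3, 10)}  # tuples (a, b)
S = {(10,), (30,)}               # tuples (b,)

# Innermost first: project S onto b.
s_b = {b for (b,) in S}
# Select the tuples of R whose b is in that projection.
selected = {(a, b) for (a, b) in R if b in s_b}
# Project the selected tuples onto a.
algebra_result = {a for (a, _b) in selected}

# The same query in SQL, collected into a set for comparison.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (a INTEGER, b INTEGER)")
conn.execute("CREATE TABLE S (b INTEGER)")
conn.executemany("INSERT INTO R VALUES (?, ?)", sorted(R))
conn.executemany("INSERT INTO S VALUES (?)", sorted(S))
sql_result = {
    a for (a,) in conn.execute("SELECT a FROM R WHERE b IN (SELECT b FROM S)")
}

print(algebra_result)  # {1, 3}
print(sql_result)      # {1, 3}
```

The comparison is done on sets because the algebra works on sets, whereas the SQL query could return the same value of a more than once.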
25. Next meeting January 24
- Chapter 6
- Sections 5.3 and 5.4
- Due on 1/24: a proposal for your presentation topic
  - No more than 1 page, no less than 300 words
  - Include an initial bibliography
  - Will not be graded independently; feedback will be provided
  - Will feed into your presentation grade