Database Systems - A Historical Perspective
Investigate the history of database systems, from the early days to the cloud era.
Covers RDBMS (SQL), NoSQL and other systems.
Topics Covered
* Historical databases
* Relational databases
* Non-relational databases
* Future directions
4. Historical Databases (No Database)
All data is stored in memory
It's a start
✔ Fast
✔ Store anything in any format
✖ No persistent or durable storage
5. Historical Databases (Flat File)
Ted Scott ▫ $100 ▫ Apple ☷ Ai Joe ▫ $900 ▫ Peach ☷
▫ = field separator
☷ = record separator
Text between separators (e.g. "Ted Scott") = field value
✔ Persistent
✔ Store anything (records can be different)
✖ Low-level access, programmer needed
✖ Complex queries are hard and slow
➤ Today: for small data sets in some domains
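The low-level access a flat file demands can be sketched as follows. This is a minimal illustration, assuming ▫ and ☷ are the literal separator characters from the example above; the function name `parse_flat_file` is hypothetical.

```python
FIELD_SEP = "▫"   # field separator, taken from the example record line
RECORD_SEP = "☷"  # record separator, taken from the example record line


def parse_flat_file(raw):
    """Split raw flat-file text into records, each a list of field values."""
    records = []
    for chunk in raw.split(RECORD_SEP):
        if not chunk.strip():
            continue  # skip the empty tail after the last record separator
        records.append([field.strip() for field in chunk.split(FIELD_SEP)])
    return records


raw = "Ted Scott ▫ $100 ▫ Apple ☷ Ai Joe ▫ $900 ▫ Peach ☷"
records = parse_flat_file(raw)

# Even a simple "query" means scanning every record by hand.
big_spenders = [r[0] for r in records if int(r[1].lstrip("$")) > 500]
```

Every question about the data needs its own loop like the last line, which is why complex queries are hard and slow.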
6. Historical Databases (Hierarchical)
CTO
╱ ╲
Head1 Head2
╱ ╲
Mngr1 Mngr2
✔ Defined structure
✔ Faster than flat file
✖ Navigation through the hierarchy only (up-down)
✖ "Programmer perspective" needed
➤ Today: LDAP, Active Directory
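Up-down navigation can be sketched with a parent map. This is an illustration only: it assumes both managers report to Head1 (the diagram does not make this explicit), and the helper names are hypothetical.

```python
# child -> parent links; the assignment of Mngr1/Mngr2 to Head1 is an assumption
parent = {"Head1": "CTO", "Head2": "CTO", "Mngr1": "Head1", "Mngr2": "Head1"}


def path_to_root(node):
    """Walk upward from a node until a node without a parent is reached."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def children(node):
    """Walk one level downward: all nodes whose parent is the given node."""
    return sorted(child for child, p in parent.items() if p == node)
```

Getting from Mngr1 to Head2 means going up to CTO and back down. The model offers no sideways shortcut.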
7. Historical Databases (Navigational)
John ── Alice ── Maggie Rob
│ │ │ │
Richard ── Scott Susie ── Nancy
✔ Relaxed navigation
✔ Very fast
✖ Still pre-determined navigation (no ad hoc queries)
✖ "Programmer perspective" needed
➤ Today: IBM Information Management System v15
9. Relational Database Management System (RDBMS)
E. F. Codd in 1970 (IBM)
Relational model of data
Based on formal (math) rules
Optimal database design (normal forms, NF)
Data access optimization
User friendly
Very popular
MySQL, Oracle, MS SQL, Sybase, MS Access, etc.
12. Structured Query Language (SQL)
SQL = Structured Query Language
ANSI Standard
Declarative language
Focus on what to do, not how to do it
User friendly
Abstractions for non-programmers
English-like language
Pure SQL applications (MS Access)
Not fancy, but no programming needed
13. Structured Query Language (Table Operations)
Create table
CREATE TABLE families (f_name VARCHAR(50), s_name VARCHAR(50), child_count INT);
Modify table (add column)
ALTER TABLE families ADD child_name VARCHAR(50);
Delete table
DROP TABLE families;
DROP TABLE = NoSQL :)
14. Structured Query Language (Data Operations)
Insert new data
INSERT INTO families VALUES ('Philip', 'Zimmer', 3);
Query for data
SELECT f_name, s_name FROM families WHERE child_count > 2;
Modify existing data
UPDATE families SET f_name = 'Jonas' WHERE f_name = 'Jhn';
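The table and data operations can be tried end to end with an in-memory SQLite database. This is a sketch under assumptions: SQLite stands in for any RDBMS, the third column is assumed to be `child_count`, and the second row is made up so the UPDATE has something to change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Table operation: create the table (column names are an assumption).
conn.execute(
    "CREATE TABLE families (f_name TEXT, s_name TEXT, child_count INTEGER)"
)

# Data operations: insert, update, then query.
conn.execute("INSERT INTO families VALUES (?, ?, ?)", ("Philip", "Zimmer", 3))
conn.execute("INSERT INTO families VALUES (?, ?, ?)", ("Jhn", "Vogler", 1))
conn.execute("UPDATE families SET f_name = 'Jonas' WHERE f_name = 'Jhn'")

large_families = conn.execute(
    "SELECT f_name, s_name FROM families WHERE child_count > 2"
).fetchall()
first_names = conn.execute(
    "SELECT f_name FROM families ORDER BY f_name"
).fetchall()
```

The queries state what is wanted. How the rows are located is left to the database engine.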
15. Structured Query Language (Transaction)
Transaction
Multiple operations treated as a single unit of work
Either all operations succeed or all fail
Example
BEGIN TRANSACTION;
INSERT INTO families VALUES ('Philip', 'Zimmer', 3);
INSERT INTO families VALUES ('Hans', 'Vogler', 347);
COMMIT;
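The all-or-nothing behavior can be observed with SQLite. This is a minimal sketch: the NOT NULL constraints, the second pair of rows, and the helper name `insert_pair` are assumptions added to force a failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE families ("
    "f_name TEXT NOT NULL, s_name TEXT NOT NULL, child_count INTEGER)"
)


def insert_pair(conn, first, second):
    """Insert two rows as one unit of work; return True only if both succeed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO families VALUES (?, ?, ?)", first)
            conn.execute("INSERT INTO families VALUES (?, ?, ?)", second)
        return True
    except sqlite3.IntegrityError:
        return False


ok = insert_pair(conn, ("Philip", "Zimmer", 3), ("Hans", "Vogler", 347))
# The second row violates NOT NULL, so the first row of this pair is undone too.
failed = insert_pair(conn, ("Anna", "Meier", 2), (None, "Kurz", 1))
row_count = conn.execute("SELECT COUNT(*) FROM families").fetchone()[0]
```

After the failed pair, the table still holds only the two rows from the successful transaction.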
16. ACID Model
ACID defines who sees what changes and when
ACID transaction control properties
Atomic: all operations succeed, or all are rolled back to the prior state
Consistent: the database is in a correct state when the transaction finishes
Isolated: transactions do not disturb or affect one another
4 isolation levels (speed vs consistency)
Durable: committed results are permanent, even after a failure
18. RDBMS Drawbacks
Scaling is hard (ACID)
Expensive
'Free' solutions are not mature enough for 9...9% availability
Non-structured data is hard to store
NoSQL to the rescue
For the majority of uses, an RDBMS is good enough
21. Distributed Databases (Sharding and Federation)
Sharding
Break data into smaller chunks by key
Store chunks on different servers
Federation
Databases by domain functions
No single monolith database
Query impact (linking tables)
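Sharding by key can be sketched with a hash of the key choosing the server. The server names, the in-memory `storage` dictionaries, and the helper names below are hypothetical. Modulo hashing is only one of several possible placement schemes.

```python
import hashlib

SHARDS = ["db-server-0", "db-server-1", "db-server-2"]  # hypothetical servers

# One dictionary per shard stands in for a separate database server.
storage = {name: {} for name in SHARDS}


def shard_for(key):
    """Map a key to a shard using a stable hash (same key, same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


def put(key, value):
    storage[shard_for(key)][key] = value


def get(key):
    return storage[shard_for(key)].get(key)


for family in ["Zimmer", "Vogler", "Meier", "Kurz"]:
    put(family, {"s_name": family})
```

Lookups by key touch one shard only. A query that has to link rows across keys may need to visit every shard, which is the query impact noted above.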
22. Data Warehouse
Current and historical data
Store structured data (schema)
Query-focused (business analytics)
Large and central data store
23. Data Mart
Specific views by business departments
Based on data warehouse
Multiple data marts, not a single monolith
More summarized than data warehouse
24. Data Lake
Central location for all data
Store raw data (no schema)
Purpose of data is not defined
Data science
26. Data Mesh
Architectural pattern
Data ownership and distribution
Analytical data (optimizing the business)
Historical and aggregated view
Operational data (running the business)
Current and transactional state
28. NoSQL Databases
Schema/structure definition is optional
Store anything (mix data in collections)
Need to know major use cases before design
Performance
Very good for expected use cases
Bad for unexpected use cases
Varied transaction support (eventual consistency, quorum)
Query language complexities
Scalable distributed systems
29. Consistency Models
Defines when a reader sees a change to the system
Weak
Reader may or may not see the change, possibly never
Eventual
Reader will see the change eventually
Strong
Reader sees the change immediately
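The difference between strong and eventual reads can be shown with a toy primary/replica pair. This is a sketch, not a real replication protocol: the class and method names are hypothetical, and replication is triggered by hand through `sync()`.

```python
class ReplicatedStore:
    """Toy primary/replica pair; the replica catches up only when sync() runs."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # writes not yet copied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read_strong(self, key):
        # Reads the primary, so the latest write is always visible.
        return self.primary.get(key)

    def read_eventual(self, key):
        # Reads the replica, which may lag behind the primary.
        return self.replica.get(key)

    def sync(self):
        # Apply pending writes in order, then clear the backlog.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()


store = ReplicatedStore()
store.write("balance", 100)
before_sync = store.read_eventual("balance")  # replica has not caught up yet
store.sync()
after_sync = store.read_eventual("balance")
```

If `sync()` were never guaranteed to run, `read_eventual` would offer only weak consistency.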
30. CAP Theorem
Eric Brewer (first stated ~1998, presented in 2000)
CAP theorem (Reliability)
Consistency: a read receives the most recent data or an error
Availability: a request receives a (non-error) response with (maybe old) data
Partition tolerance: system operates when the network is not reliable
Choose two (but in practice P is a must)
Some systems support configurable CAP modes
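One common way to make the trade-off configurable is a read/write quorum over N replicas. The check below is a sketch of the standard overlap rule R + W > N; the function name is hypothetical and no particular product's settings are implied.

```python
def quorum_overlaps(n, r, w):
    """True when every read quorum shares at least one replica with every write quorum.

    n: number of replicas, r: replicas contacted per read,
    w: replicas that must acknowledge a write.
    """
    return r + w > n


# With 3 replicas: majority reads and writes overlap, single-replica ones do not.
consistent_leaning = quorum_overlaps(3, 2, 2)
available_leaning = quorum_overlaps(3, 1, 1)
```

Smaller quorums answer faster and tolerate more unreachable replicas, at the price of possibly stale reads.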
31. BASE Model
Counterpart to ACID, but for NoSQL
BASE model properties
Basically Available: system guarantees availability
Soft state: system state may change over time, even with no input
Eventual consistency: system becomes consistent over time, provided no new input is received
33. NoSQL Databases (Key-Value)
123 ↠ firstName = "Arthur" ⌁ surName = "Legend"
8874 ↠ color = "Black" ⌁ make = "Ford"
Very Fast
Simple to use
Access by keys only
Caches (Infinispan, Redis, Memcached, Ignite, etc.)
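The key-only access pattern can be sketched with a thin wrapper over a dictionary. The class name and methods are hypothetical and do not mirror the API of any product listed above.

```python
class KeyValueStore:
    """Minimal in-memory key-value store: every operation goes through the key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Returns True if the key existed, False otherwise.
        return self._data.pop(key, None) is not None


store = KeyValueStore()
store.put(123, {"firstName": "Arthur", "surName": "Legend"})
store.put(8874, {"color": "Black", "make": "Ford"})
car = store.get(8874)
```

There is deliberately no way to ask for "all black cars". Without a key, the only option is to scan every value.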
34. NoSQL Databases (Document I.)
Store JSON structured data
Documents can have different fields
{ ⌲ Document 1 Start
name: { ⌲ Complex field
first: "John" ⌲ Simple field
last: "Dee"
}
birth: "2/2/1982" ⌲ Document 1 field (only)
} ⌲ Document 1 End
{ ⌲ Document 2 Start
fullName : ⌲ Simple field
"James Doe"
} ⌲ Document 2 End
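Querying documents whose fields differ can be sketched with plain dictionaries. The dotted-path helper `get_path` and the `find` function are hypothetical; real document stores provide their own query syntax.

```python
documents = [
    {"name": {"first": "John", "last": "Dee"}, "birth": "2/2/1982"},
    {"fullName": "James Doe"},
]


def get_path(doc, path):
    """Follow a dotted path such as 'name.first'; return None if any part is missing."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(docs, path, value):
    """Return the documents whose field at the given path equals value."""
    return [doc for doc in docs if get_path(doc, path) == value]


johns = find(documents, "name.first", "John")
```

A document without the queried field simply does not match. No schema change is needed to store it next to the others.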
35. NoSQL Databases (Document II.)
Effective document (text) store
Free-text search engine
Documents are JSON based
Various query formats
Varied transaction support (often single-document only)
Couchbase, Elasticsearch, MongoDB, etc.
36. NoSQL Databases (Wide Column I.)
Rows (keys) with many (~1000) columns
Write optimized (call logs, bank transactions, etc.)
SQL-like query language
Limited ACID support
Heavyweight systems
HBase, Cassandra, etc.
38. NoSQL Databases (Graph I.)
Based on directed graph
Nodes, properties and relations
Replacement for complex relational models
High-level query language
ACID transactions
Neo4j (Cypher), GraphDB (SPARQL), etc.
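Nodes, properties, and relations can be sketched with a dictionary and an edge list. The people, roles, and relation names below are made up for illustration, and the breadth-first traversal stands in for what a graph query language would express declaratively.

```python
from collections import deque

# Nodes with properties (all data here is illustrative).
nodes = {
    "alice": {"role": "engineer"},
    "bob": {"role": "manager"},
    "carol": {"role": "analyst"},
    "dave": {"role": "designer"},
}

# Directed, labeled relations: (source, relation, destination).
edges = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "carol"),
    ("carol", "KNOWS", "alice"),  # cycle back to the start
    ("alice", "WORKS_WITH", "dave"),
]


def neighbors(node, relation):
    return [dst for src, rel, dst in edges if src == node and rel == relation]


def reachable(start, relation):
    """Breadth-first walk along one relation type, excluding the start node."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors(node, relation):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


known = reachable("alice", "KNOWS")
```

The same "friends of friends" question in a relational model would need one self-join per hop.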
40. NoSQL Databases (Time Series I.)
Data points (measurements) over a time interval
Regular intervals (metrics)
Irregular intervals (events)
Data is more useful as aggregate (continuous queries)
SQL-like query language with time-related additions
No transaction concept
Primary key is a high-precision timestamp
Data modification is rare (append only)
InfluxDB, Kdb+, Prometheus, etc.
41. NoSQL Databases (Time Series II.)
Example measurement:
weather,location=us-midwest temperature=82 144488740
   |    ─────────┬───────── ──┬───────────     |
measurement     tag         field          timestamp
measurement ≈ table
tag ≈ indexed field
field ≈ not indexed field
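Splitting the example measurement into its parts can be sketched as below. This parser is a simplification written for illustration: it assumes single spaces between sections, numeric field values only, and no escaped characters, so it is not a complete implementation of any product's line format.

```python
def parse_point(line):
    """Split 'measurement,tag=... field=... timestamp' into its four parts."""
    head, fields_part, timestamp = line.split(" ")
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {}
    for pair in fields_part.split(","):
        name, value = pair.split("=", 1)
        fields[name] = float(value)  # assumption: every field is numeric
    return {
        "measurement": measurement,
        "tags": tags,
        "fields": fields,
        "timestamp": int(timestamp),
    }


point = parse_point("weather,location=us-midwest temperature=82 144488740")
```

Tags end up as strings suited to indexing and filtering, while fields carry the measured values that are later aggregated.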
42. NoSQL Databases (Computing Grid)
Calculations performed in a computing grid
Move program logic to data, not the other way around
Ignite, Infinispan, etc.
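Moving the logic to the data can be sketched by sending a function to each partition and collecting only the small results. The node names, the sample numbers, and the helper names are hypothetical; in this toy version the "nodes" are just dictionary entries in one process.

```python
# Each entry stands in for data held on a separate grid node.
partitions = {
    "node-a": [3, 5, 8],
    "node-b": [13, 21],
    "node-c": [34],
}


def run_on_node(node, task):
    """Pretend to execute task next to the data and return only its result."""
    return task(partitions[node])


def grid_sum():
    # Ship the function to every node; only one number per node travels back.
    partials = [run_on_node(node, sum) for node in partitions]
    return sum(partials)


total = grid_sum()
```

Three partial sums cross the network here, not six raw values. The gap widens as partitions grow.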
43. NoSQL Drawbacks
Operational/developer experience needed
Complex infrastructure
Planned usage drives database design
Data de-normalization might be needed (!)
ACID/BASE compliance varies
Complex queries can be hard
Large distributed systems are always in a state of partial failure