1. BTM 382 Database Management
Chapter 2: Data models
Chapter 12-12: CAP
Chapter 14-2a: Hadoop
Chitu Okoli
Associate Professor in Business Technology Management
John Molson School of Business, Concordia University, Montréal
2. Structure of BTM 382 Database Management
Week 1: Introduction and overview
ch1: Introduction
Weeks 2-6: Database design
ch3: Relational model
ch4: ER modeling
ch6: Normalization
ERD modeling exercise
ch5: Advanced data modeling
Week 7: Midterm exam
Weeks 8-10: Database programming
ch7: Intro to SQL
ch8: Advanced SQL
SQL exercises
Weeks 11-13: Database management
ch2,12,14: Data models
ch13: Business intelligence and data warehousing
ch9,14,15: Selected managerial topics
3. Review of Chapters 2, 12, 14:
Data models
What is a data model?
How have data models developed over the
years?
What is the Object-Oriented Data Model
(OODM), and when is it useful?
What is Big Data, and how does NoSQL
resolve the major Big Data challenges?
Which data models should we use for which
situations?
5. What is a model?
A model is a simplified way to describe or explain a
complex reality
A model helps people communicate and work simply
yet effectively when talking about and manipulating
complex real-world phenomena
8. Importance of Data Models
Communication tool
Give an overall view of the database
Organize data for various users
Serve as an abstraction for creating a
well-designed database
11. The Relational Model
Uses key concepts from mathematical relations (tables)
“Relational” in “relational model” means “tables” (mathematical
relations), not “relationships”
Table (relations)
Intersections of
rows (various data types) and
columns (same data type)
Relations have well defined methods (queries) for combining
their data members
Selecting (reading) and joining (combining) data is defined based
on mathematical principles
Relational database management system (RDBMS)
Relations were originally too advanced for 1970s computing power
As computing power increased, simplicity of the model prevailed
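Selecting and joining can be sketched with Python's built-in sqlite3 module. This is only an illustration: the customer and invoice tables and their columns are hypothetical, not taken from the textbook.

```python
import sqlite3

# In-memory database; the tables, columns and rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoice (inv_id INTEGER PRIMARY KEY,
                          cust_id INTEGER REFERENCES customer(cust_id),
                          total REAL);
    INSERT INTO customer VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO invoice VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0);
""")

# Selecting: read the rows of one table that satisfy a condition.
big_invoices = conn.execute(
    "SELECT inv_id, total FROM invoice WHERE total >= 40 ORDER BY inv_id"
).fetchall()

# Joining: combine rows of two tables that share a cust_id value.
joined = conn.execute("""
    SELECT c.name, i.total
    FROM customer AS c JOIN invoice AS i ON c.cust_id = i.cust_id
    ORDER BY i.inv_id
""").fetchall()

print(big_invoices)
print(joined)
```

Each column holds one data type (cust_id is always an integer), while each row mixes types, which matches the description of a table above.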
12. The Entity Relationship Model
Enhancement of the relational model
Relations (tables) become entities
Very detailed specification of relationships and their properties
Entity relationship diagram (ERD)
Uses graphic representations to model database components
Many variations for notation exist
In this class, we use the Crow’s Foot notation
14. The Object-Oriented Data Model (OODM)
Tries to reconcile the ER model with
object-oriented programming (OOP)
The ER model’s view of data (tables) and programmers’ view of data (objects
in OOP) are completely different
This mismatch can sometimes make database programming painful,
especially for very complex data structures
An OODM uses OOP concepts to store data
Objects represent nouns (entities or records)
Objects have attributes (properties or fields) with values (data)
Objects have methods (operations or functions)
Classes group similar objects using a hierarchy and inheritance
In an OODBMS, data retrieval and storage closely mirror the data
structures that programmers use, and so programming complex objects
is much easier than with the ER model
More advanced forms support the Extended Relational Data Model,
Object/Relational DBMS, and XML data structures
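The OOP concepts listed above can be sketched in plain Python. The Vehicle and Truck classes are hypothetical and no real OODBMS is involved. The sketch only shows the kind of nested object that an OODBMS would store as-is and that a table-based design would have to split across several tables.

```python
class Vehicle:
    """Class: groups similar objects; attributes hold the data."""

    def __init__(self, plate, odometer_km):
        self.plate = plate
        self.odometer_km = odometer_km

    def drive(self, km):
        """Method: an operation kept together with the object's data."""
        self.odometer_km += km
        return self.odometer_km


class Truck(Vehicle):
    """Subclass: inherits the attributes and methods of Vehicle."""

    def __init__(self, plate, odometer_km, cargo):
        super().__init__(plate, odometer_km)
        # A nested list of items: awkward to flatten into a single table row.
        self.cargo = cargo

    def cargo_weight(self):
        return sum(item["kg"] for item in self.cargo)


truck = Truck("ABC-123", 1000,
              [{"item": "crates", "kg": 300}, {"item": "pallets", "kg": 450}])
truck.drive(120)
print(truck.odometer_km, truck.cargo_weight())
```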
20. Big Data
Volume
Huge amounts of data (terabytes and petabytes),
especially from the Internet
Velocity
Organizations need to process the huge amounts of
data rapidly, just as fast as with smaller databases
Variety
Many different types of data, much of it unstructured
and even changing in structure
21. How do you handle Big Data?
Where RDBMSs run into trouble
1. Solution: Scale up
Use more powerful, expensive servers
But RDBMSs are very computing intensive
Big data would require much faster, more capable,
more expensive computers, and even that’s not good
enough for big data
2. Solution: Scale out
Use many cheap distributed servers
But RDBMS is slow with distributed processing
Consistency is the biggest problem: guaranteeing
consistency (which RDBMS is great at) is slow
Slow infrastructure isn’t good enough for big data
23. NoSQL databases to the Big Data rescue
“NoSQL” means:
Non-relational or non-RDBMS
Also “Not only SQL”—a few in fact do support SQL
It is not one model; it is many different models that are not
relational data models
Scale out (many cheap distributed servers) instead of scale up
High scalability
Support distributed database architectures
High availability
Rapid performance for big data, including unstructured and sparse data
Fault tolerance
Continue to work even if some servers in the cluster fail
Emphasis is on high performance (speed) rather than
transaction consistency
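One common way to scale out is to spread keys across servers by hashing and to keep a second copy on a neighbouring server. The sketch below assumes that scheme. The server names and helper functions are hypothetical, and real NoSQL products use more elaborate placement strategies.

```python
import hashlib

SERVERS = ["node-a", "node-b", "node-c"]  # hypothetical server names


def shard_for(key, servers):
    """Pick a server from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


def replicas_for(key, servers, copies=2):
    """Primary server plus the next one(s) in the list, so each key has copies."""
    start = servers.index(shard_for(key, servers))
    return [servers[(start + i) % len(servers)] for i in range(copies)]


def read_targets(key, servers, failed):
    """Servers that can still answer for this key when some have failed."""
    return [s for s in replicas_for(key, servers) if s not in failed]


for key in ["user:1", "user:2", "user:3"]:
    print(key, replicas_for(key, SERVERS),
          read_targets(key, SERVERS, failed={"node-a"}))
```

Because every key lives on two different servers, losing any single server still leaves at least one copy to read from, which is the fault tolerance described above.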
24. Types of NoSQL databases
Image sources:
https://www.linkedin.com/pulse/20140823125259-38485481-nosql-databases-where-i-can-use?trk=sushi_topic_posts
http://www.monitis.com/blog/2011/05/22/picking-the-right-nosql-database-tool/
Also see:
Picking the Right
NoSQL Database Tool
25. Disadvantages of NoSQL
Complex programming is often required
“NoSQL” means you lose the ease-of-use and structural
independence of SQL
There is often no built-in implementation of relationships in
the database—you might have to program relationships
yourself in code
Data might sometimes be inconsistent
No guarantee of transaction integrity
Entity integrity and referential integrity not guaranteed
The data you retrieve at any given moment might be
out of date, but it will eventually become consistent
This is the price to pay for rapid performance in a distributed
database
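Programming relationships yourself can look like the sketch below. It models two collections of a hypothetical document store as plain Python dicts and lists; no real NoSQL product or API is used.

```python
# Two "collections" in a hypothetical document store, modeled as Python data.
customers = {
    "c1": {"name": "Ana"},
    "c2": {"name": "Ben"},
}
orders = [
    {"order_id": "o1", "customer_id": "c1", "total": 50.0},
    # Dangling reference: nothing stopped this order from naming a
    # customer that does not exist (no referential integrity).
    {"order_id": "o2", "customer_id": "c9", "total": 20.0},
]


def orders_with_names(orders, customers):
    """Application-side join: look up each order's customer by hand."""
    result = []
    for order in orders:
        customer = customers.get(order["customer_id"])
        # The code, not the database, has to cope with a missing customer.
        name = customer["name"] if customer else None
        result.append((order["order_id"], name, order["total"]))
    return result


report = orders_with_names(orders, customers)
print(report)
```

In a relational database the same result would be a single JOIN, and a foreign key constraint would have rejected order o2 when it was inserted.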
26. The CAP theorem for distributed databases
CAP stands for:
Consistency: All nodes see the same data
Availability: A request always gets a response (success or failure)
Partition tolerance: Even if a network failure cuts off communication
between nodes, the system can still function
A distributed database can guarantee only two of the three
CAP characteristics, not all three at the same time
Over time, it will eventually provide all three,
but it cannot guarantee all three at the same time
Many NoSQL databases are distributed and favor availability, so under
the CAP theorem they typically provide BASE, not ACID
Image source: PRWEB
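The trade-off can be sketched with a toy two-replica store. Everything here is hypothetical: the class names, the "CP" and "AP" mode labels, and the rule that the higher version number wins after a partition are simplifying assumptions, not how any particular product works.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None
        self.version = 0


class TwoNodeStore:
    """Toy store with two replicas; mode is 'CP' or 'AP' (illustrative labels)."""

    def __init__(self, mode):
        self.mode = mode
        self.a, self.b = Replica("a"), Replica("b")
        self.partitioned = False

    def write(self, replica, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned and self.mode == "CP":
            return False  # refuse the write: stay consistent, lose availability
        replica.version += 1
        replica.value = value
        if not self.partitioned:
            # Replicas can talk, so the write is copied immediately.
            other.value, other.version = replica.value, replica.version
        return True  # in AP mode during a partition, the replicas now differ

    def heal(self):
        """Partition ends: the replica with the higher version wins (toy rule)."""
        self.partitioned = False
        newest = max((self.a, self.b), key=lambda r: r.version)
        for r in (self.a, self.b):
            r.value, r.version = newest.value, newest.version


cp = TwoNodeStore("CP")
cp.write(cp.a, "x=1")
cp.partitioned = True
cp_ok = cp.write(cp.a, "x=2")   # rejected, both replicas keep "x=1"

ap = TwoNodeStore("AP")
ap.write(ap.a, "x=1")
ap.partitioned = True
ap_ok = ap.write(ap.a, "x=2")   # accepted, but replica b is now out of date
stale = ap.b.value
ap.heal()                       # replicas agree again after the partition
```

The CP store gives up availability during the partition, while the AP store gives up consistency until the partition heals.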
27. ACID versus BASE
A relational database guarantees the ACID properties:
Atomicity, Consistency, Isolation, Durability
In short, a set of SQL statements (called a transaction) will
either completely work or completely fail, with no halfway success,
and the result will not corrupt the database
A price to pay: results might be somewhat slow
A NoSQL database does not guarantee ACID; it only
guarantees BASE properties:
Basically Available, Soft state, Eventually consistent
In short, at any given moment, not everything might be
consistent, but the database will eventually get consistent
In return, these imperfect results are delivered fast
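The "completely work or completely fail" behavior of an ACID transaction can be sketched with Python's built-in sqlite3 module. The account table, the names and the transfer helper are hypothetical examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO account VALUES ('ana', 100), ('ben', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money in one transaction: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "ana", "ben", 30)        # both updates are committed
failed = transfer(conn, "ana", "ben", 500)   # second update breaks the CHECK
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

In the failed transfer, the credit to ben had already been applied when the debit from ana violated the CHECK constraint. The rollback undoes that credit, so the database never shows the half-finished state.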
33. Which data model should you use?
Hierarchical or network models
Obsolete—no one uses these any longer
Entity-relationship model
Almost always
90% or more of professional database situations
Object-oriented database
When you have very complex data structures, you need rapid
performance, and it helps achieve organizational objectives
Source: Barry & Associates, Inc
When data structures are so complex that organizing data as tables
causes headaches in programming retrieval and storage
NoSQL
When you have vast amounts of unstructured data and you need
rapid performance
When speed is more important than data consistency
Popularity ranking of DBMSs: http://db-engines.com/en/ranking
34. Summary of Chapters 2, 12, 14:
Data models
A data model is an abstract way of thinking about how
data is organized
Although the relational model has become the dominant
data model, it cannot solve all database challenges
The Object-Oriented Data Model is useful for complex
data coupled with object-oriented programming
Big Data is data with high volume, velocity and variety
NoSQL generally handles big data better than relational
databases, but it sacrifices consistency for speed
No single data model is the best for all situations, so we
should understand the pros and cons of each model
36. Sources
Most of the slides are adapted from Database
Systems: Design, Implementation and
Management by Carlos Coronel and Steven Morris.
11th edition (2015) published by Cengage Learning.
ISBN 13: 978-1-285-19614-5
Other sources are noted on the slides themselves