2. Outline (Part 1)
• What are structured, unstructured and semi-structured data?
• DBMS (Database Management System)
• ACID properties
• Distributed DB
• Replicas and Sharding
• What is No-SQL?
• When to use No-SQL and when not to use No-SQL?
3. Outline (Part 2)
• GCP options for database
• What are managed cloud services?
• Cloud Storage
• Cloud SQL
• Cloud Datastore (No-SQL)
• Cloud Bigtable (No-SQL)
4. Structured data
• Can be stored in a DB in tables (rows and columns)
• Tables have relational keys
• High degree of organization
• Database contains a schema
• Schema defines tables, table fields and relationships
5. Example ERD
This ERD represents a one-to-many relationship
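A one-to-many relationship like the one in the ERD can be expressed in a schema with a foreign key. The sketch below uses Python's built-in `sqlite3` module as a stand-in relational DB; the `customer` and `orders` tables and their columns are hypothetical.

```python
import sqlite3

# In-memory database for illustration; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# The schema defines the tables, their fields and the relationship between them.
conn.executescript("""
CREATE TABLE customer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),  -- the "many" side
    item        TEXT NOT NULL
);
""")

conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
    [(1, "pizza"), (1, "salad")],  # one customer, many orders
)

rows = conn.execute(
    "SELECT c.name, o.item FROM customer c "
    "JOIN orders o ON o.customer_id = c.id ORDER BY o.id"
).fetchall()
print(rows)  # [('Alice', 'pizza'), ('Alice', 'salad')]

# The relational key rejects an order that points to a customer who does not exist.
fk_rejected = False
try:
    conn.execute("INSERT INTO orders (customer_id, item) VALUES (99, 'soup')")
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)  # True
```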
6. Unstructured data
• Data has no structure
• Data has no predefined data model
• Does not fit into relational database tables
• Can be represented as a sequence of bytes
• Examples: images, videos, files and emails
7. Unstructured data
NLP, text analysis and data mining provide methods to find patterns of information in unstructured data
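As a very small taste of text analysis on unstructured data, the sketch below counts the most frequent words across a few e-mail bodies. The e-mails and the stop-word list are hypothetical, and real NLP pipelines go far beyond word counts.

```python
import re
from collections import Counter

# Hypothetical e-mail bodies: plain text with no schema or fields.
emails = [
    "Your order has shipped. Track your order online.",
    "Reminder: your invoice is due. Pay your invoice today.",
    "Order confirmed! Thanks for your order.",
]

STOP_WORDS = {"your", "is", "for", "has", "the"}  # assumed, very small stop-word list


def top_terms(texts, n=3):
    """Count word frequencies across texts, ignoring case and stop words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


print(top_terms(emails, 2))  # [('order', 4), ('invoice', 2)]
```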
8. Semi-structured data
• It is a form of structured data, but it does not follow the strict data model of relational tables
• Tags are used to identify certain elements
• Data does not have a rigid structure
• Records that belong to the same class can have different attributes
• Data cannot fit neatly into tables
• Examples: JSON, XML
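The JSON sketch below shows two records of the same hypothetical class ("person") whose keys act as tags but whose attributes differ, including a nested object that would not map to a single table row.

```python
import json

# Two hypothetical records of the same class with different attributes.
doc = """
[
  {"type": "person", "name": "Alice", "email": "alice@example.com"},
  {"type": "person", "name": "Bob", "phones": ["555-0100", "555-0101"],
   "address": {"city": "Cairo"}}
]
"""

people = json.loads(doc)
for person in people:
    # Keys identify each element; the set of keys varies from record to record.
    print(person["name"], sorted(person.keys()))
# Alice ['email', 'name', 'type']
# Bob ['address', 'name', 'phones', 'type']
```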
10. DBMS
• A DBMS makes it possible for end users to create,
read, update and delete data in a database
• The DBMS essentially serves as an interface between
the database and end users or application programs
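To make the create, read, update and delete operations concrete, here is a sketch that talks to SQLite through Python's `sqlite3` module, which plays the role of the interface between the program and the database. The `notes` table is hypothetical.

```python
import sqlite3

# SQLite stands in for a DBMS here; the "notes" table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")

# Create
conn.execute("INSERT INTO notes (body) VALUES (?)", ("first draft",))
# Read
print(conn.execute("SELECT id, body FROM notes").fetchall())  # [(1, 'first draft')]
# Update
conn.execute("UPDATE notes SET body = ? WHERE id = ?", ("final version", 1))
print(conn.execute("SELECT body FROM notes WHERE id = 1").fetchone()[0])  # final version
# Delete
conn.execute("DELETE FROM notes WHERE id = ?", (1,))
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0])  # 0
```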
16. Atomic
• A transaction must be treated as an atomic unit
• Either all of its operations are executed or none are
• As in the bank transfer example, the transaction cannot be partly executed: for example, the money cannot be subtracted from the first account without being added to the other account
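The bank transfer can be sketched with SQLite, whose `sqlite3` connection, used as a context manager, commits on success and rolls back if an exception escapes. The `transfer` helper and the simulated crash between debit and credit are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move money between accounts as one atomic unit (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
            if fail_midway:
                raise RuntimeError("crash after the debit, before the credit")
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
    except RuntimeError as exc:
        print("transaction rolled back:", exc)


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


transfer(conn, "A", "B", 30, fail_midway=True)
print(balances(conn))  # {'A': 100, 'B': 50}  (the debit was undone)
transfer(conn, "A", "B", 30)
print(balances(conn))  # {'A': 70, 'B': 80}
```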
17. Consistency
• A transaction either creates a new and valid state of
data, or, if any failure occurs, returns all data to its
state before the transaction was started
• If the database was consistent before the transaction, then it must be consistent after the transaction
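One way a DB keeps data valid is with constraints. In this SQLite sketch the rule "balances may never go negative" is an assumed example, written as a `CHECK` constraint; the transaction that would break it is rejected and the data returns to its previous state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Consistency rule (assumed for this example): balances may never go negative.
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

try:
    with conn:  # rolls back the whole transaction if any statement fails
        conn.execute("UPDATE account SET balance = balance + 500 WHERE id = 'B'")
        conn.execute("UPDATE account SET balance = balance - 500 WHERE id = 'A'")  # violates CHECK
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

# Back to the previous consistent state: B did not keep the 500.
print(conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall())
# [('A', 100), ('B', 50)]
```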
18. Isolation
• If more than one transaction is executed at the same time, each should run in total isolation from the others
19. Isolation
• Example:
One day you go to a restaurant that has no other customers, you place an order, and you get your order.
The next day you go to the same restaurant when it is full, you place the same order, and you should get the same order as the day before.
20. Isolation
In other words, you should get the same order whether or not other orders are being placed in the restaurant
21. Isolation
So if you execute T1 alone, you get the same result as when you execute T1 while T2 is executing
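The T1/T2 idea can be simulated in memory. DBMSs typically implement isolation with locking or multiversion concurrency control (MVCC); this sketch uses a single `threading.Lock` held for the whole transaction, the simplest locking approach. `Counter` and `transaction` are hypothetical names.

```python
import threading


class Counter:
    """Hypothetical shared data item updated by transactions T1 and T2."""

    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()


def transaction(counter, amount, steps):
    # Holding the lock for the whole transaction serializes it with any other
    # transaction, so no one can interleave between a read and its write.
    with counter.lock:
        for _ in range(steps):
            current = counter.value            # read
            counter.value = current + amount   # write


# T1 alone
alone = Counter()
transaction(alone, 1, 10_000)

# T1 while T2 is executing
shared = Counter()
t1 = threading.Thread(target=transaction, args=(shared, 1, 10_000))
t2 = threading.Thread(target=transaction, args=(shared, 2, 10_000))
t1.start()
t2.start()
t1.join()
t2.join()

print(alone.value)   # 10000
print(shared.value)  # 30000: T1's 10000 plus T2's 20000, no update lost
```

T1 contributes exactly the same amount in both runs, which is the "same result" the isolation slides describe.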
22. Durability
• Guarantees that transactions that have committed will survive permanently
• Changes made to the DB by committed transactions must persist regardless of hardware or software failures
• Durability can be achieved by flushing the
transaction's log records to non-volatile
storage before acknowledging commitment.
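The log-flushing idea can be sketched as follows: each commit appends a log record, forces it to disk with `os.fsync`, and only then acknowledges; after a restart the log is replayed. The log format and the `commit`/`recover` names are hypothetical, and a real DBMS does much more (checksums, checkpoints, group commit).

```python
import json
import os
import tempfile


def commit(log_path, record):
    """Append a transaction's log record and force it to disk before acknowledging."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()             # move Python's buffer to the OS
        os.fsync(log.fileno())  # ask the OS to write it to non-volatile storage
    return "COMMITTED"          # acknowledge only after fsync has returned


def recover(log_path):
    """After a restart, replay the log to rebuild the committed state."""
    state = {}
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            rec = json.loads(line)
            state[rec["key"]] = rec["value"]
    return state


path = os.path.join(tempfile.mkdtemp(), "txn.log")
commit(path, {"txn": 1, "key": "A", "value": 70})
commit(path, {"txn": 2, "key": "B", "value": 80})
print(recover(path))  # {'A': 70, 'B': 80}
```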
23. Distributed database
• It is a database in which the storage devices are not all attached to a common processor.
• It may be stored on multiple computers located in the same physical location, or it may be dispersed over a network of interconnected computers.
24. Distributed Database Management System
• A DDBMS synchronizes all data periodically and ensures that any changes (updates, deletes and additions) performed on data in one place are automatically reflected in data stored elsewhere
• Users will always see data consistent with the data seen by other users
25. Replicas
• Replication is the frequent copying of data in a DB from one computer or server to another
• So that all users share the same level of information
• The result is a distributed DB in which portions of the DB are stored in multiple physical locations and processing is distributed among different nodes
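A minimal in-memory sketch of replication, assuming a primary node that synchronously copies every write to its replicas so that any node can serve reads. Plain dicts stand in for servers, and `Node` and `ReplicatedDB` are hypothetical names.

```python
class Node:
    """One server holding a full copy of the data (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class ReplicatedDB:
    """Primary node that copies every write to its replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def write(self, key, value):
        self.primary.data[key] = value
        for replica in self.replicas:  # copy the change to every other node
            replica.data[key] = value

    def read(self, key, node):
        return node.data.get(key)  # any node can serve reads


primary = Node("primary")
replicas = [Node("replica-1"), Node("replica-2")]
db = ReplicatedDB(primary, replicas)
db.write("user:1", "Alice")
print([db.read("user:1", n) for n in [primary, *replicas]])  # ['Alice', 'Alice', 'Alice']
```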
26. Sharding
• Database sharding is the partitioning of data in a DB
• Each individual portion is referred to as a shard
• Breaking a DB into much smaller DBs
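One common way to decide which shard holds a row is hash-based sharding, sketched below with four shards (an assumed count). `hashlib.md5` is used because Python's built-in `hash()` for strings is randomized per process and would not give a stable mapping. The `put`/`get` helpers are hypothetical.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count for the example


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard number using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


shards = [dict() for _ in range(NUM_SHARDS)]  # each dict stands in for a smaller DB


def put(key, value):
    shards[shard_for(key)][key] = value


def get(key):
    return shards[shard_for(key)].get(key)


for i in range(10):
    put(f"user:{i}", {"id": i})

print([len(s) for s in shards])  # how many rows each shard holds
print(get("user:7"))             # {'id': 7}
```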
27. Eventually consistent data
• Eventual consistency is a consistency model used in
distributed computing to achieve high availability that
informally guarantees that, if no new updates are made to a
given data item, eventually all accesses to that item will
return the last updated value (consistency is reached later)
• Eventually-consistent services are often classified as
providing BASE (Basically Available, Soft state, Eventual
consistency) semantics, in contrast to traditional ACID
(Atomicity, Consistency, Isolation, Durability) guarantees.
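Eventual consistency can be sketched with two replicas where a write is acknowledged by one replica immediately and reaches the other later through a queue. The class and its `sync` step are hypothetical; real systems deliver updates in the background.

```python
from collections import deque


class EventuallyConsistentStore:
    """Two replicas; writes go to A and reach B asynchronously (hypothetical)."""

    def __init__(self):
        self.replica_a = {}
        self.replica_b = {}
        self.pending = deque()  # replication messages not yet delivered to B

    def write(self, key, value):
        self.replica_a[key] = value        # acknowledged immediately (basically available)
        self.pending.append((key, value))  # B will be updated later

    def read_from_b(self, key):
        return self.replica_b.get(key)  # may be stale (soft state)

    def sync(self):
        """Deliver queued updates; with no new writes, B converges to A."""
        while self.pending:
            key, value = self.pending.popleft()
            self.replica_b[key] = value


store = EventuallyConsistentStore()
store.write("likes", 10)
store.write("likes", 11)
print(store.read_from_b("likes"))  # None: B has not seen the writes yet
store.sync()
print(store.read_from_b("likes"))  # 11: the last updated value
```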
28. CAP for distributed DBs
• A system made up of multiple nodes (scaled out) communicating with each other over a network
29. CAP Theorem
• Consistency: if you write to one node and then read from another node, you get what you wrote
Data is consistent across all nodes
All nodes must see the same most recent writes
• Availability: every request to a node gets a response, but without a guarantee that the response contains the most recent write
• Partition tolerance: when the network is partitioned (some nodes cannot communicate with others), the system continues to work
30. CAP Theorem says
a distributed system can have at most two of these three properties
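The trade-off shows up when a partition happens: the system keeps working (P) but must then give up either C or A. This hypothetical two-node sketch lets a "CP" system refuse reads on the cut-off node, while an "AP" system answers with possibly stale data.

```python
class TwoNodeSystem:
    """Hypothetical two-node store that must choose C or A while partitioned."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.node1 = {}
        self.node2 = {}
        self.partitioned = False

    def write(self, key, value):
        self.node1[key] = value
        if not self.partitioned:
            self.node2[key] = value  # replication only reaches node2 while the network is up

    def read_from_node2(self, key):
        if self.partitioned and self.mode == "CP":
            # Keep consistency: refuse to answer rather than risk a stale value.
            raise RuntimeError("node2 unavailable during partition")
        # Keep availability: always answer, possibly with stale data.
        return self.node2.get(key)


def demo(mode):
    system = TwoNodeSystem(mode)
    system.write("x", 1)
    system.partitioned = True  # the network splits node1 from node2
    system.write("x", 2)       # node1 still accepts writes (the system keeps working)
    try:
        return system.read_from_node2("x")
    except RuntimeError as exc:
        return f"error: {exc}"


print(demo("CP"))  # error: node2 unavailable during partition
print(demo("AP"))  # 1 (stale: the latest write was 2)
```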
32. No-SQL
A NoSQL (originally referring to "non SQL" or "non-
relational") database provides a mechanism for storage
and retrieval of data that is modeled in means other
than the tabular relations used in relational databases.
33. No-SQL
A better way to describe No-SQL is "Not Only SQL", because some No-SQL databases also let you use SQL or SQL-like queries
36. What No-SQL often does not support
• Joins
• Constraints (example: NOT NULL constraints)
• Complex transactions
• Data integrity
Example of a complex transaction:
Insert 3 records
Update 2 records
Check something, and if it is true, roll back
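In a relational DB the slide's example runs as one transaction that can be rolled back as a whole. The SQLite sketch below follows the same steps; the `item` table, the quantities and the `limit` check are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, qty INTEGER NOT NULL)")
conn.commit()


def restock(conn, limit):
    """Insert 3 records, update 2, then roll back everything if a check fails (hypothetical rule)."""
    conn.executemany("INSERT INTO item (id, qty) VALUES (?, ?)", [(1, 5), (2, 7), (3, 9)])
    conn.execute("UPDATE item SET qty = qty + 10 WHERE id IN (1, 2)")
    total = conn.execute("SELECT SUM(qty) FROM item").fetchone()[0]
    if total > limit:    # "check something; if true, roll back"
        conn.rollback()  # undoes all 3 inserts and both updates
        return False
    conn.commit()
    return True


print(restock(conn, limit=40), conn.execute("SELECT COUNT(*) FROM item").fetchone()[0])  # False 0
print(restock(conn, limit=50), conn.execute("SELECT COUNT(*) FROM item").fetchone()[0])  # True 3
```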
39. SQL vs No-SQL
SQL | No-SQL (non-relational)
Structured data only | Unstructured, semi-structured and structured data
Fixed schema | Flexible schema
Harder to scale out | Scalable (scales out)
Typically non-distributed | Distributed
Transactional | Often non-transactional
40. When to use No-SQL
1. Storing and retrieving big quantities of data (Big Data)
2. Relationships between elements are not important
3. Dealing with growing lists of elements (e.g., social media posts)
4. Unstructured data, or the structure of data changes rapidly
5. Constraints and validations of data can be performed in the application layer, so there is no need to implement constraints in the DB
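Moving constraints into the application layer can look like the sketch below: the program validates each document before storing it in a schema-less collection. The list standing in for a No-SQL collection, the rules and the 280-character limit are all hypothetical.

```python
class ValidationError(ValueError):
    """Raised when a document breaks an application-level rule."""


def validate_post(post: dict) -> dict:
    # Hypothetical rules that a relational DB might enforce with constraints.
    author = post.get("author")
    if not isinstance(author, str) or not author:
        raise ValidationError("author is required")  # plays the role of NOT NULL
    if len(post.get("text", "")) > 280:
        raise ValidationError("text is too long")    # plays the role of a CHECK constraint
    return post


posts = []  # a plain list standing in for a schema-less No-SQL collection


def save_post(post: dict) -> None:
    posts.append(validate_post(post))  # validate in the application, then store as-is


save_post({"author": "alice", "text": "hello", "tags": ["intro"]})
try:
    save_post({"text": "missing author"})
except ValidationError as exc:
    print("rejected:", exc)  # rejected: author is required
print(len(posts))  # 1
```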
41. When not to use No-SQL
1. Complex transactions (the example of a bank transfer from one account to another)
2. Joins must be handled by the DB
3. Validations and constraints must be handled by the DB
42. GCP options for databases
We need persistent, durable storage
Persistent storage means data does not go away after the device is turned off, unlike cache and RAM
43. Cloud Storage for unstructured data
Buckets: where you store your data
Objects: the things you are storing
45. Bucket storage classes in Cloud Storage
1. Standard:
Best latency, highest availability (99.9%)
2. Reduced availability:
99% availability, less expensive
3. Nearline:
Higher latency (a few more seconds to the first byte)
Much less expensive
Use if you access an object less than once a month
Best for archival scenarios
99% availability
46. Bucket location in Cloud Storage
1. Multi-regional (broader geographic area)
2. Regional: corresponds to a region supported by other Cloud Platform services such as Compute Engine
47. Uploading objects to cloud storage
1. From the Developers Console on the GCP website
2. From the command line using gsutil in the Google Cloud SDK:
gsutil can access files in the local file system and objects stored in Amazon S3
48. Cloud SQL for structured data
Cloud SQL is hosted SQL
Everything you can do with SQL can be done with Cloud SQL
Google takes care of keeping the OS up to date, performing backups, etc.
49. Cloud SQL for structured data
Cloud SQL is a fully-managed
database service that makes it easy to set up,
maintain, manage, and administer your
relational PostgreSQL and MySQL databases
in the cloud.
55. Cloud Bigtable for semi-structured and structured data
• Cloud Bigtable belongs to the No-SQL family
• Designed to store more than a terabyte of structured data
• Low-level DB
• High scalability
• Low latency (uses a single zone)
• Does not scale down well to small data sizes