Discussion on cloud-based data storage and databases. Presentation given by Zia Babar at the July event of the Waterloo Data Science and Data Engineering meetup.
A short introduction to BigQuery. With this presentation you'll quickly discover:
How to load data into BigQuery
How to build dashboards using BigQuery
How to work with BigQuery
and, last but not least, some best practices
We hope you'll enjoy this presentation and that it will help you start exploring this solution. Don't hesitate to send us your feedback or questions.
This is a presentation by Peter Coppola, VP of Product and Marketing at Basho Technologies and Matthew Aslett, Research Director at 451 Research. Join them as they discuss whether multi-model databases and polyglot persistence have increased operational complexity. They'll discuss the benefits and importance of NoSQL databases and how the Basho Data Platform helps enterprises leverage Big Data applications.
Introduction to our data warehouse solution, BigQuery.
The Google Cloud Platform products are based on our internal systems that power Google AdWords, Search, YouTube, and our leading research in real-time data analysis.
You can get access ($300 for 60 days) to our free trial through google.com/cloud
Introduction to Big Data Technologies & Applications, by Nguyen Cao
Big Data Myths, Current Mainstream Technologies related to Collecting, Storing, Computing & Stream Processing Data. Real-life experience with E-commerce businesses.
Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery, by Chris Schalk
This is an introductory presentation given at DevFest Madrid 2010 by Google Developer Advocate Chris Schalk. It introduces new Google cloud technologies: Google Storage, Google Prediction API and BigQuery.
Collaboration is crucial to today’s workforce. Whether you are in a traditional office setting, work from home or travel extensively, there are tools needed to achieve successful content collaboration.
Whether your mission is to improve external collaboration, increase scalability or focus on security and compliance, find out how content collaboration with Box can improve your ROI.
To find out more on how to improve your content journey, visit IBM ECM and Box: http://ibm.co/ibm-box-partnership
Quick Intro to Google Cloud Technologies, by Chris Schalk
This is the "Lightning Presentation" given at DreamForce 2011 on Google's Cloud Technologies. It covers App Engine, Google Storage, and BigQuery. #df11
In this webinar you'll learn about the best practices for Google BigQuery—and how Matillion ETL makes loading your data faster and easier. Find out from our experts how to leverage one of the largest, fastest, and most capable cloud data warehouses to improve your business and save money.
In this webinar:
- Discover how to work fast and efficiently with Google BigQuery
- Find out the best ways to monitor and control costs
- Learn to leverage Matillion ETL and optimize Google BigQuery
- Get tips and tricks for better performance
In this presentation we go through the differences and similarities between Redshift and BigQuery. It was presented during the Athens Big Data meetup in May 2017.
Google BigQuery for Everyday Developer, by Márton Kodok
IV. IT&C Innovation Conference - October 2016 - Sovata, Romania
A. Every scientist who needs big data analytics to save millions of lives should have that power
Legacy systems don’t provide the power.
B. The simple fact is that you are brilliant but your brilliant ideas require complex analytics.
Traditional solutions are not applicable.
The Plan: have oversight over developments as they happen.
Goal: Store everything accessible by SQL immediately.
What is BigQuery?
Analytics-as-a-Service - Data Warehouse in the Cloud
Fully-Managed by Google (US or EU zone)
Scales into Petabytes
Ridiculously fast
Decent pricing (queries $5/TB, storage: $20/TB) *October 2016 pricing
100,000 rows/sec Streaming API
Open Interfaces (Web UI, BQ command line tool, REST, ODBC)
Familiar DB Structure (table, views, record, nested, JSON)
Convenience of SQL + JavaScript UDFs (User-Defined Functions)
Integrates with Google Sheets + Google Cloud Storage + Pub/Sub connectors
Client libraries available in YFL (your favorite languages)
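The on-demand pricing model above (queries at $5/TB as of October 2016, paying only for the columns a query touches) can be sketched with a small cost estimator. This is a minimal illustration, not the BigQuery API; the table and column sizes are invented for the example.

```python
# Rough query-cost estimator for BigQuery-style on-demand pricing.
# Assumes the October 2016 rate of $5 per TB scanned; because storage is
# columnar, only the columns referenced by the query contribute to the scan.

PRICE_PER_TB = 5.0          # USD per TB of data scanned (Oct 2016 rate)
TB = 2 ** 40                # bytes per tebibyte

def estimate_query_cost(column_sizes_bytes, referenced_columns):
    """Sum the sizes of only the referenced columns and price the scan."""
    scanned = sum(column_sizes_bytes[c] for c in referenced_columns)
    return scanned / TB * PRICE_PER_TB

# Hypothetical table: total stored bytes per column.
columns = {"user_id": 8 * 10**9, "event": 40 * 10**9, "payload": 500 * 10**9}

# A query touching only user_id and event scans ~48 GB, not the ~548 GB table.
cost = estimate_query_cost(columns, ["user_id", "event"])
print(f"${cost:.4f}")
```

This is why dropping an unused wide column from a query (here, `payload`) cuts the bill by an order of magnitude.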
Our benefits
no provisioning/deploy
no running out of resources
no more focus on large scale execution plan
no need to re-implement tricky concepts
(time windows / join streams)
pay only for the columns referenced in your queries
run raw ad-hoc queries (whether by analysts/sales or devs)
no more throwing away, expiring, or aggregating old data.
The 'macro view' on Big Query:
We started with an overview, some typical uses and moved to project hierarchy, access control and security.
At the end we touch on tools and demos.
BigQuery is Google's columnar, massively parallel data querying solution. This talk explores using it as an ad-hoc reporting solution and the limitations present in May 2013.
Abhishek Somani and Adesh Rao from Qubole share a framework for materialized views in SQL-on-Hadoop engines that automatically suggests, creates, uses, invalidates, and refreshes views created on top of data for optimal performance and strict correctness.
Strata London May 2018
How Data Virtualization Adds Value to Your Data Science Stack, by Denodo
Watch here: https://bit.ly/3cZGCxr
For their machine learning and data science projects to be successful, data scientists need access to all of the enterprise data delivered through their myriad of data models. However, gaining access to all data, integrated into a central repository has been a challenge. Often 80% of the project time is spent on these tasks. But, a virtual layer can help the data scientist speed up some of the most tedious tasks, like data exploration and analysis. At the same time, it also integrates well with the data science ecosystem. There is no need to change tools and learn new languages. The data virtualization platform helps data scientists offload these data integration tasks, allowing them to focus on advanced analytics.
In this session, you will learn how data virtualization:
- Provides all of the enterprise data, in real-time, and without replication
- Enables data scientists to create and share multiple logical models using simple drag and drop
- Provides a catalog of all business definitions, lineage, and relationships
Big data is primarily associated with AI and new technology. It is as much a revolution in cooperation patterns, however. Big data entails the democratisation of data within an organisation, enabling agile, data-driven innovation in a manner that was previously unavailable. Knowing this, how can you work as an organisation to harvest the fruits and what can go wrong?
Webinar: Cloud Archiving – Amazon Glacier, Microsoft Azure or Something Else? by Storage Switzerland
Amazon Glacier seems like the ultimate archive; capacity costs are impressively cheap, it never has to be upgraded or replaced and all the data is off-site. The problem is at some point the organization is going to need to recover some specific set of data from that archive. At that point, the allure of Glacier wears off quickly because it is slow and recovery has to be done either through the command line or within software code. There is no self-service web-portal access to identify and retrieve specific content easily.
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI..., by Alluxio, Inc.
Alluxio Monthly Webinar
Nov. 15, 2023
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Tarik Bennett (Senior Solutions Engineer)
- Beinan Wang (Senior Staff Engineer & Architect)
Many companies are working with development architectures for AI platforms but have concerns about efficiency at scale as data volumes increase. They use centralized cloud data lakes, like S3, to store training data for AI platforms. However, GPU shortages add more complications. Storage and compute can be separate, or even remote, making data loading slow and expensive:
1) Optimizing a developmental setup can include manual copies, which are slow and error-prone
2) Directly transferring data across regions or from cloud to on-premises can incur expensive egress fees
This webinar covers solutions to improve data loading for model training. You will learn:
- The data loading challenges with distributed infrastructure
- Typical solutions, including NFS/NAS on object storage, and why they are not the best options
- Common architectures that can improve data loading and cost efficiency
- Using Alluxio to accelerate model training and reduce costs
ADV Slides: Building and Growing Organizational Analytics with Data Lakes, by DATAVERSITY
Data lakes are providing immense value to organizations embracing data science.
In this webinar, William will discuss the value of having broad, detailed, and seemingly obscure data available in cloud storage for purposes of expanding Data Science in the organization.
In this on demand webinar, join Storage Switzerland and Cloudian as we describe three ways cloud storage and on-premises storage can complement each other.
Learn how the new world of multi-cloud storage coupled with scalable on-premises storage can:
* Achieve limitless scalability and seamless capacity expansion
* Enable unified data management across clouds and on-prem
* Consolidate object and file data to a single storage environment
* Reduce costs and eliminate complexity
Webinar: Data Protection for Kubernetes, by MayaData Inc
In this webinar, we will back-up many live workloads to the Cloudian Hyperstore from a Kubernetes environment running on a particular cloud. We will demonstrate the value of Cloudian’s WORM capabilities to show how workloads and their data can be protected from ransomware attacks. Later, we will recover workloads from the Cloudian HyperStore to another cloud vendor. We will also demonstrate streaming back-ups for use in cloud and hardware switch overs and other use cases.
Kubera from MayaData is the first solution to extend the per workload management of data offered by Container Attached Storage to back-ups and disaster recovery. Kubera is often used by small teams to establish and manage back-up policies whereby data is backed up to S3 compatible object storage. Kubera can also be used to provide a comprehensive view across all workloads of back-up and retention policies and to enable back-ground cloud migration and disaster recovery.
Unlock Your Data for ML & AI using Data Virtualization, by Denodo
How Denodo Complements a Logical Data Lake in the Cloud
● Denodo does not substitute data warehouses, data lakes, ETLs...
● Denodo enables the use of all of them together, plus other data sources
○ In a logical data warehouse
○ In a logical data lake
○ They are very similar; the only difference is in the main objective
● There are also use cases where Denodo can be used as a data source in an ETL flow
Shaping the Role of a Data Lake in a Modern Data Fabric Architecture, by Denodo
Watch full webinar here: https://bit.ly/3gSmtQY
Data lakes have been both praised and loathed. They can be incredibly useful to an organization, but they can also be the source of major headaches. Their ease of scaling storage at minimal cost has opened the door to many new solutions, but also to a proliferation of runaway objects that have coined the term data swamp.
However, the addition of an MPP engine, based on Presto, to Denodo’s logical layer can change the way you think about the role of the data lake in your overall data strategy.
Watch on-demand this session to learn:
- The new MPP capabilities that Denodo includes
- How to use them to your advantage to improve security and governance of your lake
- New scenarios and solutions where your data fabric strategy can evolve
Object storage promises many things: unlimited scalability, both in terms of capacity and file count; low-cost but highly redundant capacity; and excellent connectivity to legacy NAS. But despite these promises, object storage has not caught on in the enterprise like it has in the cloud. It seems that, for the enterprise, object storage just isn't a good fit. The problem is that most object storage systems' starting capacity is too large. And while connectivity to legacy NAS systems is available, seamless integration is not. Can object storage be sized so that it is a better fit for the enterprise?
Unstructured data is growing at a staggering rate. It is breaking traditional storage and IT budgets and burying IT professionals under a mountain of operational challenges. Listen as Cloudian and Storage Switzerland discuss, panel style, the seven key reasons why organizations can dramatically lower storage infrastructure costs by deploying a hardware-agnostic object storage solution instead of sticking with legacy NAS.
This is the supporting presentation of a hands-on session on cloud storage as a light introduction to cloud adoption.
Cloud storage is an easy-to-understand, easy-to-use, and easy-to-start journey for the adoption of public cloud services.
On top of that, it enables and supports many different use cases where data is key.
Got data?… now what? An introduction to modern data platforms, by JamesAnderson599331
What are Data Analytics Platforms? What decision points are necessary in creating a modern, unified analytics data platform? What benefits are there to building your analytics data platform on Google Cloud Platform? Susan Pierce walks us through it all.
Social networks play a fundamental role as a medium for the spread of influence among their members. As part of this research, estimates of influence between individuals were presented.
Lykaio Wang (https://www.linkedin.com/in/lykaiowang/) discusses visualizing data in a web browser environment, first covering a few popular data visualization libraries and then diving further into the pros and cons of D3. Lykaio shows you why D3 is so powerful and how you can leverage it to visualize tabular, graph, geospatial, or any other type of data.
Daria Voronova - The Art of Telling a Story, by Zia Babar
Daria Voronova (https://www.linkedin.com/in/daria-voronova-76b724b5/) takes us on a journey through the stories that can be uncovered using Tableau as a discovery tool. In this presentation, she describes what storytelling is and why it is so important in enterprise contexts, followed by how to build a storytelling dashboard in Tableau.
Waterloo Data Science and Data Engineering Meetup - 2018-08-29, by Zia Babar
Presentation given by Atif Khan, VP AI and Data Science, Messagepoint at the August meetup event of the Waterloo Data Science and Data Engineering group.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
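The automated data validation idea under point 4 can be sketched in a few lines: rules are checked at the source, and records that fail are separated out before they cause downstream issues. The field names and rules below are hypothetical illustrations, not part of the original text.

```python
# Minimal sketch of automated data-quality checks run at the source.
# Each rule is a named predicate; failing records are collected with the
# names of the rules they broke, for root-cause analysis.

def validate(records, rules):
    """Return (valid, errors); errors pairs each bad record with its failed rules."""
    valid, errors = [], []
    for rec in records:
        failed = [name for name, check in rules.items() if not check(rec)]
        if failed:
            errors.append((rec, failed))
        else:
            valid.append(rec)
    return valid, errors

# Hypothetical rules for an orders feed.
rules = {
    "amount_positive": lambda r: r.get("amount", 0) > 0,
    "has_customer_id": lambda r: bool(r.get("customer_id")),
}

records = [
    {"customer_id": "c1", "amount": 19.99},
    {"customer_id": "",   "amount": -5},   # fails both rules
]
valid, errors = validate(records, rules)
print(len(valid), len(errors))  # 1 1
```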
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, which share the same in-links, helps avoid duplicate computations and thus could also reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; the final ranks of chain nodes can be easily calculated afterwards. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
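As one concrete piece of the above, a power-iteration PageRank that skips recomputation of already-converged vertices might look like the following. This is a minimal sketch of just that one optimization, assuming a small in-memory graph with no dangling nodes; it is not the STICD implementation itself.

```python
# Power-iteration PageRank that stops recomputing vertices whose ranks
# have already converged -- one of the per-iteration work reductions
# described above. Graph is given as adjacency lists of out-links.

def pagerank(out_links, d=0.85, tol=1e-10, max_iter=100):
    n = len(out_links)
    # Invert the graph once so each vertex knows its in-links.
    in_links = {v: [] for v in out_links}
    for u, outs in out_links.items():
        for v in outs:
            in_links[v].append(u)
    rank = {v: 1.0 / n for v in out_links}
    converged = set()
    for _ in range(max_iter):
        if len(converged) == n:
            break  # every vertex has settled
        for v in out_links:
            if v in converged:
                continue  # skip work for settled vertices
            new = (1 - d) / n + d * sum(
                rank[u] / len(out_links[u]) for u in in_links[v])
            if abs(new - rank[v]) < tol:
                converged.add(v)
            rank[v] = new
    return rank

# 3-cycle: by symmetry every vertex should end up with rank 1/3.
print(pagerank({"a": ["b"], "b": ["c"], "c": ["a"]}))
```

Real implementations track convergence per partition rather than per vertex, but the saving is the same: settled parts of the graph drop out of the iteration loop.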
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2..., by pchutichetpong
M Capital Group (“MCG”) expects demand to grow and supply to evolve, facilitated through institutional investment rotating out of offices and into work from home (“WFH”), while the need for data storage ever expands as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
1. Cloud-Based Data
Storages and Databases
for Data Science
Waterloo Data Science and Data Engineering Meetup
Zia Babar
LinkedIn: https://www.linkedin.com/in/zbabar
Twitter: https://twitter.com/ziababar
July 2018
3. Zia Babar @ Waterloo Data Science and Data Engineering Meetup, July 2018
In Part 1 of this Series...
4.
Cloud Database
● SQL Databases
● NoSQL Databases
○ Key-Value
○ Column
○ Document
○ Graph
● Cloud-Based (on Azure)
Agenda for Part 2
Cloud Storage
● File Storage
● Block Storage
● Object Storage
● Cloud-Based
○ On Microsoft Azure
○ On Google Cloud
5.
● Differences between
○ Data Lakes
○ Databases
○ Data Warehouses
● Enterprise Data Integration
What will be covered in Part 3...
● How to handle various data
types…
○ Streaming data (e.g. IoT)
○ Batch data
○ Event data
○ Log Data
6.
About the Presenter
Zia has 19 years of professional industry experience, with the most recent 8 years
being in technical leadership roles, where he led various engineering teams
pertaining to the design, development and deployment of enterprise applications
with a particular focus on incorporating machine learning practices and cognitive
services into software applications.
Presently Zia is finishing up his PhD at the University of Toronto with particular
research interests on designing enterprise cognitive systems.
8.
● Many companies require a centralized, easily accessible way to store files
and folders.
● File-level storage provides a traditional and simple approach to data storage
at low cost.
● Files are given a name, tagged with metadata, and organized in folders
under directories and sub-directories.
● A standard naming convention is used, which makes files easy to organize.
● Storage technologies such as NAS (Network Attached Storage) allow for
sharing at the local system level.
File Storage
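The hierarchical model described above can be sketched with ordinary filesystem calls; the directory and file names below are invented for illustration:

```python
import tempfile
from pathlib import Path

# Hypothetical hierarchy: files named by convention and grouped under
# directories and sub-directories, as on a shared NAS volume.
root = Path(tempfile.mkdtemp())
(root / "reports" / "2018").mkdir(parents=True)
(root / "reports" / "2018" / "q3_sales.csv").write_text("region,amount\n")

# Navigation and search are path-based: walk the tree to locate files.
found = [p.name for p in root.rglob("*.csv")]
# found == ["q3_sales.csv"]
```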
9.
File Storage - Advantages
● Traditional and simple approach.
● Hierarchical system that excels at handling relatively small amounts of
data.
● Low data storage cost and complexity.
10.
● Navigating through a large number of files is time-consuming.
● Searching for data is problematic.
● File based operations (backup, restore, etc.) take much longer.
File Storage - Disadvantages
11.
● Sharing files is simple and effective.
● Scalability can be quickly achieved using scale-out NAS solutions, at a low
cost for archiving files.
● Deployment is easily attained. Porting over data is simple.
● Support for standard protocols and encryption, native replication, and
various drive technologies ensures protection of data.
File Storage Use Cases
12.
● A raw storage volume in which data is split into chunks (blocks) of equal
size.
● A server-based OS manages these volumes and uses them as individual
hard drives to perform native OS functions.
● Typically deployed as SAN (Storage Area Network).
Block Storage
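A minimal sketch of the block model, assuming an illustrative 4-byte block size (real volumes use e.g. 512 B or 4 KiB blocks):

```python
# Sketch of the block model: a raw volume is an array of fixed-size
# blocks, addressed only by number (no names or metadata).
BLOCK_SIZE = 4  # illustrative; real volumes use much larger blocks

def to_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into equal-size blocks, zero-padding the last one."""
    padded = data + b"\x00" * (-len(data) % size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]

volume = to_blocks(b"hello world")
assert len(volume) == 3 and all(len(b) == BLOCK_SIZE for b in volume)
# The controlling OS reads block 1 directly by address, not by filename:
assert volume[1] == b"o wo"
```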
13.
Block Storage - Advantages
● Unlike in file-based architectures, there is no additional metadata
associated with a block beyond its address.
● The controlling OS manages data block storage by allocating storage for
different applications and deciding where data goes in the block.
● This results in high performance with large amounts of data transfer
possible.
14.
● Storage is tied to one server at a time.
● Limited metadata about the information being stored.
● Cost is calculated on block storage allocated, and not block storage used.
● Accessibility only through a running server.
● Requires more hands-on setup than object storage (e.g. filesystem
choices, permissions, versioning, backups, etc.).
Block Storage - Disadvantages
15.
Block Storage Use Cases
● Databases and other mission-critical applications that demand
consistently high performance.
● Multiple data disks configured in a RAID array to bolster data protection
and performance.
● Virtualization software vendors use block storage as file systems for the
guest OS.
16.
● Object-based storage stores data in isolated containers known as objects.
● Objects are given unique identifiers and stored in a flat namespace.
● Objects are retrieved by their unique IDs, typically through REST APIs.
● Allows much greater metadata flexibility: customized metadata can be
paired with objects for specific applications.
Object Storage
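A toy in-memory sketch of the object model: a flat namespace where each object is an opaque blob plus custom metadata, addressed by a unique ID (real systems expose this over REST APIs). All names and values here are invented for illustration:

```python
import uuid

# Flat namespace: no folders, just IDs mapping to blobs + metadata.
store: dict[str, dict] = {}

def put_object(data: bytes, **metadata) -> str:
    object_id = str(uuid.uuid4())                    # unique identifier
    store[object_id] = {"data": data, "meta": metadata}
    return object_id

def get_object(object_id: str) -> bytes:
    return store[object_id]["data"]

oid = put_object(b"<jpeg bytes>", content_type="image/jpeg", camera="X100")
assert get_object(oid) == b"<jpeg bytes>"
assert store[oid]["meta"]["camera"] == "X100"        # app-specific metadata
```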
17.
● Object storage offers extensive customization possibilities.
● Object-based operations are possible; for example, moving objects to
different areas of storage, or deleting objects when no longer needed.
● Scaling out an object architecture is as simple as adding additional nodes
to the storage cluster due to location transparency.
● REST based authentication and authorization approaches can be applied
here as well.
Object Storage - Advantages
18.
● Generally, object storage offers far more manageability, flexibility, and
scalability than file- and block-level systems; however, this often comes
at the expense of performance.
Object Storage - Disadvantages
19.
● Ability to accommodate unstructured data with relative ease, thus
serving organizations' big data needs well.
● API based and object based access makes for simplified integration and
usage in web applications.
● Optimal for the massive amounts of data that typically accompany
archived backups.
Object Storage Use Cases
20.
Storage Types on Microsoft Azure
Source: http://microsoftgeek.com/?p=2444
21.
Azure Storage
Source: https://www.cloudberrylab.com/blog/microsoft-azure-storage-types-explained/
22.
● Microsoft Azure Table Storage
○ Designed to store structured NoSQL data in tables, with scalable and
inexpensive storage.
○ Can be used to store structured data without relying on expensive
RDBMS databases and SQL querying techniques.
Microsoft Azure Storage
Source: https://www.infragistics.com/community/blogs/b/mihail_mateev/posts/how-to-manage-microsoft-azure-table-storage-with-node-js
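A sketch of Table Storage's entity model without the Azure SDK, just to show the idea: each entity lives under a (PartitionKey, RowKey) pair, and different entities may carry different properties. Keys and values below are invented for illustration:

```python
# Entities are addressed by (PartitionKey, RowKey); there is no fixed
# schema shared across entities.
table: dict[tuple[str, str], dict] = {}

table[("Toronto", "user001")] = {"name": "Ada", "age": 36}
table[("Toronto", "user002")] = {"name": "Alan"}   # different properties: fine

# Point lookups by (PartitionKey, RowKey) are the cheap, indexed path.
entity = table[("Toronto", "user001")]
assert entity["name"] == "Ada"
```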
23.
● Microsoft Azure Blob Storage
○ BLOB = Binary Large OBject
○ Object storage solution for the cloud
○ Optimized for storing massive amounts of unstructured data, such as text or
binary data.
Microsoft Azure Storage
Source: https://www.qnap.com/en-uk/how-to/tutorial/article/backup-qnap-nas-data-to-microsoft-azure-storage
24.
● Microsoft Azure Queue Storage
○ Type of storage designed to connect multiple decoupled and independent
application components.
○ Allows for stateless applications and asynchronous message queuing.
Microsoft Azure Storage
Source: https://newhelptech.wordpress.com/2017/11/10/step-by-step-how-to-create-and-configure-azure-queue-storage-using-visual-studio-in-microsoft-azure/
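The decoupling idea can be sketched with Python's standard-library queue, standing in here for Azure Queue Storage between two hypothetical components:

```python
import queue

# The producer and consumer never call each other directly; they only
# share a queue, so either side can scale or fail independently.
q: "queue.Queue[str]" = queue.Queue()

def web_frontend(order_id: str) -> None:
    q.put(order_id)                 # enqueue work and return immediately

def order_processor() -> list[str]:
    processed = []
    while not q.empty():
        processed.append(q.get())   # consume asynchronously, in FIFO order
    return processed

web_frontend("order-1")
web_frontend("order-2")
assert order_processor() == ["order-1", "order-2"]
```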
25.
● Microsoft Azure Disk Storage
○ A service that allows you to create disks for your virtual machines.
○ The disk can be accessed from only one virtual machine as a local drive.
● Microsoft Azure File Storage
○ A network-based storage share that allows files to be accessed from
different virtual machines.
Microsoft Azure Storage
26.
Storage Types on Google Cloud Platform
Source: Google
27.
Choosing a storage option
Source: https://cloud.google.com/storage-options/
28.
Storage Classes
Source: https://cloudplatform.googleblog.com/2016/10/introducing-Coldline-and-a-unified-platform-for-data-storage.html
30.
SQL Databases
● One type of structure (relational)
● Developed in the 1970s to deal with first wave of data
storage applications
● Structure and data types are fixed in advance
● Uses SQL-based querying through keywords such as SELECT,
INSERT, UPDATE, etc.
● Examples: MySQL, Postgres, Oracle, SQL Server, etc.
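A minimal illustration using SQLite (standing in here for any relational engine): the schema is fixed up front, and all access goes through SQL keywords such as SELECT, INSERT, and UPDATE:

```python
import sqlite3

# Structure and data types are declared in advance (fixed schema).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
con.execute("INSERT INTO users (name, city) VALUES (?, ?)", ("Ada", "Waterloo"))
con.execute("UPDATE users SET city = ? WHERE name = ?", ("Toronto", "Ada"))

rows = con.execute("SELECT name, city FROM users").fetchall()
assert rows == [("Ada", "Toronto")]
```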
31.
NoSQL Databases
● Many different types of databases (key-value, document,
wide-column, and graph)
● Developed in 2000s to deal with limitations of SQL DBs
(concerning scale, replication, unstructured data)
● Dynamic schemas; records can add new fields on the fly.
● Access is through object-oriented APIs.
● Examples: MongoDB, Cassandra, Neo4j, Redis Cache, etc.
32.
SQL vs. NoSQL
Source: https://twitter.com/aricitak/status/781051974011813888
33.
Source: Wikipedia
Key-Value Pair
● Hold a single serialized object for each key value.
● Good for storing large volumes of data where you want to get one item
for a given key value.
● No need to query based on other properties of the item.
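A toy sketch of the key-value pattern: one serialized object per key, fetched only by that key. The session key and fields below are invented for illustration:

```python
import json

# The value is an opaque serialized blob; the store never looks inside
# it, so there are no queries on the object's inner fields.
kv: dict[str, str] = {}

def put(key: str, obj: dict) -> None:
    kv[key] = json.dumps(obj)

def get(key: str) -> dict:
    return json.loads(kv[key])

put("session:42", {"user": "ada", "cart": ["book", "pen"]})
assert get("session:42")["cart"] == ["book", "pen"]
```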
34.
Column
● Key/value data stores that structure data storage into collections of
related columns called column families.
● Each column family is stored in a separate partition, with the same row key.
● An application can read a single column family without reading through
all of the data for an entity.
Source: https://database.guide/what-is-a-column-store-database/
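A toy sketch of the column-family layout: the same row key appears in each family, and one family can be read without touching the others (family and column names are illustrative):

```python
# Two column families for the same entity, stored separately; a read of
# one family never loads the columns of the other.
identity = {"user42": {"name": "Ada", "email": "ada@example.com"}}
activity = {"user42": {"last_login": "2018-07-01", "visits": 17}}

# Reading only the "activity" family: no identity columns are touched.
assert activity["user42"]["visits"] == 17
```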
35.
Document Databases
● Key/value DBs in which the values are documents; a "document" is a
collection of named fields and values.
● Data typically stored as XML, YAML, JSON, or plain text.
● Can query on non-key fields and define secondary indexes for faster
querying.
● Suitable for applications that retrieve data based on complex criteria
(beyond the value of the document key)
Source: https://lennilobel.wordpress.com/2015/06/01/relational-databases-vs-nosql-document-databases/
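A toy sketch of querying on a non-key field via a secondary index, which is the capability document databases provide natively (documents and fields are invented):

```python
from collections import defaultdict

# Documents are fetched by key, but a secondary index on a non-key
# field ("city") lets us query by criteria beyond the document key.
docs = {
    "d1": {"name": "Ada", "city": "Waterloo", "tags": ["ml"]},
    "d2": {"name": "Alan", "city": "Toronto"},     # schemas may differ
    "d3": {"name": "Grace", "city": "Waterloo"},
}

index_by_city = defaultdict(list)
for doc_id, doc in docs.items():
    index_by_city[doc["city"]].append(doc_id)

assert sorted(index_by_city["Waterloo"]) == ["d1", "d3"]
```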
36.
Index comparison between MongoDB and SQL Server
Source: http://sql-vs-nosql.blogspot.com/2013/11/indexes-comparison-mongodb-vs-mssqlserver.html
37.
● Data schema consists of nodes, edges and properties to model the
relationship between objects.
● Can efficiently perform queries that traverse the network of objects and
the relationships between them.
Graph Databases
Source: Wikipedia
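A toy sketch of the graph model: nodes with labeled edges, and a traversal that follows relationships directly rather than joining tables (nodes and edge labels are illustrative):

```python
from collections import deque

# Adjacency lists: each node maps to its outgoing (label, target) edges.
edges = {
    "alice": [("FOLLOWS", "bob")],
    "bob":   [("FOLLOWS", "carol")],
    "carol": [],
}

def reachable(start: str) -> set[str]:
    """All nodes reachable from start by following edges (BFS)."""
    seen, frontier = set(), deque([start])
    while frontier:
        node = frontier.popleft()
        for _label, target in edges[node]:
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return seen

assert reachable("alice") == {"bob", "carol"}
```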
38.
Azure NoSQL Databases
Source: https://docs.microsoft.com/en-us/aspnet/aspnet/overview/developing-apps-with-windows-azure/building-real-world-cloud-apps-with-windows-azure/data-storage-options
39.
Google Cloud Database Portfolio
Source: https://twitter.com/gregsramblings/status/839667109634293760