Cloud-Based Data Storages and Databases for Data Science
Waterloo Data Science and Data Engineering Meetup
Zia Babar
LinkedIn: https://www.linkedin.com/in/zbabar
Twitter: https://twitter.com/ziababar
July 2018
Introduction
Zia Babar @ Waterloo Data Science and Data Engineering Meetup, July 2018
In Part 1 of this Series...
Cloud Database
● SQL Databases
● NoSQL Databases
○ Key-Value
○ Column
○ Document
○ Graph
● Cloud-Based (on Azure)
Agenda for Part 2
Cloud Storage
● File Storage
● Block Storage
● Object Storage
● Cloud-Based
○ On Microsoft Azure
○ On Google Cloud
● Differences between
○ Data Lakes
○ Databases
○ Data Warehouses
● Enterprise Data Integration
What will be covered in Part 3...
● How to handle various data types…
○ Streaming data (e.g. IoT)
○ Batch data
○ Event data
○ Log data
About the Presenter
Zia has 19 years of professional industry experience. For the most recent 8 years
he has held technical leadership roles, leading engineering teams in the design,
development and deployment of enterprise applications, with a particular focus on
incorporating machine learning practices and cognitive services into software
applications.
Presently Zia is finishing his PhD at the University of Toronto, with research
interests in designing enterprise cognitive systems.
Cloud-Based Data Storage
File Storage
● Many companies require a centralized, easily accessible way to store files
and folders.
● File-level storage provides a traditional and simple approach to data storage
at low cost.
● Files are given a name, tagged with metadata, and organized in folders
under directories and sub-directories.
● A standard naming convention is used, which makes files easy to organize.
● Storage technologies such as NAS (Network Attached Storage) allow for
sharing at the local system level.
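As a rough illustration of the hierarchical model described for file storage, the sketch below builds a small folder tree in a temporary directory and then searches it. The folder names, file names, and the helpers `build_share` and `find_files` are hypothetical and only serve the example; a real NAS share would be reached through a mounted path.

```python
import tempfile
from pathlib import Path


def build_share(root: Path) -> None:
    """Create a small, hypothetical folder hierarchy under root."""
    layout = {
        "finance/2018/q2_report.csv": "revenue,cost\n10,7\n",
        "finance/2018/q3_report.csv": "revenue,cost\n12,8\n",
        "engineering/specs/storage.txt": "draft spec\n",
    }
    for relative_path, content in layout.items():
        target = root / relative_path
        # Directories and sub-directories give the files their organization.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)


def find_files(root: Path, pattern: str) -> list[str]:
    """Search by walking the whole tree; the work grows with the file count."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    share = Path(tmp)
    build_share(share)
    reports = find_files(share, "*.csv")
    print(reports)
```

Because the search has to visit every directory, this kind of lookup tends to slow down as the number of files grows.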
File Storage - Advantages
● Traditional and simple approach.
● Hierarchical system that excels at handling relatively small amounts of
data.
● Low data storage cost and complexity.
File Storage - Disadvantages
● Navigating through a large number of files is time-consuming.
● Searching for data is problematic.
● File-based operations (backup, restore, etc.) take much longer as the number
of files grows.
File Storage Use Cases
● Sharing files is simple and effective.
● Scalability can be quickly achieved using scale-out NAS solutions at a low
cost for archiving files.
● Deployment is easily attained. Porting over data is simple.
● Support for standard protocols and encryption, native replication, and
various drive technologies ensures protection of data.
Block Storage
● Raw storage volumes are filled with files that are split into chunks (blocks)
of data of equal size.
● A server-based OS manages these volumes and uses them as individual
hard drives to perform native OS functions.
● Typically deployed as a SAN (Storage Area Network).
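The idea of splitting data into equal-sized chunks can be sketched as follows. This is a toy model only: the 4-byte `BLOCK_SIZE` and the helper names `write_blocks` and `read_blocks` are assumptions for illustration, and real volumes use much larger blocks managed by the operating system.

```python
BLOCK_SIZE = 4  # bytes; an assumed toy value, real block sizes are far larger


def write_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into equal-sized blocks, zero-padding the last one."""
    blocks = []
    for offset in range(0, len(data), block_size):
        chunk = data[offset:offset + block_size]
        blocks.append(chunk.ljust(block_size, b"\x00"))
    return blocks


def read_blocks(volume: list[bytes], length: int) -> bytes:
    """Reassemble the original bytes; the caller must remember the length."""
    return b"".join(volume)[:length]


payload = b"hello block storage"
volume = write_blocks(payload)

# A block is identified only by its address (here, its list index).
print(len(volume), volume[1])
print(read_blocks(volume, len(payload)))
```

Note that the blocks themselves carry no name or metadata; anything beyond the address, such as the original length, has to be tracked by the layer above.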
Block Storage - Advantages
● Unlike in file-based architectures, there are no additional metadata
details associated with a block outside of its address.
● The controlling OS manages data block storage by allocating storage for
different applications and deciding where data goes in the block.
● This results in high performance, with large data transfers possible.
Block Storage - Disadvantages
● Storage is tied to one server at a time.
● Limited metadata about the information being stored.
● Cost is calculated on block storage allocated, and not block storage used.
● Accessibility only through a running server.
● Requires more hands-on setup than object storage, e.g. filesystem
choices, permissions, versioning, backups, etc.
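The billing point above (cost follows allocation, not usage) can be made concrete with a little arithmetic. The price and the volume sizes below are hypothetical numbers chosen for the example, not real cloud rates.

```python
PRICE_PER_GB_MONTH = 0.10  # hypothetical price, not an actual provider rate


def monthly_cost(billed_gb: float, price: float = PRICE_PER_GB_MONTH) -> float:
    """Monthly cost for the number of gigabytes that are billed."""
    return billed_gb * price


allocated_gb = 500  # size of the provisioned volume (assumed)
used_gb = 120       # data actually written to it (assumed)

billed_on_allocation = monthly_cost(allocated_gb)  # how block storage is billed
billed_on_usage = monthly_cost(used_gb)            # what usage-based billing would be

print(round(billed_on_allocation, 2), round(billed_on_usage, 2))
```

In this made-up case the provisioned volume is billed for more than four times the data it actually holds.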
Block Storage Use Cases
● Databases and other mission-critical applications that demand
consistently high performance.
● Multiple data disks configured in a RAID array to bolster data protection
and performance.
● Virtualization software vendors use block storage as file systems for the
guest OS.
Object Storage
● Object-based storage stores data in isolated containers known as objects.
● Each object is given a unique identifier and stored in a flat memory
model.
● Objects are retrieved by their unique ID, with REST APIs used for
access.
● Lends much greater flexibility to metadata: customized metadata
can be paired with objects to suit specific applications.
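A minimal in-memory sketch of the object model described above: a flat mapping from unique ID to object, with free-form metadata attached to each object. The `ObjectStore` class and its method names are hypothetical; real services expose comparable operations over REST rather than as Python calls.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """Toy object store: a flat ID -> object mapping, no folders."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, data: bytes, **metadata: str) -> str:
        object_id = uuid.uuid4().hex  # unique identifier for the object
        self._objects[object_id] = StoredObject(data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> StoredObject:
        return self._objects[object_id]

    def delete(self, object_id: str) -> None:
        del self._objects[object_id]

    def find_by_metadata(self, key: str, value: str) -> list[str]:
        """Return the IDs of objects whose metadata has key == value."""
        return [
            oid for oid, obj in self._objects.items()
            if obj.metadata.get(key) == value
        ]


store = ObjectStore()
scan_id = store.put(b"...image bytes...", content_type="image/png", project="demo")
print(store.get(scan_id).metadata)
print(store.find_by_metadata("project", "demo") == [scan_id])
```

The custom metadata is what makes object-level operations such as "find everything for project X" possible without a directory tree.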
Object Storage - Advantages
● Object storage offers extensive customization possibilities.
● Object-based operations are possible, for example moving objects to different
areas of storage or deleting objects when they are no longer needed.
● Scaling out an object architecture is as simple as adding additional nodes
to the storage cluster due to location transparency.
● REST-based authentication and authorization approaches can be applied
here as well.
Object Storage - Disadvantages
● Object storage generally offers far more manageability, flexibility and
scalability than file and block-level systems, but this is often at the
expense of performance.
Object Storage Use Cases
● Ability to accommodate unstructured data with relative ease, thus
serving the big data needs of organizations well.
● API-based and object-based access makes for simplified integration and
usage in web applications.
● Well suited to the massive amounts of data that typically accompany archived
backups.
Storage Types on Microsoft Azure
Source: http://microsoftgeek.com/?p=2444
Azure Storage
Source: https://www.cloudberrylab.com/blog/microsoft-azure-storage-types-explained/
Microsoft Azure Storage
● Microsoft Azure Table Storage
○ Designed to store structured NoSQL data (in tables), with the storage being
scalable and cheap.
○ Can be a substitute for storing structured data without relying on expensive
RDBMS databases and SQL querying techniques.
Source: https://www.infragistics.com/community/blogs/b/mihail_mateev/posts/how-to-manage-microsoft-azure-table-storage-with-node-js
Microsoft Azure Storage
● Microsoft Azure Blob Storage
○ BLOB = Binary Large OBject
○ Object storage solution for the cloud
○ Optimized for storing massive amounts of unstructured data, such as text or
binary data.
Source: https://www.qnap.com/en-uk/how-to/tutorial/article/backup-qnap-nas-data-to-microsoft-azure-storage
Microsoft Azure Storage
● Microsoft Azure Queue Storage
○ Type of storage designed to connect multiple decoupled and independent
application components.
○ Allows for stateless applications and also asynchronous message queuing.
Source: https://newhelptech.wordpress.com/2017/11/10/step-by-step-how-to-create-and-configure-azure-queue-storage-using-visual-studio-in-microsoft-azure/
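The decoupling that a message queue provides can be sketched with Python's standard library. This is an in-process stand-in only: it uses `queue.Queue` and a thread, not the Azure SDK, and the message contents and the `None` end-of-stream marker are assumptions made for the example.

```python
import queue
import threading

work_queue = queue.Queue()  # stand-in for a cloud-hosted message queue
processed = []


def producer(messages: list[str]) -> None:
    """Enqueue work and return immediately; it never calls the consumer."""
    for message in messages:
        work_queue.put(message)
    work_queue.put(None)  # assumed sentinel meaning "no more messages"


def consumer() -> None:
    """Pull messages at its own pace, independently of the producer."""
    while True:
        message = work_queue.get()
        if message is None:
            break
        processed.append(message.upper())


consumer_thread = threading.Thread(target=consumer)
consumer_thread.start()
producer(["resize image 1", "resize image 2"])
consumer_thread.join()

print(processed)
```

Neither side holds a reference to the other; the queue is the only shared component, which is what lets the two parts be deployed and scaled separately.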
Microsoft Azure Storage
● Microsoft Azure Disk Storage
○ A service that allows you to create disks for your virtual machines.
○ The disk can be accessed from only one virtual machine as a local drive.
● Microsoft Azure File Storage
○ A network-based storage share that allows files to be accessed from
different virtual machines.
Storage Types on Google Cloud Platform
Source: Google
Choosing a storage option
Source: https://cloud.google.com/storage-options/
Storage Classes
Source: https://cloudplatform.googleblog.com/2016/10/introducing-Coldline-and-a-unified-platform-for-data-storage.html
Cloud-Based Databases
SQL Databases
● One type of structure (relational)
● Developed in the 1970s to deal with the first wave of data
storage applications
● Structure and data types are fixed in advance
● Uses SQL-based querying through keywords such as SELECT,
INSERT, UPDATE, etc.
● Examples: MySQL, Postgres, Oracle, SQL Server, etc.
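A short sketch of these points using SQLite from Python's standard library (chosen here only because it needs no server; it is not one of the examples listed above). The `customers` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The structure is declared up front, before any data is stored.
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
)

conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ada", "Waterloo"), ("Grace", "Toronto")],
)
conn.execute("UPDATE customers SET city = ? WHERE name = ?", ("Kitchener", "Ada"))

rows = conn.execute("SELECT name, city FROM customers ORDER BY name").fetchall()
print(rows)

# Writing to a column that was never declared is rejected.
try:
    conn.execute("INSERT INTO customers (name, nickname) VALUES ('Bob', 'bobby')")
    schema_enforced = False
except sqlite3.OperationalError:
    schema_enforced = True
print(schema_enforced)
```

The failed insert at the end shows the fixed structure in practice: adding a new attribute means changing the table definition first.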
NoSQL Databases
● Many different types of databases (key-value, document,
wide-column, and graph)
● Developed in the 2000s to deal with limitations of SQL DBs
(concerning scale, replication, unstructured data)
● Dynamic schemas. Records can add new information on the fly.
● Access is through object-oriented APIs.
● Examples: MongoDB, Cassandra, Neo4j, Redis Cache, etc.
SQL vs. NoSQL
Source: https://twitter.com/aricitak/status/781051974011813888
Key-Value Pair
● Holds a single serialized object for each key value.
● Good for storing large volumes of data where you want to get one item
for a given key value.
● No need to query based on other properties of the item.
Source: Wikipedia
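A minimal sketch of the key-value idea, assuming JSON as the serialization format. The `KeyValueStore` class and the `session:42` key are hypothetical; the store treats each value as an opaque blob and can only look it up by key.

```python
import json


class KeyValueStore:
    """Toy key-value store: one serialized object per key."""

    def __init__(self) -> None:
        self._data = {}

    def put(self, key: str, value: dict) -> None:
        # The value is serialized; the store never inspects its fields.
        self._data[key] = json.dumps(value).encode("utf-8")

    def get(self, key: str) -> dict:
        return json.loads(self._data[key])


kv = KeyValueStore()
kv.put("session:42", {"user": "zia", "cart": ["disk", "cable"]})

session = kv.get("session:42")
print(session["cart"])
```

There is deliberately no way to ask "which sessions contain a disk?"; that kind of query is what the other NoSQL types are for.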
Column
● Key/value data stores that structure data storage into collections of
related columns called column families.
● Store each column family in a separate partition, with the same key.
● An application can read a single column family without reading through
all of the data for an entity.
Source: https://database.guide/what-is-a-column-store-database/
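The column-family layout can be sketched as one partition per family, all sharing the same row key. The `ColumnFamilyStore` class, the family names `identity` and `contact`, and the sample customer are hypothetical and only illustrate the access pattern.

```python
from collections import defaultdict


class ColumnFamilyStore:
    """Toy column store: each column family lives in its own partition."""

    def __init__(self) -> None:
        # family name -> {row key -> {column name -> value}}
        self._families = defaultdict(dict)

    def put(self, family: str, row_key: str, columns: dict) -> None:
        self._families[family].setdefault(row_key, {}).update(columns)

    def read_family(self, family: str, row_key: str) -> dict:
        """Read one family for a row without touching the other families."""
        return dict(self._families[family].get(row_key, {}))


cf = ColumnFamilyStore()
cf.put("identity", "cust-001", {"first_name": "Ada", "last_name": "Lovelace"})
cf.put("contact", "cust-001", {"email": "ada@example.com", "city": "Waterloo"})

identity = cf.read_family("identity", "cust-001")
print(identity)
```

Reading `identity` never loads the `contact` columns, even though both belong to the same entity.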
DocumentDB
● Key/value DBs in which the values are documents; a "document" is a
collection of named fields and values.
● Data is typically stored as XML, YAML, JSON, or plain text.
● Can query on non-key fields and define secondary indexes for faster
querying.
● Suitable for applications that retrieve data based on complex criteria
(beyond the value of the document key).
Source: https://lennilobel.wordpress.com/2015/06/01/relational-databases-vs-nosql-document-databases/
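Querying on non-key fields, with and without a secondary index, can be sketched as below. The `DocumentCollection` class is a hypothetical toy, not the API of any document database; it does not handle updates to already-inserted documents, and indexed field values are assumed to be hashable.

```python
from collections import defaultdict


class DocumentCollection:
    """Toy document store with optional secondary indexes."""

    def __init__(self, indexed_fields: list[str]) -> None:
        self._docs = {}
        # One index per chosen field: field value -> set of document IDs.
        self._indexes = {name: defaultdict(set) for name in indexed_fields}

    def insert(self, doc_id: str, document: dict) -> None:
        self._docs[doc_id] = document
        for name, index in self._indexes.items():
            if name in document:
                index[document[name]].add(doc_id)

    def find(self, field_name: str, value) -> list[dict]:
        if field_name in self._indexes:
            # Indexed field: direct lookup.
            ids = self._indexes[field_name].get(value, set())
        else:
            # Non-indexed field: scan every document.
            ids = {i for i, d in self._docs.items() if d.get(field_name) == value}
        return [self._docs[i] for i in sorted(ids)]


people = DocumentCollection(indexed_fields=["city"])
people.insert("p1", {"name": "Ada", "city": "Waterloo"})
people.insert("p2", {"name": "Grace", "city": "Toronto"})
people.insert("p3", {"name": "Linus", "city": "Waterloo", "role": "engineer"})

in_waterloo = [d["name"] for d in people.find("city", "Waterloo")]
engineers = [d["name"] for d in people.find("role", "engineer")]
print(in_waterloo, engineers)
```

Document `p3` carries a `role` field that the others lack, which also shows that documents in one collection do not need to share a fixed structure.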
Index comparison between MongoDB and SQL Server
Source: http://sql-vs-nosql.blogspot.com/2013/11/indexes-comparison-mongodb-vs-mssqlserver.html
Graph Databases
● Data schema consists of nodes, edges and properties to model the
relationships between objects.
● Can efficiently perform queries that traverse the network of objects and
the relationships between them.
Source: Wikipedia
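A traversal query of the kind described for graph databases can be sketched with a breadth-first search over an adjacency list. The people, the `FOLLOWS` relationship, and the helper names are hypothetical; a real graph database would express this in its own query language.

```python
from collections import deque

# Edges as (source node, relationship, target node); all names are made up.
edges = [
    ("alice", "FOLLOWS", "bob"),
    ("bob", "FOLLOWS", "carol"),
    ("carol", "FOLLOWS", "dave"),
    ("alice", "FOLLOWS", "erin"),
]


def build_adjacency(edge_list: list[tuple[str, str, str]]) -> dict[str, list[str]]:
    adjacency = {}
    for source, _relationship, target in edge_list:
        adjacency.setdefault(source, []).append(target)
        adjacency.setdefault(target, [])
    return adjacency


def reachable_within(adjacency: dict[str, list[str]], start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first search returning each reachable node and its hop count."""
    hops = {start: 0}
    pending = deque([start])
    while pending:
        node = pending.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                pending.append(neighbour)
    del hops[start]
    return hops


adjacency = build_adjacency(edges)
within_two = reachable_within(adjacency, "alice", 2)
print(within_two)
```

The query follows relationships outward from one node instead of joining tables, which is the access pattern graph databases are built around.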
Azure NoSQL Databases
Source: https://docs.microsoft.com/en-us/aspnet/aspnet/overview/developing-apps-with-windows-azure/building-real-world-cloud-apps-with-windows-azure/data-storage-options
Google Cloud Database Portfolio
Source: https://twitter.com/gregsramblings/status/839667109634293760
Thank You!

Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 

Cloud Data Storage and Database

  • 1. Cloud-Based Data Storages and Databases for Data Science Waterloo Data Science and Data Engineering Meetup Zia Babar LinkedIn: https://www.linkedin.com/in/zbabar Twitter: https://twitter.com/ziababar July 2018
  • 3. Zia Babar @ Waterloo Data Science and Data Engineering Meetup, July 2018 In Part 1 of this Series...
  • 4. Agenda for Part 2. Cloud Storage ● File Storage ● Block Storage ● Object Storage ● Cloud-Based ○ On Microsoft Azure ○ On Google Cloud. Cloud Database ● SQL Databases ● NoSQL Databases ○ Key-Value ○ Column ○ Document ○ Graph ● Cloud-Based (on Azure)
  • 5. What will be covered in Part 3... ● Differences between ○ Data Lakes ○ Databases ○ Data Warehouses ● Enterprise Data Integration ● How to handle various data types ○ Streaming data (e.g. IoT) ○ Batch data ○ Event data ○ Log data
  • 6. About the Presenter: Zia has 19 years of professional industry experience, with the most recent 8 years in technical leadership roles, where he led various engineering teams in the design, development and deployment of enterprise applications, with a particular focus on incorporating machine learning practices and cognitive services into software applications. Presently Zia is finishing up his PhD at the University of Toronto, with particular research interests in designing enterprise cognitive systems.
  • 8. File Storage ● Many companies require a centralized, easily accessible way to store files and folders. ● File-level storage provides a traditional and simple approach to data storage at low cost. ● Files are given a name, tagged with metadata, and organized in folders under directories and sub-directories. ● A standard naming convention is used, which makes files easy to organize. ● Storage technologies such as NAS (Network Attached Storage) allow for sharing at the local system level.
  • 9. File Storage - Advantages ● Traditional and simple approach. ● Hierarchical system that excels at handling relatively small amounts of data. ● Low data storage cost and complexity.
  • 10. File Storage - Disadvantages ● Navigating through a large number of files is time-consuming. ● Searching for data is problematic. ● File-based operations (backup, restore, etc.) take much longer.
  • 11. File Storage Use Cases ● Sharing files is simple and effective. ● Scalability can be quickly achieved using scale-out NAS solutions at a low cost for archiving files. ● Deployment is easily attained. Porting over data is simple. ● Support for standard protocols and encryption, native replication, and various drive technologies ensures protection of data.
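The file storage slides above describe files organized under directories and sub-directories, and note that searching becomes a problem as the number of files grows. A minimal Python sketch of why: locating a file by name means walking the whole hierarchy. The folder layout and file names here are hypothetical and exist only for illustration.

```python
import tempfile
from pathlib import Path


def find_by_name(root: Path, filename: str) -> list[Path]:
    """Walk every directory under root and return the files with this name."""
    return sorted(p for p in root.rglob(filename) if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: files live in folders under directories and sub-directories.
    for rel in ["reports/2018/july.csv", "reports/2017/july.csv", "logs/app.log"]:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("sample")

    # Searching visits the whole tree, which is what gets slow at scale.
    matches = [p.relative_to(root).as_posix() for p in find_by_name(root, "july.csv")]

    # The file system keeps only a fixed set of metadata per file (size, timestamps, ...).
    size_bytes = (root / "logs" / "app.log").stat().st_size
```

The search cost grows with the number of directories and files visited, which is the trade-off the disadvantages slide points at.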
  • 12. Block Storage ● Raw storage volume filled with files that are split into chunks of data of equal size. ● A server-based OS manages these volumes and uses them as individual hard drives to perform native OS functions. ● Typically deployed as a SAN (Storage Area Network).
  • 13. Block Storage - Advantages ● Unlike in file-based architectures, there are no additional metadata details associated with a block outside of its address. ● The controlling OS manages data block storage by allocating storage for different applications and deciding where data goes in the block. ● This results in high performance, with large data transfers possible.
  • 14. Block Storage - Disadvantages ● Storage is tied to one server at a time. ● Limited metadata about the information being stored. ● Cost is calculated on block storage allocated, and not block storage used. ● Accessibility only through a running server. ● Requires more hands-on setup than object storage, e.g. filesystem choices, permissions, versioning, backups, etc.
  • 15. Block Storage Use Cases ● Databases and other mission-critical applications that demand consistently high performance. ● Multiple data disks configured in a RAID array to bolster data protection and performance. ● Virtualization software vendors use block storage as file systems for the guest OS.
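The block storage slides above describe data split into equal-size chunks that carry no metadata beyond their address. A minimal Python sketch of that idea follows; the 4-byte block size is an unrealistically small assumption chosen for readability, and `write_blocks` / `read_blocks` are hypothetical helper names, not a real storage API.

```python
BLOCK_SIZE = 4  # bytes; assumed tiny value for illustration only


def write_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks, zero-padding the last one."""
    blocks = []
    for offset in range(0, len(data), block_size):
        chunk = data[offset:offset + block_size]
        blocks.append(chunk.ljust(block_size, b"\x00"))
    return blocks


def read_blocks(blocks: list[bytes], length: int) -> bytes:
    """Reassemble the blocks; the caller must remember the original length."""
    return b"".join(blocks)[:length]


payload = b"hello block storage"          # 19 bytes of data
volume = write_blocks(payload)            # a block's list index is its only "address"
restored = read_blocks(volume, len(payload))
allocated_bytes = len(volume) * BLOCK_SIZE
```

Here 19 bytes of data occupy five blocks (20 bytes), a small-scale version of the slide's point that cost follows the storage allocated rather than the storage used.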
  • 16. Object Storage ● Object-based storage stores data in isolated containers known as objects. ● Objects are given unique identifiers and stored in a flat memory model. ● Objects are retrieved by their unique ID, with REST APIs used for access. ● Allows much greater flexibility in metadata: customized metadata can be paired with objects for specific applications.
  • 17. Object Storage - Advantages ● Object storage allows extensive customization. ● Object-based operations are possible, for example moving objects to different areas of storage or deleting objects when no longer needed. ● Scaling out an object architecture is as simple as adding nodes to the storage cluster, due to location transparency. ● REST-based authentication and authorization approaches can be applied here as well.
  • 18. Object Storage - Disadvantages ● Generally, object storage offers far more manageability, flexibility and scalability than file and block-level systems; however, this often comes at the expense of performance.
  • 19. Object Storage Use Cases ● Accommodates unstructured data with relative ease, which serves the big data needs of organizations well. ● API-based and object-based access simplifies integration and usage in web applications. ● Well suited to the massive amounts of data that typically accompany archived backups.
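The object storage slides above describe objects held in a flat namespace, retrieved by a unique ID, and paired with custom metadata. A minimal in-memory Python sketch of that model follows. The `ObjectStore` class and its method names are hypothetical; a real service would expose these operations over a REST API rather than as method calls.

```python
import uuid


class ObjectStore:
    """Toy object store: a flat mapping from object ID to data plus metadata."""

    def __init__(self):
        self._objects = {}  # flat namespace, no directory hierarchy

    def put(self, data: bytes, **metadata) -> str:
        object_id = str(uuid.uuid4())  # unique identifier assigned on write
        self._objects[object_id] = {"data": data, "metadata": dict(metadata)}
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id]["data"]

    def metadata(self, object_id: str) -> dict:
        return dict(self._objects[object_id]["metadata"])

    def delete(self, object_id: str) -> None:
        del self._objects[object_id]

    def find(self, **criteria) -> list[str]:
        """Return IDs of objects whose metadata matches every given key/value."""
        return [
            oid for oid, obj in self._objects.items()
            if all(obj["metadata"].get(k) == v for k, v in criteria.items())
        ]


store = ObjectStore()
backup_id = store.put(b"archive bytes", kind="backup", owner="analytics")
photo_id = store.put(b"image bytes", kind="photo")
backups = store.find(kind="backup")
store.delete(photo_id)
remaining = store.find()
```

Because every lookup goes through the ID or the metadata, nothing in the caller depends on where the object physically lives, which is the location transparency the advantages slide mentions.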
  • 20. Storage Types on Microsoft Azure Source: http://microsoftgeek.com/?p=2444
  • 21. Azure Storage Source: https://www.cloudberrylab.com/blog/microsoft-azure-storage-types-explained/
  • 22. Microsoft Azure Storage ● Microsoft Azure Table Storage ○ Designed to store structured NoSQL data (in tables), with storage that is scalable and cheap. ○ Can be used to store structured data without relying on expensive RDBMS databases and SQL querying techniques. Source: https://www.infragistics.com/community/blogs/b/mihail_mateev/posts/how-to-manage-microsoft-azure-table-storage-with-node-js
  • 23. Microsoft Azure Storage ● Microsoft Azure Blob Storage ○ BLOB = Binary Large OBject ○ Object storage solution for the cloud ○ Optimized for storing massive amounts of unstructured data, such as text or binary data. Source: https://www.qnap.com/en-uk/how-to/tutorial/article/backup-qnap-nas-data-to-microsoft-azure-storage
  • 24. Microsoft Azure Storage ● Microsoft Azure Queue Storage ○ Type of storage designed to connect multiple decoupled and independent application components. ○ Allows for stateless applications and asynchronous message queuing. Source: https://newhelptech.wordpress.com/2017/11/10/step-by-step-how-to-create-and-configure-azure-queue-storage-using-visual-studio-in-microsoft-azure/
  • 25. Microsoft Azure Storage ● Microsoft Azure Disk Storage ○ A service that allows you to create disks for your virtual machines. ○ The disk can be accessed from only one virtual machine, as a local drive. ● Microsoft Azure File Storage ○ A network-based storage share that allows files to be accessed from different virtual machines.
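The queue storage slide above describes connecting decoupled application components through asynchronous messages. A minimal Python sketch of that pattern follows, using the standard library's in-process `queue.Queue` as a stand-in. This is not the Azure Queue Storage API, and the message contents are hypothetical.

```python
import queue
import threading


def producer(q: queue.Queue, orders: list[str]) -> None:
    """Enqueue messages without knowing who will process them."""
    for order in orders:
        q.put(order)
    q.put(None)  # sentinel telling the consumer there is nothing more to read


def consumer(q: queue.Queue, processed: list[str]) -> None:
    """Process messages one at a time, independently of the producer."""
    while True:
        message = q.get()
        if message is None:
            break
        processed.append(message.upper())


messages = queue.Queue()
processed = []
worker = threading.Thread(target=consumer, args=(messages, processed))
worker.start()
producer(messages, ["order-1", "order-2"])
worker.join()
```

The producer and consumer share only the queue, so either side can be replaced or scaled without the other changing.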
  • 26. Storage Types on Google Cloud Platform Source: Google
  • 27. Choosing a storage option Source: https://cloud.google.com/storage-options/
  • 28. Storage Classes Source: https://cloudplatform.googleblog.com/2016/10/introducing-Coldline-and-a-unified-platform-for-data-storage.html
  • 30. SQL Databases ● One type of structure (relational) ● Developed in the 1970s to deal with the first wave of data storage applications ● Structure and data types are fixed in advance ● Uses SQL-based querying through keywords such as SELECT, INSERT, UPDATE, etc. ● Examples: MySQL, Postgres, Oracle, SQL Server, etc.
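The SQL slide above mentions a structure fixed in advance and querying through SELECT, INSERT and UPDATE. A minimal Python sketch using the standard library's `sqlite3` module with an in-memory database follows. SQLite stands in for the databases named on the slide, and the `meetup` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema (columns and their types) is declared before any data is stored.
conn.execute(
    "CREATE TABLE meetup ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, attendees INTEGER NOT NULL)"
)

# INSERT adds a row; parameters are passed separately from the SQL text.
conn.execute("INSERT INTO meetup (name, attendees) VALUES (?, ?)", ("Data Science", 40))

# UPDATE changes existing rows that match the WHERE clause.
conn.execute("UPDATE meetup SET attendees = ? WHERE name = ?", (55, "Data Science"))

# SELECT reads the rows back.
rows = conn.execute("SELECT name, attendees FROM meetup").fetchall()
conn.close()
```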
  • 31. NoSQL Databases ● Many different types of databases (key-value, document, wide-column, and graph) ● Developed in the 2000s to deal with limitations of SQL DBs (concerning scale, replication, unstructured data) ● Dynamic schemas. Records can add new information on the fly. ● Access is through object-oriented APIs. ● Examples: MongoDB, Cassandra, Neo4j, Redis Cache, etc.
  • 32. SQL vs. NoSQL Source: https://twitter.com/aricitak/status/781051974011813888
  • 33. Key-Value Pair ● Holds a single serialized object for each key value. ● Good for storing large volumes of data where you want to get one item for a given key value. ● No need to query based on other properties of the item. Source: Wikipedia
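The key-value slide above describes one serialized object per key, fetched only by that key. A minimal Python sketch follows, with JSON as an assumed serialization format. The `KeyValueStore` class and the sample keys are hypothetical.

```python
import json


class KeyValueStore:
    """Toy key-value store: each key maps to one opaque serialized value."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value) -> None:
        # The store never looks inside the value; it only keeps the serialized form.
        self._data[key] = json.dumps(value)

    def get(self, key: str):
        raw = self._data.get(key)
        return None if raw is None else json.loads(raw)


kv = KeyValueStore()
kv.put("session:42", {"user": "zia", "cart": ["disk", "ssd"]})
session = kv.get("session:42")
missing = kv.get("session:99")
```

There is deliberately no way to ask for "all sessions with an SSD in the cart": lookups by anything other than the key are outside this model.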
  • 34. Column ● Key/value data stores that structure data storage into collections of related columns called column families. ● Each column family is stored in a separate partition, with the same key. ● An application can read a single column family without reading through all of the data for an entity. Source: https://database.guide/what-is-a-column-store-database/
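The column slide above describes column families stored in separate partitions that share a row key, so one family can be read without touching the rest of an entity. A minimal Python sketch of that layout follows. The `ColumnFamilyStore` class, the family names and the sample row are hypothetical.

```python
class ColumnFamilyStore:
    """Toy column-family store: one partition (dict) per column family."""

    def __init__(self):
        self._families = {}  # family name -> {row key -> {column -> value}}

    def put(self, row_key: str, family: str, columns: dict) -> None:
        partition = self._families.setdefault(family, {})
        partition.setdefault(row_key, {}).update(columns)

    def read_family(self, row_key: str, family: str) -> dict:
        # Only the requested family's partition is consulted.
        return dict(self._families.get(family, {}).get(row_key, {}))


cf = ColumnFamilyStore()
cf.put("user:1", "identity", {"first": "Zia", "last": "Babar"})
cf.put("user:1", "contact", {"city": "Waterloo"})
identity = cf.read_family("user:1", "identity")
unknown = cf.read_family("user:1", "billing")
```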
  • 35. DocumentDB ● Key/value DBs in which the values are documents; a "document" is a collection of named fields and values. ● Data is typically stored as XML, YAML, JSON, or plain text. ● Can query on non-key fields and define secondary indexes for faster querying. ● Suitable for applications that retrieve data based on complex criteria (beyond the value of the document key). Source: https://lennilobel.wordpress.com/2015/06/01/relational-databases-vs-nosql-document-databases/
  • 36. Index comparison between MongoDB and SQL Source: http://sql-vs-nosql.blogspot.com/2013/11/indexes-comparison-mongodb-vs-mssqlserver.html
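The document database slides above mention querying on non-key fields and defining secondary indexes to speed those queries up. A minimal Python sketch of both paths follows. The `DocumentCollection` class is hypothetical and simplified: it assumes each key is inserted once and that indexed field values are hashable.

```python
class DocumentCollection:
    """Toy document store with optional secondary indexes on fields."""

    def __init__(self):
        self._docs = {}     # document key -> document (a dict of named fields)
        self._indexes = {}  # field name -> {field value -> set of document keys}

    def create_index(self, field: str) -> None:
        index = {}
        for key, doc in self._docs.items():
            if field in doc:
                index.setdefault(doc[field], set()).add(key)
        self._indexes[field] = index

    def insert(self, key: str, doc: dict) -> None:
        self._docs[key] = doc
        for field, index in self._indexes.items():
            if field in doc:
                index.setdefault(doc[field], set()).add(key)

    def find(self, field: str, value) -> list[dict]:
        if field in self._indexes:
            keys = self._indexes[field].get(value, set())  # direct index lookup
        else:
            # No index: scan every document and compare the field.
            keys = {k for k, d in self._docs.items() if d.get(field) == value}
        return [self._docs[k] for k in sorted(keys)]


talks = DocumentCollection()
talks.insert("t1", {"title": "Cloud Storage", "city": "Waterloo"})
talks.insert("t2", {"title": "Databases", "city": "Toronto", "level": "intro"})
talks.create_index("city")
talks.insert("t3", {"title": "Data Lakes", "city": "Waterloo"})
waterloo_titles = [d["title"] for d in talks.find("city", "Waterloo")]
intro_titles = [d["title"] for d in talks.find("level", "intro")]
```

Documents in the same collection need not share fields ("level" appears on only one of them), and the unindexed query still works, just by scanning.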
  • 37. Graph Databases ● Data schema consists of nodes, edges and properties to model the relationships between objects. ● Can efficiently perform queries that traverse the network of objects and the relationships between them. Source: Wikipedia
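The graph slide above describes nodes, edges and properties, with queries that traverse relationships. A minimal Python sketch follows, using a breadth-first traversal over a hypothetical "FOLLOWS" graph. Real graph databases use their own storage and query languages, and the names and data here are illustrative only.

```python
from collections import deque

# Nodes carry properties; edges are (source, relationship, target) triples.
nodes = {
    "alice": {"role": "engineer"},
    "bob": {"role": "analyst"},
    "carol": {"role": "scientist"},
    "dave": {"role": "manager"},
    "erin": {"role": "student"},
}
edges = [
    ("alice", "FOLLOWS", "bob"),
    ("bob", "FOLLOWS", "carol"),
    ("carol", "FOLLOWS", "dave"),
    ("alice", "FOLLOWS", "erin"),
]


def neighbors(node: str, relation: str) -> list[str]:
    """Targets reachable from node over one edge of the given relationship."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]


def reachable_within(start: str, relation: str, max_hops: int) -> list[str]:
    """Breadth-first traversal returning nodes at most max_hops edges away."""
    seen = {start}
    frontier = deque([(start, 0)])
    found = []
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand past the hop limit
        for nxt in neighbors(node, relation):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                frontier.append((nxt, hops + 1))
    return found


two_hops = reachable_within("alice", "FOLLOWS", 2)
roles = [nodes[name]["role"] for name in two_hops]
```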
  • 38. Azure NoSQL Databases Source: https://docs.microsoft.com/en-us/aspnet/aspnet/overview/developing-apps-with-windows-azure/building-real-world-cloud-apps-with-windows-azure/data-storage-options
  • 39. Google Cloud Database Portfolio Source: https://twitter.com/gregsramblings/status/839667109634293760