Relational databases vs
Non-relational databases
James Serra
Big Data Evangelist
Microsoft
JamesSerra3@gmail.com
(RDBMS vs NoSQL vs Hadoop)
About Me
• Microsoft, Big Data Evangelist
• In IT for 30 years, worked on many BI and DW projects
• Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW/APS developer
• Been perm employee, contractor, consultant, business owner
• Presenter at PASS Business Analytics Conference, PASS Summit, Enterprise Data World conference
• Certifications: MCSE: Data Platform, Business Intelligence; MS: Architecting Microsoft Azure Solutions, Design and Implement Big Data Analytics Solutions, Design and Implement Cloud Data Platform Solutions
• Blog at JamesSerra.com
• Former SQL Server MVP
• Author of book “Reporting with Microsoft SQL Server 2012”
Agenda
• Definition and differences
• ACID vs BASE
• Four categories of NoSQL
• Use cases
• CAP theorem
• On-prem vs cloud
• Product categories
• Polyglot persistence
• Architecture samples
Goal
My goal is to give you a high-level overview of all the technologies so you know where to start, and to put you on
the right path to be a hero!
Relational and non-relational defined
Relational databases (RDBMS, SQL Databases)
• Example: Microsoft SQL Server, Oracle Database, IBM DB2
• Mostly used in large enterprise scenarios
• Analytical RDBMS (OLAP, MPP) solutions are Analytics Platform System, Teradata, Netezza
Non-relational databases (NoSQL databases)
• Example: Azure Cosmos DB, MongoDB, Cassandra
• Four categories: Key-value stores, Wide-column stores, Document stores and Graph stores
Hadoop: Made up of Hadoop Distributed File System (HDFS), YARN and MapReduce
Origins
Using SQL Server, I need to index a few thousand documents and search them.
No problem. I can use Full-Text Search.
I’m a healthcare company and I need to store and analyze millions of medical claims per day.
Problem. Enter Hadoop.
Using SQL Server, my internal company app needs to handle a few thousand transactions per second.
No problem. I can handle that with a nice size server.
Now I have Pokémon Go where users can enter millions of transactions per second.
Problem. Enter NoSQL.
But most enterprise data just needs an RDBMS (89% market share – Gartner).
Main differences (Relational)
Pros
• Works with structured data
• Supports strict ACID transactional consistency
• Supports joins
• Built-in data integrity
• Large eco-system
• Relationships via constraints
• Limitless indexing
• Strong SQL
• OLTP and OLAP
• Most off-the-shelf applications run on RDBMS
Main differences (Relational)
Cons
• Does not scale out horizontally (concurrency and data size), only vertically, unless sharding is used
• Data is normalized, meaning lots of joins, affecting speed
• Difficulty in working with semi-structured data
• Schema-on-write
• Cost
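The sharding mentioned above can be sketched as hash-based routing: each row's key decides which server holds it. This is only a minimal illustration; the shard names and the `shard_for` helper are hypothetical, and real products add rebalancing and cross-shard queries.

```python
import hashlib

# Hypothetical server names; a real deployment would hold connection details.
SHARDS = ["shard_0", "shard_1", "shard_2", "shard_3"]

def shard_for(customer_id: str, shards=SHARDS) -> str:
    """Pick a shard by hashing the key; the same key always maps to the same shard."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]

# Route a few example keys (illustrative values only).
placement = {cid: shard_for(cid) for cid in ("cust-1", "cust-2", "cust-3")}
```

The application, not the database, owns the routing here, which is one reason sharding an RDBMS adds complexity.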
Main differences (Non-relational/NoSQL)
Pros
• Works with semi-structured data (JSON, XML)
• Scales out (horizontal scaling – parallel query performance, replication)
• High concurrency, high volume random reads and writes
• Massive data stores
• Schema-free, schema-on-read
• Supports documents with different fields
• High availability
• Cost
• Simplicity of design: no “impedance mismatch”
• Finer control over availability
• Speed, due to not having to join tables
Main differences (Non-relational/NoSQL)
Cons
• Weaker or eventual consistency (BASE) instead of ACID
• Limited support for joins, does not support star schema
• Data is denormalized, requiring mass updates (e.g. a product name change)
• Does not have built-in data integrity (must do in code)
• No relationship enforcement
• Limited indexing
• Weak SQL
• Limited transaction support
• Slow mass updates
• Uses 10-50x more space (replication, denormalized, documents)
• Difficulty tracking schema changes over time
• Most NoSQL databases are still too immature for reliable enterprise operational applications
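The mass-update cost of denormalized data can be seen in a small sketch. The data and function names below are made up for illustration; plain Python dictionaries stand in for both kinds of store.

```python
# Normalized (relational style): the product name lives in exactly one place.
products = {101: {"name": "Widget"}}
orders_normalized = [
    {"order_id": 1, "product_id": 101},
    {"order_id": 2, "product_id": 101},
]

# Denormalized (document style): every order carries its own copy of the name.
orders_denormalized = [
    {"order_id": 1, "product": {"id": 101, "name": "Widget"}},
    {"order_id": 2, "product": {"id": 101, "name": "Widget"}},
]

def rename_normalized(product_id, new_name):
    """One write, no matter how many orders reference the product."""
    products[product_id]["name"] = new_name
    return 1

def rename_denormalized(product_id, new_name):
    """One write per order that embeds the product."""
    touched = 0
    for order in orders_denormalized:
        if order["product"]["id"] == product_id:
            order["product"]["name"] = new_name
            touched += 1
    return touched

normalized_writes = rename_normalized(101, "Widget Pro")
denormalized_writes = rename_denormalized(101, "Widget Pro")
```

With two orders the difference is trivial; with millions of embedded copies the rename becomes the slow mass update listed above.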
Main differences (Hadoop)
Pros
• Not a type of database, but rather an open-source software ecosystem that allows for massively
parallel computing
• No inherent structure (no conversion to relational or JSON needed)
• Good for batch processing, large files, volume writes, parallel scans, sequential access
• Great for large, distributed data processing tasks where time isn’t a constraint (e.g. end-of-day
reports, scanning months of historical data)
• Tradeoff: In order to make deep connections between many data points, the technology
sacrifices speed
• Some NoSQL databases such as HBase are built on top of HDFS
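The batch style of processing that MapReduce provides can be sketched in a single process. This is not Hadoop code; the three phases below only mimic the map, shuffle and reduce steps on a made-up list of lines, whereas a real cluster would run them in parallel across many nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}

# Illustrative input; on a cluster each line could live on a different node.
lines = ["claim approved", "claim denied", "claim approved"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(mapped))
```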
Main differences (Hadoop)
Cons
• File system, not a database
• Not good for millions of users, random access, fast individual record lookups or updates (OLTP)
• Not so great for real-time analytics
• Lacks: indexing, metadata layer, query optimizer, memory management
• Same cons as non-relational: no ACID support, data integrity, limited indexing, weak SQL, etc
• Security limitations
• More complex debugging
Hadoop adoption has slowed
• Too much hype
• Companies adopt it without understanding use cases (i.e. real big data)
• Difficulty in finding skillset
• Pace of change too fast
• Too many products involved in a solution
• Other technologies (RDBMS, NoSQL) improving and expanding use cases
• Higher learning curve
ACID (RDBMS) vs BASE (NoSQL)
ATOMICITY: All data and commands in a
transaction succeed, or all fail and roll back
CONSISTENCY: All committed data must be
consistent with all data rules including
constraints, triggers, cascades, atomicity,
isolation, and durability
ISOLATION: Other operations cannot access
data that has been modified during a
transaction that has not yet completed
DURABILITY: Once a transaction is
committed, data will survive system failures,
and can be reliably recovered after an
unwanted deletion
Needed for bank transactions
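Atomicity for a bank-style transfer can be sketched with Python's built-in sqlite3 module. The `account` table, the names and the amounts are assumptions made for this example; the point is only that a failed transfer leaves no partial change behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, source, target, amount):
    """Move money between accounts; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it fails
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

The second transfer credits bob before the debit fails, yet bob's balance is unchanged afterwards because the whole transaction is rolled back.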
Basically Available: Guaranteed Availability
Soft-state: The state of the system may change, even
without a query (because of node updates)
Eventually Consistent: The system will become
consistent over time
Ok for web page visits
ACID | BASE
Strong Consistency | Weak Consistency – stale data OK
Isolation | Last Write Wins
Transaction | Programmer Managed
Available/Consistent | Available/Partition Tolerant
Robust Database/Simpler Code | Simpler Database, Harder Code
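Eventual consistency with a last-write-wins rule can be sketched with two in-memory replicas. The `Replica` class and the integer timestamps are hypothetical; real systems use clocks or version vectors and replicate over a network.

```python
class Replica:
    """Toy replica that keeps the newest (timestamp, value) per key."""

    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        current = self.data.get(key)
        # Last write wins: keep the value with the highest timestamp.
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def sync_from(self, other):
        """Pull the other replica's entries and apply the same rule."""
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)

a, b = Replica(), Replica()
a.write("page_views", 10, timestamp=1)
b.write("page_views", 12, timestamp=2)

stale_read = a.read("page_views")  # replica A has not seen the newer write yet

a.sync_from(b)
b.sync_from(a)
converged = (a.read("page_views"), b.read("page_views"))
```

Before the sync a reader of replica A sees stale data, which is acceptable for something like page-view counts; after the sync both replicas agree.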
Data stored in tables.
Tables contain some number of columns, each of a type.
A schema describes the columns each table can have.
Every table’s data is stored in one or more rows.
Each row contains a value for every column in that table.
Rows aren’t kept in any particular order.
Thanks to: Harri Kauhanan, http://www.slideshare.net/harrikauhanen/nosql-3376398
Relational stores
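The table, schema and row ideas above, plus joins and built-in integrity, can be sketched with sqlite3. The `customer` and `purchase` tables and their contents are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# The schema names each column and its type before any data is written.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE purchase ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id),"
    " title TEXT NOT NULL)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.execute("INSERT INTO purchase VALUES (1, 1, 'Example Book')")

# A join recombines data that normalization split across tables.
rows = conn.execute(
    "SELECT c.name, p.title FROM customer c JOIN purchase p ON p.customer_id = c.id"
).fetchall()

# The database itself rejects a purchase that points at a missing customer.
try:
    conn.execute("INSERT INTO purchase VALUES (2, 99, 'Orphan Book')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```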
Key-value stores offer very high speed via the
least complicated data model—anything can
be stored as a value, as long as each value is
associated with a key or name.
Key Value
Key-value stores
Key “dog_12”: value_name “Stella”, value_mood “Happy”, etc
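A key-value store's interface is small enough to sketch in a few lines. The `KeyValueStore` class below is a hypothetical in-memory stand-in, not the API of Redis or any other product; it reuses the "dog_12" key from the slide.

```python
class KeyValueStore:
    """Minimal in-memory key-value store: the value is opaque to the store."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key, default=None):
        return self._items.get(key, default)

    def delete(self, key):
        self._items.pop(key, None)

store = KeyValueStore()
# Any value works as long as it has a key: a structure, a string, raw bytes.
store.put("dog_12", {"name": "Stella", "mood": "Happy"})
store.put("session_abc", b"\x00\x01opaque-bytes")

dog = store.get("dog_12")
store.delete("session_abc")
```

Lookups go only through the key; the store cannot answer "find all happy dogs" without scanning every value, which is the price of the simple model.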
Wide-column stores are fast and can be nearly as simple as key-value stores. They include a primary
key, an optional secondary key, and anything stored as a value. Also called column stores
[Figure: a wide-column row with a primary key, a secondary key, and values. Keys and values can be sparse or numerous]
Wide-column stores
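The primary key, optional secondary key and sparse values of a wide-column store can be sketched as nested dictionaries. `WideColumnTable` and the sensor readings are assumptions for illustration, not the data model of Cassandra or HBase in detail.

```python
from collections import defaultdict

class WideColumnTable:
    """Toy wide-column table: primary key -> secondary key -> sparse columns."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, primary_key, secondary_key, **columns):
        # Each cell group may carry a different set of columns.
        self._rows[primary_key].setdefault(secondary_key, {}).update(columns)

    def get(self, primary_key, secondary_key):
        return self._rows.get(primary_key, {}).get(secondary_key, {})

    def scan(self, primary_key):
        """Return all entries under one primary key, ordered by secondary key."""
        entries = self._rows.get(primary_key, {}).items()
        return sorted(entries, key=lambda item: item[0])

readings = WideColumnTable()
readings.put("sensor_7", "2017-01-01T00:00", temperature=21.5)
readings.put("sensor_7", "2017-01-01T00:01", temperature=21.7, humidity=40)

latest = readings.get("sensor_7", "2017-01-01T00:01")
history = readings.scan("sensor_7")
```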
Document stores contain data objects that are
inherently hierarchical, tree-like structures (most
notably JSON or XML). Not Word documents!
Document stores
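Hierarchical documents with differing fields can be sketched with plain Python structures and JSON. The documents and the `find` helper are invented for this example; a real document store would index these paths rather than scan.

```python
import json

# Two documents in the same collection with different fields (no fixed schema).
documents = [
    {"id": 1, "customer": {"name": "Ian"}, "items": [{"title": "Book A", "qty": 1}]},
    {"id": 2, "customer": {"name": "Alan", "email": "alan@example.com"},
     "items": [], "gift_wrap": True},
]

def find(docs, path, expected):
    """Return documents whose value at a dotted path equals `expected`."""
    matches = []
    for doc in docs:
        value = doc
        for part in path.split("."):
            # Walk down the tree; a missing branch yields None.
            value = value.get(part) if isinstance(value, dict) else None
        if value == expected:
            matches.append(doc)
    return matches

alan_orders = find(documents, "customer.name", "Alan")
wrapped = find(documents, "gift_wrap", True)

# Documents serialize to JSON and back without any mapping layer.
round_trip = json.loads(json.dumps(documents))
```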
[Figure: graph with Name nodes (Ian, Alan) connected to Title nodes (Forgotten Bridges, Mythical Bridges) by Purchased edges dated 03-02-2011, 09-09-2011 and 05-07-2011]
Graph store
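A graph store keeps nodes and the relationships between them as first-class data. The sketch below borrows the names, titles and dates from the slide's figure, but which person purchased which title is an assumption, and the `Graph` class is hypothetical rather than the API of Neo4j or any other product.

```python
from collections import defaultdict

class Graph:
    """Toy property graph: nodes and edges both carry properties."""

    def __init__(self):
        self.nodes = {}
        self.edges = defaultdict(list)  # source id -> [(relationship, target id, props)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, relationship, target, **props):
        self.edges[source].append((relationship, target, props))

    def neighbors(self, source, relationship):
        """Follow one kind of relationship out of a node."""
        return [t for rel, t, _ in self.edges.get(source, []) if rel == relationship]

g = Graph()
g.add_node("ian", name="Ian")
g.add_node("alan", name="Alan")
g.add_node("forgotten", title="Forgotten Bridges")
g.add_node("mythical", title="Mythical Bridges")

# Assumed pairing of buyers and titles, for illustration only.
g.add_edge("ian", "PURCHASED", "forgotten", date="03-02-2011")
g.add_edge("ian", "PURCHASED", "mythical", date="09-09-2011")
g.add_edge("alan", "PURCHASED", "mythical", date="05-07-2011")

ian_titles = [g.nodes[n]["title"] for n in g.neighbors("ian", "PURCHASED")]
```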
Use cases for NoSQL categories
• Key-value stores: [Redis] For cache, queues, fit in memory, rapidly changing data, store blob data.
Examples: shopping cart, session data, leaderboards, stock prices. Fastest performance
• Wide-column stores: [Cassandra] Real-time querying of random (non-sequential) data, huge
number of writes, sensors. Examples: Web analytics, time series analytics, real-time data analysis,
banking industry. Internet scale
• Document stores: [MongoDB] Flexible schemas, dynamic queries, defined indexes, good
performance on big DB. Examples: order data, customer data, log data, product catalog, user
generated content (chat sessions, tweets, blog posts, ratings, comments). Fastest development
• Graph databases: [Neo4j] Graph-style data, social network, master data management, network and
IT operations. Examples: social relations, real-time recommendations, fraud detection, identity and
access management, graph-based search, web browsing, portfolio analytics, gene sequencing, class
curriculum
Note: Many NoSQL solutions are now multi-model
Velocity
Volume Per Day | Real-world Transactions Per Day | Real-world Transactions Per Second | Relational DB | Document DB | Key Value or Wide Column
8 GB | 8.64B | 100,000 | As Is | |
86 GB | 86.4B | 1M | Tuned* | As Is |
432 GB | 432B | 5M | Appliance | Tuned* | As Is
864 GB | 864B | 10M | Clustered Appliance | Clustered Servers | Tuned*
8,640 GB | 8.64T | 100M | | Many Clustered Servers | Clustered Servers
43,200 GB | 43.2T | 500M | | | Many Clustered Servers
* Tuned means tuning the model, queries, and/or hardware (more CPU, RAM, and Flash)
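The per-day figures in the table follow from the per-second rates multiplied by 86,400 seconds. A short check, using only the rates listed above:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def transactions_per_day(tps):
    """Convert a sustained transactions-per-second rate into a daily total."""
    return tps * SECONDS_PER_DAY

# The six rates from the table, in transactions per second.
rates = [100_000, 1_000_000, 5_000_000, 10_000_000, 100_000_000, 500_000_000]
per_day = {tps: transactions_per_day(tps) for tps in rates}
```

For example, 100,000 transactions per second sustained for a day is 8.64 billion transactions, matching the first row.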
Focus of different data models
…you may not have the data volume for NoSQL (yet), but there are other reasons to use
NoSQL (semi-structured data, schemaless, high availability, etc)
Relational NewSQL stores are designed for web-scale
applications, but still require up-front schemas, joins, and
table management that can be labor intensive.
Blend RDBMS with NoSQL: provide the same scalable
performance of NoSQL systems for OLTP read-write
workloads while still maintaining the ACID guarantees of
a traditional relational database system.
Use case for different database technologies
• Traditional OLTP business systems (e.g. ERP, CRM, in-house app): relational database (RDBMS)
• Data warehouses (OLAP): relational database (SMP or MPP)
• Web and mobile global OLTP applications: non-relational database (NoSQL)
• Data lake: Hadoop
• Relational and scalable OLTP: NewSQL
CAP Theorem
Impossible for any shared data system to guarantee simultaneously all of the
following three properties:
Consistency: Once data is written, all future requests will contain the data. “Is
the data I’m looking at now the same if I look at it somewhere else?”
Availability: The database is always available and responsive. “What happens
if my database goes down?”
Partition tolerance: If part of the database is unreachable, the other parts keep working.
“What if my data is on a different node?”
Relational: CA (e.g. SQL Server with no replication)
Non-relational: AP (Cassandra, CouchDB, Riak); CP (HBase, Cosmos DB, MongoDB, Redis)
NoSQL can’t be both consistent and available during a partition. Take two nodes (A and B) where B goes
down or becomes unreachable. If node A keeps taking requests, it is available but not consistent with
node B. If node A stops taking requests, it remains consistent with node B but is not available. An RDBMS
is consistent and available because it only has one node/partition (so no partition tolerance)
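The two-node argument above can be sketched as a small simulation. The `Node` class and its "AP"/"CP" modes are hypothetical labels for the two choices a node has when it loses contact with its peer.

```python
class Node:
    """Toy node that either keeps serving (AP) or refuses writes (CP) when cut off."""

    def __init__(self, name, mode):
        self.name = name
        self.mode = mode  # "AP" or "CP"
        self.value = None
        self.peer_reachable = True

    def write(self, value):
        if not self.peer_reachable and self.mode == "CP":
            # Cannot replicate to the peer, so refuse rather than diverge.
            raise RuntimeError(f"{self.name}: refusing write during partition")
        self.value = value

def simulate(mode):
    """Return (available, consistent) for node A after B becomes unreachable."""
    a, b = Node("A", mode), Node("B", mode)
    a.value = b.value = "v1"   # both nodes start in agreement
    a.peer_reachable = False   # B is down or the network is split
    try:
        a.write("v2")
        available = True
    except RuntimeError:
        available = False
    consistent = a.value == b.value
    return available, consistent

ap_result = simulate("AP")  # stays available, diverges from B
cp_result = simulate("CP")  # stays consistent, rejects the write
```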
Microsoft data platform solutions
Product | Category | Description | More Info
SQL Server 2016 | RDBMS | Earned top spot in Gartner’s Operational Database magic quadrant. JSON support | https://www.microsoft.com/en-us/server-cloud/products/sql-server-2016/
SQL Database | RDBMS/DBaaS | Cloud-based service that is provisioned and scaled quickly. Has built-in high availability and disaster recovery. JSON support | https://azure.microsoft.com/en-us/services/sql-database/
SQL Data Warehouse | MPP RDBMS/DBaaS | Cloud-based service that handles relational big data. Provision and scale quickly. Can pause service to reduce cost | https://azure.microsoft.com/en-us/services/sql-data-warehouse/
Analytics Platform System (APS) | MPP RDBMS | Big data analytics appliance for high performance and seamless integration of all your data | https://www.microsoft.com/en-us/server-cloud/products/analytics-platform-system/
Azure Data Lake Store | Hadoop storage | Removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics | https://azure.microsoft.com/en-us/services/data-lake-store/
Azure Data Lake Analytics | On-demand analytics job service/Big Data-as-a-service | Cloud-based service that dynamically provisions resources so you can run queries on exabytes of data. Includes U-SQL, a new big data query language | https://azure.microsoft.com/en-us/services/data-lake-analytics/
HDInsight | PaaS Hadoop compute | A managed Apache Hadoop, Spark, R, HBase, and Storm cloud service made easy | https://azure.microsoft.com/en-us/services/hdinsight/
Azure Cosmos DB | PaaS NoSQL: Document Store | Get your apps up and running in hours with a fully managed NoSQL database service that indexes, stores, and queries data using familiar SQL syntax | https://azure.microsoft.com/en-us/services/cosmos-db/
Azure Table Storage | PaaS NoSQL: Key-value Store | Store large amount of semi-structured data in the cloud | https://azure.microsoft.com/en-us/services/storage/tables/
PolyBase
Query relational and non-relational data with T-SQL
Azure Cosmos DB consistency options
• Strong, which is the slowest of the five, but is guaranteed to always return correct data
• Bounded staleness, which ensures that an application will see changes in the order in which they were
made. This option does allow an application to see out-of-date data, but only within a specified
window, e.g., 500 milliseconds
• Session, which ensures that an application always sees its own writes correctly, but allows access to
potentially out-of-date or out-of-order data written by other applications
• Consistent Prefix (new), updates returned are some prefix of all the updates, with no gaps
• Eventual, which provides the fastest access, but also has the highest chance of returning out-of-date
data
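How these levels differ can be sketched with a toy model: a primary that holds an ordered log of writes and a replica that has applied only a prefix of it. This is not the Azure Cosmos DB API; `ToyReplicatedStore`, the `max_lag` parameter (staleness counted in writes rather than time) and the session token are simplifying assumptions. Eventual consistency is left out because it may not even preserve order.

```python
class ToyReplicatedStore:
    """Toy model: `log` is the primary's history, `applied` is the replica's progress."""

    def __init__(self):
        self.log = []
        self.applied = 0

    def write(self, value):
        self.log.append(value)
        return len(self.log)  # session token: position of the caller's latest write

    def replicate(self, count=1):
        self.applied = min(len(self.log), self.applied + count)

    def read(self, level, session_token=0, max_lag=0):
        if level == "strong":
            visible = len(self.log)                    # always the full history
        elif level == "bounded_staleness":
            visible = max(self.applied, len(self.log) - max_lag)  # at most max_lag behind
        elif level == "session":
            visible = max(self.applied, session_token)  # never miss your own writes
        elif level == "consistent_prefix":
            visible = self.applied                     # some prefix, no gaps
        else:
            raise ValueError(f"unknown level: {level}")
        return self.log[:visible]

store = ToyReplicatedStore()
store.write("a")
store.write("b")
my_token = store.write("c")
store.replicate(1)  # the replica has applied only the first write

strong = store.read("strong")
bounded = store.read("bounded_staleness", max_lag=1)
session = store.read("session", session_token=my_token)
prefix = store.read("consistent_prefix")
```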
On-prem vs Cloud
• On-prem: SQL Server, APS, MongoDB, Oracle, Cassandra, Neo4j
• IaaS Cloud: SQL Server in Azure VM, Oracle in Azure VM
• DBaaS/PaaS Cloud: SQL Database, SQL Data Warehouse, Azure Cosmos DB, Redshift, RDS, MongoLab
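Product Categories
[Figure: products grouped by category. Surviving labels: Azure Cosmos DB, Couchbase, APS, SQL DW, SQL Database, SQLite, PostgreSQL, Redis, OrientDB]
Product Categories
[Figure: rankings from db-engines.com]
Azure Product Categories
[Figure: Azure products grouped by category. Surviving labels: SQL DW, ADLS, ADLA, (PaaS), (IaaS), PostgreSQL]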
db-engines.com/en/ranking
Method of calculation:
• Number of mentions of the system on websites
• General interest in the system
• Frequency of technical discussions about the system
• Number of job offers, in which the system is mentioned
• Number of profiles in professional networks, in which the system is mentioned
• Relevance in social networks
db-engines.com/en/ranking_definition
db-engines.com/en/ranking_categories
NoSQL = 14%
Polyglot Persistence
• Sometimes a relational store is the right choice, sometimes a NoSQL store is the right choice
• Sometimes you need more than one store: Using the right tool for the right job
Summary
Choose NoSQL when…
• You are bringing in new data with a lot of volume and/or variety
• Your data is non-relational/semi-structured
• Your team will be trained in these new technologies (NoSQL)
• You have enough information to correctly select the type and product of NoSQL for your situation
• You can relax transactional consistency when scalability or performance is more important
• You can service a large number of user requests vs rigorously enforcing business rules
Relational databases are created for strong consistency, but at the cost of speed and scale. NoSQL slightly sacrifices
consistency across nodes for both speed and scalability.
NoSQL and Hadoop are viable technologies for a subset of specialized needs and use cases.
Lines are getting blurred – do your homework!
Bottom line!
• RDBMS for enterprise OLTP and ACID compliance, or databases under 5 TB
• NoSQL for scaled OLTP and JSON documents
• Hadoop for big data analytics (OLAP)
Resources
• Relational database vs Non-relational databases: http://bit.ly/1HXn2Rt
• Types of NoSQL databases: http://bit.ly/1HXn8Zl
• What is Polyglot Persistence? http://bit.ly/1HXnhMm
• Hadoop and Data Warehouses: http://bit.ly/1xuXfu9
• Hadoop and Microsoft: http://bit.ly/20Cg2hA
Q & A ?
James Serra, Big Data Evangelist
Email me at: JamesSerra3@gmail.com
Follow me at: @JamesSerra
Link to me at: www.linkedin.com/in/JamesSerra
Visit my blog at: JamesSerra.com (where this slide deck is posted via the “Presentations” link on the top menu)

  • 3. Agenda  Definition and differences  ACID vs BASE  Four categories of NoSQL  Use cases  CAP theorem  On-prem vs cloud  Product categories  Polyglot persistence  Architecture samples
  • 4. Goal My goal is to give you a high level overview of all the technologies so you know where to start and put you on the right path to be a hero!
  • 5. Relational and non-relational defined Relational databases (RDBMS, SQL Databases) • Example: Microsoft SQL Server, Oracle Database, IBM DB2 • Mostly used in large enterprise scenarios • Analytical RDBMS (OLAP, MPP) solutions are Analytics Platform System, Teradata, Netezza Non-relational databases (NoSQL databases) • Example: Azure Cosmos DB, MongoDB, Cassandra • Four categories: Key-value stores, Wide-column stores, Document stores and Graph stores Hadoop: Made up of Hadoop Distributed File System (HDFS), YARN and MapReduce
  • 6. Origins Using SQL Server, I need to index a few thousand documents and search them. No problem. I can use Full-Text Search. I’m a healthcare company and I need to store and analyze millions of medical claims per day. Problem. Enter Hadoop. Using SQL Server, my internal company app needs to handle a few thousand transactions per second. No problem. I can handle that with a nice size server. Now I have Pokémon Go where users can enter millions of transactions per second. Problem. Enter NoSQL. But most enterprise data just needs an RDBMS (89% market share – Gartner).
  • 7. Main differences (Relational) Pros • Works with structured data • Supports strict ACID transactional consistency • Supports joins • Built-in data integrity • Large eco-system • Relationships via constraints • Limitless indexing • Strong SQL • OLTP and OLAP • Most off-the-shelf applications run on RDBMS
  • 8. Main differences (Relational) Cons • Does not scale out horizontally (concurrency and data size) – only vertically, unless use sharding • Data is normalized, meaning lots of joins, affecting speed • Difficulty in working with semi-structured data • Schema-on-write • Cost
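The "unless use sharding" caveat above can be illustrated with a minimal sketch of hash-based shard routing. The shard names and the `shard_for` helper are hypothetical and exist only for illustration; a real deployment would also have to handle resharding, which this sketch ignores.

```python
import hashlib

# Hypothetical shard names; in practice each would be a separate database server.
SHARDS = ["shard_0", "shard_1", "shard_2", "shard_3"]

def shard_for(key: str) -> str:
    """Map a key to a shard using a stable hash (not Python's per-process hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]

# The same key always routes to the same shard, so reads find earlier writes.
placement = {cid: shard_for(cid) for cid in ("cust_1", "cust_2", "cust_3")}
print(placement)
```

The application (or a routing layer) owns this mapping, which is part of why sharding a relational database adds operational work compared with a store that scales out natively.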
  • 9. Main differences (Non-relational/NoSQL) Pros • Works with semi-structured data (JSON, XML) • Scales out (horizontal scaling – parallel query performance, replication) • High concurrency, high volume random reads and writes • Massive data stores • Schema-free, schema-on-read • Supports documents with different fields • High availability • Cost • Simplicity of design: no “impedance mismatch” • Finer control over availability • Speed, due to not having to join tables
  • 10. Main differences (Non-relational/NoSQL) Cons • Weaker or eventual consistency (BASE) instead of ACID • Limited support for joins, does not support star schema • Data is denormalized, requiring mass updates (i.e. product name change) • Does not have built-in data integrity (must do in code) • No relationship enforcement • Limited indexing • Weak SQL • Limited transaction support • Slow mass updates • Uses 10-50x more space (replication, denormalized, documents) • Difficulty tracking schema changes over time • Most NoSQL databases are still too immature for reliable enterprise operational applications
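The "mass updates" point can be made concrete with a small sketch. The data and the two `rename_*` helpers below are hypothetical; they only contrast how many records a product rename touches in a normalized layout versus a denormalized one.

```python
# Normalized (relational style): the product name lives in exactly one place.
products = {101: {"name": "Widget"}}
orders_normalized = [
    {"order_id": 1, "product_id": 101},
    {"order_id": 2, "product_id": 101},
]

# Denormalized (document style): each order carries its own copy of the name.
orders_denormalized = [
    {"order_id": 1, "product": {"id": 101, "name": "Widget"}},
    {"order_id": 2, "product": {"id": 101, "name": "Widget"}},
]

def rename_normalized(product_id, new_name):
    """Update the single product record; orders reference it by id."""
    products[product_id]["name"] = new_name
    return 1  # one record touched

def rename_denormalized(product_id, new_name):
    """Update every order that embeds the product; returns records touched."""
    touched = 0
    for order in orders_denormalized:
        if order["product"]["id"] == product_id:
            order["product"]["name"] = new_name
            touched += 1
    return touched

print(rename_normalized(101, "Widget Pro"))
print(rename_denormalized(101, "Widget Pro"))
```

The denormalized rename grows with the number of orders, which is the cost paid in exchange for reads that need no join.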
  • 11. Main differences (Hadoop) Pros • Not a type of database, but rather a open-source software ecosystem that allows for massively parallel computing • No inherent structure (no conversion to relational or JSON needed) • Good for batch processing, large files, volume writes, parallel scans, sequential access • Great for large, distributed data processing tasks where time isn’t a constraint (i.e. end-of-day reports, scanning months of historical data) • Tradeoff: In order to make deep connections between many data points, the technology sacrifices speed • Some NoSQL databases such as HBase are built on top of HDFS
  • 12. Main differences (Hadoop) Cons • File system, not a database • Not good for millions of users, random access, fast individual record lookups or updates (OLTP) • Not so great for real-time analytics • Lacks: indexing, metadata layer, query optimizer, memory management • Same cons at non-relational: no ACID support, data integrity, limited indexing, weak SQL, etc • Security limitations • More complex debugging Hadoop adoption has slowed • Too much hype • Companies adopt is without understanding use cases (i.e. real big data) • Difficulty in finding skillset • Pace of change too fast • Too many products involved in a solution • Other technologies (RDBMS, NoSQL) improving and expanding use cases • Higher learning curve
  • 13. ACID (RDBMS) vs BASE (NoSQL)
      ACID (needed for bank transactions):
      • ATOMICITY: All data and commands in a transaction succeed, or all fail and roll back
      • CONSISTENCY: All committed data must be consistent with all data rules including constraints, triggers, cascades, atomicity, isolation, and durability
      • ISOLATION: Other operations cannot access data that has been modified during a transaction that has not yet completed
      • DURABILITY: Once a transaction is committed, data will survive system failures, and can be reliably recovered after an unwanted deletion
      BASE (OK for web page visits):
      • Basically Available: Guaranteed Availability
      • Soft-state: The state of the system may change, even without a query (because of node updates)
      • Eventually Consistent: The system will become consistent over time
      ACID | BASE
      Strong Consistency | Weak Consistency – stale data OK
      Isolation | Last Write Wins
      Transaction | Programmer Managed
      Available/Consistent | Available/Partition Tolerant
      Robust Database/Simpler Code | Simpler Database, Harder Code
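To make atomicity concrete for the bank-transaction case, here is a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper and the balances are assumptions made up for illustration; the point is only that a failed debit also undoes the credit that preceded it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

print(transfer(conn, "alice", "bob", 500))  # overdraft attempt
print(balances(conn))
```

In a BASE-style store without multi-record transactions, the equivalent compensation logic would have to live in application code ("Simpler Database, Harder Code").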
  • 14. Data stored in tables. Tables contain some number of columns, each of a type. A schema describes the columns each table can have. Every table’s data is stored in one or more rows. Each row contains a value for every column in that table. Rows aren’t kept in any particular order.
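As a hedged illustration of tables, typed columns and a schema that is enforced on write, the sketch below again uses `sqlite3`. The `customers` and `orders` tables and the `try_insert_orphan` helper are hypothetical; the example shows a join across two tables and a write rejected by a relationship constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id), "
    "total REAL NOT NULL)"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Stella')")
conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (10, 1, 25.0)")

# A join reassembles data that normalization spread across tables.
rows = conn.execute(
    "SELECT c.name, o.total FROM orders o JOIN customers c ON c.id = o.customer_id"
).fetchall()
print(rows)

def try_insert_orphan():
    """Try to insert an order for a customer that does not exist."""
    try:
        conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (11, 999, 5.0)")
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

print(try_insert_orphan())
```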
  • 15. Thanks to: Harri Kauhanan, http://www.slideshare.net/harrikauhanen/nosql-3376398 Relational stores
  • 16. Key-value stores offer very high speed via the least complicated data model—anything can be stored as a value, as long as each value is associated with a key or name. Key Value
  • 17. Key-value stores Key “dog_12”: value_name “Stella”, value_mood “Happy”, etc
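The "dog_12" example can be sketched with a toy in-memory store. The `KeyValueStore` class is hypothetical and is not the API of any real product; it only shows that the store looks values up by key and treats the value as opaque.

```python
class KeyValueStore:
    """Toy key-value store: the value is opaque, lookups are by key only."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
store.put("dog_12", {"name": "Stella", "mood": "Happy"})
print(store.get("dog_12"))
```

Note what is missing: there is no way to ask for "all happy dogs" without scanning every value, which is the trade made for speed and simplicity.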
  • 18. Wide-column stores are fast and can be nearly as simple as key-value stores. They include a primary key, an optional secondary key, and anything stored as a value. Also called column stores Values Primary key Keys and values can be sparse or numerous Secondary key
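A rough sketch of the primary key / secondary key / value layout follows. The `WideColumnTable` class and the sensor data are assumptions for illustration; real wide-column stores add partitioning, ordering and on-disk formats that are not modeled here.

```python
from collections import defaultdict

class WideColumnTable:
    """Toy wide-column table: row key -> {column key -> value}; rows may be sparse."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        self._rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        return self._rows.get(row_key, {}).get(column, default)

    def columns(self, row_key):
        return sorted(self._rows.get(row_key, {}))

readings = WideColumnTable()
# Primary key: sensor id. Secondary key: timestamp. Value: the reading.
readings.put("sensor_1", "2024-01-01T00:00", 20.5)
readings.put("sensor_1", "2024-01-01T00:05", 20.7)
readings.put("sensor_2", "2024-01-01T00:00", 18.1)  # sparse: only one column

print(readings.columns("sensor_1"))
print(readings.get("sensor_2", "2024-01-01T00:05"))
```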
  • 20. Document stores contain data objects that are inherently hierarchical, tree-like structures (most notably JSON or XML). Not Word documents!
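As an illustration of hierarchical documents with differing fields, the sketch below keeps a small collection as Python dictionaries and filters it on read. The documents and the `find` helper are hypothetical and do not reflect any particular database's query API.

```python
import json

# Two documents in the same collection with different fields (schema-on-read).
collection = [
    {
        "_id": 1,
        "type": "order",
        "customer": {"name": "Stella", "city": "Houston"},
        "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
    },
    {
        "_id": 2,
        "type": "order",
        "customer": {"name": "Max"},
        "gift_note": "Happy birthday!",
        "items": [{"sku": "A1", "qty": 1}],
    },
]

def find(docs, **criteria):
    """Return documents whose top-level fields equal all the given criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

with_note = [d["_id"] for d in collection if "gift_note" in d]
total_qty = sum(item["qty"] for d in find(collection, type="order") for item in d["items"])

print(json.dumps(collection[0], indent=2))
print(with_note, total_qty)
```

Each order carries its customer and line items inside one document, so reading an order needs no join.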
  • 24. Use cases for NoSQL categories • Key-value stores: [Redis] For cache, queues, fit in memory, rapidly changing data, store blob data. Examples: shopping cart, session data, leaderboards, stock prices. Fastest performance • Wide-column stores: [Cassandra] Real-time querying of random (non-sequential) data, huge number of writes, sensors. Examples: Web analytics, time series analytics, real-time data analysis, banking industry. Internet scale • Document stores: [MongoDB] Flexible schemas, dynamic queries, defined indexes, good performance on big DB. Examples: order data, customer data, log data, product catalog, user generated content (chat sessions, tweets, blog posts, ratings, comments). Fastest development • Graph databases: [Neo4j] Graph-style data, social network, master data management, network and IT operations. Examples: social relations, real-time recommendations, fraud detection, identity and access management, graph-based search, web browsing, portfolio analytics, gene sequencing, class curriculum Note: Many NoSQL solutions are now multi-model
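The "social relations" and "real-time recommendations" use cases for graph databases come down to traversing relationships. Below is a minimal friend-of-friend sketch over an adjacency map; the people and the `recommend` function are made up for illustration, and a graph database would express this as a traversal query instead.

```python
# Undirected friendship graph as an adjacency map.
friends = {
    "ann": {"bob", "cara"},
    "bob": {"ann", "dan"},
    "cara": {"ann", "dan", "eve"},
    "dan": {"bob", "cara"},
    "eve": {"cara"},
}

def recommend(graph, person):
    """Suggest friends-of-friends, ranked by number of mutual friends, then name."""
    direct = graph[person]
    mutual_counts = {}
    for friend in direct:
        for candidate in graph[friend]:
            if candidate != person and candidate not in direct:
                mutual_counts[candidate] = mutual_counts.get(candidate, 0) + 1
    return sorted(mutual_counts, key=lambda name: (-mutual_counts[name], name))

print(recommend(friends, "ann"))
```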
  • 25. Velocity
      Volume Per Day | Real-world Transactions Per Day | Real-world Transactions Per Second | Relational DB | Document DB | Key Value or Wide Column
      8 GB | 8.64B | 100,000 | As Is | |
      86 GB | 86.4B | 1M | Tuned* | As Is |
      432 GB | 432B | 5M | Appliance | Tuned* | As Is
      864 GB | 864B | 10M | Clustered Appliance | Clustered Servers | Tuned*
      8,640 GB | 8.64T | 100M | | Many Clustered Servers | Clustered Servers
      43,200 GB | 43.2T | 500M | | | Many Clustered Servers
      * Tuned means tuning the model, queries, and/or hardware (more CPU, RAM, and Flash)
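The per-day figures in the velocity table follow from the per-second figures multiplied by the 86,400 seconds in a day. A short check (the `per_day` helper is just for illustration):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_day(transactions_per_second):
    """Convert a sustained per-second rate into a per-day total."""
    return transactions_per_second * SECONDS_PER_DAY

for tps in (100_000, 1_000_000, 5_000_000, 10_000_000, 100_000_000, 500_000_000):
    print(f"{tps:>11,} tps = {per_day(tps):,} transactions per day")
```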
  • 26. Focus of different data models …you may not have the data volume for NoSQL (yet), but there are other reasons to use NoSQL (semi-structured data, schemaless, high availability, etc)
  • 27. Relational NewSQL stores are designed for web-scale applications, but still require up-front schemas, joins, and table management that can be labor intensive. Blend RDBMS with NoSQL: provide the same scalable performance of NoSQL systems for OLTP read-write workloads while still maintaining the ACID guarantees of a traditional relational database system.
  • 28. Use case for different database technologies • Traditional OLTP business systems (i.e. ERP, CRM, In-house app): relational database (RDBMS) • Data warehouses (OLAP): relational database (SMP or MPP) • Web and mobile global OLTP applications: non-relational database (NoSQL) • Data lake: Hadoop • Relational and scalable OLTP: NewSQL
  • 29. CAP Theorem Impossible for any shared data system to guarantee simultaneously all of the following three properties: Consistency: Once data is written, all future requests will contain the data. “Is the data I’m looking at now the same if I look at it somewhere else?” Availability: The database is always available and responsive. “What happens if my database goes down?” Partition tolerance: If part of the database is unavailable, other parts are unaffected. “What if my data is on a different node?” Relational: CA (i.e. SQL Server with no replication) Non-relational: AP (Cassandra, CouchDB, Riak); CP (HBase, Cosmos DB, MongoDB, Redis) NoSQL can’t be both consistent and available. If there are two nodes (A and B) and B goes down, then if the A node takes requests, it is available but not consistent with the B node. If the A node stops taking requests, it remains consistent with the B node but it is not available. RDBMS is consistent and available because it only has one node/partition (so no partition tolerance)
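The two-node scenario (A and B, with B unreachable) can be simulated in a few lines. The `Node` and `Cluster` classes below are a toy model made up for this sketch, not a real replication protocol: in "CP" mode node A refuses writes it cannot replicate, while in "AP" mode it accepts them and diverges from B.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}

class Cluster:
    """Toy two-node cluster that picks consistency (CP) or availability (AP)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Node("A")
        self.b = Node("B")
        self.partitioned = False  # True when A cannot reach B

    def write(self, key, value):
        if not self.partitioned:
            self.a.data[key] = value
            self.b.data[key] = value
            return
        if self.mode == "CP":
            # Stay consistent by refusing the request: not available.
            raise RuntimeError("unavailable: node A cannot reach node B")
        # AP: stay available by accepting the write on A only: not consistent.
        self.a.data[key] = value

    def consistent(self):
        return self.a.data == self.b.data

for mode in ("CP", "AP"):
    cluster = Cluster(mode)
    cluster.write("x", 1)
    cluster.partitioned = True
    try:
        cluster.write("x", 2)
        accepted = True
    except RuntimeError:
        accepted = False
    print(mode, "accepted write:", accepted, "consistent:", cluster.consistent())
```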
  • 30. Microsoft data platform solutions
      Product | Category | Description | More Info
      SQL Server 2016 | RDBMS | Earned top spot in Gartner’s Operational Database magic quadrant. JSON support | https://www.microsoft.com/en-us/server-cloud/products/sql-server-2016/
      SQL Database | RDBMS/DBaaS | Cloud-based service that is provisioned and scaled quickly. Has built-in high availability and disaster recovery. JSON support | https://azure.microsoft.com/en-us/services/sql-database/
      SQL Data Warehouse | MPP RDBMS/DBaaS | Cloud-based service that handles relational big data. Provision and scale quickly. Can pause service to reduce cost | https://azure.microsoft.com/en-us/services/sql-data-warehouse/
      Analytics Platform System (APS) | MPP RDBMS | Big data analytics appliance for high performance and seamless integration of all your data | https://www.microsoft.com/en-us/server-cloud/products/analytics-platform-system/
      Azure Data Lake Store | Hadoop storage | Removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics | https://azure.microsoft.com/en-us/services/data-lake-store/
      Azure Data Lake Analytics | On-demand analytics job service/Big Data-as-a-service | Cloud-based service that dynamically provisions resources so you can run queries on exabytes of data. Includes U-SQL, a new big data query language | https://azure.microsoft.com/en-us/services/data-lake-analytics/
      HDInsight | PaaS Hadoop compute | A managed Apache Hadoop, Spark, R, HBase, and Storm cloud service made easy | https://azure.microsoft.com/en-us/services/hdinsight/
      Azure Cosmos DB | PaaS NoSQL: Document Store | Get your apps up and running in hours with a fully managed NoSQL database service that indexes, stores, and queries data using familiar SQL syntax | https://azure.microsoft.com/en-us/services/cosmos-db/
      Azure Table Storage | PaaS NoSQL: Key-value Store | Store large amount of semi-structured data in the cloud | https://azure.microsoft.com/en-us/services/storage/tables/
  • 31. PolyBase Query relational and non-relational data with T-SQL
  • 32. Azure Cosmos DB consistency options • Strong, which is the slowest of the four, but is guaranteed to always return correct data • Bounded staleness, which ensures that an application will see changes in the order in which they were made. This option does allow an application to see out-of-date data, but only within a specified window, e.g., 500 milliseconds • Session, which ensures that an application always sees its own writes correctly, but allows access to potentially out-of-date or out-of-order data written by other applications • Consistent Prefix (new), updates returned are some prefix of all the updates, with no gaps • Eventual, which provides the fastest access, but also has the highest chance of returning out-of-date data
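The trade-off between these options can be pictured with a toy model: one primary that holds every write and one replica that lags behind. The `ReplicatedRegister` class is an assumption invented for this sketch and is not the Azure Cosmos DB API; staleness is bounded here by a number of versions rather than by time, and the session level is not modeled.

```python
class ReplicatedRegister:
    """Toy single value with a primary write log and one lagging replica."""

    def __init__(self):
        self.log = []             # every committed write, in order
        self.replica_applied = 0  # how many writes the replica has applied

    def write(self, value):
        self.log.append(value)

    def replicate(self, count=1):
        """Apply up to `count` pending writes on the replica, in log order."""
        self.replica_applied = min(len(self.log), self.replica_applied + count)

    def _latest(self):
        return self.log[-1] if self.log else None

    def _replica_value(self):
        # The replica always holds a prefix of the log (no gaps, no reordering).
        return self.log[self.replica_applied - 1] if self.replica_applied else None

    def read(self, level, max_lag=0):
        lag = len(self.log) - self.replica_applied
        if level == "strong":
            return self._latest()
        if level == "bounded":
            # Serve from the replica only while it is within the allowed lag.
            return self._replica_value() if lag <= max_lag else self._latest()
        if level == "eventual":
            return self._replica_value()
        raise ValueError(f"unknown level: {level}")

reg = ReplicatedRegister()
reg.write(1)
reg.replicate()
reg.write(2)
reg.write(3)  # the replica is now two versions behind

print(reg.read("strong"))
print(reg.read("bounded", max_lag=1))
print(reg.read("bounded", max_lag=2))
print(reg.read("eventual"))
```

Once `replicate` has caught up, every level returns the same value, which is the "eventually consistent" half of the bargain.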
  • 33. On-prem vs Cloud • On-prem: SQL Server, APS, MongoDB, Oracle, Cassandra, Neo4J • IaaS Cloud: SQL Server in Azure VM, Oracle in Azure VM • DBaaS/PaaS Cloud: SQL Database, SQL Data Warehouse, Azure Cosmos DB, Redshift, RDS, MongoLab
  • 34.
  • 35. Product Categories: Azure Cosmos DB, Couchbase, APS, SQL DW, SQL Database, SQLite, PostgreSQL, Redis, OrientDB
  • 37. Azure Product Categories SQL DW ADLS, ADLA (PaaS) (IaaS) PostgreSQL
  • 38. db-engines.com/en/ranking Method of calculation: • Number of mentions of the system on websites • General interest in the system • Frequency of technical discussions about the system • Number of job offers, in which the system is mentioned • Number of profiles in professional networks, in which the system is mentioned • Relevance in social networks db-engines.com/en/ranking_definition
  • 40. Polyglot Persistence • Sometimes a relational store is the right choice, sometimes a NoSQL store is the right choice • Sometimes you need more than one store: Using the right tool for the right job
  • 41.
  • 42.
  • 43. Summary Choose NoSQL when… • You are bringing in new data with a lot of volume and/or variety • Your data is non-relational/semi-structured • Your team will be trained in these new technologies (NoSQL) • You have enough information to correctly select the type and product of NoSQL for your situation • You can relax transactional consistency when scalability or performance is more important • You can service a large number of user requests vs rigorously enforcing business rules Relational databases are created for strong consistency, but at the cost of speed and scale. NoSQL slightly sacrifices consistency across nodes for both speed and scalability. NoSQL and Hadoop are viable technologies for a subset of specialized needs and use cases. Lines are getting blurred – do your homework!
  • 44. Bottom line! • RDBMS for enterprise OLTP and ACID compliance, or db’s under 5TB • NoSQL for scaled OLTP and JSON documents • Hadoop for big data analytics (OLAP)
  • 45. Resources  Relational database vs Non-relational databases: http://bit.ly/1HXn2Rt  Types of NoSQL databases: http://bit.ly/1HXn8Zl  What is Polyglot Persistence? http://bit.ly/1HXnhMm  Hadoop and Data Warehouses: http://bit.ly/1xuXfu9  Hadoop and Microsoft: http://bit.ly/20Cg2hA
  • 46. Q & A ? James Serra, Big Data Evangelist Email me at: JamesSerra3@gmail.com Follow me at: @JamesSerra Link to me at: www.linkedin.com/in/JamesSerra Visit my blog at: JamesSerra.com (where this slide deck is posted via the “Presentations” link on the top menu)

Editor's Notes

  1. There is a lot of confusion about the place and purpose of the many recent non-relational database solutions (“NoSQL databases”) compared to the relational database solutions that have been around for so many years.  In this presentation I will first clarify what exactly these database solutions are, how they compare to Hadoop, and discuss the best use cases for each.  I’ll discuss topics involving OLTP, scaling, data warehousing, polyglot persistence, and the CAP theorem.  We will even touch on a new type of database solution called NewSQL.  If you are building a new solution it is important to understand all your options so you take the right path to success.
  2. Fluff, but point is I bring real work experience to the session
  3. My goal is to give you a high level overview of all the technologies so you know where to start Make you a hero
  4. LAMP stack (Linux, Apache, MySQL, PHP/ Python/ Perl)
  5. Hadoop started 2006. NoSQL started 2009 DocumentDB has done 5m/tps per region for 4 regions, so 20m/tps. DocumentDB uses local storage Kevin Cox: What is the highest performance (transactions per second) you have seen out of SQL Server?  Over 500k/sec.  Very dependent on using flash-type storage for tran log; i.e. FusionIO or similar.  Also short transactions (stock trades). Matt Goswell: Please see attached.  SDX offers 171,800 TPS however this is using SQL 2014.  We are waiting on updated numbers for SQL 2016. Arvind Shyamsundar: The question is fairly open-ended and the answer is dependent on the workload pattern. On the in-memory OLTP front, we achieved 1.2 million batch requests / second on a Fujitsu Primergy server (4 sockets, 72 cores, 144 logical procs) last October. The Superdome X can go up to 16-sockets and hundreds of cores, but with the form factor beyond 4 sockets comes increased NUMA memory latency. So more sockets does not necessarily translate to more throughput. The recent 10TB TPC-H numbers we released were all on 8-socket Lenovo boxes, and the workload involved is predominantly read-workload https://blogs.msdn.microsoft.com/sqlcat/2016/10/26/how-bwin-is-using-sql-server-2016-in-memory-oltp-to-achieve-unprecedented-performance-and-scale/ sql server: 1.2m batch requests/sec (30-40 sql statements each batch) Batch requests / second is the nearest equivalent to compare transactions / second. Statements is not an accurate comparison. Transactions / second is too overloaded / ambiguous because it could mean any of:   Business transactions / second (one business transactions could mean multiple SQL batches) Batch requests / second (assuming one business transaction == one SQL batches) Some other number involving interplay between SQL commands and external web services etc.   So from a pure OLTP perspective we prefer to quote batch requests / second in this ‘benchmark’. 
Proper benchmarks like TPC benchmarks have their own clearly defined unit of measurement (http://www.tpc.org/tpcc/detail.asp)   Arvind Shyamsundar
  6. OLTP DBMS is now called Operational DBMS: http://www.gartner.com/technology/reprints.do?id=1-2RIVJYE&ct=151104&st=sb Hadoop is a kind of file system on which several ecosystems can work. It's not a DB. NoSQL is a kind of DB with specific properties. The difference between a file system and a database is subtle; databases store all data in files or in RAM anyway. We also have "object storages" (like S3), "key-value data stores" (like Riak), and "data structure stores" (like Redis), and we can treat them as databases. Hadoop is a file system and technology stack that includes NoSQL solutions (HBase, for example). NoSQL is a set of methods or ways of handling data. Hadoop HDFS + YARN is a file system on steroids... i.e. it is neither a relational DBMS nor a non-relational (NoSQL) DBMS... it is optimized for string processing (large strings in large amounts of data)... Hadoop allows users to interact with the data via SQL (multiple options of SQL dialects) and NoSQL (multiple options of procedural languages)... unfortunately, with sub-optimal performance and restricted functionality for all non-string processing... that's the reason for all vendors and gurus to be so emphatic about Hadoop costs... For any real-time processing or analytics, NoSQL would be a better fit than Hadoop. However, there are several factors to keep in mind. NoSQL is better suited for simple data structures (key-value, document, etc.), but Hadoop has no inherent structure. Hadoop is better for volume writes and parallel scans, but NoSQL is better for high-volume random reads (indexed access) and writes. Finally, it is important to look at what type of analytics you want to do (statistical with R, visualization, etc.) to pick the right store.
Sometimes that means having both Hadoop and NoSQL. With NoSQL, you may not need to define a schema, but you still need to convert data to key/value or JSON before you can store it. Hadoop is good for batch processing, and you don't want to expose it to millions of users. Historically the Hadoop ecosystem (HDFS, MapReduce, YARN, etc.) targeted OLAP use cases and NoSQL (Cassandra, Couchbase, etc.) leaned toward OLTP workloads. However, the lines are getting blurred. You gave a good example of MapReduce on Couchbase, or HBase on the Hadoop ecosystem targeting real-time use cases. HDFS (Hadoop Distributed File System) was built for large files and is very efficient at batch processing; it supports sequential access of data only, hence no support for random access or fast individual record lookups, and data updates are not efficient either, while NoSQL databases address all of these challenges. To reiterate in short, Hadoop is a computation platform, while NoSQL is an unstructured database. Hadoop at its most basic is a distributed file system (HDFS) built to store large volumes of string data in parallel with redundancy. But the file system by itself is of little use without the rest of the ecosystem such as YARN, HBase, Hive, etc. (and now Spark for more real-time usage) providing more user-friendly usage. HBase also falls under the NoSQL category. NoSQL comes in different flavors based on the inherent architecture and use cases they support.
  8. NoSQL: Analogy of building a race car from a regular car by stripping off parts. Scalable because all data is within one doc and there is no need to move data to join tables. Joins are not a problem for OLTP, but are a problem for OLAP
  10. http://www.jamesserra.com/archive/2014/05/hadoop-and-data-warehouses/ Hadoop Common – Contains libraries and utilities needed by other Hadoop modules Hadoop Distributed File System (HDFS) – A distributed file-system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster Hadoop MapReduce – A programming model for large scale data processing.  It is designed for batch processing.  Although the Hadoop framework is implemented in Java, MapReduce applications can be written in other programming languages (R, Python, C# etc).  But Java is the most popular Hadoop YARN – YARN is a resource manager introduced in Hadoop 2 that was created by separating the processing engine and resource management capabilities of MapReduce as it was implemented in Hadoop 1 (see Hadoop 1.0 vs Hadoop 2.0).  YARN is often called the operating system of Hadoop because it is responsible for managing and monitoring workloads, maintaining a multi-tenant environment, implementing security controls, and managing high availability features of Hadoop
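The MapReduce programming model described above can be sketched in plain Python with the classic word count. This is a single-process illustration of the map, shuffle and reduce phases under assumed function names; it does not use Hadoop's APIs, and a real job would run these phases in parallel across the cluster.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(word_count(["big data", "Big deal", "data lake"]))
```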
  11. https://www.linkedin.com/pulse/hadoop-falling-george-hill
  13. Am I doing bank transactions or counting web page visits? In NoSQL, to maintain high availability or for performance reasons, data has multiple copies. These copies will not all be updated instantaneously when there is a data change, but will all eventually be updated (“eventually consistent”)
  14. https://azure.microsoft.com/en-us/documentation/articles/documentdb-consistency-levels/ http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
  15. Use cases: complex transactions, inventory system
  16. add tables to relational stores example
  17. Use cases: scale, cache, store blob data, shopping cart, session data, leaderboards, queues See http://www.slideshare.net/harrikauhanen/nosql-3376398
  18. Also called columnar stores or column stores. Use cases: scale, real-time querying of random (non-sequential) data, web analytics, time series analytics, huge numbers of writes, big data storage. Like document stores except data is stored on nodes.
  19. Use cases: scale, flexible schemas, orders, customers, log data, product catalog
  20. Use cases: social network, master data management, network and IT operations, real-time recommendations, fraud detection, identity and access management, graph-based search, web browsing, portfolio analytics, gene sequencing, class curriculum
  21. MongoDB vs Cassandra: http://theprofessionalspoint.blogspot.com/2014/01/mongodb-vs-cassandra-difference-and.html: Cassandra is much better suited for highly distributed applications due to its tunable replication engine. It was built from the ground up to be a shared-nothing data engine. MongoDB, by contrast, is better suited for applications that need a dynamic schema-less approach. https://www.youtube.com/watch?v=PENcqjVKqr4c https://www.youtube.com/watch?v=gJFG04Sy6NY http://maxivak.com/differences-between-nosql-databases-cassandra-vs-mongodb-vs-couchdb-vs-redis-vs-riak-vs-hbase-vs-membase-vs-neo4j/ http://www.infoworld.com/article/2848722/nosql/mongodb-cassandra-hbase-three-nosql-databases-to-watch.html
  22. MongoDB vs Cassandra: http://theprofessionalspoint.blogspot.com/2014/01/mongodb-vs-cassandra-difference-and.html: Cassandra is much better suited for highly distributed applications due to its tunable replication engine. It was built from the ground up to be a shared-nothing data engine. MongoDB, by contrast, is better suited for applications that need a dynamic schema-less approach. https://www.youtube.com/watch?v=PENcqjVKqr4c https://www.youtube.com/watch?v=gJFG04Sy6NY http://maxivak.com/differences-between-nosql-databases-cassandra-vs-mongodb-vs-couchdb-vs-redis-vs-riak-vs-hbase-vs-membase-vs-neo4j/ http://www.infoworld.com/article/2848722/nosql/mongodb-cassandra-hbase-three-nosql-databases-to-watch.html
  23. http://www.slideshare.net/quipo/nosql-databases-why-what-and-when/172 http://www.rosebt.com/blog/row-vs-columnar-vs-nosql-database-options http://jeffsayre.com/2010/09/17/web-3-0-smartups-moving-beyond-the-relational-database/
  24. http://db.cs.cmu.edu/papers/2016/pavlo-newsql-sigmodrec2016.pdf Use cases: scale. A class of modern RDBMSs that seek to provide the same scalable performance as NoSQL systems for OLTP read-write workloads while still maintaining the ACID guarantees of a traditional relational database system. The disadvantages are that they are not for OLAP-style queries and they are inappropriate for databases over a few terabytes. Aims to blend NoSQL and Relational/SQL. Examples: VoltDB, NuoDB, MemSQL, SAP HANA, Splice Machine, Clustrix, Altibase
  25. If you would rather go the route of using Hadoop software, many of the above technologies have Hadoop or open source equivalents: AtScale and Apache Kylin create SSAS-like OLAP cubes on Hadoop, Jethro Data creates indexes on Hadoop data, Apache Atlas for metadata and lineage tools, Apache Drill to query Hadoop files via SQL, Apache Mahout or Spark MLlib for machine learning, Apache Flink for distributed stream and batch data processing, Apache HBase for storing non-relational streaming data and supporting fast query response times, SQLite/MySQL/PostgreSQL for storing relational data, Apache Kafka for event queuing, Apache Falcon for data and pipeline management (ETL), and Apache Knox for authentication and authorization.
  26. https://codahale.com/you-cant-sacrifice-partition-tolerance/ Emails don’t need to be consistent; stock prices do. http://www.3pillarglobal.com/insights/short-history-databases-rdbms-nosql-beyond In NoSQL, to maintain high availability or for performance reasons, data has multiple copies. These copies will not all be updated instantaneously when there is a data change, but will all eventually be updated (“eventually consistent”). https://www.infoq.com/news/2014/04/bitcoin-banking-mongodb
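When data has multiple copies, systems with tunable consistency let you choose how many copies must acknowledge a write (W) and how many must answer a read (R). With N copies, a read is guaranteed to touch at least one up-to-date copy whenever R + W > N. The sketch below checks that rule by brute force; the function names `quorum` and `reads_see_latest_write` are hypothetical and do not come from any product's API.

```python
from itertools import combinations


def quorum(replicas):
    """Smallest majority of the given number of replicas."""
    return replicas // 2 + 1


def reads_see_latest_write(replicas, write_acks, read_acks):
    """True if every possible read set shares a node with every write set."""
    nodes = range(replicas)
    return all(
        set(written) & set(read)
        for written in combinations(nodes, write_acks)
        for read in combinations(nodes, read_acks)
    )


N = 3
# Majority writes and majority reads always overlap (2 + 2 > 3).
strong = reads_see_latest_write(N, quorum(N), quorum(N))
# One ack each way can miss the latest write (1 + 1 <= 3).
weak = reads_see_latest_write(N, 1, 1)
print(strong, weak)
```

Requiring fewer acknowledgements lowers latency and keeps the system available when nodes are down, at the cost of possibly stale reads. That is the email versus stock price choice in this note.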
  27. https://azure.microsoft.com/en-us/blog/json-functionalities-in-azure-sql-database-public-preview/ “If you need a specialized JSON database in order to take advantage of automatic indexing of JSON fields, tunable consistency levels for globally distributed data, and JavaScript integration, you may want to choose Azure DocumentDB as a storage engine.” https://blogs.msdn.microsoft.com/jocapc/2015/05/16/json-support-in-sql-server-2016/ https://msdn.microsoft.com/en-us/library/dn921897.aspx “If you have pure JSON workloads where you want to use some query language that is customized and dedicated for processing of JSON documents, you might consider Microsoft Azure DocumentDB.” http://demo.sqlmag.com/scaling-success-sql-server-2016/integrating-big-data-and-sql-server-2016 https://www.simple-talk.com/sql/learn-sql-server/json-support-in-sql-server-2016/
  28. So now that you’re convinced of the benefits of PaaS, let’s take a look at the menu of available PaaS data services on Azure. It’s important to remember that with any application, you can use multiple data stores. Cache and Search are specialized data stores that you wouldn’t use as a primary data store, but they are worth mentioning here. Note: speaker should do a brief verbal overview of the information contained in this chart.
  29. Presenter guidance: Introduce the family portrait. Slide talk track: This is how we think about the core differences across the data services for capturing and managing data. On the left, you have more database-imposed structure, and this loosens as you move to the right, ending in blobs, which are just large containers of binary data.
  30. Presenter guidance: At this point, let’s take a slight detour to mention SQL Server in a VM and how it fits into the mix. It’s important in the context of dev/test and lift and shift (or migrating existing apps). Establish app dev scenarios as common ground. Slide talk track: Let’s first orient ourselves in what we see as common application scenarios. Are you seeing these? Are you interested in these scenarios? Do these represent scenarios you would be willing to move to the cloud? The services listed are generally those that we would see you using for these scenarios, but this is just what we see. There are infinite ways to do things and at the end of the day, it’s your decision. Azure is there to make sure you have all of the options and choices available that you need.
  31. https://msdn.microsoft.com/en-us/library/mt143171.aspx When it comes to key BI investments, we are making it much easier to manage relational and non-relational data with PolyBase technology, which allows you to query Hadoop data and SQL Server relational data through a single T-SQL query. One of the challenges we see with Hadoop is that there are not enough people with Hadoop and MapReduce skills, and this technology simplifies the skill set needed to manage Hadoop data. This can also work across your on-premises environment or SQL Server running in Azure.
  32. https://docs.microsoft.com/en-us/Azure/documentdb/documentdb-consistency-levels http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
  33. https://azure.microsoft.com/en-us/documentation/articles/documentdb-consistency-levels/ http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
  34. Ranked by popularity as of 3-14-16
  35. https://azure.microsoft.com/en-us/documentation/articles/documentdb-consistency-levels/ http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
  36. Clock: 47 minutes. In this scenario-based HOL, you will learn how to build a ‘polyglot persistence’ data pattern that is common in modern cloud-hosted applications. Requirements of modern applications, such as greater scale and availability, have driven the industry to begin using a much broader range of technologies for storing data within an application. Microsoft Azure provides a range of storage technologies that support these architectures, and this HOL provides an example of their use in the well-understood scenario of e-commerce. With data services in Microsoft Azure, you can quickly design, deploy, and manage highly available apps that scale without downtime and that enable you to rapidly respond to security threats. Features built into services like Azure SQL Database, Azure Search, and Azure DocumentDB help your apps scale smartly, run faster, and stay available and secure. In this HOL you will see a browser-based e-commerce application running under the LCA-approved sample company name ‘AdventureWorks’. It has been created to demonstrate functionality provided by the following data storage technologies: SQL Database, DocumentDB, Search, and Table Storage. In a real application, decisions will need to be made as to where data is stored. In this HOL we wish to highlight how using multiple Azure data service technologies allows you to take a modern approach to data in your applications. Note: The website (which gets built out of this HOL) is not intended to be a fully functioning site. It is not designed to be a reference e-commerce implementation nor a starting point for a customer’s implementation of an e-commerce site on Azure; rather, it will provide the following functionality in order to demonstrate the selected storage technologies.
In the course of this lab, you will gain greater familiarity with Azure SQL Database, DocumentDB, Azure Search and Table Storage by performing the following tasks: Familiarize yourself with one of the tenant-company’s websites and its Azure SQL Database backend. Create a new database using the Azure portal. Configure and implement vertical scaling by increasing the capacity of a database. Use Azure SQL Database auditing features to track down an erroneous deletion from a database. Use Azure SQL Database point-in-time restore to correct the deletion. (Optional) Configure and implement Azure SQL Database geographic disaster recovery to prevent large-scale data loss. Locate data using Azure Search. Modernize and create an iterative experience using DocumentDB. http://INMMDDYYYY.azurewebsites.net/
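The polyglot persistence idea in this lab description can be sketched with local stand-ins: each kind of data goes to the kind of store that suits it. Nothing below calls an Azure service. An in-memory SQLite database stands in for SQL Database, plain dictionaries stand in for DocumentDB and Table Storage, and a naive scan stands in for Search. The mapping of data to stores and all function names are assumptions made for illustration, not the lab's actual design.

```python
import json
import sqlite3

# Relational store (stand-in for SQL Database): transactional order data.
orders_db = sqlite3.connect(":memory:")
orders_db.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)

# Document store (stand-in for DocumentDB): schema-flexible product catalog.
catalog = {}

# Key-value store (stand-in for Table Storage): per-session shopping carts.
carts = {}


def add_product(product_id, document):
    # Stored as JSON text so each product can carry different fields.
    catalog[product_id] = json.dumps(document)


def search_products(term):
    # Naive full scan standing in for a dedicated search service.
    term = term.lower()
    return sorted(pid for pid, doc in catalog.items() if term in doc.lower())


def add_to_cart(session_id, product_id):
    carts.setdefault(session_id, []).append(product_id)


def checkout(session_id, customer):
    # Empty the cart, price it from the catalog, record the order relationally.
    items = carts.pop(session_id, [])
    total = sum(json.loads(catalog[pid])["price"] for pid in items)
    with orders_db:  # commits on success, rolls back on error
        cursor = orders_db.execute(
            "INSERT INTO orders (customer, total) VALUES (?, ?)",
            (customer, total),
        )
    return cursor.lastrowid, total


add_product("bike-1", {"name": "Road Bike", "price": 900.0, "gears": 21})
add_product("helmet-1", {"name": "Helmet", "price": 50.0})
add_to_cart("session-42", "bike-1")
add_to_cart("session-42", "helmet-1")
order_id, total = checkout("session-42", "alice")
matches = search_products("bike")
print(order_id, total, matches)
```

Only the order, which needs transactional guarantees, touches the relational store. The catalog and cart tolerate looser structure and consistency, which is why a real application might place them in non-relational services.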
  37. Show a couple of examples of using multiple data services.
  38. Show a couple of examples of using multiple data services.
  39. 1) Copy source data into the Azure Data Lake Store (Twitter data example)
      2) Massage/filter the data using Hadoop (or skip using Hadoop and use stored procedures in SQL DW/DB to massage data after step #5)
      3) Pass data into Azure ML to build models using Hive query (or pass in directly from Blob Storage). You can use a Python package to pull data directly from the Azure Data Lake Store
      4) Azure ML feeds prediction results into the data warehouse (you can also pull in data from SQL Database or SQL Data Warehouse)
      5) Non-relational data in Azure Data Lake Store copied to data warehouse in relational format (optionally use PolyBase with external tables to avoid copying data)
      6) Power BI pulls data from data warehouse to build dashboards and reports
      7) Azure Data Catalog captures metadata from Azure Data Lake Store, SQL DW/DB, and SSAS cubes
      8) Power BI can pull data from the Azure Data Lake Store via HDInsight/Spark (beta) or directly. Excel can pull data from the Azure Data Lake Store via Hive ODBC or PowerQuery/HDInsight
      9) To support high concurrency if using SQL DW, or for easier end-user data layer, create an SSAS cube