SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)

Michael Rys, Microsoft Corp.
@SQLServerMike

© 2012 Microsoft

Strata 2012 Conference, March 2012

AGENDA

• Scaling out your business is important!
• NoSQL Paradigms and NoSQL Platforms
• SQL learns from NoSQL
(with a demo of SQL Azure Federations)
• NoSQL learns from SQL
• Scalable Data Processing Platform of the Future

THE WEB 2.0 BUSINESS ARCHITECTURE

[Diagram: an Online Business Application surrounded by three activities]

Attract Individual Consumers:
- Provide interesting service
- Provide mobility
- Provide social

Monetize Individual:
- Upsell service
- VIP
- Speed
- Extra Capabilities

Monetize the Social:
- Improve individual experience
- Re-sell Aggregate Data (e.g., Advertisers)

SOCIAL NETWORKING: THE BUSINESS PROBLEM
• 100s of millions of users
• 10s of millions of users concurrently
• Terabytes to petabytes of data
• Structured and unstructured
• Required (eventual) data
consistency across users
• E.g. show your updated state in your
friends’ profile pages

SOLUTION
• Shard/Partition user data across hundreds to
thousands of SQL Databases
• Propagate data changes from one DB to other
DBs using reliable, async Message Service
• Managing routes from each DB to every other DB
would be too complex
• Global Transactions would hinder scale and
availability
• Provide a caching layer for performance
• And also used for
o Clean-up state (e.g. on account close)
o Deploy business logic (stored procedures)
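
The sharding and async-propagation bullets above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the shard ranges are borrowed from the example architecture slide, the in-process `queue.Queue` stands in for a reliable message service, and the names `shard_for`, `update_status`, and `dispatch_pending` are hypothetical, not part of any product.

```python
import queue
from bisect import bisect_left

# Assumed layout: shard i owns userIds up to SHARD_UPPER_BOUNDS[i] (1-1000, 1001-2000, ...).
SHARD_UPPER_BOUNDS = [1000, 2000, 3000, 4000, 5000, 6000]

# Each shard is modelled as a plain dict: user_id -> row.
shards = [dict() for _ in SHARD_UPPER_BOUNDS]

# Stand-in for the reliable, async message service.
outbox = queue.Queue()


def shard_for(user_id: int) -> int:
    """Return the index of the shard whose range contains user_id."""
    if not 1 <= user_id <= SHARD_UPPER_BOUNDS[-1]:
        raise ValueError(f"user_id {user_id} is outside all shard ranges")
    return bisect_left(SHARD_UPPER_BOUNDS, user_id)


def update_status(user_id: int, status: str, friend_ids: list) -> None:
    """Write locally, then queue one message per friend (no global transaction)."""
    shards[shard_for(user_id)].setdefault(user_id, {})["status"] = status
    for friend_id in friend_ids:
        outbox.put((friend_id, user_id, status))


def dispatch_pending() -> int:
    """Deliver queued messages; each delivery touches exactly one shard."""
    delivered = 0
    while not outbox.empty():
        friend_id, user_id, status = outbox.get()
        row = shards[shard_for(friend_id)].setdefault(friend_id, {})
        row.setdefault("friend_status", {})[user_id] = status
        delivered += 1
    return delivered
```

Between `update_status` and `dispatch_pending` the friends' shards still show the old state, which is the eventual consistency the problem slide accepts.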

EXAMPLE ARCHITECTURE

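[Diagram: a Web Tier and a Data Tier. The Data Tier holds six user databases sharded by userId range: 1-1000, 1001-2000, 2001-3000, 3001-4000, 4001-5000, 5001-6000. Labels in the diagram: "I change my status" (userId=1024), "My DB gets updated", a Dispatcher, Async Message Service links between databases, and transactions TX1 through TX5.]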

MANY LARGE SCALE CUSTOMERS USING SIMILAR PATTERNS
• Patterns
• Sharding and reliable messaging
• Sharding and fan-out query layer
• Caching layer

• Customer Examples
• Social Networking: Facebook, MySpace, etc.
• Online electronic stores (cannot give names)
• Travel reservation systems (e.g. Choice International)
• MSN Casual Gaming
• etc.

LESSONS LEARNED FROM THESE SCENARIOS
• Require high availability
• Be able to scale out:
• Functional and Data Partitioning Architecture
• Provide scale-out processing:
o Function shipping
o Fanout and Map/Reduce processing
• Be able to deal with failures:
o Quorum
o Retries
o Eventual Consistency (similar to Read-consistent Snapshot Isolation)
• Be able to quickly grow and change:
• Elastic scale
• Flexible, open schema
• Multi-version schema support

Move better support for these patterns into the Data Platform!
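
As a rough illustration of the "Quorum" and "Retries" bullets above, the following sketch writes to several replicas, retries transient failures, and reports success once enough replicas acknowledge. The `Replica` class, its scripted failures, and the function names are assumptions made for this example only.

```python
class Replica:
    """Toy replica that fails a fixed number of times before accepting writes."""

    def __init__(self, failures_before_success: int = 0):
        self.data = {}
        self._remaining_failures = failures_before_success

    def write(self, key, value) -> None:
        if self._remaining_failures > 0:
            self._remaining_failures -= 1
            raise ConnectionError("replica temporarily unavailable")
        self.data[key] = value


def write_with_retries(replica: Replica, key, value, max_attempts: int = 3) -> bool:
    """Try a write up to max_attempts times; return True on the first success."""
    for _ in range(max_attempts):
        try:
            replica.write(key, value)
            return True
        except ConnectionError:
            continue
    return False


def quorum_write(replicas, key, value, write_quorum: int) -> bool:
    """The write succeeds if at least write_quorum replicas acknowledged it."""
    acks = sum(write_with_retries(r, key, value) for r in replicas)
    return acks >= write_quorum
```

A replica that missed the write (here, one that kept failing) is stale until it is repaired, which is where eventual consistency enters the picture.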

WHAT IS NOSQL ABOUT?
• NoSQL = operational and developer agility at low CapEx and OpEx!

• Low Cost
• Free Open Source Stores, Community Support
• Scale CapEx cost below customer growth rate
• Web friendly developer model and tool chain, ease of use

• Processing Paradigms
• High Availability (scalable Replication, Fast Failover, DR/GeoDR, tunable latency)
• Scale-out (Sharding, Map-Reduce, Elasticity)
• Performance (tuned for specific workloads, Caching, co-located compute with partitioned state)
• Tunable/Eventual Consistency

• Data Model Paradigms
• Data first: Flexible Schema
• Low-impedance mismatch between programming and data model:
o Key-Documents and Objects (BLOBS, JSON, XML, POJO)
o Key-Wide Sparse Column Sets
o Graphs (e.g., RDF)

• Ranges from devices, through OLTP Web 2.0 applications, to BigData Analytics

DATA MODELS
Data Model: Example Stores (apologies to the ones I did not list)

• Simple Key-Value Pairs: Memcache, Redis, Dynamo, Voldemort, LevelDB, Azure Caching
• Wide Sparse Column Sets: HyperTable, Big Table, Cassandra, HBase, Hyperbase, Amazon DynamoDB, Windows Azure Tables, SQL Server/Azure Sparse columns
• BLOBs: Amazon S3, Oracle Berkeley NoSQL, Windows Azure Blob Store, SQL Server RBS/FileTable
• JSON Documents: MongoDB, Couchbase, Riak, RavenDB
• Graph: Neo4j, GraphDB, HypergraphDB, Stig, Intellidimension
• Objects and XML Documents: Versant, Oracle Berkeley NoSQL, MarkLogic, existDB, EMC HiveDB, SQL Server/Azure, Oracle, IBM DB2
• Extended Relational: Oracle, EMC SQLFire, IBM DB2, MySQL, Postgres, SQL Server/Azure

WHAT CAN SQL LEARN FROM NOSQL?
• Low CapEx, Low OpEx
• Built-in tunable High-Availability
• Data scale-out (Sharding)
• Processing scale-out (Map-Reduce, Fan-Out, tunable consistency)
• Flexible Data Models
• JSON (& XML) support
• Sparse columns/Column sets
• Integrate with BigData Analytics (e.g., Hadoop)

Many Relational Database Systems are incorporating these learnings!

EXAMPLE: SQL AZURE FEDERATIONS
• Provides Data Partitioning/Sharding at the Data Platform
• Enables applications to build elastic scale-out applications
• Provides non-blocking SPLIT/DROP for shards (MERGE to come later)
• Auto-connect to the right shard based on the sharding key value
• Provides SPLIT-resilient query mode

SQL AZURE FEDERATION CONCEPTS
• Federation
Represents the data being sharded
• Federation Root
Database that logically houses federations, contains federation meta data
• Federation Key
Value that determines the routing of a piece of data (defines a Federation Distribution)
• Atomic Unit
All rows with the same federation key value: always together!
• Federation Member (aka Shard)
A physical container for a set of federated tables for a specific key range and reference tables
• Federated Table
Table that contains only atomic units for the member’s key range
• Reference Table
Non-sharded table

[Diagram: a Sharded Application connects through the Connection Gateway to an Azure DB with Federation Root (Federation Directories, Federation Users, Federation Distributions, …). Federation “Orders_Fed” (Federation Key: CustomerID) has three members, each holding atomic units (AU):
Member: PK [min, 100) with AUs PK=5, PK=25, PK=35
Member: PK [100, 488) with AUs PK=105, PK=235, PK=365
Member: PK [488, max) with AUs PK=555, PK=2545, PK=3565]
DEMO: MAP-REDUCE SCALE-OUT OVER SQL AZURE FEDERATIONS
• Sharded GamesInfo table using SQL Azure Federations

• Uses a C# library that implements a Map/Reduce processor on top of SQL Azure Federations

• Mapper and Reducer are specified using SQL
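
The demo's C# library is not reproduced here. As a stand-in, the sketch below shows the general idea of a mapper and reducer written in SQL and run over several shards. It makes several assumptions: Python's built-in `sqlite3` in-memory databases play the role of federation members, the `GamesInfo` columns (`player_id`, `game`, `score`) are invented for illustration, and `map_reduce` is a hypothetical helper.

```python
import sqlite3


def make_shard(rows):
    """Create one in-memory shard holding a slice of the GamesInfo table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE GamesInfo (player_id INTEGER, game TEXT, score INTEGER)"
    )
    conn.executemany("INSERT INTO GamesInfo VALUES (?, ?, ?)", rows)
    return conn


# Mapper: runs on every shard and emits partial aggregates per game.
MAPPER_SQL = (
    "SELECT game, SUM(score) AS total, COUNT(*) AS plays "
    "FROM GamesInfo GROUP BY game"
)

# Reducer: runs once over the collected partials in a scratch table named 'mapped'.
REDUCER_SQL = (
    "SELECT game, SUM(total), SUM(plays) FROM mapped GROUP BY game ORDER BY game"
)


def map_reduce(shards, mapper_sql, reducer_sql):
    """Fan the mapper out to all shards, then reduce the combined partial rows."""
    partials = []
    for shard in shards:
        partials.extend(shard.execute(mapper_sql).fetchall())
    scratch = sqlite3.connect(":memory:")
    scratch.execute("CREATE TABLE mapped (game TEXT, total INTEGER, plays INTEGER)")
    scratch.executemany("INSERT INTO mapped VALUES (?, ?, ?)", partials)
    return scratch.execute(reducer_sql).fetchall()
```

The mapper loop is sequential to keep the sketch short; a real fan-out would query the shards in parallel.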

WHAT CAN NOSQL LEARN FROM SQL?
• Flexible data is good, but:
• Provide optional schema in data platform to help with constraints and optimizations
• Procedural Scale-Out processing is good, but:
• Develop a declarative language suited for and across the data models (e.g., coSQL)
• Standardize suitable abstractions and languages
• Eventual Consistency is good, but:
• Provide users the choice
• Simple Queries are good, but:
• Provide me with secondary indexes
• It is more efficient to join two collections of JSON documents in the query engine than in the application layer

Many NoSQL Database Systems are starting to incorporate these learnings!
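
To make the secondary-index point above concrete, here is a minimal sketch of a JSON document collection that maintains an index on a non-key field, so a lookup does not have to scan every document. The `Collection` class is hypothetical and assumes indexed fields hold scalar (hashable) values.

```python
import json
from collections import defaultdict


class Collection:
    """Toy document store: primary-key lookup plus optional secondary indexes."""

    def __init__(self):
        self.docs = {}      # primary key -> parsed document
        self.indexes = {}   # field name -> {field value: set of primary keys}

    def create_index(self, field: str) -> None:
        """Build an index over existing documents for the given field."""
        index = defaultdict(set)
        for key, doc in self.docs.items():
            if field in doc:
                index[doc[field]].add(key)
        self.indexes[field] = index

    def put(self, key, json_text: str) -> None:
        """Store a document and keep every secondary index in sync."""
        old = self.docs.get(key)
        if old is not None:
            for field, index in self.indexes.items():
                if field in old:
                    index[old[field]].discard(key)
        doc = json.loads(json_text)
        self.docs[key] = doc
        for field, index in self.indexes.items():
            if field in doc:
                index[doc[field]].add(key)

    def find(self, field: str, value):
        """Use the index when one exists; otherwise fall back to a full scan."""
        if field in self.indexes:
            keys = self.indexes[field].get(value, set())
            return [self.docs[k] for k in sorted(keys)]
        return [d for d in self.docs.values() if d.get(field) == value]
```

The same index could serve as the build side of a join between two collections inside the store, which is the case the last bullet argues for.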

SCALE-OUT DATA PLATFORM ARCHITECTURE
[Diagram: a SQL or NoSQL Store made of several Primary Shards, each with Readable Replicas, and a "Query" label, serving three workload types]

• OLTP Workloads
o Highly Available
o High Scale
o High Flexibility
o Mostly touching 1 to low number of shards
• Traditional OLAP Workloads
o Known schema
o Data warehouse, “Star joins”
• Dynamic OLAP Workloads
o 3Vs (Volume, Velocity, Variety)
o Exploratory
o Scale-out queries, often using eventually consistent scale-out frameworks like Hadoop

BIG DATA REQUIRES AN END-TO-END APPROACH


CALL TO ACTION
• Familiarize yourself with the NoSQL genes in the Microsoft Online Platform
• Free 3-Month Trial for Windows and SQL Azure: http://www.windowsazure.com

• Engage with us throughout Strata
Presentation | Speaker | Date and Time
Do We Have the Tools We Need to Navigate the New World of Data? | Dave Campbell | 2/29 9:00am PST
Onsite Interview * | Tim O’Reilly, Dave Campbell | 2/29 10:15am PST
Unleash Insights on All Data With Microsoft Big Data | Alexander Stojanovic | 2/29 11:30am PST
Office Hours (Q&A session) | Dave Campbell | 2/29 1:30pm PST
Hadoop + Javascript: What We Learned | Asad Khan | 2/29 2:20pm PST
Democratizing BI at Microsoft: 40,000 Users and Counting | Kirkland Barrett | 3/1 10:40am PST
Data Marketplaces For Your Extended Enterprise | Piyush Lumba | 3/1 2:20pm PST

• Download slides with additional information and related resources:
http://www.slideshare.net/MichaelRys/presentations

RELATED RESOURCES
• Scale-Out with SQL Databases
• http://gigaom.com/cloud/facebook-shares-some-secrets-on-making-mysql-scale/
• Windows Gaming Experience Case Study:
http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=4000008310
• Scalable SQL: http://cacm.acm.org/magazines/2011/6/108663-scalable-sql
• http://www.slideshare.net/MichaelRys/scaling-with-sql-server-and-sql-azure-federations

• NoSQL and the Windows Azure Platform
• Whitepaper:
http://download.microsoft.com/download/9/E/9/9E9F240D-0EB6-472E-B4DE-6D9FCBB505DD/Windows%20Azure%20No%20SQL%20White%20Paper.pdf
• SQL Federation blog:
http://blogs.msdn.com/b/cbiyikoglu/archive/2011/03/03/nosql-genes-in-sql-azure-federations.aspx

• Contact me
• @SQLServerMike
• http://sqlblog.com/blogs/michael_rys/default.aspx
