SQL and NoSQL
in the Context of SQL Server
Michael Rys
Program Manager, Microsoft Corp.
@SQLServerMike
Key Session Takeaways

- Scaling your business is important
- What the NoSQL paradigms are
- You can use NoSQL paradigms with SQL Server and SQL Azure
- We are working on moving these paradigms into SQL Server
The Web 2.0 Business Architecture

Attract Individual Consumers:
- Provide interesting service
- Provide mobility
- Provide social

Online Business Application

Monetize Individual:
- Upsell service
  - VIP
  - Speed
  - Extra Capabilities

Monetize the Social:
- Improve individual experience
- Re-sell Aggregate Data (e.g., Advertisers)
Social Networking: the Business Problem
- 100s of millions of users
  - 10s of millions of users concurrently
- Terabytes to petabytes of data
  - Structured and unstructured
- Required (eventual) data consistency across users
  - E.g., show your updated state in your friends’ profile pages
Solution
- Shard/partition user data across hundreds to thousands of SQL databases
- Propagate data changes using a reliable, async message service
  - No global transactions! They hinder scale and availability.
- Provide a caching layer for performance
  - Also used for:
    - Cleaning up state (e.g., on account close)
    - Deploying business logic (stored procedures)
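Example Architecture (MySpace.com)

[Diagram: a web tier in front of a data tier whose user databases are sharded by userId range (1-1000, 1001-2000, 2001-3000, 3001-4000, 4001-5000, 5001-6000). When user 1024 changes their status, "My DB" is updated in TX1; the Service Dispatcher then delivers async messages that update the other shards, each in its own transaction (TX2 to TX5).]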
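The status-propagation flow in the diagram, and the Service Dispatcher described in the speaker notes, can be sketched in Python. This is a minimal single-process illustration under stated assumptions, not MySpace's implementation: the `SHARDS` dictionaries and `INBOXES` are hypothetical stand-ins (one `queue.Queue` per shard plays the reliable async message service), the 1,000-users-per-shard layout is taken from the diagram, and translating `%` wildcards such as `DB%` into glob patterns is an assumption about how destination lists might be expanded.

```python
import fnmatch
import queue

# Hypothetical layout from the diagram: six shards of 1,000 userIds each
USERS_PER_SHARD = 1000
SHARDS = {f"DB{i}": {} for i in range(1, 7)}        # shard name -> {userId: status}
INBOXES = {name: queue.Queue() for name in SHARDS}  # stand-in for the async message service


def shard_for(user_id: int) -> str:
    """Range-based routing: 1-1000 -> DB1, 1001-2000 -> DB2, ..."""
    return f"DB{(user_id - 1) // USERS_PER_SHARD + 1}"


def dispatch(message: dict, targets: list) -> None:
    """Service Dispatcher: expand the destination list (SQL-style % wildcards) and forward."""
    destinations = set()
    for pattern in targets:
        glob = pattern.replace("%", "*")
        destinations.update(n for n in INBOXES if fnmatch.fnmatchcase(n, glob))
    for name in sorted(destinations):  # each destination receives the message once
        INBOXES[name].put(message)


def change_status(user_id: int, status: str, friend_ids: list) -> None:
    home = shard_for(user_id)
    SHARDS[home][user_id] = status  # TX1: my own DB is updated right away
    targets = sorted({shard_for(f) for f in friend_ids} - {home})
    dispatch({"user_id": user_id, "status": status}, targets)  # no global transaction


def drain(name: str) -> None:
    """Each shard applies queued messages in its own local transaction (TX2..TX5)."""
    inbox = INBOXES[name]
    while not inbox.empty():
        msg = inbox.get()
        SHARDS[name][msg["user_id"]] = msg["status"]  # local copy of a friend's status


change_status(1024, "at the beach", friend_ids=[17, 2500, 3999, 5500])
print(1024 in SHARDS["DB3"])  # False: the friend's shard has not processed the message yet
for name in SHARDS:
    drain(name)
print(1024 in SHARDS["DB3"])  # True: the shards have converged (eventual consistency)
```

The design point is that the user's own shard commits immediately while the friends' shards catch up through independent local transactions, so no distributed transaction ever spans shards.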
Many Large-Scale Customers Use Similar Patterns

- Patterns
  - Sharding and reliable messaging
  - Sharding and fan-out query layer
  - Caching layer

- Customer Examples
  - Social Networking: Facebook, MySpace, etc.
  - Online electronic stores (cannot give names)
  - Travel reservation systems (e.g., Choice International)
  - MSN Casual Gaming
  - etc.
Lessons Learned from these Scenarios

- Require high availability
- Be able to scale out
  - Functional and Data Partitioning Architecture
  - Provide scale-out processing
  - Be able to deal with failures
- Be able to quickly grow and change
  - Elastic scale
  - Flexible, open schema
  - Multi-version schema support

Move better support for these patterns into the Data Platform!
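Multi-version schema support can be illustrated with upgrade-on-read: records written under older schema versions stay in the store, and the application upgrades them step by step when it reads them, so a rollout does not require migrating every shard at once. A minimal Python sketch; the profile fields, version numbers and upgrade functions are all hypothetical.

```python
def upgrade_v1_to_v2(rec: dict) -> dict:
    # Hypothetical v2 change: split "name" into first_name / last_name
    first, _, last = rec["name"].partition(" ")
    upgraded = {k: v for k, v in rec.items() if k != "name"}
    upgraded.update(first_name=first, last_name=last, schema_version=2)
    return upgraded


def upgrade_v2_to_v3(rec: dict) -> dict:
    # Hypothetical v3 change: add an optional "locale" field with a default
    return {**rec, "locale": rec.get("locale", "en-US"), "schema_version": 3}


UPGRADES = {1: upgrade_v1_to_v2, 2: upgrade_v2_to_v3}
CURRENT_VERSION = 3


def read_profile(stored: dict) -> dict:
    """Bring a stored record up to CURRENT_VERSION, one version step at a time."""
    rec = stored
    while rec.get("schema_version", 1) < CURRENT_VERSION:  # unversioned records count as v1
        rec = UPGRADES[rec.get("schema_version", 1)](rec)
    return rec


# Old and new records can coexist in the same (sharded) store during a rollout
print(read_profile({"user_id": 1024, "name": "Ada Lovelace"}))
print(read_profile({"user_id": 7, "first_name": "Grace", "last_name": "Hopper",
                    "schema_version": 3, "locale": "en-GB"}))
```

A common variant writes the upgraded record back lazily, so the store converges to the newest version over time.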
What is NoSQL about?
- NoSQL = operational and developer agility at low CapEx and OpEx!
- Low Cost
  - Free Software and Support
  - Scale CapEx cost below customer growth rate
  - Web-friendly developer model and tool chain, easy to use
- Processing Paradigms
  - High Availability
  - Data and Processing Scale-out
  - Performance
  - Tunable/Eventual Consistency
- Data Model Paradigms
  - Data first: Flexible Schema
  - Low impedance mismatch between programming and data model
From devices, through OLTP Web 2.0 applications, to BigData analytics
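Tunable consistency is often expressed with quorums (the speaker notes for the lessons-learned slide also list quorums under failure handling). With N replicas, a write waits for W acknowledgements and a read contacts R replicas; if R + W > N, every read set overlaps every write set, so a read always reaches at least one replica holding the latest acknowledged write. The Python sketch below checks that overlap rule exhaustively for small N; it illustrates the general quorum condition, not the settings of any particular store.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """True if every R-replica read set shares a replica with every W-replica write set."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)  # a non-empty intersection is truthy
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


for n, r, w in [(3, 2, 2), (3, 1, 3), (3, 1, 1), (5, 2, 3)]:
    print(f"N={n} R={r} W={w}: R+W>N is {r + w > n}, "
          f"always overlap is {quorums_always_overlap(n, r, w)}")
```

Lowering R or W (down to R = W = 1) trades the overlap guarantee for latency and availability, which is the eventual-consistency end of the dial.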
Data Models
Data Model                  Example Stores
Simple Key-Value Pairs      Memcache, Redis, Dynamo, Voldemort, LevelDB,
                            Azure Caching
Wide Sparse Column Sets     Hypertable, Bigtable, Cassandra, HBase,
                            Hyperbase, Amazon DynamoDB, Windows Azure
                            Tables, SQL Server/Azure Sparse columns
BLOBs                       Amazon S3, Oracle Berkeley NoSQL, Windows
                            Azure Blob Store, SQL Server RBS/FileTable
JSON Documents              MongoDB, Couchbase, Riak, RavenDB
Graph                       Neo4j, GraphDB, HypergraphDB, Stig,
                            Intellidimension
Objects and XML Documents   Versant, Oracle Berkeley NoSQL, MarkLogic,
                            eXist-db, EMC HiveDB, SQL Server/Azure, Oracle,
                            IBM DB2
Extended Relational         Oracle, EMC SQLFire, IBM DB2, MySQL, Postgres,
                            SQL Server/Azure/Parallel DW
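The table lists SQL Server/Azure sparse columns among the wide sparse column stores. In SQL Server, a table with sparse columns can also declare a column set (`xml COLUMN_SET FOR ALL_SPARSE_COLUMNS`), an untyped XML column that returns a row's non-NULL sparse columns as one XML fragment. The Python sketch below only mimics that representation to show the idea; the `Products` row, its column names and the `column_set_xml` helper are hypothetical.

```python
from xml.sax.saxutils import escape

# Hypothetical wide "Products" row: most sparse attributes are NULL (None) for a given product
SPARSE_COLUMNS = ["Color", "Weight", "Voltage", "SleeveLength"]
row = {"ProductID": 1, "Name": "Bike", "Color": "Red", "Weight": 12,
       "Voltage": None, "SleeveLength": None}


def column_set_xml(row: dict, sparse_columns: list) -> str:
    """Mimic a column set: one XML element per non-NULL sparse column, NULLs omitted."""
    return "".join(
        f"<{col}>{escape(str(row[col]))}</{col}>"
        for col in sparse_columns
        if row.get(col) is not None
    )


print(column_set_xml(row, SPARSE_COLUMNS))  # <Color>Red</Color><Weight>12</Weight>
```

The sparse columns themselves must still be declared in the table definition, which is why the deck describes this open schema as "still schema first".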
Operational Agility
- You want:
  - Availability of service (scalability)
  - Global consistency
  - Network partition tolerance
- You can only get 2 of 3 (CAP Theorem)
- In the brave new world:
  - Online businesses need availability
  - The system is distributed because it is big, so network partitioning is unavoidable
  - Hence global consistency must be relaxed → BASE vs ACID
BASE vs ACID Consistency
- ACID: Atomicity, Consistency, Isolation, Durability
  - Full serializability provides all 4
  - Distributed transactions that provide all 4 limit service availability, throughput and scalability
- BASE: Basically Available, Soft state, Eventual consistency
  - Relaxes ACID properties to increase availability, throughput and scalability
  - Replica consistency:
    - Impacts recoverability
  - Cross-node consistency:
    - Impacts the globally consistent view of the world
[Diagram: two primaries, each with two replicas]
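Replica consistency under BASE can be illustrated with asynchronous primary-to-replica log shipping: the primary acknowledges a write once it is committed locally, and replicas catch up later, so a read from a replica may be stale for a while. A minimal in-memory Python sketch; the `Primary` and `Replica` classes are hypothetical and do not model failures, so they show only the staleness, not the recoverability impact (with asynchronous shipping, writes not yet shipped when a primary fails can be lost).

```python
from collections import deque


class Replica:
    """Secondary copy that applies the primary's change log asynchronously."""

    def __init__(self):
        self.data = {}
        self.pending = deque()  # shipped but not yet applied log entries

    def apply_pending(self) -> int:
        """Apply shipped changes in commit order; returns how many were applied."""
        applied = 0
        while self.pending:
            key, value = self.pending.popleft()
            self.data[key] = value
            applied += 1
        return applied

    def read(self, key):
        return self.data.get(key)


class Primary:
    """Primary commits locally and ships changes without waiting for replicas."""

    def __init__(self, replica_count: int):
        self.data = {}
        self.replicas = [Replica() for _ in range(replica_count)]

    def write(self, key, value) -> None:
        self.data[key] = value  # acknowledged once committed here
        for replica in self.replicas:
            replica.pending.append((key, value))  # asynchronous: no distributed transaction


primary = Primary(replica_count=2)
primary.write("status:1024", "online")
primary.write("status:1024", "away")
print(primary.replicas[0].read("status:1024"))  # None: the replica is behind (soft state)
for replica in primary.replicas:
    replica.apply_pending()
print(primary.replicas[0].read("status:1024"))  # away: replicas have converged
```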
Operational Agility
- Performance and Scale
- Automate management lifecycle (or fail)
- Simple deployment lifecycle
- No DB or OS Admin telling me what to do
Developer Agility

- Code First and revise quickly
- Application-model first (before database)
- Flexible, open data models
- You don’t know exactly what you are looking for
- Lower pain of adoption and maintenance
- No DB or OS Admin telling me what to do
NoSQL and BigData: Two sides of the same coin

- BigData:
  - Origin: large unstructured data processing (sensor data, scientific research, web stream analysis)
  - Analytics focused (“new” OLAP, Map-Reduce, Hadoop)
  - Scale-out data and processing paradigm at low cost
- NoSQL:
  - Origin: developing agile, scalable web applications
  - Real-time customer transaction focused (“new” OLTP)
  - Scale-out data and processing paradigm with flexible data model at low cost
- Both use many of the same paradigms
Scale-Out Data Platform Architecture

[Diagram: several primary shards, each with readable replicas (one shown with a copy); OLTP workloads run against individual shards, while a query layer spans all shards for OLAP workloads.]

OLTP Workloads:
- Highly available
- High scale
- High flexibility
- Mostly touching 1 to a low number of shards

Traditional OLAP Workloads:
- Known schema
- Data warehouse, “star joins”

Dynamic OLAP Workloads:
- 3Vs (Volume, Velocity, Variety)
- Exploratory
- Scale-out queries, often using eventually consistent scale-out frameworks like Hadoop
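Queries that span many shards, such as the scale-out queries above, are commonly executed as fan-out (scatter-gather) queries: the same query runs on every shard in parallel and a query layer merges the partial results. A minimal Python sketch, assuming hypothetical in-memory shards of `(user_id, score)` rows and a thread pool in place of real per-shard connections; it pushes a top-k down to each shard so that only k rows per shard travel back.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards, each holding (user_id, score) rows for its own key range
SHARDS = [
    [(1, 40), (2, 95), (3, 10)],
    [(1001, 70), (1002, 88)],
    [(2001, 99), (2002, 5), (2003, 60)],
]


def query_shard(rows, k):
    """Per-shard part of the query, pushed down to the shard: local top-k by score."""
    return heapq.nlargest(k, rows, key=lambda row: row[1])


def fan_out_top_k(shards, k):
    """Scatter the query to all shards concurrently, then gather and merge."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda rows: query_shard(rows, k), shards))
    # Any row in the global top-k must also be in its own shard's local top-k
    merged = (row for partial in partials for row in partial)
    return heapq.nlargest(k, merged, key=lambda row: row[1])


print(fan_out_top_k(SHARDS, 3))  # [(2001, 99), (2, 95), (1002, 88)]
```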
What does SQL Server provide today?
- Scale-out programming models
  - Service Broker provides:
    - Functional, service-oriented architecture
    - Scale-out on demand
    - Async reliable messaging for true eventual consistency
  - SQL Azure Federations provide sharding support
  - Distributed Queries
  - SQL Server Parallel Data Warehouse
- Programmer Agility
  - XML, XQuery for XML documents
  - FileTable for documents (but what is the equivalent solution in the cloud?)
  - Open schema: sparse columns and column sets (but still schema first)
  - CLR extensibility, but:
    - No indexing, poor cost models
    - Difficult to deploy (and DB admins often do not allow it!)
- Failure Resilience
  - SQL Azure has local automatic HA, self-healing
- Rich Services
  - Semantic extraction and similarity search in SQL Server 2012
- DB/OS Admin “interference”
  - SQL Azure: self-maintaining and self-provisioning
Introducing SQL Azure Federations

- Provides data partitioning/sharding at the data platform level
- Enables developers to build elastic scale-out applications
- Provides non-blocking SPLIT/DROP for shards (MERGE to come later)
- Auto-connects to the right shard based on the sharding key value
- Provides a SPLIT-resilient query mode
SQL Azure Federation Concepts
- Federation
  - Represents the data being sharded
- Federation Root
  - Database that logically houses federations; contains federation metadata
- Federation Key
  - Value that determines the routing of a piece of data (defines a Federation Distribution)
- Federation Member (aka Shard)
  - Physical container for a set of federated tables for a specific key range, plus reference tables
- Atomic Unit
  - All rows with the same federation key value: always together!
- Federated Table
  - Table that contains only atomic units for the member’s key range
- Reference Table
  - Non-sharded table

[Diagram: an application connects through a connection gateway to an Azure DB with the federation root (federation directories, federation users, federation distributions, …). Federation “Orders_Fed” (federation key: CustomerID) is sharded into three members: PK [min, 100) holding atomic units PK=5, 25, 35; PK [100, 488) holding PK=105, 235, 365; and PK [488, max) holding PK=555, 2545, 3565.]
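To make routing, atomic units and SPLIT concrete, here is a minimal in-memory Python sketch that reuses the `Orders_Fed` federation, its `CustomerID` key and the member ranges from the diagram. The `Federation` class and its methods are hypothetical; SQL Azure performs SPLIT online as a non-blocking operation that creates new member databases, which this single-process sketch does not model.

```python
import bisect


class Federation:
    """Hypothetical range-partitioned federation: member i covers [boundaries[i-1], boundaries[i])."""

    def __init__(self, name: str, key_name: str):
        self.name = name
        self.key_name = key_name
        self.boundaries = []  # sorted split points; len(boundaries) + 1 members
        self.members = [[]]   # initially one member covering [min, max)

    def member_index(self, key) -> int:
        """Route a federation key value to the member whose range contains it."""
        return bisect.bisect_right(self.boundaries, key)

    def insert(self, row: dict) -> None:
        self.members[self.member_index(row[self.key_name])].append(row)

    def split(self, at) -> None:
        """Split the member containing `at` into [low, at) and [at, high)."""
        i = self.member_index(at)
        if i > 0 and self.boundaries[i - 1] == at:
            raise ValueError(f"{at} is already a member boundary")
        old = self.members[i]
        # Rows move by key value, so all rows of an atomic unit stay together
        low = [r for r in old if r[self.key_name] < at]
        high = [r for r in old if r[self.key_name] >= at]
        self.boundaries.insert(i, at)
        self.members[i:i + 1] = [low, high]


fed = Federation("Orders_Fed", key_name="CustomerID")
for customer_id in [5, 25, 35, 105, 235, 235, 365, 555, 2545, 3565]:
    fed.insert({"CustomerID": customer_id})  # the two rows for 235 form one atomic unit

fed.split(100)  # members: [min, 100), [100, max)
fed.split(488)  # members: [min, 100), [100, 488), [488, max)
print([sorted({r["CustomerID"] for r in m}) for m in fed.members])
```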
Demo
Map-Reduce scale-out
over SQL Azure Federations
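According to the speaker notes, the demo shards a GamesInfo table with SQL Azure Federations and runs a C# Map/Reduce library whose mapper and reducer are written in SQL. The Python sketch below imitates that pattern under stated assumptions: in-memory `sqlite3` databases stand in for federation members, and the GamesInfo columns, the queries and the `map_reduce` driver are hypothetical, not the demo's code.

```python
import sqlite3


def make_member(rows):
    """One in-memory SQLite database stands in for a federation member."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE GamesInfo (PlayerId INTEGER, Game TEXT, Score INTEGER)")
    db.executemany("INSERT INTO GamesInfo VALUES (?, ?, ?)", rows)
    return db


members = [
    make_member([(5, "chess", 10), (25, "chess", 30), (35, "go", 7)]),
    make_member([(105, "go", 12), (235, "chess", 5)]),
    make_member([(555, "go", 20), (2545, "poker", 50)]),
]

# Map: the same SQL runs on every member and emits partial aggregates per game
MAPPER = "SELECT Game, COUNT(*), SUM(Score) FROM GamesInfo GROUP BY Game"
# Reduce: SQL over the union of all mapper outputs combines the partials
REDUCER = """SELECT Game, SUM(Plays), SUM(TotalScore)
             FROM MapOutput GROUP BY Game ORDER BY Game"""


def map_reduce(members, mapper_sql, reducer_sql):
    # Sequential for simplicity; a real fan-out would query the members in parallel
    partials = [row for db in members for row in db.execute(mapper_sql)]
    staging = sqlite3.connect(":memory:")
    staging.execute("CREATE TABLE MapOutput (Game TEXT, Plays INTEGER, TotalScore INTEGER)")
    staging.executemany("INSERT INTO MapOutput VALUES (?, ?, ?)", partials)
    return staging.execute(reducer_sql).fetchall()


print(map_reduce(members, MAPPER, REDUCER))
# [('chess', 3, 45), ('go', 3, 39), ('poker', 1, 50)]
```

The partial aggregates have to be decomposable: counts and sums can be summed again in the reducer, whereas an average would need the mapper to emit a sum and a count separately.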
SQL Azure: A Not Only SQL Data Platform
SQL Azure adds support for NoSQL paradigms in the data platform:
- No CapEx, low OpEx (which should and will get even lower)
- High availability (each DB has two replicas)
- Sharding support with federations:
  - Data platform provides online SPLIT/DROP
  - Filtered connections provide a split-resilient programming model
- Flexible Data Models:
  - XML support
  - Sparse columns/column sets
- More to come in the future…
  - More scale and tunable HA (to support OLTP/OLAP models)
  - Taking Federations further (orthogonality, merge, fan-out)
  - Integration with the Hadoop ecosystem
  - More data-first features (data-driven column sets, JSON)
Call to Action

- Download the presentation from:
  http://www.slideshare.net/MichaelRys/presentations
- Fill out the SQL Azure Federation survey:
  http://connect.microsoft.com/BusinessPlatform/Survey/Survey.aspx?SurveyID=13625
Related Content
- Related whitepapers and presentations:
  - CACM: Scalable SQL: http://cacm.acm.org/magazines/2011/6/108663-scalable-sql
  - NoSQL and the Windows Azure Platform:
    http://download.microsoft.com/download/9/E/9/9E9F240D-0EB6-472E-B4DE-6D9FCBB505DD/Windows%20Azure%20No%20SQL%20White%20Paper.pdf
  - SQL Federation blog: http://blogs.msdn.com/b/cbiyikoglu/archive/2011/03/03/nosql-genes-in-sql-azure-federations.aspx
  - Windows Gaming Experience case study:
    http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=4000008310
  - NoSQL presentations: http://www.slideshare.net/MichaelRys/presentations

- Contact me:
  - mrys@microsoft.com
  - @SQLServerMike
  - http://sqlblog.com/blogs/michael_rys/default.aspx


Editor's Notes

  • #5 Example MySpace architecture:
    - Service Dispatcher: coordination point between all SQL Servers
      - Centralizes route management
      - Avoids route explosion
      - Load-balanced across 30 SQL Servers; messages are sent randomly to these
      - Enables multicast/broadcast functionality
      - Supports destination lists and wildcards, e.g. [DB1, DB3, DB4], DB%
      - 18,000 ~2k msgs/sec per dispatcher SQL Server
    - MyDB sends a message with my status change and a target list specifying the DBs that store my friends' data.
    - The Service Dispatcher forwards the message to these DBs.
    - Each DB processes the message, updating my status in a partitioned table.
  • #6 Example MSN Casual Gaming:
    - ~2 million users at launch
    - ~86 million service requests/day
    - 135 Windows Azure Data Services Hosting VMs
    - ca. 18K connections in connection pools; this could grow with traffic
    - ca. 1,200 SQL Azure requests/second spread across all partitions during peak load
    - ~90% reads vs. 10% writes (this varies per storage type)
    - ~200 bytes of storage per user
    - ~20% of database storage is currently used, but this is expected to grow
    - Sharded over 400 SQL Azure databases
  • #9
    - Require high availability
    - Be able to scale out:
      - Functional and Data Partitioning Architecture
    - Provide scale-out processing:
      - Function shipping
      - Fan-out and Map/Reduce processing
    - Be able to deal with failures:
      - Quorum
      - Retries
      - Eventual Consistency (similar to Read-consistent Snapshot Isolation)
    - Be able to quickly grow and change:
      - Elastic scale
      - Flexible, open schema
      - Multi-version schema support
    - Move better support for these patterns into the Data Platform!
  • #10
    - Note: Large companies invest resources in building these platforms instead of using existing relational platforms!
    - Low Cost
      - Free open source stores, community support
      - Scale cost below customer growth rate
      - Web-friendly developer model and tool chain, easy to use
    - Processing Paradigms
      - High Availability (scalable replication, fast failover, DR/GeoDR, tunable latency)
      - Scale-out (sharding, Map-Reduce, elasticity)
      - Performance (tuned for workloads, caching, co-located compute with partitioned state)
      - Tunable/Eventual Consistency
    - Data Model Paradigms
      - Data first: Flexible Schema
      - Low impedance mismatch between programming and data model:
        - Key-Documents and Objects (BLOBs, JSON, XML, POJO)
        - Key-Wide Sparse Column Sets
        - Graphs (e.g., RDF)
  • #14
    - Performance and Scale:
      - Map/Reduce patterns
      - Eventual consistency (trade-off due to CAP)
      - Sharding
      - Caching
    - Automate management lifecycle:
      - Elastic scale on demand (no need to pay for resources until needed)
      - Automatic failover
      - Scalable schema version rollout
      - Perf troubleshooting
      - Auto alerting
      - Auto load balancing
      - Auto resourcing (e.g., auto splits based on policies)
      - Declarative policy-based management
  • #15
    - Code First and revise quickly:
      - Working software over comprehensive documentation
      - Responding to change over following a plan
    - Application-model first (before database):
      - Dictates the data model and queries
    - Flexible data models:
      - No a priori modeling: data first, schema later/open schema
      - Key/value stores
      - Reduced impedance mismatch: JSON, XML, YAML
    - You don’t know exactly what you are looking for:
      - Map/Reduce for ad hoc analysis
      - Provide search across all your data instead of just query
    - Lower pain of adoption and maintenance:
      - From code to deployment & “monetization” of data, services, apps and tenants
      - Rich services out of the box
      - Data and services mashup
      - Easy troubleshooting of deployed apps
    - No DB or OS Admin telling me what to do
  • #22
    - Sharded GamesInfo table using SQL Azure Federations
    - Uses a C# library that implements a Map/Reduce processor on top of SQL Azure Federations
    - Mapper and Reducer are specified using SQL