SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)


Presentation for http://strataconf.com/strata2012/public/schedule/detail/22693

Many of the new online and device-oriented application models require a high degree of operational and development agility such as unlimited elastic scale and flexible data models. The nascent NoSQL market is aiming to address these requirements but is extremely fragmented, with many competing vendors and technologies. Programming, deploying, and managing NoSQL solutions requires specialized and low-level knowledge that does not easily carry over from one vendor’s product to another. The SQL market on the other hand has a high level of maturity and at least conceptual standardization, but relational database systems were not originally designed for these requirements.

However, contrary to common belief, the question of big versus small data is orthogonal to the question of SQL versus NoSQL. While the NoSQL model naturally supports extreme sharding, the fact that it does not require strong typing and normalization makes it attractive for “small” data as well. Conversely, relational SQL databases can also be scaled out.

In this presentation, I will provide a short introduction to some architectural patterns that SQL-based solutions have been using to achieve scale and operational agility, contrast them with the NoSQL paradigms and show how SQL can be augmented with NoSQL paradigms at the platform level by using SQL Azure Federations as an example. I will also show how NoSQL offerings can benefit from the lessons learned with SQL.

What this all means is that NoSQL, BigData and SQL are not in conflict, like good and evil. Instead, they are sometimes overlapping but often complementary solutions that address different requirements, benefit from common paradigms, and can and will coexist.

Published in: Technology
  • Example MySpace architecture: The Service Dispatcher is the coordination point between all SQL Servers. It centralizes route management and avoids route explosion. It is load-balanced across 30 SQL Servers, and messages are sent randomly to these. It enables multicast/broadcast functionality and supports destination lists and wildcards, e.g. [DB1, DB3, DB4], DB%. 18,000 ~2k msgs/sec per dispatcher SQL Server. MyDB sends a message with my status change and a target list specifying the DBs that store my friends' data. The Service Dispatcher forwards the message to these DBs. Each DB processes the message, updating my status in a partitioned table.
  • Example MSN Casual Gaming: ~2 million users at launch; ~86 million service requests/day; 135 Windows Azure Data Services hosting VMs; ca. 18K connections in connection pools (this could grow with traffic); ca. 1200 SQL Azure requests/second spread across all partitions during peak load; ~90% reads vs. 10% writes (this varies per storage type); ~200 bytes of storage per user; ~20% of database storage is currently used, but expect this to grow; sharded over 400 SQL Azure databases.
  • Note: Large companies invest resources in building these platforms instead of using existing relational platforms!
  • No DB or OS Admin telling me what to do!
  • Performance and Scale: Map/Reduce patterns; eventual consistency (trade-off due to CAP); sharding; caching. Automate management Lifecycle: elastic scale on demand (no need to pay for resources until needed); automatic fail-over; scalable schema version rollout; perf troubleshooting; auto alerting; auto load balancing; auto resourcing (e.g., auto splits based on policies); declarative policy-based management.
  • Code first and revise quickly: working software over comprehensive documentation; responding to change over following a plan. Application-model first (before database): dictates the data model and queries. Flexible data models: no a priori modeling (data first, schema later/open schema); key/value stores; reduced impedance mismatch (JSON, XML, YAML). You don’t know exactly what you are looking for: Map/Reduce for ad hoc analysis; provide search across all your data instead of just query. Lower pain of adoption and maintenance: from code to deployment & “monetization” of data, services, apps and tenants; rich services out of the box; data and services mashup; easy troubleshooting of deployed apps; no DB or OS admin telling me what to do.
  • Low CapEx, Low OpEx: SQL Azure and other Platform as a Service offerings. Built-in High-Availability (tunable): SQL Azure has quorum-based built-in replicas. Data scale-out (Sharding): SQL Azure Federations. Processing scale-out (Map-Reduce, Fan-Out, tunable consistency). Flexible Data Models: JSON (& XML) support; sparse columns/column sets. Integrate with BigData Analytics (e.g., Hadoop).
  • SharePoint: BI, Enterprise Search, Enterprise Content Management, Collaboration. Transform: ETL. Clean: Data Quality, Augmentation. Discover: Search, Meta-data, Classification, Information Catalog. Infer: Recommendation Engines, Machine Learning. Share: Publish, Collaborate. Govern: Lineage & Impact Analysis, Master Data Management. Marketplace: Private, Public, Bing Data, 3rd Party Data Sources, Models, Algorithms, APIs.

    1. SQL AND NOSQL ARE TWO SIDES OF THE SAME COIN
        Michael Rys, Microsoft Corp., @SQLServerMike
        © 2012 Microsoft
        Strata 2012 Conference, March 2012
    2. AGENDA
        • Scaling out your business is important!
        • NoSQL Paradigms and NoSQL Platforms
        • SQL learns from NoSQL (with a demo of SQL Azure Federations)
        • NoSQL learns from SQL
        • Scalable Data Processing Platform of the Future
    3. THE WEB 2.0 BUSINESS ARCHITECTURE
        [Diagram centered on the Online Business Application]
        • Attract Individual Consumers:
            - Provide interesting service
            - Provide mobility
            - Provide social
        • Monetize the Social:
            - Improve individual experience
            - Re-sell Aggregate Data (e.g., Advertisers)
        • Monetize Individual:
            - Upsell service
            - VIP
            - Speed
            - Extra Capabilities
    4. SOCIAL NETWORKING: THE BUSINESS PROBLEM
        • 100s of millions of users
            • 10s of millions of users concurrently
        • Terabytes to petabytes of data
            • Structured and unstructured
        • Required (eventual) data consistency across users
            • E.g. show your updated state in your friends’ profile pages
    5. SOLUTION
        • Shard/partition user data across hundreds to thousands of SQL databases
        • Propagate data changes from one DB to other DBs using a reliable, async Message Service
            • Managing routes from each DB to every other DB would be too complex
            • Global transactions would hinder scale and availability
        • Provide a caching layer for performance
        • And also used for
            o Clean-up state (e.g. on account close)
            o Deploy business logic (stored procedures)
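    6. EXAMPLE ARCHITECTURE
        [Diagram: a Web Tier and a Data Tier. The data tier consists of six shards holding the userId ranges 1-1000, 1001-2000, 2001-3000, 3001-4000, 4001-5000 and 5001-6000. The user with userId=1024 changes their status; their own DB gets updated, and async messages travel through the Service Dispatcher to the other DBs. The individual steps are labeled as transactions TX1 to TX5.]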
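The routing step implied by this architecture can be sketched as follows. This is a minimal illustration, not the MySpace implementation: it assumes the six userId ranges shown on the slide, and the shard names DB1 to DB6 as well as the function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical shard map built from the ranges on the slide: each shard owns
# a contiguous range of user ids and is identified here by its lower bound.
SHARD_LOWER_BOUNDS = [1, 1001, 2001, 3001, 4001, 5001]
SHARD_NAMES = ["DB1", "DB2", "DB3", "DB4", "DB5", "DB6"]
MAX_USER_ID = 6000


def shard_for(user_id: int) -> str:
    """Return the name of the shard whose range contains user_id."""
    if not 1 <= user_id <= MAX_USER_ID:
        raise ValueError(f"user id {user_id} is outside every shard range")
    return SHARD_NAMES[bisect_right(SHARD_LOWER_BOUNDS, user_id) - 1]


def status_update_targets(user_id: int, friend_ids: list[int]) -> list[str]:
    """Shards that must receive the async status message.

    The sender's own shard is excluded because it is updated directly,
    before any message is sent.
    """
    home = shard_for(user_id)
    targets = {shard_for(friend) for friend in friend_ids}
    targets.discard(home)
    return sorted(targets)
```

With this map, `shard_for(1024)` resolves to the shard holding 1001-2000, and the list returned by `status_update_targets` plays the role of the destination list that is handed to the dispatcher.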
    7. MANY LARGE SCALE CUSTOMERS USING SIMILAR PATTERNS
        • Patterns
            • Sharding and reliable messaging
            • Sharding and fan-out query layer
            • Caching layer
        • Customer Examples
            • Social Networking: Facebook, MySpace, etc.
            • Online electronic stores (cannot give names)
            • Travel reservation systems (e.g. Choice International)
            • MSN Casual Gaming
            • etc.
    8. LESSONS LEARNED FROM THESE SCENARIOS
        • Require high availability
        • Be able to scale out:
            • Functional and Data Partitioning Architecture
            • Provide scale-out processing:
                o Function shipping
                o Fanout and Map/Reduce processing
            • Be able to deal with failures:
                o Quorum
                o Retries
                o Eventual Consistency (similar to Read-consistent Snapshot Isolation)
        • Be able to quickly grow and change:
            • Elastic scale
            • Flexible, open schema
            • Multi-version schema support
        Move better support for these patterns into the Data Platform!
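Two of the failure-handling patterns listed here, quorum and retries, can be illustrated with a small sketch. It assumes that a replica is modeled as a callable returning True on acknowledgement and raising ConnectionError when unreachable; these conventions and both function names are illustrative choices, not taken from the talk.

```python
import time
from typing import Callable, Sequence


def quorum_write(replicas: Sequence[Callable[[str], bool]], value: str) -> bool:
    """Send value to every replica; succeed if a strict majority acknowledge."""
    needed = len(replicas) // 2 + 1
    acks = 0
    for write in replicas:
        try:
            if write(value):
                acks += 1
        except ConnectionError:
            pass  # an unreachable replica does not count towards the quorum
    return acks >= needed


def with_retries(op: Callable[[], bool], attempts: int = 3,
                 base_delay: float = 0.01) -> bool:
    """Retry op with exponential backoff until it returns True or attempts run out."""
    for attempt in range(attempts):
        if op():
            return True
        if attempt < attempts - 1:
            time.sleep(base_delay * 2 ** attempt)
    return False
```

The two compose naturally: `with_retries(lambda: quorum_write(replicas, value))` keeps trying until a majority of replicas has accepted the write or the attempts are exhausted.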
    9. WHAT IS NOSQL ABOUT?
        • NoSQL = operational and developer agility at low CapEx and OpEx!
        • Low Cost
            • Free Open Source Stores, Community Support
            • Scale CapEx cost below customer growth rate
            • Web friendly developer model and tool chain, ease of use
        • Processing Paradigms
            • High Availability (scalable Replication, Fast Failover, DR/GeoDR, tunable latency)
            • Scale-out (Sharding, Map-Reduce, Elasticity)
            • Performance (tuned for specific workloads, Caching, co-located compute with partitioned state)
            • Tunable/Eventual Consistency
        • Data Model Paradigms
            • Data first: Flexible Schema
            • Low-impedance mismatch between programming and data model:
                o Key-Documents and Objects (BLOBS, JSON, XML, POJO)
                o Key-Wide Sparse Column Sets
                o Graphs (e.g., RDF)
        • Range from devices, over OLTP Web 2.0 applications to BigData Analytics
    10. DATA MODELS
        Data Model: Example Stores (apologies to the ones I did not list)
        • Simple Key-Value Pairs: Memcache, Redis, Dynamo, Voldemort, LevelDB, Azure Caching
        • Wide Sparse Column Sets: HyperTable, Big Table, Cassandra, HBASE, Hyperbase, Amazon DynamoDB, Windows Azure Tables, SQL Server/Azure Sparse columns
        • BLOBs: Amazon S3, Oracle Berkeley NoSQL, Windows Azure Blob Store, SQL Server RBS/FileTable
        • JSON Documents: MongoDB, CouchBase, Riak, RavenDB
        • Graph: Neo4J, GraphDB, HypergraphDB, Stig, Intellidimension
        • Objects and XML Documents: Versant, Oracle Berkeley NoSQL, MarkLogic, existDB, EMC HiveDB, SQL Server/Azure, Oracle, IBM DB2
        • Extended Relational: Oracle, EMC SQLFire, IBM DB2, MySQL, Postgres, SQL Server/Azure
    11. WHAT CAN SQL LEARN FROM NOSQL?
        • Low CapEx, Low OpEx
        • Built-in tunable High-Availability
        • Data scale-out (Sharding)
        • Processing scale-out (Map-Reduce, Fan-Out, tunable consistency)
        • Flexible Data Models
            • JSON (& XML) support
            • Sparse columns/Column sets
        • Integrate with BigData Analytics (e.g., Hadoop)
        Many Relational Database Systems are incorporating these learnings!
    12. EXAMPLE: SQL AZURE FEDERATIONS
        • Provides Data Partitioning/Sharding at the Data Platform
        • Enables applications to build elastic scale-out applications
        • Provides non-blocking SPLIT/DROP for shards (MERGE to come later)
        • Auto-connect to the right shard based on the sharding key value
        • Provides SPLIT resilient query mode
    13. SQL AZURE FEDERATION CONCEPTS
        • Federation: Represents the data being sharded
        • Federation Root: Database that logically houses federations, contains federation meta data
        • Federation Key: Value that determines the routing of a piece of data (defines a Federation Distribution)
        • Atomic Unit (AU): All rows with the same federation key value: always together!
        • Federation Member (aka Shard): A physical container for a set of federated tables for a specific key range and reference tables
        • Federated Table: Table that contains only atomic units for the member’s key range
        • Reference Table: Non-sharded table
        [Diagram: a sharded application connects through the Connection Gateway to an Azure DB with a Federation Root (Federation Directories, Federation Users, Federation Distributions, …). The federation “Orders_Fed” (Federation Key: CustomerID) has three members: PK [min, 100) with AUs PK=5, PK=25, PK=35; PK [100, 488) with AUs PK=105, PK=235, PK=365; PK [488, max) with AUs PK=555, PK=2545, PK=3565.]
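The relationship between federation key, atomic unit and member ranges can be modeled in a few lines. The class below is a toy in-memory model written for illustration; it is not the SQL Azure Federations API, and its class, attribute and method names are hypothetical. Members cover half-open key ranges as on the slide, and a split keeps all rows of one atomic unit in the same member.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Federation:
    """Toy model of a range-partitioned federation (not the SQL Azure API)."""
    # Sorted lower bounds of the members; the first member starts at "min".
    boundaries: list = field(default_factory=lambda: [float("-inf")])
    # One dict per member, mapping federation key -> rows of that atomic unit.
    members: list = field(default_factory=lambda: [{}])

    def member_index(self, key: int) -> int:
        """Index of the member whose half-open range [low, high) contains key."""
        return bisect_right(self.boundaries, key) - 1

    def insert(self, key: int, row: dict) -> None:
        """Add a row to the atomic unit of its federation key."""
        self.members[self.member_index(key)].setdefault(key, []).append(row)

    def split(self, at: int) -> None:
        """Split the member containing `at` into [low, at) and [at, high)."""
        i = self.member_index(at)
        if self.boundaries[i] == at:
            raise ValueError(f"{at} is already a member boundary")
        old = self.members[i]
        low = {k: v for k, v in old.items() if k < at}
        high = {k: v for k, v in old.items() if k >= at}
        self.members[i:i + 1] = [low, high]
        self.boundaries.insert(i + 1, at)
```

Inserting the nine keys from the diagram and then calling `split(100)` and `split(488)` reproduces the three members shown on the slide. A real split is non-blocking and moves data between databases; the sketch only shows how the key ranges and atomic units are redistributed.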
    14. DEMO: MAP-REDUCE SCALE-OUT OVER SQL AZURE FEDERATIONS
        • Sharded GamesInfo table using SQL Azure Federations
        • Use a C# library that implements a Map/Reduce processor on top of SQL Azure Federations
        • Mapper and Reducer are specified using SQL
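The idea of specifying both mapper and reducer in SQL can be sketched without the C# library used in the demo. The example below is a stand-in under stated assumptions: in-memory SQLite databases play the role of federation members, the GamesInfo columns (user_id, game, score) are invented for illustration, and the function names are hypothetical.

```python
import sqlite3

# Hypothetical mapper and reducer, both plain SQL. The mapper runs inside each
# shard; the reducer runs once over the combined mapper output (table "mapped").
MAPPER_SQL = (
    "SELECT game, COUNT(*) AS plays, SUM(score) AS total "
    "FROM GamesInfo GROUP BY game"
)
REDUCER_SQL = (
    "SELECT game, SUM(plays), SUM(total) FROM mapped GROUP BY game ORDER BY game"
)


def make_shard(rows):
    """Create an in-memory SQLite database standing in for one federation member."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE GamesInfo (user_id INTEGER, game TEXT, score INTEGER)")
    db.executemany("INSERT INTO GamesInfo VALUES (?, ?, ?)", rows)
    return db


def map_reduce(shards):
    """Fan the mapper out to every shard, then reduce the partial results."""
    combiner = sqlite3.connect(":memory:")
    combiner.execute("CREATE TABLE mapped (game TEXT, plays INTEGER, total INTEGER)")
    for shard in shards:
        partial = shard.execute(MAPPER_SQL).fetchall()
        combiner.executemany("INSERT INTO mapped VALUES (?, ?, ?)", partial)
    return combiner.execute(REDUCER_SQL).fetchall()
```

Because each shard returns pre-aggregated rows, only one row per game and shard crosses the network in this scheme; the reducer then merges those partial counts and sums.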
    15. WHAT CAN NOSQL LEARN FROM SQL?
        • Flexible data is good, but:
            • Provide optional schema in data platform to help with constraints and optimizations
        • Procedural Scale-Out processing is good, but:
            • Develop a declarative language suited for and across the data models (e.g., coSQL)
            • Standardize suitable abstractions and languages
        • Eventual Consistency is good, but:
            • Provide users the choice
        • Simple Queries are good, but:
            • Provide me with secondary indexes
            • It will be more efficient to join between two collections of JSON documents in the query engine than in the application layer
        Many NoSQL Database Systems are starting to incorporate these learnings!
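The last two points, secondary indexes and joins between document collections, can be made concrete with a small sketch. It assumes documents are already parsed into Python dicts; the function names and the sample field names are hypothetical. The join builds a secondary index on one collection and probes it once per document of the other, which is the kind of work a query engine can do close to the data instead of leaving nested loops to the application.

```python
from collections import defaultdict


def build_index(docs, field):
    """Secondary index: field value -> list of documents with that value."""
    index = defaultdict(list)
    for doc in docs:
        if field in doc:
            index[doc[field]].append(doc)
    return index


def hash_join(left, right, left_field, right_field):
    """Join two document collections on left[left_field] == right[right_field]."""
    index = build_index(right, right_field)
    return [
        (l, r)
        for l in left
        if left_field in l
        for r in index.get(l[left_field], [])
    ]


customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [
    {"order": 10, "customer_id": 1},
    {"order": 11, "customer_id": 1},
    {"order": 12, "customer_id": 3},
]
pairs = hash_join(customers, orders, "id", "customer_id")
```

Documents that lack the join field are skipped on both sides, which mirrors the open-schema setting where not every document carries every attribute.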
    16. THE WEB 2.0 BUSINESS ARCHITECTURE
        [Diagram centered on the Online Business Application]
        • Attract Individual Consumers:
            - Provide interesting service
            - Provide mobility
            - Provide social
        • Monetize the Social:
            - Improve individual experience
            - Re-sell Aggregate Data (e.g., Advertisers)
        • Monetize Individual:
            - Upsell service
            - VIP
            - Speed
            - Extra Capabilities
    17. SCALE-OUT DATA PLATFORM ARCHITECTURE
        [Diagram: a SQL or NoSQL store made up of shards, each with a primary copy and readable replicas, serving three kinds of workloads.]
        • OLTP Workloads: highly available, high scale, high flexibility, mostly touching 1 to a low number of shards
        • Traditional OLAP Workloads: known schema, data warehouse, “star joins”
        • Dynamic OLAP Workloads: 3Vs (Volume, Velocity, Variety); exploratory query; scale-out queries, often using eventually consistent scale-out frameworks like Hadoop
    19. CALL TO ACTION
        • Familiarize yourself with the NoSQL genes in the Microsoft Online Platform
            • Free 3-Month Trial for Windows and SQL Azure: http://www.windowsazure.com
        • Engage with us throughout Strata (Presentation; Speaker; Date and Time):
            o Do We Have the Tools We Need to Navigate the New World of Data?; Dave Campbell; 2/29 9:00am PST
            o Onsite Interview *; Tim O’Reilly, Dave Campbell; 2/29 10:15am PST
            o Unleash Insights on All Data With Microsoft Big Data; Alexander Stojanovic; 2/29 11:30am PST
            o Office Hours (Q&A session); Dave Campbell; 2/29 1:30pm PST
            o Hadoop + Javascript: What We Learned; Asad Khan; 2/29 2:20pm PST
            o Democratizing BI at Microsoft: 40,000 Users and Counting; Kirkland Barrett; 3/1 10:40am PST
            o Data Marketplaces For Your Extended Enterprise; Piyush Lumba; 3/1 2:20pm PST
        • Download slides with additional information and related resources: http://www.slideshare.net/MichaelRys/presentations
    20. APPENDIX
    21. RELATED RESOURCES
        • Scale-Out with SQL Databases
            • http://gigaom.com/cloud/facebook-shares-some-secrets-on-making-mysql-scale/
            • Windows Gaming Experience Case Study: http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=4000008310
            • Scalable SQL: http://cacm.acm.org/magazines/2011/6/108663-scalable-sql
            • http://www.slideshare.net/MichaelRys/scaling-with-sql-server-and-sql-azure-federations
        • NoSQL and the Windows Azure Platform
            • Whitepaper: http://download.microsoft.com/download/9/E/9/9E9F240D-0EB6-472E-B4DE-6D9FCBB505DD/Windows%20Azure%20No%20SQL%20White%20Paper.pdf
            • SQL Federation blog: http://blogs.msdn.com/b/cbiyikoglu/archive/2011/03/03/nosql-genes-in-sql-azure-federations.aspx
        • Contact me
            • @SQLServerMike
            • http://sqlblog.com/blogs/michael_rys/default.aspx