SQL and NoSQL in SQL Server

SQL Saturday 109 Presentation on NoSQL Paradigms in SQL Server context

  • Example MySpace architecture:
      - Service Dispatcher: coordination point between all SQL Servers
      - Centralizes route management
      - Avoids route explosion
      - Load-balanced across 30 SQL Servers; messages are sent randomly to these
      - Enables multicast/broadcast functionality
      - Supports destination lists and wildcards, e.g. [DB1, DB3, DB4], DB%
      - 18,000 ~2k msgs/sec per dispatcher SQL Server
    MyDB sends a message with my status change and a target list specifying the DBs that store my friends’ data. The Service Dispatcher forwards the message to these DBs. Each DB processes the message, updating my status in a partitioned table.
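The destination lists and wildcards mentioned in this note can be illustrated with a small sketch. It is not the MySpace dispatcher: the function names `resolve_targets` and `dispatch` are hypothetical, and it assumes `%` behaves like a SQL LIKE wildcard that matches any run of characters.

```python
import re


def resolve_targets(spec, known_dbs):
    """Expand destination names and '%' patterns into concrete DB names.

    Assumption: '%' matches any run of characters, as in SQL LIKE.
    """
    resolved = []
    for entry in spec:
        # Escape each literal piece, then join the pieces with '.*' for '%'.
        pattern = ".*".join(re.escape(part) for part in entry.split("%"))
        for db in known_dbs:
            if re.fullmatch(pattern, db) and db not in resolved:
                resolved.append(db)
    return resolved


def dispatch(message, spec, known_dbs):
    """Return the per-database deliveries the dispatcher would forward."""
    return {db: message for db in resolve_targets(spec, known_dbs)}


KNOWN_DBS = ["DB1", "DB2", "DB3", "DB4"]  # hypothetical routing table
multicast = dispatch("status changed", ["DB1", "DB3", "DB4"], KNOWN_DBS)
broadcast = dispatch("status changed", ["DB%"], KNOWN_DBS)
```

Here `multicast` addresses only the three listed databases, while `broadcast` reaches every database whose name starts with `DB`. Keeping the routing table in one place is what lets the sender stay unaware of individual routes.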
  • Example MSN Casual Gaming:
      - ~2 million users at launch
      - ~86 million service requests/day
      - 135 Windows Azure Data Services hosting VMs
      - ca. 18K connections in connection pools; this could grow with traffic
      - ca. 1,200 SQL Azure requests/second spread across all partitions during peak load
      - ~90% reads vs. 10% writes (this varies per storage type)
      - ~200 bytes of storage per user
      - ~20% of database storage is currently used, but expect this to grow
      - Sharded over 400 SQL Azure databases
  • Require high availability
    Be able to scale out:
      - Functional and Data Partitioning Architecture
    Provide scale-out processing:
      - Function shipping
      - Fan-out and Map/Reduce processing
    Be able to deal with failures:
      - Quorum
      - Retries
      - Eventual consistency (similar to read-consistent Snapshot Isolation)
    Be able to quickly grow and change:
      - Elastic scale
      - Flexible, open schema
      - Multi-version schema support
    Move better support for these patterns into the Data Platform!
  • Note: Large companies invest resources in building these platforms instead of using existing relational platforms!
    Low Cost:
      - Free open-source stores, community support
      - Scale cost below customer growth rate
      - Web-friendly developer model and tool chain, easy to use
    Processing Paradigms:
      - High availability (scalable replication, fast failover, DR/GeoDR, tunable latency)
      - Scale-out (sharding, Map-Reduce, elasticity)
      - Performance (tuned for workloads, caching, co-located compute with partitioned state)
      - Tunable/eventual consistency
    Data Model Paradigms:
      - Data first: flexible schema
      - Low impedance mismatch between programming and data model:
          - Key-Documents and Objects (BLOBs, JSON, XML, POJO)
          - Key-Wide Sparse Column Sets
          - Graphs (e.g., RDF)
  • Performance and Scale:
      - Map/Reduce patterns
      - Eventual consistency (trade-off due to CAP)
      - Sharding
      - Caching
    Automate management lifecycle:
      - Elastic scale on demand (no need to pay for resources until needed)
      - Automatic fail-over
      - Scalable schema version rollout
      - Perf troubleshooting
      - Auto alerting
      - Auto load balancing
      - Auto resourcing (e.g., auto splits based on policies)
      - Declarative policy-based management
  • Code first and revise quickly:
      - Working software over comprehensive documentation
      - Responding to change over following a plan
    Application-model first (before database):
      - Dictates the data model and queries
    Flexible data models:
      - No a priori modeling: data first, schema later/open schema
      - Key/value stores
      - Reduced impedance mismatch: JSON, XML, YAML
    You don’t know exactly what you are looking for:
      - Map/Reduce for ad hoc analysis
      - Provide search across all your data instead of just query
    Lower pain of adoption and maintenance:
      - From code to deployment & “monetization” of data, services, apps and tenants
      - Rich services out of the box
      - Data and services mashup
      - Easy troubleshooting of deployed apps
    No DB or OS Admin telling me what to do
  • Sharded GamesInfo table using SQL Azure Federations
    Use a C# library that implements a Map/Reduce processor on top of SQL Azure Federations
    Mapper and Reducer are specified using SQL

Transcript

  • 1. SQL and NoSQL in the Context of SQL Server
    Michael Rys, Program Manager, Microsoft Corp.
    @SQLServerMike
  • 2. Key Session Takeaways
      - Scaling your business is important
      - What are the NoSQL paradigms
      - You can use NoSQL paradigms with SQL Server and SQL Azure
      - We are working on moving the paradigms into SQL Server
  • 3. The Web 2.0 Business Architecture
    [Diagram: an Online Business Application surrounded by three activities]
      - Attract Individual Consumers:
          - Provide interesting service
          - Provide mobility
          - Provide social
      - Monetize Individual:
          - Upsell service: VIP, Speed, Extra Capabilities
      - Monetize the Social:
          - Improve individual experience
          - Re-sell aggregate data (e.g., Advertisers)
  • 4. Social Networking: the Business Problem
      - 100s of millions of users
          - 10s of millions of users concurrently
      - Terabytes to petabytes of data
          - Structured and unstructured
      - Required (eventual) data consistency across users
          - E.g. show your updated state in your friends’ profile pages
  • 5. Solution Shard/Partition user data across hundreds to thousands of SQL Databases Propagate data changes using reliable, async Message Service  No Global Transactions! Hinder scale and availability! Provide a caching layer for performance Also used for  Clean-up state (e.g. on account close)  Deploy business logic (stored procedures)
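  • 6. Example Architecture (MySpace.com)
    [Diagram: a Web Tier and a Data Tier. The Data Tier is sharded by user ID range: 1-1000, 1001-2000, 2001-3000, 3001-4000, 4001-5000, 5001-6000. The user with userId=1024 changes their status ("I change my status"), "My DB gets updated", and async messages flow through the Service Dispatcher to the other shards as separate transactions TX1 to TX5.]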
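The pattern on slides 5 and 6 (range sharding plus asynchronous propagation instead of a global transaction) can be sketched in memory as follows. This is an illustration only: the `Cluster` class, its field names, and the use of a `deque` as a stand-in for the reliable message service are assumptions, and only the 1,000-IDs-per-shard ranges come from the slide.

```python
from collections import deque

SHARD_SIZE = 1000  # matches the 1-1000, 1001-2000, ... ranges on the slide


def shard_for(user_id):
    """Return the zero-based shard index for a user ID (IDs start at 1)."""
    return (user_id - 1) // SHARD_SIZE


class Cluster:
    """Hypothetical in-memory model of sharded databases plus a message queue."""

    def __init__(self, num_shards):
        # Each shard stores its own users' statuses and copies of friends' statuses.
        self.shards = [{"status": {}, "friend_status": {}} for _ in range(num_shards)]
        self.queue = deque()  # stands in for the reliable async message service

    def update_status(self, user_id, status, friend_ids):
        # Local transaction: only the user's own shard is written synchronously.
        self.shards[shard_for(user_id)]["status"][user_id] = status
        # One message per distinct friend shard; there is no global transaction.
        for shard_idx in sorted({shard_for(f) for f in friend_ids}):
            self.queue.append((shard_idx, user_id, status))

    def deliver_pending(self):
        # Each delivery is its own local transaction on the target shard.
        while self.queue:
            shard_idx, user_id, status = self.queue.popleft()
            self.shards[shard_idx]["friend_status"][user_id] = status


cluster = Cluster(num_shards=6)
cluster.update_status(1024, "at SQL Saturday", friend_ids=[5, 3500, 3600])
```

Between `update_status` and `deliver_pending`, the friends' shards still show the old state. That window is the eventual consistency the slides accept in exchange for scale and availability.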
  • 7. Many Large Scale Customers using Similar Patterns
      - Patterns
          - Sharding and reliable messaging
          - Sharding and fan-out query layer
          - Caching layer
      - Customer Examples
          - Social networking: Facebook, MySpace, etc.
          - Online electronic stores (cannot give names)
          - Travel reservation systems (e.g. Choice International)
          - MSN Casual Gaming
          - etc.
  • 8. Lessons Learned from these Scenarios
      - Require high availability
      - Be able to scale out
          - Functional and Data Partitioning Architecture
          - Provide scale-out processing
          - Be able to deal with failures
      - Be able to quickly grow and change
          - Elastic scale
          - Flexible, open schema
          - Multi-version schema support
    Move better support for these patterns into the Data Platform!
  • 9. What is NoSQL about?
    NoSQL = operational and developer agility at low CapEx and OpEx!
      - Low Cost
          - Free software and support
          - Scale CapEx cost below customer growth rate
          - Web-friendly developer model and tool chain, easy to use
      - Processing Paradigms
          - High availability
          - Data and processing scale-out
          - Performance
          - Tunable/eventual consistency
      - Data Model Paradigms
          - Data first: flexible schema
          - Low impedance mismatch between programming and data model
    From devices, over OLTP Web 2.0 applications, to BigData analytics
  • 10. Data Models (data model: example stores)
      - Simple Key-Value Pairs: Memcache, Redis, Dynamo, Voldemort, LevelDB, Azure Caching
      - Wide Sparse Column Sets: HyperTable, Big Table, Cassandra, HBASE, Hyperbase, Amazon DynamoDB, Windows Azure Tables, SQL Server/Azure Sparse columns
      - BLOBs: Amazon S3, Oracle Berkeley NoSQL, Windows Azure Blob Store, SQL Server RBS/FileTable
      - JSON Documents: MongoDB, CouchBase, Riak, RavenDB
      - Graph: Neo4J, GraphDB, HypergraphDB, Stig, Intellidimension
      - Objects and XML Documents: Versant, Oracle Berkeley NoSQL, MarkLogic, existDB, EMC HiveDB, SQL Server/Azure, Oracle, IBM DB2
      - Extended Relational: Oracle, EMC SQLFire, IBM DB2, MySQL, Postgres, SQL Server/Azure/Parallel DW
  • 11. Operational Agility
      - You want:
          - Availability of service (scalability)
          - Global consistency
          - Network partition tolerance
      - You can only get 2 of 3 (CAP Theorem)
      - In Brave New World:
          - Online businesses need availability
          - It is distributed, because it is big
          - thus network partitioning is unavoidable
          - Hence global consistency must be relaxed → BASE vs ACID
  • 12. BASE vs ACID Consistency
      - ACID: Atomicity, Consistency, Isolation, Durability
          - Full serializability provides all 4
          - Distributed transactions providing all 4 limit service availability, throughput and scalability
      - BASE: Basically Available, Soft state, Eventual consistency
          - Relaxes ACID properties to increase availability, throughput and scalability
          - Replica consistency: impacts recoverability
          - Cross-node consistency: impacts globally consistent view of the world
    [Diagram: primaries, each with replicas]
  • 13. Operational Agility
      - Performance and scale
      - Automate management lifecycle (or fail)
      - Simple deployment lifecycle
      - No DB or OS Admin telling me what to do
  • 14. Developer Agility
      - Code first and revise quickly
      - Application-model first (before database)
      - Flexible open data models
      - You don’t know exactly what you are looking for
      - Lower pain of adoption and maintenance
      - No DB or OS Admin telling me what to do
  • 15. NoSQL and BigData: Two sides of the same coin
      - BigData:
          - Origin: large unstructured data processing (sensor data, scientific research, web stream analysis)
          - Analytics focused (“new” OLAP, Map-Reduce, Hadoop)
          - Scale-out data and processing paradigm at low cost
      - NoSQL:
          - Origin: developing agile, scalable web applications
          - Realtime customer transaction focused (“new” OLTP)
          - Scale-out data and processing paradigm with flexible data model at low cost
      - Both use many of the same paradigms
  • 16. The Web 2.0 Business Architecture
    [Diagram: an Online Business Application surrounded by three activities]
      - Attract Individual Consumers:
          - Provide interesting service
          - Provide mobility
          - Provide social
      - Monetize Individual:
          - Upsell service: VIP, Speed, Extra Capabilities
      - Monetize the Social:
          - Improve individual experience
          - Re-sell aggregate data (e.g., Advertisers)
  • 17. Scale-Out Data Platform Architecture
    [Diagram: several shards, each with a primary copy and readable replicas, plus a query layer]
      - OLTP workloads: highly available, high scale, high flexibility; mostly touching 1 to a low number of shards
      - Traditional OLAP workloads: known schema; data warehouse, “star joins”
      - Dynamic OLAP workloads: 3Vs (Volume, Velocity, Variety); exploratory scale-out queries, often using eventually consistent scale-out frameworks like Hadoop
  • 18. What does SQL Server provide today?
      - Scale-programming models
          - Service Broker provides:
              - Functional, service-oriented architecture
              - Scale out on demand
              - Async reliable messaging provides for true eventual consistency
          - SQL Azure Federations provides sharding support
          - Distributed queries
          - SQL Server Parallel Data Warehouse
      - Programmer Agility
          - XML, XQuery for XML documents
          - FileTable for documents (but what is the equivalent solution in the cloud?)
          - Open schema: sparse columns and column sets (but still schema first)
          - CLR extensibility, but
              - No indexing, bad cost models
              - Difficult to deploy (and DB Admins often do not allow it!)
      - Failure Resilience
          - SQL Azure has local automatic HA, self-healing
      - Rich Services
          - Semantic Extraction and Similarity Search in SQL Server 2012
      - DB/OS Admin “interference”
          - SQL Azure: self-maintaining and self-provisioning
  • 19. Introducing SQL Azure Federations
      - Provides data partitioning/sharding at the data platform
      - Enables applications to build elastic scale-out applications
      - Provides non-blocking SPLIT/DROP for shards (MERGE to come later)
      - Auto-connect to the right shard based on sharding key value
      - Provides SPLIT-resilient query mode
  • 20. SQL Azure Federation Concepts
      - Federation: represents the data being sharded
      - Federation Root: database that logically houses federations, contains federation metadata
      - Federation Key: value that determines the routing of a piece of data (defines a Federation Distribution)
      - Federation Member (aka Shard): physical container for a set of federated tables of a specific key range and reference tables
      - Atomic Unit (AU): all rows with the same federation key value: always together!
      - Federated Table: table that contains only atomic units for the member’s key range
      - Reference Table: non-sharded table
    [Diagram: a sharded application connects through the connection gateway to an Azure DB with Federation Root (Federation Directories, Federation Users, Federation Distributions, …). Federation “Orders_Fed” (Federation Key: CustomerID) has three members: PK [min, 100) with AUs PK=5, PK=25, PK=35; PK [100, 488) with AUs PK=105, PK=235, PK=365; PK [488, max) with AUs PK=555, PK=2545, PK=3565.]
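How a federation key value maps to a member can be sketched with a sorted list of split points. This is a simplified model, not the SQL Azure implementation: the `Federation` class and its method names are hypothetical, and only the member ranges [min, 100), [100, 488), [488, max) are taken from the slide.

```python
import bisect


class Federation:
    """Hypothetical model of range-based routing to federation members."""

    def __init__(self, split_points):
        # Sorted lower bounds of every member except the first one.
        self.split_points = sorted(split_points)

    def member_for(self, key):
        """Index of the member whose half-open range [low, high) contains key."""
        return bisect.bisect_right(self.split_points, key)

    def member_range(self, index):
        """Return the (low, high) bounds of a member; 'min'/'max' mark open ends."""
        low = "min" if index == 0 else self.split_points[index - 1]
        high = "max" if index == len(self.split_points) else self.split_points[index]
        return (low, high)

    def split_at(self, key):
        """Split the member containing key into [low, key) and [key, high)."""
        if key not in self.split_points:
            bisect.insort(self.split_points, key)


orders_fed = Federation(split_points=[100, 488])
member_of_customer_235 = orders_fed.member_for(235)
```

Because routing depends only on the key and the split points, all rows of one atomic unit land in the same member, and a split only adds one boundary without touching the other members.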
  • 21. Demo: Map-Reduce scale-out over SQL Azure Federations
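The demo itself used a C# library with the mapper and reducer written in SQL, and that code is not part of this transcript. The sketch below only illustrates the general fan-out Map/Reduce pattern over shards, with in-memory lists standing in for federation members. The `map_reduce` function, the sample rows, and the `game`/`plays` columns are all hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_reduce(shards, mapper, reducer):
    """Run mapper on every shard in parallel, then reduce the values per key."""

    def run_mapper(rows):
        # Map phase for one shard: emit (key, value) pairs for each row.
        return [pair for row in rows for pair in mapper(row)]

    # Fan-out: one mapper task per shard.
    with ThreadPoolExecutor() as pool:
        per_shard = list(pool.map(run_mapper, shards))

    # Shuffle: group the emitted values by key across all shards.
    grouped = defaultdict(list)
    for pairs in per_shard:
        for key, value in pairs:
            grouped[key].append(value)

    # Reduce phase: collapse each key's values into one result.
    return {key: reducer(key, values) for key, values in grouped.items()}


# Hypothetical GamesInfo rows spread over two shards.
games_info_shards = [
    [{"game": "chess", "plays": 3}, {"game": "poker", "plays": 1}],
    [{"game": "chess", "plays": 2}],
]
plays_per_game = map_reduce(
    games_info_shards,
    mapper=lambda row: [(row["game"], row["plays"])],
    reducer=lambda game, plays: sum(plays),
)
```

The mapper runs where the data lives and only the small per-key results travel to the reducer. That is the co-located compute idea from the earlier slides.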
  • 22. SQL Azure: A Not Only SQL Data Platform
    SQL Azure adds support for NoSQL paradigms in the data platform:
      - No CapEx, low OpEx (which should/will be even lower)
      - High availability (each DB has two replicas)
      - Sharding support with federations:
          - Data platform provides online SPLIT/DROP
          - Filtered connection to provide split-resilient programming model
      - Flexible data models:
          - XML support
          - Sparse columns/column sets
      - More to come in the future…
          - More scale and tunable HA (to support OLTP/OLAP model)
          - Taking Federations further (orthogonality, merge, fanout)
          - Integration with Hadoop eco-system
          - More data-first (data-driven columnsets, JSON)
  • 23. Call to Action
      - Download the presentation from: http://www.slideshare.net/MichaelRys/presentations
      - Fill out the SQL Azure Federation Survey: http://connect.microsoft.com/BusinessPlatform/Survey/Survey.aspx?SurveyID=13625
  • 24. Related Content
      - Related whitepapers and presentations:
          - CACM: Scalable SQL: http://cacm.acm.org/magazines/2011/6/108663-scalable-sql
          - NoSQL and the Windows Azure Platform: http://download.microsoft.com/download/9/E/9/9E9F240D-0EB6-472E-B4DE-6D9FCBB505DD/Windows%20Azure%20No%20SQL%20White%20Paper.pdf
          - SQL Federation blog: http://blogs.msdn.com/b/cbiyikoglu/archive/2011/03/03/nosql-genes-in-sql-azure-federations.aspx
          - Windows Gaming Experience Case Study: http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=4000008310
          - NoSQL Presentations: http://www.slideshare.net/MichaelRys/presentations
      - Contact me:
          - mrys@microsoft.com
          - @SQLServerMike
          - http://sqlblog.com/blogs/michael_rys/default.aspx