SQL and NoSQL in SQL Server


SQL Saturday 109 Presentation on NoSQL Paradigms in SQL Server context

Published in: Technology

  1. SQL and NoSQL in the Context of SQL Server
     Michael Rys, Program Manager, Microsoft Corp.
     @SQLServerMike
  2. Key Session Takeaways
     - Scaling your business is important
     - What are the NoSQL paradigms
     - You can use NoSQL paradigms with SQL Server and SQL Azure
     - We are working on moving the paradigms into SQL Server
  3. The Web 2.0 Business Architecture
     [Diagram: an Online Business Application surrounded by three goals]
     - Attract Individual Consumers:
       - Provide interesting service
       - Provide mobility
       - Provide social
     - Monetize Individual:
       - Upsell service
       - VIP
       - Speed
       - Extra Capabilities
     - Monetize the Social:
       - Improve individual experience
       - Re-sell Aggregate Data (e.g., Advertisers)
  4. Social Networking: the Business Problem
     - 100s of millions of users
       - 10s of millions of users concurrently
     - Terabytes to petabytes of data
       - Structured and unstructured
     - Required (eventual) data consistency across users
       - E.g. show your updated state in your friends’ profile pages
  5. Solution
     - Shard/partition user data across hundreds to thousands of SQL databases
     - Propagate data changes using a reliable, async message service
       - No global transactions! They hinder scale and availability!
     - Provide a caching layer for performance
     - Also used for
       - Clean-up state (e.g. on account close)
       - Deploy business logic (stored procedures)
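  6. Example Architecture
     [Diagram: a Web Tier and a Data Tier. A user with userId=1024 changes their status ("I change my status"); a Dispatcher routes the update ("My DB gets updated"), and an Async Message Service sends async messages to other shards. Shards are labeled by userId range: 1001-2000, 2001-3000, 3001-4000, 4001-5000, 5001-6000. Transactions are labeled TX1 through TX5.]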
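The sharding-plus-messaging pattern from slides 5 and 6 can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions: the shard ranges are copied from the diagram labels, while `shard_for`, `Dispatcher`, `change_status`, and `drain` are hypothetical names, and a plain queue stands in for the reliable async message service.

```python
import bisect
from collections import deque

# Inclusive userId ranges taken from the diagram labels (assumed to be the shard map).
SHARD_RANGES = [(1001, 2000), (2001, 3000), (3001, 4000), (4001, 5000), (5001, 6000)]
_LOWER_BOUNDS = [low for low, _ in SHARD_RANGES]


def shard_for(user_id):
    """Return the index of the shard whose range contains user_id."""
    i = bisect.bisect_right(_LOWER_BOUNDS, user_id) - 1
    if i < 0 or user_id > SHARD_RANGES[i][1]:
        raise KeyError(f"no shard covers userId={user_id}")
    return i


class Dispatcher:
    """Hypothetical dispatcher: one local write, then queued async messages."""

    def __init__(self):
        self.shards = [dict() for _ in SHARD_RANGES]  # one dict per shard
        self.queue = deque()  # stand-in for the reliable async message service

    def change_status(self, user_id, status, friend_ids):
        # Local transaction: touch only the user's home shard.
        self.shards[shard_for(user_id)][("status", user_id)] = status
        # No global transaction: enqueue one message per friend instead.
        for friend_id in friend_ids:
            self.queue.append((friend_id, user_id, status))

    def drain(self):
        # Each message is applied later as its own local transaction.
        while self.queue:
            friend_id, user_id, status = self.queue.popleft()
            self.shards[shard_for(friend_id)][("feed", friend_id, user_id)] = status
```

Until `drain()` runs, friends on other shards still see the old state, which is the eventual consistency the slides accept in exchange for avoiding global transactions.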
  7. Many Large Scale Customers using Similar Patterns
     - Patterns
       - Sharding and reliable messaging
       - Sharding and fan-out query layer
       - Caching layer
     - Customer Examples
       - Social networking: Facebook, MySpace, etc.
       - Online electronic stores (cannot give names)
       - Travel reservation systems (e.g. Choice International)
       - MSN Casual Gaming
       - etc.
  8. Lessons Learned from these Scenarios
     - Require high availability
     - Be able to scale out
       - Functional and data partitioning architecture
       - Provide scale-out processing
       - Be able to deal with failures
     - Be able to quickly grow and change
       - Elastic scale
       - Flexible, open schema
       - Multi-version schema support
     Move better support for these patterns into the Data Platform!
  9. What is NoSQL about?
     NoSQL = operational and developer agility at low CapEx and OpEx!
     - Low Cost
       - Free software and support
       - Scale CapEx cost below customer growth rate
       - Web-friendly developer model and tool chain, easy to use
     - Processing Paradigms
       - High availability
       - Data and processing scale-out
       - Performance
       - Tunable/eventual consistency
     - Data Model Paradigms
       - Data first: flexible schema
       - Low impedance mismatch between programming and data model
     From devices, over OLTP Web 2.0 applications, to BigData analytics
  10. Data Models
      Data Model: Example Stores
      - Simple Key-Value Pairs: Memcache, Redis, Dynamo, Voldemort, LevelDB, Azure Caching
      - Wide Sparse Column Sets: HyperTable, Big Table, Cassandra, HBase, Hyperbase, Amazon DynamoDB, Windows Azure Tables, SQL Server/Azure Sparse columns
      - BLOBs: Amazon S3, Oracle Berkeley NoSQL, Windows Azure Blob Store, SQL Server RBS/FileTable
      - JSON Documents: MongoDB, CouchBase, Riak, RavenDB
      - Graph: Neo4J, GraphDB, HypergraphDB, Stig, Intellidimension
      - Objects and XML Documents: Versant, Oracle Berkeley NoSQL, MarkLogic, existDB, EMC HiveDB, SQL Server/Azure, Oracle, IBM DB2
      - Extended Relational: Oracle, EMC SQLFire, IBM DB2, MySQL, Postgres, SQL Server/Azure/Parallel DW
  11. Operational Agility
      - You want:
        - Availability of service (scalability)
        - Global consistency
        - Network partition tolerance
      - You can only get 2 of 3 (CAP Theorem)
      - In Brave New World:
        - Online businesses need availability
        - It is distributed, because it is big
        - thus network partitioning is unavoidable
        - Hence global consistency must be relaxed → BASE vs ACID
  12. BASE vs ACID Consistency
      - ACID: Atomicity, Consistency, Isolation, Durability
        - Full serializability provides all 4
        - Distributed transactions providing all 4 limit service availability, throughput and scalability
      - BASE: Basically Available, Soft state, Eventual consistency
        - Relaxes ACID properties to increase availability, throughput and scalability
        - Replica consistency: impacts recoverability
        - Cross-node consistency: impacts globally consistent view of the world
      [Diagram: Primary nodes, each with Replicas]
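To make the replica-consistency trade-off on this slide concrete, here is a minimal sketch of asynchronous log shipping from a primary to a replica. It is a toy model, not how SQL Server or SQL Azure replicate; `Primary`, `Replica`, `write`, and `replicate` are hypothetical names chosen for illustration.

```python
class Replica:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far


class Primary:
    def __init__(self):
        self.data = {}
        self.log = []  # ordered list of (key, value) writes

    def write(self, key, value):
        # The write is acknowledged after the local update only (BASE-style).
        self.data[key] = value
        self.log.append((key, value))

    def replicate(self, replica, max_entries=None):
        # Ship the not-yet-applied log suffix; max_entries simulates lag.
        end = len(self.log)
        if max_entries is not None:
            end = min(end, replica.applied + max_entries)
        for key, value in self.log[replica.applied:end]:
            replica.data[key] = value
        replica.applied = end
```

Between `write` and a full `replicate`, a read from the replica can return stale data, and a primary failure in that window loses the unshipped writes. That is the sense in which relaxed replica consistency "impacts recoverability".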
  13. Operational Agility
      - Performance and scale
      - Automate management lifecycle (or fail)
      - Simple deployment lifecycle
      - No DB or OS admin telling me what to do
  14. Developer Agility
      - Code first and revise quickly
      - Application-model first (before database)
      - Flexible open data models
      - You don’t know exactly what you are looking for
      - Lower pain of adoption and maintenance
      - No DB or OS admin telling me what to do
  15. NoSQL and BigData: Two sides of the same coin
      - BigData:
        - Origin: large unstructured data processing (sensor data, scientific research, web stream analysis)
        - Analytics focused (“new” OLAP, Map-Reduce, Hadoop)
        - Scale-out data and processing paradigm at low cost
      - NoSQL:
        - Origin: developing agile, scalable web applications
        - Realtime customer transaction focused (“new” OLTP)
        - Scale-out data and processing paradigm with flexible data model at low cost
      - Both use many of the same paradigms
  16. The Web 2.0 Business Architecture
      [Diagram: an Online Business Application surrounded by three goals]
      - Attract Individual Consumers:
        - Provide interesting service
        - Provide mobility
        - Provide social
      - Monetize Individual:
        - Upsell service
        - VIP
        - Speed
        - Extra Capabilities
      - Monetize the Social:
        - Improve individual experience
        - Re-sell Aggregate Data (e.g., Advertisers)
  17. Scale-Out Data Platform Architecture
      [Diagram: Primary Shards (one labeled Primary Copy), each with Readable Replicas, plus a Query component, serving three workload types]
      - OLTP Workloads: highly available, high scale, high flexibility, mostly touching 1 to a low number of shards
      - Traditional OLAP Workloads: known schema, data warehouse, “star joins”
      - Dynamic OLAP Workloads: 3Vs (Volume, Velocity, Variety), exploratory, scale-out queries, often using eventually consistent scale-out frameworks like Hadoop
  18. What does SQL Server provide today?
      - Scale-programming models
        - Service Broker provides:
          - Functional, service-oriented architecture
          - Scale out on demand
          - Async reliable messaging provides for true eventual consistency
        - SQL Azure Federations provides sharding support
        - Distributed queries
        - SQL Server Parallel Data Warehouse
      - Programmer Agility
        - XML, XQuery for XML documents
        - FileTable for documents (but what is the equivalent solution in the cloud?)
        - Open schema: sparse columns and column sets (but still schema first)
        - CLR extensibility, but
          - No indexing, bad cost models
          - Difficult to deploy (and DB admins often do not allow it!)
      - Failure Resilience
        - SQL Azure has local automatic HA, self-healing
      - Rich Services
        - Semantic Extraction and Similarity Search in SQL Server 2012
      - DB/OS Admin “interference”
        - SQL Azure: self-maintaining and self-provisioning
  19. Introducing SQL Azure Federations
      - Provides data partitioning/sharding at the data platform
      - Enables applications to build elastic scale-out applications
      - Provides non-blocking SPLIT/DROP for shards (MERGE to come later)
      - Auto-connect to the right shard based on the sharding key value
      - Provides SPLIT-resilient query mode
  20. SQL Azure Federation Concepts
      - Federation: represents the data being sharded
      - Federation Root: database that logically houses federations, contains federation metadata
      - Federation Key: value that determines the routing of a piece of data (defines a Federation Distribution)
      - Federation Member (aka Shard): physical container for a set of federated tables of a specific key range and reference tables
      - Atomic Unit (AU): all rows with the same federation key value: always together!
      - Federated Table: table that contains only atomic units for the member’s key range
      - Reference Table: non-sharded table
      [Diagram: an Azure DB with a Federation Root (Federation Directories, Federation Users, Federation Distributions, …) and a Federation “Orders_Fed” (Federation Key: CustomerID). Member PK [min, 100) holds AUs PK=5, PK=25, PK=35; Member PK [100, 488) holds AUs PK=105, PK=235, PK=365; Member PK [488, max) holds AUs PK=555, PK=2545, PK=3565. A Sharded Application connects through the Connection Gateway.]
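The routing rule implied by the member ranges in the diagram ([min, 100), [100, 488), [488, max)) can be modeled with a sorted list of split points. The sketch below is an in-memory illustration only and does not use or imitate the actual SQL Azure Federations T-SQL surface; `Federation`, `member_for`, `insert`, and `split` are hypothetical names.

```python
import bisect


class Federation:
    """Toy model: members are half-open key ranges separated by split points."""

    def __init__(self, split_points):
        self.split_points = sorted(split_points)
        # members[i] maps federation key -> rows (one atomic unit per key).
        self.members = [dict() for _ in range(len(self.split_points) + 1)]

    def member_for(self, key):
        # Keys equal to a split point belong to the member on its right.
        return bisect.bisect_right(self.split_points, key)

    def insert(self, key, row):
        # All rows with the same key land in the same member (atomic unit).
        self.members[self.member_for(key)].setdefault(key, []).append(row)

    def split(self, at):
        # Split one member into [low, at) and [at, high); atomic units move whole.
        if at in self.split_points:
            raise ValueError(f"{at} is already a split point")
        i = self.member_for(at)
        old = self.members[i]
        low = {k: v for k, v in old.items() if k < at}
        high = {k: v for k, v in old.items() if k >= at}
        self.split_points.insert(i, at)
        self.members[i:i + 1] = [low, high]
```

With split points `[100, 488]`, the keys from the diagram route as shown there: 5, 25, 35 to the first member, 105, 235, 365 to the second, and 555, 2545, 3565 to the third. A later `split(250)` would separate 105 and 235 from 365 without ever dividing the rows of a single key.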
  21. Demo
      Map-Reduce scale-out over SQL Azure Federations
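The demo itself is not part of this transcript. As a rough idea of what a map-reduce style fan-out over shards looks like, the sketch below runs a per-member aggregation in parallel and merges the partial results. Everything here is an assumption made for illustration: the row shape `(customer_id, product, amount)`, the function names, and the use of in-memory lists in place of federation members.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_member(rows):
    """Map step: total amount per product within a single member."""
    partial = Counter()
    for _customer_id, product, amount in rows:
        partial[product] += amount
    return partial


def fan_out_totals(members):
    """Fan the map step out to every member, then reduce the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_member, members))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total
```

The map step touches only one member at a time, so it scales with the number of shards; only the small partial results cross shard boundaries in the reduce step.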
  22. SQL Azure: A Not Only SQL Data Platform
      SQL Azure adds support for NoSQL paradigms in the data platform:
      - No CapEx, low OpEx (which should/will be even lower)
      - High availability (each DB has two replicas)
      - Sharding support with federations:
        - Data platform provides online SPLIT/DROP
        - Filtered connection to provide split-resilient programming model
      - Flexible data models:
        - XML support
        - Sparse columns/column sets
      - More to come in the future…
        - More scale and tunable HA (to support OLTP/OLAP model)
        - Taking Federations further (orthogonality, merge, fanout)
        - Integration with Hadoop ecosystem
        - More data-first (data-driven column sets, JSON)
  23. Call to Action
      - Download the Presentation from:
      - Fill out SQL Azure Federation Survey: urvey.aspx?SurveyID=13625
  24. Related Content
      - Related Whitepapers and Presentations:
        - CACM: Scalable SQL:
        - NoSQL and the Windows Azure Platform: 6D9FCBB505DD/Windows%20Azure%20No%20SQL%20White%20Paper.pdf
        - SQL Federation blog: sql-azure-federations.aspx
        - Windows Gaming Experience Case Study: 000008310
        - NoSQL Presentations:
      - Contact me:
        - @SQLServerMike