SQL AND NOSQL ARE TWO SIDES OF THE SAME COIN
Michael Rys, Microsoft Corp.
@SQLServerMike




 © 2012 Microsoft




Strata 2012 Conference, March 2012
AGENDA

• Scaling out your business is important!
• NoSQL Paradigms and NoSQL Platforms
• SQL learns from NoSQL
  (with a demo of SQL Azure Federations)
• NoSQL learns from SQL
• Scalable Data Processing Platform of the Future
THE WEB 2.0 BUSINESS ARCHITECTURE

Online Business Application

Attract Individual Consumers:
- Provide interesting service
- Provide mobility
- Provide social

Monetize Individual:
- Upsell service
     - VIP
     - Speed
     - Extra Capabilities

Monetize the Social:
- Improve individual experience
- Re-sell Aggregate Data (e.g., Advertisers)
SOCIAL NETWORKING: THE BUSINESS PROBLEM
• 100s of millions of users
 • 10s of millions of users concurrently
• Terabytes to petabytes of data
 • Structured and unstructured
• Required (eventual) data
  consistency across users
 • E.g. show your updated state in your
   friends’ profile pages
SOLUTION
• Shard/Partition user data across hundreds to
  thousands of SQL Databases
• Propagate data changes from one DB to other
  DBs using reliable, async Message Service
  • Managing routes from each DB to every other DB
    would be too complex
  • Global Transactions would hinder scale and
    availability
• Provide a caching layer for performance
• And also used for
     o Clean-up state (e.g. on account close)
     o Deploy business logic (stored procedures)
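
The sharding and async propagation pattern above can be sketched roughly as follows. This is a minimal in-memory illustration, not the architecture of any particular customer: the shard key ranges, the ShardedStore class and its method names are all hypothetical, and a deque stands in for the reliable async Message Service.

```python
import bisect
from collections import deque

# Hypothetical shard layout: each shard owns a contiguous range of user IDs.
# The values are inclusive upper bounds and are illustrative only.
SHARD_UPPER_BOUNDS = [1000, 2000, 3000, 4000, 5000, 6000]


def shard_for(user_id):
    """Return the index of the shard that owns user_id."""
    if not 1 <= user_id <= SHARD_UPPER_BOUNDS[-1]:
        raise ValueError(f"user_id {user_id} is outside the sharded key range")
    return bisect.bisect_left(SHARD_UPPER_BOUNDS, user_id)


class ShardedStore:
    """In-memory stand-in for a set of sharded databases plus a message queue."""

    def __init__(self):
        # One dict per shard: user_id -> {"status": ..., "friend_status": {...}}
        self.shards = [dict() for _ in SHARD_UPPER_BOUNDS]
        # Stand-in for the reliable async message service.
        self.queue = deque()

    def _row(self, user_id):
        shard = self.shards[shard_for(user_id)]
        return shard.setdefault(user_id, {"status": None, "friend_status": {}})

    def update_status(self, user_id, status, friend_ids):
        """Update the owner's shard locally, then enqueue one message per friend."""
        self._row(user_id)["status"] = status
        for friend_id in friend_ids:
            self.queue.append((friend_id, user_id, status))

    def drain(self):
        """Deliver queued messages; each delivery touches only one shard."""
        delivered = 0
        while self.queue:
            friend_id, user_id, status = self.queue.popleft()
            self._row(friend_id)["friend_status"][user_id] = status
            delivered += 1
        return delivered


store = ShardedStore()
store.update_status(1024, "at Strata", friend_ids=[17, 2500])
# Until drain() runs, the friends' shards have not seen the change:
# that window is the "eventual" part of eventual consistency.
store.drain()
```

No transaction spans more than one shard here, which is the point of the pattern: the owner's update and each friend's update are separate local operations connected only by queued messages.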
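EXAMPLE ARCHITECTURE

[Diagram: in the Web Tier, a user (userId=1024) changes their status and their own database is updated. In the Data Tier, an async Message Service with a dispatcher propagates the change to the other shards, which hold the key ranges 1-1000, 1001-2000, 2001-3000, 3001-4000, 4001-5000 and 5001-6000. The updates run as separate transactions TX1 to TX5.]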
MANY LARGE SCALE CUSTOMERS USING SIMILAR PATTERNS
• Patterns
  • Sharding and reliable messaging
  • Sharding and fan-out query layer
  • Caching layer


• Customer Examples
  •   Social Networking: Facebook, MySpace, etc
  •   Online electronic stores (cannot give names)
  •   Travel reservation systems (e.g. Choice International)
  •   MSN Casual Gaming
  •   etc.
LESSONS LEARNED FROM THESE SCENARIOS
• Require high availability
• Be able to scale out:
  • Functional and Data Partitioning Architecture
  • Provide scale-out processing:
    o Function shipping
    o Fanout and Map/Reduce processing
  • Be able to deal with failures:
    o Quorum
    o Retries
    o Eventual Consistency (similar to Read-consistent Snapshot Isolation)
• Be able to quickly grow and change:
  • Elastic scale
  • Flexible, open schema
  • Multi-version schema support

Move better support for these patterns into the Data Platform!
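
As a rough illustration of fan-out processing combined with retries, the sketch below sends the same query to every shard in parallel and retries calls that fail. It assumes a simple "retry up to N times" policy; the FlakyShard class, the function names and the failure model are hypothetical and exist only to make the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def call_with_retries(fn, attempts=3):
    """Call fn(); if it raises, retry until `attempts` calls have been made."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as exc:  # a real system would catch only transient errors
            last_error = exc
    raise last_error


def fan_out(shards, query, attempts=3):
    """Run query(shard) on every shard in parallel; return results in shard order."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        futures = [
            pool.submit(call_with_retries, lambda s=s: query(s), attempts)
            for s in shards
        ]
        return [f.result() for f in futures]


class FlakyShard:
    """Hypothetical shard that fails a fixed number of calls before succeeding."""

    def __init__(self, rows, failures=0):
        self.rows = rows
        self.failures = failures

    def count(self):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("transient failure")
        return len(self.rows)


shards = [FlakyShard([1, 2, 3]), FlakyShard([4, 5], failures=1), FlakyShard([])]
partials = fan_out(shards, lambda s: s.count())  # per-shard counts: [3, 2, 0]
total = sum(partials)                            # combined result: 5
```

The final sum is a trivial reduce step over the per-shard partial results; a shard that keeps failing past the retry limit surfaces its error to the caller instead of silently dropping out of the result.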
WHAT IS NOSQL ABOUT?
• NoSQL = operational and developer agility at low CapEx and OpEx!

• Low Cost
  • Free Open Source Stores, Community Support
  • Scale CapEx cost below customer growth rate
  • Web friendly developer model and tool chain, ease of use

• Processing Paradigms
  •   High Availability (scalable Replication, Fast Failover, DR/GeoDR, tunable latency)
  •   Scale-out (Sharding, Map-Reduce, Elasticity)
  •   Performance (tuned for specific workloads, Caching, co-located compute with partitioned state)
  •   Tunable/Eventual Consistency

• Data Model Paradigms
  • Data first: Flexible Schema
  • Low-impedance mismatch between programming and data model:
      o Key-Documents and Objects (BLOBS, JSON, XML, POJO)
      o Key-Wide Sparse Column Sets
      o Graphs (e.g., RDF)

• Ranges from devices, through OLTP Web 2.0 applications, to BigData Analytics
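
One common way to make consistency tunable is quorum replication: with N replicas, a write acknowledged by W replicas and a read that consults R replicas are guaranteed to overlap whenever R + W > N. The sketch below is a toy model of that rule, not the behavior of any specific store; the ReplicatedValue class and its deliberately worst-case replica selection are assumptions made for illustration.

```python
def quorums_overlap(n, r, w):
    """True if every read quorum intersects every write quorum (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


class ReplicatedValue:
    """Toy replicated register that keeps a (version, value) pair per replica."""

    def __init__(self, n):
        self.replicas = [(0, None)] * n

    def write(self, value, w):
        version = max(v for v, _ in self.replicas) + 1
        # Worst case for readers: only the first w replicas receive the write.
        for i in range(w):
            self.replicas[i] = (version, value)

    def read(self, r):
        # Worst case: consult the last r replicas and keep the newest version seen.
        return max(self.replicas[-r:], key=lambda pair: pair[0])[1]


strong = ReplicatedValue(3)
strong.write("a", w=2)
strong_read = strong.read(r=2)   # "a": 2 + 2 > 3, so the quorums overlap

weak = ReplicatedValue(3)
weak.write("a", w=1)
weak_read = weak.read(r=1)       # None: 1 + 1 <= 3, so the read can be stale
```

Lowering R and W trades freshness for latency and availability, which is the tuning knob the consistency bullet refers to.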
DATA MODELS
Data Model                  Example Stores (apologies to the ones I did not list)

Simple Key-Value Pairs      Memcache, Redis, Dynamo, Voldemort, LevelDB, Azure Caching

Wide Sparse Column Sets     HyperTable, Big Table, Cassandra, HBASE, Hyperbase, Amazon
                            DynamoDB, Windows Azure Tables, SQL Server/Azure Sparse
                            columns

BLOBs                       Amazon S3, Oracle Berkeley NoSQL, Windows Azure Blob
                            Store, SQL Server RBS/FileTable

JSON Documents              MongoDB, CouchBase, Riak, RavenDB

Graph                       Neo4J, GraphDB, HypergraphDB, Stig, Intellidimension

Objects and XML Documents   Versant, Oracle Berkeley NoSQL, MarkLogic, existDB, EMC
                            HiveDB, SQL Server/Azure, Oracle, IBM DB2

Extended Relational         Oracle, EMC SQLFire, IBM DB2, MySQL, Postgres, SQL
                            Server/Azure
WHAT CAN SQL LEARN FROM NOSQL?
• Low CapEx, Low OpEx
• Built-in tunable High-Availability
• Data scale-out (Sharding)
• Processing scale-out (Map-Reduce, Fan-Out, tunable consistency)
• Flexible Data Models
  • JSON (& XML) support
  • Sparse columns/Column sets
• Integrate with BigData Analytics (e.g., Hadoop)


Many Relational Database Systems are incorporating these learnings!
EXAMPLE: SQL AZURE FEDERATIONS
•   Provides Data Partitioning/Sharding at the Data Platform
•   Enables applications to build elastic scale-out applications
•   Provides non-blocking SPLIT/DROP for shards (MERGE to come later)
•   Auto-connect to the right shard based on the sharding key value
•   Provides SPLIT resilient query mode
SQL AZURE FEDERATION CONCEPTS
   Federation
         Represents the data being sharded
   Federation Root
         Database that logically houses federations, contains
         federation meta data
   Federation Key
         Value that determines the routing of a piece of data
         (defines a Federation Distribution)
   Atomic Unit
         All rows with the same federation key value: always
         together!
   Federation Member (aka Shard)
         A physical container for a set of federated tables for
         a specific key range and reference tables
   Federated Table
         Table that contains only atomic units for the
         member’s key range
   Reference Table
         Non-sharded table

   [Diagram: a Sharded Application connects through the Connection Gateway to an
   Azure DB with Federation Root (Federation Directories, Federation Users,
   Federation Distributions, …). The root holds Federation “Orders_Fed”
   (Federation Key: CustomerID) with three members:]
         Member: PK [min, 100)     AU PK=5      AU PK=25      AU PK=35
         Member: PK [100, 488)     AU PK=105    AU PK=235     AU PK=365
         Member: PK [488, max)     AU PK=555    AU PK=2545    AU PK=3565
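
The federation concepts can be modelled with a small range-partitioned structure. The sketch below is a toy model and not the SQL Azure implementation: the Federation class and its methods are hypothetical, and its split copies rows synchronously, whereas the real SPLIT is described as non-blocking. The key values and split points are the ones used for Orders_Fed.

```python
import bisect


class Federation:
    """Toy model of a federation partitioned by ranges of the federation key."""

    def __init__(self):
        self.split_points = []   # sorted boundaries between adjacent members
        self.members = [dict()]  # members[i]: federation key -> rows (one atomic unit)

    def member_index(self, key):
        """Route a federation key value to the member whose range contains it."""
        return bisect.bisect_right(self.split_points, key)

    def insert(self, key, row):
        # All rows with the same key land in the same dict entry, so an
        # atomic unit can never be spread over two members.
        self.members[self.member_index(key)].setdefault(key, []).append(row)

    def split(self, at):
        """Split the member containing `at` into [low, at) and [at, high)."""
        if at in self.split_points:
            raise ValueError(f"already split at {at}")
        i = self.member_index(at)
        old = self.members[i]
        low = {k: v for k, v in old.items() if k < at}
        high = {k: v for k, v in old.items() if k >= at}
        self.members[i:i + 1] = [low, high]
        bisect.insort(self.split_points, at)


fed = Federation()
for customer_id in [5, 25, 35, 105, 235, 365, 555, 2545, 3565]:
    fed.insert(customer_id, {"CustomerID": customer_id})

fed.split(100)  # members: [min, 100) and [100, max)
fed.split(488)  # members: [min, 100), [100, 488) and [488, max)
```

Because routing depends only on the sorted split points, an application that asks for the member of a given key keeps working across a split; only the member that was split changes.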
DEMO
MAP-REDUCE SCALE-OUT OVER SQL
AZURE FEDERATIONS
•   Sharded GamesInfo table using SQL Azure Federations

•   Use a C# library that implements a Map/Reduce
    processor on top of SQL Azure Federations

•   Mapper and Reducer are specified using SQL
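
The idea of a Map/Reduce processor whose mapper and reducer are both SQL statements can be sketched as below. This is not the C# library from the demo: it uses Python with in-memory SQLite databases standing in for federation members, and the GamesInfo columns, the `mapped` table name and the function names are assumptions made for illustration.

```python
import sqlite3


def make_shard(rows):
    """Create an in-memory SQLite database standing in for one federation member."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema; the real GamesInfo table is not shown in the slides.
    conn.execute("CREATE TABLE GamesInfo (player_id INTEGER, game TEXT, score INTEGER)")
    conn.executemany("INSERT INTO GamesInfo VALUES (?, ?, ?)", rows)
    return conn


def map_reduce(shards, mapper_sql, reducer_sql, columns):
    """Run mapper_sql on every shard, collect the rows, then run reducer_sql.

    reducer_sql must read from a table named `mapped` with the given columns.
    """
    collector = sqlite3.connect(":memory:")
    collector.execute(f"CREATE TABLE mapped ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    for shard in shards:
        rows = shard.execute(mapper_sql).fetchall()
        collector.executemany(f"INSERT INTO mapped VALUES ({placeholders})", rows)
    return collector.execute(reducer_sql).fetchall()


shards = [
    make_shard([(1, "chess", 10), (2, "chess", 30)]),
    make_shard([(1001, "chess", 20), (1002, "go", 5)]),
]
# Mapper: pre-aggregate inside each shard so little data leaves it.
mapper = "SELECT game, COUNT(*) AS plays, SUM(score) AS total FROM GamesInfo GROUP BY game"
# Reducer: combine the per-shard partial aggregates.
reducer = "SELECT game, SUM(plays), SUM(total) FROM mapped GROUP BY game ORDER BY game"
result = map_reduce(shards, mapper, reducer, ["game", "plays", "total"])
# result == [("chess", 3, 60), ("go", 1, 5)]
```

Keeping both phases declarative means the per-shard work can be optimized by each shard's own query engine, while the application code only moves rows between the two phases.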
WHAT CAN NOSQL LEARN FROM SQL?
• Flexible data is good, but:
  • Provide optional schema in data platform to help with constraints and optimizations
• Procedural Scale-Out processing is good, but:
  • Develop a declarative language suited for and across the data models (e.g., coSQL)
  • Standardize suitable abstractions and languages
• Eventual Consistency is good, but:
  • Provide users the choice
• Simple Queries are good, but:
  • Provide me with secondary indexes
  • It will be more efficient to join two collections of JSON documents in the
    query engine than in the application layer


Many NoSQL Database Systems are starting to incorporate these learnings!
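
To make the point about secondary indexes and joins concrete, the sketch below shows roughly what an application has to build itself when the store offers neither: an index over one collection of JSON documents and a hash join that probes it. The collections, field names and helper functions are hypothetical.

```python
import json
from collections import defaultdict


def build_index(docs, field):
    """Secondary index: field value -> list of documents that have that value."""
    index = defaultdict(list)
    for doc in docs:
        if field in doc:
            index[doc[field]].append(doc)
    return index


def join(left_docs, right_docs, left_field, right_field):
    """Inner hash join: index the right collection once, then probe per left doc."""
    index = build_index(right_docs, right_field)
    return [
        (left, right)
        for left in left_docs
        if left_field in left
        for right in index.get(left[left_field], [])
    ]


users = [json.loads(s) for s in (
    '{"id": 1, "name": "Ann"}',
    '{"id": 2, "name": "Bob"}',
)]
orders = [json.loads(s) for s in (
    '{"order": 10, "user_id": 1}',
    '{"order": 11, "user_id": 1}',
    '{"order": 12, "user_id": 3}',
)]
pairs = [(u["name"], o["order"]) for u, o in join(users, orders, "id", "user_id")]
# pairs == [("Ann", 10), ("Ann", 11)]
```

Doing this in the application means shipping both collections over the network and rebuilding the index on every query, work a query engine could avoid by keeping the index next to the data.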
SCALE-OUT DATA PLATFORM ARCHITECTURE

[Diagram: a SQL or NoSQL Store made up of several Primary Shards, each with Readable Replicas. Arrows labeled Copy and Query connect the store to the OLAP workloads.]

OLTP Workloads
- Highly Available
- High Scale
- High Flexibility
- Mostly touching 1 to a low number of shards

Traditional OLAP Workloads
- Known schema
- Data warehouse, “Star joins”

Dynamic OLAP Workloads
- 3Vs (Volume, Velocity, Variety)
- Exploratory
- Scale-out queries, often using eventually consistent scale-out
  frameworks like Hadoop
BIG DATA REQUIRES AN END-TO-END APPROACH




CALL TO ACTION
• Familiarize yourself with the NoSQL genes in the Microsoft Online Platform
  • Free 3-Month Trial for Windows and SQL Azure: http://www.windowsazure.com

• Engage with us throughout Strata
 Presentation                                                      Speaker                       Date and Time
 Do We Have the Tools We Need to Navigate the New World of Data?   Dave Campbell                 2/29 9:00am PST
 Onsite Interview *                                                Tim O’Reilly, Dave Campbell   2/29 10:15am PST
 Unleash Insights on All Data With Microsoft Big Data              Alexander Stojanovic          2/29 11:30am PST
 Office Hours (Q&A session)                                        Dave Campbell                 2/29 1:30pm PST
 Hadoop + Javascript: What We Learned                              Asad Khan                     2/29 2:20pm PST
 Democratizing BI at Microsoft: 40,000 Users and Counting          Kirkland Barrett              3/1 10:40am PST
 Data Marketplaces For Your Extended Enterprise                    Piyush Lumba                  3/1 2:20pm PST

• Download slides with additional information and related resources:
  http://www.slideshare.net/MichaelRys/presentations
APPENDIX




RELATED RESOURCES
• Scale-Out with SQL Databases
  • http://gigaom.com/cloud/facebook-shares-some-secrets-on-making-mysql-scale/
  • Windows Gaming Experience Case Study:
    http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=4000008310
  • Scalable SQL: http://cacm.acm.org/magazines/2011/6/108663-scalable-sql
  • http://www.slideshare.net/MichaelRys/scaling-with-sql-server-and-sql-azure-federations

• NoSQL and the Windows Azure Platform
  • Whitepaper:
    http://download.microsoft.com/download/9/E/9/9E9F240D-0EB6-472E-B4DE-6D9FCBB505DD/Windows%20Azure%20No%20SQL%20White%20Paper.pdf
  • SQL Federation blog:
    http://blogs.msdn.com/b/cbiyikoglu/archive/2011/03/03/nosql-genes-in-sql-azure-federations.aspx

• Contact me
  • @SQLServerMike
  • http://sqlblog.com/blogs/michael_rys/default.aspx

 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 

SQL and NoSQL Are Two Sides of the Same Coin (Strata 2012)

  • 1. SQL AND NOSQL ARE TWO SIDES OF THE SAME COIN Michael Rys, Microsoft Corp. @SQLServerMike © 2012 Microsoft Strata 2012 Conference, March 2012
  • 2. AGENDA • Scaling out your business is important! • NoSQL Paradigms and NoSQL Platforms • SQL learns from NoSQL (with a demo of SQL Azure Federations) • NoSQL learns from SQL • Scalable Data Processing Platform of the Future
  • 3. THE WEB 2.0 BUSINESS ARCHITECTURE [Diagram: Online Business Application at the center] • Attract Individual Consumers: - Provide interesting service - Provide mobility - Provide social • Monetize Individual: - Upsell service (VIP, Speed, Extra Capabilities) • Monetize the Social: - Improve individual experience - Re-sell Aggregate Data (e.g., Advertisers)
  • 4. SOCIAL NETWORKING: THE BUSINESS PROBLEM • 100s of millions of users • 10s of millions of users concurrently • Terabytes to petabytes of data • Structured and unstructured • Required (eventual) data consistency across users • E.g. show your updated state in your friends’ profile pages
  • 5. SOLUTION • Shard/Partition user data across hundreds to thousands of SQL Databases • Propagate data changes from one DB to other DBs using reliable, async Message Service • Managing routes from each DB to every other DB would be too complex • Global Transactions would hinder scale and availability • Provide a caching layer for performance • And also used for o Clean-up state (e.g. on account close) o Deploy business logic (stored procedures)
  • 6. EXAMPLE ARCHITECTURE [Diagram: Web Tier and Data Tier. "I change my status" (userId=1024); "My DB gets updated". Data Tier shards by user ID range: 1-1000, 1001-2000, 2001-3000, 3001-4000, 4001-5000, 5001-6000. Async Messages flow through the Service Dispatcher; transactions are labeled TX1 to TX5.]
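The pattern on slides 5 and 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the architecture's actual code: the fixed shard size of 1000 user IDs is read off the diagram's ranges, and `shard_for`, `change_status`, `dispatch`, and the in-memory `queue.Queue` standing in for the reliable async message service are all hypothetical names.

```python
import queue

SHARD_SIZE = 1000  # assumption: each database holds 1000 consecutive user ids


def shard_for(user_id: int) -> str:
    """Map a user id to its shard label, e.g. 1024 -> '1001-2000'."""
    low = ((user_id - 1) // SHARD_SIZE) * SHARD_SIZE + 1
    return f"{low}-{low + SHARD_SIZE - 1}"


def change_status(user_id, status, friend_ids, shards, outbox):
    """Update the user's own shard, then queue one message per friend shard."""
    home = shard_for(user_id)
    shards.setdefault(home, {})[user_id] = status
    for target in sorted({shard_for(f) for f in friend_ids} - {home}):
        outbox.put((target, user_id, status))


def dispatch(shards, outbox):
    """Deliver queued messages; each delivery stands in for a local transaction."""
    while not outbox.empty():
        target, user_id, status = outbox.get()
        shards.setdefault(target, {})[user_id] = status


shards = {}
outbox = queue.Queue()
change_status(1024, "at Strata", [17, 2500, 2999, 5400], shards, outbox)
pending = outbox.qsize()  # messages waiting for the friends' shards
dispatch(shards, outbox)
```

No global transaction spans the shards here: the home shard commits first, and the friends' shards catch up when the dispatcher delivers the messages, which is the eventual consistency the slides describe.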
  • 7. MANY LARGE SCALE CUSTOMERS USING SIMILAR PATTERNS • Patterns • Sharding and reliable messaging • Sharding and fan-out query layer • Caching layer • Customer Examples • Social Networking: Facebook, MySpace, etc. • Online electronic stores (cannot give names) • Travel reservation systems (e.g. Choice International) • MSN Casual Gaming • etc.
  • 8. LESSONS LEARNED FROM THESE SCENARIOS • Require high availability • Be able to scale out: • Functional and Data Partitioning Architecture • Provide scale-out processing: o Function shipping o Fanout and Map/Reduce processing • Be able to deal with failures: o Quorum o Retries o Eventual Consistency (similar to Read-consistent Snapshot Isolation) • Be able to quickly grow and change: • Elastic scale • Flexible, open schema • Multi-version schema support Move better support for these patterns into the Data Platform!
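As a rough illustration of the "Quorum" and "Retries" items above, the following Python sketch accepts a write once a majority of replicas acknowledges it and retries with exponential backoff otherwise. The function name `quorum_write`, the modeling of replicas as plain callables, and the use of `ConnectionError` to signal a failed replica are assumptions made for this example; they do not come from the slides.

```python
import time


def quorum_write(replicas, key, value, retries=3, base_delay=0.01):
    """Write to all replicas; succeed once a majority acknowledges, else retry."""
    needed = len(replicas) // 2 + 1
    for attempt in range(retries):
        acks = 0
        for replica in replicas:
            try:
                replica(key, value)
                acks += 1
            except ConnectionError:
                pass  # a failed replica does not count toward the quorum
        if acks >= needed:
            return acks
        time.sleep(base_delay * 2 ** attempt)  # exponential backoff before retrying
    raise RuntimeError(f"no quorum for {key!r} after {retries} attempts")


store_a, store_b = {}, {}


def down(key, value):
    """Hypothetical replica that is currently unreachable."""
    raise ConnectionError("replica unavailable")


acks = quorum_write([store_a.__setitem__, store_b.__setitem__, down],
                    "user:1024", "at Strata")
```

With three replicas and one of them down, the write still succeeds because two acknowledgements form a majority.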
  • 9. WHAT IS NOSQL ABOUT? • NoSQL = operational and developer agility at low CapEx and OpEx! • Low Cost • Free Open Source Stores, Community Support • Scale CapEx cost below customer growth rate • Web friendly developer model and tool chain, ease of use • Processing Paradigms • High Availability (scalable Replication, Fast Failover, DR/GeoDR, tunable latency) • Scale-out (Sharding, Map-Reduce, Elasticity) • Performance (tuned for specific workloads, Caching, co-located compute with partitioned state) • Tunable/Eventual Consistency • Data Model Paradigms • Data first: Flexible Schema • Low-impedance mismatch between programming and data model: o Key-Documents and Objects (BLOBS, JSON, XML, POJO) o Key-Wide Sparse Column Sets o Graphs (e.g., RDF) • Range from devices, over OLTP Web 2.0 applications to BigData Analytics
  • 10. DATA MODELS (Data Model: Example Stores; apologies to the ones I did not list) • Simple Key-Value Pairs: Memcache, Redis, Dynamo, Voldemort, LevelDB, Azure Caching • Wide Sparse Column Sets: HyperTable, Big Table, Cassandra, HBASE, Hyperbase, Amazon DynamoDB, Windows Azure Tables, SQL Server/Azure Sparse columns • BLOBs: Amazon S3, Oracle Berkeley NoSQL, Windows Azure Blob Store, SQL Server RBS/FileTable • JSON Documents: MongoDB, CouchBase, Riak, RavenDB • Graph: Neo4J, GraphDB, HypergraphDB, Stig, Intellidimension • Objects and XML Documents: Versant, Oracle Berkeley NoSQL, MarkLogic, existDB, EMC HiveDB, SQL Server/Azure, Oracle, IBM DB2 • Extended Relational: Oracle, EMC SQLFire, IBM DB2, MySQL, Postgres, SQL Server/Azure
  • 11. WHAT CAN SQL LEARN FROM NOSQL? • Low CapEx, Low OpEx • Built-in tunable High-Availability • Data scale-out (Sharding) • Processing scale-out (Map-Reduce, Fan-Out, tunable consistency) • Flexible Data Models • JSON (& XML) support • Sparse columns/Column sets • Integrate with BigData Analytics (e.g., Hadoop) Many Relational Database Systems are incorporating these learnings!
  • 12. EXAMPLE: SQL AZURE FEDERATIONS • Provides Data Partitioning/Sharding at the Data Platform • Enables applications to build elastic scale-out applications • Provides non-blocking SPLIT/DROP for shards (MERGE to come later) • Auto-connect to right shard based on sharding key value • Provides SPLIT-resilient query mode
  • 13. SQL AZURE FEDERATION CONCEPTS • Federation: Represents the data being sharded • Federation Root: Database that logically houses federations, contains federation meta data • Federation Key: Value that determines the routing of a piece of data (defines a Federation Distribution) • Atomic Unit: All rows with the same federation key value: always together! • Federation Member (aka Shard): A physical container for a set of federated tables for a specific key range and reference tables • Federated Table: Table that contains only atomic units for the member’s key range • Reference Table: Non-sharded table [Diagram: Sharded Application connects through the Connection Gateway to an Azure DB with Federation Root (Federation Directories, Federation Users, Federation Distributions, …). Federation “Orders_Fed” (Federation Key: CustomerID) has three members: Member PK [min, 100) with AUs PK=5, PK=25, PK=35; Member PK [100, 488) with AUs PK=105, PK=235, PK=365; Member PK [488, max) with AUs PK=555, PK=2545, PK=3565.]
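The routing idea behind the Federation Key can be sketched as a lookup over half-open key ranges, using the Orders_Fed boundaries 100 and 488 from the slide. This is a minimal Python illustration, not the SQL Azure implementation: the `Federation` class and its `member_for` and `split_at` methods are hypothetical, and a real SPLIT also moves data, which this metadata-only sketch ignores.

```python
import bisect


class Federation:
    """Members cover half-open key ranges [low, high); boundaries are split points."""

    def __init__(self, split_points):
        self.split_points = sorted(split_points)

    def member_for(self, key):
        """Return the index of the member whose range contains the federation key."""
        return bisect.bisect_right(self.split_points, key)

    def split_at(self, key):
        """Add a boundary, turning one member into two (metadata only here)."""
        if key not in self.split_points:
            bisect.insort(self.split_points, key)


orders_fed = Federation([100, 488])
first = orders_fed.member_for(35)     # falls in [min, 100)
second = orders_fed.member_for(100)   # lower bound is inclusive: [100, 488)
third = orders_fed.member_for(3565)   # falls in [488, max)
```

Because all rows of an atomic unit share one federation key value, they always resolve to the same member, including after a split adds a new boundary.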
  • 14. DEMO MAP-REDUCE SCALE-OUT OVER SQL AZURE FEDERATIONS • Sharded GamesInfo table using SQL Azure Federations • Use a C# library that implements a Map/Reduce processor on top of SQL Azure Federations • Mapper and Reducer are specified using SQL
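The demo's C# library is not reproduced in this transcript. As a stand-in, the idea of a mapper and a reducer that are both written in SQL can be sketched in Python, with in-memory SQLite databases playing the role of federation members. The GamesInfo columns, the `mapped` staging table, and the helper names `make_shard` and `map_reduce` are assumptions for illustration only.

```python
import sqlite3


def make_shard(rows):
    """Create one in-memory shard holding a slice of a hypothetical GamesInfo table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE GamesInfo (player_id INTEGER, game TEXT, score INTEGER)")
    db.executemany("INSERT INTO GamesInfo VALUES (?, ?, ?)", rows)
    return db


def map_reduce(shards, mapper_sql, reducer_sql):
    """Run mapper_sql on every shard, then reducer_sql over the combined rows."""
    combined = sqlite3.connect(":memory:")
    combined.execute("CREATE TABLE mapped (game TEXT, total INTEGER, plays INTEGER)")
    for shard in shards:
        combined.executemany("INSERT INTO mapped VALUES (?, ?, ?)",
                             shard.execute(mapper_sql))
    return combined.execute(reducer_sql).fetchall()


# Mapper: partial aggregates per shard. Reducer: merge the partial aggregates.
MAPPER = "SELECT game, SUM(score), COUNT(*) FROM GamesInfo GROUP BY game"
REDUCER = "SELECT game, SUM(total), SUM(plays) FROM mapped GROUP BY game ORDER BY game"

shards = [
    make_shard([(5, "chess", 10), (25, "hearts", 7)]),
    make_shard([(105, "chess", 4), (235, "chess", 6)]),
]
totals = map_reduce(shards, MAPPER, REDUCER)
```

Each shard only ships its partial sums and counts, so the reducer works on a handful of rows per shard rather than on the raw table.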
  • 15. WHAT CAN NOSQL LEARN FROM SQL? • Flexible data is good, but: • Provide optional schema in data platform to help with constraints and optimizations • Procedural Scale-Out processing is good, but: • Develop a declarative language suited for and across the data models (e.g., coSQL) • Standardize suitable abstractions and languages • Eventual Consistency is good, but: • Provide users the choice • Simple Queries are good, but: • Provide me with secondary indexes • it will be more efficient to join between two collections of JSON documents in the query engine than in the Application layer Many NoSQL Database Systems are starting to incorporate these learnings!
  • 16. THE WEB 2.0 BUSINESS ARCHITECTURE [Diagram: Online Business Application at the center] • Attract Individual Consumers: - Provide interesting service - Provide mobility - Provide social • Monetize Individual: - Upsell service (VIP, Speed, Extra Capabilities) • Monetize the Social: - Improve individual experience - Re-sell Aggregate Data (e.g., Advertisers)
  • 17. SCALE-OUT DATA PLATFORM ARCHITECTURE [Diagram: a SQL or NoSQL Store made of three Primary Shards, each with two Readable Replicas; arrows labeled Copy and Query.] • OLTP Workloads: Highly Available, High Scale, High Flexibility, mostly touching 1 to low number of shards • Traditional OLAP Workloads: known schema, Data warehouse, “Star joins” • Dynamic OLAP Workloads: 3Vs (Volume, Velocity, Variety), Exploratory, Scale-out queries, often using eventual consistent scale-out frameworks like Hadoop
  • 18. BIG DATA REQUIRES AN END-TO-END APPROACH
  • 19. CALL TO ACTION • Familiarize yourself with the NoSQL genes in the Microsoft Online Platform • Free 3-Month Trial for Windows and SQL Azure: http://www.windowsazure.com • Engage with us throughout Strata (Presentation, Speaker, Date and Time): • Do We Have the Tools We Need to Navigate the New World of Data?, Dave Campbell, 2/29 9:00am PST • Onsite Interview *, Tim O’Reilly, Dave Campbell, 2/29 10:15am PST • Unleash Insights on All Data With Microsoft Big Data, Alexander Stojanovic, 2/29 11:30am PST • Office Hours (Q&A session), Dave Campbell, 2/29 1:30pm PST • Hadoop + Javascript: What We Learned, Asad Khan, 2/29 2:20pm PST • Democratizing BI at Microsoft: 40,000 Users and Counting, Kirkland Barrett, 3/1 10:40am PST • Data Marketplaces For Your Extended Enterprise, Piyush Lumba, 3/1 2:20pm PST • Download slides with additional information and related resources: http://www.slideshare.net/MichaelRys/presentations
  • 20. APPENDIX
  • 21. RELATED RESOURCES • Scale-Out with SQL Databases • http://gigaom.com/cloud/facebook-shares-some-secrets-on-making-mysql-scale/ • Windows Gaming Experience Case Study: http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=4000008310 • Scalable SQL: http://cacm.acm.org/magazines/2011/6/108663-scalable-sql • http://www.slideshare.net/MichaelRys/scaling-with-sql-server-and-sql-azure-federations • NoSQL and the Windows Azure Platform • Whitepaper: http://download.microsoft.com/download/9/E/9/9E9F240D-0EB6-472E-B4DE-6D9FCBB505DD/Windows%20Azure%20No%20SQL%20White%20Paper.pdf • SQL Federation blog: http://blogs.msdn.com/b/cbiyikoglu/archive/2011/03/03/nosql-genes-in-sql-azure-federations.aspx • Contact me • @SQLServerMike • http://sqlblog.com/blogs/michael_rys/default.aspx

Editor's Notes

  1. Example MySpace architecture: The Service Dispatcher is the coordination point between all SQL Servers. It centralizes route management and avoids routes explosion. It is load-balanced across 30 SQL Servers; messages are sent randomly to these. It enables multicast/broadcast functionality and supports destination lists and wildcards, e.g. [DB1, DB3, DB4], DB%. 18,000 ~2k msgs/sec per dispatcher SQL Server. MyDB sends a message with my status change and a target list specifying the DBs that store my friends' data. The Service Dispatcher forwards the message to these DBs. Each DB processes the message, updating my status in a partitioned table.
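How a dispatcher might expand such a target list can be sketched as follows. The note does not specify the matching rules, so this Python example assumes that `%` behaves like the SQL LIKE wildcard (any run of characters); `resolve_targets` and the database names are hypothetical.

```python
import re


def resolve_targets(destinations, known_dbs):
    """Expand a destination list such as ['DB1', 'DB3'] or ['DB%'] into database names."""
    targets = []
    for pattern in destinations:
        # Assumption: % matches any run of characters, as in SQL LIKE.
        regex = ".*".join(re.escape(part) for part in pattern.split("%"))
        for db in known_dbs:
            if re.fullmatch(regex, db) and db not in targets:
                targets.append(db)
    return targets


known = ["DB1", "DB2", "DB3", "DB4", "Archive"]
explicit = resolve_targets(["DB1", "DB3", "DB4"], known)
broadcast = resolve_targets(["DB%"], known)
```

Keeping this expansion in the dispatcher means each database only needs a route to the dispatcher, not to every other database.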
  2. Example MSN Casual Gaming: ~2 Million users at launch; ~86 Million services requests/day; 135 Windows Azure Data Services Hosting VMs; ca. 18K connections in Connection Pools, this could grow with traffic; ca. 1200 SQL Azure requests/second spread across all partitions during peak load; ~90% reads vs 10% writes (this varies per storage type); ~200 bytes of storage per user; ~20% of database storage is currently used, but expect this to grow; sharded over 400 SQL Azure Databases.
  3. Note: Big-sized companies invest resources in building these platforms instead of using existing relational platforms!
  4. No DB or OS Admin telling me what to do!
  5. Performance and Scale: Map/Reduce Patterns; Eventual consistency (trade-off due to CAP); Sharding; Caching. Automate management Lifecycle: Elastic Scale on demand (no need to pay for resources until needed); Automatic Fail-over; Scalable Schema version rollout; Perf troubleshooting; Auto alerting; Auto loadbalancing; Auto resourcing (e.g., auto splits based on policies); Declarative policy-based management.
  6. Code First and revise quickly; Working software over comprehensive documentation; Responding to change over following a plan; Application-model first (before database), dictates the data model and queries; Flexible data models; No a priori modeling: Data first, schema later/Open Schema; Key/Value stores; Reduced impedance mismatch: JSON, XML, YAML; You don’t know exactly what you are looking for; Map/Reduce for adhoc analysis; Provide Search across all your data instead of just query; Lower Pain of adoption and maintenance; From code to deployment & “monetization” of data, services, apps and tenants; Rich Services out of the Box; Data and services mashup; Easy troubleshooting of deployed apps; No DB or OS Admin telling me what to do.
  7. Low CapEx, Low OpEx: SQL Azure and other Platform as a Service offerings; Built-in High-Availability (tunable): SQL Azure has quorum based built-in replicas; Data scale-out (Sharding): SQL Azure Federations; Processing scale-out (Map-Reduce, Fan-Out, tunable consistency); Flexible Data Models; JSON (& XML) support; Sparse columns/Column sets; Integrate with BigData Analytics (e.g., Hadoop).
  8. SharePoint – BI, Enterprise Search, Enterprise Content Management, Collaboration; Transform – ETL; Clean – Data Quality, Augmentation; Discover – Search, Meta-data, Classification, Information Catalog; Infer – Recommendation Engines, Machine Learning; Share – Publish, Collaborate; Govern – Lineage & Impact Analysis, Master Data Management; Marketplace – Private, Public, Bing Data, 3rd Party Data Sources, Models, Algorithms, APIs.