Introducing Azure SQL Data Warehouse


The new Microsoft Azure SQL Data Warehouse (SQL DW) is an elastic data warehouse-as-a-service and a Massively Parallel Processing (MPP) solution for "big data" with true enterprise-class features. The SQL DW service is built for data warehouse workloads from a few hundred gigabytes to petabytes of data, with distinctive features such as disaggregated compute and storage, which let customers size the service to match their needs. In this presentation, we take an in-depth look at implementing SQL DW, elastic scale (grow, shrink, and pause), and hybrid data clouds with Hadoop integration via PolyBase, allowing a true SQL experience across structured and unstructured data.


  1. James Serra, Data Platform Solution Architect, Microsoft
  2. Microsoft & Data Warehouse (timeline; years shown: 2008, 2010, 2011, 2013, 2014, 2015) • Microsoft acquired DATAllegro: the company was viewed as the most efficient way to bring MPP to the SQL Server world • Parallel Data Warehouse v1: the DATAllegro product on Windows and SQL Server; the first DW appliance from Microsoft, in partnership with Dell and HP • Fast Track Data Warehouse: launch of DW reference architectures based on SMP DW best practices, offered with leading hardware partners • Parallel Data Warehouse v2: a re-architected product delivering new form factors and greatly improved price/performance • Analytics Platform System (APS): introduction of a Hadoop region within the appliance, with new naming to reflect broader Big Data capabilities • SQL DW Service: introduction of the Azure SQL DW service, based on the MPP capabilities of APS
  3. Customer challenges in managing data: increased data types and volumes; varied data sources; added complexity and cost
  4. [Diagram] Data sources: OLTP, ERP, CRM, LOB, devices, web, sensors, social; non-relational data • Data management and processing: extract, transform, load; data quality; master data management; single query model; data enrichment and federated query • BI and analytics: self-service, corporate, collaboration, mobile, machine learning • Delivery options: SQL Server as box software, appliances, cloud
  5. Office 365, Azure
  6. Parallelism • MPP (Massively Parallel Processing): uses many separate CPUs running in parallel to execute a single program; shared nothing, each CPU has its own memory and disk (scale-out); segments communicate over a high-speed network between nodes • SMP (Symmetric Multiprocessing): multiple CPUs complete individual processes simultaneously; all CPUs share the same memory, disks, and network controllers (scale-up); all SQL Server implementations up until now have been SMP; the solution is mostly housed on a shared SAN
  7. SQL DW Logical Architecture (overview) [Diagram: a “Control” node (SQL, DMS) and four “Compute” nodes, each with SQL, DMS, and balanced storage] • Compute node, the “worker bee” of SQL DW: runs Azure SQL Server DB; contains a “slice” of each database; CPU is saturated by storage • Control node, the “brains” of SQL DW: also runs Azure SQL Server DB; holds a “shell” copy of each database (metadata, statistics, etc.); the “public face” of the appliance • Data Movement Services (DMS): part of the “secret sauce” of SQL DW; moves data around as needed; enables parallel operations among the compute nodes (queries, loads, etc.)
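The slide says each compute node holds a "slice" of each database but not how rows are assigned to slices. A common approach, assumed here, is to hash a distribution column so that equal key values always land on the same node. The sketch below is illustrative Python, not SQL DW's actual placement logic; the four-node count (taken from the diagram), the `customer` column, and the helper names are hypothetical.

```python
import zlib
from collections import defaultdict

# Hypothetical: four compute nodes, matching the four drawn on the slide.
NUM_COMPUTE_NODES = 4


def node_for(key: str, num_nodes: int = NUM_COMPUTE_NODES) -> int:
    """Map a distribution-column value to a compute node with a stable hash."""
    # zlib.crc32 gives the same result on every run, unlike Python's salted hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(rows, key_column, num_nodes=NUM_COMPUTE_NODES):
    """Split rows into per-node slices, keyed by node number."""
    slices = defaultdict(list)
    for row in rows:
        slices[node_for(row[key_column], num_nodes)].append(row)
    return dict(slices)


if __name__ == "__main__":
    # Hypothetical fact rows distributed on a `customer` column.
    rows = [{"customer": f"C{i}", "amount": i * 10} for i in range(12)]
    for node, part in sorted(distribute(rows, "customer").items()):
        print(node, [r["customer"] for r in part])
```

Because the hash is deterministic, rows that share a key are co-located, which is what lets a node join or aggregate on that key without asking DMS to move data.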
  8. SQL DW Logical Architecture (overview) [Diagram: a “Control” node and four “Compute” nodes, each running SQL and DMS, with balanced storage on each compute node] 1) The user connects to the appliance (control node) and submits a query. 2) The control node query processor determines the best *parallel* query plan. 3) DMS distributes sub-queries to each compute node. 4) Each compute node executes the query on its subset of the data. 5) Each compute node returns a subset of the response to the control node. 6) If necessary, the control node does any final aggregation/computation. 7) The control node returns the results to the user. Queries run in parallel, each on a subset of the data, using separate pipes, which effectively makes the pipe larger.
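Steps 2 through 7 follow a scatter-gather pattern. The minimal sketch below models it in Python, assuming each compute node's slice is a list of rows and the query is a filtered SUM/AVG over a hypothetical `amount` column; threads stand in for compute nodes, and the function names are hypothetical. It shows why step 6 exists: an average has to be rebuilt from the combined SUM and COUNT rather than by averaging per-node averages.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node_subquery(slice_rows, min_amount):
    """Steps 4-5: a compute node filters and aggregates only its own slice."""
    matching = [r["amount"] for r in slice_rows if r["amount"] >= min_amount]
    return sum(matching), len(matching)  # partial (SUM, COUNT) sent back


def control_node_query(slices, min_amount):
    """Steps 2-7: fan the sub-query out to every slice, then combine the partials."""
    with ThreadPoolExecutor(max_workers=max(1, len(slices))) as pool:
        partials = list(pool.map(lambda s: compute_node_subquery(s, min_amount), slices))
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    # Step 6: final aggregation. AVG is rebuilt from the combined SUM and COUNT,
    # not by averaging the per-node averages.
    avg = total / count if count else None
    return {"sum": total, "count": count, "avg": avg}


if __name__ == "__main__":
    slices = [
        [{"amount": 10}, {"amount": 50}],
        [{"amount": 30}],
        [{"amount": 5}, {"amount": 40}, {"amount": 60}],
    ]
    print(control_node_query(slices, min_amount=20))
```

With a threshold of 20, the matching amounts are 50, 30, 40 and 60, so the control node returns a sum of 180, a count of 4 and an average of 45.0.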
  9. Elastic scale & performance: real-time elasticity; resize in under 1 minute; on-demand compute; expand or reduce as needed
  10. Elastic scale & performance: storage can be as big or small as required; customers can execute niche workloads without re-scanning data
  11. Scale DWUs (Data Warehouse Units)
  12. End-to-end platform built for the cloud: power of integration [Diagram: Azure SQL Data Warehouse connected to App Service, intelligent apps, Hadoop, Azure Machine Learning, Power BI, Azure SQL Database, SQL]
  13. End-to-end platform built for the cloud: bring compute to data, keep data in its place [Diagram: Azure Data Factory, Migration Accelerator, ExpressRoute]
  14. Market leading price/performance: bring your data warehouse to the cloud; automated; minimize cost; policy-based; secure data
  15. Market leading price/performance: any data, any size, anywhere. Query unstructured data via PolyBase/T-SQL [Diagram: PolyBase links the scale-out compute of a SQL DW instance to Hadoop VMs / Azure Storage]
  16. Market leading price/performance: hassle-free management with built-in ease of use (infrastructure, management, Azure support)
  17. Save costs with dynamic pause and resume • When paused, pay only for storage • Use it only when you need it, with no reloading or restoring of data • When paused, cloud-scale storage is minimal cost • Policy-based (e.g. nights/weekends) • Automate via PowerShell/REST API • Data remains in place
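The slide mentions policy-based pausing (e.g. nights and weekends) automated through PowerShell or the REST API. The decision part of such a policy can be sketched as below; the Monday to Friday 07:00 to 19:00 window and the function names are hypothetical, and the actual pause/resume call is deliberately left out because it would go through the PowerShell or REST interfaces the slide names.

```python
from datetime import datetime, timedelta

# Hypothetical policy: run Monday-Friday 07:00-19:00, pause nights and weekends.
BUSINESS_DAYS = range(0, 5)  # datetime.weekday(): Monday=0 ... Friday=4
START_HOUR, END_HOUR = 7, 19


def should_be_running(now: datetime) -> bool:
    """True if the policy says compute should be resumed at this moment."""
    return now.weekday() in BUSINESS_DAYS and START_HOUR <= now.hour < END_HOUR


def weekly_compute_hours(week_start: datetime) -> int:
    """Count the hours in one 168-hour week during which compute would be billed."""
    return sum(
        should_be_running(week_start + timedelta(hours=h)) for h in range(7 * 24)
    )


if __name__ == "__main__":
    print(should_be_running(datetime(2016, 2, 3, 9)))   # a Wednesday morning
    print(weekly_compute_hours(datetime(2016, 2, 1)))    # compute hours in one week
```

Over any full week this window keeps compute running for 60 of 168 hours (5 days of 12 hours), so compute would be billed for roughly 36% of the week while storage is billed throughout.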
  18. Geo replication: defend against regional disasters • Geo-storage replication • Azure Storage page blobs, 3 copies locally • High durability/availability • Another 3 copies in a different region
  19. Automatic backup and geo-restore: recover from data deletion, alteration, or disaster • Auto backups every 4 hours • On-demand backups in Azure Storage • REST API, PowerShell, or Azure Portal • Scheduled exports • Near-online backup/restore • Backup retention policy: auto backups kept up to 35 days; on-demand backups retained indefinitely [Diagram: SQL DW backups in Azure Storage, geo-replicated; restore from backup]
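With automatic backups every 4 hours kept for up to 35 days, the retention window holds at most 6 × 35 = 210 automatic restore points. The sketch below assumes backups land exactly on a 4-hour grid (an idealization; the slide gives no schedule details), lists the points still retained, and picks the newest one taken before an accidental change, which is the "recover from data deletion or alteration" case on the slide. The function names are hypothetical.

```python
from datetime import datetime, timedelta

BACKUP_INTERVAL = timedelta(hours=4)  # "Auto backups, every 4 hours"
RETENTION = timedelta(days=35)        # "Auto backups, up to 35 days"


def retained_auto_backups(first_backup: datetime, now: datetime):
    """Backup times on an idealized 4-hour grid that are still inside the retention window."""
    points = []
    t = first_backup
    while t <= now:
        if now - t < RETENTION:
            points.append(t)
        t += BACKUP_INTERVAL
    return points


def latest_restore_point(points, before: datetime):
    """Newest backup taken strictly before a bad change, e.g. an accidental DELETE."""
    candidates = [p for p in points if p < before]
    return max(candidates) if candidates else None


if __name__ == "__main__":
    pts = retained_auto_backups(datetime(2016, 1, 1), datetime(2016, 3, 1))
    print(len(pts))
    print(latest_restore_point(pts, datetime(2016, 2, 15, 10, 30)))
```

Starting the schedule at 2016-01-01 00:00 and checking on 2016-03-01 00:00 leaves 210 retained points, and for a bad change at 2016-02-15 10:30 the latest usable restore point is the 08:00 backup that day. On-demand backups, retained indefinitely per the slide, are outside this model.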
  20. Hybrid scenarios which work well: both Analytics Platform System and Azure SQL Data Warehouse have a Massively Parallel Processing (MPP) engine. Here are a few scenarios where they can be used together. • Dev/test: test new ideas in SQL DW before rolling out to production in APS • Archive: archive cold data to blob storage for any workload execution • Governance: store data in APS that company policy prohibits from being in the cloud
  21. Microsoft Data Platform: comprehensive, connected, choice [Diagram spanning relational and beyond-relational, on-premises and cloud: SQL Server in an Azure VM, Azure SQL DB, Azure SQL DW, Azure Data Lake Analytics, Azure Data Lake Store, Fast Track for SQL Server, Analytics Platform System, SQL Server 2016 + Superdome X, Hadoop, federated query, Power BI, Azure Machine Learning, Azure Data Factory]
  22. SQL DW: building on the SQL DB foundation • Elastic, petabyte scale • DW optimized • 99.99% uptime SLA, geo-restore • Azure compliance (ISO, HIPAA, EU, etc.) • True SQL Server experience; existing tools just work [Diagram: SQL DW alongside the SQL DB service tiers]
  23. Measure of power: simply buy the query performance you need, not just hardware • Transparency: quantified by workload objectives, i.e. how fast rows are scanned, loaded, and copied • On demand: the first DW service to offer compute power on demand, independent of storage • Scan rate 3.36M rows/sec; loading rate 130K rows/sec; table copy rate 350K rows/sec • Timings by DWU setting: 100 DWU = 297 sec; 400 DWU = 74 sec; 800 DWU = 37 sec; 1,600 DWU = 19 sec
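The timings quoted for 100 to 1,600 DWU are close to linear scaling: multiplying DWUs by N divides elapsed time by about N. Under a linear model anchored at 100 DWU = 297 sec, 400, 800 and 1,600 DWU would take about 74.3, 37.1 and 18.6 seconds, against the quoted 74, 37 and 19. The sketch below makes that comparison; the linear model and the function names are assumptions used for illustration, not a published SQL DW formula.

```python
# Timings quoted on the slide (seconds), keyed by DWU setting.
OBSERVED_SECONDS = {100: 297, 400: 74, 800: 37, 1600: 19}


def linear_prediction(dwu: int, base_dwu: int = 100, base_seconds: float = 297.0) -> float:
    """Assumed ideal linear scaling: N times the DWUs takes 1/N of the time."""
    return base_seconds * base_dwu / dwu


def scaling_report():
    """(DWU, observed sec, predicted sec, speedup vs. 100 DWU) for each quoted setting."""
    rows = []
    for dwu, observed in sorted(OBSERVED_SECONDS.items()):
        predicted = linear_prediction(dwu)
        speedup = OBSERVED_SECONDS[100] / observed
        rows.append((dwu, observed, predicted, speedup))
    return rows


if __name__ == "__main__":
    for dwu, observed, predicted, speedup in scaling_report():
        print(f"{dwu:>5} DWU: observed {observed:>3} s, linear {predicted:6.2f} s, speedup {speedup:5.2f}x")
```

Every quoted timing falls within about 3% of the linear prediction, which is the practical meaning of "buy the query performance you need".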
  24. What is Hadoop? • Distributed, scalable system on commodity hardware • Composed of a few parts: HDFS, the distributed file system; MapReduce, the programming model; other tools: Hive, Pig, Sqoop, HCatalog, HBase, Flume, Mahout, YARN, Tez, Spark, Stinger, Oozie, ZooKeeper, Storm • Main players are Hortonworks, Cloudera, MapR • WARNING: Hadoop, while ideal for processing huge volumes of data, is inadequate for analyzing that data in real time (companies do batch analytics instead) [Diagram: Hadoop stack labeled core services, data services, and operational services, with components HDFS, YARN, MapReduce, Hive & HCatalog, Pig, HBase, Sqoop, Flume, NFS, WebHDFS (load & extract), Oozie, Ambari, Falcon, running on a cluster of compute & storage nodes] Hadoop clusters provide scale-out storage and distributed data processing on commodity hardware
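The MapReduce programming model named on the slide splits a job into a map phase that emits key/value pairs, a shuffle that groups pairs by key, and a reduce phase that combines each group. The classic word-count example below runs all three phases in a single Python process purely to show the data flow; on a real cluster the framework runs many mappers and reducers in parallel across nodes. The function names are illustrative, not Hadoop's Java API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str):
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


def word_count(lines):
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big", "data lake"]))
```

For the two input lines shown, the result counts "big" and "data" twice each and "lake" once. Because each mapper sees only its own records and each reducer only its own keys, both phases can scale out across the cluster, which is also why the model suits batch rather than real-time analytics.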
  25. Use cases where PolyBase simplifies using Hadoop data • Bringing islands of Hadoop data together • High-performance queries against Hadoop data (predicate pushdown) • Archiving data warehouse data to Hadoop, a move (Hadoop as cold storage) • Exporting relational data to Hadoop, a copy (Hadoop as backup, analysis, on-prem use) • Importing Hadoop data into the data warehouse, a copy (Hadoop as staging area, sandbox, Data Lake)
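Predicate pushdown means evaluating a query's filter where the Hadoop data lives, so only qualifying rows travel to the warehouse. The sketch below models only that effect, counting how many rows cross the wire with and without pushdown; it is not how PolyBase is implemented, and the function names, the `year` column and the sample data are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]


def query_without_pushdown(
    hadoop_rows: Iterable[Row], predicate: Callable[[Row], bool]
) -> Tuple[List[Row], int]:
    """Import every row into the warehouse first, then filter there."""
    transferred = list(hadoop_rows)
    return [r for r in transferred if predicate(r)], len(transferred)


def query_with_pushdown(
    hadoop_rows: Iterable[Row], predicate: Callable[[Row], bool]
) -> Tuple[List[Row], int]:
    """Evaluate the filter on the Hadoop side; only matching rows are transferred."""
    transferred = [r for r in hadoop_rows if predicate(r)]
    return transferred, len(transferred)


if __name__ == "__main__":
    rows = [{"year": y} for y in range(2000, 2016)]
    recent = lambda r: r["year"] >= 2014
    print(query_without_pushdown(rows, recent)[1], "rows moved without pushdown")
    print(query_with_pushdown(rows, recent)[1], "rows moved with pushdown")
```

Both versions return the same two rows (2014 and 2015), but without pushdown all 16 rows are moved. The saving grows with the selectivity of the filter, which is why pushdown matters most for large Hadoop datasets queried with narrow predicates.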
  26. Azure SQL Data Warehouse loading patterns and strategies: https://blogs.msdn.microsoft.com/sqlcat/2016/02/06/azure-sql-data-warehouse-loading-patterns-and-strategies/
  27. Broad SQL Server partner ecosystem • Leverage Azure ML, HDInsight, Power BI, ADF, and more • Industry’s broadest ecosystem of DW partners, including Tableau, Informatica, Attunity, and SAP • Streamlined deployment with Azure Portal • Deep tool integration with top partners, including single-click configuration, optimized data movement, and logical pushdown [Diagram: Azure SQL DW with Azure ML, Azure Event Hub, Azure HDInsight]
  28. Market-leading price/performance • Best on-demand price/performance: advantages in elasticity and pause reduce customer cost • SQL DW starts small and can grow to PB+ • Pay for performance by scaling compute independently of storage [Chart: performance vs. data size, 100GB, 1TB, 2TB, 1+PB]
  29. How does SQL Data Warehouse differ from Amazon Redshift? [Comparison table, Amazon Redshift vs. SQL DW, on: elasticity, pause/resume, simplicity, hybrid, compatibility]
  30. Summary: Azure SQL DW service. A relational data warehouse-as-a-service, fully managed by Microsoft. The industry’s first elastic cloud data warehouse with enterprise-grade capabilities. Supports your smallest to your largest data storage needs while handling queries up to 100x faster.
  31. Azure getting started • Free Azure account, $200 in credit, https://azure.microsoft.com/en-us/free/ • Startups: BizSpark, $750/month free Azure; BizSpark Plus, $120k/year free Azure, https://www.microsoft.com/bizspark/ • MSDN subscription, $150/month free Azure, https://azure.microsoft.com/en-us/pricing/member-offers/msdn-benefits/ • Microsoft Educator Grant Program: faculty, $250/month free Azure for a year; students, $100/month free Azure for 6 months, https://azure.microsoft.com/en-us/pricing/member-offers/msdn-benefits/ • Microsoft Azure for Research Grant, http://research.microsoft.com/en-us/projects/azure/default.aspx • DreamSpark for students, https://www.dreamspark.com/Student/Default.aspx • DreamSpark for academic institutions, https://www.dreamspark.com/Institution/Subscription.aspx • Various Microsoft funds
  32. Questions? James Serra, jserra@microsoft.com
