Introduction to PolyBase

First introduced with the Analytics Platform System (APS), PolyBase simplifies the management and querying of both relational and non-relational data using T-SQL. It is now available in both Azure SQL Data Warehouse and SQL Server 2016. Its major features include ad hoc queries on Hadoop data and importing data from Hadoop and Azure blob storage into SQL Server for persistent storage. A major part of the presentation is a demo of querying and creating data on HDFS (using Azure Blobs). Come see why PolyBase is the “glue” for creating federated data warehouse solutions, where you can query data where it sits instead of having to move it all to one data platform.
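
The two features named in the description, ad hoc queries and import for persistent storage, might look roughly like this in T-SQL. This is a minimal sketch: the external table `dbo.SensorData_External`, its columns, and the local table `dbo.SensorData_Local` are hypothetical names, and the external table is assumed to have been defined already over Hadoop or Azure blob data.

```sql
-- Ad hoc query: rows are read from the external source at query time.
-- dbo.SensorData_External is a hypothetical external table.
SELECT TOP 10 SensorKey, Speed
FROM dbo.SensorData_External
WHERE Speed > 65;

-- Import: copy the external rows into a new local table
-- (dbo.SensorData_Local, also hypothetical) for persistent storage.
SELECT SensorKey, Speed
INTO dbo.SensorData_Local
FROM dbo.SensorData_External;
```

`SELECT ... INTO` is the form that fits SQL Server 2016; on Azure SQL Data Warehouse the equivalent import is usually written with `CREATE TABLE ... AS SELECT`.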

Published in: Technology

  1. Introduction to PolyBase. James Serra, Big Data Evangelist, Microsoft. JamesSerra3@gmail.com
  2. About Me
     - Microsoft, Big Data Evangelist
     - In IT for 30 years, worked on many BI and DW projects
     - Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW/APS developer
     - Been perm employee, contractor, consultant, business owner
     - Presenter at PASS Business Analytics Conference, PASS Summit, Enterprise Data World conference
     - Certifications: MCSE: Data Platform, Business Intelligence; MS: Architecting Microsoft Azure Solutions, Design and Implement Big Data Analytics Solutions, Design and Implement Cloud Data Platform Solutions
     - Blog at JamesSerra.com
     - Former SQL Server MVP
     - Author of book “Reporting with Microsoft SQL Server 2012”
  3. Provides a scalable, T-SQL compatible query processing framework for combining data from both universes
  4. [Timeline, 2012 to 2016: PolyBase in SQL Server 2016 (CTP3); PolyBase in SQL DW; PolyBase in SQL Server 2016]
  5. PolyBase: query relational and non-relational data with T-SQL
  6. Disaster recovery: several customers use an APS > Blob Storage > SQL DW pattern (all via PolyBase) for DR, using the cloud service
  7. SELECT TOP 10 * FROM SQLServer S JOIN Hadoop H ON S.Key = H.Key
  8. SELECT TOP 10 * FROM SQLServer S JOIN Hadoop H ON S.Key = H.Key
  9. SELECT TOP 10 * FROM SQLServer S JOIN Hadoop H ON S.Key = H.Key
  10. SELECT TOP 10 * FROM SQLServer S JOIN Blob B ON S.Key = B.Key
  11. SELECT TOP 10 * FROM SQLServer S JOIN Blob B ON S.Key = B.Key
  12. SELECT TOP 10 * FROM SQLServer S JOIN Hadoop H ON S.Key = H.Key JOIN Blob B ON S.Key = B.Key
  13. https://msdn.microsoft.com/en-us/library/mt143174.aspx
  14. PolyBase (works with):

      | Platform | Azure Blob Store | Push Down | HDInsight | Push Down | Cloudera | Push Down | HortonWorks | Push Down | Azure Data Lake Store | Push Down |
      |---|---|---|---|---|---|---|---|---|---|---|
      | SQL 2016 (Now) | Yes | N/A | Yes | No | Yes | Yes | Yes | Yes | No | N/A |
      | SQL 2016 (Near future) | Yes | N/A | Yes | No | Yes | Yes | Yes | Yes | No | N/A |
      | Azure SQL DW (Now) | Yes | N/A | Yes | No | No | No | No | No | Yes! | N/A |
      | Azure SQL DW (Near future) | Yes | N/A | Yes | No | No | No | No | No | Yes | N/A |
      | APS (Now) | Yes | N/A | Yes | Yes (int). No (ext) | Yes | Yes | Yes | Yes | No | N/A |
      | APS (Near future) | Yes | N/A | Yes | Yes/No | Yes | Yes | Yes | Yes | No | N/A |
  15. Allows you to create a cluster of SQL Server instances to process large data sets from external data sources in a scale-out fashion for better query performance. See https://msdn.microsoft.com/en-us/library/mt607030.aspx
  16. T-SQL objects needed to query Hadoop (slide callouts shown as comments):

          -- Once per Hadoop user
          CREATE DATABASE SCOPED CREDENTIAL HadoopCredential
          WITH IDENTITY = 'hadoopUserName', SECRET = 'hadoopPassword';

          -- Once per Hadoop cluster per user
          CREATE EXTERNAL DATA SOURCE HadoopCluster
          WITH (TYPE = HADOOP,
                LOCATION = 'hdfs://10.193.26.177:8020',
                RESOURCE_MANAGER_LOCATION = '10.193.26.178:8050',
                CREDENTIAL = HadoopCredential);

          -- Once per file format
          CREATE EXTERNAL FILE FORMAT TextFile
          WITH (FORMAT_TYPE = DELIMITEDTEXT,
                DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec',
                FORMAT_OPTIONS (FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = TRUE));

          CREATE EXTERNAL TABLE [dbo].[Customer] (
              [SensorKey] int NOT NULL,
              int NOT NULL,  -- column name missing in the source slide
              [Speed] float NOT NULL
          )
          WITH (LOCATION = '//Sensor_Data//May2014/',  -- HDFS file path
                DATA_SOURCE = HadoopCluster,
                FILE_FORMAT = TextFile);
  17. Resources
      - PolyBase guide: https://msdn.microsoft.com/en-us/library/mt143171.aspx
      - Azure SQL Data Warehouse loading patterns and strategies: http://bit.ly/1XskZL2
  18. Q & A. James Serra, Big Data Evangelist
      - Email me at: JamesSerra3@gmail.com
      - Follow me at: @JamesSerra
      - Link to me at: www.linkedin.com/in/JamesSerra
      - Visit my blog at: JamesSerra.com (where this slide deck is posted via the “Presentations” link on the top menu)
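
Slides 10 to 12 join SQL Server data to data in Azure blob storage, and slide 16 shows the setup objects for Hadoop. A comparable setup for blob storage might look like the sketch below, assuming SQL Server 2016 with PolyBase installed. Every object name here (`AzureStorageCredential`, `AzureBlob`, `PipeDelimitedText`, `dbo.SensorReadings_Blob`, `dbo.Sensor`) is hypothetical, as are the columns, and the angle-bracket values are placeholders to fill in.

```sql
-- A database master key must exist before a scoped credential can be created.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- For blob storage the SECRET is the storage account key;
-- the IDENTITY string is not used for authentication.
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH IDENTITY = 'user', SECRET = '<storage account key>';

-- Blob storage also uses TYPE = HADOOP, addressed with a wasbs:// location.
CREATE EXTERNAL DATA SOURCE AzureBlob
WITH (TYPE = HADOOP,
      LOCATION = 'wasbs://<container>@<account>.blob.core.windows.net',
      CREDENTIAL = AzureStorageCredential);

CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = TRUE));

-- Hypothetical external table over pipe-delimited files in the container.
CREATE EXTERNAL TABLE dbo.SensorReadings_Blob (
    SensorKey int NOT NULL,
    Speed float NOT NULL
)
WITH (LOCATION = '/sensor-data/',
      DATA_SOURCE = AzureBlob,
      FILE_FORMAT = PipeDelimitedText);

-- Join a local table (dbo.Sensor, hypothetical) to the blob-backed table.
SELECT TOP 10 s.SensorKey, s.SensorName, b.Speed
FROM dbo.Sensor AS s
JOIN dbo.SensorReadings_Blob AS b
    ON s.SensorKey = b.SensorKey;
```

Once the external table exists, it is referenced like any other table, which is what lets the join on slides 10 to 12 stay ordinary T-SQL.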

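The disaster recovery pattern on slide 6 (APS > Blob Storage > SQL DW) could be sketched as an export followed by a load. This is an illustration under stated assumptions, not the customers' actual scripts: the tables (`dbo.FactSales`, `dbo.FactSales_Export`, `dbo.FactSales_External`), their columns, and the folder path are hypothetical, and an external data source `AzureBlobDR` and file format `PipeDelimitedText` pointing at the same storage container are assumed to exist on both systems.

```sql
-- Step 1, on APS: export a table to blob storage with
-- CREATE EXTERNAL TABLE AS SELECT (CETAS).
CREATE EXTERNAL TABLE dbo.FactSales_Export
WITH (LOCATION = '/dr/FactSales/',
      DATA_SOURCE = AzureBlobDR,
      FILE_FORMAT = PipeDelimitedText)
AS
SELECT SaleKey, Amount
FROM dbo.FactSales;

-- Step 2, on Azure SQL DW: describe the exported files as an external table.
CREATE EXTERNAL TABLE dbo.FactSales_External (
    SaleKey int NOT NULL,
    Amount decimal(18, 2) NOT NULL
)
WITH (LOCATION = '/dr/FactSales/',
      DATA_SOURCE = AzureBlobDR,
      FILE_FORMAT = PipeDelimitedText);

-- Step 3, on Azure SQL DW: load the files into a distributed table
-- with CREATE TABLE AS SELECT (CTAS).
CREATE TABLE dbo.FactSales
WITH (DISTRIBUTION = HASH(SaleKey))
AS
SELECT SaleKey, Amount
FROM dbo.FactSales_External;
```

Both hops go through PolyBase external tables, so blob storage acts as the hand-off point between the on-premises appliance and the cloud service.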