
U-SQL Federated Distributed Queries (SQLBits 2016)



U-SQL Federated Distributed Queries (SQLBits 2016 ADL/U-SQL Pre-Conference)
Data Sources, External Tables, Credentials, Federated Queries

Published in: Data & Analytics


  1. U-SQL Federated Distributed Queries
     Michael Rys, Principal Program Manager, Big Data @ Microsoft
     @MikeDoesBigData, {mrys, usql}
  2. Query data where it lives
     Easily query data in multiple Azure data stores without moving it to a single store.
     Benefits
     • Avoid moving large amounts of data across the network between stores
     • Single view of data irrespective of physical location
     • Minimize data proliferation issues caused by maintaining multiple copies
     • Single query language for all data
     • Each data store maintains its own sovereignty
     • Design choices based on the need
     • Push SQL expressions to remote SQL sources
        • Filters
        • Joins
     [Diagram: a U-SQL query in Azure Data Lake Analytics querying Azure Data Lake Storage, Azure Storage Blobs, Azure SQL DB, Azure SQL Data Warehouse, and Azure SQL in VMs]
  3. Federated queries
     • Minimize data proliferation through data consolidation
     • Same U-SQL over all Azure data (WASB, SQL Azure)
     • Efficient and reliable execution strategies
     • Striving to maintain semantic equivalence
     • Design choices based on requirements:
        • Schema-less design: fast time-to-query and exploratory analysis
        • Schematized design: protect applications from data source changes
     • Advanced federated query capabilities:
        • Built-in decisions to optimize for performance: pushdown of joins, predicates, and projections
        • Control when and what to push down
        • Prevent data source overload
        • Provide control over semantics
  4. Data sources and external tables
     • Secure credential management
     • Data sources to manage connections and remoting of queries
     • Schematized design: external tables to provide early bound tables for federated queries

     Create secret in PowerShell:

         New-AzureRMDataLakeAnalyticsCatalogSecret

     Create credential:

         CREATE CREDENTIAL Secret WITH USER_NAME = "user@server", IDENTITY = "Secret";

     Create external data source on Azure SQL DB, Azure SQL DW, or SQL Server in Azure VM:

         CREATE DATA SOURCE SQL_PATIENTS
         FROM SQLSERVER
         WITH (
             PROVIDER_STRING = "Database=DB;Trusted_Connection=False;Encrypt=False",
             CREDENTIAL = Secret,
             REMOTABLE_TYPES = (bool, byte, short, string, DateTime)
         );

     External tables (optional):

         CREATE EXTERNAL TABLE sql_patients (
             [custkey] int,
             [name] string,
             [address] string
         )
         FROM SQL_PATIENTS LOCATION "dbo.patients";
  5. Federated queries
     • Queries have to be in a different script from the data source definition
     • Pass-through queries to execute remote language
     • Schema-less design: query data source location
     • Schematized design: query external tables
     • Semantics of federated queries close to U-SQL and C#

     Pass-through query:

         @alive_patients =
             SELECT *
             FROM EXTERNAL SQL_PATIENTS
             EXECUTE @"
                 SELECT name,
                        CASE WHEN is_alive = 1 THEN 'Alive' ELSE 'Deceased' END AS status,
                        address, nationkey, phone
                 FROM dbo.patients";

     Query data source location:

         @patients =
             SELECT *
             FROM EXTERNAL master.SQL_PATIENTS LOCATION "dbo.patients";

     Query external tables:

         @patients =
             SELECT *
             FROM EXTERNAL master.dbo.sql_patients;

     Execution
     • U-SQL semantics
     • Pushes predicates and even joins based on remotable types
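As a rough sketch of how the pieces on slides 4 and 5 might combine, the script below joins the remote dbo.patients table with a file in the Data Lake store and writes the result back to the store. It assumes the SQL_PATIENTS data source from slide 4 already exists in master. The file paths, the columns of the visits file, and the filter value are hypothetical and not from the talk. Whether the filter is actually evaluated on the remote server depends on the data source's remotable types and on the optimizer.

```usql
// Assumption: SQL_PATIENTS was created in an earlier, separate script (slide 4).
USE DATABASE master;

// Hypothetical local file with one row per patient visit.
@visits =
    EXTRACT custkey int,
            visit_date DateTime,
            cost decimal
    FROM "/input/visits.csv"
    USING Extractors.Csv();

// Schema-less federated query against the data source location.
// The string comparison is a candidate for pushdown because string
// is listed in REMOTABLE_TYPES; the value "Smith" is illustrative only.
@patients =
    SELECT custkey, name, address
    FROM EXTERNAL master.SQL_PATIENTS LOCATION "dbo.patients"
    WHERE name == "Smith";

// Join remote rows with local rows using ordinary U-SQL (C#-style) semantics.
@result =
    SELECT p.name, p.address, v.visit_date, v.cost
    FROM @patients AS p
         INNER JOIN @visits AS v
         ON p.custkey == v.custkey;

// Hypothetical output path.
OUTPUT @result
TO "/output/patient_visits.csv"
USING Outputters.Csv();
```

Selecting only the needed columns and filtering before the join keeps the amount of data pulled from the remote source small, in line with the "prevent data source overload" point on slide 3.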