Michael Rys
Principal Program Manager, Big Data @ Microsoft
@MikeDoesBigData, {mrys, usql}@microsoft.com
U-SQL Meta Data Catalog
2016/04/04
Meta Data Object Model
[Diagram: object hierarchy of the ADLA Catalog. Nodes: Database, Schema, tables, views, TVFs, Procedures, Table Types, Ext. tables, Clustered Index, partitions, Statistics, Data Source, Credentials, C# Assemblies, C# Fns, C# UDAgg, C# UDTs, C# Extractors, C# Reducers, C# Processors, C# Combiners, C# Outputters, C# Applier. Cardinalities shown on edges: [1,n], [1,n], [0,n].]
[Legend: Abstract objects; User objects; edge types "Contains", "Refers to", "Implemented and named by"; name labels "MD Name" and "C# Name".]
U-SQL Catalog
• Naming
• Discovery
• Sharing
• Securing
Naming
• Default database and schema context: master.dbo
• Quote identifiers with []: [my table]
• Stores data in ADL Storage /catalog folder
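A minimal sketch of how these naming rules might look in a script. SampleDb, Staging, [my table], and the output path are hypothetical; the table is assumed to exist already. Without the USE statements, unqualified names resolve against master.dbo.

```usql
// Hypothetical names throughout: SampleDb, Staging, [my table].
CREATE DATABASE IF NOT EXISTS SampleDb;
USE DATABASE SampleDb;

CREATE SCHEMA IF NOT EXISTS Staging;
USE SCHEMA Staging;

// Resolved against the current context set above.
// Brackets quote the identifier because it contains a space.
// The fully qualified form would be SampleDb.Staging.[my table].
@rows =
    SELECT *
    FROM [my table];

// Assumed output path.
OUTPUT @rows
TO "/output/my_table.csv"
USING Outputters.Csv();
```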
Discovery
• Visual Studio Server Explorer
• Azure Data Lake Analytics Portal
• SDKs and Azure PowerShell commands
Sharing
• Within an Azure Data Lake Analytics account
Securing
• Secured with AAD principals at catalog level (inherited from ADL Storage)
Views and TVFs
• Views for simple cases
• TVFs for parameterization and most cases
Views
CREATE VIEW V AS EXTRACT…
CREATE VIEW V AS SELECT …
• Cannot contain user-defined objects (such as UDFs or UDOs)
• Will be inlined
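As a sketch of the CREATE VIEW V AS EXTRACT form: the view below wraps a file behind a catalog name using only a built-in extractor, in line with the restriction on user-defined objects. The view name, column list, and file paths are assumptions for illustration.

```usql
// Hypothetical view over an assumed tab-separated input file.
CREATE VIEW IF NOT EXISTS dbo.SearchLogView AS
    EXTRACT UserId int,
            Start DateTime,
            Region string,
            Query string
    FROM "/Samples/Data/SearchLog.tsv"
    USING Extractors.Tsv();

// The view is referenced like a table; its definition is inlined into this query.
@hits =
    SELECT Region,
           COUNT(*) AS Hits
    FROM dbo.SearchLogView
    GROUP BY Region;

OUTPUT @hits
TO "/output/hits_by_region.csv"
USING Outputters.Csv();
```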
Table-Valued Functions (TVFs)
CREATE FUNCTION F (@arg string = "default")
RETURNS @res [TABLE ( … )]
AS BEGIN … @res = … END;
• Provides parameterization
• One or more results
• Can contain multiple statements
• Can contain user code (needs assembly reference)
• Will always be inlined
• Infers schema or checks against specified return schema
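A sketch of a parameterized TVF with several statements and an explicit return schema that the body is checked against. The function name, columns, and paths are hypothetical.

```usql
// Hypothetical TVF: filters an assumed search log by region.
CREATE FUNCTION IF NOT EXISTS dbo.SearchLogByRegion(@region string = "en-us")
RETURNS @res TABLE(UserId int, Start DateTime, Region string, Query string)
AS
BEGIN
    // First statement: read the raw file (assumed path).
    @log =
        EXTRACT UserId int,
                Start DateTime,
                Region string,
                Query string
        FROM "/Samples/Data/SearchLog.tsv"
        USING Extractors.Tsv();

    // Second statement: assign the result variable named in RETURNS.
    @res =
        SELECT *
        FROM @log
        WHERE Region == @region;
END;

// Call with an explicit argument; DEFAULT would select the declared default instead.
@gb =
    SELECT *
    FROM dbo.SearchLogByRegion("en-gb") AS t;

OUTPUT @gb
TO "/output/searchlog_en_gb.csv"
USING Outputters.Csv();
```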
Procedures
Allows encapsulation of non-DDL scripts
CREATE PROCEDURE P (@arg string = "default")
AS
BEGIN
…;
OUTPUT @res TO …;
INSERT INTO T …;
END;
• Provides parameterization
• No result but writes into file or table
• Can contain multiple statements
• Can contain user code (needs assembly reference)
• Will always be inlined
• Cannot contain DDL (no CREATE, DROP)
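A sketch of a procedure that returns no rowset and writes a file instead. The procedure name, columns, and paths are hypothetical, and the call form at the end (the name followed by its arguments) is how I understand invocation to work; treat it as an assumption.

```usql
// Hypothetical procedure: exports one region of an assumed search log.
CREATE PROCEDURE IF NOT EXISTS dbo.ExportRegion(@region string = "en-us")
AS
BEGIN
    @log =
        EXTRACT UserId int,
                Start DateTime,
                Region string,
                Query string
        FROM "/Samples/Data/SearchLog.tsv"
        USING Extractors.Tsv();

    @rows =
        SELECT UserId,
               Start,
               Query
        FROM @log
        WHERE Region == @region;

    // No result is returned; the procedure writes to a file (assumed path).
    OUTPUT @rows
    TO "/output/searchlog_region.csv"
    USING Outputters.Csv();
END;

// Invocation with an explicit argument.
dbo.ExportRegion("en-gb");
```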
Table types
Enables you to name a table schema
Provides reuse for function/procedure definitions
CREATE TYPE T AS TABLE(c1 string, c2 int );
CREATE FUNCTION F (@table_arg T)
RETURNS @res T
AS BEGIN … @res = … END;
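A sketch of the reuse this enables: one named schema serves as both the parameter type and the return type. The type, function, column names, and paths are hypothetical.

```usql
// Hypothetical table type naming a three-column schema.
CREATE TYPE dbo.SearchLogRow AS TABLE(UserId int, Region string, Query string);

// The same type is used for the table-valued argument and for the result.
CREATE FUNCTION IF NOT EXISTS dbo.FilterRegion(@input dbo.SearchLogRow, @region string = "en-us")
RETURNS @res dbo.SearchLogRow
AS
BEGIN
    @res =
        SELECT *
        FROM @input
        WHERE Region == @region;
END;

// A rowset whose columns match the type (assumed input path).
@log =
    EXTRACT UserId int,
            Region string,
            Query string
    FROM "/input/searchlog_small.tsv"
    USING Extractors.Tsv();

@filtered =
    SELECT *
    FROM dbo.FilterRegion(@log, "en-gb") AS t;

OUTPUT @filtered
TO "/output/filtered.csv"
USING Outputters.Csv();
```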
Tables
• CREATE TABLE
• CREATE TABLE AS SELECT
CREATE TABLE T (col1 int
, col2 string
, col3 SQL.MAP<string,string>
, INDEX idx CLUSTERED (col1 ASC)
PARTITIONED BY HASH (col1)
);
• Structured Data
• Built-in Data types only (no UDTs)
• Clustered index (must be specified): row-oriented
• Fine-grained partitioning (must be specified):
• HASH, DIRECT HASH, RANGE, ROUND ROBIN
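A sketch of a table declared with the required clustered index and hash partitioning, then filled from a file. Table, column, and path names are hypothetical. The PARTITIONED BY spelling follows the slide; later U-SQL releases, as far as I recall, spell this clause DISTRIBUTED BY, so treat the keyword as version dependent.

```usql
// Hypothetical table with built-in column types only.
CREATE TABLE IF NOT EXISTS dbo.Trips
(
    trip_id int,
    driver_id int,
    pickup string,
    INDEX idx_trips CLUSTERED (driver_id ASC)
    PARTITIONED BY HASH (driver_id)
);

// Assumed CSV input whose columns match the table.
@raw =
    EXTRACT trip_id int,
            driver_id int,
            pickup string
    FROM "/input/trips.csv"
    USING Extractors.Csv();

INSERT INTO dbo.Trips
SELECT trip_id,
       driver_id,
       pickup
FROM @raw;
```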
CREATE TABLE T (INDEX idx CLUSTERED …) AS SELECT …;
CREATE TABLE T (INDEX idx CLUSTERED …) AS EXTRACT …;
CREATE TABLE T (INDEX idx CLUSTERED …) AS myTVF(DEFAULT);
• Infer the schema from the query
• Still requires index and partitioning
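A sketch of the CREATE TABLE … AS EXTRACT form: the column list comes from the EXTRACT clause, while the index and partitioning still have to be spelled out. Names and paths are hypothetical, and the PARTITIONED BY keyword follows the slide's syntax (it may differ in later releases).

```usql
// Schema is inferred from the EXTRACT column list; only index and partitioning are declared.
CREATE TABLE IF NOT EXISTS dbo.SearchLog
(
    INDEX idx_searchlog CLUSTERED (UserId ASC)
    PARTITIONED BY HASH (UserId)
) AS
EXTRACT UserId int,
        Start DateTime,
        Region string,
        Query string
FROM "/Samples/Data/SearchLog.tsv"
USING Extractors.Tsv();
```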
Additional Resources
Documentation
U-SQL DDL: https://msdn.microsoft.com/en-us/library/azure/mt621299.aspx
Sample Projects
https://github.com/Azure/usql/tree/master/Examples/AmbulanceDemos/AmbulanceDemos/2-Ambulance-Structured%20Data
https://github.com/Azure/usql/tree/master/Examples/TweetAnalysis
http://aka.ms/AzureDataLake

U-SQL Meta Data Catalog (SQLBits 2016)

