USQL Trivadis Azure Data Lake Event

4
Data sourcesNon-relational data
DESIGNED FOR THE
QUESTIONS YOU KNOW!

The Data Lake Approach
Ingest all data
regardless of
requirements
Store all data
in native format
without schema
definition
Do analysis
Hadoop, Spark, R,
Azure Data Lake
Analytics (ADLA)
Interactive queries
Batch queries
Machine Learning
Data warehouse
Real-time analytics
Devices

Microsoft’s Big Data Journey
We needed to better leverage data and analytics to
do more experimentation
So, we built a Data Lake for Microsoft:
• A data lake for everyone to put their data
• Tools approachable by any developer
• Batch, Interactive, Streaming, ML
By the numbers
• Exabytes of data under management
• 100Ks of Physical Servers
• 100Ks of Batch Jobs, Millions of Interactive Queries
• Huge Streaming Pipelines
• 10K+ Developers running diverse workloads and scenarios
2010 2013 2017
Windows
SMSG
Live
Bing
CRM/Dynamics
Xbox Live
Office365
Malware Protection Microsoft Stores
Commerce Risk
Skype
LCA
Exchange
Yammer
Data Stored

Culture Changes Engineering
How is the system performing? What is the experience my customers are
having? How does that correlate to other actions?
Is my feature successful ?
Marketing
What can we observe from our customers to increase revenues?
Management
How do I drive my business based on the data?
Field
Where are there new opportunities? How can I connect with my
customers more deeply?
Support
How does this customer’s experience compare with others?

HDFS Compatible REST API
ADL Store
.NET, SQL, Python, R
scaled out by U-SQL
ADL Analytics
Open Source Apache
Hadoop ADL Client
Azure Databricks
HDInsight
Hive
• Performance at
scale
• Optimized for
analytics
• Multiple
analytics engines
• Single repository
sharing

ADL Store
Storage
• Architected and built for very high throughput at scale for Big Data workloads
• No limits to file size, account size or number of files
• Single-repository for sharing
• Cloud-scale distributed filesystem with file/folder ACLS and RBAC
• Encryption-at-rest by default with Azure Key Vault
• Authenticated access with Azure Active Directory integration
• Formal Certifications incl. ISO, SOC, PCI, HIPAA

ADL Store
Analytics
Storage
Cloudera CDH
Hortonworks HDP
Qubole QDS
• Open Source Apache® ADL client
for commercial and custom Hadoop
• Cloud IaaS and Hybrid

Best of Databricks Best of Microsoft
Designed in collaboration with the founders of Apache Spark
One-click set up; streamlined workflows
Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts.
Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage)
Enterprise-grade Azure security (Active Directory integration, compliance, enterprise -grade SLAs)
A Z U R E D ATA B R I C K S
A F A S T , E A S Y , A N D C O L L A B O R A T I V E A P A C H E S P A R K B A S E D A N A L Y T I C S P L A T F O R M

HDInsight
ADL Store
Hive
Analytics
Storage
• 63% lower TCO
than on-premise*
• SLA- managed,
monitored and
supported by
Microsoft
• Fully managed
Hadoop, Spark
and R
• Clusters
deployed in
minutes
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”

ADL Store
.NET, SQL, Python, R
scaled out by U-SQL
ADL Analytics• Serverless. Pay per job. Starts in
seconds. Scales instantly.
• Develop massively parallel
programs with simplicity
• Federated query from multiple data
sources

Scales out your custom code in .NET, Python, R over
your Data Lake
Familiar syntax to millions of SQL & .NET developers
Unifies
• Declarative nature of SQL with the imperative
power of your language of choice (e.g., C#,
Python)
• Processing of structured, semi-structured and
unstructured data
• Querying multiple Azure Data Sources
(Federated Query)
U-SQL
A framework for Big Data

• SQL forms the declarative basis of the language:
• GROUP BY/Aggs
• Windowing Expressions
• PIVOT/UNPIVOT
• CROSS APPLY
• JOINs
• Etc.
• Uses .NET Types and C# Expression language
• Rich Extensibility model that allows to scale out your custom
extension code written in .Net/C#, Python, R
• Operates on unstructured data (Csv, images etc)
• Operates on semistructured data (XML, JSON, Avro)
• Operates on structured files (Parquet)
• Provides Metadata Catalog (DB, Schema):
• U-SQL Tables (for improved performance)
• U-SQL code objects (View, TVFs, Procs)
• Extension code objects (U-SQL Assemblies)
• Etc.
• Provides Federated Queries against “SQL in Azure”

Develop massively parallel programs with simplicity
A simple U-SQL script can scale
from Gigabytes to Petabytes
without learning complex big data
programming techniques.
U-SQL automatically generates a scaled
out and optimized execution plan to
handle any amount of data.
Execution nodes immediately
rapidly allocated to run the
program.
Error handling, network issues, and
runtime optimization are handled
automatically.
@searchlog =
EXTRACT UserId int,
Start DateTime,
Region string,
Query string,
Duration int,
Urls string,
ClickedUrls string
FROM @"/Samples/Data/SearchLog.tsv"
USING Extractors.Tsv();
OUTPUT @searchlog
TO @"/Samples/Output/SearchLog_output.tsv"
USING Outputters.Tsv();

• Admin and Dev Tooling in
• Azure Portal
• VisualStudio 2013 to 2017 (with local execution mode!)
• VS Code (cross platform)
• Azure Data Factory:
• Data movement
• Job submission and orchestration
• Powershell and Cross-Platform CLI support
• SDKs for common languages:
• .Net
• Java
• Python
• Node.js

 Automatic "in-lining"
optimized out-of-the-
box
 Per job
parallelization
visibility into execution
 Heatmap to identify
bottlenecks

https://github.com/Azure/usql/tree/master/Examples/ImageApp
https://docs.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-u-sql-cognitive
Car
Green
Parked
Outdoor
Racing

High-level
Roadmap
• Worldwide Region Availability (currently US and EU)
• Interactive Access with T-SQL query
• Scale out your custom code in the language of choice
(.Net, Java, Python, etc)
• Process the data formats of your choice (incl. Parquet,
ORC; larger string values)
• Continued ADF, AAS, ADC, SQL DW, EventHub, SSIS
integration
• Administrative policies to control usage/cost for storage
& compute
• Secure data sharing between common AAD and public
read-only sharing, fine grained ACLing
• Intense focus on developer productivity for authoring,
debugging, and optimization
• General customer feedback
http://aka.ms/adlfeedback

Resources http://usql.io
http://blogs.msdn.microsoft.com/azuredatalake/
http://blogs.msdn.microsoft.com/mrys/
https://channel9.msdn.com/Search?term=U-SQL#ch9Search
http://aka.ms/usql_reference
https://docs.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-u-
sql-programmability-guide
https://docs.microsoft.com/en-us/azure/data-lake-analytics/
https://msdn.microsoft.com/en-us/magazine/mt614251
https://msdn.microsoft.com/magazine/mt790200
http://www.slideshare.net/MichaelRys
Getting Started with R in U-SQL
https://docs.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-u-
sql-python-extensions
https://social.msdn.microsoft.com/Forums/azure/en-
US/home?forum=AzureDataLake
http://stackoverflow.com/questions/tagged/u-sql
http://aka.ms/adlfeedback
Continue your education at
Microsoft Virtual Academy
online.

USQL Trivadis Azure Data Lake Event

USQL Trivadis Azure Data Lake Event

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to USQL Trivadis Azure Data Lake Event

Similar to USQL Trivadis Azure Data Lake Event (20)

More from Trivadis

More from Trivadis (20)

Recently uploaded

Recently uploaded (20)

USQL Trivadis Azure Data Lake Event