[Diagram: Cortana Intelligence overview. Flow: Data Sources (apps, sensors and devices) → Data → Intelligence → Action (people, automated systems; apps: web, mobile, bots).
Areas shown: Information Management; Data Stores; Machine Learning and Analytics; Intelligence; Dashboards & Visualizations.
Services shown: Data Factory, Data Catalog, Event Hubs, IoT Hubs, Storage, SQL Database, DocumentDB, SQL Data Warehouse, Data Lake Store, Machine Learning, Data Lake Analytics, HDInsight (Hadoop and Spark), Stream Analytics, Analysis Services, Cognitive Services, Bot Framework, Cortana, Power BI.]
Big Data Tools
HDInsight
  • Java, Eclipse, Hive, etc.
  • Control over the cluster
Azure Data Lake Analytics
  • C#, SQL & PowerShell
  • Scales faster
  • “Job service” form factor
Azure Data Lake
[Diagram: Azure Data Lake building blocks. Components shown: Analytics (the U-SQL Analytics Service and HDInsight, i.e. managed Hadoop clusters), YARN, WebHDFS, Store.]
Azure Data Lake Storage
  • From a few KBs to several PBs
  • Free choice of analysis tools
  • Encryption and access rights
Azure Data Lake Storage
[Diagram: a file in the Azure Data Lake Store is split into blocks (Block 1, Block 2, …); the blocks are distributed across the data nodes of the backend storage.]
Azure Data Lake Storage - High Availability
Data is never lost or unavailable, even under failures.
[Diagram: Write and Commit steps against Replica 1, Replica 2, and Replica 3, which are spread across fault/upgrade domains.]
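The block layout and the three replicas shown on the storage slides can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: the function names, the fixed block size, and the round-robin placement policy are hypothetical and say nothing about how the actual service places data.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file's bytes into fixed-size blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, domains: list[str], replicas: int = 3) -> dict[int, list[str]]:
    """Assign every block to `replicas` distinct fault domains.

    Round-robin placement is an assumption made for this sketch.
    """
    if replicas > len(domains):
        raise ValueError("need at least as many fault domains as replicas")
    return {
        block_id: [domains[(block_id + r) % len(domains)] for r in range(replicas)]
        for block_id in range(num_blocks)
    }


# A 10-byte "file" cut into 4-byte blocks, replicated over four hypothetical domains.
blocks = split_into_blocks(b"abcdefghij", block_size=4)
placement = place_replicas(len(blocks), ["fd0", "fd1", "fd2", "fd3"])
```

Because each block's replicas land in different domains, losing or upgrading one domain still leaves two copies of every block reachable.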
Azure Data Lake Storage - Ingress
[Diagram: ways to get data into the ADL Store]
  • Sources: server logs, Azure Storage Blobs, custom programs, Azure SQL DB, Azure SQL DW, Azure Tables (Table Storage), on-premises databases (SQL)
  • Tools and services: Azure Event Hub, Apache Flume, built-in copy service, .NET SDK, JavaScript CLI, Azure Portal, Azure PowerShell, Azure Data Factory, Apache Sqoop
  • Target: ADL Store
Azure Data Lake Storage - Egress
[Diagram: ways to get data out of the ADL Store]
  • Source: ADL Store
  • Tools and services: Azure Data Factory, Apache Sqoop, built-in copy service, .NET SDK, JavaScript CLI, Azure Portal, Azure PowerShell
  • Targets: Azure SQL DB, Azure SQL DW, Azure Tables (Table Storage), on-premises databases (SQL), Azure Storage Blobs, custom programs
Where does U-SQL come from?
The philosophy behind U-SQL
// Make the user-defined C# code (MyAgg, MyNamespace, MyData) available
REFERENCE ASSEMBLY MyDB.MyAssembly;

CREATE TABLE T( cid int, first_order DateTime
              , last_order DateTime, order_count int
              , order_amount float );

// Schema on read: apply a schema to the files while extracting
@o = EXTRACT oid int, cid int, odate DateTime, amount float
     FROM "/input/orders.txt"
     USING Extractors.Csv();

@c = EXTRACT cid int, name string, city string
     FROM "/input/customers.txt"
     USING Extractors.Csv();

// SQL-style query with C# expressions and a user-defined aggregator
@j = SELECT c.cid, MIN(o.odate) AS firstorder
          , MAX(o.odate) AS lastorder, COUNT(o.oid) AS ordercnt
          , AGG<MyAgg.MySum>(o.amount) AS totalamount
     FROM @c AS c LEFT OUTER JOIN @o AS o ON c.cid == o.cid
     WHERE c.city.StartsWith("New")
           && MyNamespace.MyFunction(o.odate) > 10
     GROUP BY c.cid;

// Write the result with a user-defined outputter and store it in the table
OUTPUT @j TO "/output/result.txt"
USING new MyData.Write();

INSERT INTO T SELECT * FROM @j;
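To make the join and aggregation in this script concrete, here is a plain-Python sketch of the same shape of computation on in-memory rows. It is an approximation under stated assumptions: the function name and sample data are hypothetical, customer IDs are assumed to be unique, the user-defined filter `MyNamespace.MyFunction` is omitted, and a built-in sum stands in for the user-defined aggregator `MyAgg.MySum`.

```python
from datetime import date


def summarize_orders(customers, orders):
    """Left outer join customers to orders, keep cities starting with "New", aggregate per customer.

    customers: list of (cid, name, city); orders: list of (oid, cid, odate, amount).
    """
    orders_by_customer = {}
    for _oid, cid, odate, amount in orders:
        orders_by_customer.setdefault(cid, []).append((odate, amount))

    result = []
    for cid, _name, city in customers:
        if not city.startswith("New"):
            continue
        rows = orders_by_customer.get(cid, [])  # empty for customers without orders
        dates = [odate for odate, _ in rows]
        result.append({
            "cid": cid,
            "firstorder": min(dates) if dates else None,
            "lastorder": max(dates) if dates else None,
            "ordercnt": len(rows),
            "totalamount": sum(amount for _, amount in rows),
        })
    return result


# Hypothetical sample data
customers = [(1, "Ann", "New York"), (2, "Bob", "Boston"), (3, "Cy", "Newark")]
orders = [
    (10, 1, date(2016, 1, 5), 20.0),
    (11, 1, date(2016, 3, 1), 5.5),
    (12, 2, date(2016, 2, 2), 7.0),
]
result = summarize_orders(customers, orders)
```

In the sketch, a customer without orders still appears in the result with a count of zero, which is what the left outer join contributes before any filter on order columns is applied.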
ADLA Compiler
[Diagram: the compiler turns the input script into the compilation output. Labels shown: U-SQL, C#, C++, algebra, managed DLL, unmanaged DLL, many other files.]
Simplified flow of a job
[Diagram: job submission and job execution. Components shown: Job Front End, Job Scheduler, Compiler Service, Job Queue, Job Manager, U-SQL Catalog, YARN, U-SQL Runtime, vertex execution.]
Job execution graph – node details
Hovering over the nodes shows details about them.
Job execution “progress” playback (video)
To tune performance, identify bottlenecks, and debug, you can play back the job execution graph.
“Data read” playback (video)
To tune performance, identify bottlenecks, and debug, you can play back the data read on the job execution graph.
Job diagnostics
Diagnostic information is shown to help with debugging and performance issues.
Query design
U-SQL Studio lets you see the logical query design, including:
  • Schema
  • Join conditions
  • Filter plan
  • Sort plan
Query design - RowSet
The query design can also be visualized in terms of the RowSets and the transformations applied to them.
Types of user-defined operators
  • Outputters [~ writer of non-standard data]
  • Processors [~ transform / derive]
  • Appliers [~ table-valued function]
  • Reducers [~ self-defined aggregation on rows]
  • Combiners [~ self-defined join]
  • Extractors [~ reader of non-standard data]
Custom Extractor - Sample
Adding a custom extractor

1. Implement the IExtractor interface

using System.Collections.Generic;
using Microsoft.SCOPE.Interfaces;

public class WebLogExtractor : IExtractor
{
    public override IEnumerable<IRow> Extract(…)
    {
        …
    }
    …
}

2. Upload and register the assembly

CREATE ASSEMBLY WebLogExtAsm
FROM @"/WebLogExtAsm.dll"
WITH PERMISSION_SET = RESTRICTED;

CREATE EXTRACTOR WebLogExtractor
EXTERNAL NAME WebLogExtractor;

3. Reference the assembly and use it

REFERENCE ASSEMBLY WebLogExtAsm;

// Now just use it like a built-in extractor
SELECT * FROM @"swebhdfs://Logs/WebRecords.txt"
USING WebLogExtractor();
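Conceptually, an extractor turns an unstructured input stream into a sequence of rows. The following Python generator sketches that idea only; it is not the U-SQL interface. The function name and the three-field log format (timestamp, method, URL separated by whitespace) are assumptions made for the example.

```python
import io
from typing import Iterator, TextIO


def extract_web_log(stream: TextIO) -> Iterator[dict]:
    """Yield one row per well-formed log line: "<timestamp> <method> <url>" (assumed format)."""
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if len(parts) != 3:
            continue  # skip malformed lines instead of failing the whole read
        timestamp, method, url = parts
        yield {"timestamp": timestamp, "method": method, "url": url}


# Hypothetical sample input
sample = io.StringIO(
    "# web log\n"
    "2016-01-01T00:00:00 GET /index.html\n"
    "bad line here extra\n"
    "\n"
    "2016-01-01T00:00:01 POST /login\n"
)
rows = list(extract_web_log(sample))
```

Yielding rows one at a time mirrors the `IEnumerable<IRow>` return type on the slide: the consumer can process rows as they are produced, without loading the whole file first.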
External data sources
[Diagram: a U-SQL query in Azure Data Lake Analytics sends queries to external sources (Azure Storage Blobs, Azure SQL in VMs, Azure SQL DB) and receives the results.]
External database queries

// 1. Create a CREDENTIAL object in the metadata
CREATE CREDENTIAL sqldbc WITH USER_NAME = "John Brat", IDENTITY = "AzureAdmin";

// 2. Create the external data source and specify the remotable types
CREATE DATA SOURCE Purchase FROM SQLAZURE WITH
  (PROVIDER_STRING = "Server=tcp:wrt.database.windows.net,1435;Database=TPC;Trusted_Connection=False;Encrypt=True",
   CREDENTIAL = sqldbc,
   REMOTABLE_TYPES = (bool, byte, int, uint, short, ushort, long,
                      decimal, float, sbyte, double));

// 3. Run a pass-through T-SQL query
@result = SELECT * FROM EXTERNAL Purchase
          EXECUTE @"SELECT SUM(Amount) FROM dbo.PurchaseOrders";

OUTPUT @result TO "swebhdfs://Logs/PurchaseAmountOut.Tsv"
USING Outputters.Tsv();

[Diagram: the ADL Analytics Service sends the pass-through T-SQL query to the purchase orders table in Azure SQL DB and receives the results as C# types.]
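The pass-through pattern (the query text runs unchanged inside the external database and only the result rows come back) can be tried out locally with a Python sketch. This is an analogy under stated assumptions: an in-memory SQLite database stands in for Azure SQL DB, and the helper names `run_pass_through` and `to_tsv` are hypothetical.

```python
import csv
import io
import sqlite3


def run_pass_through(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Send the SQL text to the external engine unchanged and return only the result rows."""
    return conn.execute(sql).fetchall()


def to_tsv(rows: list[tuple]) -> str:
    """Serialize result rows as tab-separated text."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# In-memory SQLite database standing in for the external Azure SQL DB (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PurchaseOrders (Amount REAL)")
conn.executemany("INSERT INTO PurchaseOrders VALUES (?)", [(10.0,), (20.5,)])

result = run_pass_through(conn, "SELECT SUM(Amount) FROM PurchaseOrders")
tsv = to_tsv(result)
```

The aggregation happens in the database, so only a single row crosses the boundary instead of the whole table.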
Further information
Official product page
https://azure.microsoft.com/en-us/services/data-lake-analytics/
https://azure.microsoft.com/en-us/documentation/services/data-lake-analytics/
Sascha's resources
Blog http://www.sascha-dittmann.de/
YouTube http://bit.ly/ADLVideos

C# + SQL = Big Data

Editor's Notes

  • #4 T: Cortana Intelligence provides everything you need to transform your organization’s data into intelligent action. Next, let’s take a look at another demo.
  • #5 3 Reasons for Spark:
      • Simplicity: Spark's capabilities are accessible via a set of rich APIs, all designed specifically for interacting quickly and easily with data at scale. These APIs are well documented and structured in a way that makes it straightforward for data scientists and application developers to quickly put Spark to work.
      • Support: Spark supports a range of programming languages, including Java, Python, R, and Scala. Although often closely associated with Hadoop's underlying storage system, HDFS, Spark includes native support for tight integration with a number of leading storage solutions in the Hadoop ecosystem and beyond. Additionally, the Apache Spark community is large, active, and international. A growing set of commercial providers, including Databricks, IBM, and all of the main Hadoop vendors, deliver comprehensive support for Spark-based solutions.
  • #18
      • Declarative query and transformation language:
          • Uses SQL's SELECT FROM WHERE with GROUP BY/aggregation, joins, SQL analytics functions
          • Optimizable, scalable
      • Expression-flow programming style:
          • Easy-to-use functional lambda composition
          • Composable, globally optimizable
      • Operates on unstructured and structured data:
          • Schema on read over files
          • Relational metadata objects (e.g. database, table)
      • Extensible from the ground up:
          • Type system is based on C#
          • Expression language IS C#
          • User-defined functions (U-SQL and C#)
          • User-defined aggregators (C#)
          • User-defined operators (UDO) (C#)
      • U-SQL provides the parallelization and scale-out framework for user code:
          • EXTRACTOR, OUTPUTTER, PROCESSOR, REDUCER, COMBINER, APPLIER
      • Federated query across distributed data sources