C# + SQL = Big Data

[Diagram: Azure data platform overview]
Flow: Data -> Intelligence -> Action
Data Sources: Apps, Sensors and devices, Data
Information Management: Data Factory, Data Catalog, Event Hubs, IoT Hubs
Data Stores: Data Lake Store, SQL Data Warehouse, Storage, SQL Database, DocumentDB
Machine Learning and Analytics: Machine Learning, Data Lake Analytics, HDInsight (Hadoop and Spark), Stream Analytics, Analysis Services
Intelligence: Cognitive Services, Bot Framework, Cortana
Dashboards & Visualizations: Power BI
Action: People, Automated Systems, Apps (Web, Mobile, Bots)

Big Data Tools
HDInsight
Java, Eclipse, Hive, etc.
Control over the cluster
Azure Data Lake Analytics
C#, SQL & PowerShell
Scales faster
“Job service” form factor

Azure Data Lake
Analytics: U-SQL Analytics Service and HDInsight (managed Hadoop clusters), on top of YARN
Store: accessed through WebHDFS

Azure Data Lake Storage
From a few KB to multiple PB
Free choice of analysis tools
Encryption and access permissions

Azure Data Lake Storage
A file in Azure Data Lake Store is split into blocks (Block 1, Block 2, …).
The blocks are spread across the data nodes of the backend storage.
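
The block layout on this slide can be illustrated with a small simulation. This is only a Python sketch: the block size, the node count and the round-robin placement are assumptions made for the example, since the slide does not say how the store actually sizes or places blocks.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a byte string into consecutive blocks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(block_count: int, node_count: int) -> Dict[int, List[int]]:
    """Assign block indexes to data nodes round-robin (assumed policy)."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    placement: Dict[int, List[int]] = {node: [] for node in range(node_count)}
    for index in range(block_count):
        placement[index % node_count].append(index)
    return placement


# Hypothetical file of 10 bytes, cut into 4-byte blocks on 3 data nodes.
blocks = split_into_blocks(b"x" * 10, block_size=4)
placement = place_blocks(len(blocks), node_count=3)
```

Spreading blocks over several nodes is what lets many readers work on one file in parallel; the placement policy of the real service may differ from this sketch.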

Azure Data Lake Storage - High Availability
Data is never lost or unavailable, even under failures.
Diagram: a write goes to Replica 1, Replica 2 and Replica 3, which sit in separate fault/upgrade domains, and is then committed.
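
Why replicas in different fault/upgrade domains help can be shown with a toy model. The Python sketch below assumes hypothetical node and domain names and a simple "one replica per domain" rule; it is not the service's actual placement algorithm.

```python
from typing import Dict, List


def pick_replica_nodes(node_domains: Dict[str, str], copies: int = 3) -> List[str]:
    """Pick nodes so that every replica lives in a different fault domain."""
    chosen: List[str] = []
    used_domains = set()
    for node, domain in sorted(node_domains.items()):
        if domain in used_domains:
            continue  # this domain already holds a replica
        chosen.append(node)
        used_domains.add(domain)
        if len(chosen) == copies:
            return chosen
    raise ValueError("not enough distinct fault domains")


def surviving_replicas(replicas: List[str], node_domains: Dict[str, str],
                       failed_domain: str) -> List[str]:
    """Return the replicas that stay reachable when one whole domain fails."""
    return [node for node in replicas if node_domains[node] != failed_domain]


# Hypothetical cluster: four nodes in three fault domains.
node_domains = {"node-a": "fd1", "node-b": "fd1", "node-c": "fd2", "node-d": "fd3"}
replicas = pick_replica_nodes(node_domains)
still_up = surviving_replicas(replicas, node_domains, failed_domain="fd2")
```

With one replica per domain, losing or upgrading a single domain still leaves two readable copies in this model.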

Azure Data Lake Storage - Ingress
Data sources: server logs, Azure Storage Blobs, Azure SQL DB, Azure SQL DW, Azure Table Storage, on-premises SQL databases, custom programs
Ingestion tools: Azure Event Hub, Apache Flume, built-in copy service, .NET SDK, JavaScript CLI, Azure Portal, Azure PowerShell, Azure Data Factory, Apache Sqoop
Target: ADL Store

Azure Data Lake Storage - Egress
Source: ADL Store
Export tools: Azure Data Factory, Apache Sqoop, built-in copy service, .NET SDK, JavaScript CLI, Azure Portal, Azure PowerShell
Targets: Azure SQL DB, Azure SQL DW, Azure Table Storage, on-premises SQL databases, Azure Storage Blobs, custom programs

The philosophy behind U-SQL
// Reference a registered assembly that contains the custom C# code.
REFERENCE ASSEMBLY MyDB.MyAssembly;
CREATE TABLE T( cid int, first_order DateTime
              , last_order DateTime, order_count int
              , order_amount float );
// Read orders and customers from CSV files.
@o = EXTRACT oid int, cid int, odate DateTime, amount float
     FROM "/input/orders.txt"
     USING Extractors.Csv();
@c = EXTRACT cid int, name string, city string
     FROM "/input/customers.txt"
     USING Extractors.Csv();
// SQL for the set logic, C# for expressions and custom aggregation.
@j = SELECT c.cid, MIN(o.odate) AS firstorder
          , MAX(o.odate) AS lastorder, COUNT(o.oid) AS ordercnt
          , AGG<MyAgg.MySum>(o.amount) AS totalamount
     FROM @c AS c LEFT OUTER JOIN @o AS o ON c.cid == o.cid
     WHERE c.city.StartsWith("New")
           && MyNamespace.MyFunction(o.odate) > 10
     GROUP BY c.cid;
// Write the result with a custom outputter and store it in the table.
OUTPUT @j TO "/output/result.txt"
USING new MyData.Write();
INSERT INTO T SELECT * FROM @j;
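
To make the set logic of the script above concrete, here is a rough Python sketch of the same idea: left-join customers to orders, keep cities starting with "New", and compute per-customer aggregates. The sample rows are invented, the customer ids are assumed to be unique, and the custom predicate `MyNamespace.MyFunction` is left out because its behavior is not given in the slide.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

Order = Tuple[int, int, date, float]   # oid, cid, odate, amount
Customer = Tuple[int, str, str]        # cid, name, city


def summarize_orders(customers: List[Customer],
                     orders: List[Order]) -> Dict[int, Dict[str, Optional[object]]]:
    """Per customer in a "New..." city: first/last order date, count, total."""
    orders_by_cid: Dict[int, List[Order]] = {}
    for order in orders:
        orders_by_cid.setdefault(order[1], []).append(order)

    result: Dict[int, Dict[str, Optional[object]]] = {}
    for cid, _name, city in customers:
        if not city.startswith("New"):
            continue  # mirrors WHERE c.city.StartsWith("New")
        matched = orders_by_cid.get(cid, [])  # empty list = no join partner
        dates = [o[2] for o in matched]
        result[cid] = {
            "firstorder": min(dates) if dates else None,
            "lastorder": max(dates) if dates else None,
            "ordercnt": len(matched),
            "totalamount": sum(o[3] for o in matched) if matched else None,
        }
    return result


# Invented sample data.
customers = [(1, "Ann", "New York"), (2, "Bob", "Boston"), (3, "Cy", "Newark")]
orders = [
    (100, 1, date(2016, 1, 5), 10.0),
    (101, 1, date(2016, 3, 1), 2.5),
    (102, 2, date(2016, 2, 2), 7.0),
]
summary = summarize_orders(customers, orders)
```

Customer 3 has no orders, so the left outer join keeps the row with empty aggregates and a count of zero.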

ADLA Compiler
Input script: U-SQL
Compiler
Compilation output: C#, C++, Algebra, managed dll, unmanaged dll, many other files

Simplified flow of a job
Job submission: Job Front End, Compiler Service (with U-SQL Catalog), Job Queue, Job Scheduler
Job execution: Job Manager, YARN, U-SQL Runtime (vertex execution)

Job execution graph – node details
Hovering over the nodes shows details about them.

Job execution “progress” playback (video)
To tune performance, identify bottlenecks and debug, you can play back the job execution graph.

“Data read” playback (video)
To tune performance, identify bottlenecks and debug, you can play back the job execution graph.

Job diagnostics
Diagnostics information is shown to help with debugging and performance issues.

Query design
U-SQL Studio lets you see the logical query design, including:
Schema
Join conditions
Filter plan
Sort plan

Query design - RowSet
The query design can also be visualized in terms of the RowSets and the transformations applied to them.

Types of user-defined operators
Extractors: read non-standard data
Processors: transform or derive rows
Appliers: table-valued functions
Reducers: custom aggregation over rows
Combiners: custom joins
Outputters: write non-standard data

Adding your own extractor
1. Implement the IExtractor interface
using Microsoft.SCOPE.Interfaces;
public class WebLogExtractor : IExtractor
{
    public override IEnumerable<IRow> Extract(…)
    {
        …
    }
    …
}
2. Upload and register the assembly
CREATE ASSEMBLY WebLogExtAsm
FROM @"/WebLogExtAsm.dll"
WITH PERMISSION_SET = RESTRICTED;
CREATE EXTRACTOR WebLogExtractor
EXTERNAL NAME WebLogExtractor;
3. Reference the assembly and use it
REFERENCE ASSEMBLY WebLogExtAsm;
// Now use it like a built-in extractor.
SELECT * FROM @"swebhdfs://Logs/WebRecords.txt"
USING WebLogExtractor();
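
The idea behind an extractor (turn the lines of a non-standard file into typed rows) can be sketched outside U-SQL as a generator. The Python below is only an illustration: the three-field log format (`ip url status`) and the function name are assumptions, not the format read by the `WebLogExtractor` on this slide.

```python
from typing import Dict, Iterable, Iterator


def extract_web_log(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one row per well-formed log line; skip comments and bad lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        parts = line.split()
        if len(parts) != 3 or not parts[2].isdigit():
            continue  # malformed line: wrong field count or non-numeric status
        ip, url, status = parts
        yield {"ip": ip, "url": url, "status": int(status)}


# Invented sample input in the assumed format.
sample = [
    "# ip url status",
    "10.0.0.1 /index.html 200",
    "broken line",
    "10.0.0.2 /missing 404",
]
rows = list(extract_web_log(sample))
```

Like the extractor in the slide, the generator produces rows lazily, so a caller can stream a large file without loading it in full.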

External data sources
A U-SQL query in Azure Data Lake Analytics sends queries to external sources and receives the results:
Azure Storage Blobs
Azure SQL in VMs
Azure SQL DB

External database queries
// 1. Create a CREDENTIAL object in the metadata.
CREATE CREDENTIAL sqldbc WITH USER_NAME ="John Brat", IDENTITY = "AzureAdmin";
// 2. Create the external data source. 3. Specify the remotable types.
CREATE DATA SOURCE Purchase FROM SQLAZURE WITH
( PROVIDER_STRING="Server=tcp:wrt.database.windows.net,1435;Database=TPC;Trusted_Connection=False;Encrypt=True",
  CREDENTIAL=sqldbc,
  REMOTABLE_TYPES =(bool, byte, int, uint, short, ushort, long, decimal, float, sbyte, double));
// 4. Run a pass-through T-SQL query.
@result = SELECT * FROM EXTERNAL Purchase
          EXECUTE @"SELECT SUM(Amount) FROM dbo.PurchaseOrders";
OUTPUT @result TO "swebhdfs://Logs/PurchaseAmountOut.Tsv" USING Outputters.Tsv();
Diagram: the ADL Analytics Service sends the pass-through T-SQL query to the purchase orders table in Azure SQL DB and receives the results as C# types.
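
The pass-through pattern (the remote database computes the aggregate, and only the small result travels back to be written as TSV) can be tried locally. This Python sketch uses an in-memory SQLite database as a stand-in for Azure SQL DB; the table contents are invented, and SQLite has no `dbo` schema, so the table name is unqualified.

```python
import csv
import io
import sqlite3
from typing import List, Tuple


def run_pass_through(connection: sqlite3.Connection, query: str) -> List[Tuple]:
    """Send the query text unchanged to the database and fetch the result rows."""
    return connection.execute(query).fetchall()


def to_tsv(rows: List[Tuple]) -> str:
    """Render result rows as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Stand-in for the remote purchase orders table (invented data).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE PurchaseOrders (Id INTEGER, Amount REAL)")
connection.executemany(
    "INSERT INTO PurchaseOrders VALUES (?, ?)", [(1, 10.0), (2, 5.5)]
)

result = run_pass_through(connection, "SELECT SUM(Amount) FROM PurchaseOrders")
tsv = to_tsv(result)
connection.close()
```

Only one row crosses the connection here, which is the point of pushing the aggregation down to the source.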

Further information
Official product page
https://azure.microsoft.com/en-us/services/data-lake-analytics/
https://azure.microsoft.com/en-us/documentation/services/data-lake-analytics/
Sascha's resources
Blog http://www.sascha-dittmann.de/
YouTube http://bit.ly/ADLVideos
