Modern Data Science Lifecycle with ADX & Azure
Azure Data Explorer
Riccardo Zamana, IIoT BU Manager, Beantech s.r.l.
Thank you to
our Global
Sponsors and
Supporters
This event was sponsored by Microsoft
Learn more about SQL Server 2019 today:
-Get free training: aka.ms/sqlworkshops
-Download the SQL19 eBook: aka.ms/sql19_ebook
Questions
What about TIME SERIES DATABASES?
When do I have to use one?
What are the possible choices on the market?
OpenTSDB? KairosDB over Scylla/Cassandra? InfluxDB?
Why do I have to learn yet another DB?!
Why not SQL? Why not Cosmos DB?
Example Scenario: Multi-temperature data processing paths
Hot (e.g. in-mem cube, stream analytics, …)
• seconds freshness, days retention
• in-mem aggregated data
• pre-defined standing queries
• split-second query performance
• data viewing
Warm (e.g. column store, indexing, …)
• minutes freshness, months retention
• raw data
• ad-hoc queries
• seconds-to-minutes query performance
• data exploration
Cold (e.g. distributed file system, map reduce, …)
• hours freshness, years retention
• raw data
• programmatic batch processing
• minutes-to-hours query performance
• data manipulation
Where does the ADX process fit in this picture?
What is Azure Data Explorer
• Any append-only stream of records
• Relational query model: filter, aggregate, join, calculated columns, …
• Fully managed: rapid iterations to explore the data
• High volume, high velocity, high variance (structured, semi-structured, free-text)
• PaaS, vanilla, database
• Purposely built
Fully managed big data analytics service
• Fully managed for efficiency
Focus on insights, not the infrastructure, for fast time to value. No infrastructure to manage; provision the service, choose the SKU for your workload, and create a database.
• Optimized for streaming data
Get near-instant insights from fast-flowing data. Scales linearly up to 200 MB per second per node with highly performant, low-latency ingestion.
• Designed for data exploration
Run ad-hoc queries using the intuitive query language. Returns results from 1 billion records in under 1 second without modifying the data or metadata.
Use cases & Best Scenario
When is it useful?
• Analyze Telemetry data
• Retrieve trends/Series from clustered data
• Make regression over Big Data
• Summarize and export ordered streams
Which kinds of scenarios is it suitable for?
• IoT
• Troubleshooting and diagnostics
• Monitoring
• Security research
• Usage analytics
Azure Data Explorer Architecture
Ingestion sources: Spark, ADF, apps (via API), Logstash plugin, Kafka sink, IoT Hub, Event Hub, Event Grid
Ingestion modes: STREAM and BATCH
Cluster components: Data Management service and Engine, with SSD cache backed by Blob / ADLS
Consumers of the ingested data: ODBC, Power BI, ADX UI, MS Flow, Logic Apps, notebooks, Grafana, Spark
Azure Data Explorer SLA
SLA: at least 99.9% availability (last updated: Feb 2019)
Maximum Available Minutes: the total number of minutes for a given Cluster deployed by Customer in a Microsoft Azure subscription during a billing month.
Downtime: the total number of minutes within Maximum Available Minutes during which a Cluster is unavailable.
Monthly Uptime Percentage for Azure Data Explorer is calculated as Maximum Available Minutes less Downtime, divided by Maximum Available Minutes:
Monthly Uptime % = (Maximum Available Minutes - Downtime) / Maximum Available Minutes x 100
Allowed downtime at 99.9%:
Daily: 01m 26.4s
Weekly: 10m 04.8s
Monthly: 43m 49.7s
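The allowed-downtime figures follow directly from the 99.9% number; a quick sanity check in Python (illustrative only, not part of the SLA text):

```python
def allowed_downtime_minutes(sla: float, total_minutes: float) -> float:
    """Downtime budget for a given SLA over a period of total_minutes."""
    return total_minutes * (1.0 - sla)

daily = allowed_downtime_minutes(0.999, 24 * 60)                  # 1.44 min  -> 1m 26.4s
weekly = allowed_downtime_minutes(0.999, 7 * 24 * 60)             # 10.08 min -> 10m 04.8s
monthly = allowed_downtime_minutes(0.999, 365.25 * 24 * 60 / 12)  # ~43.83 min -> ~43m 50s
```

The "monthly" figure on the slide assumes an average month (365.25 / 12 days), not a flat 30 days.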
How about the pricing model?
1) It is a cluster (N engine VMs + 1 Data Management VM + classic IaaS networking)
2) Billed per minute (if you don't use it, stop it!)
3) Honest pricing (if you stop the VMs, the ADX markup charges stop too)
4) The markup is proportional to the number of engine VMs (the single DM VM is billed "naked", without markup)
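The billing shape in points 1–4 can be sketched as a toy cost model. All parameter names and rates here are hypothetical, purely to make the structure concrete:

```python
def adx_hourly_cost(engine_vm_rate: float, n_engine_vms: int,
                    dm_vm_rate: float, markup_per_engine_vm: float,
                    running: bool) -> float:
    """Toy model of the slide's rules: the markup scales with the number of
    engine VMs, the single DM VM carries no markup, and stopping the cluster
    stops both VM and markup charges (storage/network billing is ignored here)."""
    if not running:
        return 0.0
    return (engine_vm_rate * n_engine_vms
            + dm_vm_rate
            + markup_per_engine_vm * n_engine_vms)
```

For real numbers, use the cost estimator linked on the next slide.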
Developer VM (QPU) — Development: only for query tests or analytics-automation development
Compute Optimized (vCPU) — Production: workloads that need a high rate of queries over a smaller data size
Storage Optimized (vCPU) — Production: workloads that need fewer queries over a large volume of data
(For production SKUs, Azure Data Explorer will also charge for storage and networking charges incurred.)
How about the «REAL» pricing model?
Pricing considerations:
• More RI (Reserved Instance) savings on the D series
• More storage space on the L series
• Doubled DM price on the L series (2x VMs, with 2x resources)
https://dataexplorer.azure.com/AzureDataExplorerCostEstimator.html
• If you want to pay little money: think of ADX as an ANALYSIS TOOL in a multi-tenant environment. Your spending will be a mix of ADX and ADF / Power BI costs.
• If you can afford a space shuttle*: think of ADX as an INGESTION & RESILIENCY TOOL to break through your traditional «Live DWH». Your spending will be a mix of ADX, compute & event services.
* Space shuttle definition: ADX can scale up to 500 VMs => 8,000 cores x 64 TB RAM….. and 1M
Available SKUs
• Small SKUs: D — minimal size is D11 with two cores; L — minimal size is L4 with four cores
• Availability: D — available in all regions (the DS+PS version has more limited availability); L — available in a few regions
• Cost per GB cache per core: D — high with the D SKU, low with the DS+PS version; L — lowest with the Pay-As-You-Go option
• Reserved Instances (RI) pricing: D — high discount (over 55 percent for a three-year commitment); L — lower discount (20 percent for a three-year commitment)
• D v2: the D SKU is compute-optimized (optional Premium Storage disk)
• LS: the L SKU is storage-optimized (greater SSD size than the equivalent D SKU)
First questions about ADX
It is normal to evaluate a new Azure service in terms of maturity, applicability and affordability. So:
A) Are we sure that ADX is a mature service?
B) What are the correct use cases where it is really useful?
C) Which OSS alternatives should we compare it with?
A) Are we sure that ADX is a mature Service?
Telemetry analytics for internal use, then analytics data platform for products: AI (Application Insights), OMS, ASC, Defender, IoT — interactive analytics on a big data platform.
2015–2016: starting with 1st-party validation; building modern analytics; vision of an analytics platform for MSFT
2017–2019: analytics engine for 3rd-party offers; unified platform across OMS/AI; expanded scenarios for IoT time series; bridged across client/server security
GA: February 2019
B) What are the correct use cases where it is really useful?
SCENARIO / USE CASE / USER STORY
• Asset management, troubleshooting scenario: you need a telemetry analytics platform to retrieve aggregations or statistical calculations on historical series. («As an IT Manager», I want a platform to load logs from various file types, in order to analyze them and graphically pinpoint the problem over time.)
• IoT SaaS solution, logistics SaaS solution: you want to offer multi-tenant SaaS solutions. («As a Product Lead Engineer», I want to manage the backend of my multi-tenant SaaS solution using a unique, fat, huge backend service.)
• IIoT quality management: within an Industrial IoT solution development, you need a common backend to handle process variables and run correlation analysis using continuous stream queries. («As a Quality Manager», I need a prebuilt backend solution to dynamically configure time-based queries on data, in order to find out correlations between process variables.)
C) Which OSS alternatives should we compare it with?
From db-engines.com:
• Azure Data Explorer — fully managed big data interactive analytics platform
• Elasticsearch — a distributed, RESTful modern search and analytics engine
• Splunk — real-time insights engine to boost productivity & security
• InfluxDB — DBMS for storing time series, events and metrics
ADX can be a replacement for search and log analytics engines such as Elasticsearch, Splunk and InfluxDB.
Comparison chart: Elasticsearch (ELASTIC), InfluxDB (InfluxData Inc.), Azure Data Explorer (Microsoft), Splunk (Splunk Inc.)
• Description: ES — a distributed, RESTful modern search and analytics engine based on Apache Lucene; InfluxDB — DBMS for storing time series, events and metrics; ADX — fully managed big data interactive analytics platform; Splunk — analytics platform for big data
• Database models: ES — search engine, document store; InfluxDB — time series DBMS; ADX — time series DBMS, search engine, document store, event store, relational DBMS; Splunk — search engine
• Initial release: ES 2010; InfluxDB 2013; ADX 2019; Splunk 2003
• License: ES — open source; InfluxDB — open source; ADX — commercial; Splunk — commercial
• Cloud-based only: ES no; InfluxDB no; ADX yes; Splunk no
• Implementation language: ES — Java; InfluxDB — Go
• Server operating systems: ES — all OS with a Java VM; InfluxDB — Linux, OS X; ADX — hosted; Splunk — Linux, OS X, Solaris, Windows
• Data scheme: ES — schema-free; InfluxDB — schema-free; ADX — fixed schema with schema-less datatypes (dynamic); Splunk — yes
• Typing: ES yes; InfluxDB — numeric data and strings; ADX yes; Splunk yes
• XML support: ES no; InfluxDB no; ADX yes; Splunk yes
• Secondary indexes: ES yes; InfluxDB no; ADX — all fields are automatically indexed; Splunk yes
• SQL: ES — SQL-like query language; InfluxDB — SQL-like query language; ADX — Kusto Query Language (KQL), SQL subset; Splunk no
• APIs and other access methods: ES — RESTful HTTP/JSON API, Java API; InfluxDB — HTTP API, JSON over UDP; ADX — RESTful HTTP API, Microsoft SQL Server communication protocol (MS-TDS); Splunk — HTTP REST
• Supported programming languages: all four support .Net, Java, JavaScript and Python; in addition ES — Ruby, PHP, Perl, Groovy, community-contributed clients; InfluxDB — R, Ruby, PHP, Perl, Haskell, Clojure, Erlang, Go, Lisp, Rust, Scala; ADX — R, PowerShell; Splunk — Ruby, PHP
• Server-side scripts: ES yes; InfluxDB no; ADX — yes, possible languages: KQL, Python, R; Splunk yes
• Triggers: ES yes; InfluxDB no; ADX yes; Splunk yes
• Partitioning methods: sharding for all four
• Replication methods: ES yes; InfluxDB — selectable replication factor; ADX yes; Splunk — master-master replication
• MapReduce: ES — ES-Hadoop connector; InfluxDB no; ADX no; Splunk yes
• Consistency concepts: ES — eventual consistency; InfluxDB — eventual consistency; ADX — eventual and immediate consistency (selectable query consistency)
• Foreign keys: no for all four
• Transaction concepts: no for all four
• Concurrency: yes for all four
• Durability: yes for all four
• In-memory capabilities: ES — Memcached and Redis integration; InfluxDB yes; ADX no; Splunk no
• User concepts: ES — simple rights management via user accounts; ADX — Azure Active Directory authentication; Splunk — access rights for users and roles
Why ADX is Unique
Simplified costs:
• VM costs
• ADX service add-on cost
Many prebuilt inputs:
• ADF
• Spark
• Logstash
• Kafka
• IoT Hub
• Event Hub
Many prebuilt outputs:
• TDS/SQL
• Power BI
• ODBC connector
• Spark
• Jupyter
• Grafana
Azure services with ADX usage
Azure Monitor
• Log Analytics
• Application Insights
Security Products
• Windows Defender
• Azure Security Center
• Azure Sentinel
IoT
• Time Series Insights
• Azure IoT Central
ADX QUICK START
1. Sign in to the Azure Portal, search for 'Azure Data Explorer cluster' and click Create.
2. Fill in the name of the database, the retention and cache periods in days, and hit the Create button.
3. Click Create data connection to load/ingest data into the database you just created, from Event Hub, IoT Hub, Blob Storage, the Kafka connector or Azure Data Factory.
Then go to https://dataexplorer.azure.com/
The first command creates a table; the second command ingests data from a csv file into that table (example).
How to use it
How to set up ADX «really»?
Create a database
Use the database to link ingestion sources (DataConnection)
Install the Kusto extension
az extension add -n kusto
Log in
az login
Select the subscription
az account set --subscription MyAzureSub
Cluster creation
az kusto cluster create --name azureclitest --sku D11_V2 --capacity 2 --resource-group testrg
Database creation
az kusto database create --cluster-name azureclitest --name clidatabase --resource-group testrg --soft-delete-period P365D --hot-cache-period P31D
HOT-CACHE-PERIOD: amount of time that data should be kept in cache. Duration in ISO 8601 format (for example, 100 days would be P100D).
SOFT-DELETE-PERIOD: amount of time that data should be kept so it is available to query. Duration in ISO 8601 format (for example, 100 days would be P100D).
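The P365D / P31D values above are ISO 8601 durations; a minimal, day-granularity-only parser to illustrate the format (real ISO 8601 durations can carry years, months, hours, etc. — this sketch deliberately rejects those):

```python
import re

def iso8601_days(duration: str) -> int:
    """Parse a day-only ISO 8601 duration such as 'P100D' into a number of days."""
    m = re.fullmatch(r"P(\d+)D", duration)
    if m is None:
        raise ValueError(f"not a day-only ISO 8601 duration: {duration!r}")
    return int(m.group(1))
```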
Like a Newbie
Like a Pro
Configuration
How to set and use ADX «really»?
Or like a BOSS:
FIRST: after retrieving an Azure identity, create the cluster (with the async method)
SECOND: check for a Succeeded provisioning state
THIRD: create the database
..LAST: do the job, then remove all traces [like a Killer]
How to script with ADX «really»?
Like a Newbie:
• Open https://dataexplorer.azure.com/
• Connect your cluster
• Use the online editor to test queries
• Save KQL files using
Like a Pro:
• Open VSC, create a file, save it and then edit it
• Use the Log Analytics or KUSTO/KQL extensions (.csl | .kusto | .kql)
• Use the Kuskus plugins (4) to highlight the grammar (Kuskus uses TextMate grammar compliance)
How to script with ADX «really»?
Or like a BOSS Dev:
Make ADX part of a bigger, collaborative workflow. Make your own web application:
• Build an analysis workflow with creation of temporary clusters, collaborative workspaces, a tracked history of the analyses done, and scoring of people's usage
• Make your application secure
• Make your results exportable natively to your databases
• Serve Power BI «for humans»
<iframe src="https://dataexplorer.azure.com/clusters/<cluster>?ibizaPortal=true" ></iframe>
https://microsoft.github.io/monaco-editor/index.html
How to ingest data in ADX «really»?
Like a Newbie:
• Open https://dataexplorer.azure.com/
• Right-click on the DB, Ingest new data (preview)
Like a Pro — use LIGHT INGEST:
• command-line utility for ad-hoc data ingestion into Kusto
• pulls source data from a local folder or an Azure Blob Storage container
[Ingest JSON data from blobs]
LightIngest "https://adxclu001.kusto.windows.net;Federated=true"
-database:db001
-table:LAB
-sourcePath:"<Blob Storage account with container SAS token>"
-prefix:MyDir1/MySubDir2
-format:json
-mappingRef:DefaultJsonMapping
-pattern:*.json
-limit:100
[Ingest CSV data with headers from local files]
LightIngest "https://adxclu001.kusto.windows.net;Federated=true"
-database:MyDb
-table:MyTable
-sourcePath:"D:\MyFolder\Data"
-format:csv
-ignoreFirstRecord:true
-mappingPath:"D:\MyFolder\CsvMapping.txt"
-pattern:*.csv.gz
-limit:100
https://docs.microsoft.com/en-us/azure/kusto/tools/lightingest
How to ingest data in ADX «really»?
Or like a BOSS Dev:
• Use a Data Lake to store files
• Use Event Grid in order to have a trigger platform
• Write a Function … and that's it!
How to visualize data in ADX «really»?
Like a Newbie:
• Open https://dataexplorer.azure.com/ and use «RENDER»
• Download Kusto Explorer and click the right chart
Like a Pro:
• Use the Power BI connector (Direct Query or Import modes)
• Use the Excel connector
How to visualize data in ADX «really»?
Or like a BOSS «Dev»: use VSC as a Jupyter Notebook IDE with «KQLMagic», and use Python as your Swiss Army knife for everything you don't want «properly developed».
KQLMagic extends the capabilities of the Python kernel in Jupyter, so you can run Kusto-language queries natively, combining Python and the Kusto query language.
https://github.com/microsoft/jupyter-Kqlmagic
Advanced Hints
ADVANCED INGESTION CAPABILITIES
Connect edge to cloud through a secure chain:
Path no. 1 — Local log streaming: a KAFKA cluster as a log streamer, equipped with a KUSTO SINK that writes a direct stream
Path no. 2 — Remote log streaming: connect KAFKA directly to Azure Event Hub, and then ingest with the built-in process (preview)
Path no. 3 — Local pipeline: Logstash as a local dynamic data-collection pipeline with an extensible plugin ecosystem
Path no. 4 — Remote pipeline: use the Integration Runtime to link on-premises data using an Azure Data Factory Dataflow
Ingestion Techniques
Batch ingestion (provided by SDK): for high-volume, reliable, and cheap data ingestion. The client uploads the data to Azure Blob Storage (designated by the Azure Data Explorer data management service) and posts a notification to an Azure Queue. Batch ingestion is the recommended technique.
Inline ingestion (provided by query tools): most appropriate for exploration and prototyping.
• Inline ingestion: a control command (.ingest inline) containing in-band data, intended for ad-hoc testing purposes.
• Ingest from query: a control command (.set, .set-or-append, .set-or-replace) that points to query results, used for generating reports or small temporary tables.
• Ingest from storage: a control command (.ingest into) with data stored externally (for example, Azure Blob Storage), which allows efficient bulk ingestion of data.
Supported data formats
The supported data formats are:
- CSV, TSV, TSVE, PSV, SCSV, SOH
- JSON (line-separated, multi-line), Avro
- ZIP and GZIP
• CSV mapping (optional): works with all ordinal-based formats. It can be passed via the ingest command parameter or pre-created on the table and referenced from the ingest command parameter.
• JSON mapping (mandatory) and Avro mapping (mandatory): can be passed via the ingest command parameter. They can also be pre-created on the table and referenced from the ingest command parameter.
For all ingestion methods other than ingest-from-query, YOU HAVE TO FORMAT THE DATA so that Azure Data Explorer can parse it. Schema mapping helps bind source data fields to destination table columns.
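For JSON ingestion, a mapping is essentially a list of column/JSON-path pairs. A small helper to generate one (the "column"/"path" keys follow the shape of ADX JSON mappings, but treat this as a sketch and check the mapping reference for your service version):

```python
import json

def build_json_mapping(columns):
    """columns: iterable of (column_name, json_path) pairs,
    e.g. ("Timestamp", "$.ts"). Returns the mapping as a JSON string."""
    return json.dumps([{"column": name, "path": path} for name, path in columns])

mapping = build_json_mapping([("Timestamp", "$.ts"), ("Message", "$.msg")])
```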
How about orchestration?
Three use cases in which FLOW + KUSTO are the solution:
• Push data to a Power BI dataset: periodically run queries and push the results to a Power BI dataset
• Conditional queries: run data checks and send notifications with no code
• Email multiple ADX Flow charts: send incredible emails with an HTML5 chart as the query result
Use ADX with applications not natively supported by a connector
Use ODBC to connect to Azure Data Explorer from applications that don't have a dedicated connector. Azure Data Explorer supports a subset of the SQL Server communication protocol.
Then you can use any tool you prefer that supports ODBC connections.
Use ODBC Driver 17 → connect to the ADX cluster → select the database → test the connection
IMPORTANT NOTES:
• T-SQL SELECT statements are supported (sub-queries are not)
• Since it behaves like a SQL Server with AD authentication, you can use LINQPad, Azure Data Studio, Power BI Desktop, Tableau, Excel, DBeaver and SSMS (they're all MS-TDS clients).
• SQL Server on-premises allows you to attach it as a linked server, so you can use "ADX as SQL"
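A connection string for this scenario can be assembled like this. The attribute names are standard ODBC Driver 17 keywords, but the cluster host and database are placeholders — verify the exact Authentication mode against your driver version:

```python
def adx_odbc_connection_string(cluster_host: str, database: str) -> str:
    """Build an ODBC Driver 17 connection string for an ADX cluster,
    using Azure AD integrated authentication (sketch, not official guidance)."""
    return (
        "Driver={ODBC Driver 17 for SQL Server};"
        f"Server={cluster_host};"
        f"Database={database};"
        "Authentication=ActiveDirectoryIntegrated;"
    )

conn_str = adx_odbc_connection_string("adxclu001.kusto.windows.net", "db001")
```

You would then pass `conn_str` to `pyodbc.connect(...)` or any other ODBC-capable client.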
From T-SQL to KQL and vice versa
Kusto supports translating T-SQL queries to the Kusto query language.
Is translated into
Some example
NOTE: remember the dashes ( !#£$%!! )
Deep dive
ADX Functions
Functions are reusable queries or query parts. Kusto supports several kinds of functions:
• Stored functions: user-defined functions that are stored and managed as one kind of the database's schema entities. See Stored functions.
• Query-defined functions: user-defined functions that are defined and used within the scope of a single query. Such functions are defined through a let statement. See User-defined functions.
• Built-in functions: hard-coded functions (defined by Kusto and not modifiable by users).
Language examples
Alias
database["wiki"] = cluster("https://somecluster.kusto.windows.net:443").database("somedatabase");
database("wiki").PageViews | count
Let:
let start = ago(5h);
let period = 2h;
T | where Time > start and Time < start + period | ...
Bin:
T | summarize Hits=count() by bin(Duration, 1s)
Batch:
let m = materialize(StormEvents | summarize n=count() by State);
m | where n > 2000; m | where n < 10
Tabular expression:
Logs
| where Timestamp > ago(1d)
| join ( Events | where continent == 'Europe' ) on RequestId
Time Series Analysis – Bin Operator
bin operator: rounds values down to an integer multiple of a given bin size. If you have a scattered set of values, they will be grouped into a smaller set of specific values.
[Rule]
bin(value, roundTo)
[Example]
T | summarize Hits=count() by bin(Duration, 1s)
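In Python terms, bin() is just floor division scaled back up — an emulation for intuition, not ADX's actual implementation:

```python
def kusto_bin(value: float, round_to: float) -> float:
    """Round value down to the nearest integer multiple of round_to,
    mimicking KQL's bin(value, roundTo)."""
    return (value // round_to) * round_to

# scattered durations grouped into 1-second bins
bins = [kusto_bin(v, 1.0) for v in (0.2, 0.9, 1.1, 2.7)]  # [0.0, 0.0, 1.0, 2.0]
```

Note that, like KQL's bin, this rounds toward negative infinity, so negative values land in the bin below zero.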
Time Series Analysis – Make Series Operator
make-series operator
[Rule]
T | make-series [MakeSeriesParameters] [Column =] Aggregation [default = DefaultValue] [, ...] on AxisColumn from start to end step step [by [Column =] GroupExpression [, ...]]
[Example]
T | make-series sum(amount) default=0, avg(price) default=0 on timestamp from datetime(2016-01-01) to datetime(2016-01-10) step 1d by supplier
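Conceptually, make-series materializes a regular time axis and fills missing buckets with the declared default. An illustrative Python emulation of that idea (not the operator itself):

```python
from datetime import datetime, timedelta

def make_series(values_by_bucket, start, end, step, default=0):
    """Walk a regular time axis from start (inclusive) to end (exclusive),
    taking the aggregated value for each bucket, or `default` when the
    bucket has no data — like make-series' default= clause."""
    axis, series = [], []
    t = start
    while t < end:
        axis.append(t)
        series.append(values_by_bucket.get(t, default))
        t += step
    return axis, series
```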
Time Series Analysis – Basket Operator
basket operator: basket finds all frequent patterns of discrete attributes (dimensions) in the data, and returns all frequent patterns that passed the frequency threshold in the original query.
[Rule]
T | evaluate basket([Threshold, WeightColumn, MaxDimensions, CustomWildcard, CustomWildcard, ...])
[Example]
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State, EventType, Damage, DamageCrops
| evaluate basket(0.2)
Time Series Analysis – Autocluster Operator
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" ,
"NO")
| project State , EventType , Damage
| evaluate autocluster(0.6)
autocluster operator
AutoCluster finds common patterns of discrete attributes (dimensions) in the data and will reduce the results
of the original query (whether it's 100 or 100k rows) to a small number of patterns.
[Rule]
[Example]
T | evaluate autocluster([SizeWeight, WeightColumn, NumSeeds, CustomWildcard, CustomWildcard, ...])
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State , EventType , Damage
| evaluate autocluster(0.2, '~', '~', '*')
Export
To Storage:
.export async compressed to csv (
    h@"https://storage1.blob.core.windows.net/containerName;secretKey",
    h@"https://storage1.blob.core.windows.net/containerName2;secretKey"
) with ( sizeLimit=100000, namePrefix=export, includeHeaders=all, encoding=UTF8NoBOM )
<| myLogs | where id == "moshe" | limit 10000
To SQL:
.export async to sql ['dbo.MySqlTable']
h@"Server=tcp:myserver.database.windows.net,1433;Database=MyDatabase;Authentication=Active Directory Integrated;Connection Timeout=30;"
with (createifnotexists="true", primarykey="Id")
<| print Message = "Hello World!", Timestamp = now(), Id=12345678
1. DEFINE COMMAND: define the ADX command and try out your recurrent export strategy
2. TRY IN EDITOR: use an editor to try the command, verifying connection strings and parameterizing them
3. BUILD A JOB: build a notebook or a C# job using the command as a SQL QUERY in your code
External tables & Continuous Export
An external table is an external endpoint:
• Azure Storage
• Azure Data Lake Store
• SQL Server
You need to define:
• the destination
• the continuous-export strategy
EXT TABLE CREATION
.create external table ExternalAdlsGen2 (Timestamp:datetime, x:long, s:string)
kind=adl partition by bin(Timestamp, 1d) dataformat=csv
( h@'abfss://filesystem@storageaccount.dfs.core.windows.net/path;secretKey' )
with ( docstring = "Docs", folder = "ExternalTables", namePrefix="Prefix" )
EXPORT to EXT TABLE
.create-or-alter continuous-export MyExport over (T) to table ExternalAdlsGen2
with (intervalBetweenRuns=1h, forcedLatency=10m, sizeLimit=104857600)
<| T
FACTS:
A) Kusto stores its ingested data in reliable storage (most commonly Azure Blob Storage).
B) To speed up queries on that data, Kusto caches it (or parts of it) on its processing nodes.
Cache policy
The Kusto cache provides a granular cache policy that customers can use to differentiate between two data cache policies: hot data cache and cold data cache.
YOU CAN SPECIFY WHICH LOCATION MUST BE USED:
set query_datascope="hotcache";
T | union U | join (T datascope=all | where Timestamp < ago(365d)) on X
The cache policy is independent of the retention policy!
Retention policy
Two parameters, applicable to a database or a table:
• Soft Delete Period (number): how long data is available for query, counted from the ADX ingestion date (ts). Default is 100 YEARS.
• Recoverability (enabled/disabled): default is ENABLED; deleted data stays recoverable for 14 days after deletion.
Use KUSTO to set KUSTO:
.alter database DatabaseName policy retention "{}"
.alter table TableName policy retention "{}"
EXAMPLE:
{ "SoftDeletePeriod": "36500.00:00:00", "Recoverability": "Enabled" }
.delete database DatabaseName policy retention
.delete table TableName policy retention
.alter-merge table MyTable1 policy retention softdelete = 7d
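The retention-policy JSON can be generated rather than hand-written. A small helper — the `d.hh:mm:ss` timespan format matches the 36500-day example on this slide:

```python
import json

def retention_policy(soft_delete_days: int, recoverable: bool = True) -> str:
    """Build the JSON body for `.alter ... policy retention "{...}"`."""
    return json.dumps({
        "SoftDeletePeriod": f"{soft_delete_days}.00:00:00",
        "Recoverability": "Enabled" if recoverable else "Disabled",
    })
```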
Data Purge
The purge process is final and irreversible.
PURGE PROCESS (2-step):
1. It requires database admin permissions.
2. Prior to purging, you have to be ENABLED by opening a SUPPORT TICKET.
3. Run the purge QUERY to identify the SIZE and EXECUTION TIME, and obtain a VerificationToken.
4. Run the purge QUERY again, passing the VerificationToken.
.purge table MyTable records in database MyDatabase <| where CustomerId in ('X', 'Y')
Example response:
NumRecordsToPurge: 1,596
EstimatedPurgeExecutionTime: 00:00:02
VerificationToken: e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b
.purge table MyTable records in database MyDatabase with (verificationtoken='e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b') <| where CustomerId in ('X', 'Y')
1-STEP PROCESS — with no regrets!!!!
.purge table MyTable records in database MyDatabase with (noregrets='true')
KUSTO: Do and Don't
1. DO analytics over Big Data.
2. DO support entities such as databases, tables, and columns.
3. DO support complex analytics query operators (calculated columns, filtering, group by, joins).
BUT DO NOT perform in-place updates!
UC1: Event Correlation
Get sessions from start and stop events
Let's suppose we have a log of events, in which some events mark the start or end of an extended activity or session. Every event has a SessionId, so the problem is to match up the start and stop events with the same id.
Input:
Name | City | SessionId | Timestamp
Start | London | 2817330 | 2015-12-09T10:12:02.32
Game | London | 2817330 | 2015-12-09T10:12:52.45
Start | Manchester | 4267667 | 2015-12-09T10:14:02.23
Stop | London | 2817330 | 2015-12-09T10:23:43.18
Cancel | Manchester | 4267667 | 2015-12-09T10:27:26.29
Stop | Manchester | 4267667 | 2015-12-09T10:28:31.72
Result:
City | SessionId | StartTime | StopTime | Duration
London | 2817330 | 2015-12-09T10:12:02.32 | 2015-12-09T10:23:43.18 | 00:11:40.86
Manchester | 4267667 | 2015-12-09T10:14:02.23 | 2015-12-09T10:28:31.72 | 00:14:29.49
Kusto
let Events = MyLogTable | where ... ;
Events
| where Name == "Start"
| project Name, City, SessionId, StartTime=timestamp
| join (Events | where Name == "Stop" | project StopTime=timestamp, SessionId) on SessionId
| project City, SessionId, StartTime, StopTime, Duration = StopTime - StartTime
Use let to name a projection of the table that is pared down as far as possible before going into the join.
project is used to change the names of the timestamps so that both the start and stop times can appear in the result. It also selects the other columns we want to see in the result.
join matches up the start and stop entries for the same activity, creating a row for each activity. Finally, project again adds a column to show the duration of the activity.
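The same start/stop matching can be sketched in plain Python to see what the join is doing — an emulation for intuition, using the sample rows above:

```python
from datetime import datetime

def sessions(events):
    """Match Start/Stop events that share a SessionId, like the KQL join."""
    starts = {e["SessionId"]: e for e in events if e["Name"] == "Start"}
    result = []
    for e in events:
        if e["Name"] == "Stop" and e["SessionId"] in starts:
            s = starts[e["SessionId"]]
            result.append({
                "City": s["City"],
                "SessionId": s["SessionId"],
                "StartTime": s["Timestamp"],
                "StopTime": e["Timestamp"],
                "Duration": e["Timestamp"] - s["Timestamp"],
            })
    return result

events = [
    {"Name": "Start", "City": "London", "SessionId": 2817330,
     "Timestamp": datetime(2015, 12, 9, 10, 12, 2, 320000)},
    {"Name": "Stop", "City": "London", "SessionId": 2817330,
     "Timestamp": datetime(2015, 12, 9, 10, 23, 43, 180000)},
]
```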
UC2: In-Place Enrichment
Creating and using query-time dimension tables
In many cases one wants to join the results of a query with an ad-hoc dimension table that is not stored in the database. It is possible to define an expression whose result is a table scoped to a single query, like this:
Kusto
// Create a query-time dimension table using datatable
let DimTable = datatable(EventType:string, Code:string) [ "Heavy Rain", "HR", "Tornado", "T" ];
DimTable
| join StormEvents on EventType
| summarize count() by Code
Brief Summary: Azure Data Explorer
Easy to ingest the data and easy to query the data.
Ingestion APIs (streaming and bulk): Python SDK, .NET SDK, Java SDK, REST API — via Event Hub, IoT Hub, Blob & Azure Queue (queued ingestion), Blob & Event Grid, or direct ingestion.
Query APIs and protocols: REST API, MS-TDS.
UX: Web UI, Desktop App, Jupyter Magic, Monaco IDE, Azure Notebooks; SDKs for .NET, Python, Java, JavaScript.
Connectors: Power BI Direct Query, Microsoft Flow, Azure Logic App connectors, Grafana, ADF.
Thank you to
our Local
Sponsors and
Supporters
Thank you
Remember to fill out the feedback form:
https://speakerscore.com/xxx
#SqlSat921
Azure Data Explorer deep dive - review 04.2020
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Diving into Delta Lake: Unpacking the Transaction Log
Diving into Delta Lake: Unpacking the Transaction LogDiving into Delta Lake: Unpacking the Transaction Log
Diving into Delta Lake: Unpacking the Transaction Log
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
 
Azure Storage
Azure StorageAzure Storage
Azure Storage
 
Introduction to Azure Data Lake
Introduction to Azure Data LakeIntroduction to Azure Data Lake
Introduction to Azure Data Lake
 

Similar to Azure Data Explorer deep dive - review 04.2020

Azure satpn19 time series analytics with azure adx
Azure satpn19   time series analytics with azure adxAzure satpn19   time series analytics with azure adx
Azure satpn19 time series analytics with azure adxRiccardo Zamana
 
Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric IntroductionJames Serra
 
Building a Real-Time IoT monitoring application with Azure
Building a Real-Time IoT monitoring application with AzureBuilding a Real-Time IoT monitoring application with Azure
Building a Real-Time IoT monitoring application with AzureDavide Mauri
 
[「RDB技術者のためのNoSQLガイド」出版記念セミナー] Azure DocumentDB
[「RDB技術者のためのNoSQLガイド」出版記念セミナー] Azure DocumentDB[「RDB技術者のためのNoSQLガイド」出版記念セミナー] Azure DocumentDB
[「RDB技術者のためのNoSQLガイド」出版記念セミナー] Azure DocumentDBNaoki (Neo) SATO
 
High-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutionsHigh-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutionsClusterpoint
 
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...Amazon Web Services
 
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...Amazon Web Services
 
Technology Overview
Technology OverviewTechnology Overview
Technology OverviewLiran Zelkha
 
Azure Overview Csco
Azure Overview CscoAzure Overview Csco
Azure Overview Cscorajramab
 
Microsoft Build 2020: Data Science Recap
Microsoft Build 2020: Data Science RecapMicrosoft Build 2020: Data Science Recap
Microsoft Build 2020: Data Science RecapMark Tabladillo
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationMatthew W. Bowers
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on AzureTrivadis
 
Analytics in a Day Ft. Synapse Virtual Workshop
Analytics in a Day Ft. Synapse Virtual WorkshopAnalytics in a Day Ft. Synapse Virtual Workshop
Analytics in a Day Ft. Synapse Virtual WorkshopCCG
 
Big data on_aws in korea by abhishek sinha (lunch and learn)
Big data on_aws in korea by abhishek sinha (lunch and learn)Big data on_aws in korea by abhishek sinha (lunch and learn)
Big data on_aws in korea by abhishek sinha (lunch and learn)Amazon Web Services Korea
 
How does Microsoft solve Big Data?
How does Microsoft solve Big Data?How does Microsoft solve Big Data?
How does Microsoft solve Big Data?James Serra
 
Big Data on Azure Tutorial
Big Data on Azure TutorialBig Data on Azure Tutorial
Big Data on Azure Tutorialrustd
 
1 Introduction to Microsoft data platform analytics for release
1 Introduction to Microsoft data platform analytics for release1 Introduction to Microsoft data platform analytics for release
1 Introduction to Microsoft data platform analytics for releaseJen Stirrup
 
Financial Services Analytics on AWS
Financial Services Analytics on AWSFinancial Services Analytics on AWS
Financial Services Analytics on AWSAmazon Web Services
 
Data Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBData Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBconfluent
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptxFedoRam1
 

Similar to Azure Data Explorer deep dive - review 04.2020 (20)

Azure satpn19 time series analytics with azure adx
Azure satpn19   time series analytics with azure adxAzure satpn19   time series analytics with azure adx
Azure satpn19 time series analytics with azure adx
 
Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric Introduction
 
Building a Real-Time IoT monitoring application with Azure
Building a Real-Time IoT monitoring application with AzureBuilding a Real-Time IoT monitoring application with Azure
Building a Real-Time IoT monitoring application with Azure
 
[「RDB技術者のためのNoSQLガイド」出版記念セミナー] Azure DocumentDB
[「RDB技術者のためのNoSQLガイド」出版記念セミナー] Azure DocumentDB[「RDB技術者のためのNoSQLガイド」出版記念セミナー] Azure DocumentDB
[「RDB技術者のためのNoSQLガイド」出版記念セミナー] Azure DocumentDB
 
High-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutionsHigh-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutions
 
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ...
 
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
 
Technology Overview
Technology OverviewTechnology Overview
Technology Overview
 
Azure Overview Csco
Azure Overview CscoAzure Overview Csco
Azure Overview Csco
 
Microsoft Build 2020: Data Science Recap
Microsoft Build 2020: Data Science RecapMicrosoft Build 2020: Data Science Recap
Microsoft Build 2020: Data Science Recap
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar Presentation
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on Azure
 
Analytics in a Day Ft. Synapse Virtual Workshop
Analytics in a Day Ft. Synapse Virtual WorkshopAnalytics in a Day Ft. Synapse Virtual Workshop
Analytics in a Day Ft. Synapse Virtual Workshop
 
Big data on_aws in korea by abhishek sinha (lunch and learn)
Big data on_aws in korea by abhishek sinha (lunch and learn)Big data on_aws in korea by abhishek sinha (lunch and learn)
Big data on_aws in korea by abhishek sinha (lunch and learn)
 
How does Microsoft solve Big Data?
How does Microsoft solve Big Data?How does Microsoft solve Big Data?
How does Microsoft solve Big Data?
 
Big Data on Azure Tutorial
Big Data on Azure TutorialBig Data on Azure Tutorial
Big Data on Azure Tutorial
 
1 Introduction to Microsoft data platform analytics for release
1 Introduction to Microsoft data platform analytics for release1 Introduction to Microsoft data platform analytics for release
1 Introduction to Microsoft data platform analytics for release
 
Financial Services Analytics on AWS
Financial Services Analytics on AWSFinancial Services Analytics on AWS
Financial Services Analytics on AWS
 
Data Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBData Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDB
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptx
 

More from Riccardo Zamana

Copilot Prompting Toolkit_All Resources.pdf
Copilot Prompting Toolkit_All Resources.pdfCopilot Prompting Toolkit_All Resources.pdf
Copilot Prompting Toolkit_All Resources.pdfRiccardo Zamana
 
At the core you will have KUSTO
At the core you will have KUSTOAt the core you will have KUSTO
At the core you will have KUSTORiccardo Zamana
 
Data saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overviewData saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overviewRiccardo Zamana
 
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...Riccardo Zamana
 
Azure Industrial Iot Edge
Azure Industrial Iot EdgeAzure Industrial Iot Edge
Azure Industrial Iot EdgeRiccardo Zamana
 
Industrial iot: dalle parole ai fatti
Industrial iot: dalle parole ai fatti Industrial iot: dalle parole ai fatti
Industrial iot: dalle parole ai fatti Riccardo Zamana
 
Azure dayroma java, il lato oscuro del cloud
Azure dayroma   java, il lato oscuro del cloudAzure dayroma   java, il lato oscuro del cloud
Azure dayroma java, il lato oscuro del cloudRiccardo Zamana
 
Industrial Iot - IotSaturday
Industrial Iot - IotSaturday Industrial Iot - IotSaturday
Industrial Iot - IotSaturday Riccardo Zamana
 

More from Riccardo Zamana (11)

Copilot Prompting Toolkit_All Resources.pdf
Copilot Prompting Toolkit_All Resources.pdfCopilot Prompting Toolkit_All Resources.pdf
Copilot Prompting Toolkit_All Resources.pdf
 
At the core you will have KUSTO
At the core you will have KUSTOAt the core you will have KUSTO
At the core you will have KUSTO
 
Data saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overviewData saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overview
 
MCT Virtual Summit 2021
MCT Virtual Summit 2021MCT Virtual Summit 2021
MCT Virtual Summit 2021
 
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd...
 
Azure Industrial Iot Edge
Azure Industrial Iot EdgeAzure Industrial Iot Edge
Azure Industrial Iot Edge
 
Industrial iot: dalle parole ai fatti
Industrial iot: dalle parole ai fatti Industrial iot: dalle parole ai fatti
Industrial iot: dalle parole ai fatti
 
Azure dayroma java, il lato oscuro del cloud
Azure dayroma   java, il lato oscuro del cloudAzure dayroma   java, il lato oscuro del cloud
Azure dayroma java, il lato oscuro del cloud
 
Industrial Iot - IotSaturday
Industrial Iot - IotSaturday Industrial Iot - IotSaturday
Industrial Iot - IotSaturday
 
Azure reactive systems
Azure reactive systemsAzure reactive systems
Azure reactive systems
 
Industrial IoT on azure
Industrial IoT on azureIndustrial IoT on azure
Industrial IoT on azure
 

Recently uploaded

Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Salam Al-Karadaghi
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Hasting Chen
 
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrSaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrsaastr
 
Motivation and Theory Maslow and Murray pdf
Motivation and Theory Maslow and Murray pdfMotivation and Theory Maslow and Murray pdf
Motivation and Theory Maslow and Murray pdfakankshagupta7348026
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Delhi Call girls
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024eCommerce Institute
 
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Kayode Fayemi
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...NETWAYS
 
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...NETWAYS
 
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Pooja Nehwal
 
LANDMARKS AND MONUMENTS IN NIGERIA.pptx
LANDMARKS  AND MONUMENTS IN NIGERIA.pptxLANDMARKS  AND MONUMENTS IN NIGERIA.pptx
LANDMARKS AND MONUMENTS IN NIGERIA.pptxBasil Achie
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AITatiana Gurgel
 
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )Pooja Nehwal
 
call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@vikas rana
 
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfCTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfhenrik385807
 
Philippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptPhilippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptssuser319dad
 
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝soniya singh
 
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...NETWAYS
 
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxGenesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxFamilyWorshipCenterD
 

Recently uploaded (20)

Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
 
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrSaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
 
Motivation and Theory Maslow and Murray pdf
Motivation and Theory Maslow and Murray pdfMotivation and Theory Maslow and Murray pdf
Motivation and Theory Maslow and Murray pdf
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
 
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
 
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
 
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
 
LANDMARKS AND MONUMENTS IN NIGERIA.pptx
LANDMARKS  AND MONUMENTS IN NIGERIA.pptxLANDMARKS  AND MONUMENTS IN NIGERIA.pptx
LANDMARKS AND MONUMENTS IN NIGERIA.pptx
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AI
 
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
 
call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@
 
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfCTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
 
Philippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptPhilippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.ppt
 
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
 
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
 
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxGenesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
 

Azure Data Explorer deep dive - review 04.2020

  • 1. Modern Data Science Lifecycle with ADX & Azure Azure Data Explorer Riccardo Zamana, IIoT BU Manager, Beantech s.r.l.
  • 2. Explore your PASS community Free online webinar events Connect with the global data community Local user groups around the world Online special interest user groups Learning on-demand and delivered to you Get involved Own your career with interactive learning built by community and guided by data experts. Get involved. Get ahead. .org
  • 3. Missed PASS Summit 2019? Get the Recordings Download all PASS Summit sessions on Data Management, Analytics, or Architecture for only $399 USD More options available at PASSstuff.com
  • 4. We are covering all bases to ensure our community can continue reaching new and exciting heights. Plans are underway for the in-person event you all know and love along with a new venture, a new opportunity: a PASS Summit 2020 Virtual Event. Find out more at PASS.org/summit
  • 5. Thank you to our Global Sponsors and Supporters
  • 6. This event was sponsored by Microsoft Learn more about SQL Server 2019 today: -Get free training: aka.ms/sqlworkshops -Download the SQL19 eBook: aka.ms/sql19_ebook
  • 7. Questions What about TIME SERIES DATABASES? When should I use one? Which options does the market offer? OpenTSDB? KairosDB over Scylla/Cassandra? InfluxDB? Why do I have to learn yet another database?! Why not SQL? Why not Cosmos DB?
  • 8. •seconds freshness, days retention •in-mem aggregated data •pre-defined standing queries •split-second query performance •data viewing Hot •minutes freshness, months retention •raw data •ad-hoc queries •seconds-to-minutes query perf •data exploration Warm •hours freshness, years retention •raw data •programmatic batch processing •minutes-to-hours query perf •data manipulation Cold • in-mem cube • stream analytics • … • column store • indexing • … • distributed file system • map reduce • … Example scenario: multi-temperature data processing paths. Where does the ADX process sit?
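The hot/warm/cold routing on this slide can be sketched as a small decision function. The thresholds below are illustrative guesses drawn from the slide's rough envelopes (seconds/days for hot, minutes/months for warm, hours/years for cold), not ADX defaults:

```python
def pick_path(freshness_s: float, retention_days: float) -> str:
    """Route a workload to a temperature path.

    Thresholds are hypothetical, chosen only to mirror the slide:
    hot = seconds-fresh data kept for days, warm = minutes-fresh data
    kept for months, cold = everything slower and longer-lived.
    """
    if freshness_s <= 60 and retention_days <= 30:
        return "hot"    # in-mem aggregates, standing queries, split-second perf
    if freshness_s <= 3600 and retention_days <= 365:
        return "warm"   # raw data, ad-hoc queries, seconds-to-minutes perf
    return "cold"       # raw data, batch processing, minutes-to-hours perf
```

For example, a dashboard that needs 5-second freshness over last week's data lands on the hot path, while a 5-minute-fresh, 90-day exploration workload lands on warm.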
  • 9. What is Azure Data Explorer Any append-only stream of records Relational query model: filter, aggregate, join, calculated columns, … Fully managed Rapid iterations to explore the data High volume High velocity High variance (structured, semi-structured, free-text) PaaS, Vanilla, Database Purpose-built
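The "relational query model over an append-only stream" can be illustrated with plain Python. This is only an analogy for what a KQL pipeline such as `Events | where Level == "Error" | summarize count() by Source` expresses; it is not ADX code, and the record fields are invented for the example:

```python
from collections import Counter

# An append-only stream of records (hypothetical telemetry events).
events = [
    {"Source": "pump-1", "Level": "Error"},
    {"Source": "pump-2", "Level": "Info"},
    {"Source": "pump-1", "Level": "Error"},
]

# Filter step (KQL `where`): keep only error records.
errors = [e for e in events if e["Level"] == "Error"]

# Aggregate step (KQL `summarize count() by Source`): count per source.
by_source = Counter(e["Source"] for e in errors)
```

The pipeline never mutates `events`; each operator produces a new result set, which matches the append-only, read-optimized model the slide describes.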
  • 10. Fully managed big data analytics service • Fully managed for efficiency: focus on insights, not the infrastructure, for fast time to value • No infrastructure to manage; provision the service, choose the SKU for your workload, and create a database • Optimized for streaming data: get near-instant insights from fast-flowing data • Scales linearly up to 200 MB per second per node with highly performant, low-latency ingestion • Designed for data exploration: run ad-hoc queries using the intuitive query language • Returns results from 1 billion records in under a second, without modifying the data or metadata
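Taking the slide's figure of linear scaling at 200 MB/s per node at face value, a back-of-the-envelope sizing helper looks like this (a sketch only; real cluster sizing also depends on query load, cache, and SKU):

```python
import math

# Per-node streaming ingestion ceiling quoted on the slide.
NODE_INGEST_MBPS = 200

def nodes_needed(target_mb_per_s: float) -> int:
    """Smallest engine-node count whose linear aggregate covers the target rate."""
    return max(1, math.ceil(target_mb_per_s / NODE_INGEST_MBPS))
```

Under this assumption, a 1 GB/s (1000 MB/s) ingestion target needs 5 nodes, and anything up to 200 MB/s fits on a single node.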
  • 11. Use cases & best scenarios When is it useful? • Analyze telemetry data • Retrieve trends/series from clustered data • Run regressions over big data • Summarize and export ordered streams Which kinds of scenarios is it suitable for? • IoT • Troubleshooting and diagnostics • Monitoring • Security research • Usage analytics
  • 12. Azure Data Explorer Architecture Ingestion sources: Spark, ADF, apps (via API), Logstash plugin, Kafka sink, IoT Hub, Event Hub, Event Grid. Core: Data Management + Engine (SSD cache, Blob / ADLS persistence), with STREAM and BATCH paths for ingested data. Consumers: ODBC, Power BI, ADX UI, MS Flow, Logic Apps, notebooks, Grafana, Spark
  • 13. Azure Data Explorer SLA SLA: at least 99.9% availability (last updated: Feb 2019). Maximum Available Minutes: the total number of minutes for a given cluster deployed by the customer in a Microsoft Azure subscription during a billing month. Downtime: the total number of minutes within Maximum Available Minutes during which a cluster is unavailable. Monthly Uptime Percentage for Azure Data Explorer is Maximum Available Minutes less Downtime, divided by Maximum Available Minutes: Monthly Uptime % = (Maximum Available Minutes − Downtime) / Maximum Available Minutes × 100. Allowed downtime at 99.9%: Daily: 1m 26.4s; Weekly: 10m 04.8s; Monthly: 43m 49.7s
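The daily/weekly/monthly downtime budgets on the slide follow directly from the 99.9% figure: the budget is 0.1% of the window. A quick check (using an average Gregorian month of 365.25/12 days, which reproduces the slide's ≈43m 50s monthly figure):

```python
def allowed_downtime_minutes(total_minutes: float, sla: float = 0.999) -> float:
    """Downtime budget implied by an availability SLA over a given window."""
    return total_minutes * (1.0 - sla)

DAY = 24 * 60                   # 1 440 minutes
WEEK = 7 * DAY                  # 10 080 minutes
AVG_MONTH = 365.25 / 12 * DAY   # ~43 830 minutes (average month)

# allowed_downtime_minutes(DAY)  -> 1.44 min  (1m 26.4s)
# allowed_downtime_minutes(WEEK) -> 10.08 min (10m 4.8s)
```

The monthly budget comes out to about 43.83 minutes, matching the slide's 43m 49.7s to within rounding of the month length.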
  • 14. How about the pricing model? 1) ADX is a cluster (N engine VMs + 1 data management VM + classic IaaS networking) 2) Billed per minute (if you don’t use it, stop it!) 3) Honest pricing (when you stop the VMs, the ADX markup stops as well) 4) The markup is proportional to the number of engine VMs (only 1 “naked” DM VM) Development — Developer VM (QPU): only for query testing or analytics-automation development. Production (Azure Data Explorer also charges for the storage and networking it consumes) — Compute Optimized (vCPU): workloads that need a high rate of queries over a smaller data size; Storage Optimized (vCPU): workloads that need fewer queries over a large volume of data
  • 15. How about the «real» pricing model? Pricing considerations: • More RI savings on the D series • More cache space on the L series • DM price doubles on the L series (2× VMs, with 2× the resources) https://dataexplorer.azure.com/AzureDataExplorerCostEstimator.html
  • 16. Pricing https://dataexplorer.azure.com/AzureDataExplorerCostEstimator.html If you want to spend little money: think of ADX as an ANALYSIS TOOL in a multi-tenant environment; your spending will be a mix of ADX and ADF / Power BI costs. If you can afford a space shuttle: think of ADX as an INGESTION & RESILIENCY TOOL, to break through your traditional «live DWH»; your spending will be a mix of ADX, compute & event services. * Space shuttle definition: ADX can scale up to 500 VMs => 8000 cores x 64 TB RAM….. and 1M
  • 17. Available SKUs (Attribute — D SKU — L SKU) Small SKUs: minimal size is D11 with two cores — minimal size is L4 with four cores. Availability: available in all regions (the DS+PS version has more limited availability) — available in a few regions. Cost per GB of cache per core: high with the D SKU, low with the DS+PS version — lowest with the pay-as-you-go option. Reserved Instance (RI) pricing: high discount (over 55 percent for a three-year commitment) — lower discount (20 percent for a three-year commitment). • D v2: the D SKU is compute-optimized (optional Premium Storage disk) • LS: the L SKU is storage-optimized (greater SSD size than the equivalent D SKU)
  • 18. First questions about ADX It is normal to evaluate a new Azure service in terms of maturity, applicability and affordability. So: A) Are we sure that ADX is a mature service? B) What are the correct use cases where it is really useful? C) Which OSS alternatives should we compare it with?
  • 19. A) Are we sure that ADX is a mature service? Telemetry analytics for internal use; analytics data platform for products (AI, OMS, ASC, Defender, IoT); interactive analytics; big data platform. 2015–2016: starting with 1st-party validation, building modern analytics, a vision of an analytics platform for MSFT. 2017: unified platform across OMS/AI, expanded scenarios for IoT time series, bridged across client/server security. 2019: analytics engine for 3rd-party offers. GA: February 2019
  • 20. B) What are the correct use cases where it is really useful? SCENARIO / USE CASE / USER STORY. Asset management — Troubleshooting scenario: you need a telemetry analytics platform to retrieve aggregations or statistical calculations over historical series («As an IT manager, I want a platform to load logs from various file types, in order to analyze them and graphically pinpoint the problem over time»). IoT SaaS solution — Logistics SaaS solution: you want to offer multi-tenant SaaS solutions («As a product lead engineer, I want to manage the backend of my multi-tenant SaaS solution using a unique, fat, huge backend service»). IIoT — Quality management: within an Industrial IoT solution development, you need a common backend to handle process variables and run correlation analysis using continuous stream queries («As a quality manager, I need a prebuilt backend solution to dynamically configure time-based queries on data, in order to find correlations among process variables»)
  • 21. C) Which OSS alternatives should we compare it with? From db-engines.com: Azure Data Explorer (fully managed big data interactive analytics platform) vs. Elasticsearch (a distributed, RESTful modern search and analytics engine), Splunk (real-time insights engine to boost productivity and security), and InfluxDB (DBMS for storing time series, events and metrics). ADX can be a replacement for search and log analytics engines such as Elasticsearch, Splunk, and InfluxDB.
  • 22. Comparison chart — Elasticsearch (Elastic) / InfluxDB (InfluxData Inc.) / Azure Data Explorer (Microsoft) / Splunk (Splunk Inc.):
- Description: a distributed, RESTful modern search and analytics engine based on Apache Lucene / DBMS for storing time series, events and metrics / fully managed big data interactive analytics platform / analytics platform for big data
- Database models: search engine, document store / time series DBMS / time series DBMS, search engine, document store, event store, relational DBMS / search engine
- Initial release: 2010 / 2013 / 2019 / 2003
- License: open source / open source / commercial / commercial
- Cloud-based only: no / no / yes / no
- Implementation language: Java (Elasticsearch), Go (InfluxDB)
- Server operating systems: all OSes with a Java VM / Linux, OS X / hosted / Linux, OS X, Solaris, Windows
- Data scheme: schema-free / schema-free / fixed schema with schema-less datatypes (dynamic) / yes
- Typing: yes / numeric data and strings / yes / yes
- XML support: no / no / yes / yes
- Secondary indexes: yes / no / all fields are automatically indexed / yes
- SQL: SQL-like query language / SQL-like query language / Kusto Query Language (KQL) plus a SQL subset / no
- APIs and other access methods: RESTful HTTP/JSON API, Java API / HTTP API, JSON over UDP / RESTful HTTP API, Microsoft SQL Server communication protocol (MS-TDS) / HTTP REST
- Supported programming languages: all four support .NET, Java, JavaScript and Python; in addition: Ruby, PHP, Perl, Groovy and community-contributed clients (Elasticsearch); R, Ruby, PHP, Perl, Haskell, Clojure, Erlang, Go, Lisp, Rust, Scala (InfluxDB); R, PowerShell (Azure Data Explorer); Ruby, PHP (Splunk)
- Server-side scripts: yes / no / yes (KQL, Python, R) / yes
- Triggers: yes / no / yes / yes
- Partitioning methods: sharding, for all four
- Replication methods: yes / selectable replication factor / yes / master-master replication
- MapReduce: ES-Hadoop connector / no / no / yes
- Consistency concepts: eventual consistency / eventual consistency / eventual consistency / immediate consistency
- Foreign keys: no, for all four
- Transaction concepts: no, for all four
- Concurrency: yes, for all four
- Durability: yes, for all four
- In-memory capabilities: Memcached and Redis integration / yes / no / no
- User concepts: simple rights management via user accounts / Azure Active Directory authentication (ADX) / access rights for users and roles (Splunk)
  • 23. Why ADX is unique. Simplified costs: VM costs plus an ADX service add-on cost. Many prebuilt inputs: ADF, Spark, Logstash, Kafka, IoT Hub, Event Hub. Many prebuilt outputs: TDS/SQL, Power BI, ODBC connector, Spark, Jupyter, Grafana.
  • 24. Azure services with ADX usage Azure Monitor • Log Analytics • Application Insights Security Products • Windows Defender • Azure Security Center • Azure Sentinel IoT • Time Series Insights • Azure IoT Central
  • 25. ADX QUICK START. Sign in to the Azure Portal, search for "Azure Data Explorer cluster" and click Create. Fill in the name of the database, the retention and cache periods in days, and hit the Create button. Then go to https://dataexplorer.azure.com/ and click Create data connection to load/ingest data into the database you just created from Event Hub, Blob Storage, IoT Hub, the Kafka connector, or Azure Data Factory. The first command creates a table; the second command ingests data from a CSV file into that table (example).
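The two commands mentioned above can be sketched in KQL; the table name, schema, blob URL and SAS token are illustrative placeholders, not values from this deck:

```kusto
// Create a destination table (hypothetical schema).
.create table StormEvents (StartTime: datetime, EventType: string, State: string)

// Ingest a CSV file from blob storage (URL and SAS token are placeholders).
.ingest into table StormEvents
    h'https://<account>.blob.core.windows.net/<container>/StormEvents.csv?<SAS>'
    with (format='csv', ignoreFirstRecord=true)
```

Each control command is submitted on its own; the `h` prefix hides the connection string from logs.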
  • 27. How to set up ADX «really»? Configuration, like a Newbie: create a database, then use the database to link ingestion sources via a DataConnection. Like a Pro, with the Azure CLI:
Install the Kusto extension: az extension add -n kusto
Login: az login
Select a subscription: az account set --subscription MyAzureSub
Cluster creation: az kusto cluster create --name azureclitest --sku D11_v2 --capacity 2 --resource-group testrg
Database creation: az kusto database create --cluster-name azureclitest --name clidatabase --resource-group testrg --soft-delete-period P365D --hot-cache-period P31D
HOT-CACHE-PERIOD: amount of time that data should be kept in cache, as an ISO 8601 duration (for example, 100 days is P100D).
SOFT-DELETE-PERIOD: amount of time that data should be kept available to query, as an ISO 8601 duration (for example, 100 days is P100D).
  • 28. How to set up and use ADX «really»? Or like a BOSS: FIRST, after retrieving an Azure identity, create the cluster (with the async method); SECOND, check for a Succeeded provisioning state; THIRD, create the database; ...LAST, do the job and remove all traces [like a Killer].
  • 29. How to script with ADX «really»? Like a Newbie: open https://dataexplorer.azure.com/, connect your cluster, use the online editor to test queries, and save KQL files. Like a Pro: open VS Code, create a file, save it and then edit it; use the Log Analytics or Kusto/KQL extensions (.csl | .kusto | .kql); use the Kuskus plugins (4) to highlight the grammar (Kuskus uses TextMate grammars).
  • 30. How to script with ADX «really»? Or like a BOSS Dev: make ADX part of a bigger, collaborative workflow. Build your own web application and an analysis workflow with: creation of temporary clusters; collaborative workspaces; tracked history of the analyses done; scoring of people's usage. Make your application secure. Make your results natively exportable to your databases. Serve Power BI «for humans». <iframe src="https://dataexplorer.azure.com/clusters/<cluster>?ibizaPortal=true" ></iframe> https://microsoft.github.io/monaco-editor/index.html
  • 31. How to ingest data in ADX «really»? Like a Newbie: open https://dataexplorer.azure.com/, right-click on the DB, Ingest new data (preview). Like a Pro: use LIGHTINGEST, a command-line utility for ad-hoc data ingestion into Kusto that can pull source data from a local folder or an Azure Blob Storage container.
[Ingest JSON data from blobs]
LightIngest "https://adxclu001.kusto.windows.net;Federated=true" -database:db001 -table:LAB -sourcePath:"<blob storage account with container SAS token>" -prefix:MyDir1/MySubDir2 -format:json -mappingRef:DefaultJsonMapping -pattern:*.json -limit:100
[Ingest CSV data with headers from local files]
LightIngest "https://adxclu001.kusto.windows.net;Federated=true" -database:MyDb -table:MyTable -sourcePath:"D:\MyFolder\Data" -format:csv -ignoreFirstRecord:true -mappingPath:"D:\MyFolder\CsvMapping.txt" -pattern:*.csv.gz -limit:100
https://docs.microsoft.com/en-us/azure/kusto/tools/lightingest
  • 32. How to ingest data in ADX «really»? Or like a BOSS Dev: use a Data Lake to store files, use Event Grid as a trigger platform, write a Function... and that's it!
  • 33. How to visualize data in ADX «really»? Like a Newbie: open https://dataexplorer.azure.com/ and use «RENDER», or download Kusto Explorer and click the right chart. Like a Pro: use the Power BI connector (Direct Query or Import modes), or use the Excel connector.
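A minimal «RENDER» sketch (table and column names are illustrative):

```kusto
// Plot a daily event count as a time chart in the web UI or Kusto Explorer.
StormEvents
| summarize Events = count() by bin(StartTime, 1d)
| render timechart
```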
  • 34. How to visualize data in ADX «really»? Or like a BOSS «Dev»: use Python as your Swiss Army knife for everything you don't want «properly developed». Use VS Code as a Jupyter Notebook IDE with «KQLMagic». KQLMagic extends the capabilities of the Python kernel in Jupyter, so you can run Kusto language queries natively, combining Python and the Kusto query language. https://github.com/microsoft/jupyter-Kqlmagic
  • 36. ADVANCED INGESTION CAPABILITIES. Connect Edge to cloud through a secure chain.
Path no. 1, local log streaming: a KAFKA cluster as a log streamer, equipped with a KUSTO SINK that writes a direct stream.
Path no. 2, remote log streaming: connect KAFKA directly to Azure Event Hub, then ingest with the built-in process (preview).
Path no. 3, local pipeline: Logstash as a local dynamic data collection pipeline with an extensible plugin ecosystem.
Path no. 4, remote pipeline: use the integration runtime to link on-premises data using an Azure Data Factory Dataflow.
  • 37. Ingestion techniques.
Batch ingestion (provided by the SDKs): for high-volume, reliable, and cheap data ingestion. The client uploads the data to Azure Blob storage (designated by the Azure Data Explorer data management service) and posts a notification to an Azure Queue. Batch ingestion is the recommended technique.
Inline ingestion (provided by the query tools): most appropriate for exploration and prototyping. Inline ingestion: a control command (.ingest inline) containing in-band data, intended for ad-hoc testing purposes. Ingest from query: control commands (.set, .set-or-append, .set-or-replace) that point to query results, used for generating reports or small temporary tables. Ingest from storage: a control command (.ingest into) with data stored externally (for example, Azure Blob Storage) that allows efficient bulk ingestion of data.
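The three control-command families above can be sketched as follows; table names, columns and values are illustrative, and each command is submitted as its own request:

```kusto
// 1. Inline ingestion: in-band data, for ad-hoc testing only.
.ingest inline into table Purchases <|
2020-01-01T00:00:00Z,contoso,100
2020-01-02T00:00:00Z,fabrikam,200

// 2. Ingest from query: persist a query result into a table.
.set-or-append DailyTotals <|
    Purchases | summarize Total = sum(Amount) by bin(Timestamp, 1d)

// 3. Ingest from storage: bulk-load an external blob (URL/SAS are placeholders).
.ingest into table Purchases
    h'https://<account>.blob.core.windows.net/<container>/purchases.csv?<SAS>'
    with (format='csv')
```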
  • 38. Supported data formats: CSV, TSV, TSVE, PSV, SCSV, SOH; JSON (line-separated, multi-line) and Avro; ZIP and GZIP.
For all ingestion methods other than ingest-from-query, YOU HAVE TO FORMAT THE DATA so that Azure Data Explorer can parse it. Schema mapping helps bind source data fields to destination table columns.
• CSV mapping (optional): works with all ordinal-based formats. It can be passed as an ingest command parameter, or pre-created on the table and referenced from the ingest command.
• JSON mapping (mandatory) and Avro mapping (mandatory): can be passed as an ingest command parameter, or pre-created on the table and referenced from the ingest command.
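A pre-created JSON mapping might look like this; the table name, mapping name, JSON paths and blob URL are all hypothetical:

```kusto
// Pre-create a JSON ingestion mapping on the table...
.create table Events ingestion json mapping 'EventMapping' '[{"column":"Timestamp","path":"$.ts","datatype":"datetime"},{"column":"Device","path":"$.device.id","datatype":"string"},{"column":"Value","path":"$.value","datatype":"real"}]'

// ...then reference it by name at ingestion time.
.ingest into table Events
    h'https://<account>.blob.core.windows.net/<container>/events.json?<SAS>'
    with (format='json', jsonMappingReference='EventMapping')
```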
  • 39. How about orchestration? Three use cases in which FLOW + KUSTO are the solution:
- Push data to a Power BI dataset: periodically run queries and push the results to a Power BI dataset.
- Conditional queries: run data checks and send notifications, with no code.
- Email multiple ADX flow charts: send rich emails with an HTML5 chart of the query result.
  • 40. Use ADX with applications not natively supported by a connector. Use ODBC to connect to Azure Data Explorer from applications that don't have a dedicated connector. Azure Data Explorer supports a subset of the SQL Server communication protocol, so you can use any tool compliant with an ODBC connection: use ODBC Driver 17, connect to the ADX cluster, select the database, and test the connection.
IMPORTANT NOTES:
• T-SQL SELECT statements are supported (sub-queries, however, are not).
• Treating it as a SQL Server with AD authentication, you can use LINQPad, Azure Data Studio, Power BI Desktop, Tableau, Excel, DBeaver and SSMS (they're all MS-TDS clients).
• SQL Server on-premises lets you attach a linked server, so you can use "ADX as SQL".
  • 41. From T-SQL to KQL and vice versa. Kusto supports translating T-SQL queries into the Kusto query language: a T-SQL statement "is translated into" the equivalent KQL. Some examples follow. NOTE: remember the dashes!
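The translation can be tried directly in the query editor: prefixing a T-SQL statement with `explain` makes ADX return the equivalent KQL instead of running the query (table and columns are illustrative):

```kusto
// Ask ADX to translate this T-SQL statement into KQL.
explain
SELECT TOP 10 State, COUNT(*) AS Cnt
FROM StormEvents
GROUP BY State
```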
  • 43. ADX functions. Functions are reusable queries or query parts. Kusto supports several kinds of functions:
• Stored functions: user-defined functions that are stored and managed as one kind of the database's schema entities. See Stored functions.
• Query-defined functions: user-defined functions that are defined and used within the scope of a single query. Such functions are defined through a let statement. See User-defined functions.
• Built-in functions: hard-coded functions (defined by Kusto and not modifiable by users).
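A sketch of both user-defined kinds; the function names, table, columns and thresholds are illustrative:

```kusto
// Query-defined function via let: visible only inside this query.
let LongEvents = (minDuration: timespan) {
    StormEvents
    | where EndTime - StartTime > minDuration
};
LongEvents(1h) | count
```

```kusto
// Stored function: persisted as a database schema entity, reusable across queries.
.create-or-alter function with (docstring = "Events longer than i hours")
LongEventsStored(i: long) {
    StormEvents | where EndTime - StartTime > i * 1h
}
```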
  • 44. Language examples.
Alias: alias database["wiki"] = cluster("https://somecluster.kusto.windows.net:443").database("somedatabase"); database("wiki").PageViews | count
Let: let start = ago(5h); let period = 2h; T | where Time > start and Time < start + period | ...
Bin: T | summarize Hits=count() by bin(Duration, 1s)
Batch: let m = materialize(StormEvents | summarize n=count() by State); m | where n > 2000; m | where n < 10
Tabular expression: Logs | where Timestamp > ago(1d) | join ( Events | where continent == 'Europe' ) on RequestId
  • 45. Time series analysis – bin operator. bin(value, roundTo) rounds values down to an integer multiple of the given bin size: a scattered set of values is grouped into a smaller set of specific values. [Rule] bin(value, roundTo) [Example] T | summarize Hits=count() by bin(Duration, 1s)
  • 46. Time series analysis – make-series operator. [Rule] T | make-series [MakeSeriesParameters] [Column =] Aggregation [default = DefaultValue] [, ...] on AxisColumn from start to end step step [by [Column =] GroupExpression [, ...]] [Example] T | make-series sum(amount) default=0, avg(price) default=0 on timestamp from datetime(2016-01-01) to datetime(2016-01-10) step 1d by supplier
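The arrays produced by make-series feed Kusto's series functions; for instance, a sketch of anomaly flagging on a series like the one above (column and table names assumed, not from the deck):

```kusto
// Build a daily price series per supplier, then flag anomalous points.
T
| make-series AvgPrice = avg(price) default = 0
    on timestamp from datetime(2016-01-01) to datetime(2016-01-10) step 1d by supplier
| extend Anomalies = series_decompose_anomalies(AvgPrice)
```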
  • 47. Time series analysis – basket operator. basket finds all frequent patterns of discrete attributes (dimensions) in the data and returns all frequent patterns that pass the frequency threshold in the original query. [Rule] T | evaluate basket([Threshold, WeightColumn, MaxDimensions, CustomWildcard, CustomWildcard, ...]) [Example] StormEvents | where monthofyear(StartTime) == 5 | extend Damage = iff(DamageCrops + DamageProperty > 0, "YES", "NO") | project State, EventType, Damage, DamageCrops | evaluate basket(0.2)
  • 48. Time series analysis – autocluster operator. autocluster finds common patterns of discrete attributes (dimensions) in the data and reduces the results of the original query (whether it's 100 or 100k rows) to a small number of patterns. [Rule] T | evaluate autocluster([SizeWeight, WeightColumn, NumSeeds, CustomWildcard, CustomWildcard, ...]) [Example] StormEvents | where monthofyear(StartTime) == 5 | extend Damage = iff(DamageCrops + DamageProperty > 0, "YES", "NO") | project State, EventType, Damage | evaluate autocluster(0.6) — or, with custom wildcards: StormEvents | where monthofyear(StartTime) == 5 | extend Damage = iff(DamageCrops + DamageProperty > 0, "YES", "NO") | project State, EventType, Damage | evaluate autocluster(0.2, '~', '~', '*')
  • 49. Export.
1. DEFINE COMMAND: define the ADX command and try out your recurrent export strategy.
2. TRY IN EDITOR: use an editor to try the command, verifying the connection strings and parameterizing them.
3. BUILD A JOB: build a notebook or a C# job, using the command as you would a SQL query in your code.
To Storage: .export async compressed to csv ( h@"https://storage1.blob.core.windows.net/containerName;secretKey", h@"https://storage1.blob.core.windows.net/containerName2;secretKey" ) with ( sizeLimit=100000, namePrefix=export, includeHeaders=all, encoding=UTF8NoBOM ) <| myLogs | where id == "moshe" | limit 10000
To SQL: .export async to sql ['dbo.MySqlTable'] h@"Server=tcp:myserver.database.windows.net,1433;Database=MyDatabase;Authentication=Active Directory Integrated;Connection Timeout=30;" with (createifnotexists="true", primarykey="Id") <| print Message = "Hello World!", Timestamp = now(), Id=12345678
  • 50. External tables & continuous export. An external table is an external endpoint: Azure Storage, Azure Data Lake Store, or SQL Server. You need to define the destination and a continuous-export strategy.
EXTERNAL TABLE CREATION: .create external table ExternalAdlsGen2 (Timestamp:datetime, x:long, s:string) kind=adl partition by bin(Timestamp, 1d) dataformat=csv ( h@'abfss://filesystem@storageaccount.dfs.core.windows.net/path;secretKey' ) with ( docstring = "Docs", folder = "ExternalTables", namePrefix="Prefix" )
EXPORT TO EXTERNAL TABLE: .create-or-alter continuous-export MyExport over (T) to table ExternalAdlsGen2 with (intervalBetweenRuns=1h, forcedLatency=10m, sizeLimit=104857600) <| T
  • 51. Cache policy. FACTS: A) Kusto stores its ingested data in reliable storage (most commonly Azure Blob Storage). B) To speed up queries on that data, Kusto caches this data (or parts of it) on its processing nodes. The Kusto cache provides a granular cache policy that customers can use to differentiate between two data cache tiers: hot data cache and cold data cache. YOU CAN SPECIFY WHICH TIER MUST BE USED: set query_datascope="hotcache"; T | union U | join (T datascope=all | where Timestamp < ago(365d)) on X
Cache policy is independent of retention policy!
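The hot-cache window itself is set with a control command; a sketch, with the table name and period as illustrative placeholders:

```kusto
// Keep the last 31 days of MyTable in the hot cache on the processing nodes.
.alter table MyTable policy caching hot = 31d
```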
  • 52. Retention policy. Two parameters, applicable to a DB or a table:
• Soft delete period (number): data is available for query; the timestamp used is the ADX ingestion date. Default is 100 YEARS.
• Recoverability (enabled/disabled): default is ENABLED; data is recoverable for 14 days after deletion.
Use KUSTO to set KUSTO:
.alter database DatabaseName policy retention "{}"
.alter table TableName policy retention "{}"
EXAMPLE: { "SoftDeletePeriod": "36500.00:00:00", "Recoverability": "Enabled" }
.delete database DatabaseName policy retention
.delete table TableName policy retention
.alter-merge table MyTable1 policy retention softdelete = 7d
  • 53. Data purge. The purge process is final and irreversible.
PURGE PROCESS:
1. It requires database admin permissions.
2. Prior to purging, the feature has to be ENABLED by opening a SUPPORT TICKET.
3. Run the purge QUERY to identify the SIZE and EXECUTION TIME and obtain a VerificationToken.
4. Run the purge QUERY again, passing the VerificationToken.
TWO-STEP PROCESS:
.purge table MyTable records in database MyDatabase <| where CustomerId in ('X', 'Y')
returns, for example: NumRecordsToPurge = 1,596; EstimatedPurgeExecutionTime = 00:00:02; VerificationToken = e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b
.purge table MyTable records in database MyDatabase with (verificationtoken='e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b') <| where CustomerId in ('X', 'Y')
ONE-STEP PROCESS (with no regrets!):
.purge table MyTable records in database MyDatabase with (noregrets='true')
  • 54. KUSTO: do and don't.
1. DO analytics over big data.
2. DO support entities such as databases, tables, and columns.
3. DO support complex analytics query operators (calculated columns, filtering, group by, joins).
BUT DO NOT perform in-place updates!
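Since in-place updates are out, a common workaround is to append new versions of a row and read back only the latest one per key; a sketch, with the table, key and timestamp columns as assumptions:

```kusto
// Append-only "update" pattern: keep every version, query the newest per Id.
MyTable
| summarize arg_max(IngestionTime, *) by Id
```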
  • 55. UC1: Event correlation — get sessions from start and stop events. Let's suppose we have a log of events, in which some events mark the start or end of an extended activity or session. Every event has a SessionId, so the problem is to match up the start and stop events with the same id.
Input:
Name | City | SessionId | Timestamp
Start | London | 2817330 | 2015-12-09T10:12:02.32
Game | London | 2817330 | 2015-12-09T10:12:52.45
Start | Manchester | 4267667 | 2015-12-09T10:14:02.23
Stop | London | 2817330 | 2015-12-09T10:23:43.18
Cancel | Manchester | 4267667 | 2015-12-09T10:27:26.29
Stop | Manchester | 4267667 | 2015-12-09T10:28:31.72
Result:
City | SessionId | StartTime | StopTime | Duration
London | 2817330 | 2015-12-09T10:12:02.32 | 2015-12-09T10:23:43.18 | 00:11:40.46
Manchester | 4267667 | 2015-12-09T10:14:02.23 | 2015-12-09T10:28:31.72 | 00:14:29.49
Kusto:
let Events = MyLogTable | where ... ;
Events | where Name == "Start" | project Name, City, SessionId, StartTime=timestamp | join (Events | where Name == "Stop" | project StopTime=timestamp, SessionId) on SessionId | project City, SessionId, StartTime, StopTime, Duration = StopTime - StartTime
Use let to name a projection of the table that is pared down as far as possible before going into the join. project is used to rename the timestamps so that both the start and stop times can appear in the result; it also selects the other columns we want to see. join matches up the start and stop entries for the same activity, creating a row for each activity. Finally, project again adds a column to show the duration of the activity.
  • 56. UC2: In-place enrichment — creating and using query-time dimension tables. In many cases you want to join the results of a query with an ad-hoc dimension table that is not stored in the database. You can define an expression whose result is a table scoped to a single query, like this: Kusto // Create a query-time dimension table using datatable let DimTable = datatable(EventType:string, Code:string) [ "Heavy Rain", "HR", "Tornado", "T" ] ; DimTable | join StormEvents on EventType | summarize count() by Code
  • 57. Brief summary: Azure Data Explorer — easy to ingest the data and easy to query the data.
Ingestion (streaming, queued, bulk, direct): Event Hub, IoT Hub, Blob & Azure Queue, Blob & Event Grid, ADF, .NET SDK, Python SDK, Java SDK, REST API.
UX: Web UI, Desktop App, Jupyter magic, Azure Notebooks, Monaco IDE, JavaScript.
Connectors & protocols: Power BI Direct Query, Microsoft Flow, Azure Logic Apps, Grafana, MS-TDS.
  • 58. Thank you to our Local Sponsors and Supporters
  • 59. Thank you. Remember to fill in the feedback form  https://speakerscore.com/xxx #SqlSat921

Editor's Notes

  1. - Today's intent - what we will NOT cover - why the examples are Microsoft's
  2. Not for Updates? Append only? Yes, of course.
  3. Deep dive to insert here: https://www.linkedin.com/pulse/azure-data-explorer-business-continuity-henning-rauch/?trackingId=h2RAxJ2JNJ%2BYSyTHGSDLbw%3D%3D
  4. An Azure Data Explorer cluster is a pair of engine and data management clusters which uses several Azure resources such as Azure Linux VMs and Storage. The applicable VM, Azure Storage, Azure Networking and Azure Load Balancer costs are billed directly to the customer subscription. STORAGE => where I put data for occasional, precise retrieval, always "off" => long retention. COMPUTE => where I put temporary data to scan often => short retention.
  5. It is like an IaaS… based on VM size, storage and network, and not on database numbers.
  6. IF YOU ARE AHEAD OF SCHEDULE, TRY IT!!!!
  7. Try it in VS Code: Ctrl+P => kuskus. Then: cluster(adxclu001).database('db001').table('TBL_LAB01') | count
  8. Queued ingestion: https://docs.microsoft.com/en-us/azure/kusto/api/netfx/kusto-ingest-queued-ingest-sample Inline ingestion: https://docs.microsoft.com/it-it/azure/kusto/management/data-ingestion/ingest-inline
  9. .create-or-alter function with (folder = "AzureSaturday2019", docstring = "Func1", skipvalidation = "true") MyFunction1(i:long) {TBL_LAB0X | limit 100 | where minimum_nights > i} MyFunction1(80); explain SELECT name, minimum_nights from TBL_LAB0X .create-or-alter function with (folder = "AzureSaturday2019", docstring = "Func1", skipvalidation = "true") MyFunction1(i:long) {TBL_LAB0X | project name, minimum_nights | limit 100 | where minimum_nights > i | render columnchart} MyFunction1(80);
  10. T | summarize Hits=count() by bin(Duration, 1s)