Modern Data Science Lifecycle with ADX & Azure
This document discusses using Azure Data Explorer (ADX) for data science workflows. ADX is a fully managed analytics service for real-time analysis of streaming data. It allows for ad-hoc querying of data using Kusto Query Language (KQL) and integrates with various Azure data ingestion sources. The document provides an overview of the ADX architecture and compares it to other time series databases. It also covers best practices for ingesting data, visualizing results, and automating workflows using tools like Azure Data Factory.
1. Modern Data Science Lifecycle with ADX & Azure
Azure Data Explorer
Riccardo Zamana, IIoT BU Manager, Beantech s.r.l.
6. This event was sponsored by Microsoft.
7. Questions
What about TIME SERIES DATABASES?
When should I use one?
What are the possible choices on the market?
OpenTSDB? KairosDB over Scylla/Cassandra? InfluxDB?
Why should I learn yet another database?!
Why not SQL? Why not Cosmos DB?
8. Example Scenario: Multi-temperature data processing paths
Hot path (e.g. in-memory cube, stream analytics, …):
• seconds freshness, days retention
• in-memory aggregated data
• pre-defined standing queries
• split-second query performance
• data viewing
Warm path (e.g. column store, indexing, …):
• minutes freshness, months retention
• raw data
• ad-hoc queries
• seconds-to-minutes query performance
• data exploration
Cold path (e.g. distributed file system, map-reduce, …):
• hours freshness, years retention
• raw data
• programmatic batch processing
• minutes-to-hours query performance
• data manipulation
Where does the ADX process fit in this picture?
9. What is Azure Data Explorer
• Input: any append-only stream of records, with high volume, high velocity and high variance (structured, semi-structured, free-text)
• Relational query model: filter, aggregate, join, calculated columns, …
• Fully managed: PaaS, "vanilla" database, purpose-built for rapid iterations to explore the data
10. Fully managed big data analytics service
• Fully managed for efficiency: focus on insights, not the infrastructure, for fast time to value. No infrastructure to manage: provision the service, choose the SKU for your workload, and create a database.
• Optimized for streaming data: get near-instant insights from fast-flowing data. Scales linearly up to 200 MB per second per node with highly performant, low-latency ingestion.
• Designed for data exploration: run ad-hoc queries using the intuitive query language. Returns results from 1 billion records in under 1 second without modifying the data or metadata.
11. Use cases & best scenarios
When is it useful?
• Analyzing telemetry data
• Retrieving trends/series from clustered data
• Running regressions over big data
• Summarizing and exporting ordered streams
Which kinds of scenarios is it suited for?
• IoT
• Troubleshooting and diagnostics
• Monitoring
• Security research
• Usage analytics
12. Azure Data Explorer Architecture
Ingestion sources (STREAM and BATCH): Spark, ADF, apps via API, Logstash plugin, Kafka sink, IoT Hub, Event Hub, Event Grid
Core components: Data Management service and Engine, with hot data on SSD and cold data on Blob / ADLS
Consumers of the ingested data: ODBC, Power BI, ADX UI, MS Flow, Logic Apps, notebooks, Grafana, Spark
13. Azure Data Explorer SLA
SLA: at least 99.9% availability (last updated: Feb 2019)
Maximum Available Minutes: the total number of minutes for a given cluster deployed by the customer in a Microsoft Azure subscription during a billing month.
Downtime: the total number of minutes within Maximum Available Minutes during which a cluster is unavailable.
Monthly Uptime Percentage: calculated as Maximum Available Minutes less Downtime, divided by Maximum Available Minutes:
Monthly Uptime % = (Maximum Available Minutes - Downtime) / Maximum Available Minutes x 100
Allowed downtime at 99.9% uptime: daily 1m 26.4s, weekly 10m 04.8s, monthly 43m 49.7s
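The downtime allowances above follow directly from the uptime formula on this slide. A minimal sketch of the arithmetic (the average month length is an assumption of roughly 30.44 days):

```python
# Allowed downtime for a given uptime SLA, derived from:
# Uptime% = (MaxAvailableMinutes - Downtime) / MaxAvailableMinutes * 100
def allowed_downtime_minutes(total_minutes: float, sla_pct: float = 99.9) -> float:
    """Maximum downtime (in minutes) that still satisfies the SLA."""
    return total_minutes * (1 - sla_pct / 100)

daily = allowed_downtime_minutes(24 * 60)            # ~1.44 min  -> 1m 26.4s
weekly = allowed_downtime_minutes(7 * 24 * 60)       # ~10.08 min -> 10m 04.8s
monthly = allowed_downtime_minutes(30.44 * 24 * 60)  # ~43.8 min  -> ~43m 50s
```

The monthly figure varies slightly with the assumed month length; the slide's 43m 49.7s corresponds to an average month of 365.25/12 days.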
14. What about the pricing model?
1) A cluster is N engine VMs + 1 data management VM + classic IaaS networking
2) Billed per minute (if you don't use it, stop it!)
3) Honest pricing (when you stop the VMs, the ADX markup charges stop too)
4) The markup is proportional to the number of engine VMs (the single DM VM is "naked", i.e. markup-free)
SKU tiers:
• Developer VM (QPU): development only, for query tests or analytics automation development
• Compute Optimized (vCPU): production, for workloads that need a high rate of queries over a smaller data size
• Storage Optimized (vCPU): production, for workloads that need fewer queries over a large volume of data
(For production SKUs, Azure Data Explorer also charges for the storage and networking costs incurred.)
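Per-minute billing means the stop-it-when-idle advice has a direct cost formula. A hypothetical illustration (the hourly rates below are made up for the example; real rates depend on SKU and region):

```python
# Hypothetical ADX cost sketch: a stopped cluster stops accruing BOTH the VM
# cost and the ADX markup, and the markup scales with the engine VM count.
# vm_hourly and markup_per_vm_hourly are illustrative placeholders, not real prices.
def monthly_cost(engine_vms: int, hours_running: float,
                 vm_hourly: float = 1.20, markup_per_vm_hourly: float = 0.11) -> float:
    return hours_running * engine_vms * (vm_hourly + markup_per_vm_hourly)

always_on = monthly_cost(engine_vms=2, hours_running=730)      # 24x7
office_hours = monthly_cost(engine_vms=2, hours_running=8 * 22)  # 8h x 22 workdays
```

Running only during office hours cuts the bill by roughly the same factor as the running-hours ratio, which is the point of item 2 above.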
15. What about the «REAL» pricing model?
Pricing considerations:
• Bigger reserved-instance (RI) savings on the D series
• More cache space on the L series
• The DM markup is doubled on the L series (2x the VM, with 2x the resources)
https://dataexplorer.azure.com/AzureDataExplorerCostEstimator.html
16. Pricing
https://dataexplorer.azure.com/AzureDataExplorerCostEstimator.html
If you want to spend little money: think of ADX as an ANALYSIS TOOL in a multi-tenant environment. Your spending will be a mix of ADX and ADF / Power BI costs.
If you can afford a space shuttle: think of ADX as an INGESTION & RESILIENCY TOOL to break through your traditional «Live DWH». Your spending will be a mix of ADX, compute and event services.
* Space shuttle definition: ADX can scale up to 500 VMs => 8,000 cores x 64 TB RAM….. and 1M
17. Available SKUs
| Attribute | D SKU | L SKU |
| Small SKUs | Minimal size is D11, with two cores | Minimal size is L4, with four cores |
| Availability | Available in all regions (the DS+PS version has more limited availability) | Available in a few regions |
| Cost per GB cache per core | High with the D SKU, low with the DS+PS version | Lowest with the pay-as-you-go option |
| Reserved Instances (RI) pricing | High discount (over 55 percent for a three-year commitment) | Lower discount (20 percent for a three-year commitment) |
• D v2: the D SKU is compute-optimized (optional Premium Storage disks)
• LS: the L SKU is storage-optimized (greater SSD size than the equivalent D SKU)
18. First questions about ADX
It is normal to evaluate a new Azure service in terms of maturity, applicability and affordability. So:
A) Are we sure that ADX is a mature service?
B) What are the use cases where it is really useful?
C) What are the OSS alternatives we should compare it with?
19. A) Are we sure that ADX is a mature service?
2015 - 2016: starting with 1st-party validation; building modern analytics; vision of an analytics platform for Microsoft (telemetry analytics for internal interactive analytics)
2017: analytics data platform for products (AI, OMS, ASC, Defender, IoT); big data platform
2019: analytics engine for 3rd-party offers; unified platform across OMS/AI; expanded scenarios for IoT time series; bridged across client/server security
GA - February 2019
20. B) What are the use cases where it is really useful?

SCENARIO: Asset management; troubleshooting
USE CASE: You need a telemetry analytics platform, in order to retrieve aggregations or statistical calculations on historical series.
USER STORY: «As an IT Manager», I want a platform to load logs from various file types, in order to analyze them and graphically pinpoint the problem over time.

SCENARIO: IoT SaaS solution; logistics SaaS solution
USE CASE: You want to offer multi-tenant SaaS solutions.
USER STORY: «As a Product Lead Engineer», I want to manage the backend of my multi-tenant SaaS solution using a unique, fat, huge backend service.

SCENARIO: IIoT quality management
USE CASE: Within an Industrial IoT solution development, you need a common backend to handle process variables and run correlation analysis using continuous stream queries.
USER STORY: «As a Quality Manager», I need a prebuilt backend solution to dynamically configure time-based queries on data, in order to find correlations between process variables.
21. C) What are the OSS alternatives we should compare it with?
From db-engines.com:
• Azure Data Explorer: fully managed big data interactive analytics platform
• Elasticsearch: a distributed, RESTful modern search and analytics engine
• Splunk: real-time insights engine to boost productivity & security
• InfluxDB: DBMS for storing time series, events and metrics
ADX can be a replacement for search and log analytics engines such as Elasticsearch, Splunk, and InfluxDB.
22. Comparison chart
| Name | Elasticsearch (Elastic) | InfluxDB (InfluxData Inc.) | Azure Data Explorer (Microsoft) | Splunk (Splunk Inc.) |
| Description | A distributed, RESTful modern search and analytics engine based on Apache Lucene | DBMS for storing time series, events and metrics | Fully managed big data interactive analytics platform | Analytics platform for big data |
| Database models | Search engine, Document store | Time Series DBMS | Time Series DBMS, Search engine, Document store, Event store, Relational DBMS | Search engine |
| Initial release | 2010 | 2013 | 2019 | 2003 |
| License | Open source | Open source | Commercial | Commercial |
| Cloud-based only | no | no | yes | no |
| Implementation language | Java | Go | | |
| Server operating systems | All OS with a Java VM | Linux, OS X | hosted | Linux, OS X, Solaris, Windows |
| Data scheme | schema-free | schema-free | Fixed schema with schema-less datatypes (dynamic) | yes |
| Typing | yes | Numeric data and strings | yes | yes |
| XML support | no | no | yes | yes |
| Secondary indexes | yes | no | all fields are automatically indexed | yes |
| SQL | SQL-like query language | SQL-like query language | Kusto Query Language (KQL), SQL subset | no |
| APIs and other access methods | RESTful HTTP/JSON API, Java API | HTTP API, JSON over UDP | RESTful HTTP API, Microsoft SQL Server communication protocol (MS-TDS) | HTTP REST |
| Supported programming languages | .Net, Java, JavaScript, Python, Ruby, PHP, Perl, Groovy, community-contributed clients | .Net, Java, JavaScript, Python, R, Ruby, PHP, Perl, Haskell, Clojure, Erlang, Go, Lisp, Rust, Scala | .Net, Java, JavaScript, Python, R, PowerShell | .Net, Java, JavaScript, Python, Ruby, PHP |
| Server-side scripts | yes | no | Yes (KQL, Python, R) | yes |
| Triggers | yes | no | yes | yes |
| Partitioning methods | Sharding | Sharding | Sharding | Sharding |
| Replication methods | yes | selectable replication factor | yes | Master-master replication |
| MapReduce | ES-Hadoop connector | no | no | yes |
| Consistency concepts | Eventual Consistency | Eventual Consistency | Eventual Consistency, Immediate Consistency | |
| Foreign keys | no | no | no | no |
| Transaction concepts | no | no | no | no |
| Concurrency | yes | yes | yes | yes |
| Durability | yes | yes | yes | yes |
| In-memory capabilities | Memcached and Redis integration | yes | no | no |
| User concepts | simple rights management via user accounts | | Azure Active Directory authentication | Access rights for users and roles |
23. Why ADX is unique
Simplified costs:
• VM costs
• ADX service add-on cost
Many prebuilt inputs:
• ADF
• Spark
• Logstash
• Kafka
• IoT Hub
• Event Hub
Many prebuilt outputs:
• TDS/SQL
• Power BI
• ODBC connector
• Spark
• Jupyter
• Grafana
24. Azure services with ADX usage
Azure Monitor
• Log Analytics
• Application Insights
Security Products
• Windows Defender
• Azure Security Center
• Azure Sentinel
IoT
• Time Series Insights
• Azure IoT Central
25. ADX QUICK START
1. Sign in to the Azure Portal, search for 'Azure Data Explorer cluster' and click Create.
2. Fill in the name of the database, the retention and cache periods in days, and hit the Create button.
3. Click Create data connection to load/ingest data into the database you just created, from Event Hub, Blob Storage, IoT Hub, the Kafka connector, or Azure Data Factory.
4. Then go to https://dataexplorer.azure.com/. The first command creates a table; the second command ingests data from the CSV file into that table (example).
27. How to set up ADX «really»?
Like a newbie:
• Create a database
• Use the database to link ingestion sources (DataConnection)
Like a pro (configuration via the Azure CLI):
Install the Kusto extension:
az extension add -n kusto
Log in:
az login
Select the subscription:
az account set --subscription MyAzureSub
Create the cluster:
az kusto cluster create --name azureclitest --sku D11_V2 --capacity 2 --resource-group testrg
Create the database:
az kusto database create --cluster-name azureclitest --name clidatabase --resource-group testrg --soft-delete-period P365D --hot-cache-period P31D
HOT-CACHE-PERIOD: the amount of time that data should be kept in cache. Duration in ISO 8601 format (for example, 100 days would be P100D).
SOFT-DELETE-PERIOD: the amount of time that data should be kept available to query. Duration in ISO 8601 format (for example, 100 days would be P100D).
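The two CLI flags above take day-only ISO 8601 durations. A minimal helper for building and checking that form when generating such commands from a script (the helper names are mine, not part of any SDK):

```python
# Build and parse the day-only ISO 8601 durations used by
# --soft-delete-period and --hot-cache-period (e.g. 100 days -> "P100D").
def to_iso8601_days(days: int) -> str:
    return f"P{days}D"

def from_iso8601_days(value: str) -> int:
    if not (value.startswith("P") and value.endswith("D")):
        raise ValueError(f"not a day-only ISO 8601 duration: {value!r}")
    return int(value[1:-1])

hot_cache = to_iso8601_days(31)      # --hot-cache-period P31D
soft_delete = to_iso8601_days(365)   # --soft-delete-period P365D
```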
28. How to set up and use ADX «really»?
Or like a BOSS (programmatically, via the management SDK):
FIRST: after retrieving an Azure identity, create the cluster (with the async method)
SECOND: poll until the provisioning state is Succeeded
THIRD: create the database
..LAST: do the job and remove all traces [like a killer]
29. How to script with ADX «really»?
Like a newbie:
• Open https://dataexplorer.azure.com/
• Connect your cluster
• Use the online editor to test queries
• Save KQL files
Like a pro:
• Open VS Code, create a file, save it and then edit
• Use the Log Analytics or Kusto/KQL extensions (.csl | .kusto | .kql)
• Use the Kuskus plugins (4) for grammar highlighting (Kuskus uses TextMate grammar compliance)
30. How to script with ADX «really»?
Or like a BOSS dev: make ADX part of a bigger, collaborative workflow.
Build your own web application and an analysis workflow with:
• Creation of temporary clusters
• Collaborative workspaces
• A tracked history of the analyses done
• Scoring of people's usage
Make your application secure. Make your results natively exportable to your databases. Serve Power BI «for humans».
<iframe src="https://dataexplorer.azure.com/clusters/<cluster>?ibizaPortal=true" ></iframe>
https://microsoft.github.io/monaco-editor/index.html
31. How to ingest data into ADX «really»?
Like a newbie:
• Open https://dataexplorer.azure.com/
• Right-click on the DB, then Ingest new data (preview)
Like a pro, use LIGHTINGEST:
• A command-line utility for ad-hoc data ingestion into Kusto
• Pulls source data from a local folder or an Azure Blob Storage container
[Ingest JSON data from blobs]
LightIngest "https://adxclu001.kusto.windows.net;Federated=true"
  -database:db001
  -table:LAB
  -sourcePath:"<Blob Storage account with container SAS token>"
  -prefix:MyDir1/MySubDir2
  -format:json
  -mappingRef:DefaultJsonMapping
  -pattern:*.json
  -limit:100
[Ingest CSV data with headers from local files]
LightIngest "https://adxclu001.kusto.windows.net;Federated=true"
  -database:MyDb
  -table:MyTable
  -sourcePath:"D:\MyFolder\Data"
  -format:csv
  -ignoreFirstRecord:true
  -mappingPath:"D:\MyFolder\CsvMapping.txt"
  -pattern:*.csv.gz
  -limit:100
https://docs.microsoft.com/en-us/azure/kusto/tools/lightingest
32. How to ingest data into ADX «really»?
Or like a BOSS dev:
• Use a Data Lake to store files
• Use Event Grid as a trigger platform
• Write a Function … and that's it!
33. How to visualize data in ADX «really»?
Like a newbie:
• Open https://dataexplorer.azure.com/ and use «render»
• Download Kusto Explorer and click the right chart
Like a pro:
• Use the Power BI connector (Direct Query or Import modes)
• Use the Excel connector
34. How to visualize data in ADX «really»?
Or like a BOSS «dev»: use VS Code as a Jupyter notebook IDE with «KQLMagic». Use Python as your Swiss Army knife for everything you don't want «properly developed».
KQLMagic extends the capabilities of the Python kernel in Jupyter, so you can run Kusto language queries natively, combining Python and the Kusto query language.
https://github.com/microsoft/jupyter-Kqlmagic
36. ADVANCED INGESTION CAPABILITIES
Connect edge to cloud through a secure chain.
Path 1: Local log streaming: a Kafka cluster as a log streamer, equipped with a Kusto sink that writes a direct stream.
Path 2: Remote log streaming: connect Kafka directly to Azure Event Hub, then ingest with the built-in process (preview).
Path 3: Local pipeline: Logstash as a local dynamic data collection pipeline with an extensible plugin ecosystem.
Path 4: Remote pipeline: use the integration runtime to link on-premises data using Azure Data Factory Data Flow.
37. Ingestion techniques
Batch ingestion (provided by the SDKs): for high-volume, reliable, and cheap data ingestion. The client uploads the data to Azure Blob Storage (designated by the Azure Data Explorer data management service) and posts a notification to an Azure Queue. Batch ingestion is the recommended technique.
Inline ingestion (provided by the query tools): most appropriate for exploration and prototyping.
• Inline ingestion: a control command (.ingest inline) containing in-band data, intended for ad-hoc testing purposes.
• Ingest from query: control commands (.set, .set-or-append, .set-or-replace) that point at query results, used for generating reports or small temporary tables.
• Ingest from storage: a control command (.ingest into) with data stored externally (for example, in Azure Blob Storage), allowing efficient bulk ingestion of data.
38. Supported data formats
The supported data formats are:
- CSV, TSV, TSVE, PSV, SCSV, SOH
- JSON (line-separated, multi-line), Avro
- ZIP and GZIP
• CSV mapping (optional): works with all ordinal-based formats. It can be passed as an ingest command parameter or pre-created on the table and referenced from the ingest command parameter.
• JSON mapping (mandatory) and Avro mapping (mandatory): can be passed as ingest command parameters, or pre-created on the table and referenced from the ingest command parameter.
For all ingestion methods other than ingest-from-query, YOU HAVE TO FORMAT THE DATA so that Azure Data Explorer can parse it. Schema mapping binds source data fields to destination table columns.
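As a sketch of what a JSON ingestion mapping looks like, here is one built in Python (the column names, paths and the classic column/path/datatype field shape are illustrative; check the current ADX docs for the exact mapping schema of your service version):

```python
import json

# A JSON ingestion mapping binds a source JSON path to a destination column.
# Entries below are illustrative, using the classic column/path/datatype shape.
mapping = [
    {"column": "Timestamp", "path": "$.ts",  "datatype": "datetime"},
    {"column": "DeviceId",  "path": "$.dev", "datatype": "string"},
    {"column": "Value",     "path": "$.val", "datatype": "real"},
]
mapping_json = json.dumps(mapping)  # pass this with the ingest command,
                                    # or pre-create it on the table
```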
39. What about orchestration?
Three use cases where Flow + Kusto are the solution:
• Push data to a Power BI dataset: periodically run queries and push the results to a Power BI dataset
• Conditional queries: run data checks and send notifications with no code
• Email multiple ADX Flow charts: send impressive emails with an HTML5 chart as the query result
40. Use ADX with applications not natively supported by a connector
Use ODBC to connect to Azure Data Explorer from applications that don't have a dedicated connector. Azure Data Explorer supports a subset of the SQL Server communication protocol, so you can use any tool that supports ODBC connections.
Steps: use ODBC Driver 17, connect to the ADX cluster, select the database, test the connection.
IMPORTANT NOTES:
• T-SQL SELECT statements are supported (but sub-queries are not)
• Since it looks like a SQL Server with AD authentication, you can use LINQPad, Azure Data Studio, Power BI Desktop, Tableau, Excel, DBeaver and SSMS (they're all MS-TDS clients).
• SQL Server on-premises allows attaching a linked server, so you can use "ADX as SQL".
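The steps above reduce to assembling a standard SQL Server ODBC connection string pointed at the cluster. A sketch (the cluster and database names are placeholders; the driver name and keywords are the standard ODBC Driver 17 ones):

```python
# ADX speaks a subset of MS-TDS, so the stock SQL Server ODBC driver connects
# to it like any SQL Server, here with integrated Azure AD authentication.
def adx_odbc_connstr(cluster: str, database: str) -> str:
    return (
        "Driver={ODBC Driver 17 for SQL Server};"
        f"Server={cluster}.kusto.windows.net;"
        f"Database={database};"
        "Authentication=ActiveDirectoryIntegrated;"
    )

connstr = adx_odbc_connstr("adxclu001", "db001")
# hand connstr to pyodbc.connect(connstr) or any other ODBC client
```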
41. From T-SQL to KQL and vice versa
Kusto supports translating T-SQL queries into the Kusto query language.
NOTE: remember the dashes ( !#£$%!! )
43. ADX Functions
Functions are reusable queries or query parts. Kusto supports several kinds of functions:
• Stored functions: user-defined functions that are stored and managed as one kind of the database's schema entities. See Stored functions.
• Query-defined functions: user-defined functions that are defined and used within the scope of a single query. They are defined through a let statement. See User-defined functions.
• Built-in functions: hard-coded functions (defined by Kusto and not modifiable by users).
44. Language examples
Alias:
alias database["wiki"] = cluster("https://somecluster.kusto.windows.net:443").database("somedatabase");
database("wiki").PageViews | count
Let:
let start = ago(5h);
let period = 2h;
T | where Time > start and Time < start + period | ...
Bin:
T | summarize Hits=count() by bin(Duration, 1s)
Batch:
let m = materialize(StormEvents | summarize n=count() by State);
m | where n > 2000; m | where n < 10
Tabular expression:
Logs
| where Timestamp > ago(1d)
| join ( Events | where continent == 'Europe' ) on RequestId
45. Time Series Analysis – Bin Operator
bin rounds values down to an integer multiple of a given bin size. If you have a scattered set of values, they will be grouped into a smaller set of specific values.
[Rule]
bin(value, roundTo)
[Example]
T | summarize Hits=count() by bin(Duration, 1s)
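The rounding rule above has a one-line equivalent, which makes the "round down to a multiple of the bin size" behavior easy to check (a Python sketch for intuition, not part of the deck):

```python
import math

# bin(value, roundTo) rounds DOWN to an integer multiple of the bin size.
def kusto_bin(value: float, round_to: float) -> float:
    return math.floor(value / round_to) * round_to

# durations 0.3s, 1.2s, 1.9s, 2.5s land in the 0s, 1s, 1s, 2s bins
bins = [kusto_bin(v, 1) for v in (0.3, 1.2, 1.9, 2.5)]
```

Note that "down" means toward negative infinity, so negative values also round down.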
46. Time Series Analysis – Make Series Operator
[Rule]
T | make-series [MakeSeriesParameters] [Column =] Aggregation [default = DefaultValue] [, ...] on AxisColumn from start to end step step [by [Column =] GroupExpression [, ...]]
[Example]
T | make-series sum(amount) default=0, avg(price) default=0 on timestamp from datetime(2016-01-01) to datetime(2016-01-10) step 1d by supplier
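Conceptually, make-series builds a fixed time axis and emits one aggregated value per step, filling empty steps with the default. A small Python emulation of a daily sum series (the sample rows are illustrative):

```python
from collections import defaultdict
from datetime import date, timedelta

# Emulate `make-series sum(amount) default=0 on day from start to end step 1d`:
# aggregate per day, then walk the fixed axis, filling gaps with the default.
def make_series(rows, start: date, end: date, default=0):
    sums = defaultdict(float)
    for day, amount in rows:
        sums[day] += amount
    axis, series = [], []
    d = start
    while d < end:
        axis.append(d)
        series.append(sums.get(d, default))
        d += timedelta(days=1)
    return axis, series

rows = [(date(2016, 1, 1), 10), (date(2016, 1, 1), 5), (date(2016, 1, 3), 7)]
axis, series = make_series(rows, date(2016, 1, 1), date(2016, 1, 4))
# Jan 2 has no rows, so it gets the default value 0
```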
47. Time Series Analysis – Basket Operator
Basket finds all frequent patterns of discrete attributes (dimensions) in the data and returns all frequent patterns that passed the frequency threshold in the original query.
[Rule]
T | evaluate basket([Threshold, WeightColumn, MaxDimensions, CustomWildcard, CustomWildcard, ...])
[Example]
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State, EventType, Damage, DamageCrops
| evaluate basket(0.2)
48. Time Series Analysis – Autocluster Operator
AutoCluster finds common patterns of discrete attributes (dimensions) in the data and reduces the results of the original query (whether it's 100 or 100k rows) to a small number of patterns.
[Rule]
T | evaluate autocluster([SizeWeight, WeightColumn, NumSeeds, CustomWildcard, CustomWildcard, ...])
[Example]
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State , EventType , Damage
| evaluate autocluster(0.6)
With custom wildcards:
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State , EventType , Damage
| evaluate autocluster(0.2, '~', '~', '*')
49. Export
Workflow:
1. DEFINE COMMAND: define the ADX command and try your recurrent export strategy.
2. TRY IN EDITOR: use an editor to try the command, verifying connection strings and parametrizing them.
3. BUILD A JOB: build a notebook or a C# job using the command as a SQL query in your code.
To storage:
.export async compressed to csv (
  h@"https://storage1.blob.core.windows.net/containerName;secretKey",
  h@"https://storage1.blob.core.windows.net/containerName2;secretKey" )
with ( sizeLimit=100000, namePrefix=export, includeHeaders=all, encoding=UTF8NoBOM )
<| myLogs | where id == "moshe" | limit 10000
To SQL:
.export async to sql ['dbo.MySqlTable']
  h@"Server=tcp:myserver.database.windows.net,1433;Database=MyDatabase;Authentication=Active Directory Integrated;Connection Timeout=30;"
with (createifnotexists="true", primarykey="Id")
<| print Message = "Hello World!", Timestamp = now(), Id=12345678
50. External tables & Continuous Export
It’s an external endpoint:
Azure Storage
Azure Datalake Store
SQL Server
You need to define:
Destination
Continuous-Export Strategy
EXT TABLE CREATION
.create external table ExternalAdlsGen2 (Timestamp:datetime, x:long,
s:string) kind=adl partition by bin(Timestamp, 1d) dataformat=csv (
h@'abfss://filesystem@storageaccount.dfs.core.windows.net/path;secre
tKey' ) with ( docstring = "Docs", folder = "ExternalTables",
namePrefix="Prefix" )
EXPORT to EXT TABLE
.create-or-alter continuous-export MyExport over (T) to table
ExternalAdlsGen2 with (intervalBetweenRuns=1h, forcedLatency=10m,
sizeLimit=104857600) <| T
51. FACTS:
A) Kusto stores its ingested data in reliable storage (most commonly Azure Blob Storage).
B) To speed up queries on that data, Kusto caches this data (or parts of it) on its processing nodes.
Cache policy: Kusto provides a granular cache policy that customers can use to differentiate between two data cache tiers: hot data cache and cold data cache. The cache policy is independent of the retention policy!
You can specify which tier must be used:
set query_datascope="hotcache";
T | union U | join (T datascope=all | where Timestamp < ago(365d)) on X
52. Retention policy
Two parameters, applicable to a DB or a table:
• SoftDeletePeriod (number): data is available for query (ts is the ADX IngestionDate); default is 100 YEARS
• Recoverability (enabled/disabled): default is ENABLED; data is recoverable for 14 days after deletion
Use Kusto to set Kusto:
.alter database DatabaseName policy retention "{}"
.alter table TableName policy retention "{}"
.delete database DatabaseName policy retention
.delete table TableName policy retention
.alter-merge table MyTable1 policy retention softdelete = 7d
EXAMPLE:
{ "SoftDeletePeriod": "36500.00:00:00", "Recoverability": "Enabled" }
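The SoftDeletePeriod in the example uses the d.hh:mm:ss timespan notation (36500 days is the 100-year default). A small helper for rendering a day count into the policy JSON shown above (the helper name is mine; the payload keys match the example):

```python
import json

# Render a retention policy payload: SoftDeletePeriod uses the d.hh:mm:ss
# timespan form, so 36500 days -> "36500.00:00:00" (the 100-year default).
def retention_policy(days: int, recoverability: bool = True) -> str:
    return json.dumps({
        "SoftDeletePeriod": f"{days}.00:00:00",
        "Recoverability": "Enabled" if recoverability else "Disabled",
    })

policy = retention_policy(36500)
# paste the result into: .alter table TableName policy retention "<policy>"
```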
53. Data Purge
The purge process is final and irreversible.
PURGE PROCESS:
1. It requires database admin permissions.
2. Prior to purging, you have to be ENABLED by opening a SUPPORT TICKET.
3. Run the purge QUERY to identify the SIZE and EXECUTION TIME and obtain a VerificationToken.
4. Run the REAL purge QUERY, passing the verification token.
2-STEP PROCESS:
.purge table MyTable records in database MyDatabase <| where CustomerId in ('X', 'Y')
Example result: NumRecordsToPurge = 1,596; EstimatedPurgeExecutionTime = 00:00:02; VerificationToken = e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b
.purge table MyTable records in database MyDatabase with (verificationtoken='e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b') <| where CustomerId in ('X', 'Y')
1-STEP PROCESS (with no regrets!!!!):
.purge table MyTable records in database MyDatabase with (noregrets='true')
54. KUSTO: Do and Don't
1. DO analytics over big data.
2. DO use and support entities such as databases, tables, and columns.
3. DO use and support complex analytics query operators (calculated columns, filtering, group by, joins).
BUT DO NOT perform in-place updates!
55. UC1: Event Correlation
Get sessions from start and stop events. Suppose we have a log of events, in which some events mark the start or end of an extended activity or session. Every event has a SessionId, so the problem is to match up the start and stop events with the same id.

Input:
Name   | City       | SessionId | Timestamp
Start  | London     | 2817330   | 2015-12-09T10:12:02.32
Game   | London     | 2817330   | 2015-12-09T10:12:52.45
Start  | Manchester | 4267667   | 2015-12-09T10:14:02.23
Stop   | London     | 2817330   | 2015-12-09T10:23:43.18
Cancel | Manchester | 4267667   | 2015-12-09T10:27:26.29
Stop   | Manchester | 4267667   | 2015-12-09T10:28:31.72

Output:
City       | SessionId | StartTime              | StopTime               | Duration
London     | 2817330   | 2015-12-09T10:12:02.32 | 2015-12-09T10:23:43.18 | 00:11:40.86
Manchester | 4267667   | 2015-12-09T10:14:02.23 | 2015-12-09T10:28:31.72 | 00:14:29.49

Kusto:
let Events = MyLogTable
| where ... ;
Events
| where Name == "Start"
| project Name, City, SessionId, StartTime=timestamp
| join (Events | where Name == "Stop" | project StopTime=timestamp, SessionId) on SessionId
| project City, SessionId, StartTime, StopTime, Duration = StopTime - StartTime

Use let to name a projection of the table that is pared down as far as possible before going into the join. project changes the names of the timestamps so that both the start and stop times can appear in the result; it also selects the other columns we want to see. join matches up the start and stop entries for the same activity, creating a row for each activity. Finally, project again adds a column to show the duration of the activity.
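To see what the join is doing, here is the same start/stop pairing in Python, using the deck's sample events (fractional seconds dropped for brevity):

```python
from datetime import datetime

# Pair each Start event with the Stop event sharing its SessionId,
# mirroring the KQL join above, and compute the session duration.
events = [
    ("Start", "London",     2817330, datetime(2015, 12, 9, 10, 12, 2)),
    ("Stop",  "London",     2817330, datetime(2015, 12, 9, 10, 23, 43)),
    ("Start", "Manchester", 4267667, datetime(2015, 12, 9, 10, 14, 2)),
    ("Stop",  "Manchester", 4267667, datetime(2015, 12, 9, 10, 28, 31)),
]

stops = {sid: ts for name, _, sid, ts in events if name == "Stop"}
sessions = [
    (city, sid, ts, stops[sid], stops[sid] - ts)  # (City, SessionId, Start, Stop, Duration)
    for name, city, sid, ts in events
    if name == "Start" and sid in stops
]
```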
56. UC2: In-Place Enrichment
Creating and using query-time dimension tables. In many cases one wants to join the results of a query with an ad-hoc dimension table that is not stored in the database. It is possible to define an expression whose result is a table scoped to a single query, like this:
Kusto:
// Create a query-time dimension table using datatable
let DimTable = datatable(EventType:string, Code:string) [ "Heavy Rain", "HR", "Tornado", "T" ] ;
DimTable
| join StormEvents on EventType
| summarize count() by Code
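The datatable-based enrichment above is effectively a dictionary lookup followed by a group-by. A Python equivalent with illustrative sample events:

```python
from collections import Counter

# Query-time dimension table as a dict: EventType -> short code.
dim = {"Heavy Rain": "HR", "Tornado": "T"}
storm_events = ["Tornado", "Heavy Rain", "Tornado", "Hail"]

# Inner-join semantics: events with no dimension entry ("Hail") drop out,
# then count by the enriched Code column.
counts = Counter(dim[e] for e in storm_events if e in dim)
```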
57. Brief summary: Azure Data Explorer
Easy to ingest the data and easy to query the data.
Ingestion APIs (queued): Blob & Azure Queue (Python SDK, .NET SDK), IoT Hub, Event Hub, Blob & Event Grid
Ingestion APIs (streaming, bulk, direct): REST API, .NET SDK, Python SDK, Java SDK
Query APIs and protocols: REST API, MS-TDS, .NET SDK, Python SDK, Java SDK, JavaScript (Monaco IDE)
UX: Web UI, desktop app, Jupyter magic, Azure Notebooks
Connectors: Power BI Direct Query, Microsoft Flow, Azure Logic Apps, Grafana, ADF
[Speaker notes]
- Today's intent
- What we will NOT cover
- Why the examples are from Microsoft
Not for updates? Append-only? Yes, of course.
Deep dive to add here:
https://www.linkedin.com/pulse/azure-data-explorer-business-continuity-henning-rauch/?trackingId=h2RAxJ2JNJ%2BYSyTHGSDLbw%3D%3D
An Azure Data Explorer cluster is a pair of engine and data management clusters which uses several Azure resources such as Azure Linux VMs and Storage. The applicable VM, Azure Storage, Azure Networking and Azure Load Balancer costs are billed directly to the customer subscription.
STORAGE => put data here for occasional point retrieval, kept stopped => long retention
COMPUTE => put temporary data here to be scanned often => short retention
It is like IaaS: based on VM size, storage and network, and NOT based on the number of databases.
IF YOU ARE AHEAD OF SCHEDULE, TRY IT!!!!
Try it in VS Code:
CTRL+P => kuskus
Then:
cluster('adxclu001').database('db001').table('TBL_LAB01')
| count