MCT Summit 2021
Time series Analytics
A deep dive into ADX Azure Data Explorer
Riccardo Zamana
• 20+ years of experience in IT
• 10+ years of experience in IoT
• 5+ years of experience in Azure projects
ADX - Basics
What is ADX for me, today
• A telemetry data search engine => ELK replacement
• A TSDB involved in LAMBDA replacements (as WARM path) => OSS LAMBDA (MinIO + Kafka) replacement
• A tool to materialize data into ADLS & SQL
• A tool for monitoring, summarizing information, and sending notifications
ADX Architecture
1. CONTEXT
2. SOURCES
3. INFRASTRUCTURE
4. DESTINATIONS
ADX - Quickstart
What is Azure Data Explorer
• Any append-only stream of records
• Relational query model: filter, aggregate, join, calculated columns, …
• Fully managed
• Rapid iterations to explore the data
• High volume, high velocity, high variance (structured, semi-structured, free-text)
• PaaS, Vanilla, Database
• Purpose-built
The role of ADX
Raw data DWH
Refined data
Real-time derived data
Data comparison and fast KPIs
ADX
THREE KEY USERS IN ONE TOOL:
• IoT Developer (data check, rule engine for insights)
• Data engineer (data comparison)
• Data scientist (data exploration)
How ADX is Organized
INSTANCE DATABASE SOURCES
DB Users/Apps
Ingestion URL
Querying URL
Cache storage
Blob storage
EXTERNAL SOURCES
EXTERNAL DESTINATIONS
IoT Hub
Event Hub
Storage
ADLS
SQL Server
Many more…
ADX - Ingest
FIRST PHASE: Ingestion
• Many connections & plugins
• Many SDKs
• Many managed pipelines
• Many tools to ingest rapidly
Managed pipelines:
• Ingest blobs using Event Grid
• Ingest Event Hub streams
• Ingest IoT Hub streams
• Ingest data from ADF
Connections & plugins:
• Logstash plugin
• Kafka connector
• Apache Spark connector
SDKs:
• Python SDK
• .NET SDK
• Java SDK
• Node SDK
• REST API
• Go API
Tools:
• One-click ingestion
• LightIngest
Ingestion Types:
• Streaming ingestion: optimized for a low volume of data per table, over thousands of tables
  • Operation completes in under 10 seconds
  • Data is available for query after completion
• Batching ingestion: optimized for high ingestion throughput
  • Default batch params: 5 minutes, 500 items, or 1000 MB
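The three default batch parameters work as alternative limits: a batch is sealed as soon as the first one is reached. A minimal Python sketch of that rule, assuming the defaults quoted on the slide (the class and function names are hypothetical and are not part of any ADX SDK):

```python
from dataclasses import dataclass


@dataclass
class BatchingPolicy:
    # Defaults taken from the slide: 5 minutes, 500 items, 1000 MB.
    max_seconds: float = 300.0
    max_items: int = 500
    max_raw_mb: float = 1000.0


def should_seal(policy, age_seconds, item_count, raw_mb):
    """Return True once any one of the three limits is reached."""
    return (
        age_seconds >= policy.max_seconds
        or item_count >= policy.max_items
        or raw_mb >= policy.max_raw_mb
    )


policy = BatchingPolicy()
few_small_files = should_seal(policy, age_seconds=10, item_count=3, raw_mb=1)
many_files = should_seal(policy, age_seconds=10, item_count=500, raw_mb=1)
old_batch = should_seal(policy, age_seconds=300, item_count=3, raw_mb=1)
```

This is why a trickle of small files may wait for the full time limit before it becomes queryable, while a burst of files is sealed much earlier.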
Ingestion Techniques
Batch ingestion (provided by SDK)
For high-volume, reliable, and cheap data ingestion.
The client uploads the data to Azure Blob storage (designated by the Azure Data Explorer data management service) and posts a notification to an Azure Queue. Batch ingestion is the recommended technique.
Inline ingestion (provided by query tools)
Most appropriate for exploration and prototyping.
• Inline ingestion: the control command (.ingest inline), which contains in-band data, is intended for ad hoc testing purposes.
• Ingest from query: control commands (.set, .set-or-append, .set-or-replace) that point to query results are used for generating reports or small temporary tables.
• Ingest from storage: the control command (.ingest into) with data stored externally (for example, Azure Blob Storage) allows efficient bulk ingestion of data.
What is LightIngest
• Command-line utility for ad hoc data ingestion into Kusto
• Pulls source data from a local folder
• Pulls source data from an Azure Blob Storage container
• Useful to ingest quickly and play with ADX
• Most useful when you want to ingest a large amount of data (time constraint on ingestion duration)
[Ingest JSON data from blobs]
LightIngest "https://adxclu001.kusto.windows.net;Federated=true"
-database:db001
-table:LAB
-sourcePath:"https://ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME?SAS_TOKEN"
-prefix:MyDir1/MySubDir2
-format:json
-mappingRef:DefaultJsonMapping
-pattern:*.json
-limit:100
[Ingest CSV data with headers from local files]
LightIngest "https://adxclu001.kusto.windows.net;Federated=true"
-database:MyDb
-table:MyTable
-sourcePath:"D:\MyFolder\Data"
-format:csv
-ignoreFirstRecord:true
-mappingPath:"D:\MyFolder\CsvMapping.txt"
-pattern:*.csv.gz
-limit:100
REFERENCE:
https://docs.microsoft.com/en-us/azure/kusto/tools/lightingest
LightIngest: pay attention with Users!
Queued ingestion
Direct ingestion
IMPORTANT:
All the data is indexed, but how is it partitioned? By ingestion time!
The -creationTimePattern argument lets users partition the data by creation time instead of ingestion time.
One Click ingestion GA
• One Click makes ingestion easy (intuitive UX)
• Start ingesting data, creating tables and mapping structures
• Different data formats
• One-time or continuous ingestion
FIRST: check your data, create and destroy tons of test tables
Kafka Gold certified connector
• From an Apache Kafka cluster (in the cloud or on-prem)
• Kafka to ingest data into ADX at scale
• GOLD (Partner supported < Microsoft)
What’s the VISION behind it?
What is Fluent Bit
• Collaboration with the CNCF Fluent Bit project
• Multi-platform log processor and forwarder to collect data/logs from different sources
• Unify and send to Block Blob
• Ingest them into ADX using Event Grid
• Can use Azurite as a storage endpoint for simulation
https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azurite?toc=/azure/storage/blobs/toc.json
Ingestion: Formats & Use Cases
• Ingest data using native formats: ApacheAvro, CSV (RFC4180), JSON, MultiJSON (jsonLine), ORC, Parquet, PSV, SCSV, TSV, TXT
• Files/blobs can be compressed: ZIP, GZIP
• Better to use declarative names: MyData.csv.zip, MyData.json.gz
Supported data formats
For all ingestion methods other than ingest from query, format the data so that Azure Data Explorer can parse it. The supported data formats are:
• CSV, TSV, TSVE, PSV, SCSV, SOH
• JSON (line-separated, multi-line), Avro
• ZIP and GZIP
Schema mapping helps bind source data fields to destination table columns.
• CSV Mapping (optional) works with all ordinal-based formats. It can be performed using the ingest command parameter or pre-created on the table and referenced from the ingest command parameter.
• JSON Mapping (mandatory) and Avro mapping (mandatory) can be performed using the ingest command parameter. They can also be pre-created on the table and referenced from the ingest command parameter.
My best ingestion experience
Open points:
• Why Event Hub after IoT Hub?
• Why the second Event Hub?
ADX - TOOLS
How about the Tools?
1. LOAD
• LightIngest
• Azure Data Factory
2. QUERY
• Kusto.Explorer
• Web UI
3. VISUALIZE
• Notebooks
• Power BI
• Grafana
• ADX Web UI
4. ORCHESTRATE
• Microsoft Flow
• Microsoft Logic App
BI People
IT People
ML People
Azure data studio plugins:
Manager Cluster
Manager
Notebooks
1. Select New connection from the Connections pane.
2. Fill in the Connection Details information.
3. For Connection type, select Kusto.
4. For Cluster, enter your Azure Data Explorer cluster. (When entering the cluster name, don't include the https:// prefix or a trailing /.)
5. For Authentication Type, use the default: Azure Active Directory - Universal with MFA account.
6. For Account, use your account information.
7. For Database, use Default.
8. For Server Group, use Default.
9. For Name (optional), leave blank.
Azure data studio plugins:
• Filter/View data
• Build 3D Charts
• Take snapshot as JSON declarative file
Notebooks + ADX = KQL Magic
KQL magic:
https://github.com/microsoft/jupyter-Kqlmagic
• extends the capabilities of the Python kernel in Jupyter
• can run Kusto language queries natively
• combine Python and Kusto query language
A critical perspective
Which are the OSS Alternatives that we should compare with?
From db-engines.com
• Azure Data Explorer: fully managed big data interactive analytics platform
Vs
• Elasticsearch: a distributed, RESTful modern search and analytics engine
• Splunk: real-time insights engine to boost productivity & security
• InfluxDB: DBMS for storing time series, events and metrics
ADX can be a replacement for search and log analytics engines such as Elasticsearch, Splunk, InfluxDB.
Comparison chart
Name Elasticsearch (ELASTIC) InfluxDB (InfluxData Inc.) Azure Data Explorer (Microsoft) Splunk (Splunk Inc.)
Description A distributed, RESTful modern search and
analytics engine based on Apache Lucene
DBMS for storing time series, events and
metrics
Fully managed big data interactive analytics
platform
Analytics Platform for Big Data
Database models Search engine, Document store Time Series DBMS Time Series DBMS, Search engine, Document
store , Event Store, Relational DBMS
Search engine
Initial release 2010 2013 2019 2003
License Open Source Open Source commercial commercial
Cloud-based only no no yes no
Implementation language Java Go
Server operating systems All OS with a Java VM Linux, OS X hosted Linux, OS X, Solaris, Windows
Data scheme schema-free schema-free Fixed schema with schema-less datatypes
(dynamic)
yes
Typing yes Numeric data and Strings yes yes
XML support no no yes yes
Secondary indexes yes no all fields are automatically indexed yes
SQL SQL-like query language SQL-like query language Kusto Query Language (KQL), SQL subset no
APIs and other access methods RESTful HTTP/JSON API HTTP API RESTful HTTP API HTTP REST
Java API JSON over UDP Microsoft SQL Server communication
protocol (MS-TDS)
Supported programming languages .Net, Java, JavaScript, Python .Net, Java, JavaScript, Python .Net, Java, JavaScript, Python .Net, Java, JavaScript, Python
Ruby, PHP, Perl, Groovy, Community
Contributed Clients
R,Ruby,PHP,Perl,Haskell,Clojure,Erlang,Go,Lisp
,Rust,Scala
R, PowerShell Ruby, PHP
Server-side scripts yes no Yes, possible languages: KQL, Python, R yes
Triggers yes no yes yes
Partitioning methods Sharding Sharding Sharding Sharding
Replication methods yes selectable replication factor yes Master-master replication
MapReduce ES-Hadoop Connector no no yes
Consistency concepts Eventual Consistency Eventual Consistency Eventual Consistency
Immediate Consistency
Foreign keys no no no no
Transaction concepts no no no no
Concurrency yes yes yes yes
Durability yes yes yes yes
In-memory capabilities Memcached and Redis integration yes no no
User concepts simple rights management via user accounts Azure Active Directory Authentication Access rights for users and roles
Update Policy
Automatically append data to a target table whenever new data is inserted into the source table, based on a transformation query that runs on the data inserted into the source table.
USE IT IF:
• The source table is based on free-text columns
• The target table accepts only a specific structure
Cascading updates are allowed (TableA → TableB → TableC → ...).
Raw table Refined table
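The raw-to-refined flow and the cascading behaviour can be sketched outside ADX. The Python toy model below is only an illustration of the idea, not how the engine is implemented: the `MiniCluster` class, the table names and the `dev;temp` line format are all assumptions made for the example.

```python
from collections import defaultdict


class MiniCluster:
    """Toy model: appending to a source table runs its update policies."""

    def __init__(self):
        self.tables = defaultdict(list)
        self.policies = defaultdict(list)  # source -> [(target, transform)]

    def add_update_policy(self, source, target, transform):
        self.policies[source].append((target, transform))

    def ingest(self, table, rows):
        rows = list(rows)
        if not rows:
            return
        self.tables[table].extend(rows)
        # The transform sees only the newly ingested rows, then cascades.
        for target, transform in self.policies[table]:
            self.ingest(target, transform(rows))


def parse(line):
    # Hypothetical free-text format: "<device>;<temperature>"
    device, temp = line.split(";")
    return {"device": device, "temp": float(temp)}


cluster = MiniCluster()
cluster.add_update_policy("Raw", "Refined", lambda rows: [parse(r) for r in rows])
cluster.add_update_policy("Refined", "Hot", lambda rows: [r for r in rows if r["temp"] > 30])
cluster.ingest("Raw", ["dev1;21.5", "dev2;35.0"])
```

After the single ingest into `Raw`, both `Refined` and `Hot` are populated, which mirrors the TableA → TableB → TableC cascade.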
How to use Update Policy
// Create a function that will be used for the update
.create function MyUpdateFunction()
{
    MyTableX
    | where ColumnA == 'some-string'
    | summarize MyCount=count() by ColumnB, Key=ColumnC
    | join (OtherTable | project OtherColumnZ, Key=OtherColumnC) on Key
    | project ColumnB, ColumnZ=OtherColumnZ, Key, MyCount
}
// Create the target table (if it doesn't already exist)
.set-or-append DerivedTableX <| MyUpdateFunction() | limit 0
// Use update policy on table DerivedTableX
.alter table DerivedTableX policy update
@'[{"IsEnabled": true, "Source": "MyTableX", "Query": "MyUpdateFunction()", "IsTransactional": false, "PropagateIngestionProperties": false}]'
Pay attention to failures!
Evaluate resource usage
.show table MySourceTable extents;
// The following line provides the extent ID for the not-yet-merged extent in the source table which has the most records
let extentId = $command_results | where MaxCreatedOn > ago(1hr) and MinCreatedOn == MaxCreatedOn | top 1 by RowCount desc | project ExtentId;
let MySourceTable = MySourceTable | where extent_id() == toscalar(extentId);
MyFunction()
Failures
.show ingestion failures
| where FailedOn > ago(1hr) and OriginatesFromUpdatePolicy == true
• Non-transactional policy: failures are ignored
• Transactional policy: if the ingestion method is pull => automated retry of the entire ingestion operation (max time)
SO: You should check failures to trigger «BROKEN FILES» … but HOW?
Use this pattern
First table is NEVER wide!! … but YES for the second!
First table schema is K,V,TS,Metadata
Second table schema is WT (Wide Table)
Telemetry oriented ML oriented
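The step from the narrow K,V,TS,Metadata table to the wide table is a pivot. A minimal Python sketch of that reshaping, assuming (for illustration only) that the metadata column carries a device id and that each row is a `(key, value, ts, metadata)` tuple:

```python
def to_wide(rows):
    """Pivot (key, value, ts, metadata) rows into one wide row per (ts, device)."""
    wide = {}
    for key, value, ts, metadata in rows:
        row = wide.setdefault((ts, metadata), {"ts": ts, "device": metadata})
        row[key] = value  # each key becomes its own column
    return [wide[k] for k in sorted(wide)]


narrow = [
    ("temp", 21.5, "2021-01-01T00:00", "dev1"),
    ("hum", 40.0, "2021-01-01T00:00", "dev1"),
    ("temp", 22.0, "2021-01-01T00:01", "dev1"),
]
wide_rows = to_wide(narrow)
```

The narrow shape accepts any new signal without a schema change, while the wide shape gives one feature vector per timestamp, which is what ML-oriented consumers usually expect.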
My personal approach
DATA
FUNCTION1
FUNCTION2
FUNCTION3
FUNCTION3.1
FUNCTION3.2
FUNCTION3.3
KPI DEFINITION (×4)
DASHBOARD (use KPIs to embed and filter them)
ADX - Query
Some code Examples
Query with between
Function with parameters «ToScalar» expression
«Extend» usage
Kusto for SQL Users
• Perform SQL SELECT (no DDL, only SELECT)
• Use KQL (Kusto Query Language)
• Supports translating T-SQL queries to Kusto query language
--
explain
select top(10) * from StormEvents
order by DamageProperty desc
StormEvents
| sort by DamageProperty desc nulls first
| take 10
Language examples
Alias:
alias database["wiki"] = cluster("https://somecluster.kusto.windows.net:443").database("somedatabase");
database("wiki").PageViews | count
Let:
let start = ago(5h);
let period = 2h;
T | where Time > start and Time < start + period | ...
Bin:
T | summarize Hits=count() by bin(Duration, 1s)
Batch:
let m = materialize(StormEvents | summarize n=count() by State);
m | where n > 2000;
m | where n < 10
Tabular expression:
Logs
| where Timestamp > ago(1d)
| join ( Events | where continent == 'Europe' ) on RequestId
Time Series Analysis – Bin Operator
bin operator
Rounds values down to an integer multiple of a given bin size. If you have a scattered set of values, they will be grouped into a smaller set of specific values.
[Rule]
bin(value,roundTo)
[Example]
T | summarize Hits=count() by bin(Duration, 1s)
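The rounding rule behind bin can be reproduced in a few lines. This Python sketch mirrors "round down to a multiple of the bin size" and then counts hits per bin, as the summarize example does; the sample durations are invented for the illustration.

```python
import math
from collections import Counter


def bin_value(value, round_to):
    """Round value down to an integer multiple of round_to."""
    return math.floor(value / round_to) * round_to


# Hypothetical durations in seconds, grouped into 1-second bins.
durations = [0.2, 0.9, 1.1, 1.7, 3.4]
hits = Counter(bin_value(d, 1) for d in durations)
```

Five scattered values collapse into three bins (0, 1 and 3), which is the "smaller set of specific values" the slide refers to.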
Time Series Analysis – Make Series Operator
make-series operator
[Rule]
T | make-series [MakeSeriesParameters] [Column =] Aggregation [default = DefaultValue] [, ...] on AxisColumn from start to end step step [by [Column =] GroupExpression [, ...]]
[Example]
T | make-series sum(amount) default=0, avg(price) default=0 on timestamp from datetime(2016-01-01) to datetime(2016-01-10) step 1d by supplier
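What make-series adds over a plain summarize is a regular time axis with gaps filled by the default value. A rough Python sketch of that behaviour for a single sum aggregation, assuming rows shaped as `(day, supplier, amount)` and treating the end bound as exclusive (the function name and row layout are illustrative only):

```python
from datetime import date, timedelta


def make_series(rows, start, end, step_days=1, default=0):
    """Sum amounts per supplier on a fixed daily axis; empty buckets get `default`."""
    n = (end - start).days // step_days
    axis = [start + timedelta(days=i * step_days) for i in range(n)]
    sums = {}
    for day, supplier, amount in rows:
        idx = (day - start).days // step_days
        if 0 <= idx < n:  # ignore rows outside [start, end)
            buckets = sums.setdefault(supplier, [None] * n)
            buckets[idx] = amount if buckets[idx] is None else buckets[idx] + amount
    series = {s: [default if v is None else v for v in b] for s, b in sums.items()}
    return axis, series


rows = [
    (date(2016, 1, 1), "A", 10),
    (date(2016, 1, 1), "A", 5),
    (date(2016, 1, 3), "A", 7),
    (date(2016, 1, 2), "B", 1),
]
axis, series = make_series(rows, date(2016, 1, 1), date(2016, 1, 4))
```

Every supplier gets an array of the same length, so the series can be compared or fed to time series functions position by position.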
Time Series Analysis – Basket Operator
basket operator
Basket finds all frequent patterns of discrete attributes (dimensions) in the data and will return all frequent patterns that passed the frequency threshold in the original query.
[Rule]
T | evaluate basket([Threshold, WeightColumn, MaxDimensions, CustomWildcard, CustomWildcard, ...])
[Example]
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State, EventType, Damage, DamageCrops
| evaluate basket(0.2)
Time Series Analysis – Autocluster Operator
autocluster operator
AutoCluster finds common patterns of discrete attributes (dimensions) in the data and will reduce the results of the original query (whether it's 100 or 100k rows) to a small number of patterns.
[Rule]
T | evaluate autocluster([SizeWeight, WeightColumn, NumSeeds, CustomWildcard, CustomWildcard, ...])
[Example]
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State , EventType , Damage
| evaluate autocluster(0.6)
[Example]
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State , EventType , Damage
| evaluate autocluster(0.2, '~', '~', '*')
ADX Functions
Functions are reusable queries or query parts. Kusto supports several kinds of functions:
• Stored functions, which are user-defined functions that are stored and managed as one kind of a database's schema entities. See Stored functions.
• Query-defined functions, which are user-defined functions that are defined and used within the scope of a single query. The definition of such functions is done through a let statement. See User-defined functions.
• Built-in functions, which are hard-coded (defined by Kusto and cannot be modified by users).
Materialized views
The view exposes an always up-to-date view of the defined aggregation.
Advantages:
• Performance improvement
• Freshness
• Cost reduction
Behind the scenes:
• The source table is periodically materialized into the view table
• At query time, the view combines the materialized part with the DELTA in the raw table since the last materialization to return complete results
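The "materialized part plus DELTA" mechanism can be sketched as follows. This is a conceptual Python model only, assuming an append-only source and a simple count-by-key aggregation; the class name and the watermark bookkeeping are assumptions made for the example, not the engine's internals.

```python
class MaterializedCountView:
    """count() by key over an append-only source list."""

    def __init__(self, source):
        self.source = source      # append-only list of keys
        self.materialized = {}    # aggregation stored so far
        self.watermark = 0        # number of source rows already materialized

    def materialize(self):
        """Periodic background step: fold new rows into the stored part."""
        for key in self.source[self.watermark:]:
            self.materialized[key] = self.materialized.get(key, 0) + 1
        self.watermark = len(self.source)

    def query(self):
        """Query-time step: stored part combined with the not-yet-materialized delta."""
        result = dict(self.materialized)
        for key in self.source[self.watermark:]:
            result[key] = result.get(key, 0) + 1
        return result


events = ["a", "b", "a"]
view = MaterializedCountView(events)
view.materialize()
events.extend(["b", "c"])   # arrives after the last materialization
fresh = view.query()
```

The query only has to aggregate the two new rows, yet the result is complete, which is where the freshness and performance advantages listed above come from.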
QUERY AND PERFORMANCE OPTIMIZATION
• Materialized views
• Partitioning
• Query result caching
• Near real time scoring of AML and ONNX models
• FFT functions
• Geospatial
Query result caching
• Better query performance
• Lower resource consumption
• The queries need to be identical
• The cache policy is defined by MAX AGE
• Common use case: DASHBOARDS
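The two rules above (identical query text, maximum age) are enough to sketch the behaviour. The Python class below is a hypothetical stand-in for illustration; it is not the ADX client API, and `run_query` is whatever function actually executes the query.

```python
import time


class QueryResultCache:
    def __init__(self, run_query, clock=time.monotonic):
        self.run_query = run_query
        self.clock = clock
        self.entries = {}  # exact query text -> (timestamp, result)

    def execute(self, query, max_age_seconds):
        now = self.clock()
        hit = self.entries.get(query)
        if hit is not None and now - hit[0] <= max_age_seconds:
            return hit[1]  # served from cache, no resources consumed
        result = self.run_query(query)
        self.entries[query] = (now, result)
        return result


# Usage with a fake backend and a controllable clock.
calls = []
fake_time = [0.0]


def backend(query):
    calls.append(query)
    return len(calls)


cache = QueryResultCache(backend, clock=lambda: fake_time[0])
first = cache.execute("T | count", 60)
second = cache.execute("T | count", 60)      # identical text: cache hit
different = cache.execute("T  | count", 60)  # extra space: treated as a new query
fake_time[0] = 61.0
expired = cache.execute("T | count", 60)     # older than max age: re-executed
```

A dashboard that refreshes the same tiles every few seconds is the typical beneficiary, since each tile sends exactly the same query text.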
Geospatial joins
• Use cases
  • Connected mobility solutions
  • Geospatial risk analysis
  • Agriculture optimization using weather data
• Technical background
  • Join of polygon reference data and geospatial time series data
  • Based on three-dimensional S2 geometry
  • Consists of a coarse-grained join using S2 cell coverage and exact validation using the geo_point_in_polygon function
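The two-phase idea (cheap cell match first, exact polygon test second) can be shown with a simplified stand-in. The Python sketch below swaps S2 cells for a plain longitude/latitude grid and uses a ray-casting test in place of geo_point_in_polygon, so it illustrates the join strategy only, not S2 geometry; all names are hypothetical.

```python
import math
from collections import defaultdict


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def cell_of(lon, lat, size):
    return (math.floor(lon / size), math.floor(lat / size))


def covering_cells(polygon, size):
    """Grid cells touched by the polygon's bounding box (a coarse covering)."""
    lons = [p[0] for p in polygon]
    lats = [p[1] for p in polygon]
    x0, y0 = cell_of(min(lons), min(lats), size)
    x1, y1 = cell_of(max(lons), max(lats), size)
    return {(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)}


def geo_join(points, polygons, size=1.0):
    index = defaultdict(list)  # cell -> candidate polygon names
    for name, poly in polygons.items():
        for cell in covering_cells(poly, size):
            index[cell].append(name)
    matches = []
    for pid, lon, lat in points:
        for name in index.get(cell_of(lon, lat, size), []):  # coarse join
            if point_in_polygon(lon, lat, polygons[name]):    # exact validation
                matches.append((pid, name))
    return matches


polygons = {
    "field": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "tri": [(3, 0), (5, 0), (3, 2)],
}
points = [("p1", 1, 1), ("p2", 4.5, 1.5), ("p3", 3.5, 0.5), ("p4", 10, 10)]
matches = geo_join(points, polygons)
```

Point p2 falls in a cell covered by the triangle but outside the triangle itself, so it passes the coarse join and is rejected by the exact test; p4 never reaches the exact test at all.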
ADX - VIEW
ADX Dashboards
• Integration in KUSTO Web Explorer
• Optimized for big data
• Using powerful KQL to retrieve visual data
• Make dynamic views or widgets
Grafana query builder
• Create Grafana panels with no KQL knowledge
• Select values/filter/grouping using simple UI dropdowns
• Switch to RawMode to enhance queries with KQL
How to use Grafana easily
1. Go to https://grafana.com/, sign up, and get an account.
2. Go to the All Plugins section, search for the ADX data source, and install the plugin.
3. Go to your Grafana instance at https://<workbenchname>.grafana.net/datasources and configure the ADX data source.
4. Then start building dashboards!
ADX - Orchestration
How about orchestration?
Three use cases in which FLOW + KUSTO are the solution
• Push data to a Power BI dataset: periodically run queries and push the results to a Power BI dataset
• Conditional queries: make data checks and send notifications with no code
• Email multiple ADX Flow charts: send incredible emails with an HTML5 chart as the query result
Orchestration?
• Manage costs: start and stop the cluster by evaluating a condition
• Query sets to check data: plan a set of queries in order to say «IT’S OK, even today!»
• Manage data retention: based on a dynamic condition
An Example of:
1. Set trigger
2. Connect and test ADX BLOCK
3. Configure Email BLOCK with dynamic params
And the result is:
ADX - INTEGRATION
Export
• To Storage
.export async compressed to csv (
    h@"https://storage1.blob.core.windows.net/containerName;secretKey",
    h@"https://storage1.blob.core.windows.net/containerName2;secretKey"
) with (
    sizeLimit=100000, namePrefix=export, includeHeaders=all, encoding=UTF8NoBOM
) <| myLogs | where id == "moshe" | limit 10000
• To Sql
.export async to sql ['dbo.MySqlTable']
    h@"Server=tcp:myserver.database.windows.net,1433;Database=MyDatabase;Authentication=Active Directory Integrated;Connection Timeout=30;"
    with (createifnotexists="true", primarykey="Id")
    <| print Message = "Hello World!", Timestamp = now(), Id=12345678
1. DEFINE COMMAND
Define the ADX command and try your recurrent export strategy
2. TRY IN EDITOR
Use an editor to try the command, verifying connection strings and parameterizing them
3. BUILD A JOB
Build a Notebook or a C# JOB using the command as a SQL QUERY in your CODE
External tables & Continuous Export
• It’s an external endpoint:
  • Azure Storage
  • Azure Data Lake Store
  • SQL Server
• You need to define:
  • Destination
  • Continuous-export strategy
EXT TABLE CREATION
.create external table ExternalAdlsGen2 (Timestamp:datetime, x:long, s:string)
kind=adl
partition by bin(Timestamp, 1d)
dataformat=csv
(
    h@'abfss://filesystem@storageaccount.dfs.core.windows.net/path;secretKey'
)
with (docstring = "Docs", folder = "ExternalTables", namePrefix="Prefix")
EXPORT to EXT TABLE
.create-or-alter continuous-export MyExport over (T) to table ExternalAdlsGen2
with (intervalBetweenRuns=1h, forcedLatency=10m, sizeLimit=104857600)
<| T
My best experience
Open points
• How to extract insights using a dynamic and codeless approach?
• How to integrate ADX with low-cost DB solutions?
My final ADX recipe
Blob Storage
RawTables
Refined Tables
Triggered dynamic check queries
Datalake (long term buckets)
SQL DWH
Update policy
External table
Materialized view
Batch ingestion
External table
ADX - Management
Data encryption in ADX
• Encryption at rest (using Azure Storage)
• A Microsoft-managed key is used
• Customer-managed keys can be enabled
• Key rotation, temporary disable, and revoke access controls can be implemented
• Soft Delete and Purge Protection will be enabled on the Key Vault and cannot be disabled
Extents, policies and Partition
• What are data shards or extents
• Column, segments, and blocks
• merge policy and sharding policy
• Data partitioning policy (post-ingestion)
FACTS:
A) Kusto stores its ingested data in reliable storage (most commonly Azure Blob Storage).
B) To speed up queries on that data, Kusto caches this data (or parts of it) on its processing nodes.
The Kusto cache provides a granular cache policy that customers can use to differentiate between two data cache policies: hot data cache and cold data cache.
YOU CAN SPECIFY WHICH LOCATION MUST BE USED
set query_datascope="hotcache";
T | union U | join (T datascope=all | where Timestamp < ago(365d)) on X
Cache policy is independent from retention policy!
Retention policy
2 parameters, applicable to DB or table:
• Soft Delete Period (number)
  • Data is available for query
  • The period is measured from the ADX ingestion date (ts)
  • Default is set to 100 YEARS
• Recoverability (enabled/disabled)
  • Default is set to ENABLED
  • Recoverable for 14 days after deletion
.alter database DatabaseName policy retention "{}"
.alter table TableName policy retention "{}"
EXAMPLE:
{ "SoftDeletePeriod": "36500.00:00:00", "Recoverability":"Enabled" }
.delete database DatabaseName policy retention
.delete table TableName policy retention
.alter-merge table MyTable1 policy retention softdelete = 7d
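How the two parameters interact can be sketched in Python. This is a simplified model: it assumes the data is deleted exactly when the soft-delete period expires and that the 14-day recovery window starts at that moment, which the real service does not guarantee to the second. The function name and the state labels are invented for the example.

```python
from datetime import datetime, timedelta

SOFT_DELETE_DEFAULT = timedelta(days=36500)  # "36500.00:00:00", about 100 years
RECOVERY_WINDOW = timedelta(days=14)         # from the slide


def retention_state(ingested_at, now, soft_delete_period=SOFT_DELETE_DEFAULT, recoverable=True):
    """Classify data by its age, measured from ingestion time."""
    age = now - ingested_at
    if age <= soft_delete_period:
        return "queryable"
    if recoverable and age <= soft_delete_period + RECOVERY_WINDOW:
        return "recoverable"
    return "gone"


now = datetime(2021, 6, 1)
week = timedelta(days=7)  # as in: softdelete = 7d
state_3d = retention_state(now - timedelta(days=3), now, week)
state_10d = retention_state(now - timedelta(days=10), now, week)
state_30d = retention_state(now - timedelta(days=30), now, week)
state_10d_no_recovery = retention_state(now - timedelta(days=10), now, week, recoverable=False)
```

With the 100-year default nothing ever leaves the "queryable" state, so setting an explicit soft-delete period is what actually bounds storage cost.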
Data Purge
The purge process is final and irreversible
PURGE PROCESS:
1. It requires database admin permissions.
2. Before purging, the feature has to be ENABLED by opening a SUPPORT TICKET.
3. Run the purge QUERY to identify SIZE and EXEC. TIME and to get a VerificationToken.
4. Run the REAL purge QUERY, passing the VerificationToken.
2 STEP PROCESS
.purge table MyTable records in database MyDatabase <| where CustomerId in ('X', 'Y')
NumRecordsToPurge | EstimatedPurgeExecutionTime | VerificationToken
1,596 | 00:00:02 | e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b
.purge table MyTable records in database MyDatabase with (verificationtoken='e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b') <| where CustomerId in ('X', 'Y')
1 STEP PROCESS (With No Regrets!)
.purge table MyTable records in database MyDatabase with (noregrets='true')
Virtual Network
BENEFITS
• Use NSG rules to limit traffic.
• Connect your on-premises network to the Azure Data Explorer cluster's subnet.
• Secure your data connection sources (Event Hub and Event Grid) with service endpoints.
VNET gives you TWO independent IPs
• Private IP: access the cluster inside the VNet.
• Public IP: access the cluster from outside the VNet (management and monitoring) and as a source address for outbound connections initiated from the cluster.
My experience
Enterprise readiness
• Row Level Security (RLS)
  • Provides fine control of access to table data by different users
  • Allows specifying user access to specific rows in tables
  • Provides mechanics to mask PII data in tables
Leader and Follower
• Azure Data Share creates a symbolic link between two ADX clusters.
• Sharing occurs in near real time (no data pipeline)
• ADX decouples storage and compute
• Allows customers to run multiple compute (read-only) instances on the same underlying storage
• You can attach a database as a follower database, which is a read-only database on a remote cluster.
• You can share the data at the database level or at the cluster level.
The cluster sharing the database is the leader cluster and the cluster receiving the share is the follower cluster.
A follower cluster can follow one or more leader cluster databases. The follower cluster periodically synchronizes to check for changes.
The queries running on the follower cluster use local cache and don't use the resources of the leader cluster.
Q & A
Riccardo Zamana
Riccardo.Zamana@gmail.com
@riccardozamana
www.linkedin.com/in/riccardozamana/
Thank you.

More Related Content

What's hot

5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for Analytics5 Comparing Microsoft Big Data Technologies for Analytics
5 Comparing Microsoft Big Data Technologies for AnalyticsJen Stirrup
 
DataMinds 2022 Azure Purview Erwin de Kreuk
DataMinds 2022 Azure Purview Erwin de KreukDataMinds 2022 Azure Purview Erwin de Kreuk
DataMinds 2022 Azure Purview Erwin de KreukErwin de Kreuk
 
Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...
Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...
Redis + Structured Streaming—A Perfect Combination to Scale-Out Your Continuo...Databricks
 
Big Data on azure
Big Data on azureBig Data on azure
Big Data on azureDavid Giard
 
The Developer Data Scientist – Creating New Analytics Driven Applications usi...
The Developer Data Scientist – Creating New Analytics Driven Applications usi...The Developer Data Scientist – Creating New Analytics Driven Applications usi...
The Developer Data Scientist – Creating New Analytics Driven Applications usi...Microsoft Tech Community
 
Data weekender4.2 azure purview erwin de kreuk
Data weekender4.2  azure purview erwin de kreukData weekender4.2  azure purview erwin de kreuk
Data weekender4.2 azure purview erwin de kreukErwin de Kreuk
 
Cortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data LakeCortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data LakeMSAdvAnalytics
 
Is there a way that we can build our Azure Data Factory all with parameters b...
Is there a way that we can build our Azure Data Factory all with parameters b...Is there a way that we can build our Azure Data Factory all with parameters b...
Is there a way that we can build our Azure Data Factory all with parameters b...Erwin de Kreuk
 
Getting to 1.5M Ads/sec: How DataXu manages Big Data
Getting to 1.5M Ads/sec: How DataXu manages Big DataGetting to 1.5M Ads/sec: How DataXu manages Big Data
Getting to 1.5M Ads/sec: How DataXu manages Big DataQubole
 
Azure Purview Data Toboggan Erwin de Kreuk
Azure Purview Data Toboggan Erwin de KreukAzure Purview Data Toboggan Erwin de Kreuk
Azure Purview Data Toboggan Erwin de KreukErwin de Kreuk
 
Modern data warehouse with Azure
Modern data warehouse with AzureModern data warehouse with Azure
Modern data warehouse with AzureNilesh Gule
 
Azure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha DittmannAzure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha DittmannDatabricks
 
Microsoft Azure Databricks
Microsoft Azure DatabricksMicrosoft Azure Databricks
Microsoft Azure DatabricksSascha Dittmann
 
Azure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu Ganta
Azure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu GantaAzure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu Ganta
Azure Databricks – Customer Experiences and Lessons Denzil Ribeiro Madhu GantaDatabricks
 
Integration Monday - Analysing StackExchange data with Azure Data Lake
Integration Monday - Analysing StackExchange data with Azure Data LakeIntegration Monday - Analysing StackExchange data with Azure Data Lake
Integration Monday - Analysing StackExchange data with Azure Data LakeTom Kerkhove
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Con LA
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationMatthew W. Bowers
 
Part 3 - Modern Data Warehouse with Azure Synapse
Part 3 - Modern Data Warehouse with Azure SynapsePart 3 - Modern Data Warehouse with Azure Synapse
Part 3 - Modern Data Warehouse with Azure SynapseNilesh Gule
 

ADX Deep Dive into Time Series Analytics

  • 1. MCT Summit 2021: Time Series Analytics. A deep dive into ADX (Azure Data Explorer). Riccardo Zamana
  • 2. Speaker experience
      • 20+ years of experience in IT
      • 10+ years of experience in IoT
      • 5+ years of experience in Azure projects
  • 3. ADX - Basics
  • 4. What is ADX for me, today
      • A telemetry data search engine => ELK replacement
      • A TSDB involved in LAMBDA replacements (as the WARM path) => OSS LAMBDA (MinIO + Kafka) replacement
      • A tool to materialize data into ADLS & SQL
      • A tool for monitoring, summarizing information, and sending notifications
  • 5. ADX Architecture
      1. CONTEXT
      2. SOURCES
      3. INFRASTRUCTURE
      4. DESTINATIONS
  • 6. ADX - Quickstart
  • 7. What is Azure Data Explorer
      • Any append-only stream of records
      • Relational query model: filter, aggregate, join, calculated columns, …
      • Fully managed
      • Rapid iterations to explore the data
      • High volume, high velocity, high variance (structured, semi-structured, free-text)
      • PaaS, Vanilla, Database
      • Purpose-built
  • 8. The role of ADX
      • Diagram labels: raw data, DWH, refined data, real-time derived data, data comparison and fast KPIs, ADX
      • THREE KEY USERS IN ONE TOOL:
          • IoT developer (data check, rule engine for insights)
          • Data engineer (data comparison)
          • Data scientist (data exploration)
  • 9. How ADX is Organized
      • Diagram labels: INSTANCE, DATABASE, SOURCES, DB Users/Apps, Ingestion URL, Querying URL, Cache storage, Blob storage
      • Diagram labels: EXTERNAL SOURCES, EXTERNAL DESTINATIONS, IoT Hub, Event Hub, Storage, ADLS, SQL Server, many more
  • 10. ADX - Ingest
  • 11. FIRST PHASE: Ingestion
      • Many connections & plugins: Logstash plugin, Kafka connector, Apache Spark connector
      • Many SDKs: Python SDK, .NET SDK, Java SDK, Node SDK, REST API, Go API
      • Many managed pipelines: ingest blobs using Event Grid, ingest an Event Hub stream, ingest an IoT Hub stream, ingest data from ADF
      • Many tools to ingest rapidly: One click ingestion, LightIngest
  • 12. Ingestion Types
      • Streaming ingestion: optimized for a low volume of data per table, over thousands of tables
          • Operation completes in under 10 seconds
          • Data is available for query after completion
      • Batching ingestion: optimized for high ingestion throughput
          • Default batch parameters: 5 minutes, 500 items, or 1000 MB
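The batching parameters on this slide can be read as "seal the batch as soon as any one limit is reached". The following is a minimal sketch of that rule in plain Python, not the service's implementation. The names `BatchingLimits` and `should_seal` are hypothetical, and the default values are simply the ones quoted on the slide, which may differ from the defaults of a given cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchingLimits:
    """Limits copied from the slide; treat them as illustrative values."""
    max_seconds: float = 5 * 60   # 5 minutes
    max_items: int = 500          # 500 items
    max_size_mb: float = 1000.0   # 1000 MB


def should_seal(elapsed_seconds: float, item_count: int, size_mb: float,
                limits: BatchingLimits = BatchingLimits()) -> bool:
    """Return True as soon as any one of the three limits is reached."""
    return (elapsed_seconds >= limits.max_seconds
            or item_count >= limits.max_items
            or size_mb >= limits.max_size_mb)
```

A batch that has waited 10 seconds with 500 items would be sealed on the item limit, while a small, young batch keeps accumulating until the 5-minute limit is hit. This is why batching favors throughput over latency.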
  • 13. Ingestion Techniques
      • Batch ingestion (provided by the SDK): for high-volume, reliable, and cheap data ingestion. The client uploads the data to Azure Blob storage (designated by the Azure Data Explorer data management service) and posts a notification to an Azure Queue. Batch ingestion is the recommended technique.
      • Inline ingestion (provided by query tools): most appropriate for exploration and prototyping.
          • Inline ingestion: a control command (.ingest inline) containing in-band data; intended for ad hoc testing purposes.
          • Ingest from query: a control command (.set, .set-or-append, .set-or-replace) that points to query results; used for generating reports or small temporary tables.
          • Ingest from storage: a control command (.ingest into) with data stored externally (for example, Azure Blob Storage); allows efficient bulk ingestion of data.
  • 14. What is LightIngest
      • Command-line utility for ad hoc data ingestion into Kusto
      • Pulls source data from a local folder
      • Pulls source data from an Azure Blob Storage container
      • Useful to ingest quickly and play with ADX
      • Most useful when you want to ingest a large amount of data (time constraint on ingestion duration)
      • [Ingest JSON data from blobs]
        LightIngest "https://adxclu001.kusto.windows.net;Federated=true" -database:db001 -table:LAB -sourcePath:"https://ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME?SAS_TOKEN" -prefix:MyDir1/MySubDir2 -format:json -mappingRef:DefaultJsonMapping -pattern:*.json -limit:100
      • [Ingest CSV data with headers from local files]
        LightIngest "https://adxclu001.kusto.windows.net;Federated=true" -database:MyDb -table:MyTable -sourcePath:"D:\MyFolder\Data" -format:csv -ignoreFirstRecord:true -mappingPath:"D:\MyFolder\CsvMapping.txt" -pattern:*.csv.gz -limit:100
      • REFERENCE: https://docs.microsoft.com/en-us/azure/kusto/tools/lightingest
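When LightIngest is scripted, it can help to assemble the arguments programmatically instead of concatenating strings. The sketch below is an assumption about how one might do that. The helper `build_lightingest_args` is hypothetical and only uses switches that appear in the commands on this slide. It returns an argument list and does not run the tool.

```python
from typing import List, Optional


def build_lightingest_args(connection: str, database: str, table: str,
                           source_path: str, data_format: str,
                           pattern: Optional[str] = None,
                           limit: Optional[int] = None,
                           mapping_ref: Optional[str] = None,
                           ignore_first_record: bool = False) -> List[str]:
    """Build an argv-style list using only switches shown on the slide."""
    args = ["LightIngest", connection,
            f"-database:{database}", f"-table:{table}",
            f"-sourcePath:{source_path}", f"-format:{data_format}"]
    if ignore_first_record:
        args.append("-ignoreFirstRecord:true")  # skip the CSV header row
    if mapping_ref is not None:
        args.append(f"-mappingRef:{mapping_ref}")
    if pattern is not None:
        args.append(f"-pattern:{pattern}")
    if limit is not None:
        args.append(f"-limit:{limit}")
    return args
```

Passing the resulting list to something like `subprocess.run(args)` avoids shell quoting problems with the SAS token in the source path. That call is left out here because it requires the tool to be installed.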
  • 15. LightIngest: pay attention with Users!
      • Queued ingestion
      • Direct ingestion
      • IMPORTANT: the -creationTimePattern argument allows users to partition the data by creation time, not ingestion time
  • 16. LightIngest: pay attention with Users!
      • IMPORTANT: all the data is indexed, but how is it partitioned? By ingestion time!
      • The -creationTimePattern argument allows users to partition the data by creation time, not ingestion time
      • [Ingest CSV data with headers from local files]
        LightIngest "https://adxclu001.kusto.windows.net;Federated=true" -database:MyDb -table:MyTable -sourcePath:"D:\MyFolder\Data" -format:csv -ignoreFirstRecord:true -mappingPath:"D:\MyFolder\CsvMapping.txt" -pattern:*.csv.gz -limit:100
      • [Ingest JSON data from blobs]
        LightIngest "https://adxclu001.kusto.windows.net;Federated=true" -database:db001 -table:LAB -sourcePath:"https://ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME?SAS_TOKEN" -prefix:MyDir1/MySubDir2 -format:json -mappingRef:DefaultJsonMapping -pattern:*.json -limit:100
  • 17. One Click ingestion GA
      • One Click makes ingestion intuitive through its UX
      • Start ingesting data, creating tables, and mapping structures
      • Different data formats
      • One-time or continuous ingestion
      • FIRST: check your data, create and destroy tons of test tables
  • 18. Kafka Gold certified connector
      • From an Apache Kafka cluster (in the cloud or on-premises)
      • Use Kafka to ingest data into ADX at scale
      • GOLD (Partner supported < Microsoft)
      • What's the VISION behind it?
  • 19. What is FluentBIT
      • Collaboration with the CNCF FluentBIT project
      • Multi-platform log processor and forwarder to collect data/logs from different sources
      • Unify the data and send it to Block Blob storage
      • Ingest it into ADX using Event Grid
      • Can use Azurite as a storage endpoint for simulation: https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azurite?toc=/azure/storage/blobs/toc.json
  • 20. Ingestion: Formats & Use Cases
      • Ingest data using native formats: ApacheAvro, CSV (RFC4180), JSON, MultiJSON (jsonLine), ORC, Parquet, PSV, SCSV, TSV, TXT
      • Files/blobs can be compressed: ZIP, GZIP
      • Better to use declarative names: MyData.csv.zip, MyData.json.gz
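The "declarative names" advice means that the format and the compression can both be read off the file name. As a rough illustration, assuming names shaped like the two examples on the slide, a hypothetical helper `describe_file_name` could split them like this (it is not part of any ADX SDK):

```python
from typing import Optional, Tuple

# Compression suffixes named on the slide, mapped to their labels.
COMPRESSION_SUFFIXES = {"zip": "ZIP", "gz": "GZIP"}


def describe_file_name(file_name: str) -> Tuple[str, Optional[str]]:
    """Return (format, compression) inferred from a declarative file name."""
    parts = file_name.lower().split(".")
    compression = None
    if parts[-1] in COMPRESSION_SUFFIXES:
        # Drop the trailing compression suffix and remember its label.
        compression = COMPRESSION_SUFFIXES[parts.pop()]
    if len(parts) < 2:
        raise ValueError(f"no format extension in {file_name!r}")
    return parts[-1], compression
```

With such names, `MyData.csv.zip` resolves to CSV inside a ZIP archive and `MyData.json.gz` to JSON inside GZIP. A name such as `MyData.gz` carries no format information and is rejected.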
  • 21. Supported data formats
      • For all ingestion methods other than ingest from query, format the data so that Azure Data Explorer can parse it. The supported data formats are:
          • CSV, TSV, TSVE, PSV, SCSV, SOH
          • JSON (line-separated, multi-line), Avro
          • ZIP and GZIP
      • Schema mapping helps bind source data fields to destination table columns.
          • CSV mapping (optional) works with all ordinal-based formats. It can be passed as an ingest command parameter, or pre-created on the table and referenced from the ingest command parameter.
          • JSON mapping (mandatory) and Avro mapping (mandatory) can be passed as an ingest command parameter. They can also be pre-created on the table and referenced from the ingest command parameter.
  • 22. My best ingestion experience
      • Open points:
          • Why EventHub after IotHub?
          • Why the second EventHub?
  • 23. ADX - TOOLS
  • 24. How about the Tools?
      • 1. LOAD: LightIngest, Azure Data Factory
      • 2. QUERY: Kusto.Explorer, Web UI
      • 3. VISUALIZE: Notebooks, Power BI, Grafana, ADX Web UI
      • 4. ORCHESTRATE: Microsoft Flow, Microsoft Logic Apps
      • Users: BI people, IT people, ML people
  • 25. Azure Data Studio plugins: Manager, Cluster Manager, Notebooks
      1. Select New connection from the Connections pane.
      2. Fill in the Connection Details information.
      3. For Connection type, select Kusto.
      4. For Cluster, enter your Azure Data Explorer cluster. (When entering the cluster name, don't include the https:// prefix or a trailing /.)
      5. For Authentication Type, use the default: Azure Active Directory - Universal with MFA account.
      6. For Account, use your account information.
      7. For Database, use Default.
      8. For Server Group, use Default.
      9. For Name (optional), leave blank.
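The note about the cluster name (no https:// prefix, no trailing /) is easy to get wrong when the value is copied from the portal. Below is a small sketch of the clean-up, using a hypothetical helper `normalize_cluster_name` that is not part of Azure Data Studio:

```python
def normalize_cluster_name(cluster: str) -> str:
    """Strip the https:// prefix and any trailing slash from a cluster URL."""
    name = cluster.strip()
    if name.lower().startswith("https://"):
        name = name[len("https://"):]
    return name.rstrip("/")
```

A value that is already clean passes through unchanged, so the helper can be applied unconditionally before filling in the Cluster field.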
  • 26. Azure Data Studio plugins
      • Filter/view data
      • Build 3D charts
      • Take a snapshot as a JSON declarative file
  • 27. Notebooks + ADX = KQL Magic
      • KQL magic: https://github.com/microsoft/jupyter-Kqlmagic
      • Extends the capabilities of the Python kernel in Jupyter
      • Can run Kusto language queries natively
      • Combines Python and Kusto query language
  • 28. A critical perspective
  • 29. Which are the OSS alternatives that we should compare with? (From db-engines.com)
      • Azure Data Explorer: fully managed big data interactive analytics platform
      • Elasticsearch: a distributed, RESTful modern search and analytics engine
      • Splunk: real-time insights engine to boost productivity & security
      • InfluxDB: DBMS for storing time series, events and metrics
      • ADX can be a replacement for search and log analytics engines such as Elasticsearch, Splunk, InfluxDB.
  • 30. Comparison chart
      Name | Elasticsearch (Elastic) | InfluxDB (InfluxData Inc.) | Azure Data Explorer (Microsoft) | Splunk (Splunk Inc.)
      Description | A distributed, RESTful modern search and analytics engine based on Apache Lucene | DBMS for storing time series, events and metrics | Fully managed big data interactive analytics platform | Analytics Platform for Big Data
      Database models | Search engine, Document store | Time Series DBMS | Time Series DBMS, Search engine, Document store, Event Store, Relational DBMS | Search engine
      Initial release | 2010 | 2013 | 2019 | 2003
      License | Open Source | Open Source | commercial | commercial
      Cloud-based only | no | no | yes | no
      Implementation language | Java | Go | (not given) | (not given)
      Server operating systems | All OS with a Java VM | Linux, OS X | hosted | Linux, OS X, Solaris, Windows
      Data scheme | schema-free | schema-free | Fixed schema with schema-less datatypes (dynamic) | yes
      Typing | yes | Numeric data and Strings | yes | yes
      XML support | no | no | yes | yes
      Secondary indexes | yes | no | all fields are automatically indexed | yes
      SQL | SQL-like query language | SQL-like query language | Kusto Query Language (KQL), SQL subset | no
      APIs and other access methods | RESTful HTTP/JSON API, Java API | HTTP API, JSON over UDP | RESTful HTTP API, Microsoft SQL Server communication protocol (MS-TDS) | HTTP REST
      Supported programming languages | .Net, Java, JavaScript, Python, Ruby, PHP, Perl, Groovy, Community Contributed Clients | .Net, Java, JavaScript, Python, R, Ruby, PHP, Perl, Haskell, Clojure, Erlang, Go, Lisp, Rust, Scala | .Net, Java, JavaScript, Python, R, PowerShell | .Net, Java, JavaScript, Python, Ruby, PHP
      Server-side scripts | yes | no | Yes, possible languages: KQL, Python, R | yes
      Triggers | yes | no | yes | yes
      Partitioning methods | Sharding | Sharding | Sharding | Sharding
      Replication methods | yes | selectable replication factor | yes | Master-master replication
      MapReduce | ES-Hadoop Connector | no | no | yes
      Consistency concepts | Eventual Consistency | Eventual Consistency | Eventual Consistency | Immediate Consistency
      Foreign keys | no | no | no | no
      Transaction concepts | no | no | no | no
      Concurrency | yes | yes | yes | yes
      Durability | yes | yes | yes | yes
      In-memory capabilities | Memcached and Redis integration | yes | no | no
      User concepts | (not given) | simple rights management via user accounts | Azure Active Directory Authentication | Access rights for users and roles
  • 31. Update Policy
    Automatically appends data to a target table whenever new data is inserted into the source table, based on a transformation query that runs on the newly inserted data.
    USE IT IF:
    • The source table is «free-text column based»
    • The target table accepts only a specific structure
    Cascading updates are allowed (TableA → TableB → TableC → ...).
    Diagram: Raw table → Refined table
  • 32. How to use Update Policy
        // Create a function that will be used for update
        .create function MyUpdateFunction() {
            MyTableX
            | where ColumnA == 'some-string'
            | summarize MyCount=count() by ColumnB, Key=ColumnC
            | join (OtherTable | project OtherColumnZ, Key=OtherColumnC) on Key
            | project ColumnB, ColumnZ=OtherColumnZ, Key, MyCount
        }
        // Create the target table (if it doesn't already exist)
        .set-or-append DerivedTableX <| MyUpdateFunction() | limit 0
        // Use update policy on table DerivedTableX
        .alter table DerivedTableX policy update @'[{"IsEnabled": true, "Source": "MyTableX", "Query": "MyUpdateFunction()", "IsTransactional": false, "PropagateIngestionProperties": false}]'
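To make the mechanics concrete, here is a minimal Python sketch (not Kusto code) of what an update policy does conceptually: each batch appended to a source table is passed through a transformation, and the output is appended to the target table, with cascading allowed. The `Table` class and its `add_update_policy` method are hypothetical names invented for this illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class Table:
    """Toy append-only table that can trigger update policies (illustrative only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: List[Row] = []
        self._policies: List[Callable[[List[Row]], None]] = []

    def add_update_policy(self, target: "Table",
                          query: Callable[[List[Row]], List[Row]]) -> None:
        # The query only sees the newly ingested batch, not the whole table.
        self._policies.append(lambda batch: target.ingest(query(batch)))

    def ingest(self, batch: List[Row]) -> None:
        self.rows.extend(batch)
        for policy in self._policies:
            policy(batch)


raw = Table("Raw")
refined = Table("Refined")
kpi = Table("Kpi")

# Raw -> Refined: parse free-text lines such as "d1=21.5" into typed columns.
raw.add_update_policy(
    refined,
    lambda batch: [
        {"device": str(r["line"]).split("=")[0],
         "temp": float(str(r["line"]).split("=")[1])}
        for r in batch
    ],
)
# Refined -> Kpi: a cascading policy that keeps only hot readings.
refined.add_update_policy(kpi, lambda batch: [r for r in batch if r["temp"] > 30])

raw.ingest([{"line": "d1=21.5"}, {"line": "d2=35.0"}])
```

After the single `raw.ingest(...)` call, `refined` holds two typed rows and `kpi` holds only the reading above 30, which mirrors the TableA → TableB → TableC cascade.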
  • 33. Pay attention to failures!
    Evaluate resource usage:
        .show table MySourceTable extents;
        // The following line provides the extent ID for the not-yet-merged extent in the source table which has the most records
        let extentId = $command_results | where MaxCreatedOn > ago(1hr) and MinCreatedOn == MaxCreatedOn | top 1 by RowCount desc | project ExtentId;
        let MySourceTable = MySourceTable | where extent_id() == toscalar(extentId);
        MyFunction()
    Failures:
        .show ingestion failures | where FailedOn > ago(1hr) and OriginatesFromUpdatePolicy == true
    • Non-transactional policy: the failure is ignored
    • Transactional policy: if the ingestion method is pull => automated retry of the entire ingestion operation (up to a maximum time)
    SO: you should check failures to detect «BROKEN FILES» … but HOW?
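The difference between the two failure modes can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the actual Kusto behavior in every detail: the `ingest` helper and the `parse` transformation are hypothetical, and retries are left out.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def ingest(source: List[Row], target: List[Row], batch: List[Row],
           policy: Callable[[List[Row]], List[Row]], transactional: bool) -> bool:
    """Append batch to source and the policy output to target.

    Returns True if the source ingestion was committed.
    """
    try:
        derived = policy(batch)
    except Exception:
        if transactional:
            # Transactional: the whole ingestion fails, nothing is committed.
            return False
        # Non-transactional: the policy failure is ignored, source keeps the data.
        derived = []
    source.extend(batch)
    target.extend(derived)
    return True


def parse(batch: List[Row]) -> List[Row]:
    # Raises ValueError on rows whose "raw" field is not an integer.
    return [{"value": int(str(r["raw"]))} for r in batch]


bad_batch = [{"raw": "12"}, {"raw": "oops"}]

src_a: List[Row] = []
dst_a: List[Row] = []
ok_a = ingest(src_a, dst_a, bad_batch, parse, transactional=False)

src_b: List[Row] = []
dst_b: List[Row] = []
ok_b = ingest(src_b, dst_b, bad_batch, parse, transactional=True)
```

In the non-transactional case the raw rows land but the derived table silently misses them, which is why querying the ingestion failures is worth automating.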
  • 34. Use this pattern
    The first table is NEVER wide!! … but YES for the second!
    • First table schema is K, V, TS, Metadata (telemetry oriented)
    • Second table schema is WT (Wide Table) (ML oriented)
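As a rough illustration of the narrow-to-wide step, the Python sketch below pivots key/value rows into one wide row per timestamp and device. The column names (`key`, `value`, `ts`, `device`) are assumptions chosen for the example; in ADX this transformation would typically live in the update policy query.

```python
from collections import defaultdict
from typing import Dict, List


def to_wide(narrow_rows: List[dict]) -> List[dict]:
    """Pivot (K, V, TS, Metadata) rows into one wide row per (ts, device)."""
    grouped: Dict[tuple, dict] = defaultdict(dict)
    for row in narrow_rows:
        grouped[(row["ts"], row["device"])][row["key"]] = row["value"]
    return [
        {"ts": ts, "device": device, **columns}
        for (ts, device), columns in sorted(grouped.items())
    ]


narrow = [
    {"key": "temp", "value": 21.5, "ts": "2021-06-01T10:00:00Z", "device": "d1"},
    {"key": "hum", "value": 40.0, "ts": "2021-06-01T10:00:00Z", "device": "d1"},
    {"key": "temp", "value": 19.0, "ts": "2021-06-01T10:00:00Z", "device": "d2"},
]
wide = to_wide(narrow)
```

The narrow table stays stable when new signals appear, while the wide table gains a column per signal, which suits feature-style consumers.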
  • 35. My personal approach
    Diagram layers: DATA; FUNCTION1, FUNCTION2, FUNCTION3 (with FUNCTION3.1, FUNCTION3.2, FUNCTION3.3); KPI DEFINITION (four boxes); DASHBOARD (use KPI to embed and filter them)
  • 36. ADX - Query
  • 37. Some code examples
    • Query with between
    • Function with parameters
    • «ToScalar» expression
    • «Extend» usage
  • 38. Kusto for SQL Users
    • Perform SQL SELECT (no DDL, only SELECT)
    • Use KQL (Kusto Query Language)
    • Supports translating T-SQL queries to Kusto Query Language
    T-SQL input:
        --
        explain
        select top(10) * from StormEvents order by DamageProperty desc
    KQL output:
        StormEvents
        | sort by DamageProperty desc nulls first
        | take 10
  • 39. Language examples
    Alias:
        alias database["wiki"] = cluster("https://somecluster.kusto.windows.net:443").database("somedatabase");
        database("wiki").PageViews | count
    Let:
        let start = ago(5h);
        let period = 2h;
        T | where Time > start and Time < start + period | ...
    Bin:
        T | summarize Hits=count() by bin(Duration, 1s)
    Batch:
        let m = materialize(StormEvents | summarize n=count() by State);
        m | where n > 2000;
        m | where n < 10
    Tabular expression:
        Logs
        | where Timestamp > ago(1d)
        | join ( Events | where continent == 'Europe' ) on RequestId
  • 40. Time Series Analysis – Bin Operator
    bin operator: rounds values down to an integer multiple of a given bin size. If you have a scattered set of values, they will be grouped into a smaller set of specific values.
    [Rule]
        bin(value, roundTo)
    [Example]
        T | summarize Hits=count() by bin(Duration, 1s)
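The rounding rule can be checked quickly outside Kusto. The following Python sketch mimics the behavior described above (round down to a multiple of the bin size, then count per bin); `bin_value` is a hypothetical helper name, and the durations are made-up sample data.

```python
import math
from collections import Counter


def bin_value(value: float, round_to: float) -> float:
    """Round value down to an integer multiple of round_to."""
    return math.floor(value / round_to) * round_to


# Durations in seconds, grouped into 1-second bins.
durations = [0.2, 0.9, 1.1, 1.7, 3.4]
hits = Counter(bin_value(d, 1.0) for d in durations)
```

Five scattered values collapse into three bins (0, 1 and 3 seconds), which is the grouping that `summarize ... by bin(Duration, 1s)` produces.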
  • 41. Time Series Analysis – Make Series Operator
    make-series operator
    [Rule]
        T | make-series [MakeSeriesParameters] [Column =] Aggregation [default = DefaultValue] [, ...] on AxisColumn from start to end step step [by [Column =] GroupExpression [, ...]]
    [Example]
        T | make-series sum(amount) default=0, avg(price) default=0 on timestamp from datetime(2016-01-01) to datetime(2016-01-10) step 1d by supplier
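What distinguishes make-series from a plain summarize is the regular time axis with default values for empty slots. The Python sketch below imitates that idea for a single `sum(amount)` aggregation; it is an approximation for illustration, and `make_series` plus the sample rows are assumptions, not part of the deck.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def make_series(rows: List[dict], start: date, end: date, step: timedelta,
                default: float = 0) -> Tuple[List[date], Dict[str, List[float]]]:
    """Sum `amount` per supplier on a regular time axis covering [start, end)."""
    axis = []
    t = start
    while t < end:
        axis.append(t)
        t += step

    sums: Dict[str, Dict[int, float]] = {}
    for r in rows:
        if start <= r["timestamp"] < end:
            slot = (r["timestamp"] - start) // step
            per_supplier = sums.setdefault(r["supplier"], {})
            per_supplier[slot] = per_supplier.get(slot, 0) + r["amount"]

    # Slots without any data get the default value instead of being omitted.
    series = {
        supplier: [slots.get(i, default) for i in range(len(axis))]
        for supplier, slots in sums.items()
    }
    return axis, series


rows = [
    {"timestamp": date(2016, 1, 1), "supplier": "A", "amount": 10},
    {"timestamp": date(2016, 1, 1), "supplier": "A", "amount": 5},
    {"timestamp": date(2016, 1, 3), "supplier": "A", "amount": 7},
    {"timestamp": date(2016, 1, 2), "supplier": "B", "amount": 4},
]
axis, series = make_series(rows, date(2016, 1, 1), date(2016, 1, 5), timedelta(days=1))
```

Every supplier ends up with an array of the same length as the axis, so the series can be compared or fed to series functions position by position.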
  • 42. Time Series Analysis – Basket Operator
    basket operator: finds all frequent patterns of discrete attributes (dimensions) in the data and returns all frequent patterns that passed the frequency threshold in the original query.
    [Rule]
        T | evaluate basket([Threshold, WeightColumn, MaxDimensions, CustomWildcard, CustomWildcard, ...])
    [Example]
        StormEvents
        | where monthofyear(StartTime) == 5
        | extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
        | project State, EventType, Damage, DamageCrops
        | evaluate basket(0.2)
  • 43. Time Series Analysis – Autocluster Operator
    autocluster operator: finds common patterns of discrete attributes (dimensions) in the data and reduces the results of the original query (whether it's 100 or 100k rows) to a small number of patterns.
    [Rule]
        T | evaluate autocluster([SizeWeight, WeightColumn, NumSeeds, CustomWildcard, CustomWildcard, ...])
    [Example]
        StormEvents
        | where monthofyear(StartTime) == 5
        | extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
        | project State , EventType , Damage
        | evaluate autocluster(0.6)
    [Example]
        StormEvents
        | where monthofyear(StartTime) == 5
        | extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
        | project State , EventType , Damage
        | evaluate autocluster(0.2, '~', '~', '*')
  • 44. ADX Functions
    Functions are reusable queries or query parts. Kusto supports several kinds of functions:
    • Stored functions, which are user-defined functions that are stored and managed as one kind of a database's schema entities. See Stored functions.
    • Query-defined functions, which are user-defined functions that are defined and used within the scope of a single query. Such functions are defined through a let statement. See User-defined functions.
    • Built-in functions, which are hard-coded (defined by Kusto and cannot be modified by users).
  • 45. Materialized views
    The view exposes an always up-to-date result of the defined aggregation.
    Advantages:
    • Performance improvement
    • Freshness
    • Cost reduction
    Behind the scenes:
    • The source table is periodically materialized into the view table
    • At query time, the view combines the materialized part with the DELTA in the raw table since the last materialization to return complete results
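The "materialized part plus delta" idea can be modeled in a few lines. The Python sketch below keeps a count-by-key aggregate and combines it at query time with rows that arrived after the last materialization; the class name and the row-offset watermark are assumptions made for this toy model, not how ADX tracks its state internally.

```python
from collections import Counter
from typing import List


class MaterializedCount:
    """Toy count-by-key materialized view over an append-only source."""

    def __init__(self) -> None:
        self.source: List[str] = []       # raw table (one key per row)
        self.materialized: Counter = Counter()
        self.watermark = 0                # rows already folded into the view

    def ingest(self, keys: List[str]) -> None:
        self.source.extend(keys)

    def materialize(self) -> None:
        # Background step: fold the not-yet-materialized rows into the view.
        self.materialized.update(self.source[self.watermark:])
        self.watermark = len(self.source)

    def query(self) -> Counter:
        # Query time: materialized part + delta since the last materialization.
        delta = Counter(self.source[self.watermark:])
        return self.materialized + delta


view = MaterializedCount()
view.ingest(["a", "b", "a"])
view.materialize()
view.ingest(["b", "c"])          # arrives after the last materialization
result = view.query()
```

The query result is complete even though the last two rows were never materialized, at the cost of aggregating only the small delta on the fly.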
  • 46. QUERY AND PERFORMANCE OPTIMIZATION
    • Materialized views
    • Partitioning
    • Query result caching
    • Near real time scoring of AML and ONNX models
    • FFT functions
    • Geospatial
  • 47. Query result caching
    • Better query performance
    • Lower resource consumption
    • The queries need to be identical
    • The cache policy is defined by MAX AGE
    • Common use case: DASHBOARD
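The two conditions above (identical query text, maximum age) can be illustrated with a small Python cache. This is a conceptual sketch only; `QueryResultCache`, the fake backend and the injectable clock are hypothetical names introduced for the example.

```python
import time
from typing import Callable, Dict, List, Tuple


class QueryResultCache:
    """Cache results by exact query text, honoring a caller-supplied max age."""

    def __init__(self, run_query: Callable[[str], object],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._run_query = run_query
        self._clock = clock
        self._entries: Dict[str, Tuple[float, object]] = {}

    def execute(self, query_text: str, max_age_seconds: float) -> object:
        now = self._clock()
        entry = self._entries.get(query_text)
        if entry is not None and now - entry[0] <= max_age_seconds:
            return entry[1]                      # fresh enough: serve from cache
        result = self._run_query(query_text)     # miss or stale: run the query
        self._entries[query_text] = (now, result)
        return result


calls: List[str] = []
fake_now = [0.0]


def backend(query_text: str) -> int:
    calls.append(query_text)
    return len(calls)


cache = QueryResultCache(backend, clock=lambda: fake_now[0])
first = cache.execute("T | count", max_age_seconds=60)
second = cache.execute("T | count", max_age_seconds=60)    # identical text: hit
third = cache.execute("T  | count", max_age_seconds=60)    # extra space: miss
fake_now[0] = 120.0
fourth = cache.execute("T | count", max_age_seconds=60)    # older than max age
```

Only three of the four calls reach the backend, which is why dashboards that refresh the same query repeatedly benefit the most.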
  • 48. Geospatial joins
    • Use cases
      • Connected mobility solutions
      • Geospatial risk analysis
      • Agriculture optimization using weather data
    • Technical background
      • Join of polygon reference data and geospatial time series data
      • Based on three-dimensional S2 geometry
      • Consists of a coarse-grained join using S2 cell coverage and exact validation using the geo_point_in_polygon function
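The two-phase join can be sketched without S2. The Python example below deliberately swaps S2 cells for a plain square grid over planar coordinates and uses a ray-casting test in place of geo_point_in_polygon; all function names and the sample polygons are assumptions for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Exact validation: ray-casting test against a list of vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def covering_cells(polygon: List[Point], cell: float) -> Set[Tuple[int, int]]:
    """Coarse coverage: every grid cell touched by the polygon's bounding box."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return {
        (cx, cy)
        for cx in range(math.floor(min(xs) / cell), math.floor(max(xs) / cell) + 1)
        for cy in range(math.floor(min(ys) / cell), math.floor(max(ys) / cell) + 1)
    }


def geo_join(points: List[Tuple[str, float, float]],
             polygons: Dict[str, List[Point]], cell: float = 1.0):
    index = defaultdict(list)
    for name, polygon in polygons.items():
        for c in covering_cells(polygon, cell):
            index[c].append(name)
    matches = []
    for point_id, x, y in points:
        key = (math.floor(x / cell), math.floor(y / cell))
        for name in index.get(key, []):          # coarse-grained candidates
            if point_in_polygon(x, y, polygons[name]):
                matches.append((point_id, name))
    return matches


polygons = {
    "field_a": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "field_b": [(3, 0), (5, 0), (3, 2)],
}
points = [("p1", 1.0, 1.0), ("p2", 4.5, 1.5), ("p3", 9.0, 9.0)]
matches = geo_join(points, polygons)
```

Point `p2` survives the coarse phase because it shares a cell with `field_b`, and is only rejected by the exact test; `p3` is never compared against any polygon at all.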
  • 49. ADX - VIEW
  • 50. ADX Dashboards
    • Integration in KUSTO Web Explorer
    • Optimized for big data
    • Using powerful KQL to retrieve visual data
    • Make dynamic views or widgets
  • 51. Grafana query builder
    • Create Grafana panels with no KQL knowledge
    • Select values/filter/grouping using simple UI dropdowns
    • Switch to RawMode to enhance queries with KQL
  • 52. How to use Grafana easily
    Go to https://grafana.com/, sign up and get an account
  • 53. How to use Grafana easily
    Go to the All Plugins section, search for the ADX Datasource and install the plugin
  • 54. How to use Grafana easily
    Go to your Grafana instance at https://<workbenchname>.grafana.net/datasources and configure the ADX datasource.
    Then start building dashboards!
  • 55. ADX - Orchestration
  • 56. How about orchestration?
    Three use cases in which FLOW + KUSTO are the solution:
    • Push data to a Power BI dataset: periodically run queries and push the results to a Power BI dataset
    • Conditional queries: make data checks and send notifications with no code
    • Email multiple ADX Flow charts: send incredible emails with an HTML5 chart as the query result
  • 57. Orchestration?
    • Manage costs: start and stop the cluster by evaluating a condition
    • Query sets to check data: plan a set of queries in order to say «IT’S OK, even today!»
    • Manage data retention: based on a dynamic condition
  • 58. An example of:
    1. Set trigger
    2. Connect and test ADX BLOCK
    3. Configure Email BLOCK with dynamic params
  • 59. And the result is:
  • 60. ADX - INTEGRATION
  • 61. Export
    • To Storage
        .export async compressed to csv (
            h@"https://storage1.blob.core.windows.net/containerName;secretKey",
            h@"https://storage1.blob.core.windows.net/containerName2;secretKey"
        ) with (
            sizeLimit=100000,
            namePrefix=export,
            includeHeaders=all,
            encoding=UTF8NoBOM
        )
        <| myLogs | where id == "moshe" | limit 10000
    • To SQL
        .export async to sql ['dbo.MySqlTable']
            h@"Server=tcp:myserver.database.windows.net,1433;Database=MyDatabase;Authentication=Active Directory Integrated;Connection Timeout=30;"
            with (createifnotexists="true", primarykey="Id")
        <| print Message = "Hello World!", Timestamp = now(), Id=12345678
    1. DEFINE COMMAND: define the ADX command and try out your recurrent export strategy
    2. TRY IN EDITOR: use an editor to try the command, verifying the connection strings and parameterizing them
    3. BUILD A JOB: build a notebook or a C# job that runs the command from your code, as you would a SQL query
  • 62. External tables & Continuous Export
    • It’s an external endpoint:
      • Azure Storage
      • Azure Datalake Store
      • SQL Server
    • You need to define:
      • Destination
      • Continuous-export strategy
    EXT TABLE CREATION
        .create external table ExternalAdlsGen2 (Timestamp:datetime, x:long, s:string)
        kind=adl
        partition by bin(Timestamp, 1d)
        dataformat=csv
        (
            h@'abfss://filesystem@storageaccount.dfs.core.windows.net/path;secretKey'
        )
        with (
            docstring = "Docs",
            folder = "ExternalTables",
            namePrefix="Prefix"
        )
    EXPORT to EXT TABLE
        .create-or-alter continuous-export MyExport
        over (T)
        to table ExternalAdlsGen2
        with (intervalBetweenRuns=1h, forcedLatency=10m, sizeLimit=104857600)
        <| T
  • 63. My best experience
    Open points:
    • How to extract insights using a dynamic and codeless approach?
    • How to integrate ADX with low-cost DB solutions?
  • 64. My final ADX recipe
    Diagram components: Blob Storage; Raw Tables; Refined Tables; Triggered dynamic check queries; Datalake (long term buckets); SQL DWH
    Diagram connector labels: Batch ingestion; Update policy; Materialized view; External table (twice)
  • 65. ADX - Management
  • 66. Data encryption in ADX
    • Encryption at rest (using Azure Storage)
    • A Microsoft-managed key is used by default
    • Customer-managed keys can be enabled
    • Key rotation, temporary disable and revoke access controls can be implemented
    • Soft Delete and Purge Protection will be enabled on the Key Vault and cannot be disabled
  • 67. Extents, policies and Partition
    • What are data shards or extents
    • Column, segments, and blocks
    • Merge policy and sharding policy
    • Data partitioning policy (post-ingestion)
  • 68. Retention policy
    FACTS:
    A) Kusto stores its ingested data in reliable storage (most commonly Azure Blob Storage).
    B) To speed up queries on that data, Kusto caches this data (or parts of it) on its processing nodes.
    The Kusto cache provides a granular cache policy that customers can use to differentiate between two data cache policies: hot data cache and cold data cache.
    YOU CAN SPECIFY WHICH LOCATION MUST BE USED:
        set query_datascope="hotcache";
        T | union U | join (T datascope=all | where Timestamp < ago(365d)) on X
    Cache policy is independent from retention policy!
  • 69. Retention policy
    2 parameters, applicable to DB or Table:
    • Soft Delete Period (number)
      • Data is available for query; the reference timestamp is the ADX ingestion date
      • Default is set to 100 YEARS
    • Recoverability (enabled/disabled)
      • Default is set to ENABLED
      • Recoverable for 14 days after deletion
    Commands:
        .alter database DatabaseName policy retention "{}"
        .alter table TableName policy retention "{}"
        .delete database DatabaseName policy retention
        .delete table TableName policy retention
        .alter-merge table MyTable1 policy retention softdelete = 7d
    EXAMPLE:
        { "SoftDeletePeriod": "36500.00:00:00", "Recoverability":"Enabled" }
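The `SoftDeletePeriod` value uses a `d.hh:mm:ss` timespan notation, so `36500.00:00:00` is 36,500 days (roughly the 100-year default). The Python sketch below parses that notation and applies it to an ingestion timestamp; `parse_timespan` and `is_queryable` are hypothetical helpers, and the check is a simplification (it ignores how and when ADX actually removes expired extents).

```python
import json
from datetime import datetime, timedelta


def parse_timespan(text: str) -> timedelta:
    """Parse 'd.hh:mm:ss' or 'hh:mm:ss' into a timedelta."""
    days = 0
    if "." in text.split(":")[0]:
        day_part, text = text.split(".", 1)
        days = int(day_part)
    hours, minutes, seconds = text.split(":")
    return timedelta(days=days, hours=int(hours),
                     minutes=int(minutes), seconds=float(seconds))


def is_queryable(ingested_at: datetime, now: datetime, policy_json: str) -> bool:
    """True while the record is still inside the soft-delete period."""
    policy = json.loads(policy_json)
    return now - ingested_at <= parse_timespan(policy["SoftDeletePeriod"])


policy = '{"SoftDeletePeriod": "7.00:00:00", "Recoverability": "Enabled"}'
now = datetime(2021, 6, 10)
fresh = is_queryable(datetime(2021, 6, 5), now, policy)   # ingested 5 days ago
stale = is_queryable(datetime(2021, 6, 1), now, policy)   # ingested 9 days ago
```

Because the period is measured from the ingestion time, backfilled historical data is retained according to when it was loaded, not when the events originally happened.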
  • 70. Data Purge
    The purge process is final and irreversible.
    PURGE PROCESS:
    1. It requires database admin permissions
    2. Before purging, the feature has to be ENABLED by opening a SUPPORT TICKET
    3. Run the purge QUERY to get the SIZE, the EXEC. TIME and the VerificationToken
    4. Run the ACTUAL purge QUERY, passing the VerificationToken
    2 STEP PROCESS
        .purge table MyTable records in database MyDatabase <| where CustomerId in ('X', 'Y')
      Result:
        NumRecordsToPurge: 1,596
        EstimatedPurgeExecutionTime: 00:00:02
        VerificationToken: e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b
        .purge table MyTable records in database MyDatabase with (verificationtoken='e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b') <| where CustomerId in ('X', 'Y')
    1 STEP PROCESS (with no regrets!)
        .purge table MyTable records in database MyDatabase with (noregrets='true')
  • 71. Virtual Network
    BENEFITS
    • Use NSG rules to limit traffic.
    • Connect your on-premises network to the Azure Data Explorer cluster's subnet.
    • Secure your data connection sources (Event Hub and Event Grid) with service endpoints.
    VNET gives you TWO independent IPs
    • Private IP: access the cluster inside the VNet.
    • Public IP: access the cluster from outside the VNet (management and monitoring), and used as the source address for outbound connections initiated from the cluster.
  • 72. My experience
  • 73. Enterprise readiness
    • RLS (Row Level Security)
      • Provides fine control of access to table data by different users
      • Allows specifying user access to specific rows in tables
      • Provides a mechanism to mask PII data in tables
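Conceptually, row level security replaces direct table access with a filtered (and optionally masked) view that depends on who is asking. The Python sketch below shows that idea only; the `secured_view` function, the `tenant` and `email` columns and the admin flag are assumptions for the example, not the ADX policy syntax.

```python
from typing import Dict, List

Row = Dict[str, str]


def secured_view(rows: List[Row], user_tenant: str, is_admin: bool) -> List[Row]:
    """Return only the rows the caller may see, masking PII for non-admins."""
    if is_admin:
        return [dict(r) for r in rows]
    return [
        {**r, "email": "****"}              # mask the PII column
        for r in rows
        if r["tenant"] == user_tenant       # restrict to the caller's rows
    ]


telemetry = [
    {"tenant": "contoso", "device": "d1", "email": "ops@contoso.example"},
    {"tenant": "fabrikam", "device": "d7", "email": "it@fabrikam.example"},
]
contoso_view = secured_view(telemetry, "contoso", is_admin=False)
admin_view = secured_view(telemetry, "contoso", is_admin=True)
```

The same query therefore yields different results per caller, without maintaining separate tables per tenant.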
  • 74. Leader and Follower
    • Azure Data Share creates a symbolic link between two ADX clusters.
    • Sharing occurs in near-real-time (no data pipeline)
    • ADX decouples storage and compute
    • Allows customers to run multiple compute (read-only) instances on the same underlying storage
    • You can attach a database as a follower database, which is a read-only database on a remote cluster.
    • You can share the data at the database level or at the cluster level.
    The cluster sharing the database is the leader cluster and the cluster receiving the share is the follower cluster. A follower cluster can follow one or more leader cluster databases. The follower cluster periodically synchronizes to check for changes. The queries running on the follower cluster use local cache and don't use the resources of the leader cluster.
  • 75. Q & A
  • 76. Riccardo Zamana Riccardo.Zamana@gmail.com @riccardozamana www.linkedin.com/in/riccardozamana/ Thank you.

Editor's Notes

  1. CONTEXT SOURCES INFRASTRUCTURE DESTINATIONS
  2. Example of QUEUED INGESTION: https://docs.microsoft.com/en-us/azure/kusto/api/netfx/kusto-ingest-queued-ingest-sample
     Example of INLINE INGESTION: https://docs.microsoft.com/it-it/azure/kusto/management/data-ingestion/ingest-inline
  3. DEMO: LightIngest is a command-line utility for ad-hoc data ingestion into Azure Data Explorer (ADX). The utility can pull source data from a local folder or from an Azure blob storage container. LightIngest is most useful when you want to ingest a large amount of data, because there is no time constraint on ingestion duration. When historical data is loaded from an existing system into ADX, all records receive the same ingestion date. The -creationTimePattern argument allows users to partition the data by creation time instead of ingestion time. It extracts the CreationTime property from the file or blob path; the pattern doesn't need to reflect the entire item path, just the section enclosing the timestamp you want.
  6. Show the NOTEBOOKS
  7. TRY IT IN VS CODE: CTRL+P => kuskus. Then: cluster('adxclu001').database('db001').table('TBL_LAB01') | count
  8. https://notebooks.azure.com/riccardo-zamana/projects/azuresaturday2019
  9. The update policy instructs Kusto to automatically append data to a target table whenever new data is inserted into the source table, based on a transformation query that runs on the data inserted into the source table. The query can invoke stored functions, but can't include cross-database or cross-cluster queries. The update policy is initiated following ingestion: it takes effect when data is ingested or moved to (extents are created in) a defined source table. The update policy will behave like regular ingestion when the following conditions are met: the source table is a high-rate trace table with interesting data formatted as a free-text column; the target table on which the update policy is defined accepts only specific trace lines; the table has a well-structured schema that is a transformation of the original free-text data created by the parse operator.
  10. If you use HAS, indexes are used; contains does not use indexes. Explain how to do a between.
  11. RUN A FEW TESTS. Do a distinct to introduce SUMMARIZE.
  12. T | summarize Hits=count() by bin(Duration, 1s)
  13. .create-or-alter function with (folder = "AzureSaturday2019", docstring = "Func1", skipvalidation = "true")
      MyFunction1(i:long) {TBL_LAB0X | limit 100 | where minimum_nights > i}

      MyFunction1(80);

      explain SELECT name, minimum_nights from TBL_LAB0X

      .create-or-alter function with (folder = "AzureSaturday2019", docstring = "Func1", skipvalidation = "true")
      MyFunction1(i:long) {TBL_LAB0X | project name, minimum_nights | limit 100 | where minimum_nights > i | render columnchart}

      MyFunction1(80);
  14. SHOW AN EXPORT EXAMPLE
  15. Kusto is built to support tables with a huge number of records (rows) and large amounts of data. To handle such large tables, each table's data is divided into smaller "tablets" called data shards or extents (the two terms are synonymous). The union of all the table's extents holds the table's data. Individual extents are kept smaller than a single node's capacity, and the extents are spread over the cluster's nodes, achieving scale-out.
      An extent is like a mini-table. It contains data and metadata, such as its creation time and optional tags that are associated with its data. Additionally, the extent usually holds information that lets Kusto query the data efficiently: for example, an index for each column of data in the extent, and an encoding dictionary if column data is encoded.
      Extents are immutable and can never be modified. They may only be queried, reassigned to a different node, or dropped out of the table. Data modification happens by creating one or more new extents and transactionally swapping old extents with new ones.
      Extents hold a collection of records that are physically arranged in columns. This technique is called columnar store. It enables efficient encoding and compression of the data, because different values from the same column often "resemble" each other. It also makes querying large spans of data more efficient, because only the columns used by the query need to be loaded. Internally, each column of data in the extent is subdivided into segments, and the segments into blocks. This division isn't observable to queries, and lets Kusto optimize column compression and indexing.
      To maintain query efficiency, smaller extents are merged into larger extents. The merge is done automatically, as a background process, according to the configured merge policy and sharding policy.
Merging extents reduces the management overhead of having a large number of extents to track. More importantly, it allows Kusto to optimize its indexes and improve compression. Extent merging stops once an extent reaches certain limits, such as size, since beyond a certain point, merging reduces rather than increases efficiency. When a Data partitioning policy is defined on a table, extents go through another background process after they're created (post-ingestion). This process reingests the data from the source extents and creates homogeneous extents, in which the values of the column that is the table's partition key all belong to the same partition. If the policy includes a hash partition key, all homogeneous extents that belong to the same partition will be assigned to the same data node in the cluster.
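The "merge until a limit is reached" behavior can be pictured with a toy greedy pass in Python. This is only an illustration of the stopping rule described in the note: the real merge policy considers more properties than size, and `merge_extents` and the sample sizes are assumptions.

```python
from typing import List


def merge_extents(sizes: List[int], max_size: int) -> List[int]:
    """Greedily merge adjacent extents without letting a merged extent exceed max_size."""
    merged: List[int] = []
    current = 0
    for size in sizes:
        if current and current + size > max_size:
            merged.append(current)   # limit reached: close the current extent
            current = 0
        current += size
    if current:
        merged.append(current)
    return merged


# Six small extents (sizes in MB) and a 60 MB limit for merged extents.
result = merge_extents([10, 20, 30, 50, 5, 70], max_size=60)
```

Six extents become three: the first three merge up to the limit, the next two merge below it, and the last one is left alone because it is already larger than the limit.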