Data Saturday #2 Guatemala 2021
Time series Analytics
A deep dive into ADX Azure Data Explorer
Riccardo Zamana
Riccardo.Zamana@gmail.com
@riccardozamana
www.linkedin.com/in/riccardozamana/
• 20+ years of experience in IT
• 10+ years of experience in IoT
• 5+ years of experience in Azure projects
ADX - Basics
What is ADX for me, today
• A telemetry data search engine => ELK replacement
• A TSDB involved in LAMBDA replacements (as WARM path) => OSS LAMBDA (MinIO + Kafka) replacement
• A tool to materialize data into ADLS & SQL
• A tool for monitoring, summarizing information and sending notifications
ADX Architecture
1. CONTEXT
2. SOURCES
3. INFRASTRUCTURE
4. DESTINATIONS
ADX - Quickstart
Create a Cluster
ADX follows the standard Azure resource creation process:
• Azure CLI
• PowerShell
• C#
• Python
• ARM
My favorite is Azure CLI:
Login
az login
Select Subscription
az account set --subscription MyAzureSub
Cluster creation
az kusto cluster create --name azureclitest --sku Standard_D11_v2 --resource-group testrg
Database Creation
az kusto database create --cluster-name azureclitest --name clidatabase --resource-group testrg --soft-delete-period P365D --hot-cache-period P31D
HOT-CACHE-PERIOD: Amount of time that data should be kept in cache. Duration in ISO8601 format (for example, 100 days would be P100D).
SOFT-DELETE-PERIOD: Amount of time that data should be kept so it is available to query. Duration in ISO8601 format (for example, 100 days would be P100D).
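The two period parameters use the day-only ISO 8601 duration form (PnD). A minimal Python sketch of producing and reading such values follows; the helper names are hypothetical and only the PnD form shown on the slide is handled.

```python
import re
from datetime import timedelta


def days_to_iso8601(days: int) -> str:
    """Format a whole number of days as an ISO 8601 duration, e.g. 100 -> 'P100D'."""
    if days < 0:
        raise ValueError("days must be non-negative")
    return f"P{days}D"


def iso8601_days_to_timedelta(value: str) -> timedelta:
    """Parse the day-only form (PnD); any other ISO 8601 form is rejected here."""
    match = re.fullmatch(r"P(\d+)D", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return timedelta(days=int(match.group(1)))


# Values taken from the database creation command above.
soft_delete = iso8601_days_to_timedelta("P365D")
hot_cache = iso8601_days_to_timedelta("P31D")
print(days_to_iso8601(100), soft_delete.days, hot_cache.days)
```

In this example the hot cache window (31 days) is shorter than the queryable window (365 days), so only the most recent month of data would be kept in cache.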
How to set up and use ADX?
1. Create a database
2. Use the database to link ingestion sources
3. Configure data connections: Blob Storage | IoT Hub | Event Hub (+ Event Grid)
4. Pay attention to:
   1. Have a clear understanding of PRINCIPALS & AAD app registrations.
   2. BE A FRIEND of «JSON LINER» TOOLS
   3. BE A FRIEND of KUSTO & PYTHON
How ADX is Organized
[Diagram: INSTANCE, DATABASE, SOURCES]
Components: DB Users/Apps, Ingestion URL, Querying URL, Cache storage, Blob storage
EXTERNAL SOURCES: IoT Hub, Event Hub, Storage
EXTERNAL DESTINATIONS: ADLS, SQL Server, many more
ADX - Ingest
FIRST PHASE: Ingestion
• Many connections & Plugins
• Many SDKs
• Many managed pipelines
• Many tools to Ingest Rapidly
Managed pipelines:
• Ingest blob using EventGrid
• Ingest Eventhub stream
• Ingest IotHub stream
• Ingest data from ADF
Connections & Plugins:
• Logstash plugin
• Kafka Connector
• Apache Spark connector
Many SDKs:
• Python SDK
• .NET SDK
• Java SDK
• Node SDK
• REST API
• GO API
Tools:
• One click ingestion
• LightIngest
Ingestion Types:
• Streaming ingestion: optimized for a low volume of data per table, over thousands of tables
  • Operation completes in under 10 seconds
  • Data available for query after completion
• Batching ingestion: optimized for high ingestion throughput
  • Default batch params: 5 minutes, 500 items, or 1000MB
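The batching defaults can be read as "seal the batch as soon as any of the three limits is reached". The sketch below simulates that rule in plain Python; the Batcher class is hypothetical, it is not the ADX implementation, and 1000MB is assumed to mean 1000 * 1024 * 1024 bytes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Batcher:
    """Toy batcher: seals when time, item count, or total size hits its limit."""
    max_seconds: float = 5 * 60              # 5 minutes
    max_items: int = 500                     # 500 items
    max_bytes: int = 1000 * 1024 * 1024      # 1000 MB (assumed binary megabytes)
    sizes: List[int] = field(default_factory=list)
    opened_at: Optional[float] = None        # arrival time of the first item

    def _seal(self) -> List[int]:
        batch, self.sizes, self.opened_at = self.sizes, [], None
        return batch

    def add(self, size_bytes: int, now: float) -> Optional[List[int]]:
        """Add one item; return the sealed batch if a limit was reached."""
        if self.opened_at is None:
            self.opened_at = now
        self.sizes.append(size_bytes)
        return self.poll(now)

    def poll(self, now: float) -> Optional[List[int]]:
        """Check the limits without adding data (covers the time-based limit)."""
        if self.opened_at is None:
            return None
        if (len(self.sizes) >= self.max_items
                or sum(self.sizes) >= self.max_bytes
                or now - self.opened_at >= self.max_seconds):
            return self._seal()
        return None


batcher = Batcher()
batcher.add(10, now=100.0)        # a single small item does not seal the batch
print(batcher.poll(now=400.0))    # 300 seconds later the time limit seals it
```

With low traffic the time limit dominates, which is why batched data can take minutes to become queryable while streaming ingestion completes in seconds.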
What is LightIngest
• Command-line utility for ad-hoc data ingestion into Kusto
• Pulls source data from a local folder
• Pulls source data from an Azure Blob Storage container
• Useful to ingest quickly and experiment with ADX
• Most useful when you want to ingest a large amount of data, because there is no time constraint on ingestion duration
[Ingest JSON data from blobs]
LightIngest "https://adxclu001.kusto.windows.net;Federated=true"
-database:db001
-table:LAB
-sourcePath:"https://ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME?SAS_TOKEN"
-prefix:MyDir1/MySubDir2
-format:json
-mappingRef:DefaultJsonMapping
-pattern:*.json
-limit:100
[Ingest CSV data with headers from local files]
LightIngest "https://adxclu001.kusto.windows.net;Federated=true"
-database:MyDb
-table:MyTable
-sourcePath:"D:\MyFolder\Data"
-format:csv
-ignoreFirstRecord:true
-mappingPath:"D:\MyFolder\CsvMapping.txt"
-pattern:*.csv.gz
-limit:100
REFERENCE:
https://docs.microsoft.com/en-us/azure/kusto/tools/lightingest
LightIngest: pay attention to users!
Queued ingestion
Direct ingestion
IMPORTANT:
All the data is indexed, but how is it partitioned? By ingestion time!
The -creationTimePattern argument lets users partition the data by creation time instead of ingestion time.
One Click ingestion GA
• One Click makes ingestion simple through an intuitive UX
• Start ingesting data, creating tables and mapping structures
• Different data formats
• One-time or continuous ingestion
* SOON
• Create Table
• Support all data formats
• Ingest from eventhub
Ingestion: Format & UseCases
• Ingest data using native formats: ApacheAvro, CSV (RFC4180),
JSON, MultiJSON (jsonLine), ORC, Parquet, PSV, SCSV, TSV, TXT
• Files/Blobs can be compressed: ZIP, GZIP
• Better to use declarative names: MyData.csv.zip, MyData.json.gz
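Declarative names make the format and the compression readable from the file name alone. A small Python sketch of that convention follows; it assumes the extension equals the lower-cased format name, and the describe_blob helper is hypothetical.

```python
# Formats and compressions taken from the slide; the extension spelling is an assumption.
FORMATS = {"avro", "csv", "json", "multijson", "orc", "parquet", "psv", "scsv", "tsv", "txt"}
COMPRESSIONS = {"zip": "ZIP", "gz": "GZIP"}


def describe_blob(name: str):
    """Return (data_format, compression) inferred from a name like 'MyData.csv.zip'."""
    parts = name.lower().split(".")
    compression = None
    # The last extension, when it is a known archive type, is the compression.
    if len(parts) > 1 and parts[-1] in COMPRESSIONS:
        compression = COMPRESSIONS[parts.pop()]
    # The remaining last extension, when known, is the data format.
    data_format = parts[-1] if len(parts) > 1 and parts[-1] in FORMATS else None
    return data_format, compression


for blob in ("MyData.csv.zip", "MyData.json.gz", "MyData.parquet", "archive.zip"):
    print(blob, describe_blob(blob))
```

A name such as archive.zip carries no format hint, which is exactly the case the naming advice is meant to avoid.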
Kafka Gold certified connector
• From an Apache Kafka cluster (in the cloud or on-premises)
• Kafka to ingest data into ADX at scale
• GOLD (Partner supported < Microsoft)
What’s the VISION behind it?
What is FluentBIT
• Collaboration with CNCF FluentBIT project
• Multi platform Log Processor and Forwarder to collect
data/logs from different sources
• Unify and send to Block Blob
• Ingest them into ADX using EventGrid
• Can use AZURITE as a storageEndpoint for Simulation
https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azurite?toc=/azure/storage/blobs/toc.json
Ingestion Techniques
Batch ingestion (provided by SDK)
For high-volume, reliable, and cheap data ingestion.
The client uploads the data to Azure Blob storage (designated by the Azure Data Explorer data management service) and posts a notification to an Azure Queue. Batch ingestion is the recommended technique.
Inline ingestion (provided by query tools)
Most appropriate for exploration and prototyping.
• Inline ingestion: control command (.ingest inline) containing in-band data is intended for ad hoc testing purposes.
• Ingest from query: control command (.set, .set-or-append, .set-or-replace) that points to query results is used for generating reports or small temporary tables.
• Ingest from storage: control command (.ingest into) with data stored externally (for example, Azure Blob Storage) allows efficient bulk ingestion of data.
Supported data formats
For all ingestion methods other than ingest from query, format the data so that Azure Data Explorer can parse it. The
supported data formats are:
• CSV, TSV, TSVE, PSV, SCSV, SOH
• JSON (line-separated, multi-line), Avro
• ZIP and GZIP
Schema mapping helps bind source data fields to destination table columns.
• CSV Mapping (optional) works with all ordinal-based formats. It can be performed using the ingest
command parameter or pre-created on the table and referenced from the ingest command
parameter.
• JSON Mapping (mandatory) and Avro mapping (mandatory) can be performed using the ingest
command parameter. They can also be pre-created on the table and referenced from the ingest
command parameter.
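What a schema mapping does can be illustrated outside ADX: each destination column is bound to a path in the source document. The Python sketch below is a simplified stand-in, not the ADX mapping format; the column names, paths and helper functions are hypothetical, and only plain '$.a.b' paths are supported.

```python
import json

# Hypothetical mapping: destination column -> path of the field in the source JSON.
MAPPING = [
    {"column": "Timestamp", "path": "$.ts"},
    {"column": "DeviceId", "path": "$.device.id"},
    {"column": "Temperature", "path": "$.readings.temp"},
]


def resolve(document: dict, path: str):
    """Resolve a simple '$.a.b' path; returns None when a field is missing."""
    current = document
    for key in path.lstrip("$").strip(".").split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def apply_mapping(line: str, mapping=MAPPING) -> dict:
    """Turn one JSON line into a row keyed by destination column name."""
    document = json.loads(line)
    return {m["column"]: resolve(document, m["path"]) for m in mapping}


row = apply_mapping(
    '{"ts": "2021-02-20T10:00:00Z", "device": {"id": "d-01"}, "readings": {"temp": 21.5}}'
)
print(row)
```

Ordinal formats such as CSV can fall back on column position, which is why their mapping is optional, while JSON and Avro fields have no inherent order and need the explicit binding.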
Use ADX as ODBC Datasource
1. Download the ODBC Driver 17 for SQL Server:
https://www.microsoft.com/en-us/download/details.aspx?id=56567
2. Configure the ODBC source (as a normal SQL Server ODBC DSN)
Then you can use your preferred tool: Power BI Desktop, Qlik Sense Desktop, Sisense, etc.
Notebooks + ADX = KQL Magic
KQL magic:
https://github.com/microsoft/jupyter-Kqlmagic
• extends the capabilities of the Python kernel in Jupyter
• can run Kusto language queries natively
• combine Python and Kusto query language
ADX - Query
Some code Examples
Query with between
Function with parameters «ToScalar» expression
«Extend» usage
QUERY AND PERFORMANCE OPTIMIZATION
• Materialized views
• Partitioning
• Query result caching
• Near real time scoring of AML and ONNX models
• FFT functions
• Geospatial
Materialized views
The view exposes an always up-to-date view of the defined aggregation.
Advantages:
• Performance improvement
• Freshness
• Cost reduction
Behind the scenes:
• Source table is periodically materialized into the view table
• During the query time, the view combines the materialized part with the DELTA in raw table since last
materialization to return complete results
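The "materialized part + DELTA" behaviour can be sketched in a few lines of Python. This is an illustration of the idea for a count() by key aggregation, not the ADX engine; the class and method names are hypothetical.

```python
from collections import Counter


class MaterializedCount:
    """count() by key over an append-only source, kept as materialized part + delta."""

    def __init__(self):
        self.source = []               # raw records (keys), append-only
        self.materialized = Counter()  # aggregation computed so far
        self.watermark = 0             # number of source records already materialized

    def ingest(self, key):
        self.source.append(key)

    def materialize(self):
        """Periodic background step: fold the delta into the materialized part."""
        self.materialized.update(self.source[self.watermark:])
        self.watermark = len(self.source)

    def query(self):
        """Query time: combine the materialized part with the remaining delta."""
        result = Counter(self.materialized)
        result.update(self.source[self.watermark:])
        return dict(result)


view = MaterializedCount()
for key in ("a", "b", "a"):
    view.ingest(key)
view.materialize()
view.ingest("a")
view.ingest("c")
print(view.query())  # always complete, even though two records are not materialized yet
```

The query only has to aggregate the small delta, which is where the performance and cost benefits come from while freshness is preserved.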
Query result caching
• Better query performance
• Lower resource consumption
• The queries need to be identical
• The cache policy is defined by MAX AGE
• Common use case: dashboards
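The two rules above (identical query text, maximum age) can be sketched as a tiny cache in Python. This is a conceptual illustration only; the ResultCache class and its run_query callback are hypothetical and do not call ADX.

```python
class ResultCache:
    """Serve a stored result only for identical query text that is not older than max_age."""

    def __init__(self, run_query):
        self.run_query = run_query   # callable that actually executes the query
        self.entries = {}            # query text -> (stored_at, result)

    def execute(self, query: str, max_age: float, now: float):
        """Return (result, served_from_cache)."""
        hit = self.entries.get(query)            # exact text match only
        if hit is not None and now - hit[0] <= max_age:
            return hit[1], True
        result = self.run_query(query)
        self.entries[query] = (now, result)
        return result, False


executed = []
cache = ResultCache(lambda q: executed.append(q) or len(executed))
print(cache.execute("T | count", max_age=60, now=0))    # miss: the query runs
print(cache.execute("T | count", max_age=60, now=30))   # hit: same text, fresh enough
print(cache.execute("T  | count", max_age=60, now=31))  # miss: the text differs
```

A dashboard that re-sends the same query text on every refresh is the natural fit, since every viewer after the first is served from the cache until MAX AGE expires.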
Geospatial joins
• Use cases
• Connected mobility solutions
• Geospatial risk analysis
• Agriculture optimization using weather data
• Technical background
• Join of polygons reference data and geospatial timeseries data
• Based on three-dimensional S2 geometry
• Consists of a coarse-grained join using S2 cell coverage and exact
validation using geo_point_in_polygon function
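The coarse-then-exact join can be sketched in Python. Note the substitutions: a flat latitude/longitude grid stands in for S2 cells, and a planar ray-casting test stands in for geo_point_in_polygon; all names and the cell size are hypothetical.

```python
import math
from collections import defaultdict

CELL = 1.0  # grid cell size in degrees (assumption; S2 uses hierarchical spherical cells)


def cell_of(lon, lat):
    return (math.floor(lon / CELL), math.floor(lat / CELL))


def covering(polygon):
    """Grid cells touched by the polygon's bounding box (a coarse over-approximation)."""
    lons = [p[0] for p in polygon]
    lats = [p[1] for p in polygon]
    x0, y0 = cell_of(min(lons), min(lats))
    x1, y1 = cell_of(max(lons), max(lats))
    return {(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)}


def point_in_polygon(lon, lat, polygon):
    """Planar ray casting: count edge crossings of a ray going east from the point."""
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def geo_join(points, polygons):
    """points: (id, lon, lat) tuples; polygons: name -> list of (lon, lat) vertices."""
    index = defaultdict(list)
    for name, polygon in polygons.items():
        for cell in covering(polygon):
            index[cell].append(name)
    matches = []
    for point_id, lon, lat in points:
        for name in index.get(cell_of(lon, lat), []):       # coarse-grained candidates
            if point_in_polygon(lon, lat, polygons[name]):  # exact validation
                matches.append((point_id, name))
    return matches


areas = {"triangle": [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]}
telemetry = [("p1", 1.0, 1.0), ("p2", 3.5, 3.5), ("p3", 10.0, 10.0)]
print(geo_join(telemetry, areas))
```

The cell lookup discards most points cheaply (p3 never reaches the exact test), so the expensive polygon check runs only on plausible candidates such as p1 and p2.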
Kusto for SQL Users
• Perform SQL SELECT (no DDL, only SELECT)
• Use KQL (Kusto Query Language)
• Supports translating T-SQL queries to Kusto query language
--
explain
select top(10) * from StormEvents
order by DamageProperty desc
StormEvents
| sort by DamageProperty desc nulls first
| take 10
ADX Functions
Functions are reusable queries or query parts. Kusto supports several kinds
of functions:
• Stored functions, which are user-defined functions that are stored and managed as
one kind of a database's schema entities. See Stored functions.
• Query-defined functions, which are user-defined functions that are defined and
used within the scope of a single query. The definition of such functions is done
through a let statement. See User-defined functions.
• Built-in functions, which are hard-coded (defined by Kusto and cannot be modified
by users).
Language examples
Alias:
alias database["wiki"] = cluster("https://somecluster.kusto.windows.net:443").database("somedatabase");
database("wiki").PageViews | count
Let:
let start = ago(5h);
let period = 2h;
T | where Time > start and Time < start + period | ...
Bin:
T | summarize Hits=count() by bin(Duration, 1s)
Batch:
let m = materialize(StormEvents | summarize n=count() by State);
m | where n > 2000; m | where n < 10
Tabular expression:
Logs
| where Timestamp > ago(1d)
| join ( Events | where continent == 'Europe' ) on RequestId
Time Series Analysis – Bin Operator
bin operator
Rounds values down to an integer multiple of a given bin size. If you have a scattered set of values, they will be grouped into a smaller set of specific values.
[Rule]
bin(value,roundTo)
[Example]
T | summarize Hits=count() by bin(Duration, 1s)
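The rounding rule behind bin(value, roundTo) is floor(value / roundTo) * roundTo. A Python sketch of the same arithmetic follows; the function names are hypothetical, and for datetimes the alignment origin (1970-01-01 here) is an assumption that does not matter for bin sizes dividing a day evenly.

```python
import math
from collections import Counter
from datetime import datetime, timedelta


def bin_value(value, round_to):
    """Round a number down to an integer multiple of round_to."""
    return math.floor(value / round_to) * round_to


def bin_datetime(value: datetime, round_to: timedelta) -> datetime:
    """Round a datetime down to a multiple of round_to counted from a fixed origin."""
    origin = datetime(1970, 1, 1)  # assumed alignment origin
    return origin + ((value - origin) // round_to) * round_to


# Equivalent of: T | summarize Hits=count() by bin(Duration, 1s)
durations = [0.2, 0.7, 1.1, 2.9, 2.95]
hits = Counter(bin_value(d, 1) for d in durations)
print(sorted(hits.items()))

print(bin_datetime(datetime(2021, 2, 20, 10, 37, 12), timedelta(hours=1)))
```

Five scattered durations collapse into three groups (0, 1 and 2 seconds), which is the "smaller set of specific values" the rule describes.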
Time Series Analysis – Make Series Operator
make-series operator
[Rule]
T | make-series [MakeSeriesParameters] [Column =] Aggregation [default = DefaultValue] [, ...] on AxisColumn from start to end step step [by [Column =] GroupExpression [, ...]]
[Example]
T | make-series sum(amount) default=0, avg(price) default=0 on timestamp from datetime(2016-01-01) to datetime(2016-01-10) step 1d by supplier
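What make-series adds over a plain summarize is a regular axis where empty bins are filled with the default value. A Python sketch of that behaviour for sum(amount) follows; the make_series function and the sample rows are hypothetical, and the range is treated as [start, end).

```python
from collections import defaultdict
from datetime import datetime, timedelta


def make_series(rows, start, end, step, default=0):
    """rows: (timestamp, group, amount). Returns (axis, {group: [sum per bin]})."""
    axis = []
    t = start
    while t < end:
        axis.append(t)
        t += step
    sums = defaultdict(dict)  # group -> {bin index: running sum}
    for ts, group, amount in rows:
        if start <= ts < end:
            i = (ts - start) // step
            sums[group][i] = sums[group].get(i, 0) + amount
    # Bins that received no rows get the default value instead of being skipped.
    series = {g: [bins.get(i, default) for i in range(len(axis))] for g, bins in sums.items()}
    return axis, series


rows = [
    (datetime(2016, 1, 1, 8), "A", 10),
    (datetime(2016, 1, 1, 20), "A", 5),
    (datetime(2016, 1, 3), "A", 7),
    (datetime(2016, 1, 2), "B", 1),
    (datetime(2016, 1, 9), "B", 100),  # outside the requested range, ignored
]
axis, series = make_series(rows, datetime(2016, 1, 1), datetime(2016, 1, 5), timedelta(days=1))
print(len(axis), series)
```

Every group ends up with an array of the same length, which is the shape the time series functions (anomaly detection, forecasting, FFT) expect.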
Time Series Analysis – Basket Operator
basket operator
Basket finds all frequent patterns of discrete attributes (dimensions) in the data and will return all frequent patterns that passed the frequency threshold in the original query.
[Rule]
T | evaluate basket([Threshold, WeightColumn, MaxDimensions, CustomWildcard, CustomWildcard, ...])
[Example]
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State, EventType, Damage, DamageCrops
| evaluate basket(0.2)
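The idea of "patterns that pass a frequency threshold" can be shown with a brute-force Python sketch. It is not the algorithm the basket plugin uses and it ignores weights and wildcards; it simply counts every combination of attribute values and keeps those whose share of rows reaches the threshold. All names and sample rows are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(rows, threshold):
    """rows: list of dicts with discrete attributes. Returns {pattern: share of rows}."""
    counts = Counter()
    for row in rows:
        items = sorted(row.items())
        # Count every non-empty combination of (attribute, value) pairs in the row.
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    total = len(rows)
    return {combo: n / total for combo, n in counts.items() if n / total >= threshold}


events = [
    {"State": "TX", "Damage": "YES"},
    {"State": "TX", "Damage": "NO"},
    {"State": "KS", "Damage": "YES"},
    {"State": "TX", "Damage": "YES"},
]
for pattern, share in frequent_patterns(events, 0.5).items():
    print(pattern, share)
```

Brute force grows exponentially with the number of dimensions, so this only works for a handful of columns; it is meant to show what the output means, not how to compute it at scale.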
Time Series Analysis – Autocluster Operator
autocluster operator
AutoCluster finds common patterns of discrete attributes (dimensions) in the data and will reduce the results of the original query (whether it's 100 or 100k rows) to a small number of patterns.
[Rule]
T | evaluate autocluster([SizeWeight, WeightColumn, NumSeeds, CustomWildcard, CustomWildcard, ...])
[Example]
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State , EventType , Damage
| evaluate autocluster(0.6)
[Example]
StormEvents
| where monthofyear(StartTime) == 5
| extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO")
| project State , EventType , Damage
| evaluate autocluster(0.2, '~', '~', '*')
ADX - VIEW
ADX Dashboards
• Integration in Kusto Web Explorer
• Optimized for big data
• Using powerful KQL to retrieve visual data
• Make dynamic views or widgets
Grafana query builder
• Create Grafana panels with no KQL knowledge
• Select values/filter/grouping using simple UI dropdowns
• Switch to RawMode to enhance queries with KQL
How to use Grafana easily
1. Go to https://grafana.com/, sign up and get an account.
2. Go to the All Plugins section, search for the ADX datasource and install the plugin.
3. Go to your Grafana instance at https://<workbenchname>.grafana.net/datasources and configure the ADX datasource.
And then start building dashboards!
ADX - Orchestration
How about orchestration?
Three use cases in which FLOW + KUSTO are the solution
• Push data to Power BI dataset: periodically run queries, and push the results to a Power BI dataset
• Conditional queries: make data checks, and send notifications with no code
• Email multiple ADX Flow charts: send incredible emails with an HTML5 chart as query result
Orchestration?
• Manage costs: starting and stopping the cluster, evaluating a condition
• Query sets to check data: plan a set of queries in order to say «IT’S OK, even Today!»
• Manage data retention: based on a dynamic condition
An Example of:
1. Set trigger
2. Connect and test ADX BLOCK
3. Configure Email BLOCK with dynamic params
And the result is:
ADX - Management
Data encryption in ADX
• Encryption at rest (using Azure Storage)
• A Microsoft-managed key is used by default
• Customer-managed keys can be enabled
• Key rotation, temporary disabling, and revocation of access can be implemented
• Soft Delete and Purge Protection will be enabled on the Key Vault and cannot be disabled
Extents, policies and Partition
• What are data shards or extents
• Column, segments, and blocks
• merge policy and sharding policy
• Data partitioning policy (post-ingestion)
FACTS:
A) Kusto stores its ingested data in reliable storage (most commonly Azure Blob Storage).
B) To speed up queries on that data, Kusto caches this data (or parts of it) on its processing nodes.
The Kusto cache provides a granular cache policy that customers can use to differentiate between two data cache policies: hot data cache and cold data cache.
YOU CAN SPECIFY WHICH LOCATION MUST BE USED:
set query_datascope="hotcache";
T | union U | join (T datascope=all | where Timestamp < ago(365d)) on X
Cache policy is independent of retention policy!
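How the two independent policies interact can be sketched as a classification by data age. The Python below is a conceptual illustration, not ADX behaviour in detail; the data_tier function is hypothetical and the periods reuse the 31-day hot cache and 365-day soft delete values from the database creation example.

```python
from datetime import datetime, timedelta


def data_tier(ingested_at, now, hot_cache_period, soft_delete_period):
    """Classify data by age: 'hot' (cached), 'cold' (storage only) or 'expired'."""
    age = now - ingested_at
    if age > soft_delete_period:   # retention decides whether data is still queryable
        return "expired"
    if age <= hot_cache_period:    # cache policy decides where queryable data is served from
        return "hot"
    return "cold"


now = datetime(2021, 2, 20)
hot, soft = timedelta(days=31), timedelta(days=365)
for days in (10, 100, 400):
    print(days, data_tier(now - timedelta(days=days), now, hot, soft))
```

A query scoped with query_datascope="hotcache" would only see the first category, while the default scope also reads cold data from storage at a higher latency.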
Retention policy
2 parameters, applicable to DB or table:
• Soft Delete Period (number)
  • Data is available for query; the period is measured from the ADX ingestion date
  • Default is set to 100 YEARS
• Recoverability (enabled/disabled)
  • Default is set to ENABLED
  • Recoverable for 14 days after deletion
.alter database DatabaseName policy retention "{}"
.alter table TableName policy retention "{}"
EXAMPLE:
{ "SoftDeletePeriod": "36500.00:00:00", "Recoverability":"Enabled" }
.delete database DatabaseName policy retention
.delete table TableName policy retention
.alter-merge table MyTable1 policy retention softdelete = 7d
Data Purge
The purge process is final and irreversible.
PURGE PROCESS:
1. It requires database admin permissions.
2. Before purging, the feature has to be ENABLED by opening a SUPPORT TICKET.
3. Run the purge QUERY to identify SIZE and EXEC. TIME and to get the VerificationToken.
4. Run the REAL purge QUERY, passing the VerificationToken.
2 STEP PROCESS
.purge table MyTable records in database MyDatabase <| where CustomerId in ('X', 'Y')
NumRecordsToPurge   EstimatedPurgeExecutionTime   VerificationToken
1,596               00:00:02                      e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b
.purge table MyTable records in database MyDatabase with (verificationtoken='e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b') <| where CustomerId in ('X', 'Y')
1 STEP PROCESS (With No Regrets !!!!)
.purge table MyTable records in database MyDatabase with (noregrets='true') <| where CustomerId in ('X', 'Y')
Virtual Network
BENEFITS
• USE NSG rules to limit traffic.
• Connect your on-premises network to Azure Data Explorer cluster's subnet.
• Secure your data connection sources (Event Hub and Event Grid) with service
endpoints.
VNET gives you TWO Independent IPs
• Private IP: access the cluster inside the VNet.
• Public IP: access the cluster from outside the VNet (management and monitoring) and as a source address
for outbound connections initiated from the cluster.
Enterprise readiness
• Row Level Security (RLS)
• Provides fine control of access to table data by different users
• Allow specifying user access to specific rows in tables
• Provides mechanics to mask PII data in tables
Leader and Follower
• Azure Data Share creates a symbolic link between two ADX clusters.
• Sharing occurs in near-real-time (no data pipeline)
• ADX Decouples the storage and compute
• Allows customers to run multiple compute (read-only) instances on the same underlying storage
• You can attach a database as a follower database, which is a read-only database on a remote cluster.
• You can share the data at the database level or at the cluster level.
The cluster sharing the database is the leader cluster and the
cluster receiving the share is the follower cluster.
A follower cluster can follow one or more leader cluster
databases. The follower cluster periodically synchronizes to
check for changes.
The queries running on the follower cluster use local cache
and don't use the resources of the leader cluster.
ADX - INTEGRATION
Export
• To Storage
.export async compressed to csv (
    h@"https://storage1.blob.core.windows.net/containerName;secretKey",
    h@"https://storage1.blob.core.windows.net/containerName2;secretKey"
) with (sizeLimit=100000, namePrefix=export, includeHeaders=all, encoding=UTF8NoBOM)
<| myLogs | where id == "moshe" | limit 10000
• To SQL
.export async to sql ['dbo.MySqlTable']
    h@"Server=tcp:myserver.database.windows.net,1433;Database=MyDatabase;Authentication=Active Directory Integrated;Connection Timeout=30;"
    with (createifnotexists="true", primarykey="Id")
<| print Message = "Hello World!", Timestamp = now(), Id=12345678
1. DEFINE COMMAND: define the ADX command and try your recurrent export strategy
2. TRY IN EDITOR: use an editor to try the command, verifying connection strings and parametrizing them
3. BUILD A JOB: build a Notebook or a C# job using the command as a SQL QUERY in your code
External tables & Continuous Export
• It’s an external endpoint:
  • Azure Storage
  • Azure Data Lake Store
  • SQL Server
• You need to define:
  • Destination
  • Continuous-Export Strategy
EXT TABLE CREATION
.create external table ExternalAdlsGen2 (Timestamp:datetime, x:long, s:string)
kind=adl partition by bin(Timestamp, 1d) dataformat=csv
( h@'abfss://filesystem@storageaccount.dfs.core.windows.net/path;secretKey' )
with ( docstring = "Docs", folder = "ExternalTables", namePrefix="Prefix" )
EXPORT to EXT TABLE
.create-or-alter continuous-export MyExport over (T) to table ExternalAdlsGen2
with (intervalBetweenRuns=1h, forcedLatency=10m, sizeLimit=104857600)
<| T
ADX – USE CASES
USE CASE: Calculation And Rule Engine
End-to-end monitoring solution
IoT Telemetry
Big Data Analytics
Interactive analytics
MLOPS / Data science
ADX - TOOLS
How about the Tools?
1. LOAD
• LightIngest
• Azure Data Factory
2. QUERY
• Kusto.Explorer
• Web UI
3. VISUALIZE
• Notebooks
• Power BI
• Grafana
• ADX Web UI
4. ORCHESTRATE
• Microsoft Flow
• Microsoft Logic Apps
Audience: BI People, IT People, ML People
Azure data studio plugins:
Manager Cluster
Manager
Notebooks
1. Select New connection from the Connections pane.
2. Fill in the Connection Details information.
3. For Connection type, select Kusto.
4. For Cluster, enter your Azure Data Explorer cluster (don't include the https:// prefix or a trailing /).
5. For Authentication Type, use the default: Azure Active Directory - Universal with MFA account.
6. For Account, use your account information.
7. For Database, use Default.
8. For Server Group, use Default.
9. For Name (optional), leave blank.
Azure data studio plugins:
• Filter/View data
• Build 3D Charts
• Take snapshot as JSON declarative file
Visual Studio Code plugins:
• Use Log Analytics or KUSTO/KQL extensions (.csl | .kusto | .kql)
• Open VS Code, create a file, save it and then edit
ADX – WRAP UP
KUSTO: Do and Don’t
• DO analytics over Big Data.
• DO rely on its support for entities such as databases, tables, and columns.
• DO use its complex analytics query operators (calculated columns, filtering, group by, joins).
• DO NOT expect in-place updates.
Azure Data Explorer
[Architecture diagram; labels recovered from the slide]
Ingestion: Queued Ingestion, Direct, Streaming, Bulk APIs, Blob & Azure Queue, Blob & Event Grid, Event Hub, IoT Hub, ADF
APIs: REST API, .NET SDK, Python SDK, Java SDK, JavaScript
UX: Web UI, Desktop App, Jupyter Magic, Monaco IDE, Azure Notebooks
Connectors: Power BI Direct Query, Microsoft Flow, Azure Logic Apps, Grafana
Protocols: MS-TDS
Why ADX is Unique
Simplified costs
• VM costs
• ADX service add-on cost
Many Prebuilt Inputs
• ADF
• Logstash
• Kafka
• Storage
• Iothub
• EventHub
• Fluent bit
Many Prebuilt Outputs
• Power BI
• ODBC Connector
• Jupyter
• Grafana
Which are the OSS Alternatives that we should compare with?
From db-engines.com:
Azure Data Explorer: fully managed big data interactive analytics platform
Vs
Elasticsearch: a distributed, RESTful modern search and analytics engine
Splunk: real-time insights engine to boost productivity & security
InfluxDB: DBMS for storing time series, events and metrics
ADX can be a replacement for search and log analytics engines such as Elasticsearch, Splunk, InfluxDB.
Comparison chart
Name | Elasticsearch (ELASTIC) | InfluxDB (InfluxData Inc.) | Azure Data Explorer (Microsoft) | Splunk (Splunk Inc.)
Description | A distributed, RESTful modern search and analytics engine based on Apache Lucene | DBMS for storing time series, events and metrics | Fully managed big data interactive analytics platform | Analytics Platform for Big Data
Database models | Search engine, Document store | Time Series DBMS | Time Series DBMS, Search engine, Document store, Event Store, Relational DBMS | Search engine
Initial release | 2010 | 2013 | 2019 | 2003
License | Open Source | Open Source | commercial | commercial
Cloud-based only | no | no | yes | no
Implementation language | Java | Go | - | -
Server operating systems | All OS with a Java VM | Linux, OS X | hosted | Linux, OS X, Solaris, Windows
Data scheme | schema-free | schema-free | Fixed schema with schema-less datatypes (dynamic) | yes
Typing | yes | Numeric data and Strings | yes | yes
XML support | no | no | yes | yes
Secondary indexes | yes | no | all fields are automatically indexed | yes
SQL | SQL-like query language | SQL-like query language | Kusto Query Language (KQL), SQL subset | no
APIs and other access methods | RESTful HTTP/JSON API, Java API | HTTP API, JSON over UDP | RESTful HTTP API, Microsoft SQL Server communication protocol (MS-TDS) | HTTP REST
Supported programming languages | .Net, Java, JavaScript, Python, Ruby, PHP, Perl, Groovy, Community Contributed Clients | .Net, Java, JavaScript, Python, R, Ruby, PHP, Perl, Haskell, Clojure, Erlang, Go, Lisp, Rust, Scala | .Net, Java, JavaScript, Python, R, PowerShell | .Net, Java, JavaScript, Python, Ruby, PHP
Server-side scripts | yes | no | Yes, possible languages: KQL, Python, R | yes
Triggers | yes | no | yes | yes
Partitioning methods | Sharding | Sharding | Sharding | Sharding
Replication methods | yes | selectable replication factor | yes | Master-master replication
MapReduce | ES-Hadoop Connector | no | no | yes
Consistency concepts | Eventual Consistency; Eventual Consistency; Eventual Consistency; Immediate Consistency [column alignment not recoverable from the slide]
Foreign keys | no | no | no | no
Transaction concepts | no | no | no | no
Concurrency | yes | yes | yes | yes
Durability | yes | yes | yes | yes
In-memory capabilities | Memcached and Redis integration | yes | no | no
User concepts | simple rights management via user accounts; Azure Active Directory Authentication; Access rights for users and roles [one cell empty; column alignment not recoverable from the slide]
Q & A
Riccardo Zamana
Riccardo.Zamana@gmail.com
@riccardozamana
www.linkedin.com/in/riccardozamana/
Thank you.

More Related Content

What's hot

Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)James Serra
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsDatabricks
 
Introduction to azure cosmos db
Introduction to azure cosmos dbIntroduction to azure cosmos db
Introduction to azure cosmos dbRatan Parai
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseDatabricks
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & DeltaDatabricks
 
Scaling Data Quality @ Netflix
Scaling Data Quality @ NetflixScaling Data Quality @ Netflix
Scaling Data Quality @ NetflixMichelle Ufford
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptxAlex Ivy
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta LakeDatabricks
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?lucenerevolution
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBill Liu
 
OSA Con 2022 - Apache Iceberg_ An Architectural Look Under the Covers - Alex ...
OSA Con 2022 - Apache Iceberg_ An Architectural Look Under the Covers - Alex ...OSA Con 2022 - Apache Iceberg_ An Architectural Look Under the Covers - Alex ...
OSA Con 2022 - Apache Iceberg_ An Architectural Look Under the Covers - Alex ...Altinity Ltd
 
Microsoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview SlidesMicrosoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview SlidesMark Kromer
 
BigQuery implementation
BigQuery implementationBigQuery implementation
BigQuery implementationSimon Su
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA

Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturdays Guatemala

  • 1. Data Saturday #2 Guatemala 2021. Time series Analytics: A deep dive into ADX Azure Data Explorer. Riccardo Zamana, Riccardo.Zamana@gmail.com, @riccardozamana, www.linkedin.com/in/riccardozamana/
  • 2. 20+ years of experience in IT • 10+ years of experience in IoT • 5+ years of experience in Azure projects
  • 3. ADX - Basics
  • 4. What is ADX for me, today
      • A telemetry data search engine => ELK replacement
      • A TSDB involved in LAMBDA replacements (as WARM path) => OSS LAMBDA (MinIO + Kafka) replacement
      • A tool to materialize data into ADLS & SQL
      • A tool for monitoring, summarizing information, and sending notifications
  • 5. ADX Architecture: 1. CONTEXT 2. SOURCES 3. INFRASTRUCTURE 4. DESTINATIONS
  • 6. ADX - Quickstart
  • 7. Create a Cluster
      ADX follows the standard creation process, available through Azure CLI, PowerShell, C#, Python, and ARM. My favorite is Azure CLI:
      Login:
        az login
      Select subscription:
        az account set --subscription MyAzureSub
      Cluster creation:
        az kusto cluster create --name azureclitest --sku Standard_D11_v2 --resource-group testrg
      Database creation:
        az kusto database create --cluster-name azureclitest --name clidatabase --resource-group testrg --soft-delete-period P365D --hot-cache-period P31D
      HOT-CACHE-PERIOD: amount of time that data should be kept in cache. Duration in ISO 8601 format (for example, 100 days would be P100D).
      SOFT-DELETE-PERIOD: amount of time that data should be kept so it is available to query. Duration in ISO 8601 format (for example, 100 days would be P100D).
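The two period flags above take ISO 8601 durations. As a minimal sketch, the Python helper below parses only the day-based form used on the slide (`P365D`, `P31D`) and checks that the hot cache window does not exceed the retention window. The function name `parse_days_duration` is hypothetical, and the comparison is an assumed sanity check rather than something the CLI is known to enforce.

```python
import re
from datetime import timedelta

# Only the "P<n>D" form shown on the slide is handled; full ISO 8601
# durations (months, hours, ...) are deliberately out of scope.
_DAYS_ONLY = re.compile(r"^P(\d+)D$")


def parse_days_duration(value: str) -> timedelta:
    """Convert a day-based ISO 8601 duration such as 'P100D' to a timedelta."""
    match = _DAYS_ONLY.match(value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return timedelta(days=int(match.group(1)))


soft_delete = parse_days_duration("P365D")  # --soft-delete-period
hot_cache = parse_days_duration("P31D")     # --hot-cache-period

print(soft_delete.days, hot_cache.days)
# Assumed sanity check: data cannot usefully stay cached longer than it is retained.
print(hot_cache <= soft_delete)
```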
  • 8. How to set up and use ADX?
      1. Create a database
      2. Use the database to link ingestion sources
      3. Configure data connections: Blob Storage | IotHub | EventHub (+ EventGrid)
      4. Pay attention to:
         1. Have a clear understanding of PRINCIPALS & AAD app registrations
         2. Be a friend of «JSON LINER» tools
         3. Be a friend of KUSTO & PYTHON
  • 9. How ADX is Organized [diagram labels: INSTANCE, DATABASE, SOURCES, DB Users/Apps, Ingestion URL, Querying URL, Cache storage, Blob storage, EXTERNAL SOURCES, EXTERNAL DESTINATIONS, IotHUB, EventHub, Storage, ADLS, Sql Server, many others]
  • 10. ADX - Ingest
  • 11. FIRST PHASE: Ingestion
      • Many connections & plugins
      • Many SDKs
      • Many managed pipelines
      • Many tools to ingest rapidly
      Managed pipelines: ingest blob using EventGrid; ingest Eventhub stream; ingest IotHub stream; ingest data from ADF
      Connections & plugins: Logstash plugin; Kafka connector; Apache Spark connector
      SDKs: Python SDK; .NET SDK; Java SDK; Node SDK; REST API; Go API
      Tools: One click ingestion; LightIngest
  • 12. Ingestion Types
      • Streaming ingestion: optimized for a low volume of data per table, over thousands of tables
        • Operation completes in under 10 seconds
        • Data is available for query after completion
      • Batching ingestion: optimized for high ingestion throughput
        • Default batch parameters: 5 minutes, 500 items, or 1000 MB
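The default batching parameters above can be read as "seal the batch as soon as any one limit is reached". The following is a minimal Python sketch of that rule using the values quoted on the slide. `BatchPolicy` and `should_seal` are hypothetical names for illustration, not part of any ADX SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchPolicy:
    """Hypothetical container for the default limits quoted on the slide."""
    max_age_seconds: float = 5 * 60  # 5 minutes
    max_items: int = 500             # 500 items
    max_size_mb: float = 1000        # 1000 MB


def should_seal(policy: BatchPolicy, age_seconds: float, items: int, size_mb: float) -> bool:
    """Return True as soon as any single limit is reached."""
    return (
        age_seconds >= policy.max_age_seconds
        or items >= policy.max_items
        or size_mb >= policy.max_size_mb
    )


policy = BatchPolicy()
print(should_seal(policy, age_seconds=30, items=10, size_mb=2))    # small, young batch
print(should_seal(policy, age_seconds=30, items=500, size_mb=2))   # item limit reached
print(should_seal(policy, age_seconds=301, items=1, size_mb=0.1))  # time limit reached
```

Under this reading, a trickle of small files still becomes queryable after at most the time limit, while a burst of files seals earlier on the item or size limit.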
  • 13. What is LightIngest
      • Command-line utility for ad-hoc data ingestion into Kusto
      • Can pull source data from a local folder
      • Can pull source data from an Azure Blob Storage container
      • Useful to ingest quickly and play with ADX
      • Most useful when you want to ingest a large amount of data (time constraint on ingestion duration)
      [Ingest JSON data from blobs]
      LightIngest "https://adxclu001.kusto.windows.net;Federated=true" -database:db001 -table:LAB -sourcePath:"https://ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME?SAS_TOKEN" -prefix:MyDir1/MySubDir2 -format:json -mappingRef:DefaultJsonMapping -pattern:*.json -limit:100
      [Ingest CSV data with headers from local files]
      LightIngest "https://adxclu001.kusto.windows.net;Federated=true" -database:MyDb -table:MyTable -sourcePath:"D:\MyFolder\Data" -format:csv -ignoreFirstRecord:true -mappingPath:"D:\MyFolder\CsvMapping.txt" -pattern:*.csv.gz -limit:100
      REFERENCE: https://docs.microsoft.com/en-us/azure/kusto/tools/lightingest
  • 14. LightIngest: pay attention to users! Two modes: queued ingestion and direct ingestion. IMPORTANT: the -creationTimePattern argument allows users to partition the data by creation time, not ingestion time.
  • 15. LightIngest: pay attention to users!
      IMPORTANT: all the data is indexed, but how is it partitioned? By ingestion time! The -creationTimePattern argument allows users to partition the data by creation time, not ingestion time.
      [Ingest CSV data with headers from local files]
      LightIngest "https://adxclu001.kusto.windows.net;Federated=true" -database:MyDb -table:MyTable -sourcePath:"D:\MyFolder\Data" -format:csv -ignoreFirstRecord:true -mappingPath:"D:\MyFolder\CsvMapping.txt" -pattern:*.csv.gz -limit:100
      [Ingest JSON data from blobs]
      LightIngest "https://adxclu001.kusto.windows.net;Federated=true" -database:db001 -table:LAB -sourcePath:"https://ACCOUNT_NAME.blob.core.windows.net/CONTAINER_NAME?SAS_TOKEN" -prefix:MyDir1/MySubDir2 -format:json -mappingRef:DefaultJsonMapping -pattern:*.json -limit:100
  • 16. One Click ingestion (GA)
      • One Click makes ingestion easy (intuitive UX)
      • Start ingesting data, creating tables, and mapping structures
      • Different data formats
      • One-time or continuous ingestion
      Coming soon: create table; support for all data formats; ingest from Event Hub
  • 17. Ingestion: Format & Use Cases
      • Ingest data using native formats: ApacheAvro, CSV (RFC 4180), JSON, MultiJSON (jsonLine), ORC, Parquet, PSV, SCSV, TSV, TXT
      • Files/blobs can be compressed: ZIP, GZIP
      • Better to use declarative names: MyData.csv.zip, MyData.json.gz
  • 18. Kafka Gold certified connector
      • From an Apache Kafka cluster (in the cloud or on-prem)
      • Use Kafka to ingest data into ADX at scale
      • GOLD (Partner supported < Microsoft)
      • What's the VISION behind it?
  • 19. What is FluentBIT
      • Collaboration with the CNCF FluentBIT project
      • Multi-platform log processor and forwarder to collect data/logs from different sources
      • Unify and send to Block Blob
      • Ingest them into ADX using EventGrid
      • Can use AZURITE as a storage endpoint for simulation: https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azurite?toc=/azure/storage/blobs/toc.json
  • 20. Ingestion Techniques
      For high-volume, reliable, and cheap data ingestion: batch ingestion (provided by SDK). The client uploads the data to Azure Blob storage (designated by the Azure Data Explorer data management service) and posts a notification to an Azure Queue. Batch ingestion is the recommended technique.
      Most appropriate for exploration and prototyping: inline ingestion (provided by query tools).
      • Inline ingestion: a control command (.ingest inline) containing in-band data, intended for ad hoc testing purposes.
      • Ingest from query: a control command (.set, .set-or-append, .set-or-replace) that points to query results, used for generating reports or small temporary tables.
      • Ingest from storage: a control command (.ingest into) with data stored externally (for example, Azure Blob Storage), which allows efficient bulk ingestion of data.
  • 21. Supported data formats
      For all ingestion methods other than ingest from query, format the data so that Azure Data Explorer can parse it. The supported data formats are:
      • CSV, TSV, TSVE, PSV, SCSV, SOH
      • JSON (line-separated, multi-line), Avro
      • ZIP and GZIP
      Schema mapping helps bind source data fields to destination table columns.
      • CSV mapping (optional) works with all ordinal-based formats. It can be performed using the ingest command parameter or pre-created on the table and referenced from the ingest command parameter.
      • JSON mapping (mandatory) and Avro mapping (mandatory) can be performed using the ingest command parameter. They can also be pre-created on the table and referenced from the ingest command parameter.
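To make the idea of a JSON mapping concrete, here is a minimal Python sketch that binds fields of one JSON line to destination columns. The mapping layout, the column names, and the helpers `resolve` and `apply_mapping` are assumptions made for illustration; they are a simplified stand-in, not the exact mapping schema ADX expects.

```python
import json

# Hypothetical mapping: destination column -> JSON path.
# Only simple "$.a.b" style paths are supported in this sketch.
mapping = [
    {"column": "DeviceId", "path": "$.device.id"},
    {"column": "Temperature", "path": "$.reading.temp"},
    {"column": "Timestamp", "path": "$.ts"},
]


def resolve(document, path):
    """Follow a '$.a.b' path through nested dicts; return None if it is missing."""
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path!r}")
    node = document
    for key in path[2:].split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def apply_mapping(line, mapping):
    """Turn one JSON line into a row keyed by destination column name."""
    document = json.loads(line)
    return {entry["column"]: resolve(document, entry["path"]) for entry in mapping}


line = '{"device": {"id": "dev-a"}, "reading": {"temp": 21.5}, "ts": "2021-02-20T10:00:00Z"}'
print(apply_mapping(line, mapping))
```

Fields without a mapping entry are simply ignored, and missing fields become empty values, which is why a JSON source needs an explicit mapping while ordinal formats such as CSV can fall back on column order.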
  • 22. Use ADX as ODBC Datasource
      1. Download the SQL 17 ODBC driver: https://www.microsoft.com/en-us/download/details.aspx?id=56567
      2. Configure the ODBC source (as a normal SQL Server ODBC DSN)
      Then you can use your preferred tool: Power BI Desktop, Qlik Sense Desktop, Sisense, etc.
  • 23. Notebooks + ADX = KQL Magic
      KQL magic: https://github.com/microsoft/jupyter-Kqlmagic
      • Extends the capabilities of the Python kernel in Jupyter
      • Can run Kusto language queries natively
      • Combines Python and Kusto query language
  • 24. ADX - Query
  • 25. Some code examples: query with between, function with parameters, «ToScalar» expression, «Extend» usage
  • 26. QUERY AND PERFORMANCE OPTIMIZATION
    • Materialized views
    • Partitioning
    • Query result caching
    • Near-real-time scoring of AML and ONNX models
    • FFT functions
    • Geospatial
  • 27. Materialized views
    The view exposes an always up-to-date result of the defined aggregation.
    Advantages:
    • Performance improvement
    • Freshness
    • Cost reduction
    Behind the scenes:
    • The source table is periodically materialized into the view table.
    • At query time, the view combines the materialized part with the DELTA in the raw table since the last materialization, to return complete results.
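    The "behind the scenes" behavior can be illustrated with a toy model in plain Python. This is only a sketch of the idea (materialized part plus delta), not how ADX implements it; the class name `MaterializedCount` and its methods are hypothetical.

```python
from collections import Counter


class MaterializedCount:
    """Toy model of a materialized 'count() by key' view (illustrative only)."""

    def __init__(self):
        self.source = []               # raw table: append-only list of keys
        self.materialized = Counter()  # view table: aggregation computed so far
        self.watermark = 0             # number of source rows already materialized

    def ingest(self, key):
        self.source.append(key)

    def materialize(self):
        """Periodic background step: fold new source rows into the view table."""
        self.materialized.update(self.source[self.watermark:])
        self.watermark = len(self.source)

    def query(self):
        """Query time: materialized part combined with the delta since the last run."""
        delta = Counter(self.source[self.watermark:])
        return self.materialized + delta


view = MaterializedCount()
for key in ["a", "b", "a"]:
    view.ingest(key)
view.materialize()   # the view table now holds a=2, b=1
view.ingest("a")     # arrives after the last materialization
fresh = view.query() # still complete: the delta row is counted too
```

    The query stays cheap because only the rows after the watermark are aggregated on the fly.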
  • 28. Query result caching
    • Better query performance
    • Lower resource consumption
    • The queries need to be identical
    • The cache policy is defined by a MAX AGE
    • Common use case: DASHBOARDS
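    The two rules above (identical query text, maximum age) can be sketched as a small cache in plain Python. This is an illustration of the policy only, not the ADX implementation; `QueryResultCache`, `execute`, and the injected `clock` are hypothetical names.

```python
import time


class QueryResultCache:
    """Toy results cache: a hit needs identical query text and a fresh enough entry."""

    def __init__(self, run_query, clock=time.monotonic):
        self._run_query = run_query  # function that actually executes a query
        self._clock = clock          # injectable clock, handy for testing
        self._entries = {}           # exact query text -> (timestamp, result)

    def execute(self, query_text, max_age_seconds):
        """Return (result, served_from_cache)."""
        now = self._clock()
        entry = self._entries.get(query_text)
        if entry is not None and now - entry[0] <= max_age_seconds:
            return entry[1], True
        result = self._run_query(query_text)
        self._entries[query_text] = (now, result)
        return result, False
```

    A dashboard that refreshes the same tiles every few seconds is the typical beneficiary: every refresh inside the max age is served without touching the data.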
  • 29. Geospatial joins
    • Use cases
      • Connected mobility solutions
      • Geospatial risk analysis
      • Agriculture optimization using weather data
    • Technical background
      • Join of polygon reference data and geospatial time series data
      • Based on three-dimensional S2 geometry
      • Consists of a coarse-grained join using S2 cell coverage, followed by exact validation using the geo_point_in_polygon function
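    The two-phase join (coarse cell match, then exact validation) can be sketched in plain Python. Note the assumptions: a flat 1-degree grid stands in for S2 cells, and a planar ray-casting test stands in for geo_point_in_polygon; all function names below are hypothetical.

```python
import math
from collections import defaultdict

CELL = 1.0  # grid cell size in degrees (stand-in for an S2 cell level)


def cell_of(lon, lat):
    """Coarse cell key for a point."""
    return (math.floor(lon / CELL), math.floor(lat / CELL))


def covering(polygon):
    """Cells touched by the polygon's bounding box (a deliberately coarse covering)."""
    lons = [p[0] for p in polygon]
    lats = [p[1] for p in polygon]
    x0, y0 = cell_of(min(lons), min(lats))
    x1, y1 = cell_of(max(lons), max(lats))
    return [(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]


def point_in_polygon(lon, lat, polygon):
    """Exact check by ray casting (planar, ignores the Earth's curvature)."""
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def geo_join(points, polygons):
    """Phase 1: match on cell key. Phase 2: validate each candidate exactly."""
    index = defaultdict(list)
    for name, polygon in polygons.items():
        for cell in covering(polygon):
            index[cell].append(name)
    matches = []
    for lon, lat in points:
        for name in index.get(cell_of(lon, lat), []):
            if point_in_polygon(lon, lat, polygons[name]):
                matches.append(((lon, lat), name))
    return matches


polygons = {"triangle": [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]}
points = [(1.0, 1.0), (3.5, 3.5), (10.0, 10.0)]
matches = geo_join(points, polygons)
```

    The coarse phase discards far-away points cheaply; only points that share a cell with a polygon pay for the exact test.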
  • 30. Kusto for SQL Users
    • Perform SQL SELECT (no DDL, only SELECT)
    • Use KQL (Kusto Query Language)
    • Supports translating T-SQL queries to Kusto query language:
      --
      explain
      select top(10) * from StormEvents order by DamageProperty desc
    Result:
      StormEvents | sort by DamageProperty desc nulls first | take 10
  • 31. ADX Functions
    Functions are reusable queries or query parts. Kusto supports several kinds of functions:
    • Stored functions, which are user-defined functions that are stored and managed as one kind of a database's schema entities. See Stored functions.
    • Query-defined functions, which are user-defined functions that are defined and used within the scope of a single query. Such functions are defined through a let statement. See User-defined functions.
    • Built-in functions, which are hard-coded (defined by Kusto and cannot be modified by users).
  • 32. Language examples
    Alias:
      alias database["wiki"] = cluster("https://somecluster.kusto.windows.net:443").database("somedatabase");
      database("wiki").PageViews | count
    Let:
      let start = ago(5h);
      let period = 2h;
      T | where Time > start and Time < start + period | ...
    Bin:
      T | summarize Hits=count() by bin(Duration, 1s)
    Batch:
      let m = materialize(StormEvents | summarize n=count() by State);
      m | where n > 2000;
      m | where n < 10
    Tabular expression:
      Logs | where Timestamp > ago(1d) | join ( Events | where continent == 'Europe' ) on RequestId
  • 33. Time Series Analysis – Bin Operator
    bin operator: rounds values down to an integer multiple of a given bin size. If you have a scattered set of values, they will be grouped into a smaller set of specific values.
    [Rule] bin(value, roundTo)
    [Example] T | summarize Hits=count() by bin(Duration, 1s)
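    The rounding rule behind bin can be checked in plain Python. This is only an illustration of the arithmetic, not Kusto code; the function name `kusto_bin` and the sample durations are hypothetical.

```python
import math
from collections import Counter


def kusto_bin(value, round_to):
    """Round value down to an integer multiple of round_to (the bin rule)."""
    return math.floor(value / round_to) * round_to


# Hypothetical request durations, in seconds.
durations = [0.2, 0.9, 1.1, 1.7, 3.4]

# Same idea as: T | summarize Hits=count() by bin(Duration, 1s)
hits = Counter(kusto_bin(d, 1) for d in durations)
```

    Five scattered values collapse into three buckets (0, 1 and 3), which is exactly the "smaller set of specific values" the slide describes.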
  • 34. Time Series Analysis – Make Series Operator
    make-series operator
    [Rule] T | make-series [MakeSeriesParameters] [Column =] Aggregation [default = DefaultValue] [, ...] on AxisColumn from start to end step step [by [Column =] GroupExpression [, ...]]
    [Example] T | make-series sum(amount) default=0, avg(price) default=0 on timestamp from datetime(2016-01-01) to datetime(2016-01-10) step 1d by supplier
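    What make-series adds over a plain summarize is the regular time axis with gaps filled by the default value. A minimal sketch of that behavior in plain Python, assuming daily steps, a sum aggregation, and an exclusive upper bound; `make_series` and the sample rows are hypothetical.

```python
from datetime import date, timedelta


def make_series(rows, start, end, step_days=1, default=0):
    """Sum amounts per supplier on a regular axis; empty bins get the default."""
    n_bins = -(-(end - start).days // step_days)  # ceiling division
    sums = {}
    for supplier, day, amount in rows:
        if start <= day < end:  # upper bound treated as exclusive
            index = (day - start).days // step_days
            per_bin = sums.setdefault(supplier, {})
            per_bin[index] = per_bin.get(index, 0) + amount
    axis = [start + timedelta(days=i * step_days) for i in range(n_bins)]
    series = {
        supplier: [per_bin.get(i, default) for i in range(n_bins)]
        for supplier, per_bin in sums.items()
    }
    return axis, series


rows = [
    ("A", date(2016, 1, 1), 10),
    ("A", date(2016, 1, 1), 5),
    ("A", date(2016, 1, 3), 7),
    ("B", date(2016, 1, 2), 1),
]
axis, series = make_series(rows, date(2016, 1, 1), date(2016, 1, 4))
```

    Every supplier gets an array of the same length as the axis, which is what the time series functions downstream expect.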
  • 35. Time Series Analysis – Basket Operator
    basket operator: Basket finds all frequent patterns of discrete attributes (dimensions) in the data and will return all frequent patterns that passed the frequency threshold in the original query.
    [Rule] T | evaluate basket([Threshold, WeightColumn, MaxDimensions, CustomWildcard, CustomWildcard, ...])
    [Example] StormEvents | where monthofyear(StartTime) == 5 | extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO") | project State, EventType, Damage, DamageCrops | evaluate basket(0.2)
  • 36. Time Series Analysis – Autocluster Operator
    autocluster operator: AutoCluster finds common patterns of discrete attributes (dimensions) in the data and will reduce the results of the original query (whether it's 100 or 100k rows) to a small number of patterns.
    [Rule] T | evaluate autocluster([SizeWeight, WeightColumn, NumSeeds, CustomWildcard, CustomWildcard, ...])
    [Example] StormEvents | where monthofyear(StartTime) == 5 | extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO") | project State , EventType , Damage | evaluate autocluster(0.6)
    [Example] StormEvents | where monthofyear(StartTime) == 5 | extend Damage = iff(DamageCrops + DamageProperty > 0 , "YES" , "NO") | project State , EventType , Damage | evaluate autocluster(0.2, '~', '~', '*')
  • 37. ADX - VIEW
  • 38. ADX Dashboards
    • Integration in KUSTO Web Explorer
    • Optimized for big data
    • Using powerful KQL to retrieve visual data
    • Make dynamic views or widgets
  • 39. Grafana query builder
    • Create Grafana panels with no KQL knowledge
    • Select values/filter/grouping using simple UI dropdowns
    • Switch to RawMode to enhance queries with KQL
  • 40. How to use Grafana easily: go to https://grafana.com/, sign up and get an account.
  • 41. How to use Grafana easily: go to the All Plugins section, search for the ADX datasource and install the plugin.
  • 42. How to use Grafana easily: go to your Grafana instance at https://<workbenchname>.grafana.net/datasources and configure the ADX datasource. Then start building dashboards!
  • 43. ADX - Orchestration
  • 44. How about orchestration? Three use cases in which FLOW + KUSTO are the solution:
    • Push data to a Power BI dataset: run queries periodically and push the results to a Power BI dataset.
    • Conditional queries: run data checks and send notifications with no code.
    • Email multiple ADX Flow charts: send rich emails with an HTML5 chart as the query result.
  • 45. Orchestration?
    • Manage costs: start and stop the cluster by evaluating a condition.
    • Query sets to check data: plan a set of queries in order to say «IT'S OK, even today!»
    • Manage data retention: based on a dynamic condition.
  • 46. An example:
    1. Set the trigger
    2. Connect and test the ADX BLOCK
    3. Configure the Email BLOCK with dynamic params
  • 47. And the result is:
  • 48. ADX - Management
  • 49. Data encryption in ADX
    • Encryption at rest (using Azure Storage)
    • A Microsoft-managed key is used by default
    • Customer-managed keys can be enabled
    • Key rotation, temporary disabling, and revocation of access can be implemented
    • Soft Delete and Purge Protection will be enabled on the Key Vault and cannot be disabled
  • 50. Extents, policies and partitioning
    • What data shards or extents are
    • Columns, segments, and blocks
    • Merge policy and sharding policy
    • Data partitioning policy (post-ingestion)
  • 51. FACTS:
    A) Kusto stores its ingested data in reliable storage (most commonly Azure Blob Storage).
    B) To speed up queries on that data, Kusto caches this data (or parts of it) on its processing nodes.
    The Kusto cache provides a granular cache policy that customers can use to differentiate between two data cache policies: hot data cache and cold data cache.
    YOU CAN SPECIFY WHICH LOCATION MUST BE USED:
      set query_datascope="hotcache";
      T | union U | join (T datascope=all | where Timestamp < ago(365d)) on X
    Cache policy is independent of retention policy!
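    The independence of the two policies can be pictured as two separate thresholds on data age. A minimal sketch in plain Python, assuming the hot-cache period (P31D) and soft-delete period (P365D) used in the cluster creation example; the function `data_tier` and the tier labels are hypothetical, and real deletion timing is up to the service.

```python
from datetime import timedelta

HOT_CACHE_PERIOD = timedelta(days=31)     # cache policy
SOFT_DELETE_PERIOD = timedelta(days=365)  # retention policy


def data_tier(age, hot_cache_period=HOT_CACHE_PERIOD,
              soft_delete_period=SOFT_DELETE_PERIOD):
    """Classify data by age: the cache policy and the retention policy act separately."""
    if age > soft_delete_period:
        return "expired"  # past retention: no longer guaranteed to be queryable
    if age <= hot_cache_period:
        return "hot"      # kept in the nodes' local cache
    return "cold"         # still queryable, served from reliable storage


tiers = [data_tier(timedelta(days=d)) for d in (10, 100, 400)]
```

    Changing one period never moves the other threshold: shortening the hot cache only turns "hot" data into "cold" data, it does not delete anything.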
  • 52. Retention policy
    2 parameters, applicable to a DB or a table:
    • Soft Delete Period (number)
      • Data is available for query; the period is measured from the ADX ingestion date
      • Default is set to 100 YEARS
    • Recoverability (enabled/disabled)
      • Default is set to ENABLED
      • Recoverable for 14 days after deletion
    Commands:
      .alter database DatabaseName policy retention "{}"
      .alter table TableName policy retention "{}"
      EXAMPLE: { "SoftDeletePeriod": "36500.00:00:00", "Recoverability":"Enabled" }
      .delete database DatabaseName policy retention
      .delete table TableName policy retention
      .alter-merge table MyTable1 policy retention softdelete = 7d
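    The policy body is plain JSON with the period written as d.hh:mm:ss, so the command text can be generated rather than typed by hand. A minimal sketch in plain Python; the helper names are hypothetical, and wrapping the JSON in single quotes (instead of escaping the double quotes) is an assumption about the accepted string literal form.

```python
import json
from datetime import timedelta


def to_kusto_timespan(span):
    """Format a timedelta as d.hh:mm:ss, the form used in the slide's example."""
    total = int(span.total_seconds())
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}.{hours:02d}:{minutes:02d}:{seconds:02d}"


def retention_command(table, soft_delete, recoverable=True):
    """Build the text of an '.alter table ... policy retention' command."""
    policy = {
        "SoftDeletePeriod": to_kusto_timespan(soft_delete),
        "Recoverability": "Enabled" if recoverable else "Disabled",
    }
    return f".alter table {table} policy retention '{json.dumps(policy)}'"


default_period = to_kusto_timespan(timedelta(days=36500))  # the 100-year default
command = retention_command("MyTable1", timedelta(days=7))
```

    The generated text can then be pasted into the editor or sent through any client that runs control commands.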
  • 53. Data Purge
    The purge process is final and irreversible.
    PURGE PROCESS:
    1. It requires database admin permissions.
    2. Before purging, the feature has to be ENABLED by opening a SUPPORT TICKET.
    3. Run the purge command to get the number of records, the estimated execution time, and a VerificationToken.
    4. Run the actual purge command, passing the VerificationToken.
    2 STEP PROCESS:
      .purge table MyTable records in database MyDatabase <| where CustomerId in ('X', 'Y')
      NumRecordsToPurge | EstimatedPurgeExecutionTime | VerificationToken
      1,596 | 00:00:02 | e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b
      .purge table MyTable records in database MyDatabase with (verificationtoken='e43c7184ed22f4f23c7a9d7b124d196be2e570096987e5baadf65057fa65736b') <| where CustomerId in ('X', 'Y')
    1 STEP PROCESS (with no regrets!):
      .purge table MyTable records in database MyDatabase with (noregrets='true')
  • 54. Virtual Network
    BENEFITS:
    • Use NSG rules to limit traffic.
    • Connect your on-premises network to the Azure Data Explorer cluster's subnet.
    • Secure your data connection sources (Event Hub and Event Grid) with service endpoints.
    A VNet gives you TWO independent IPs:
    • Private IP: access the cluster inside the VNet.
    • Public IP: access the cluster from outside the VNet (management and monitoring), and as a source address for outbound connections initiated from the cluster.
  • 55. Enterprise readiness
    • RLS (Row Level Security)
      • Provides fine control of access to table data by different users
      • Allows specifying user access to specific rows in tables
      • Provides mechanisms to mask PII data in tables
  • 56. Leader and Follower
    • Azure Data Share creates a symbolic link between two ADX clusters.
    • Sharing occurs in near-real-time (no data pipeline).
    • ADX decouples storage and compute.
    • Allows customers to run multiple compute (read-only) instances on the same underlying storage.
    • You can attach a database as a follower database, which is a read-only database on a remote cluster.
    • You can share the data at the database level or at the cluster level.
    The cluster sharing the database is the leader cluster and the cluster receiving the share is the follower cluster. A follower cluster can follow one or more leader cluster databases. The follower cluster periodically synchronizes to check for changes. The queries running on the follower cluster use local cache and don't use the resources of the leader cluster.
  • 57. ADX - INTEGRATION
  • 58. Export
    • To storage:
      .export async compressed to csv ( h@"https://storage1.blob.core.windows.net/containerName;secretKey", h@"https://storage1.blob.core.windows.net/containerName2;secretKey" ) with ( sizeLimit=100000, namePrefix=export, includeHeaders=all, encoding=UTF8NoBOM ) <| myLogs | where id == "moshe" | limit 10000
    • To SQL:
      .export async to sql ['dbo.MySqlTable'] h@"Server=tcp:myserver.database.windows.net,1433;Database=MyDatabase;Authentication=Active Directory Integrated;Connection Timeout=30;" with (createifnotexists="true", primarykey="Id") <| print Message = "Hello World!", Timestamp = now(), Id=12345678
    1. DEFINE COMMAND: define the ADX command and try your recurrent export strategy.
    2. TRY IN EDITOR: use an editor to try the command, verifying connection strings and parameterizing them.
    3. BUILD A JOB: build a notebook or a C# job that runs the command from your code, as you would a SQL query.
  • 59. External tables & Continuous Export
    • An external table points to an external endpoint:
      • Azure Storage
      • Azure Data Lake Store
      • SQL Server
    • You need to define:
      • Destination
      • Continuous-export strategy
    EXT TABLE CREATION:
      .create external table ExternalAdlsGen2 (Timestamp:datetime, x:long, s:string) kind=adl partition by bin(Timestamp, 1d) dataformat=csv ( h@'abfss://filesystem@storageaccount.dfs.core.windows.net/path;secretKey' ) with ( docstring = "Docs", folder = "ExternalTables", namePrefix="Prefix" )
    EXPORT to EXT TABLE:
      .create-or-alter continuous-export MyExport over (T) to table ExternalAdlsGen2 with (intervalBetweenRuns=1h, forcedLatency=10m, sizeLimit=104857600) <| T
  • 60. ADX – USE CASES
  • 61. USE CASE: Calculation And Rule Engine
  • 62. End-to-end monitoring solution
  • 63. IoT Telemetry
  • 64. Big Data Analytics
  • 65. Interactive analytics
  • 66. MLOPS / Data science
  • 67. ADX - TOOLS
  • 68. How about the Tools?
    1. LOAD
      • LightIngest
      • Azure Data Factory
    2. QUERY
      • Kusto.Explorer
      • Web UI
    3. VISUALIZE
      • Notebooks
      • Power BI
      • Grafana
      • ADX Web UI
    4. ORCHESTRATE
      • Microsoft Flow
      • Microsoft Logic App
    [Diagram: Load, Query, Visualize, Orchestrate; audiences: BI people, IT people, ML people]
  • 69. Azure Data Studio plugins: Manager Cluster, Manager Notebooks
    1. Select New connection from the Connections pane.
    2. Fill in the Connection Details information.
    3. For Connection type, select Kusto.
    4. For Cluster, enter your Azure Data Explorer cluster.
    5. (When entering the cluster name, don't include the https:// prefix or a trailing /)
    6. For Authentication Type, use the default: Azure Active Directory - Universal with MFA account.
    7. For Account, use your account information.
    8. For Database, use Default.
    9. For Server Group, use Default.
    10. For Name (optional), leave blank.
  • 70. Azure Data Studio plugins:
    • Filter/View data
    • Build 3D charts
    • Take a snapshot as a JSON declarative file
  • 71. Visual Studio Code plugins:
    • Use Log Analytics or KUSTO/KQL extensions (.csl | .kusto | .kql)
    • Open VSC, create a file, save it and then edit
  • 72. ADX – WRAP UP
  • 73. KUSTO: Do and Don't
    • DO analytics over Big Data.
    • DO rely on entities such as databases, tables, and columns.
    • DO use complex analytics query operators (calculated columns, filtering, group by, joins).
    • DO NOT perform in-place updates.
  • 74. [Diagram: Azure Data Explorer ecosystem. Group labels: APIs, UX, Connectors, Protocols, Streaming, Bulk APIs, Queued Ingestion, Direct. Components shown: Blob & Azure Queue, Blob & Event Grid, IoT Hub, Event Hub, ADF, REST API, .NET SDK, Python SDK, Java SDK, JavaScript, Web UI, Desktop App, Jupyter Magic, Monaco IDE, Azure Notebooks, Power BI Direct Query, Microsoft Flow, Azure Logic Apps, Grafana, MS-TDS]
  • 75. Why ADX is Unique
    Simplified costs:
    • VM costs
    • ADX service add-on cost
    Many prebuilt inputs:
    • ADF
    • Logstash
    • Kafka
    • Storage
    • IoT Hub
    • Event Hub
    • Fluent Bit
    Many prebuilt outputs:
    • Power BI
    • ODBC connector
    • Jupyter
    • Grafana
  • 76. Which are the OSS alternatives that we should compare with?
    ADX can be a replacement for search and log analytics engines such as Elasticsearch, Splunk, InfluxDB.
    From db-engines.com:
    • Azure Data Explorer: fully managed big data interactive analytics platform
    • Elasticsearch: a distributed, RESTful modern search and analytics engine
    • Splunk: real-time insights engine to boost productivity & security
    • InfluxDB: DBMS for storing time series, events and metrics
  • 77. Comparison chart
    Name | Elasticsearch (ELASTIC) | InfluxDB (InfluxData Inc.) | Azure Data Explorer (Microsoft) | Splunk (Splunk Inc.)
    Description | A distributed, RESTful modern search and analytics engine based on Apache Lucene | DBMS for storing time series, events and metrics | Fully managed big data interactive analytics platform | Analytics Platform for Big Data
    Database models | Search engine, Document store | Time Series DBMS | Time Series DBMS, Search engine, Document store, Event Store, Relational DBMS | Search engine
    Initial release | 2010 | 2013 | 2019 | 2003
    License | Open Source | Open Source | commercial | commercial
    Cloud-based only | no | no | yes | no
    Implementation language | Java | Go | (not given) | (not given)
    Server operating systems | All OS with a Java VM | Linux, OS X | hosted | Linux, OS X, Solaris, Windows
    Data scheme | schema-free | schema-free | Fixed schema with schema-less datatypes (dynamic) | yes
    Typing | yes | Numeric data and Strings | yes | yes
    XML support | no | no | yes | yes
    Secondary indexes | yes | no | all fields are automatically indexed | yes
    SQL | SQL-like query language | SQL-like query language | Kusto Query Language (KQL), SQL subset | no
    APIs and other access methods | RESTful HTTP/JSON API, Java API | HTTP API, JSON over UDP | RESTful HTTP API, Microsoft SQL Server communication protocol (MS-TDS) | HTTP REST
    Supported programming languages | .Net, Java, JavaScript, Python, Ruby, PHP, Perl, Groovy, Community Contributed Clients | .Net, Java, JavaScript, Python, R, Ruby, PHP, Perl, Haskell, Clojure, Erlang, Go, Lisp, Rust, Scala | .Net, Java, JavaScript, Python, R, PowerShell | .Net, Java, JavaScript, Python, Ruby, PHP
    Server-side scripts | yes | no | Yes, possible languages: KQL, Python, R | yes
    Triggers | yes | no | yes | yes
    Partitioning methods | Sharding | Sharding | Sharding | Sharding
    Replication methods | yes | selectable replication factor | yes | Master-master replication
    MapReduce | ES-Hadoop Connector | no | no | yes
    Consistency concepts | Eventual Consistency | Eventual Consistency | Eventual Consistency | Immediate Consistency
    Foreign keys | no | no | no | no
    Transaction concepts | no | no | no | no
    Concurrency | yes | yes | yes | yes
    Durability | yes | yes | yes | yes
    In-memory capabilities | Memcached and Redis integration | yes | no | no
    User concepts | (not given) | simple rights management via user accounts | Azure Active Directory Authentication | Access rights for users and roles
  • 78. Q & A
  • 79. Thank you. Riccardo Zamana Riccardo.Zamana@gmail.com @riccardozamana www.linkedin.com/in/riccardozamana/

Editor's Notes

  1. CONTEXT SOURCES INFRASTRUCTURE DESTINATIONS
  2-4. DEMO: LightIngest is a command-line utility for ad-hoc data ingestion into Azure Data Explorer (ADX). The utility can pull source data from a local folder or from an Azure blob storage container. LightIngest is most useful when you want to ingest a large amount of data, because there is no time constraint on ingestion duration. When historical data is loaded from an existing system to ADX, all records receive the same ingestion date. Now, the -creationTimePattern argument allows users to partition the data by creation time, not ingestion time. The -creationTimePattern argument extracts the CreationTime property from the file or blob path; the pattern doesn't need to reflect the entire item path, just the section enclosing the timestamp you want.
  5. EXAMPLE of QUEUED INGESTION https://docs.microsoft.com/en-us/azure/kusto/api/netfx/kusto-ingest-queued-ingest-sample Example of INLINE INGESTION https://docs.microsoft.com/it-it/azure/kusto/management/data-ingestion/ingest-inline
  6. https://notebooks.azure.com/riccardo-zamana/projects/azuresaturday2019
  7. If you use has, indexes are used; contains does not use indexes. Explain how to write a between.
  8. Run a few tests. Do a distinct to introduce summarize.
  9. .create-or-alter function with (folder = "AzureSaturday2019", docstring = "Func1", skipvalidation = "true") MyFunction1(i:long) {TBL_LAB0X | limit 100 | where minimum_nights > i}
     MyFunction1(80);
     explain SELECT name, minimum_nights from TBL_LAB0X
     .create-or-alter function with (folder = "AzureSaturday2019", docstring = "Func1", skipvalidation = "true") MyFunction1(i:long) {TBL_LAB0X | project name, minimum_nights | limit 100 | where minimum_nights > i | render columnchart}
     MyFunction1(80);
  10. T | summarize Hits=count() by bin(Duration, 1s)
  11. Kusto is built to support tables with a huge number of records (rows) and large amounts of data. To handle such large tables, each table's data is divided into smaller "tablets" called data shards or extents (the two terms are synonymous). The union of all the table's extents holds the table's data. Individual extents are kept smaller than a single node's capacity, and the extents are spread over the cluster's nodes, achieving scale-out. An extent is like a mini-table. It contains data and metadata, such as its creation time and optional tags that are associated with its data. Additionally, the extent usually holds information that lets Kusto query the data efficiently: for example, an index for each column of data in the extent, and an encoding dictionary if column data is encoded. Extents are immutable and can never be modified. They can only be queried, reassigned to a different node, or dropped out of the table. Data modification happens by creating one or more new extents and transactionally swapping old extents with new ones. Extents hold a collection of records that are physically arranged in columns. This technique is called columnar store. It enables efficient encoding and compression of the data, because different values from the same column often "resemble" each other. It also makes querying large spans of data more efficient, because only the columns used by the query need to be loaded. Internally, each column of data in the extent is subdivided into segments, and the segments into blocks. This division isn't observable to queries, and lets Kusto optimize column compression and indexing. To maintain query efficiency, smaller extents are merged into larger extents. The merge is done automatically, as a background process, according to the configured merge policy and sharding policy. 
Merging extents reduces the management overhead of having a large number of extents to track. More importantly, it allows Kusto to optimize its indexes and improve compression. Extent merging stops once an extent reaches certain limits, such as size, since beyond a certain point, merging reduces rather than increases efficiency. When a Data partitioning policy is defined on a table, extents go through another background process after they're created (post-ingestion). This process reingests the data from the source extents and creates homogeneous extents, in which the values of the column that is the table's partition key all belong to the same partition. If the policy includes a hash partition key, all homogeneous extents that belong to the same partition will be assigned to the same data node in the cluster.
  12. Azure Data Share creates a symbolic link between two ADX clusters. Sharing occurs in near-real-time (no data pipeline). ADX decouples storage and compute, which allows customers to run multiple compute (read-only) instances on the same underlying storage. You can attach a database as a follower database, which is a read-only database on a remote cluster. You can share the data at the database level or at the cluster level. The cluster sharing the database is the leader cluster and the cluster receiving the share is the follower cluster. A follower cluster can follow one or more leader cluster databases. The follower cluster periodically synchronizes to check for changes. The queries running on the follower cluster use local cache and don't use the resources of the leader cluster.
  13. Show an export example.
  14. Show the notebooks.
  15. Try it in VS Code: CTRL+P => kuskus. Then: cluster('adxclu001').database('db001').table('TBL_LAB01') | count