Introduction to Analytics with the Microsoft Data Platform
Jen Stirrup
Data Whisperer,
Data Relish
Level: 300
Jen Stirrup
·Consultant
·Postgraduate degrees in Artificial Intelligence and Cognitive Science
·Twenty-year career in industry
·Author
Contact Details
·http://bit.ly/JenStirrupRD
·http://bit.ly/JenStirrupLinkedIn
·http://bit.ly/JenStirrupMVP
·http://bit.ly/JenStirrupTwitter
JenStirrup.com
DataRelish.com
Agenda
• Azure Data Explorer
• Azure Data Factory
• Streaming Analytics
• Event Hubs
• Azure SQL Database
Agenda
• Analysis Services
• Data Lake Analytics
• HDInsight
• Azure Databricks
• Azure SQL Data Warehouse
• Azure Synapse
What to use and when?
Azure Synapse Analytics: A fully managed, elastic data warehouse with security at every level of scale at no extra cost
Azure Databricks: Fast, easy and collaborative Apache Spark-based analytics platform
HDInsight: A fully managed cloud Hadoop and Spark service backed by 99.9% SLA for your enterprise
Machine Learning: A fully managed cloud service that enables you to easily build, deploy and share predictive analytics solutions
Stream Analytics: An on-demand, real-time stream processing service with enterprise-grade security, auditing and support
What to use and when?
Data Lake Store: A no-limits data lake built to support massively parallel analytics
Data Lake Analytics: A fully managed on-demand pay-per-job analytics service with enterprise-grade security, auditing and support
Azure Data Catalog: An enterprise-wide metadata catalogue that makes data asset discovery simple
Data Factory: A data integration service to orchestrate and automate data movement and transformation
Modern Data Warehouse
Sources: On-premises data, Cloud data, SaaS data
Stages: INGEST, PREPARE, TRANSFORM & ENRICH, SERVE, STORE, VISUALIZE
Azure Data Explorer
Azure Data Explorer
Jupyter Notebook allows you to create and share documents that contain live code, equations,
visualizations, and explanatory text.
We are excited to announce KQL magic commands, which extend the functionality of the Python kernel in
Jupyter Notebook. KQL magic allows you to write KQL queries natively and query data from Microsoft
Azure Data Explorer. You can easily interchange between Python and KQL, and visualize data using the rich
Plot.ly library integrated with KQL render commands. KQL magic supports Azure Data Explorer,
Application Insights, and Log Analytics as data sources to run queries against.
Azure Data Explorer
Azure Data Explorer
Fast and highly scalable data exploration
service.
Azure Data Explorer is a fast, fully managed
data analytics service for real-time analysis on
large volumes of data streaming from
applications, websites, IoT devices and more.
Azure Data Explorer
● Low-latency ingestion
● Fast read-only query with high concurrency
● Query large amounts of structured, semi-structured (JSON-like nested types) and unstructured (free-text) data.
Credit: Microsoft
Azure Data Explorer Demo
Azure Data Factory
Data Ingestion
Data Factory
● No code or maintenance required to build
hybrid ETL and ELT pipelines within the
Data Factory visual environment.
● Cost-efficient and fully managed serverless
cloud data integration tool that scales on
demand.
Data Factory
● Azure security measures to connect to on-premises, cloud-based and software-as-a-service apps with peace of mind.
● SSIS integration runtime to easily move
SSIS ETL workloads into the cloud with
minimal effort.
Data Factory
● Ingest, move, prepare, transform and
process your data in a few clicks, and
complete your data modelling within the
accessible visual environment.
Why Data Factory?
● Orchestrate, monitor & schedule data
pipelines
● Automatic cloud resource management
● Single pane of glass
• Data Sources, Linked Services & Datasets
• Activities
• Pipelines
• Supported data sources
• Supported activity types
Module Overview
Streaming Analytics
Stream Analytics
Stream Analytics
● Build streaming pipelines in minutes - Run complex analytics with no need to learn new
processing frameworks or provision virtual machines (VMs) or clusters. Use familiar SQL language
that is extensible with JavaScript and C# custom code for more advanced use cases. Easily enable
scenarios such as low-latency dashboarding, streaming ETL and real-time alerting with one-click
integration across sources and sinks.
● Run mission-critical workloads with subsecond latencies - Get guaranteed, “exactly once”
event processing with 99.9% availability and built-in recovery capabilities. Easily set up a continuous
integration and continuous delivery (CI-CD) pipeline and achieve subsecond latencies on your most
demanding workloads.
● Deploy in the cloud and on the edge - Bring real-time insights and analytics capabilities closer to
where your data originates. Enable new scenarios with true hybrid architectures for stream
processing and run the same query in the cloud or on the edge.
● Power real-time analytics with artificial intelligence - Take advantage of built-in machine learning
(ML) models to shorten time to insights. Use ML-based capabilities to perform anomaly detection
directly in your streaming jobs with Azure Stream Analytics.
Credit: Microsoft
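As a rough illustration of the kind of windowed aggregation a streaming query performs, the sketch below counts events per fixed, non-overlapping (tumbling) window in plain Python. It is not Stream Analytics code: the service expresses this in its SQL dialect, and the function name, the event tuples and the 10-second window are assumptions made for the example.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) for fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for timestamp, key in events:
        # Each event falls into exactly one window: the one whose start is the
        # timestamp rounded down to a multiple of the window size.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical (timestamp_in_seconds, device_id) readings.
events = [(1, "a"), (4, "a"), (9, "b"), (11, "a"), (19, "b")]
result = tumbling_window_counts(events, window_seconds=10)
```

Here device "a" is counted twice in the window starting at 0 and once in the window starting at 10.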
Event Hubs
Event Hubs
● A hyper-scale telemetry ingestion service
that collects, transforms and stores millions
of events.
● Event Hubs is a fully managed, real-time
data ingestion service that’s simple, trusted
and scalable.
Event Hubs
● Integrate seamlessly with other Azure
services to unlock valuable insights.
● Experience real-time data ingestion and
microbatching on the same stream.
Event Hubs
● Focus on drawing insights from your data instead of managing
infrastructure. Build real-time big data pipelines and respond to
business challenges right away.
● Build real-time data pipelines with just a couple of clicks. Seamlessly
integrate with Azure data services to uncover insights faster.
Event Hubs
● Ingest millions of events per second -
Continuously ingress data from hundreds of
thousands of sources with low latency and
configurable time retention.
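The micro-batching idea mentioned for Event Hubs can be sketched in a few lines of Python: events are consumed one at a time and handed on in small groups. This is only a conceptual sketch, not the Event Hubs SDK; `micro_batches` and `max_batch_size` are hypothetical names chosen for the example.

```python
def micro_batches(events, max_batch_size):
    """Group a stream of events into lists of at most max_batch_size items."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == max_batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


# Seven hypothetical events, delivered downstream three at a time.
batches = list(micro_batches(range(7), max_batch_size=3))
```

The same stream can still be read event by event; batching is just a different way of consuming it.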
Analysis Services
Analysis Services
Focus on solving business problems, not
learning new skills, when you use the familiar,
integrated development environment of Visual
Studio. Easily deploy your existing SQL
Server 2016 tabular models to the cloud.
Data Lake Analytics
Data Lake Analytics
● Easily develop and run massively parallel
data transformation and processing
programs in U-SQL, R, Python and .NET
over petabytes of data. With no
infrastructure to manage, you can process
data on demand, scale instantly and only
pay per job.
Data Lake Analytics
● Process big data jobs in seconds with
Azure Data Lake Analytics. There is no
infrastructure to worry about because there
are no servers, virtual machines or clusters
to wait for, manage or tune.
Data Lake Analytics
● Instantly scale the processing power,
measured in Azure Data Lake Analytics
Units (AU), from one to thousands for each
job. You only pay for the processing that
you use per job.
Data Lake Analytics
● U-SQL is a simple, expressive and
extensible language that allows you to write
code once and have it automatically
parallelised for the scale you need.
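The "write once, scale as needed" idea can be illustrated with a small Python analogy: the per-partition logic is written once, and only the number of workers changes. This is not U-SQL and does not model Analytics Units precisely; `run_job`, `count_words` and the `units` parameter are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(partition):
    """The job logic, written once: count the words in one partition of lines."""
    return sum(len(line.split()) for line in partition)


def run_job(partitions, units):
    """Apply the same logic to every partition using `units` parallel workers."""
    with ThreadPoolExecutor(max_workers=units) as pool:
        return sum(pool.map(count_words, partitions))


# Hypothetical input, already split into three partitions.
partitions = [["a b c", "d e"], ["f"], ["g h", "i j k l"]]
total_one = run_job(partitions, units=1)
total_three = run_job(partitions, units=3)
```

The result is the same with one worker or three; only the degree of parallelism differs.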
Data Lake Analytics
● Process petabytes of data for diverse
workload categories such as querying, ETL,
analytics, machine learning, machine
translation, image processing and
sentiment analysis by leveraging existing
libraries written in .NET languages, R or
Python.
Apache Spark
Three Ages of Databases
1985-1995: RDBMS
1995-2010: RDBMS, Data Warehouse
2010-Now: RDBMS, Data Warehouse, NoSQL
Pressures on single node RDBMS
Scalability
OLAP/BI/Data Warehouse
Social Networks
Agile
Schema Free
Key-Value Stores
Graph/Triple Stores
XML
Column-Family Stores
Object Stores
Simplicity?
Start -> Does it look like a document?
Yes -> Use Microsoft Office -> Stop
No -> Use the RDBMS -> Stop
What is big data?
• "When you have to innovate to collect, store, organize, analyse and share it." - Werner Vogels, Amazon CTO
What is Big Data?
• Traditionally…..
– Physics Experiments
– Sensor data
– Satellite data
Now?
• Now: zettabytes in the cloud expected by end of next year
• https://datarelish.net/2019/11/07/whats-the-future-for-cloud-data-storage-clouds-of-glass/
Azure Synapse
● Azure Synapse delivers insights from all your data,
across data warehouses and big data analytics
systems, with blazing speed.
● Data professionals can query both relational and non-relational data at petabyte-scale using the familiar SQL language
● Credit to the Microsoft team for help with these decks
Azure Synapse
● Azure Synapse is a limitless analytics service
that brings together enterprise data
warehousing and Big Data analytics.
● Query data on your terms, using either
serverless on-demand or provisioned
resources—at scale
Azure Synapse
• Discover
powerful insights
across your
most important
data
• Unified analytics
experience
Azure Synapse
• Support for SSDT with Visual Studio 2019
• Native platform integration with Azure DevOps
• Built-in continuous integration and deployment (CI/CD) capabilities for enterprise-level deployments
Azure Synapse
Best in class price per performance: Up to 94% less expensive than competitors
Intelligent workload management: Prioritize resources for the most valuable workloads
Data flexibility: Ingest a variety of data sources to derive the maximum benefit
Developer productivity: Use preferred tooling for SQL data warehouse development
Industry-leading security: Defense-in-depth security and 99.9% financially backed availability SLA
Azure Synapse
Power BI
Import
• Great for small data sources and personal data discovery
• Fine for CSV files, spreadsheet data and summarized OLTP data
DirectQuery
• The enterprise solution
• Avoid data movement
• Delegate query work to the back-end source; take advantage of Azure SQL Data Warehouse's advanced features
Composite Models & Aggregation Tables
• Why choose? Import and DirectQuery in a single model
• Keep summarized data local; get detail data from the source
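The "keep summarized data local; get detail data from the source" idea can be sketched as a simple routing rule: answer from a local aggregate when the question is coarse enough, otherwise go back to the detail rows. Power BI does this internally; the Python below is only a conceptual sketch, and every name and figure in it is hypothetical.

```python
# Stands in for the back-end source holding the detail rows.
DETAIL_ROWS = [
    {"region": "North", "product": "A", "sales": 10},
    {"region": "North", "product": "B", "sales": 5},
    {"region": "South", "product": "A", "sales": 7},
]

# Aggregation table kept locally: total sales by region.
AGG_BY_REGION = {}
for row in DETAIL_ROWS:
    AGG_BY_REGION[row["region"]] = AGG_BY_REGION.get(row["region"], 0) + row["sales"]


def total_sales(region, product=None):
    """Return (total, where_it_was_answered) for a region, optionally by product."""
    if product is None:
        # The aggregate already has this grain, so no source query is needed.
        return AGG_BY_REGION.get(region, 0), "local aggregate"
    # Finer grain than the aggregate: delegate to the detail data.
    total = sum(
        r["sales"]
        for r in DETAIL_ROWS
        if r["region"] == region and r["product"] == product
    )
    return total, "source query"


by_region = total_sales("North")
by_product = total_sales("North", product="B")
```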
Azure SQL Data Warehouse
Complete Data Security
Category | Feature
Data Protection | Data in Transit
Data Protection | Data Encryption at Rest
Data Protection | Data Discovery and Classification
Access Control | Object Level Security (Tables/Views)
Access Control | Row Level Security
Access Control | Column Level Security
Access Control | Dynamic Data Masking
Authentication | SQL Login
Authentication | Azure Active Directory
Authentication | Multi-Factor Authentication
Network Security | Virtual Networks
Network Security | Firewall
Network Security | Azure ExpressRoute
Threat Protection | Threat Detection
Threat Protection | Auditing
Threat Protection | Vulnerability Assessment
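Two of the access-control features listed above, row-level security and dynamic data masking, can be illustrated with a small Python sketch. It shows the idea only (filter the rows a user may see, then hide part of a sensitive column); it is not how the service implements either feature, and the data, `mask_value` and `query` are hypothetical.

```python
ROWS = [
    {"rep": "ann", "customer": "Contoso", "card": "4111222233334444"},
    {"rep": "bob", "customer": "Fabrikam", "card": "5500111122223333"},
]


def mask_value(value, visible_prefix=1, pad="X"):
    """Keep the first visible_prefix characters and replace the rest with pad."""
    return value[:visible_prefix] + pad * (len(value) - visible_prefix)


def query(user, can_unmask=False):
    """Return only the user's own rows, masking the card unless permitted."""
    visible = [dict(r) for r in ROWS if r["rep"] == user]  # row-level filter
    if can_unmask:
        return visible
    return [dict(r, card=mask_value(r["card"], 4)) for r in visible]


masked = query("ann")
unmasked = query("ann", can_unmask=True)
```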
Workload Management
Scale-In Isolation
Predictable cost
Online elasticity
Efficient for unpredictable workloads
No cache eviction for scaling
Intra Cluster Workload Isolation (Scale In)
CREATE WORKLOAD GROUP Sales
WITH
(
[ MIN_PERCENTAGE_RESOURCE = 60 ]
[ CAP_PERCENTAGE_RESOURCE = 100 ]
[ MAX_CONCURRENCY = 6 ] )
[Diagram: 1000c DWU of compute shared between two workload groups, Marketing (40%) and Sales (60% minimum, up to 100%).]
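The percentages on this slide can be read as simple arithmetic: a minimum percentage reserves part of the compute for a workload group, and a cap limits how much it may use when capacity is free. The sketch below works through the slide's numbers under that simplified reading; `workload_share` is a hypothetical helper, not part of any Azure API.

```python
def workload_share(total_units, min_percentage, cap_percentage):
    """Return (guaranteed, ceiling) compute units for one workload group."""
    guaranteed = total_units * min_percentage // 100
    ceiling = total_units * cap_percentage // 100
    return guaranteed, ceiling


TOTAL_UNITS = 1000  # the 1000c DWU shown on the slide

# Sales: MIN_PERCENTAGE_RESOURCE = 60, CAP_PERCENTAGE_RESOURCE = 100
sales_guaranteed, sales_ceiling = workload_share(TOTAL_UNITS, 60, 100)

# Whatever is not reserved for Sales remains available to other groups.
left_for_others = TOTAL_UNITS - sales_guaranteed
```

Under these assumptions Sales is always guaranteed 600 units and may grow to the full 1000 when nothing else is running, while 400 units remain for groups such as Marketing.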
Heterogeneous
Data
Developer
Productivity
Performance Optimized Storage
• Elastic Architecture
• Columnar Storage
• Columnar Ordering
• Table Partitioning
• Nonclustered Indexes
• Hash Distribution
• Materialized Views
• Resultset Cache
Azure SQL Data Warehouse performance advantage
Overview
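Complete Security
Category | Feature | SQL Data Warehouse
Data Protection | Data in Transit | Yes
Data Protection | Data encryption at rest (Service & User Managed Keys) | Yes
Data Protection | Data Discovery and Classification | Yes
Access Control | Native Row Level Security | Yes
Access Control | Table and View Security (GRANT / DENY) | Yes
Access Control | Column Level Security | Yes
Access Control | Dynamic Data Masking | Yes
Authentication | SQL Authentication | Yes
Authentication | Native Azure Active Directory | Yes
Authentication | Integrated Security | Yes
Authentication | Multi-Factor Authentication | Yes
Network Security | Virtual Network (VNET) | Yes
Network Security | SQL Firewall (server) | Yes
Network Security | Integration with ExpressRoute | Yes
Threat Protection | SQL Threat Detection | Yes
Threat Protection | SQL Auditing | Yes
Threat Protection | Vulnerability Assessment | Yes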
https://azure.microsoft.com/en-gb/services/synapse-analytics/
What is Polybase?
Unstructured Data
What is Polybase?
SQL Server
Azure SQL Data Warehouse
Parallel Data Warehouse
Azure SQL Database
STRUCTURED
DATA
UNSTRUCTURED
DATA
BUSINESS DATA
Polybase
Making business data accessible
Provides a scalable, T-SQL-compatible query processing
framework for combining data from both universes
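To make "combining data from both universes" concrete, the sketch below joins a relational table with a delimited file that lives outside the database. PolyBase does this inside the engine with T-SQL; this Python version is only an analogy, using an in-memory SQLite database and an in-memory CSV as stand-ins, and all table, column and customer names are hypothetical.

```python
import csv
import io
import sqlite3

# Structured data: a relational table (in-memory SQLite as a stand-in).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Contoso"), (2, "Fabrikam")]
)

# External data: a delimited file stored outside the database.
external_file = io.StringIO("customer_id,clicks\n1,12\n2,3\n1,5\n")

# Total the clicks per customer id found in the file.
clicks = {}
for row in csv.DictReader(external_file):
    customer_id = int(row["customer_id"])
    clicks[customer_id] = clicks.get(customer_id, 0) + int(row["clicks"])

# Combine both sources into one result, keyed on the customer id.
combined = [
    (name, clicks.get(customer_id, 0))
    for customer_id, name in db.execute("SELECT id, name FROM customers ORDER BY id")
]
```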
Polybase Purpose
Measure | Consumer | Analyst | Scientist
Data Volume | Medium to Low | Reasonable | High -> Huge
Degree of Structure | Very High | Some | Low -> None
Number of Users | Very High | Medium | Low
Transformation Complexity | Low | Medium to High | High
Analytics Complexity | Low | Medium | Very High
Data
Machine Learning
Agenda
• Why machine learning in SQL Server?
• How to leverage:
– SQL Compute context
– sp_execute_external_script features
– PREDICT T-SQL Function
• Call to action
• Questions
Why machine learning with SQL Server?
Reduce or eliminate
data movement with
in-database analytics
Operationalize
machine learning
models
Get enterprise scale,
performance, and
security
Machine Learning Services
• R/Python Integration Design
– Invokes runtime outside of SQL Server process
– Batch-oriented operations
• SQL Compute context
Typical Machine Learning workflow against database
Data Scientist Workstation (any R/Python IDE) and SQL Server
1. Pull Data
train <- sqlQuery(connection, "select * from nyctaxi_sample")
2. Execution
model <- glm(formula, data = train)
3. Model Output
Machine Learning workflow using SQL compute context
Data Scientist Workstation (any R/Python IDE) and SQL Server 2017 (SQL Server, R/Python Runtime, Machine Learning Services)
1. Script
cc <- RxInSqlServer(connectionString, computeContext)
rxLogit(formula, cc)
2. Execution
3. rx* output
4. Model or Predictions
SQL Compute Context from
R/Python client
• Requirement - Use rx* functions
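The difference between the two workflows is where the computation runs. The sketch below contrasts pulling every row to the client with pushing the calculation to the database so that only the result travels back. It uses an in-memory SQLite database and a simple average purely as stand-ins; it does not use the rx* functions, and the table and values are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE trips (distance REAL, fare REAL)")
db.executemany(
    "INSERT INTO trips VALUES (?, ?)", [(1.0, 5.0), (2.0, 8.0), (4.0, 14.0)]
)

# Approach 1: pull all rows to the client, then compute locally.
rows = db.execute("SELECT distance, fare FROM trips").fetchall()
mean_fare_client = sum(fare for _, fare in rows) / len(rows)

# Approach 2: send the computation to the data; only one value comes back.
(mean_fare_server,) = db.execute("SELECT AVG(fare) FROM trips").fetchone()
```

Both approaches give the same answer, but the second moves one number across the network instead of the whole table.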
Push data from SQL Server to
external runtime
sp_execute_external_script
@input_data_1 = N'SELECT * FROM TrainingData'
InputDataSet: data.frame OR Pandas dataframe
Read Files with R Server
• R can read most common data file formats, such as CSV, TXT and SPSS files.
• Provide the file path, and R reads the file from that location.
RStudio & XDF File
• RStudio can easily convert CSV and other text files into XDF format.
• XDF files can only be read by R, and they are very small compared to other file formats.
RStudio
Convert File to XDF
• TXT or CSV files can be converted to XDF format.
• XDF files can only be read by R, and they are very small compared to other file formats.
rxCrossTabs
rxCrossTabs is used to create contingency tables from cross-classifying factors using a formula interface.
rxCrossTabs() is also used to compute sums according to combinations of different variables.
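For readers who think in Python, a contingency table is just a count for every combination of two factors. The sketch below builds one with the standard library; it is an illustration of the concept rather than a port of rxCrossTabs, and the records and the `cross_tab` helper are hypothetical.

```python
from collections import Counter

# Hypothetical survey records with two categorical factors.
records = [
    {"gender": "F", "smoker": "no"},
    {"gender": "F", "smoker": "yes"},
    {"gender": "M", "smoker": "no"},
    {"gender": "F", "smoker": "no"},
]


def cross_tab(rows, row_factor, col_factor):
    """Count rows for every combination of two factors (wide table layout)."""
    counts = Counter((r[row_factor], r[col_factor]) for r in rows)
    row_levels = sorted({r[row_factor] for r in rows})
    col_levels = sorted({r[col_factor] for r in rows})
    # Combinations that never occur are reported as 0.
    return {rl: {cl: counts[(rl, cl)] for cl in col_levels} for rl in row_levels}


table = cross_tab(records, "gender", "smoker")
```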
rxCube
• rxCube() performs a very similar function to
rxCrossTabs().
• It computes tabulated sums or means.
• rxCube() produces the sums or means in long
format rather than a table.
• This can be useful when we want to
aggregate data for further analysis within R.
dplyrXdf Package
The dplyr package is a popular toolkit for data transformation and manipulation.
dplyr supports data frames and data tables (from the data.table package), and reaches other data sources through backends.
The dplyrXdf package implements such a backend for the XDF file format, a technology supplied as part of Revolution R Enterprise.
ggplot2
ggplot2 allows you to create graphs that represent both univariate and multivariate numerical and categorical data in a straightforward manner.
It can be used to create the most common graph types.
It can create a very wide range of useful plots.
Custom visualizations with
rxSummary & rxCube
• rxSummary:
• The rxSummary function provides descriptive statistics using a formula argument.
Custom visualizations with
rxSummary & rxCube
• rxCube:
• rxCube is similar to rxSummary, but it returns fewer statistical summaries and therefore runs faster.
• With y ~ u:v as the formula, rxCube returns counts and averages for column y for each combination of u and v.
Custom visualizations with
rxSummary & rxCube
• rxCube
• Code: rxc1 <- rxCube(trip_distance ~ pickup_nb:dropoff_nb, mht_xdf)
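What a cube of this kind returns can be sketched in plain Python: one row per combination of the two grouping columns, with a count and a mean, in long format. This is a conceptual sketch rather than the RevoScaleR implementation; the trips are made-up data with column names modeled on the slide's taxi example, and `cube` is a hypothetical helper.

```python
from collections import defaultdict

# Made-up trips; column names follow the slide's taxi example.
trips = [
    {"pickup_nb": "Chelsea", "dropoff_nb": "Midtown", "trip_distance": 1.0},
    {"pickup_nb": "Chelsea", "dropoff_nb": "Midtown", "trip_distance": 2.0},
    {"pickup_nb": "Midtown", "dropoff_nb": "Chelsea", "trip_distance": 3.0},
]


def cube(rows, y, u, v):
    """Return one long-format row per (u, v) combination: count and mean of y."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        key = (r[u], r[v])
        sums[key] += r[y]
        counts[key] += 1
    return [
        {u: key[0], v: key[1], "count": counts[key], "mean": sums[key] / counts[key]}
        for key in sorted(counts)
    ]


result = cube(trips, "trip_distance", "pickup_nb", "dropoff_nb")
```

Because the output is a list of rows rather than a two-way table, it can be fed directly into further aggregation or plotting.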
rxHistogram
• rxHistogram() is used to create a histogram
for the Close variable.
• Syntax: rxHistogram(formula, data, …)
• Formula = A formula that contains the
variable which you want to visualize. The
rxLinePlot
• Creates a line or scatter plot using data from an .xdf file or data frame
• Syntax: rxLinePlot(formula, data, …. )
• Formula = For this function, this formula
should have one variable on the left side of
the ~ that reflects the Y-axis, and one
rxDataStep
• The rxDataStep function can be used to
process data in chunks.
• rxDataStep can be used to create and
transform subsets of data.
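Chunked processing means the full data set never has to sit in memory at once: rows are read a block at a time, filtered and transformed, then passed on. The generator below sketches that pattern in Python. It mirrors the idea behind rxDataStep, not its API; `data_step`, `keep` and `transform` are hypothetical names.

```python
def data_step(rows, chunk_size, transform, keep=lambda row: True):
    """Yield transformed chunks, holding at most chunk_size input rows at a time."""
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield [transform(r) for r in chunk if keep(r)]
            chunk = []
    if chunk:
        # Final, possibly smaller, chunk.
        yield [transform(r) for r in chunk if keep(r)]


# A lazily generated input, so rows exist only while their chunk is processed.
rows = ({"x": i} for i in range(5))

chunks = list(
    data_step(
        rows,
        chunk_size=2,
        transform=lambda r: dict(r, x2=r["x"] * 2),  # add a new column
        keep=lambda r: r["x"] % 2 == 0,              # subset of rows
    )
)
```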
rxDataStep
• rxDataStep can be used to modify existing columns or add new columns to the data
Subset Rows Of Data Using
Transform Argument
• A common use of rxDataStep is to create a
new data set with a subset of rows and
variables.
• For this purpose, we use the data frame of
our data as the input data set.
On-the-fly Transformation
• Analytical functions within the RevoScaleR package use a formal transformation function framework for generating on-the-fly variables
• The RevoScaleR approach is to use the
In-data transformation
• There are two main approaches to in-data
transformation:
• Define an external R function and reference it.
• Define an embedded transformation as an
input to a transforms argument on another
function.
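The two approaches can be compared side by side in a short Python sketch: either a separately defined function is passed in by reference, or the new variable is declared inline as an argument. The parameter names `transform_func` and `transforms` are hypothetical and only loosely echo the RevoScaleR style; the data is made up.

```python
MILES_TO_KM = 1.609344


def mean_of(rows, column, transform_func=None, transforms=None):
    """Average one column, applying transformations row by row as data is read."""
    total, count = 0.0, 0
    for row in rows:
        row = dict(row)                      # work on a copy of each row
        if transform_func is not None:       # approach 1: external function
            row = transform_func(row)
        for name, expression in (transforms or {}).items():
            row[name] = expression(row)      # approach 2: embedded expressions
        total += row[column]
        count += 1
    return total / count


def add_km(row):
    """External transformation: derive kilometres from miles."""
    row["km"] = row["miles"] * MILES_TO_KM
    return row


trips = [{"miles": 1.0}, {"miles": 3.0}]
via_function = mean_of(trips, "km", transform_func=add_km)
via_embedded = mean_of(trips, "km", transforms={"km": lambda r: r["miles"] * MILES_TO_KM})
```

In both cases the derived column exists only while the calculation runs; the source rows are left unchanged.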
Generate a data frame
• A data frame is a table or a two-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column.
Generate a Data Frame
• Code:
SalesData <- file.path("D:/773Demo", "CustomerSalesInfo.xdf")
SalesDataFrame <- rxImport(inData = SalesData)
POSIXct & POSIXlt
• R provides several options for dealing
with date and date/time data. The POSIXct
and POSIXlt classes allow for dates and
times with control for time zones.
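The distinction between the two classes is easiest to see by analogy. POSIXct stores a moment as a single number of seconds since 1970-01-01 UTC, while POSIXlt stores it as separate fields (year, month, day and so on). The Python sketch below shows the same two views of one moment using the standard library; it is an analogy only, not R code, and the fixed +09:00 offset is an arbitrary example.

```python
from datetime import datetime, timedelta, timezone

moment = datetime(2020, 1, 1, 0, 0, 0, tzinfo=timezone.utc)

# POSIXct-like view: one number, seconds since 1970-01-01 UTC.
seconds_since_epoch = moment.timestamp()

# POSIXlt-like view: the same moment broken down into named fields.
parts = moment.timetuple()

# Time zone control: same instant, different wall-clock reading.
plus_nine = moment.astimezone(timezone(timedelta(hours=9)))
```

The compact numeric form suits storage and arithmetic; the broken-down form suits extracting components such as the month or day of the year.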
Transform functions
▪ transform() is a generic function that does useful things with data frames.
▪ Embedded transformations provide instructions within a formula, through arguments on a function.
▪ Using just arguments, you can manipulate data using transformations.
Summary
Improve performance of your ML scripts by using:
– SQL Compute context from client (rx* functions)
– Streaming to reduce memory usage
– Trivial parallelism for scoring (predict or rxPredict)
– Parallel training and scoring using rx* functions
– Native PREDICT function for low latency scoring
Call to action
• Resources
– SQL Server Samples on GitHub – R Services &
ML Services
– Getting started tutorials: AKA.MS/MLSQLDEV
– Configure instance: SSMS Reports for ML
Services
– ML cheat sheet
– Microsoft documentation: SQL Server Machine

Azure Data Explorer deep dive - review 04.2020
 
CC -Unit4.pptx
CC -Unit4.pptxCC -Unit4.pptx
CC -Unit4.pptx
 

1 Introduction to Microsoft data platform analytics for release

  • 1. Introduction to Analytics with the Microsoft Data Platform Jen Stirrup Data Whisperer, Data Relish Level: 300
  • 2. Jen Stirrup ·Consultant ·Postgraduate degrees in Artificial Intelligence and Cognitive Science ·Twenty year career in industry ·Author
  • 4. Agenda • Azure Data Explorer • Azure Data Factory • Streaming Analytics • Event Hubs • Azure SQL Database
  • 5. Agenda • Analysis Services • Data Lake Analytics • HDInsight • Azure Databricks • Azure SQL Data Warehouse • Azure Synapse
  • 6. What to use and when? A fully managed, elastic data warehouse with security at every level of scale at no extra cost Azure Synapse Analytics Fast, easy and collaborative Apache Spark-based analytics platform Azure Databricks A fully managed cloud Hadoop and Spark service backed by 99.9% SLA for your enterprise HDInsight A fully managed cloud service that enables you to easily build, deploy and share predictive analytics solutions Machine Learning An on-demand, real-time stream processing service with enterprise-grade security, auditing and support Stream Analytics
  • 7. What to use and when? A no-limits data lake built to support massively parallel analytics Data Lake Store A fully managed on-demand pay-per-job analytics service with enterprise-grade security, auditing and support Data Lake Analytics An enterprise-wide metadata catalogue that makes data asset discovery simple Azure Data Catalog A data integration service to orchestrate and automate data movement and transformation Data Factory
  • 8. INGEST Modern Data Warehouse PREPARE TRANSFORM & ENRICH SERVE STORE VISUALIZE On-premises data Cloud data SaaS data
  • 10. Azure Data Explorer Jupyter Notebook allows you to create and share documents that contain live code, equations, visualizations, and explanatory text. We are excited to announce KQL magic commands, which extend the functionality of the Python kernel in Jupyter Notebook. KQL magic allows you to write KQL queries natively and query data from Microsoft Azure Data Explorer. You can easily switch between Python and KQL, and visualize data using the rich Plot.ly library integrated with KQL render commands. KQL magic supports Azure Data Explorer, Application Insights, and Log Analytics as data sources to run queries against.
  • 12. Azure Data Explorer Fast and highly scalable data exploration service. Azure Data Explorer is a fast, fully managed data analytics service for real-time analysis on large volumes of data streaming from applications, websites, IoT devices and more.
  • 13. Azure Data Explorer ● Low-latency ingestion ● Fast read-only query with high concurrency ● Query large amounts of structured, semi-structured (JSON-like nested types) and unstructured (free-text) data.
  • 18. Data Factory ● No code or maintenance required to build hybrid ETL and ELT pipelines within the Data Factory visual environment. ● Cost-efficient and fully managed serverless cloud data integration tool that scales on demand.
  • 19. Data Factory ● Azure security measures to connect to on-premises, cloud-based and software-as-a-service apps with peace of mind. ● SSIS integration runtime to easily move SSIS ETL workloads into the cloud with minimal effort.
  • 20. Data Factory ● Ingest, move, prepare, transform and process your data in a few clicks, and complete your data modelling within the accessible visual environment.
  • 21. Why Data Factory? ● Orchestrate, monitor & schedule data pipelines ● Automatic cloud resource management ● Single pane of glass
  • 22. • Data Sources, Linked Services & Datasets • Activities • Pipelines • Supported data sources • Supported activity types Module Overview
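The relationship between activities and pipelines listed above can be sketched in plain Python. This is an illustrative model only, not the Data Factory SDK: the `Activity` and `Pipeline` classes, the `depends_on` field and the activity names are all hypothetical. It shows how an orchestrator might derive a run order from activity dependencies.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Activity:
    """One unit of work in a pipeline (hypothetical model, not the ADF API)."""
    name: str
    depends_on: list[str] = field(default_factory=list)


@dataclass
class Pipeline:
    """A named group of activities that is scheduled as one unit."""
    name: str
    activities: list[Activity]

    def run_order(self) -> list[str]:
        # Map each activity to the set of activities it must wait for,
        # then let the standard library resolve a valid execution order.
        graph = {a.name: set(a.depends_on) for a in self.activities}
        return list(TopologicalSorter(graph).static_order())


# Hypothetical three-step ingest, transform, load pipeline.
pipeline = Pipeline(
    name="daily_sales_load",
    activities=[
        Activity("copy_from_on_prem"),
        Activity("transform", depends_on=["copy_from_on_prem"]),
        Activity("load_warehouse", depends_on=["transform"]),
    ],
)
order = pipeline.run_order()
print(order)
```

In the real service the pipeline definition is authored visually or as JSON; the sketch only mirrors the dependency idea.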
  • 25. Stream Analytics ● Build streaming pipelines in minutes - Run complex analytics with no need to learn new processing frameworks or provision virtual machines (VMs) or clusters. Use familiar SQL language that is extensible with JavaScript and C# custom code for more advanced use cases. Easily enable scenarios such as low-latency dashboarding, streaming ETL and real-time alerting with one-click integration across sources and sinks. ● Run mission-critical workloads with subsecond latencies - Get guaranteed, “exactly once” event processing with 99.9% availability and built-in recovery capabilities. Easily set up a continuous integration and continuous delivery (CI/CD) pipeline and achieve subsecond latencies on your most demanding workloads. ● Deploy in the cloud and on the edge - Bring real-time insights and analytics capabilities closer to where your data originates. Enable new scenarios with true hybrid architectures for stream processing and run the same query in the cloud or on the edge. ● Power real-time analytics with artificial intelligence - Take advantage of built-in machine learning (ML) models to shorten time to insights. Use ML-based capabilities to perform anomaly detection directly in your streaming jobs with Azure Stream Analytics.
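A common streaming query is a count per key over fixed, non-overlapping time windows (a tumbling window). The sketch below imitates that idea in plain Python, assuming events arrive as `(timestamp_seconds, key)` pairs; the function name and the sample sensor names are hypothetical, and the half-open window boundaries used here are a simplification rather than the service's exact semantics.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) over fixed, non-overlapping windows.

    events: iterable of (timestamp_seconds, key) pairs.
    Each window covers [window_start, window_start + window_seconds).
    """
    counts = Counter()
    for timestamp, key in events:
        # Integer division snaps the timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical sensor readings, timestamps in seconds.
events = [
    (0, "sensor-a"),
    (3, "sensor-a"),
    (9, "sensor-b"),
    (10, "sensor-a"),
    (14, "sensor-b"),
]
result = tumbling_window_counts(events, window_seconds=10)
print(result)
```

In a Stream Analytics job the same aggregation would be written in the SQL-like query language rather than in Python.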
  • 28. Event Hubs ● A hyper-scale telemetry ingestion service that collects, transforms and stores millions of events. ● Event Hubs is a fully managed, real-time data ingestion service that’s simple, trusted and scalable.
  • 29. Event Hubs ● Integrate seamlessly with other Azure services to unlock valuable insights. ● Experience real-time data ingestion and microbatching on the same stream.
  • 30. Event Hubs ● Focus on drawing insights from your data instead of managing infrastructure. Build real-time big data pipelines and respond to business challenges right away. ● Build real-time data pipelines with just a couple of clicks. Seamlessly integrate with Azure data services to uncover insights faster.
  • 31. Event Hubs ● Ingest millions of events per second - Continuously ingress data from hundreds of thousands of sources with low latency and configurable time retention.
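Two ideas from the Event Hubs slides, many sources feeding one stream and microbatching, can be sketched as follows. This is a simplified illustration, not the Event Hubs client library: `assign_partition` and `micro_batches` are hypothetical helpers, and the SHA-256 based hashing is an assumption chosen for determinism, not the hash the service uses.

```python
import hashlib
from itertools import islice


def assign_partition(partition_key: str, partition_count: int) -> int:
    """Map a partition key to a partition so one source always lands together."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def micro_batches(events, batch_size):
    """Yield consecutive lists of at most batch_size events from a stream."""
    iterator = iter(events)
    while batch := list(islice(iterator, batch_size)):
        yield batch


# Hypothetical device identifiers used as partition keys.
partition = assign_partition("device-1", partition_count=4)
batches = list(micro_batches(range(5), batch_size=2))
print(partition, batches)
```

Keeping all events for one key in one partition preserves their relative order, which is why a stable hash is used here.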
  • 33. Analysis Services Focus on solving business problems, not learning new skills, when you use the familiar, integrated development environment of Visual Studio. Easily deploy your existing SQL Server 2016 tabular models to the cloud.
  • 35. Data Lake Analytics ● Easily develop and run massively parallel data transformation and processing programs in U-SQL, R, Python and .NET over petabytes of data. With no infrastructure to manage, you can process data on demand, scale instantly and only pay per job.
  • 36. Data Lake Analytics ● Process big data jobs in seconds with Azure Data Lake Analytics. There is no infrastructure to worry about because there are no servers, virtual machines or clusters to wait for, manage or tune.
  • 37. Data Lake Analytics ● Instantly scale the processing power, measured in Azure Data Lake Analytics Units (AU), from one to thousands for each job. You only pay for the processing that you use per job.
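The pay-per-job model can be illustrated with simple arithmetic: a job's cost scales with the number of Analytics Units it uses and how long it runs. The sketch below assumes billing in AU-hours; the price value is a placeholder, not a quoted rate, and `job_cost` is a hypothetical helper.

```python
def job_cost(analytics_units: int, runtime_seconds: float, price_per_au_hour: float) -> float:
    """Estimate the cost of one job as AU-hours multiplied by a unit price."""
    au_hours = analytics_units * runtime_seconds / 3600
    return au_hours * price_per_au_hour


# Placeholder price per AU-hour; check current pricing before relying on it.
PRICE_PER_AU_HOUR = 2.0

small = job_cost(analytics_units=10, runtime_seconds=360, price_per_au_hour=PRICE_PER_AU_HOUR)
large = job_cost(analytics_units=100, runtime_seconds=360, price_per_au_hour=PRICE_PER_AU_HOUR)
print(small, large)
```

Assigning more AUs only pays off if the job finishes proportionally faster, so the runtime should be measured again after scaling.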
  • 38. Data Lake Analytics ● U-SQL is a simple, expressive and extensible language that allows you to write code once and have it automatically parallelised for the scale you need.
  • 39. Data Lake Analytics ● Process petabytes of data for diverse workload categories such as querying, ETL, analytics, machine learning, machine translation, image processing and sentiment analysis by leveraging existing libraries written in .NET languages, R or Python.
  • 41. Three Ages of Databases [Figure: timeline with three eras, 1985-1995, 1995-2010 and 2010-Now; labels: RDBMS, Data Warehouse, NoSQL]
  • 42. Pressures on single node RDBMS [Figure: Single Node RDBMS surrounded by pressures: Scalability, OLAP/BI/Data Warehouse, Social Networks, Agile, Schema Free]
  • 45. What is big data? • "When you have to innovate to collect, store, organize, analyse and share it." - Werner Vogels, Amazon CTO
  • 46. What is Big Data? • Traditionally….. – Physics Experiments – Sensor data – Satellite data
  • 47. Now? • Now: zettabytes in the cloud expected by end of next year • https://datarelish.net/2019/11/07/whats-the-future-for-cloud-data-storage-clouds-of-glass/
  • 48. Azure Synapse ● Azure Synapse delivers insights from all your data, across data warehouses and big data analytics systems, with blazing speed. ● Data professionals can query both relational and non-relational data at petabyte-scale using the familiar SQL language ● Credit to the Microsoft team for help with these decks
  • 49. Azure Synapse ● Azure Synapse is a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. ● Query data on your terms, using either serverless on-demand or provisioned resources—at scale
  • 50. Azure Synapse • Discover powerful insights across your most important data • Unified analytics experience
  • 51. Azure Synapse • support for SSDT with Visual Studio 2019 • native platform integration with Azure DevOps • built-in continuous integration and deployment (CI/CD) capabilities for enterprise-level deployments
  • 53. Azure Synapse ● Best in class price per performance: up to 94% less expensive than competitors ● Intelligent workload management: prioritize resources for the most valuable workloads ● Data flexibility: ingest a variety of data sources to derive the maximum benefit ● Developer productivity: use preferred tooling for SQL data warehouse development ● Industry-leading security: defense-in-depth security and 99.9% financially backed availability SLA
  • 54. Power BI ● DirectQuery: the enterprise solution. Avoid data movement. Delegate query work to the back-end source; take advantage of Azure SQL Data Warehouse’s advanced features ● Composite Models & Aggregation Tables: why choose? Import and DirectQuery in a single model. Keep summarized data local; get detail data from the source ● Import: great for small data sources and personal data discovery. Fine for CSV files, spreadsheet data and summarized OLTP data
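The composite-model idea of keeping summarized data local and fetching detail from the source can be sketched as a routing decision. This is an illustrative simplification, not how Power BI is implemented: the aggregation columns and the `choose_storage_mode` helper are hypothetical.

```python
# Hypothetical local aggregation table: sales summarized by (year, region).
AGGREGATION_COLUMNS = frozenset({"year", "region"})


def choose_storage_mode(group_by_columns):
    """Return which side of a composite model could answer a query.

    A query can be served from the imported aggregation only if every
    column it groups by is present in that aggregation.
    """
    if set(group_by_columns) <= AGGREGATION_COLUMNS:
        return "import (local aggregation)"
    return "directquery (back-end source)"


summary_query = choose_storage_mode(["year", "region"])
detail_query = choose_storage_mode(["year", "customer_id"])
print(summary_query, detail_query)
```

The design trade-off is the usual one: summaries stay fast because they are local, while rarely needed detail avoids being copied at all.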
  • 55. Azure SQL Data Warehouse ● Best in class price per performance: up to 94% less expensive than competitors ● Intelligent workload management: prioritize resources for the most valuable workloads ● Data flexibility: ingest a variety of data sources to derive the maximum benefit ● Developer productivity: use preferred tooling for SQL data warehouse development ● Industry-leading security: defense-in-depth security and 99.9% financially backed availability SLA
  • 56. Complete Data Security (Category: Features) ● Data Protection: Data in Transit, Data Encryption at Rest, Data Discovery and Classification ● Access Control: Object Level Security (Tables/Views), Row Level Security, Column Level Security, Dynamic Data Masking ● Authentication: SQL Login, Azure Active Directory, Multi-Factor Authentication ● Network Security: Virtual Networks, Firewall, Azure ExpressRoute ● Threat Protection: Threat Detection, Auditing, Vulnerability Assessment
  • 57. Workload Management • Scale-In Isolation • Predictable cost • Online elasticity • Efficient for unpredictable workloads • No cache eviction for scaling • Intra Cluster Workload Isolation (Scale In) • CREATE WORKLOAD GROUP Sales WITH ( [ MIN_PERCENTAGE_RESOURCE = 60 ] [ CAP_PERCENTAGE_RESOURCE = 100 ] [ MAX_CONCURRENCY = 6 ] ) • (diagram: Compute 1000c DWU; Marketing; Sales; 40%, 60%, 60%, 100%)
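The percentages in the workload group definition can be read as simple arithmetic. The sketch below assumes, as the 60%/40% figures on the slide suggest, that each group's minimum is a reservation and that whatever is not reserved stays available to other workloads. The function `shared_pool_percent` is a hypothetical helper written for this illustration; it is not part of any Azure tool or API.

```python
def shared_pool_percent(groups):
    """Return the percentage left over after every group's minimum is reserved.

    `groups` maps a workload group name to a (min_percent, cap_percent) pair.
    """
    for name, (min_pct, cap_pct) in groups.items():
        if not 0 <= min_pct <= cap_pct <= 100:
            raise ValueError(f"{name}: need 0 <= min <= cap <= 100")
    reserved = sum(min_pct for min_pct, _ in groups.values())
    if reserved > 100:
        raise ValueError("minimum percentages add up to more than 100")
    return 100 - reserved

# The Sales group from the slide: minimum 60%, cap 100%.
groups = {"Sales": (60, 100)}
remaining = shared_pool_percent(groups)  # share that is not reserved for Sales
```

With only the Sales group defined, 40% of the resources remain unreserved, which matches the split drawn on the slide.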
  • 58. Heterogeneous Data • Scale-In Isolation • Predictable cost • Online elasticity • Efficient for unpredictable workloads • No cache eviction for scaling
  • 60. Performance Optimized Storage Elastic Architecture Columnar Storage Columnar Ordering Table Partitioning Nonclustered Indexes Hash Distribution Materialized Views Resultset Cache
  • 61. Azure SQL Data Warehouse performance advantage Overview
  • 62. Complete Security • Table columns: Category (Data Protection), Feature, SQL Data Warehouse • Features, each marked Yes for SQL Data Warehouse: Data in Transit; Data encryption at rest (Service & User Managed Keys); Data Discovery and Classification; Native Row Level Security; Table and View Security (GRANT / DENY); Column Level Security; Dynamic Data Masking; SQL Authentication; Native Azure Active Directory; Integrated Security; Multi-Factor Authentication; Virtual Network (VNET); SQL Firewall (server); Integration with ExpressRoute; SQL Threat Detection; SQL Auditing; Vulnerability Assessment
  • 65. What is PolyBase? • SQL Server • Azure SQL Data Warehouse • Parallel Data Warehouse • Azure SQL Database
  • 67. Making business data accessible Provides a scalable, T-SQL-compatible query processing framework for combining data from both universes
  • 68. PolyBase Purpose (Consumer / Analyst / Scientist) • Data Volume: Medium to Low / Reasonable / High -> Huge • Degree of Structure: Very High / Some / Low -> None • Number of Users: Very High / Medium / Low • Transformation Complexity: Low / Medium to High / High • Analytics Complexity: Low / Medium / Very High
  • 71. Agenda • Why machine learning in SQL Server? • How to leverage: – SQL Compute context – sp_execute_external_script features – PREDICT T-SQL Function • Call to action • Questions
  • 72. Why machine learning with SQL Server? • Reduce or eliminate data movement with in-database analytics • Operationalize machine learning models • Get enterprise scale, performance, and security
  • 73. Machine Learning Services • R/Python Integration Design – Invokes runtime outside of SQL Server process – Batch-oriented operations • SQL Compute context
  • 75. Typical Machine Learning workflow against database • Data Scientist Workstation (any R/Python IDE) and SQL Server • 1. Pull data: train <- sqlQuery(connection, "select * from nyctaxi_sample") • 2. Execution: model <- glm(formula, train) • 3. Model output
  • 76. Machine Learning workflow using SQL compute context • Data Scientist Workstation (any R/Python IDE) and SQL Server 2017 with Machine Learning Services (SQL Server R/Python Runtime) • 1. Script: cc <- RxInSqlServer( connectionString, computeContext) rxLogit(formula, cc) • 2. Execution • 3. rx* output • 4. Model or Predictions
  • 77. SQL Compute Context from R/Python client • Requirement - Use rx* functions
  • 79. Push data from SQL Server to external runtime • sp_execute_external_script @input_data_1 = N'SELECT * FROM TrainingData' • InputDataSet: data.frame OR Pandas dataframe
  • 80. Read Files with R Server • R can read most flat-file formats, such as CSV and TXT, as well as SPSS files. • Provide the file path, and R reads the file from that location.
  • 81. RStudio & XDF File • RStudio can easily handle converting text files, whether CSV or other text formats, into XDF format. • XDF files can only be read by R, and they are very small compared to other file formats.
  • 83. Convert File to XDF • TXT or CSV files can be converted to XDF format. • XDF files can only be read by R, and they are very small compared to other file formats.
  • 84. rxCrossTabs • rxCrossTabs() is used to create contingency tables from cross-classifying factors using a formula interface. • rxCrossTabs() is also used to compute sums according to combinations of different variables.
  • 85. rxCube • rxCube() performs a very similar function to rxCrossTabs(). • It computes tabulated sums or means. • rxCube() produces the sums or means in long format rather than a table. • This can be useful when we want to aggregate data for further analysis within R.
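The difference between a cross-tabulated table and the long format described above can be illustrated without RevoScaleR. The following Python sketch only mimics the two output shapes for sums; the records and the functions `cross_tab_sums` and `cube_sums` are hypothetical and written for this example, and unlike a real contingency table the sketch lists only combinations that actually occur in the data.

```python
from collections import defaultdict

# Hypothetical records: (region, product, sales).
records = [
    ("North", "A", 10),
    ("North", "B", 5),
    ("South", "A", 7),
    ("North", "A", 3),
]

def cross_tab_sums(rows):
    """Wide layout: one row per region, one column per product."""
    table = defaultdict(lambda: defaultdict(int))
    for region, product, sales in rows:
        table[region][product] += sales
    return {region: dict(cols) for region, cols in table.items()}

def cube_sums(rows):
    """Long layout: one (region, product, total) row per observed combination."""
    totals = defaultdict(int)
    for region, product, sales in rows:
        totals[(region, product)] += sales
    return [(region, product, total)
            for (region, product), total in sorted(totals.items())]

wide = cross_tab_sums(records)   # nested table, indexed by region then product
long_rows = cube_sums(records)   # flat list of rows, easy to feed into further analysis
```

The long layout holds the same totals as the table, but as plain rows, which is why it tends to be more convenient as input for a later aggregation or modelling step.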
  • 86. dplyrXdf Package • The dplyr package is a popular toolkit for data transformation and manipulation. • dplyr supports data frames and data tables (from the data.table package). • The dplyrXdf package implements such a backend for the xdf file format, a technology supplied as part of Revolution R Enterprise.
  • 87. ggplot2 • ggplot2 allows you to create graphs that represent both univariate and multivariate numerical and categorical data in a straightforward manner. • This function can be used to create the most common graph types. • It can create a very wide range of useful plots.
  • 88. Custom visualizations with rxSummary & rxCube • rxSummary: • The rxSummary function provides descriptive statistics using a formula argument.
  • 89. Custom visualizations with rxSummary & rxCube • rxCube • rxCube is similar to rxSummary, but it returns fewer statistical summaries and therefore runs faster. • With y ~ u:v as the formula, rxCube returns counts and averages for column y
  • 90. Custom visualizations with rxSummary & rxCube • rxCube • Code: rxc1 <- rxCube(trip_distance ~ picup_nb:dropoff_nb, mht_xdf)
  • 91. rxHistogram • rxHistogram() is used to create a histogram for the Close variable. • Syntax: rxHistogram(formula, data, …) • Formula = A formula that contains the variable which you want to visualize. The
  • 92. rxLinePlot • Creates a line or scatter plot using data from an .xdf file or data frame • Syntax: rxLinePlot(formula, data, …) • Formula = For this function, this formula should have one variable on the left side of the ~ that reflects the Y-axis, and one
  • 93. rxDataStep • The rxDataStep function can be used to process data in chunks. • rxDataStep can be used to create and transform subsets of data.
  • 94. rxDataStep • rxDataStep can be used to modify existing columns or add new columns to the data
  • 95. Subset Rows Of Data Using Transform Argument • A common use of rxDataStep is to create a new data set with a subset of rows and variables. • For this purpose, we use the data frame of our data as the input data set.
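The three uses of rxDataStep mentioned on the last few slides (processing in chunks, adding a column, and keeping a subset of rows and variables) can be sketched in plain Python. This is not the RevoScaleR implementation and does not reproduce its argument names; `read_in_chunks`, `add_fare_per_mile`, `data_step` and the sample trips are hypothetical names and data invented for the illustration.

```python
def read_in_chunks(rows, chunk_size):
    """Yield successive lists of at most `chunk_size` rows."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]

def add_fare_per_mile(row):
    """Transform: return a copy of the row with a derived column added."""
    new_row = dict(row)
    new_row["fare_per_mile"] = row["fare"] / row["distance"]
    return new_row

def data_step(rows, chunk_size, transform, keep_row, keep_vars):
    """Process rows chunk by chunk: transform, filter rows, then select variables."""
    output = []
    for chunk in read_in_chunks(rows, chunk_size):
        for row in chunk:
            row = transform(row)
            if keep_row(row):
                output.append({name: row[name] for name in keep_vars})
    return output

# Hypothetical input data.
trips = [
    {"id": 1, "fare": 10.0, "distance": 2.0},
    {"id": 2, "fare": 9.0, "distance": 3.0},
    {"id": 3, "fare": 30.0, "distance": 5.0},
]

result = data_step(
    trips,
    chunk_size=2,
    transform=add_fare_per_mile,
    keep_row=lambda row: row["fare_per_mile"] >= 5.0,
    keep_vars=["id", "fare_per_mile"],
)
```

Because only one chunk is held at a time, the same pattern works when the full data set is too large to fit in memory, which is the main reason for processing in chunks.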
  • 96. On-the-fly Transformation • Analytical functions within the RevoScaleR package use a formal transformation function framework for generating on-the-fly variables • The RevoScaleR approach is to use the
  • 97. In-data transformation • There are two main approaches to in-data transformation: • Define an external R function and reference it. • Define an embedded transformation as an input to a transforms argument on another function.
  • 98. Generate a data frame • A data frame is a table or a two-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column.
  • 99. Generate a Data Frame • Code: SalesData <- file.path("D:/773Demo", "CustomerSalesInfo.xdf") • SalesDataFrame <- rxImport(inData = SalesData)
  • 100. POSIXct & POSIXlt • R provides several options for dealing with date and date/time data. The POSIXct and POSIXlt classes allow for dates and times with control for time zones.
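In R, POSIXct stores a date-time as a single number of seconds since 1970-01-01 UTC, while POSIXlt stores it as separate calendar fields. The same two representations exist in Python's standard library, so the sketch below uses them as an analogy; it is not R code, and the field names and conventions (for example how months and years are numbered) differ from POSIXlt.

```python
import calendar
import time
from datetime import datetime, timezone

# One instant, fixed in UTC so the example does not depend on the local time zone.
moment = datetime(2020, 3, 1, 12, 30, 0, tzinfo=timezone.utc)

# Compact form (the idea behind POSIXct): seconds since 1970-01-01 00:00:00 UTC.
seconds_since_epoch = int(moment.timestamp())

# Broken-down form (the idea behind POSIXlt): separate calendar fields.
fields = time.gmtime(seconds_since_epoch)

# Going back from fields to seconds gives the same instant.
round_trip = calendar.timegm(fields)
```

The compact form is convenient for arithmetic and storage, while the broken-down form makes it easy to pull out components such as the month or the hour.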
  • 101. Transform functions ▪ It is a generic function which does useful things with data frames. ▪ Embedded transformations provide instructions within a formula, through arguments on a function. ▪ Using just arguments, you can manipulate data using transformations.
  • 102. Summary Improve performance of your ML scripts by using: – SQL Compute context from client (rx* functions) – Streaming to reduce memory usage – Trivial parallelism for scoring (predict or rxPredict) – Parallel training and scoring using rx* functions – Native PREDICT function for low latency scoring
  • 103. Call to action • Resources – SQL Server Samples on GitHub – R Services & ML Services – Getting started tutorials: AKA.MS/MLSQLDEV – Configure instance: SSMS Reports for ML Services – ML cheat sheet – Microsoft documentation: SQL Server Machine