BI and AI updates in the Microsoft Data Platform stack
BI and AI in the Microsoft data platform universe (with a dash of cloud)
Ivan Donev
Agenda
• What’s new in SQL Server 2019 for BI
• Important Azure updates for the data platform
• AI and ML for the masses
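What’s new in SQL Server 2019 for BI
• SQL Server Analysis Services (SSAS)
  • Multidimensional: nothing new
  • Tabular: calculation groups, many-to-many (M:M) relationships, dynamic format strings
• SQL Server Integration Services (SSIS)
  • Nothing new
• SQL Server Reporting Services (SSRS)
  • Nothing new
• Master Data Services (MDS) and Data Quality Services (DQS)
  • Almost nothing new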
Demo: Analysis Services 2019
Calculation groups
Dynamic format strings
Many-to-many (M:M) relationships in Tabular
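Calculation groups are easiest to grasp outside DAX. The Python sketch below is a conceptual model only, not SSAS code: the measures (`sales`, `quantity`), the `TIME_INTELLIGENCE` group and its items are hypothetical. It shows the core idea that a handful of calculation items (YTD, prior year, YoY %) wrap *any* base measure, each with its own format string, instead of one hand-written measure per combination. In Tabular itself, a calculation item refers to the current measure through `SELECTEDMEASURE()` and can carry a format string expression, which is what makes dynamic formatting possible.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical fact rows: (year, month, sales_amount, quantity)
Row = Tuple[int, int, float, int]
Measure = Callable[[List[Row], int, int], float]


# Two base measures, each evaluated for one (year, month) cell
def sales(rows: List[Row], year: int, month: int) -> float:
    return sum(r[2] for r in rows if r[0] == year and r[1] == month)


def quantity(rows: List[Row], year: int, month: int) -> float:
    return float(sum(r[3] for r in rows if r[0] == year and r[1] == month))


# Calculation items: each one wraps whatever measure is selected
# (the role SELECTEDMEASURE() plays in DAX)
def current(m: Measure) -> Measure:
    return m


def ytd(m: Measure) -> Measure:
    # Year-to-date: sum the wrapped measure from January through the selected month
    return lambda rows, year, month: sum(m(rows, year, k) for k in range(1, month + 1))


def prior_year(m: Measure) -> Measure:
    return lambda rows, year, month: m(rows, year - 1, month)


def yoy_pct(m: Measure) -> Measure:
    def f(rows: List[Row], year: int, month: int) -> float:
        prev = m(rows, year - 1, month)
        return (m(rows, year, month) - prev) / prev if prev else float("nan")
    return f


# One calculation group: item name -> (transformation, format string).
# The per-item format string mimics a dynamic format string.
TIME_INTELLIGENCE: Dict[str, Tuple[Callable[[Measure], Measure], str]] = {
    "Current": (current, "{:,.2f}"),
    "YTD": (ytd, "{:,.2f}"),
    "PY": (prior_year, "{:,.2f}"),
    "YoY %": (yoy_pct, "{:.1%}"),
}


def evaluate(measure: Measure, item: str, rows: List[Row], year: int, month: int) -> str:
    transform, fmt = TIME_INTELLIGENCE[item]
    return fmt.format(transform(measure)(rows, year, month))
```

With `rows = [(2018, 1, 100.0, 10), (2018, 2, 150.0, 12), (2019, 1, 120.0, 11), (2019, 2, 180.0, 15)]`, `evaluate(sales, "YTD", rows, 2019, 2)` returns `"300.00"` and `evaluate(sales, "YoY %", rows, 2019, 2)` returns `"20.0%"`. Four items times two measures gives eight results from four item definitions, and the gap widens with every measure added.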
Updates in Azure SQL DB
• Azure SQL DB – Hyperscale
• Azure SQL DB – Serverless
Azure SQL DB – Serverless
• Serverless compute tier for single databases
• Compute billed per second of actual usage
• Available only in the vCore purchasing model
• Configurable minimum and maximum vCores
• Scenarios
  • Intermittent, unpredictable usage
  • Frequently rescaled databases
  • New deployments without historical usage data
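The per-second billing model can be sketched as a small calculation. This Python sketch is a simplification under stated assumptions, not Microsoft's billing formula: it bills CPU only (the real service also takes memory usage into account), treats auto-paused seconds as free for compute, and uses hypothetical usage numbers. The function names are illustrative.

```python
from typing import Optional, Sequence


def billed_vcore_seconds(
    usage: Sequence[Optional[float]], min_vcores: float, max_vcores: float
) -> float:
    """Simplified serverless compute billing (assumption: CPU only).

    usage[i] is the number of vCores actually used during second i,
    or None if the database was auto-paused during that second.
    """
    if not 0 < min_vcores <= max_vcores:
        raise ValueError("expected 0 < min_vcores <= max_vcores")
    total = 0.0
    for used in usage:
        if used is None:
            continue  # paused: no compute is billed (storage is billed separately)
        # Online seconds are billed at least min_vcores; usage is capped at max_vcores
        total += min(max(used, min_vcores), max_vcores)
    return total


def provisioned_vcore_seconds(seconds: int, vcores: float) -> float:
    """Provisioned tier for comparison: the full vCore count is billed every second."""
    return seconds * vcores
```

For example, `billed_vcore_seconds([0.2, 1.5, 4.0, None, None, 0.7], 0.5, 2.0)` bills 4.7 vCore-seconds, while `provisioned_vcore_seconds(6, 2.0)` bills 12.0 for the same six seconds. That gap is what makes intermittent usage the headline serverless scenario.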
Azure SQL DB – Hyperscale
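• Up to 100 TB database size
• Near-instant backups (file snapshots in Azure Blob storage)
• Fast restores: minutes rather than hours, regardless of data size
• Higher log throughput and faster transaction commits
• Rapid scale-out (read-only replicas) and scale-up
• Distributed architecture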
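Scale-out in Hyperscale means offloading reads to read-only replicas, which clients reach by declaring read-only intent in the connection string. The Python sketch below only builds ODBC connection strings; the server and database names are hypothetical, authentication is deliberately omitted, and it assumes at least one read-only replica has been provisioned. `ApplicationIntent=ReadOnly` is the standard SQL Server connection keyword for this routing.

```python
def hyperscale_connection_string(server: str, database: str, read_only: bool) -> str:
    """Build an ODBC connection string for an Azure SQL DB Hyperscale database.

    With read_only=True, ApplicationIntent=ReadOnly asks the service to route
    the session to a read-only secondary replica instead of the primary.
    Authentication settings are intentionally left out of this sketch.
    """
    parts = [
        "Driver={ODBC Driver 17 for SQL Server}",
        f"Server=tcp:{server},1433",
        f"Database={database}",
        "Encrypt=yes",
    ]
    if read_only:
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts) + ";"


# Hypothetical names: reporting queries go to a replica, writes go to the primary
reporting = hyperscale_connection_string("myserver.database.windows.net", "SalesDB", read_only=True)
oltp = hyperscale_connection_string("myserver.database.windows.net", "SalesDB", read_only=False)
```

Either string could then be passed to a driver such as `pyodbc.connect(...)`. Keeping the read/write decision at connection level, rather than per query, keeps the routing explicit and easy to audit.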
Modern DWH – important updates in Ingest
• Azure Data Factory v2
  • Integration runtimes that run SSIS packages as-is
  • SSIS catalog (SSISDB) hosted in Azure SQL DB
  • Mapping Data Flows
  • Wrangling Data Flows
Modern DWH – important updates in Store
• Azure Data Lake Storage Gen2
  • Hierarchical namespace (real directories)
  • Security (POSIX-like ACLs on directories and files)
  • Performance
  • Much easier to integrate with other services
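Why a hierarchical namespace matters is clearest for a directory rename. In flat blob storage a "directory" is only a name prefix, so renaming it means copying and deleting every object under it; with a hierarchical namespace, a directory rename is a single metadata operation. The Python sketch below is a conceptual model of that difference only; neither class uses or imitates the Azure Storage SDK, and the paths are hypothetical.

```python
from typing import Dict, Tuple


class FlatNamespace:
    """Flat object store: a 'directory' is only a shared name prefix."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}
        self.ops = 0  # number of storage operations performed by renames

    def write(self, path: str, data: bytes) -> None:
        self.blobs[path] = data

    def rename_dir(self, src: str, dst: str) -> None:
        # Every object under the prefix is copied to the new name, then deleted
        for name in [n for n in self.blobs if n.startswith(src + "/")]:
            self.blobs[dst + name[len(src):]] = self.blobs.pop(name)
            self.ops += 2  # one copy + one delete per object


class HierarchicalNamespace:
    """Hierarchical namespace: directories are real entries in a tree of dicts."""

    def __init__(self) -> None:
        self.root: dict = {}
        self.ops = 0

    def _parent(self, path: str, create: bool = False) -> Tuple[dict, str]:
        node = self.root
        *dirs, leaf = path.strip("/").split("/")
        for d in dirs:
            node = node.setdefault(d, {}) if create else node[d]
        return node, leaf

    def write(self, path: str, data: bytes) -> None:
        parent, leaf = self._parent(path, create=True)
        parent[leaf] = data

    def read(self, path: str) -> bytes:
        parent, leaf = self._parent(path)
        return parent[leaf]

    def rename_dir(self, src: str, dst: str) -> None:
        src_parent, src_leaf = self._parent(src)
        dst_parent, dst_leaf = self._parent(dst, create=True)
        dst_parent[dst_leaf] = src_parent.pop(src_leaf)  # one metadata operation
        self.ops += 1
```

Renaming a folder of 1,000 files costs 2,000 operations in the flat model and one in the hierarchical model, which is also why directory-level operations in analytics engines get cheaper and can be atomic.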
Modern DWH – important updates in Prep and Train
• Azure Databricks Delta
  • Brings RDBMS-like features to Spark
Databricks Delta
• ACID transactions
• Versioned Parquet files
• Streaming writes to a table (e.g. from Kafka)
• Batch upserts
• High-performance reads
• Schema enforcement
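Several of these features (atomic commits, versioned snapshots, batch upserts, schema enforcement) fit together in one small model. The Python sketch below is an in-memory illustration only, not the Delta API: the class `ToyDeltaTable` and its methods are hypothetical, and real Delta tables keep data in Parquet files plus a transaction log. In Databricks, the corresponding operations are `MERGE INTO` for upserts and reading an older version with the `versionAsOf` option.

```python
from typing import Any, Dict, List, Optional


class ToyDeltaTable:
    """In-memory model of a versioned table with atomic, schema-checked upserts."""

    def __init__(self, schema: Dict[str, type], key: str) -> None:
        self.schema = schema
        self.key = key
        self.versions: List[Dict[Any, Dict[str, Any]]] = [{}]  # version 0 is empty

    def _validate(self, row: Dict[str, Any]) -> None:
        # Schema enforcement: the column set and the value types must match exactly
        if set(row) != set(self.schema):
            raise ValueError(f"schema mismatch: got columns {sorted(row)}")
        for col, typ in self.schema.items():
            if not isinstance(row[col], typ):
                raise TypeError(f"column {col!r} expects {typ.__name__}")

    def merge(self, batch: List[Dict[str, Any]]) -> int:
        """Upsert a batch as one commit and return the new version number."""
        for row in batch:  # validate the whole batch first: all or nothing
            self._validate(row)
        snapshot = dict(self.versions[-1])  # copy-on-write; older versions stay intact
        for row in batch:
            snapshot[row[self.key]] = dict(row)  # update if the key exists, else insert
        self.versions.append(snapshot)
        return len(self.versions) - 1

    def read(self, version: Optional[int] = None) -> List[Dict[str, Any]]:
        """Read the latest version, or an older one ("time travel")."""
        snapshot = self.versions[-1 if version is None else version]
        return [dict(r) for r in sorted(snapshot.values(), key=lambda r: r[self.key])]
```

A batch with even one malformed row is rejected before anything is written, so readers never see a half-applied commit, and every successful merge leaves the previous version readable.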
Modern DWH – important updates in Model and Serve
• Changes in Azure SQL Data Warehouse
  • Concurrency increased to 128 concurrent queries
  • Adaptive caching on NVMe SSDs
  • Unlimited columnstore storage capacity
  • Workload classification and importance improvements
• Changes in Power BI
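Workload classification and importance decide which queued requests get resources first. The Python sketch below is a deliberately simplified scheduling model, not how the service is implemented: every request is queued at once, each takes one time unit, and `slots` stands in for available concurrency. The user-based rules and request names are hypothetical; in the service itself, classifiers are created with the T-SQL statement `CREATE WORKLOAD CLASSIFIER`, and the five importance levels below match its `IMPORTANCE` options.

```python
import heapq
from typing import Dict, List

# The five workload importance levels, lowest to highest
IMPORTANCE = {"low": 0, "below_normal": 1, "normal": 2, "above_normal": 3, "high": 4}


def classify(request: Dict[str, str], rules: Dict[str, str], default: str = "normal") -> str:
    """Map a request to an importance level using hypothetical per-user rules."""
    return rules.get(request["user"], default)


def schedule(requests: List[Dict[str, str]], rules: Dict[str, str], slots: int) -> List[List[str]]:
    """Return the waves of request names that start together.

    Higher importance starts first; ties are broken by arrival order.
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    heap = []
    for arrival, req in enumerate(requests):
        level = IMPORTANCE[classify(req, rules)]
        heapq.heappush(heap, (-level, arrival, req["name"]))  # max-importance first
    waves = []
    while heap:
        waves.append([heapq.heappop(heap)[2] for _ in range(min(slots, len(heap)))])
    return waves
```

With requests `q1` (ad hoc), `q2` (ETL), `q3` (executive dashboard) and `q4` (ad hoc), rules `{"ceo": "high", "etl": "above_normal", "adhoc": "low"}` and two slots, the dashboard and ETL queries start in the first wave even though they arrived later.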
Power BI updates worth noting
• Power BI dataflows
  • Self-service data transformation
• Shared and certified datasets (preview)
• Paginated reports (SSRS): requires Premium
• XMLA endpoints: requires Premium
• AutoML: requires Premium
• Common Data Service (CDS) integration
ML in BI
Self-service AI (not scalable)
• Prototyping
• No additional configuration or tuning needed
• Options:
  • Microsoft Cognitive Services with Power BI (demo)
  • AutoML in dataflows (Power BI Premium)
  • R/Python visuals
Enterprise AI (scalable, configurable, needs specialized staff)
• ML must run, and its output must be stored, within the Store/Serve stages
• Options:
  • Databricks
  • Azure ML
  • R/Python scripts as data sources in dataflows
How to choose?
• The aim
  • Prototype/test/verify
  • Production/operationalization (O16N)
• The team's skills
  • R/Python/Scala/Java/…
• The task
  • Image processing/text analytics/prediction/classification
• Post-production support
  • Can you support the solution afterwards?
THANK YOU
All my demos will be described and uploaded to our blog:
http://sqlmasteracademy.com/techblog/
Editor's Notes
Serverless - https://docs.microsoft.com/en-us/azure/sql-database/sql-database-serverless
Scenarios well-suited for serverless compute
Single databases with intermittent, unpredictable usage patterns interspersed with periods of inactivity and lower average compute utilization over time.
Single databases in the provisioned compute tier that are frequently rescaled and customers who prefer to delegate compute rescaling to the service.
New single databases without usage history where compute sizing is difficult or not possible to estimate prior to deployment in SQL Database.
Hyperscale - https://docs.microsoft.com/en-us/azure/sql-database/sql-database-service-tier-hyperscale
Support for up to 100 TB of database size
Nearly instantaneous database backups (based on file snapshots stored in Azure Blob storage) regardless of size with no IO impact on Compute
Fast database restores (based on file snapshots) in minutes rather than hours or days (not a size of data operation)
Higher overall performance due to higher log throughput and faster transaction commit times regardless of data volumes
Rapid scale-out: you can provision one or more read-only nodes to offload your read workload and to serve as hot standbys.
Rapid scale-up: you can, in constant time, scale up your compute resources to accommodate heavy workloads when needed, and then scale them back down when not needed.
This is the core of the Azure DWH landscape.
All other services, such as Logic Apps and Azure Functions, are auxiliary to the main concept.