PolyBase
Query relational and non-relational data with T-SQL
By preview this year PolyBase will add support for Teradata, ...
PolyBase use cases
Cortana Intelligence Suite
Transform data into intelligent action
Action
People
Automated
Systems
Apps
Web
Mobile
Bots
Int...
Stream Analytics
Transform, Ingest
Example overall data flow and Architecture
Web logs
Present &
decide
IoT, Mobile Devices
...
BI and analytics
Data management and processing
Data sources Non-relational data
Data enrichment and federated query
OLTP ...
Any BI tool
Advanced Analytics
Any language, Big Data processing
Data warehousing
Relational data
Dashboards | Reporting
Mob...
Data Sources Ingest Prepare
(normalize, clean, etc.)
Analyze
(stat analysis, ML, etc.)
Publish
(for programmatic
consumpti...
Near Realtime Data Analytics Pipeline using Azure Stream Analytics
Big Data Analytics Pipeline using Azure Data Lake
Intera...
Schneider Electric Architecture
Event hubs
Machine
Learning
Flatten &
Metadata Join
Data Factory: Move Data, Orchestrate, ...
Summary
Understand at a high
level all the
Microsoft data
platform products
Q & A ?
James Serra, Big Data Evangelist
Email me at: JamesSerra3@gmail.com
Follow me at: @JamesSerra
Link to me at: www.l...
Microsoft Data Platform - What's included


The pace of Microsoft product innovation is so fast that even though I spend half my days learning, I struggle to keep up. As I work with customers, I find they are often in the dark about many of the products we have, since they are focused on just keeping what they have running and putting out fires. So, let me cover what products you might have missed in the Microsoft data platform world. Be prepared to discover all the various Microsoft technologies and products for collecting data, transforming it, storing it, and visualizing it. My goal is to help you not only understand each product but also understand how they all fit together and their proper use cases, allowing you to build the appropriate solution that can incorporate any data in the future no matter the size, frequency, or type. Along the way we will touch on technologies covering NoSQL, Hadoop, and open source.

Published in: Technology


  1. 1. About Me
     • Business Intelligence Consultant, in IT for 30 years
     • Microsoft, Big Data Evangelist
     • Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW/APS developer
     • Been perm, contractor, consultant, business owner
     • Presenter at PASS Business Analytics Conference and PASS Summit
     • MCSE: Data Platform and Business Intelligence
     • MS: Architecting Microsoft Azure Solutions
     • Blog at JamesSerra.com
     • Former SQL Server MVP
     • Author of book “Reporting with Microsoft SQL Server 2012”
  2. 2. Agenda
     • Collect + Manage
     • Transform + Analyze
     • Visualize + Decide
     • Access Methods
     • Product Groupings
     • Modern Data Warehouse
     • Sample architectures
  3. 3. The Microsoft Data Platform
     • Visualize + decide: Mobile, Reports, Natural language query, Dashboards, Applications
     • Transform + analyze: Orchestration, Machine learning, Modeling, Information management, Complex event processing
     • Collect + manage: Streaming, Relational, Internal & external, Non-relational, NoSQL
     • Data
  4. 4. Collect + manage: collect and manage diverse data types with breakthrough speed
     • Secure, reliable performance
     • Increase speed across all your data workloads
     • Capture any data: structured, unstructured, and streaming
     • Scale your platform quickly to meet changing demands
  5. 5. Who manages what?
     • On Premises (physical / virtual): you scale, make resilient, and manage every layer
     • Infrastructure as a Service: part of the stack is managed by Microsoft; you scale, make resilient, and manage the rest
     • Platform as a Service: scale, resilience, and management by Microsoft for part of the stack; you manage the rest
     • Software as a Service: scale, resilience, and management by Microsoft
     • Layers in each stack: Storage, Servers, Networking, O/S, Middleware, Virtualization, Data, Applications, Runtime
     • Azure examples shown: Windows Azure Virtual Machines, Windows Azure Cloud Services
  6. 6. SQL Server options
     • Azure SQL Database has a max database size of 4 TB; Managed Instance has a max of 35 TB
     • Potential total volume size of up to 64 TB, 256 TB soon
  7. 7. Benefits of the cloud
     • Agility: unlimited elastic scale; pay for what you need
     • Innovation: quick “time to market”; fail fast
     • Risk: availability; reliability; security
     Total cost of ownership calculator: https://www.tco.microsoft.com/
  8. 8. Our customer challenges
     1. Increasing data volumes
     2. Real-time business requests
     3. New data sources and types
     4. Cloud-born data
     Diagram labels: Data sources, Non-Relational Data
  9. 9. Parallelism
     MPP - Massively Parallel Processing
     • Uses many separate CPUs running in parallel to execute a single program
     • Shared Nothing: each CPU has its own memory and disk (scale-out)
     • Segments communicate using a high-speed network between nodes
     SMP - Symmetric Multiprocessing
     • Multiple CPUs used to complete individual processes simultaneously
     • All CPUs share the same memory, disks, and network controllers (scale-up)
     • All SQL Server implementations up until now have been SMP
     • Mostly, the solution is housed on a shared SAN
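The shared-nothing idea above can be illustrated with a small sketch. This is not how any Microsoft MPP product is implemented; the function names are hypothetical, and Python threads merely stand in for independent nodes that each own a disjoint slice of the data.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_rows(rows, node_count):
    """Hash-distribute rows so every 'node' owns a disjoint slice (shared nothing)."""
    partitions = [[] for _ in range(node_count)]
    for key, amount in rows:
        partitions[hash(key) % node_count].append((key, amount))
    return partitions


def local_sum(partition):
    """The work a single node performs, touching only its own data."""
    return sum(amount for _, amount in partition)


def mpp_total(rows, node_count=4):
    """Scatter rows to nodes, aggregate on each node, then combine the partial results."""
    partitions = partition_rows(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partial_sums = list(pool.map(local_sum, partitions))
    return sum(partial_sums)


# Hypothetical sample data: 100 customers with amounts 1..100.
rows = [("cust%d" % i, i) for i in range(1, 101)]
print(mpp_total(rows))  # 5050
```

In an SMP design the same query would run against one shared pool of memory and disk; here only the small partial sums cross node boundaries, which is why scale-out can keep working as data volume grows.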
  10. 10. DW Scalability Spider Chart
      [Figure: spider chart with axes “Query Freedom”, “Query Complexity”, “Data Freshness”, “Query Data Volume”, “Query Concurrency”, “Mixed Workload”, “Schema Sophistication”, and “Data Volume”. Axis values range from batch reporting to ad hoc queries and data analysis/mining, from 3-5 way to 5-10 way joins, from weekly loads to near real time data feeds, from MBs to TBs, from 100 to 10,000, from simple star to normalized schemas, and from 10 TB to 5 PB.]
      • MPP: multidimensional scalability
      • SMP: tunable in one dimension at the cost of other dimensions
      The spider web depicts important attributes to consider when evaluating data warehousing options. Big Data support is the newest dimension.
  11. 11. Microsoft data platform solutions
      Product | Category | Description | More Info
      SQL Server 2016 | RDBMS | Earned top spot in Gartner’s Operational Database magic quadrant. JSON support. Linux TBD | https://www.microsoft.com/en-us/server-cloud/products/sql-server-2016/
      SQL Database | RDBMS/DBaaS | Cloud-based service that is provisioned and scaled quickly. Has built-in high availability and disaster recovery. JSON support | https://azure.microsoft.com/en-us/services/sql-database/
      SQL Data Warehouse | MPP RDBMS/DBaaS | Cloud-based service that handles relational big data. Provision and scale quickly. Can pause service to reduce cost | https://azure.microsoft.com/en-us/services/sql-data-warehouse/
      Analytics Platform System (APS) | MPP RDBMS | Big data analytics appliance for high performance and seamless integration of all your data | https://www.microsoft.com/en-us/server-cloud/products/analytics-platform-system/
      Azure Data Lake Store | Hadoop storage | Removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics | https://azure.microsoft.com/en-us/services/data-lake-store/
      Azure Data Lake Analytics | On-demand analytics job service/Big Data-as-a-service | Cloud-based service that dynamically provisions resources so you can run queries on exabytes of data. Includes U-SQL, a new big data query language | https://azure.microsoft.com/en-us/services/data-lake-analytics/
      HDInsight | PaaS Hadoop compute/Hadoop clusters-as-a-service | A managed Apache Hadoop, Spark, R, HBase, Kafka, and Storm cloud service made easy | https://azure.microsoft.com/en-us/services/hdinsight/
      Azure Cosmos DB | PaaS NoSQL: Key-value, Column-family, Document, Graph | Globally distributed, massively scalable, multi-model, multi-API, low latency data service, which can be used as an operational database or a hot data lake | https://azure.microsoft.com/en-us/services/cosmos-db/
      Azure Table Storage | PaaS NoSQL: Key-value Store | Store large amounts of semi-structured data in the cloud | https://azure.microsoft.com/en-us/services/storage/tables/
  12. 12. Microsoft Big Data Portfolio
      [Figure: quadrant chart. Key: Relational vs Non-relational, On-premises vs Cloud. Scale axis: Sequential, Scale Up, Scale Out + Across. Layers: Business intelligence, Machine learning analytics, Insights.]
      Products shown: SQL Server Stretch, Azure SQL Database, SQL Server 2017, SQL Server 2016, Fast Track, Azure SQL DW, ADLS & ADLA, Cosmos DB, HDInsight, Hadoop, Analytics Platform System
      Microsoft has solutions covering and connecting all four quadrants; that’s why SQL Server is one of the most utilized databases in the world
  13. 13. Power of SQL Server 2017 on the platform of your choice
      • Linux distributions including Red Hat Enterprise Linux (RHEL), Ubuntu, and SUSE Linux Enterprise Server (SLES)
      • Docker: Windows & Linux containers
      • Windows Server / Windows 10
      • New: speed query performance without tuning using Adaptive Query Processing
      • New: maintain performance when making app changes with Automatic Plan Correction
      Platforms: Linux, Linux/Windows container, Windows
  14. 14. Stretch SQL Server into Azure (Stretch DB)
      Stretch cold data to Azure with remote query processing
      Order history (sample table)
      Name | SSN | Date
      Jane Doe | cm61ba906fd | 2/28/2005
      Jim Gray | ox7ff654ae6d | 3/18/2005
      John Smith | i2y36cg776rg | 4/10/2005
      Bill Brown | nx290pldo90l | 4/27/2005
      Sue Daniels | ypo85ba616rj | 5/12/2005
      Sarah Jones | bns51ra806fd | 5/22/2005
      Jake Marks | mci12hh906fj | 6/07/2005
      [Figure: an app sends a query to SQL Server, which holds Customer data, Product data, and Order History; Order History is stretched to Microsoft Azure. A second copy of the table shows only the first four rows, and the query returns the row for Jim Gray.]
  15. 15. It can handle up to 384 cores and 24 TB of memory! It uses the HPE 3PAR StoreServ 8450 storage array, which consists of 192 SSD drives (480 GB/drive) for a total of 92 TB of disk space.
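The 92 TB figure can be checked with a quick calculation. The sketch below assumes decimal units (1 TB = 1000 GB) and raw capacity before any RAID or formatting overhead; the slide states neither, so both are assumptions.

```python
# Raw capacity of the storage array described on the slide.
drive_count = 192      # SSD drives in the array (from the slide)
drive_size_gb = 480    # capacity per drive in GB (from the slide)

total_gb = drive_count * drive_size_gb
total_tb = total_gb / 1000  # assumption: decimal terabytes

print(total_gb)  # 92160
print(total_tb)  # 92.16
```

The result rounds down to the 92 TB quoted above.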
  16. 16. Options for data warehouse solutions
      Balancing flexibility and choice
      • By yourself: existing or procured hardware and support; procured software and support. Offerings: SQL Server 2014/2016, Windows Server 2012 R2/2016, System Center 2012 R2/2016
      • With a reference architecture: existing or procured hardware and support; procured software and support. Offerings: Private Cloud Fast Track, Data Warehouse Fast Track, build or purchase
      • With an appliance: procured appliance and support. Offerings: Analytics Platform System
      Work compared across the options: installation, configuration, tuning and optimization
      Scales shown: time to solution (HIGH to LOW) and price (HIGH)
      Note on hardware: optional, if you have hardware already
  17. 17. A workload-specific database system design and validation program for Microsoft partners and customers
      Hardware system design
      • Tight specifications for servers, storage, and networking
      • Resource balanced and validated
      • Latest-generation servers and storage, including solid-state disks (SSDs)
      Database configuration
      • Workload-specific
      • Database architecture
      • SQL Server settings
      • Windows Server settings
      • Performance guidance
      Software
      • SQL Server 2016 Enterprise
      • Windows Server 2012 R2
      [Figure: stack of Windows Server 2012 R2 and SQL Server 2016 on Processors, Networking, Servers, Storage]
      https://www.microsoft.com/en-us/cloud-platform/data-warehouse-fast-track
  18. 18. Analytics Platform System (APS) for Big Data
      Pre-Built Hardware + Software Appliance (On-Premise Solution)
      • Co-engineered with HP, Dell, Quanta
      • Scale-out, up to 100x performance increase
      • Appliance installed in 1-2 days
      • Support: Microsoft provides first call support
      • Hardware partner provides onsite break/fix support
      Benefits: Plug and Play, Built-in Best Practices, Save Time
  19. 19. SQL Database Service A relational database-as-a-service, fully managed by Microsoft. For cloud-designed apps when near-zero administration and enterprise-grade capabilities are key. Perfect for organizations looking to dramatically increase the DB:IT ratio.
  20. 20. Enhancements over SQL Server
      • Create database in minutes
      • HA built in
      • DR with a few clicks
      • Scale on the fly
      • 99.99% SLA
      • Point-in-time restore
      • Database Advisor (recommendations: index tuning, parameterized queries, schema issues)
      • Query performance insight
      • Query store
      • Auditing and threat detection
  21. 21. SQL Database (DBaaS): Managed Instance
      A flavor of SQL DB that is designed to provide easy app migration to a fully managed PaaS
      • Unmatched app compatibility: fully-fledged SQL instance with nearly 100% compatibility with on-premises
      • Unmatched PaaS capabilities: learns and adapts with the customer app
      • Favorable business model: competitive, transparent, frictionless
      SQL Database (DBaaS) options shown: Managed Instance, Singleton, Elastic Pool
  22. 22. Azure SQL Data Warehouse A relational data warehouse-as-a-service, fully managed by Microsoft. Industry’s first elastic cloud data warehouse with enterprise-grade capabilities. Support your smallest to your largest data storage needs while handling queries up to 100x faster.
  23. 23. Azure Data Lake Store
      A hyper-scale repository for Big Data analytics workloads
      • Hadoop Distributed File System (HDFS) for the cloud
      • No limits to scale
      • Store any data in its native format
      • Enterprise-grade access control, encryption at rest
      • Optimized for analytic workload performance
  24. 24. Azure HDInsight
      Hadoop and Spark as a Service on Azure
      • Fully-managed Hadoop and Spark for the cloud
      • 100% Open Source Hortonworks data platform
      • Clusters up and running in minutes
      • Managed, monitored and supported by Microsoft with the industry’s best SLA
      • Familiar BI tools for analysis, or open source notebooks for interactive data science
      • 63% lower TCO than deploying your own Hadoop on-premises*
      *IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
  25. 25. Hortonworks Data Platform (HDP) 2.6 Simply put, Hortonworks ties all the open source products together (22) (under the covers of HDInsight)
  26. 26. Azure Data Lake Analytics
      A new distributed analytics service (job-as-a-service)
      • Distributed analytics service built on Apache YARN
      • Elastic scale per query lets users focus on business goals, not configuring hardware
      • Includes U-SQL, a language that unifies the benefits of SQL with the expressive power of C#
      • Integrates with Visual Studio to develop, debug, and tune code faster
      • Federated query across Azure data sources
      • Enterprise-grade role based access control
  27. 27. Query data where it lives
      Easily query data in multiple Azure data stores without moving it to a single store
      Benefits
      • Avoid moving large amounts of data across the network between stores (federated query/logical data warehouse)
      • Single view of data irrespective of physical location
      • Minimize data proliferation issues caused by maintaining multiple copies
      • Single query language for all data
      • Each data store maintains its own sovereignty
      • Design choices based on the need
      • Push SQL expressions (filters, joins) to remote SQL sources
      • Remote queries: SELECT * FROM EXTERNAL MyDataSource EXECUTE @"Select CustName from Customers WHERE ID=1";
      • Federated queries: SELECT CustName FROM EXTERNAL MyDataSource LOCATION "dbo.Customers" WHERE ID=1
      [Figure: a U-SQL query in Azure Data Lake Analytics reaches Azure Storage Blobs, Azure SQL in VMs, Azure SQL DB, Azure SQL Data Warehouse, and Azure Data Lake Storage]
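Why pushing filters to the remote source matters can be sketched without any Azure service. The example below uses Python's built-in sqlite3 module as a hypothetical stand-in for a remote SQL source; the table contents and function names are illustrative assumptions, and U-SQL itself is not involved.

```python
import sqlite3


def make_remote_source():
    """Hypothetical stand-in for a remote SQL source holding a Customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customers (ID INTEGER PRIMARY KEY, CustName TEXT)")
    conn.executemany(
        "INSERT INTO Customers (ID, CustName) VALUES (?, ?)",
        [(1, "Jane Doe"), (2, "Jim Gray"), (3, "John Smith")],
    )
    return conn


def query_without_pushdown(conn, customer_id):
    """Pull every row back from the source, then filter on the caller's side."""
    rows = conn.execute("SELECT ID, CustName FROM Customers").fetchall()
    matches = [name for (row_id, name) in rows if row_id == customer_id]
    return matches, len(rows)  # second value: rows that had to be transferred


def query_with_pushdown(conn, customer_id):
    """Send the filter to the source so only matching rows are transferred."""
    rows = conn.execute(
        "SELECT CustName FROM Customers WHERE ID = ?", (customer_id,)
    ).fetchall()
    return [name for (name,) in rows], len(rows)


remote = make_remote_source()
print(query_without_pushdown(remote, 1))  # (['Jane Doe'], 3)
print(query_with_pushdown(remote, 1))     # (['Jane Doe'], 1)
```

Both calls return the same answer, but the pushed-down version moves one row instead of three; on large remote tables that difference is the point of the benefit list above.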
  28. 28. Bringing Big Data to everybody
      Accelerate the pace of innovation through a state-of-the-art cloud platform
      [Figure: options arranged from CONTROL to EASE OF USE, with an arrow for user adoption]
      Big data analytics
      • IaaS Hadoop (Azure Marketplace: HDP | CDH | MapR): any Hadoop technology
      • Managed Hadoop (Azure HDInsight): workload optimized, managed clusters
      • Big Data as-a-service (Azure Data Lake Analytics): specific apps in a multi-tenant form factor
      Big data storage
      • Azure Data Lake Store
      • Azure Storage
  29. 29. Cloud Big Data Solution
  30. 30. Data lake is the center of a big data solution
      A storage repository, usually Hadoop, that holds a vast amount of raw data in its native format until it is needed.
      • Inexpensively store unlimited data
      • Collect all data “just in case”
      • Easy integration of differently-structured data
      • Store data with no modeling: “schema on read”
      • Complements EDW
      • Frees up expensive EDW resources
      • Hadoop cluster offers faster ETL processing over SMP solutions
      • Quick user access to data
      • Data exploration to see if data is valuable before writing ETL and schema for a relational database
      • Allows use of Hadoop tools such as ETL and extreme analytics
      • Place to land IoT streaming data
      • On-line archive for data warehouse data
      • Easily scalable
      • With Hadoop, high availability built in
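“Schema on read” in the list above means that raw records are stored as-is and a structure is applied only when someone reads them. A minimal sketch, assuming hypothetical JSON event records and field names that are not from the deck:

```python
import json

# Raw, differently-structured records landed in the lake exactly as received.
RAW_EVENTS = [
    '{"device": "kiosk-1", "temp_c": 21.5}',
    '{"device": "atm-7", "cash_level": 0.4, "temp_c": "19"}',
    '{"user": "jdoe", "page": "/home"}',
]


def read_temperatures(raw_lines):
    """Apply a schema at read time: keep records that have device and temp_c."""
    for line in raw_lines:
        record = json.loads(line)
        if "device" in record and "temp_c" in record:
            # Type coercion also happens on read, not on write.
            yield record["device"], float(record["temp_c"])


print(list(read_temperatures(RAW_EVENTS)))
# [('kiosk-1', 21.5), ('atm-7', 19.0)]
```

Nothing had to be modeled before the data was stored, and a different reader could impose a different schema on the same raw lines. A relational warehouse, by contrast, enforces the schema when the data is written.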
  31. 31. Data sources
      • Descriptive Analytics: What happened?
      • Diagnostic Analytics: Why did it happen?
      • Predictive Analytics: What will happen?
      • Prescriptive Analytics: How can we make it happen?
  32. 32. Roles when using both Data Lake and DW
      Data Lake/Hadoop (staging and processing environment)
      • Batch reporting
      • Data refinement/cleaning
      • ETL workloads
      • Store historical data
      • Sandbox for data exploration
      • One-time reports
      • Data scientist workloads
      • Quick results
      Data Warehouse/RDBMS (serving and compliance environment)
      • Low latency
      • High number of users
      • Additional security
      • Large support for tools
      • Easily create reports (Self-service BI)
      • A data lake is just a glorified file folder with data files in it: how many end-users can accurately create reports from it?
  33. 33. Azure Cosmos DB
      A globally distributed, massively scalable, multi-model database service
      • Data models: Key-value, Column-family, Document, Graph
      • APIs shown: Table API, MongoDB API
      • Turnkey global distribution
      • Elastic scale out of storage & throughput
      • Guaranteed low latency at the 99th percentile
      • Comprehensive SLAs
      • Five well-defined consistency models
  34. 34. Relational Databases vs Non-Relational Databases (NoSQL) vs Hadoop • RDBMS for enterprise OLTP and ACID compliance, or db’s under 5TB • NoSQL for scaled OLTP and JSON documents • Hadoop for big data analytics (OLAP) or Data Lake (from my presentation “Relational Databases vs Non-Relational Databases”)
  35. 35. Azure Event Hubs
      • Publish-subscribe data distribution
      • Managed PaaS (Platform as a Service) solution
      • Scales with your needs to millions of events per second
      • Provides a durable buffer between event publishers and event consumers
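The “durable buffer between publishers and consumers” idea can be sketched as an append-only log in which every consumer tracks its own read position. This is a toy in-memory model with hypothetical names, not the Event Hubs API, and it is not actually durable.

```python
class EventBuffer:
    """Toy append-only event log; each consumer keeps its own read offset."""

    def __init__(self):
        self._events = []   # events in arrival order
        self._offsets = {}  # consumer name -> index of the next unread event

    def publish(self, event):
        """Publishers only append; they never wait for consumers."""
        self._events.append(event)

    def read(self, consumer, max_events=10):
        """Return the next batch for this consumer and advance its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._events[start:start + max_events]
        self._offsets[consumer] = start + len(batch)
        return batch


buffer = EventBuffer()
for reading in ("e1", "e2", "e3"):
    buffer.publish(reading)

print(buffer.read("dashboard", max_events=2))  # ['e1', 'e2']
print(buffer.read("archive"))                  # ['e1', 'e2', 'e3']
print(buffer.read("dashboard"))                # ['e3']
```

Because reading does not remove events, a slow consumer and a fast consumer can share the same stream without affecting each other, which is the decoupling the slide describes.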
  36. 36. Azure Stream Analytics
      Process real-time data in Azure
      • Consumes millions of real-time events from Event Hub collected from devices, sensors, infrastructure, and applications
      • Performs time-sensitive analysis using a SQL-like language against multiple real-time streams and reference data
      • Outputs to persistent stores, dashboards, or back to devices
      Example sources: Point of Service Devices, Self Checkout Stations, Kiosks, Smart Phones, Slates/Tablets, PCs/Laptops, Servers, Digital Signs, Diagnostic Equipment, Remote Medical Monitors, Logic Controllers, Specialized Devices, Thin Clients, Handhelds, Security, POS Terminals, Automation Devices, Vending Machines, Kinect, ATM
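One common form of time-sensitive analysis over a stream is a tumbling window: fixed, non-overlapping time buckets with an aggregate per bucket. The sketch below is plain Python with hypothetical event data; it does not use the Stream Analytics query language and only illustrates the windowing idea.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per device in fixed, non-overlapping time windows."""
    counts = defaultdict(int)
    for timestamp, device in events:
        # Every timestamp falls into exactly one window, identified by its start.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, device)] += 1
    return dict(counts)


# Hypothetical (timestamp in seconds, device) events.
events = [(0, "kiosk"), (3, "kiosk"), (9, "atm"), (10, "kiosk"), (14, "atm"), (21, "atm")]
result = tumbling_window_counts(events, 10)
print(result[(0, "kiosk")])  # 2
print(result[(20, "atm")])   # 1
```

A real streaming engine would emit each window's result as soon as the window closes rather than after the whole list is processed; the batch loop here keeps the sketch short.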
  37. 37. SQL Server on Linux (Preview today, GA in mid-2017)
      • Red Hat - Microsoft Partnership (Nov 2015)
      • Microsoft joins Eclipse Foundation (Mar 2016)
      • HDInsight PaaS on Linux GA (Sep 2015)
      • Command prompts shown: C:\Users\markhill> and root@localhost: # bash
      • Azure Marketplace: 60% of all images in Azure Marketplace are based on Linux/OSS
      • In partnership with the Linux Foundation, Microsoft releases the Microsoft Certified Solutions Associate (MCSA) Linux on Azure certification
      • Microsoft Open Source Hub
      • Ross Gardler: President, Apache Software Foundation
      • Wim Coekaerts: Oracle’s Mr Linux
      • 1 out of 4 VMs on Azure runs Linux, and the share is getting larger every day: 28.9% of all VMs are Linux; more than 50% of new VMs
      • 493,141,677 (label lost in extraction)
  38. 38. Microsoft Products vs Hadoop/OSS Products
      Microsoft Product | Hadoop/Open Source Software Product
      Office365/Excel | OpenOffice/Calc
      DocumentDB | MongoDB, HBase, Cassandra
      SQL Database | SQLite, MySQL, PostgreSQL, MariaDB
      Azure Data Lake Analytics/YARN | None
      Azure VM/IaaS | OpenStack
      Blob Storage | HDFS, Ceph (Note: these are distributed file systems and Blob storage is not distributed)
      Azure HBase | Apache HBase (Azure HBase is a service wrapped around Apache HBase), Apache Trafodion
      Event Hub | Apache Kafka
      Azure Stream Analytics | Apache Storm, Apache Spark, Twitter Heron
      Power BI | Apache Zeppelin, Jupyter, Airbnb Caravel, Kibana
      HDInsight | Hortonworks (pay), Cloudera (pay), MapR (pay)
      Azure ML | Apache Mahout, Apache Spark MLlib
      Microsoft R Open | R
      SQL Data Warehouse | Apache Hive, Apache Drill, Presto
      IoT Hub | Apache NiFi
      Azure Data Factory | Apache Falcon, Apache Oozie, Airbnb Airflow
      Azure Data Lake Storage/WebHDFS | HDFS Ozone
      Azure Analysis Services/SSAS | Apache Kylin, Apache Lens, AtScale (pay)
      SQL Server Reporting Services | None
      Hadoop Indexes | Jethro Data (pay)
      Azure Data Catalog | Apache Atlas
      PolyBase | Apache Drill
      Azure Search | Apache Solr, Elasticsearch (Azure Search is built on ES)
      Others | Apache Flink, Apache Ambari, Apache Ranger, Apache Knox
      Note: Many of the Hadoop/OSS products are available in Azure
  39. 39. Transform + analyze: transform and analyze data for anyone to access anywhere
      • Connect, combine, and refine any data
      • Create data marts and publish reports
      • Build and test predictive models
      • Curate and catalog any data
  40. 40. Connect, combine, and refine any data
      Make sense of disparate data and prepare it for analysis
      Integration, Data Quality and Master Data Services
      • Rich support for ETL tasks
      • Data cleansing and matching
      • Manage master data structures
      Connect any data and all volumes in real time
      • Social data
      • SAP and Dynamics data
      • Machine data
  41. 41. SQL Server Analysis Services
  42. 42. Azure Analysis Services
      Azure Analysis Services is based on the proven analytics engine that has helped organizations turn complex data into a trusted, single source of truth for years.
      • Built for hybrid data: access and model data on-premises, in the cloud, or both
      • Interactive visualization: quick, highly interactive self-service data discovery with support of major data visualization tools
      • Proven technology: powerful, proven tabular models built from SQL Server 2016 Analysis Services
      • Cloud powered: easy to deploy, scale, and manage as a platform-as-a-service solution
  43. 43. SSAS/Azure Analysis Services Cubes
      Reasons to report off cubes instead of the data warehouse:
      • Semantic layer
      • Handle many concurrent users
      • Aggregating data for performance
      • Multidimensional analysis
      • No joins or relationships
      • Hierarchies, KPIs
      • Security
      • Advanced time-calculations
      • Slowly Changing Dimensions (SCD)
      • Required for some reporting tools
  44. 44. Build and test predictive models
      Use the power of machine learning to predict future trends or behavior
      • Cloud data sources: HDInsight, SQL Server VM, SQL DB, Blobs and tables
      • Local data sources: desktop files, Excel spreadsheet, other data files on PC
      • Flow: Business problem, Modeling, Deployment, Business value
      • Components: Microsoft Azure portal, Workspace, ML Studio, Storage space, Web, Microsoft Azure Machine Learning API
      • Publish API in minutes to Devices, Applications, Dashboards
  45. 45. Azure Machine Learning
      Beyond business intelligence: machine intelligence
      • Get started with just a browser: requires no provisioning; simply log on to your Azure subscription or try it for free off azure.com/ml
      • Experience the power of choice: choose from hundreds of algorithms and packages from R and Python or drop in your own custom code; take advantage of business-tested algorithms from Xbox and Bing
      • Deploy solutions in minutes: with the click of a button, deploy the finished model as a web service that can connect to any data, anywhere
      • Connect to the world: brand and monetize solutions on our global Machine Learning Marketplace, https://datamarket.azure.com/
      Components
      • Microsoft Azure Machine Learning Studio: modeling environment (shown)
      • Microsoft Azure Machine Learning API service: model in production as a web service
      • Microsoft Azure Machine Learning Marketplace: APIs and solutions for broad use
  46. 46. SQL Server R Services
      [Figure: R Server (commercial) and R Open (community); platforms shown: Linux, Hadoop, Teradata, Windows]
  47. 47. Enable enterprise-wide self-service data source registration and discovery
      A metadata repository that allows users to register, enrich, understand, discover, and consume data sources
      Delivers differentiated value through
      ‒ Data source discovery, rather than data discovery
      ‒ Support for data from any source: structured and unstructured, on premises and in the cloud
      ‒ Publishing, discovery, and consumption through any tool
      ‒ Annotation crowdsourcing: empowering any user to capture and share their knowledge, while allowing IT to maintain control and oversight
  48. 48. Azure Data Factory
      Orchestrate trusted information production in Azure
      • Connect to relational or non-relational data that is on-premises or in the cloud
      • Orchestrate data movement & data processing
      • Publish to Power BI users as a searchable data view
      • Operationalize (schedule, manage, debug) workflows
      • Lifecycle management, monitoring
      Processing options shown: C#, MapReduce, Hive, Pig, Stored Procedures, Azure Machine Learning
  49. 49. Visualize + decide: visualize data and make decisions quickly using everyday tools. Discover, explore, and combine any data type or size, regardless of location. Ask questions of data to visualize, analyze, and forecast. Make faster decisions, share broadly, and access insights on any device. (Diagram labels: Data, Collect + manage, Transform + analyze, Visualize + decide.)
  50. 50. Power BI Overview. Power BI Platform. Power BI Desktop: Prepare, Explore, Share, Report. Power BI Service: Data refresh, Visualizations, Live dashboards, Content packs, Sharing & collaboration, Natural language query, Reports, Datasets. Embed, extend, integrate. Data sources: Cloud-based SaaS solutions (e.g. Marketo, Salesforce, Quickbooks, Google Analytics, …); On-premises data (e.g. Analysis Services, SQL Server); Organizational content packs (corporate data sources or external data services); Azure services (Azure SQL, Stream Analytics…); Excel and CSV files (workbook data, flat files); Power BI Desktop files (data from files, databases, Azure, Online Services, and other sources).
  51. 51. Power BI Desktop Create Power BI Content Connect to data and build reports for Power BI
  52. 52. (Chart slide; only the data labels survive: 146.03K, 145.84K, 145.96K, 146.06K; 40.08K, 38.84K, 39.99K, 40.33K)
  53. 53. Tools Defined • Front-end (Excel) or Power BI Desktop • Data shaping and cleanup, self-service ETL (Power Query) • Data analysis (Power Pivot) • Visualization and data discovery (Power View, Power Map) • Dashboarding (Power BI Dashboard) • Publishing and sharing (Power BI Service) • Natural language query (Power BI Q&A) • Mobile (Power BI for Mobile) • Access on-premises data (DMG, Analysis Services Connector) • Power BI Service updated bi-weekly, Power BI Desktop updated monthly
  54. 54. SQL Server Reporting Services
  55. 55. www.botframework.com
  56. 56. Microsoft Cognitive Services Give your apps a human side Cognitive Services API Collection
  57. 57. Connect live to your on-premises data Live Query & Scheduled Data Refresh
  58. 58. PolyBase: query relational and non-relational data with T-SQL. In a preview due this year, PolyBase will add support for Teradata, Oracle, SQL Server, MongoDB, and generic ODBC (Spark, Hive, Impala, DB2). PolyBase vs U-SQL: PolyBase is interactive while U-SQL is batch. PolyBase extends T-SQL onto data via views, while U-SQL natively operates on data, virtualizes access to other SQL data sources (no metadata needed), and supports more formats (JSON) and libraries/UDOs
  59. 59. PolyBase use cases
  60. 60. Cortana Intelligence Suite: transform data into intelligent action. Data Sources: Apps; Sensors and devices; Data. Information Management: Event Hubs, Data Catalog, Data Factory. Big Data Stores: SQL Data Warehouse, Data Lake Store. Machine Learning and Analytics: HDInsight (Hadoop and Spark), Stream Analytics, Data Lake Analytics, Machine Learning. Intelligence: Cortana, Bot Framework, Cognitive Services. Dashboards & Visualizations: Power BI. Action: People; Automated Systems; Apps (Web, Mobile, Bots).
  61. 61. Example overall data flow and architecture. Stages: Ingest, Transform, Present & decide. Event & data producers: Web logs; IoT, mobile devices etc.; social data. Services: Event Hubs, Stream Analytics, HDInsight, Azure Data Factory, Azure SQL DB, Azure Blob Storage, Azure Machine Learning (fraud detection etc.), Analytics Platform Sys. Roles: DW / long-term storage; predictive analytics. Outputs: Power BI, Web dashboards, Mobile devices.
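As a rough illustration of the real-time analytics step in a flow like this, the sketch below groups events into fixed, non-overlapping time windows and counts events per device in plain Python. It is not Stream Analytics code; the `Event` fields, the `tumbling_window_counts` helper, and the sample data are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class Event(NamedTuple):
    """A hypothetical telemetry event (field names are assumptions)."""
    timestamp: float  # seconds since an arbitrary start
    device_id: str


def tumbling_window_counts(
    events: Iterable[Event], window_seconds: int
) -> Dict[int, Counter]:
    """Count events per device inside fixed, non-overlapping windows.

    Each window is keyed by its start time in seconds.
    """
    windows: Dict[int, Counter] = {}
    for event in events:
        # Round the timestamp down to the start of its window.
        window_start = int(event.timestamp // window_seconds) * window_seconds
        windows.setdefault(window_start, Counter())[event.device_id] += 1
    return windows


# Illustrative sample data only.
events = [
    Event(0.5, "sensor-a"),
    Event(3.2, "sensor-a"),
    Event(9.9, "sensor-b"),
    Event(10.0, "sensor-a"),
    Event(14.1, "sensor-b"),
]
counts = tumbling_window_counts(events, window_seconds=10)
for start in sorted(counts):
    print(start, dict(counts[start]))
```

A managed streaming service would apply the same idea continuously over an unbounded stream; here the input is a finite list so the logic is easy to follow.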
  62. 62. BI and analytics Data management and processing Data sources Non-relational data Data enrichment and federated query OLTP ERP CRM LOB Devices Web Sensors Social Self-service Corporate Collaboration Mobile Machine learning Single query model Extract, transform, load Data quality Master data management Box software Appliances Cloud SQL Server Box software Appliances Cloud
  63. 63. The Data Management Platform for Analytics. Any BI tool: Dashboards | Reporting | Mobile BI | Cubes. Advanced Analytics: Machine Learning | Stream analytics | Cognitive | AI. Any language: .NET | Java | R | Python | Ruby | PHP | Scala. Big Data processing, Data warehousing, Data virtualization. Relational data: OLTP, ERP, CRM, LOB. Non-relational data: Social media, Devices, Web, Media. On-premises, Cloud.
  64. 64. Lambda Architecture: Interactive Analytics Pipeline. Stages: Data Sources → Ingest → Prepare (normalize, clean, etc.) → Analyze (stat analysis, ML, etc.) → Publish (for programmatic consumption, BI/visualization) → Consume (Alerts, Operational Stats, Insights). Layers: Data Consumption (Ingestion), Stream Layer (data in motion), Batch Layer (data at rest), Presentation/Serving Layer.
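The layering on this slide can be sketched in a few lines of Python, assuming a simple "sum per key" metric: a batch layer recomputes totals from data at rest, a stream layer keeps running totals for data in motion, and the serving layer answers a query by combining the two. All names here (`build_batch_view`, `SpeedLayer`, `serve`) and the sample records are hypothetical and are not taken from any Azure service.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, int]  # (key, value), e.g. (page, hits); an assumed shape


def build_batch_view(master_data: List[Record]) -> Dict[str, int]:
    """Batch layer: recompute totals from the full data set at rest."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in master_data:
        totals[key] += value
    return dict(totals)


class SpeedLayer:
    """Stream layer: incremental totals for records not yet in a batch view."""

    def __init__(self) -> None:
        self.realtime_view: Dict[str, int] = defaultdict(int)

    def ingest(self, record: Record) -> None:
        key, value = record
        self.realtime_view[key] += value

    def reset(self) -> None:
        # Called once a new batch view has absorbed the recent records.
        self.realtime_view.clear()


def serve(batch_view: Dict[str, int], realtime_view: Dict[str, int], key: str) -> int:
    """Serving layer: combine the batch result with the real-time delta."""
    return batch_view.get(key, 0) + realtime_view.get(key, 0)


# Illustrative sample data only.
batch_view = build_batch_view([("home", 100), ("pricing", 40), ("home", 25)])
speed = SpeedLayer()
speed.ingest(("home", 3))
speed.ingest(("docs", 7))

home_total = serve(batch_view, speed.realtime_view, "home")
docs_total = serve(batch_view, speed.realtime_view, "docs")
pricing_total = serve(batch_view, speed.realtime_view, "pricing")
print(home_total, docs_total, pricing_total)
```

The design point is that the batch view can be slow to rebuild but is complete, while the stream layer only has to cover the gap since the last rebuild.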
  65. 65. Base Architecture: Big Data Advanced Analytics Pipeline. Near Realtime Data Analytics Pipeline using Azure Stream Analytics; Big Data Analytics Pipeline using Azure Data Lake; Interactive Analytics and Predictive Pipeline using Azure Data Factory. Stages: Data Sources → Ingest → Prepare (normalize, clean, etc.) → Analyze (stat analysis, ML, etc.) → Publish (for programmatic consumption, BI/visualization) → Consume (Alerts, Operational Stats, Insights). Components: Machine Learning (Failure and RCA Predictions); Telemetry; Azure SQL (Predictions); HDI Custom ETL Aggregate / Partition; Azure Storage Blob; dashboard of predictions / alerts; Live / real-time data stats, anomalies and aggregates; Customer MIS; Event Hub; PowerBI dashboard; Stream Analytics (real-time analytics); Azure Data Lake Analytics (Big Data Processing); Azure Data Lake Storage; Azure SQL (COL + TACOPS); dashboard of operational stats; FDS + SDS (shared with field Ops, customers, MIS, and Engineers); scheduled hourly transfer using Azure Data Factory; Machine Learning (Anomaly Detection). Data in Motion | Data at Rest.
  66. 66. Schneider Electric Architecture. Stages: DATA SOURCES, INGEST, PREPARE, ANALYZE, PUBLISH, CONSUME. Layers: Ingestion, Speed, Batch, Presentation. Data Factory: Move Data, Orchestrate, Schedule, and Monitor. Data sources: Simulated Sensors and devices; On Premise; Blobs – Reference Data. Hot Path: Event hubs; ASA Job Rule #1; ASA Job Rule #2; Flatten & Metadata Join; Machine Learning; Real-time Scoring. Cold Path: Archived Data (Data Lake Store); CSV Data (Data Lake Store); Aggregated Data (Data Lake Store); Data Lake Analytics; Hourly, Daily, Monthly Roll Ups; Machine Learning (Batch Scoring, Offline Training); Azure SQL Data Warehouse. Consume: Power BI; Cortana; Web/LOB Dashboards.
  67. 67. Summary Understand at a high level all the Microsoft data platform products
  68. 68. Q & A ? James Serra, Big Data Evangelist Email me at: JamesSerra3@gmail.com Follow me at: @JamesSerra Link to me at: www.linkedin.com/in/JamesSerra Visit my blog at: JamesSerra.com (where this slide deck will be posted)
