Microsoft cloud big data strategy

Think of big data as all data, no matter what the volume, velocity, or variety. The simple truth is that a traditional on-prem data warehouse will not handle big data. So what is Microsoft’s strategy for building a big data solution? And why is it best to have this solution in the cloud? That is what this presentation will cover. Be prepared to discover the various Microsoft technologies and products for collecting data, transforming it, storing it, and visualizing it. My goal is to help you not only understand each product but also understand how they all fit together, so you can be the hero who builds your company’s big data solution.


  1. 1. About Me • Microsoft, Big Data Evangelist • In IT for 30 years, worked on many BI and DW projects • Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW/APS developer • Been perm employee, contractor, consultant, business owner • Presenter at PASS Business Analytics Conference, PASS Summit, Enterprise Data World conference • Certifications: MCSE: Data Platform, Business Intelligence; MS: Architecting Microsoft Azure Solutions, Design and Implement Big Data Analytics Solutions, Design and Implement Cloud Data Platform Solutions • Blog at • Former SQL Server MVP • Author of book “Reporting with Microsoft SQL Server 2012”
  2. 2. Agenda • Big data defined • Microsoft big data solution • Azure data lake
  3. 3. Big Data is changing traditional data warehousing … data warehousing has reached the most significant tipping point since its inception. The biggest, possibly most elaborate data management system in IT is changing. – Gartner, “The State of Data Warehousing”* * Donald Feinberg, Mark Beyer, Merv Adrian, Roxane Edjlali (Gartner), The State of Data Warehousing in 2012 (Stamford, CT.: Gartner, 2012) Data sources ETL Data warehouse BI and analytics
  4. 4. Big Data has new data characteristics Data complexity: variety and velocity Petabytes
  5. 5. Big Data is driving transformative changes Traditional Big Data Relational data with highly modeled schema All data with schema agility Specialized HW Commodity HW Data characteristics Costs Culture Operational reporting Focus on rear-view analysis Experimentation leading to intelligent action With machine learning, graph, a/b testing
  6. 6. Big Data introduces new culture of experimentation Understand customer patterns to uncover cross-sell opportunities Historical campaign effectiveness Generate year-end financial reports Financial monitoring with real-time recommendations to increase revenue Generate year-end financial reports Real-time product offers and promotions based on behavior Collect historical data on equipment performance Real-time monitoring to identify proactive maintenance Shipping features without understanding success Building successful features correlating user action with product experience
  7. 7. Action Value From data to decisions and actions
  8. 8. However, there are challenges to Big Data… Obtaining skills and capabilities Determining how to get value Integrating with existing IT investments *Gartner: Survey Analysis – Hadoop Adoption Drivers and Challenges (Stamford, CT.: Gartner, 2015)
  9. 9. But, Microsoft has done it before We needed to better leverage data and analytics to do more experimentation So we: • Designed a data lake for everyone to put their data into • Built tools approachable by any developer • Created machine learning tools for collaborating across large experiment models Result: • Across Microsoft, ten thousand developers doing experimentation leading to better insights • Leading to growth in our Microsoft businesses: • Office productivity revenue (45% YoY)* • Intelligent Cloud (100% YoY)* • Bing search share doubles 2010 2011 2012 2013 2014 2015 Growth of data @ Microsoft Windows SMSG Live Bing CRM/Dynamics Xbox Live Office365 Malware Protection Microsoft Stores Commerce Risk Skype LCA Exchange Yammer Petabytes Exabytes * Microsoft. FY16 Q4 Results, URL:
  10. 10. Microsoft is now taking everything we’ve learned on this journey and bringing it to our customers Technology. Cost. Culture.
  11. 11. Big Data as a cornerstone of Cortana Intelligence Action People Automated Systems Apps Web Mobile Bots Intelligence Dashboards & Visualizations Cortana Bot Framework Cognitive Services Power BI Information Management Event Hubs Data Catalog Data Factory Machine Learning and Analytics HDInsight (Hadoop and Spark) Stream Analytics Intelligence Data Lake Analytics Machine Learning Big Data Stores Data Lake Store Data Sources Apps Sensors and devices Data SQL Data Warehouse
  12. 12. CONTROL EASE OF USE Azure Data Lake Analytics Azure Data Lake Store Azure Storage Any Hadoop technology Workload optimized, managed clusters Specific apps in a multi- tenant form factor Azure Marketplace HDP | CDH | MapR Azure Data Lake Analytics IaaS Hadoop Managed Hadoop Big Data as-a-service Azure HDInsight BIGDATA STORAGE BIGDATA ANALYTICS Bringing Big Data to everybody Accelerate the pace of innovation through a state-of-the-art cloud platform UserAdoption
  13. 13. Microsoft Big Data Portfolio SQL Server Stretch Business intelligence Machine learning analytics Insights Azure SQL Database SQL Server 2016 SQL Server 2016 Fast Track Azure SQL DW Azure Data Lake DocumentDB HDInsight Hadoop Analytics Platform System Sequential Scale Out + Across Scale Up Key Relational Non-relational On-premises Cloud Microsoft has solutions covering and connecting all four quadrants – that’s why SQL Server is one of the most utilized databases in the world
  14. 14. Azure HDInsight A Cloud Spark and Hadoop service for the Enterprise Reliable with an industry leading SLA Enterprise-grade security and monitoring Productive platform for developers and scientists Cost effective cloud scale Integration with leading ISV applications Easy for administrators to manage 63% lower TCO than deploying your own Hadoop on-premises* *IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
  15. 15. Hortonworks Data Platform (HDP) 2.5 Simply put, Hortonworks ties all the open source products together (22) (under the covers of HDInsight)
  16. 16. Azure Data Lake Store A No limits Data Lake that powers Big Data Analytics Petabyte size files and Trillions of objects Scalable throughput for massively parallel analytics HDFS for the cloud Always encrypted, role-based security & auditing Enterprise-grade support
  17. 17. Azure Data Lake Analytics A No limits Analytics Job Service to power intelligent action Start in seconds, scale instantly, pay per job Develop massively parallel programs with simplicity Debug and optimize your big data programs with ease Virtualize your analytics Enterprise-grade security, auditing and support
  18. 18. Azure Data Lake YARN U-SQL Analytics HDInsight Hive R Server HDFS Store Store and analyze data of any kind and size Develop faster, debug and optimize smarter Interactively explore patterns in your data No learning curve Managed and supported Dynamically scales to match your business priorities Enterprise-grade security Built on YARN, designed for the cloud
  19. 19. Azure SQL Data Warehouse A relational data warehouse-as-a-service, fully managed by Microsoft. Industry’s first elastic cloud data warehouse with enterprise-grade capabilities. Integrated with on-premises and cloud assets. Simple compute & storage billing Pay for what you need High performance without rewriting applications Low cost for latent data Infrastructure, management and support provided Scales to petabytes of data with MPP processing Resize compute nodes in < 1 minute Faster time to insight than other SMP offerings Designed for “on-demand” workload Integrated with Azure platform and other Microsoft services Enables hybrid solutions Built on SQL Server experience & technology
  20. 20. PolyBase Query relational and non-relational data with T-SQL In a preview early this year, PolyBase will add support for Teradata, Oracle, SQL Server, MongoDB, Hadoop and Azure blob storage
  21. 21. Publish-subscribe data distribution Managed PaaS (Platform as a Service) solution Scales with your needs to millions of events per second Provides a durable buffer between event publishers and event consumers Azure Event Hubs
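The "durable buffer between event publishers and event consumers" on this slide can be pictured as an append-only log that each consumer reads at its own pace. The following is a minimal in-memory sketch in Python, not the Event Hubs API: the `EventLog` class, its `publish` and `read` methods, and the sample device names are all hypothetical, and a real service would persist and partition the log.

```python
from collections import defaultdict


class EventLog:
    """Toy append-only event buffer (hypothetical; not the Event Hubs SDK)."""

    def __init__(self):
        self._events = []                  # retained events, in arrival order
        self._offsets = defaultdict(int)   # next unread position per consumer

    def publish(self, event):
        """Append an event; publishers never wait for consumers."""
        self._events.append(event)

    def read(self, consumer, max_events=10):
        """Return unread events for one consumer and advance its offset."""
        start = self._offsets[consumer]
        batch = self._events[start:start + max_events]
        self._offsets[consumer] = start + len(batch)
        return batch


log = EventLog()
for reading in ({"device": "kiosk-1", "temp": 21}, {"device": "atm-7", "temp": 35}):
    log.publish(reading)

# Two independent consumers read the same events at different speeds.
dashboard_batch = log.read("dashboard")
archiver_first = log.read("archiver", max_events=1)
archiver_rest = log.read("archiver")
```

Because every consumer keeps its own offset, a slow consumer does not block a fast one, and a publisher does not need to know who is reading.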
  22. 22. Azure Stream Analytics Process real-time data in Azure Consumes millions of real-time events from Event Hub collected from devices, sensors, infrastructure, and applications Performs time-sensitive analysis using SQL-like language against multiple real-time streams and reference data Outputs to persistent stores, dashboards or back to devices Point of Service Devices Self Checkout Stations Kiosks Smart Phones Slates/ Tablets PCs/ Laptops Servers Digital Signs Diagnostic Equipment Remote Medical Monitors Logic Controllers Specialized Devices Thin Clients Handhelds Security POS Terminals Automation Devices Vending Machines Kinect ATM
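"Time-sensitive analysis" over a stream usually means grouping events into time windows. As a rough illustration, the Python sketch below computes a per-device average over fixed, non-overlapping (tumbling) windows. It is a batch stand-in written for this transcript: the function name, the `(timestamp, device, value)` event shape, and the sample data are assumptions, and the service itself expresses such logic in its SQL-like query language rather than in Python.

```python
from collections import defaultdict


def tumbling_window_avg(events, window_seconds):
    """Average `value` per (window start, device) over fixed, non-overlapping windows."""
    sums = defaultdict(lambda: [0.0, 0])   # key -> [running total, event count]
    for ts, device, value in events:
        window_start = ts - (ts % window_seconds)
        bucket = sums[(window_start, device)]
        bucket[0] += value
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}


# Hypothetical events: (timestamp in seconds, device id, reading)
events = [
    (0, "pos-1", 10.0),
    (4, "pos-1", 20.0),
    (7, "kiosk-2", 5.0),
    (12, "pos-1", 30.0),
]
averages = tumbling_window_avg(events, window_seconds=10)
```

Each event lands in exactly one window, which is what distinguishes a tumbling window from sliding or overlapping ones.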
  23. 23. Azure Machine Learning Get started with just a browser Requires no provisioning; simply log on to your Azure subscription or try it for free off Experience the power of choice Choose from hundreds of algorithms and packages from R and Python or drop in your own custom code Take advantage of business-tested algorithms from Xbox and Bing Deploy solutions in minutes With the click of a button, deploy the finished model as a web service that can connect to any data, anywhere Connect to the world Brand and monetize solutions on our global Machine Learning Marketplace Beyond business intelligence – machine intelligence Microsoft Azure Machine Learning Studio Modeling environment (shown) Microsoft Azure Machine Learning API service Model in production as a web service Microsoft Azure Machine Learning Marketplace APIs and solutions for broad use
  24. 24. Enable enterprise-wide self-service data source registration and discovery A metadata repository that allows users to register, enrich, understand, discover, and consume data sources Delivers differentiated value through ‒ Data source discovery; rather than data discovery ‒ Support for data from any source; Structured and unstructured, on premises and in the cloud ‒ Publishing, discovery and consumption through any tool ‒ Annotation crowdsourcing: empowering any user to capture and share their knowledge. This, while allowing IT to maintain control and oversight
  25. 25. Azure Data Factory Connect to relational or non-relational data that is on-premises or in the cloud Orchestrate data movement & data processing Publish to Power BI users as a searchable data view Operationalize (schedule, manage, debug) workflows Lifecycle management, monitoring Orchestrate trusted information production in Azure C# MapReduce Hive Pig Stored Procedures Azure Machine Learning
  26. 26. Discovery & exploration – Custom visualizations— R integration -
  29. 29. Microsoft Cognitive Services Give your apps a human side Cognitive Services API Collection
  30. 30. Azure Analysis Services Azure Analysis Services is based on the proven analytics engine that has helped organizations turn complex data into a trusted, single source of truth for years. Built for hybrid data Access and model data on-premises, in the cloud, or both Interactive visualization Quick, highly interactive self-service data discovery with support of major data visualization tools Proven technology Powerful, proven tabular models built from SQL Server 2016 Analysis Services Cloud powered Easy to deploy, scale, and manage as a platform-as-a-service solution
  31. 31. SQL Server R Services Linux Hadoop Teradata Windows Commercial Community R Server R Open
  32. 32. Fully managed database service built on a native JSON data model Application controlled schema with massive scale-out enables iterative development and evolving data models Automatic indexing enables robust querying over schema-free data Integrated transactional JavaScript processing + tunable consistency enable high performance application experiences Azure DocumentDB
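To make "automatic indexing ... over schema-free data" concrete, here is a small Python sketch that indexes every path of every JSON-like document on insert, so documents with different shapes can be queried without declaring a schema. It is an illustration only and says nothing about how DocumentDB is implemented: `SchemaFreeIndex`, `flatten`, and the sample documents are hypothetical, and the sketch assumes leaf values are scalars (no arrays).

```python
from collections import defaultdict


def flatten(doc, prefix=""):
    """Yield (path, value) pairs for every scalar leaf in a nested dict."""
    for key, value in doc.items():
        path = f"{prefix}/{key}"
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


class SchemaFreeIndex:
    """Toy index over every path of every document (hypothetical)."""

    def __init__(self):
        self._docs = {}
        self._index = defaultdict(set)   # (path, value) -> document ids

    def insert(self, doc_id, doc):
        """Store the document and index all of its leaf paths."""
        self._docs[doc_id] = doc
        for path, value in flatten(doc):
            self._index[(path, value)].add(doc_id)

    def find(self, path, value):
        """Return ids of documents whose `path` equals `value`."""
        return sorted(self._index.get((path, value), set()))


store = SchemaFreeIndex()
store.insert("c1", {"name": "Ana", "address": {"city": "Oslo"}})
store.insert("c2", {"name": "Raj", "loyalty_tier": "gold"})   # different shape
oslo_ids = store.find("/address/city", "Oslo")
gold_ids = store.find("/loyalty_tier", "gold")
```

The second document adds a field the first one lacks, and both remain queryable without any schema change, which is the property the slide attributes to an application-controlled schema.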
  33. 33. SQL Server on Linux (Preview today, GA in mid-2017) Red Hat - Microsoft Partnership (Nov 2015) Microsoft joins Eclipse Foundation (Mar 2016). HDInsight PaaS on Linux GA (Sep 2015) C:\Users\markhill> root@localhost: # bash Azure Marketplace 60% of all images in Azure Marketplace are based on Linux/OSS In partnership with the Linux Foundation, Microsoft releases the Microsoft Certified Solutions Associate (MCSA) Linux on Azure certification. 493,141,677 ?????? Microsoft Open Source Hub Ross Gardler: President Apache Software Foundation Wim Coekaerts: Oracle’s Mr Linux 1 out of 4 VMs on Azure runs Linux, and getting larger every day • 28.9% of All VMs are Linux • >50% of new VMs
  34. 34. Azure Data Lake Big Data made easy Analytics on any data, any size Easier and more productive for all users Enterprise-ready
  36. 36. Petabyte size files and Trillions of objects • Store data in its native format • PB sized files, 200x larger than anyone else • Scalable throughput for massively parallel analytics • No need to redesign applications or repartition data at higher scale TBs EBs Store
  37. 37. Any type of analytics • Batch, interactive, streaming, machine learning • Allows for exploratory analytics over data • Analyze with Hadoop and Microsoft solutions Cortana Intelligence Suite YARN U-SQL Analytics HDInsight HDFS Store Hive R Server
  38. 38. Start in seconds, Scale instantly, Pay per job with Analytics • Process big data jobs in 30 seconds • No infrastructure to worry about (no servers, no VMs, no clusters) • Instantly scale analytic units up or down (processing power) • Architected for cloud scale and performance • Frees you up to focus only on your business logic
  39. 39. Azure Data Lake Big Data made easy Analytics on any data, any size Easier and more productive for all users Enterprise-ready
  40. 40. Easy for administrators to spin up quickly • Deploy big data projects in minutes • No hardware to install, tune, configure or deploy • No infrastructure or software to manage • Scale from tens to thousands of machines instantly
  41. 41. Debug and Optimize your Big Data programs with ease • Deep integration with Visual Studio, Visual Studio Code, Eclipse, & IntelliJ • Easy for novices to write simple queries • Integrated with U-SQL, Hive, Storm, and Spark • Actively offers recommendations to improve performance and reduce cost • Playback visually displays job run
  42. 42. Develop massively parallel programs with simplicity • U-SQL: a simple and powerful language that’s familiar and easily extensible • Unifies the declarative nature of SQL with expressive power of C# • Leverage existing libraries in .NET languages, R and Python • Massively parallelize code on diverse workloads (ETL, ML, image tagging, facial detection)
  43. 43. Query data where it lives Easily query data in multiple Azure data stores without moving it to a single store Benefits • Avoid moving large amounts of data across the network between stores (federated query/logical data warehouse) • Single view of data irrespective of physical location • Minimize data proliferation issues caused by maintaining multiple copies • Single query language for all data • Each data store maintains its own sovereignty • Design choices based on the need • Push SQL expressions to remote SQL sources • Filters • Joins U-SQL Query Query Azure Storage Blobs Azure SQL in VMs Azure SQL DB Azure Data Lake Analytics Azure SQL Data Warehouse Azure Data Lake Storage
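The benefit of pushing filters to remote sources can be shown with a small Python sketch. Two in-memory SQLite databases stand in for two separate data stores; each filter runs inside the store that owns the data, and only the matching rows are joined locally. This is an illustration of the pushdown idea under those assumptions, not U-SQL and not the way Azure Data Lake Analytics executes federated queries. Table names, columns, and the `make_store` helper are hypothetical.

```python
import sqlite3


def make_store(schema, rows, insert_sql):
    """Create an in-memory SQLite database standing in for one remote store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema)
    conn.executemany(insert_sql, rows)
    return conn


orders_store = make_store(
    "CREATE TABLE orders (customer_id INTEGER, amount REAL)",
    [(1, 250.0), (2, 40.0), (1, 75.0), (3, 900.0)],
    "INSERT INTO orders VALUES (?, ?)",
)
customers_store = make_store(
    "CREATE TABLE customers (id INTEGER, region TEXT)",
    [(1, "EU"), (2, "US"), (3, "EU")],
    "INSERT INTO customers VALUES (?, ?)",
)

# Push each filter down to the store that owns the data, so only
# matching rows cross the "network".
big_orders = orders_store.execute(
    "SELECT customer_id, amount FROM orders WHERE amount >= ?", (100.0,)
).fetchall()
eu_customers = dict(
    customers_store.execute(
        "SELECT id, region FROM customers WHERE region = ?", ("EU",)
    ).fetchall()
)

# Join the two small result sets locally.
result = sorted(
    (cid, eu_customers[cid], amount)
    for cid, amount in big_orders
    if cid in eu_customers
)
```

Here two of four orders and two of three customers leave their stores. On real data volumes, that reduction is the reason to filter at the source rather than copy everything into one place first.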
  44. 44. Easy for data scientists with familiar R language R Server for HDInsight • Largest portable R parallel analytics library • Terabyte-scale machine learning—1,000x larger than in open source R • Up to 100x faster performance using Spark and optimized vector/math libraries • Enterprise-grade security and support *Applies to HDInsight only
  45. 45. Azure Data Lake Big Data made easy Analytics on any data, any size Easier and more productive for all users Enterprise-ready
  46. 46. Highest availability guarantee in the industry for peace of mind • Managed, monitored and supported by Microsoft • Enterprise-leading SLA— 99.9% uptime • No IT resources needed for upgrades and patching • Microsoft monitors your deployment so you don’t have to 99.9% SLA
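For a sense of scale, an uptime percentage can be converted into permitted downtime. The Python sketch below does only that arithmetic. The function name is hypothetical, and the calculation ignores how any actual SLA defines measurement periods or exclusions.

```python
def allowed_downtime_minutes(sla_percent, days=30):
    """Minutes of downtime permitted by an uptime SLA over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


monthly = allowed_downtime_minutes(99.9)            # 30-day month
yearly = allowed_downtime_minutes(99.9, days=365)
```

At 99.9% uptime this works out to roughly 43 minutes in a 30-day month, or a little under 9 hours in a year.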
  47. 47. Azure Regions 38 Regions Worldwide, 32 Generally Available • 100+ datacenters • Top 3 networks in the world • 2.5x AWS, 7x Google DC Regions • G Series – Largest VM in World, 32 cores, 448 GB RAM, SSD…
  48. 48. Always encrypted, Role-based security & Auditing • Always encrypted; in motion using SSL, and at rest using keys in Azure Key Vault • Single sign-on, multi-factor authentication and seamless integration of on-premises identities with Active Directory • Fine-grained POSIX-based ACLs for role-based access controls • Auditing every access / configuration change
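A POSIX-style ACL check follows a fixed order: the owner entry first, then a named-user entry, then any matching group entries, and finally "other". The Python sketch below is a simplified model of that order. It omits the POSIX mask entry and default ACLs, and the `Acl` class, `is_allowed` function, and sample users are hypothetical rather than part of any Azure API.

```python
from dataclasses import dataclass, field


@dataclass
class Acl:
    """Simplified POSIX-style ACL for one file or folder (no mask entry)."""
    owner: str
    owner_perms: str                                  # e.g. "rwx"
    named_users: dict = field(default_factory=dict)   # user -> perms
    named_groups: dict = field(default_factory=dict)  # group -> perms
    other_perms: str = "---"


def is_allowed(acl, user, groups, wanted):
    """Check one permission character ('r', 'w' or 'x') for a user."""
    if user == acl.owner:
        return wanted in acl.owner_perms
    if user in acl.named_users:
        return wanted in acl.named_users[user]
    matching = [acl.named_groups[g] for g in groups if g in acl.named_groups]
    if matching:
        # Access is granted if any matching group entry grants it.
        return any(wanted in perms for perms in matching)
    return wanted in acl.other_perms


sales_folder = Acl(
    owner="alice",
    owner_perms="rwx",
    named_users={"bob": "r-x"},
    named_groups={"analysts": "r--"},
)

bob_can_write = is_allowed(sales_folder, "bob", [], "w")
carol_can_read = is_allowed(sales_folder, "carol", ["analysts"], "r")
```

In this model the first matching class decides the outcome, so a named-user entry for `bob` takes precedence over any group he belongs to.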
  49. 49. Lower total cost of ownership • No hardware • Hadoop support included with Azure support • Pay only for what you use • Independently scale storage and compute • No need to hire specialized operations team • 63% lower total cost of ownership than on-premises* *IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
  50. 50. Recognized by top analysts Forrester Wave for Big Data Hadoop Cloud • Named industry leader by Forrester with the most comprehensive, scalable, and integrated platforms* • Recognized for its cloud-first strategy that is paying off* *The Forrester WaveTM: Big Data Hadoop Cloud Solutions, Q2 2016.
  51. 51. Q & A ? James Serra, Big Data Evangelist Email me at: Follow me at: @JamesSerra Link to me at: Visit my blog at: (where this slide deck is posted under the “Presentations” tab)