
How does Microsoft solve Big Data?



So you've got a handle on what Big Data is and how you can use it to find business value in your data. Now you need an understanding of the Microsoft products that can be used to create a Big Data solution. Microsoft has many pieces of the puzzle, and in this presentation I will show how they fit together. How does Microsoft enhance and add value to Big Data? From collecting data, transforming it, and storing it to visualizing it, I will show you Microsoft’s solutions for every step of the way.



  1. How does Microsoft solve Big Data? James Serra, Big Data Evangelist, Microsoft
  2. Other Presentations • Building an Effective Data Warehouse Architecture: reasons for building a DW, the various approaches, and DW concepts (Kimball vs. Inmon) • Building a Big Data Solution (Building an Effective Data Warehouse Architecture with Hadoop, the cloud and MPP): explains what Big Data is, its benefits including use cases, and how Hadoop, the cloud, and MPP fit in • Finding Business Value in Big Data (What exactly is Big Data and why should I care?): very similar to “Building a Big Data Solution”, but the target audience is business users/CxOs instead of architects • How does Microsoft solve Big Data?: covers the Microsoft products that can be used to create a Big Data solution • Modern Data Warehousing with the Microsoft Analytics Platform System: the next step in data warehouse performance is APS, an MPP appliance • Power BI, Azure ML, Azure HDInsight, Azure Data Factory, etc.: deep dives into the various Microsoft Big Data related products
  3. About Me • Business Intelligence Consultant, in IT for 30 years • Microsoft, Big Data Evangelist • Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW/APS developer • Has been a permanent employee, contractor, consultant, and business owner • Presenter at PASS Business Analytics Conference and PASS Summit • MCSE: Data Platform and Business Intelligence • MS: Architecting Microsoft Azure Solutions • Blog at • Former SQL Server MVP • Author of the book “Reporting with Microsoft SQL Server 2012”
  4. I tried understanding all the Microsoft Big Data products… and ended up passed out drunk in a Denny’s parking lot. Let’s prevent that from happening…
  5. Agenda • Collect + Manage • Transform + Analyze • Visualize + Decide • Access Methods • Product Groupings • Modern Data Warehouse • Sample Architectures
  6. Microsoft’s portfolio of products • Windows • Visual Studio • .NET • Azure, HDInsight • Power BI: Power Query, Power Map, Power Pivot, Power View • Azure ML • APS • SQL Server, Azure SQL DB • SCOM • SSAS, SSRS, SSIS • Excel • Report Builder • PerformancePoint • SharePoint • DQS • MDS • Data Lake • SQL DW. Microsoft has all the LEGO bricks to build anything you want, but the difficulty is determining how the pieces fit together.
  7. The Microsoft data platform: Data flows through three stages • Collect + manage: relational, non-relational, NoSQL, streaming, internal & external data • Transform + analyze: orchestration, information management, complex event processing, modeling, machine learning • Visualize + decide: applications, dashboards, reports, mobile, natural language query
  8. Collect + manage: Collect and manage diverse data types with breakthrough speed • Secure, reliable performance • Increase speed across all your data workloads • Capture any data: structured, unstructured, and streaming • Scale your platform quickly to meet changing demands
  9. SQL Server options • Azure SQL Database has a maximum database size of 500 GB • Potential total volume size of up to 64 TB
  10. Our customer challenges: (1) increasing data volumes, (2) real-time business requests, (3) new data sources and types, (4) cloud-born data • Data sources • Non-relational data
  11. Parallelism • MPP (Massively Parallel Processing): uses many separate CPUs running in parallel to execute a single program; shared nothing: each CPU has its own memory and disk (scale-out); segments communicate using a high-speed network between nodes • SMP (Symmetric Multiprocessing): multiple CPUs used to complete individual processes simultaneously; all CPUs share the same memory, disks, and network controllers (scale-up); all SQL Server implementations up until now have been SMP; mostly, the solution is housed on a shared SAN
  12. Analytics Platform System (APS) for Big Data: a pre-built hardware + software appliance, on-premises solution • Co-engineered with HP, Dell, Quanta • Scale-out, up to 100x performance increase • Optional Hadoop region • Appliance installed in 1-2 days • Support: Microsoft provides first-call support; hardware partner provides on-site break/fix support • Plug and play • Built-in best practices • Saves time
  13. Office 365 • Azure
  14. Introducing Azure Data Lake Store (alongside the Analytics Service with YARN and U-SQL, and HDInsight; the Store exposes HDFS) • No fixed limits on file size (PB file sizes) • Designed for a diversity of analytic workloads • Accessible to all HDFS-compliant analytic applications (Hortonworks, Cloudera, MapR) • Managed, monitored, and supported by Microsoft • Enterprise-grade features around security, compliance, and management
  15. Hadoop in Azure (HDP under the covers) • HBase as a columnar NoSQL transactional database running on Azure Blobs • Storm as a streaming service for near-real-time processing • Hadoop 2.4 support for 100x query gains on Hive queries • Mahout support for machine learning + Hadoop • Graphical user interface for Hive queries • Diagram: a Name Node and Job Tracker coordinate Data Nodes running Task Trackers; an HMaster coordinates HBase Region Servers
  16. Announcing Azure Data Lake Analytics Service (part of Microsoft Azure Data Lake, with the Store on HDFS and HDInsight) • Distributed analytics service • Dynamically scales to meet your business needs • Productive day one with industry-leading development tools (for novices & experts) • Analytics over all data (unstructured, semi-structured, structured) • U-SQL: simple and familiar, easily extensible • Hive coming soon • Built on open standards (YARN)
  17. Types of analytics over data sources • Descriptive analytics: What happened? • Diagnostic analytics: Why did it happen? • Predictive analytics: What will happen? • Prescriptive analytics: How can we make it happen?
  18. Azure Stream Analytics: process real-time data in Azure • Consumes millions of real-time events from Event Hub collected from devices, sensors, infrastructure, and applications • Performs time-sensitive analysis using a SQL-like language against multiple real-time streams and reference data • Outputs to persistent stores, dashboards, or back to devices • Example event sources: point-of-service devices, self-checkout stations, kiosks, smartphones, slates/tablets, PCs/laptops, servers, digital signs, diagnostic equipment, remote medical monitors, logic controllers, specialized devices, thin clients, handhelds, security, POS terminals, automation devices, vending machines, Kinect, ATMs
  19. Microsoft R Open: free and open-source R distribution, enhanced and distributed by Revolution Analytics • Microsoft R Server: secure, scalable, and supported distribution of R, with proprietary components created by Revolution Analytics
  20. Azure DocumentDB: fully managed database service built on a native JSON data model • Application-controlled schema with massive scale-out enables iterative development and evolving data models • Automatic indexing enables robust querying over schema-free data • Integrated transactional JavaScript processing + tunable consistency enable high-performance application experiences
  21. SQL Server on Linux (preview today, GA in mid-2017) • Red Hat and Microsoft partnership (Nov 2015) • Microsoft joins the Eclipse Foundation (Mar 2016) • HDInsight PaaS on Linux GA (Sep 2015) • 60% of all images in the Azure Marketplace are based on Linux/OSS • In partnership with the Linux Foundation, Microsoft releases the Microsoft Certified Solutions Associate (MCSA) Linux on Azure certification • Microsoft Open Source Hub • Ross Gardler: President, Apache Software Foundation • Wim Coekaerts: Oracle’s Mr. Linux • 1 out of 4 VMs on Azure runs Linux, and the share grows every day: 28.9% of all VMs are Linux; more than 50% of new VMs are Linux
  22. Transform + analyze: Transform and analyze data for anyone to access anywhere • Connect, combine, and refine any data • Create data marts and publish reports • Build and test predictive models • Curate and catalog any data
  23. Connect, combine, and refine any data: make sense of disparate data and prepare it for analysis • Integration, Data Quality, and Master Data Services: rich support for ETL tasks; data cleansing and matching; manage master data structures • Connect any data and all volumes in real time: social data; SAP and Dynamics data; machine data
  24. Create data marts and reports: query aggregated data and build reports • Reporting Services: create and publish interactive reports; consolidate reporting management; enable reporting capabilities for anyone • Analysis Services: single semantic model; 100x faster analysis with in-memory columnstore; manage user-created BI content
  25. Build and test predictive models: use the power of machine learning to predict future trends or behavior • Data comes from the cloud (HDInsight, SQL Server VM, SQL DB, blobs and tables) or from local sources (desktop files, Excel spreadsheets, other data files on a PC) • Models are built in an ML Studio workspace in the Microsoft Azure portal, moving from business problem through modeling to deployment and business value • Publish a Microsoft Azure Machine Learning API in minutes for devices, applications, dashboards, and the web
  26. Azure Machine Learning: beyond business intelligence, machine intelligence • Get started with just a browser: requires no provisioning; simply log on to your Azure subscription or try it for free • Experience the power of choice: choose from hundreds of algorithms and packages from R and Python or drop in your own custom code; take advantage of business-tested algorithms from Xbox and Bing • Deploy solutions in minutes: with the click of a button, deploy the finished model as a web service that can connect to any data, anywhere • Connect to the world: brand and monetize solutions on our global Machine Learning Marketplace • Components: Microsoft Azure Machine Learning Studio (modeling environment, shown); Microsoft Azure Machine Learning API service (model in production as a web service); Microsoft Azure Machine Learning Marketplace (APIs and solutions for broad use)
  27. Enable enterprise-wide self-service data source registration and discovery: a metadata repository that allows users to register, enrich, understand, discover, and consume data sources • Delivers differentiated value through: data source discovery rather than data discovery; support for data from any source, structured and unstructured, on-premises and in the cloud; publishing, discovery, and consumption through any tool; annotation crowdsourcing, empowering any user to capture and share their knowledge, while allowing IT to maintain control and oversight
  28. Azure Data Factory: orchestrate trusted information production in Azure • Connect to relational or non-relational data that is on-premises or in the cloud • Orchestrate data movement & data processing (C#, MapReduce, Hive, Pig, stored procedures, Azure Machine Learning) • Publish to Power BI users as a searchable data view • Operationalize (schedule, manage, debug) workflows • Lifecycle management, monitoring
  29. Visualize + decide: Visualize data and make decisions quickly using everyday tools • Discover, explore, and combine any data type or size, regardless of location • Ask questions of data to visualize, analyze, and forecast • Make faster decisions, share broadly, and access insights on any device
  30. Microsoft Power BI • Discover & combine in Excel • Analyze & visualize in Excel • Collaborate, get insights, & access anywhere through Office 365
  31. Power BI Tools Defined • Front-end (Excel) • Data shaping and cleanup, self-service ETL (Power Query) • Data analysis (Power Pivot) • Visualization and data discovery (Power View, Power Map, Power BI Designer) • Dashboarding (Power BI Dashboard) • Publishing and sharing (Power BI Sites) • Natural language query (Power BI Q&A) • Mobile (Power BI for Mobile) • Access on-premises data (DMG, Analysis Services Connector)
  32. Power Query: discover, explore, and combine any data • Right from Excel, find any data: corporate, social, machine, Hadoop, open • Easily merge, transform, and clean up data
  33. Power BI • Dashboards and KPIs for monitoring the health of your business • New data visualizations (such as dashboards and tree maps) and touch-optimized exploration in HTML5 • Power BI mobile apps across devices including iPad and iPhone • Support for new data sources including Dynamics CRM Online and SQL Server Analysis Services
  34. Q&A: ask questions of data • Build ad hoc reports with a drag-and-drop interface • Look ahead to forecast where the business will go • Map up to 1 million rows of data in 3-D
  35. Data Management Gateway (DMG) • Power View/Q&A: DMG refreshes the workbook, so reporting is not real-time (daily frequency), and there is a 250MB upload limit • Power Query: reporting is real-time. Analysis Services Connector (ASC) • Power BI Dashboard: get real-time reports with ASC and SSAS Tabular using DirectQuery against SQL Server or APS; create reports with Power View (limited functionality) • You can publish Power View reports to Power BI Sites and have them use ASC (by uploading the Excel file via Get Data in Power BI Dashboard) • Does not support Q&A • Can run on any domain machine • Multidimensional cubes coming soon • Diagram components: on the intranet, SQL Server, PDW/APS, HDInsight, third-party Hadoop, and SSAS Tabular, reached through DMG (with a metadata catalog) and ASC; across the public internet, O365 Power BI Sites (Power Pivot workbooks, Power View/Q&A) and the Power BI Dashboard
  36. PolyBase: query relational and non-relational data with T-SQL
  37. Use cases where PolyBase simplifies using Hadoop data • Bringing islands of Hadoop data together • High-performance queries against Hadoop data (predicate pushdown) • Archiving data warehouse data to Hadoop (move) (Hadoop as cold storage) • Exporting relational data to Hadoop (copy) (Hadoop as backup/DR, analysis, cloud use) • Importing Hadoop data into the data warehouse (copy) (Hadoop as staging area/data lake)
  38. Microsoft Analytics Platform layers • Consumption experiences • Data visualization • Data analysis • Data modeling • Data discovery & ETL • Data warehouse/Big Data
  39. Cortana Intelligence Suite: transform data into intelligent action • Data sources: apps, sensors and devices • Information management: Event Hubs, Data Catalog, Data Factory • Big Data stores: SQL Data Warehouse, Data Lake Store • Machine learning and analytics: HDInsight (Hadoop and Spark), Stream Analytics, Data Lake Analytics, Machine Learning • Intelligence: Cortana, Bot Framework, Cognitive Services • Dashboards & visualizations: Power BI • Action: people, automated systems, apps (web, mobile, bots)
  40. Introducing Microsoft Azure IoT Suite: helping accelerate your business transformation • Enables you to connect assets, mine data, and take action • Preconfigured solutions built on Azure IoT services address common scenarios: remote monitoring, predictive maintenance, asset management, and more • Benefits: accelerate time-to-value by easily deploying IoT applications for the most common use cases, such as remote monitoring, asset management, and predictive maintenance; plan and budget appropriately through a simple, predictable business model; grow and extend solutions to support millions of assets
  41. Example overall data flow and architecture • Event & data producers: web logs, IoT, mobile devices, etc., social data • Ingest: Event Hubs • Transform: Stream Analytics, HDInsight, Azure Data Factory, Azure Machine Learning (fraud detection, etc.) • Storage: Azure SQL DB, Azure Blob Storage, Analytics Platform System (DW/long-term storage) • Present & decide: Power BI, web dashboards, mobile devices, predictive analytics
  42. Modern data warehouse layers • Data sources: OLTP, ERP, CRM, LOB, devices, web, sensors, social • Data management and processing: SQL Server and non-relational data, each as box software, appliances, or cloud; extract, transform, load; data quality; master data management; data enrichment and federated query with a single query model • BI and analytics: self-service, corporate, collaboration, mobile, machine learning
  43. Industrial automation company partnering with multinational oil company (Oil and Gas; IoT, Analytics). A leading industrial automation company that employs over 20,000 people, partnering with a leading multinational oil and gas company (one of the six oil and gas supermajors) that employs over 90,000 people. Part 1: What They Did | IoT internet-connected sensors to generate analytics for proactive maintenance. Challenge: • Manage sites used for dispensing liquefied natural gas (clean fuel for commercial customers who do heavy-duty road transportation) • Built LNG refueling stations along US interstate highways • Stations are unmanned, so they built 24x7 remote management and monitoring to track the diagnostics of each station for maintenance or tuning • Built internet-connected sensors embedded in 350 dispenser sites worldwide, generating tens of thousands of data points per second (temperature, pressure, vibration, etc.) • Data needs outgrew the company’s internal datacenter and data warehouse. Solution: • Chose Azure HDInsight, Data Factory, SQL Database, and Machine Learning • Dashboards used to detect anomalies for proactive maintenance: changes in the performance of components, energy consumption of components, component downtime and reliability • Future: the goal is to expand the program to hundreds of thousands of dispensers
  44. Industrial automation company partnering with multinational oil company (IoT, Analytics). Part 2: How They Did It | IoT internet-connected sensors to generate analytics for proactive maintenance • Collect data from internet-connected sensors: tens of thousands of data points per second; interpolate time series prior to analysis; store raw sensor data in Blobs every 5 minutes • Use Hadoop to execute scripts and Data Factory to orchestrate: Hive and Pig scripts orchestrated by Data Factory; data resulting from the scripts loaded into SQL Database; queries detect site anomalies that indicate maintenance/tuning • Produced dashboards with role-based reporting: Azure Machine Learning, SSRS, Power BI for O365; provide users with a customizable interface; view current and historical data (day-to-day operations, asset performance over time, etc.); leveraged Azure Mobile Notification Hub for real-time notifications, alarms, or important events • Use Azure ML to predict: understand which pumps, run at what speeds, maximized water supply while minimizing energy use
  45. Software company for web analytics (Technology; Web Analytics, Recommendation Engine). A software company for web analytics, live chat, targeting, and business intelligence in e-business. Part 1: What They Did | Web analytics (traffic, trends, visitor actions) + recommendation engine. Challenge: • They built an e-business service that does site analysis, real-time monitoring of site metrics, interactive support chat, and dynamic content building • Needed to find the right set of products to help them achieve this. Solution: • Chose Azure HDInsight and SQL Server (with Analysis Services) • Use HDInsight to preprocess and store raw data • Use Analysis Services to generate views from HDInsight • Give their customers self-service BI on top of these views
  46. Software company for web analytics (Web Analytics, Recommendation Engine). Part 2: How They Did It | Web analytics (traffic, trends, visitor actions) + recommendation engine • Store data in Azure Blobs: track visitor data via JavaScript code; used for real-time tracking and statistics; HDInsight used to preprocess and store raw data • Customers of this company have self-service BI: drag-and-drop UI; leveraging Analysis Services, results can be represented as tables, charts, etc.; Analysis Services uses data from HDInsight as its source; uses the Hive ODBC driver to connect to Hive tables
  47. Game development company (Gaming; In-game Analytics). A predominantly mobile-based game development company. While they are a mid-sized organization, they have partnered with media giants on various gaming projects. Part 1: What They Did | In-game analytics. Challenge: • As a game development studio, they wanted to do in-game analytics to better understand their players and what they do in the games. Solution: • Chose Azure HDInsight (MapReduce and Storm) and Service Bus, and also use SQL Server for reporting • Switched from Amazon AWS EMR • Collect telemetry and logging data to gain in-game analytics: how many players are using the game; how many players invited their friends; how far players got into the tutorial; how many attempts they made on each level/stage
  48. Game development company (In-game Analytics). Part 2: How They Did It | In-game analytics • Collect data from games in Azure Blobs: the game sends telemetry/logging data as JSON files containing every action of the user in the game; data is pushed to Azure Service Bus in real time; tens of gigabytes of data are captured daily • HDInsight picks up and processes the real-time data: from Service Bus, HDInsight processes it using Apache Storm and MapReduce; constantly running experiments to determine insight (A/B testing, in-game metrics and analytics); spin up a 32-node cluster nightly for four hours • Output sent to on-premises SQL Server for BI
  49. Big Data is coming
  50. Summary: understand the benefits of Big Data
  51. Resources • The Modern Data Warehouse: • Fast Track Data Warehouse Reference Architecture for SQL Server 2014: • Should you move your data to the cloud? • Presentation slides for Modern Data Warehousing: • Presentation slides for Building an Effective Data Warehouse Architecture: • Hadoop and Data Warehouses: • What is the Microsoft Analytics Platform System (APS)? • Parallel Data Warehouse (PDW) benefits made simple: • What is Advanced Analytics?
  52. Q & A? James Serra, Big Data Evangelist • Email me at: • Follow me at: @JamesSerra • Link to me at: • Visit my blog at: (where this slide deck will be posted)
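Slide 11 (Parallelism) contrasts MPP's shared-nothing scale-out with SMP's shared memory. The sketch below is a toy model of the MPP idea, not how APS or PDW is implemented: rows are hash-distributed so each "node" owns a disjoint slice, each node aggregates only its own rows, and a control node merges the partial results. Threads stand in for nodes here (Python threads do not give true CPU parallelism), and the function names and sample data are hypothetical.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, key, num_nodes):
    """Hash-distribute rows so every node owns a disjoint slice (shared nothing)."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        # crc32 is stable across runs, unlike Python's salted hash() for strings
        bucket = zlib.crc32(str(row[key]).encode("utf-8")) % num_nodes
        nodes[bucket].append(row)
    return nodes


def local_sum(rows, group_key, value_key):
    """Aggregate only the rows this node owns; no memory or disk is shared."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return totals


def mpp_sum(rows, group_key, value_key, num_nodes=4):
    """Scatter rows to nodes, aggregate each slice concurrently, then merge."""
    nodes = distribute(rows, group_key, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(lambda part: local_sum(part, group_key, value_key), nodes))
    merged = defaultdict(float)
    for partial in partials:  # the control-node merge step
        for group, total in partial.items():
            merged[group] += total
    return dict(merged)


sales = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 7},
    {"region": "north", "amount": 3},
]
print(mpp_sum(sales, "region", "amount"))
```

Distributing on the grouping column means each group lives entirely on one node, so the merge is cheap; distributing on another column would still give the right totals, because the merge adds up partial sums.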
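Slide 18 says Azure Stream Analytics "performs time-sensitive analysis using a SQL-like language". In that language this is typically a GROUP BY over a window function such as TumblingWindow. The Python sketch below only illustrates what a tumbling window computes (fixed-size, non-overlapping buckets, with each event counted exactly once); it is not the Stream Analytics API, and the event tuples and device names are made up.

```python
from collections import Counter


def tumbling_counts(events, window_seconds):
    """Count events per device in fixed-size, non-overlapping time windows.

    events: iterable of (timestamp_in_seconds, device_id) tuples.
    Returns {(window_start, device_id): count}.
    """
    counts = Counter()
    for ts, device in events:
        # Every timestamp falls into exactly one window: [start, start + size)
        window_start = int(ts // window_seconds) * window_seconds
        counts[(window_start, device)] += 1
    return dict(counts)


events = [(0, "kiosk-1"), (3, "kiosk-1"), (9.9, "atm-7"), (10, "kiosk-1"), (25, "atm-7")]
print(tumbling_counts(events, 10))
# {(0, 'kiosk-1'): 2, (0, 'atm-7'): 1, (10, 'kiosk-1'): 1, (20, 'atm-7'): 1}
```

A real stream processor would also handle late and out-of-order events and emit each window when it closes. This batch version skips those concerns to keep the window arithmetic visible.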
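Slide 20 credits Azure DocumentDB's "automatic indexing" with enabling queries over schema-free JSON. As a rough mental model only, and not DocumentDB's actual index design or API, the hypothetical class below indexes every leaf path of every document on write, so any path can be queried without declaring a schema first.

```python
import json
from collections import defaultdict


def leaf_paths(doc, prefix=""):
    """Yield (path, value) for every scalar leaf; array items share one '/[]' path."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from leaf_paths(value, f"{prefix}/{key}")
    elif isinstance(doc, list):
        for item in doc:
            yield from leaf_paths(item, f"{prefix}/[]")
    else:
        yield prefix, doc


class AutoIndexedCollection:
    """Toy document store that indexes every path of every document on write."""

    def __init__(self):
        self.docs = {}
        self.index = defaultdict(set)  # (path, JSON-encoded value) -> document ids

    def delete(self, doc_id):
        old = self.docs.pop(doc_id, None)
        if old is None:
            return
        for path, value in leaf_paths(old):
            key = (path, json.dumps(value))
            ids = self.index.get(key)
            if ids is not None:
                ids.discard(doc_id)
                if not ids:
                    del self.index[key]

    def upsert(self, doc_id, doc):
        self.delete(doc_id)  # drop stale index entries from the previous version
        self.docs[doc_id] = doc
        for path, value in leaf_paths(doc):
            # json.dumps keeps 1 and true distinct (they collide as Python dict keys)
            self.index[(path, json.dumps(value))].add(doc_id)

    def query(self, path, value):
        """Return the sorted ids of documents with `value` at `path`."""
        return sorted(self.index.get((path, json.dumps(value)), set()))


people = AutoIndexedCollection()
people.upsert("1", {"name": "Ann", "address": {"city": "Seattle"}, "tags": ["sql", "nosql"]})
people.upsert("2", {"name": "Bob", "address": {"city": "Redmond"}, "age": 40})
print(people.query("/tags/[]", "nosql"))  # ['1']
```

Note that the two documents have different shapes ("age" exists only on one), yet both are queryable immediately. That is the property the slide is pointing at.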
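Slide 28 describes Azure Data Factory as orchestrating data movement and processing, and slide 44 has Data Factory running Hive and Pig scripts before loading SQL Database. The core of any such orchestrator is dependency ordering. The sketch below shows only that idea, using Python's standard `graphlib` module (Python 3.9+); it does not use the Data Factory API, and the activity names are hypothetical. Activities whose inputs are complete are grouped into "waves" that could run in parallel.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execution_waves(dependencies):
    """Group pipeline activities into waves that can run in parallel.

    dependencies maps each activity to the set of activities it must wait for.
    prepare() raises graphlib.CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose inputs are complete
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical pipeline loosely modeled on the oil and gas case study
pipeline = {
    "hive_cleanse": {"copy_to_blob"},
    "pig_aggregate": {"copy_to_blob"},
    "load_sql_db": {"hive_cleanse", "pig_aggregate"},
    "score_with_ml": {"load_sql_db"},
}
for number, wave in enumerate(execution_waves(pipeline), start=1):
    print(number, wave)
```

The Hive and Pig steps share a wave because both depend only on the copy step. A production orchestrator adds scheduling, retries, and monitoring on top of this ordering.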
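Slide 37 lists "high performance queries against Hadoop data (predicate pushdown)" as a PolyBase use case. The point of pushdown is that the filter runs where the data lives, so fewer rows cross the network. The simulation below counts the rows "shipped" with and without pushdown. The functions and data are hypothetical stand-ins; in the real product, PolyBase is driven from T-SQL, and the pushed-down work executes on the Hadoop cluster.

```python
def hadoop_scan(rows, pushed_predicate=None):
    """Simulated scan on the Hadoop side; filters only if a predicate is pushed down."""
    for row in rows:
        if pushed_predicate is None or pushed_predicate(row):
            yield row


def query_without_pushdown(hdfs_rows, predicate):
    shipped = list(hadoop_scan(hdfs_rows))               # every row is moved to SQL Server
    result = [row for row in shipped if predicate(row)]  # filter applied after the move
    return result, len(shipped)


def query_with_pushdown(hdfs_rows, predicate):
    shipped = list(hadoop_scan(hdfs_rows, predicate))    # filter runs where the data lives
    return shipped, len(shipped)


clicks = [{"id": i, "year": 2010 + i % 6} for i in range(60)]


def wanted(row):
    return row["year"] == 2015


rows_a, moved_a = query_without_pushdown(clicks, wanted)
rows_b, moved_b = query_with_pushdown(clicks, wanted)
print(moved_a, moved_b)  # 60 10
```

Both paths return the same rows. The only difference is how much data moves, which is what dominates cost when the external table is large.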
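Slides 43 and 44 mention interpolating sensor time series before analysis and using dashboards to detect anomalies. The case study does not say which methods were used. The sketch below swaps in two common, simple ones: linear interpolation onto a regular grid, and a z-score threshold for flagging outliers. Both the function names and the default threshold of 3.0 are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def interpolate(readings, step):
    """Resample irregular (timestamp, value) readings onto a regular grid
    using linear interpolation between the surrounding readings."""
    if not readings:
        return []
    readings = sorted(readings)
    times = [t for t, _ in readings]
    values = [v for _, v in readings]
    grid, j, t = [], 0, times[0]
    while t <= times[-1]:
        # Advance j so that times[j] <= t <= times[j + 1]
        while j + 2 < len(times) and times[j + 1] < t:
            j += 1
        if len(times) == 1:
            grid.append((t, values[0]))
        else:
            t0, t1 = times[j], times[j + 1]
            v0, v1 = values[j], values[j + 1]
            frac = 0.0 if t1 == t0 else (t - t0) / (t1 - t0)
            grid.append((t, v0 + frac * (v1 - v0)))
        t += step
    return grid


def zscore_anomalies(series, threshold=3.0):
    """Flag points whose value lies more than `threshold` standard deviations from the mean."""
    values = [v for _, v in series]
    if len(values) < 2:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [(t, v) for t, v in series if abs(v - mu) / sigma > threshold]


print(interpolate([(0, 10), (10, 20), (20, 20)], 5))
# [(0, 10.0), (5, 15.0), (10, 20.0), (15, 20.0), (20, 20.0)]
```

Interpolating first gives every sensor the same timestamps, which makes cross-sensor comparisons and rolling statistics straightforward. A z-score over the whole series is the simplest possible detector. A rolling baseline would adapt better to slow drift.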
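Slides 47 and 48 describe the game sending telemetry as JSON, which is then used to answer questions such as how many players there are, how many invited friends, how far they got into the tutorial, and how many attempts they made per level. The sketch below computes those metrics over JSON lines. The event schema (`player`, `event`, `step`, `level`, and the event names) is hypothetical, since the case study does not specify one. At the described scale, this logic would run as a distributed job rather than in a single process.

```python
import json
from collections import Counter


def summarize_telemetry(json_lines):
    """Compute simple in-game metrics from JSON telemetry events (hypothetical schema)."""
    players, inviters = set(), set()
    furthest_step = {}    # player -> furthest tutorial step reached
    attempts = Counter()  # (player, level) -> number of attempts
    for line in json_lines:
        event = json.loads(line)
        player, kind = event["player"], event["event"]
        players.add(player)
        if kind == "invite_friend":
            inviters.add(player)
        elif kind == "tutorial_step":
            furthest_step[player] = max(furthest_step.get(player, 0), event["step"])
        elif kind == "level_attempt":
            attempts[(player, event["level"])] += 1
    return {
        "players": len(players),
        "players_who_invited": len(inviters),
        "furthest_tutorial_step": furthest_step,
        "attempts_per_level": dict(attempts),
    }


lines = [
    '{"player": "p1", "event": "tutorial_step", "step": 1}',
    '{"player": "p1", "event": "tutorial_step", "step": 3}',
    '{"player": "p2", "event": "tutorial_step", "step": 2}',
    '{"player": "p1", "event": "invite_friend"}',
    '{"player": "p2", "event": "level_attempt", "level": 4}',
    '{"player": "p2", "event": "level_attempt", "level": 4}',
]
print(summarize_telemetry(lines))
```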