Microsoft's Hadoop Story


Presentation at the Seattle Hadoop Meetup 1/23 about Microsoft's Hadoop Story.

Published in: Technology

  1. Hadoop and Microsoft. Michael Rys | Principal Program Manager | @SQLServerMike
  2. Session Objectives
     • What is Big Data?
     • How it fits into the Windows and Windows Azure environments
     • How do I program against it in the Microsoft environment?
  3. What is Big Data?
     • Traditionally:
       • Physics experiments, sensor data, satellite data, …
     • Now:
       • Operational logs
       • Customer behavior
       • Social interactions online
       • …
     • From terabytes in the 1990s, through petabytes today, to zettabytes in the future
  4. Big Data.
  5. Big Data: VOLUME (Size), VARIETY (Structure), VELOCITY (Speed).
  6. New Questions.
     • What’s the social sentiment of my product?
     • How do I better predict future outcomes?
     • How do I optimize my services based on patterns of weather, traffic, etc.?
  7. Hadoop is for Big Data.
  8. What is Hadoop (v1)?
     • Processing platform for Big Data
     • Uses the “Map-Reduce” processing paradigm
     • Characteristics:
       • Highly scalable (scaled out)
       • Commodity hardware-based
       • Open source => very low acquisition and storage costs
  9. Hadoop Data Flow: Data, Hadoop, Analytics
  10. Hadoop Capabilities: Extract-Load-Transform, Distributed Compute, Predictive Analysis, Machine Learning, Graph Processing
  11. HDInsight Ecosystem
     • ODBC
     • Distributed Processing (Map Reduce)
     • Distributed Storage (HDFS)
     • World’s Data (Azure Data Marketplace)
     • Windows Azure Storage
  12. HDInsight: Data, Knowledge, Action
  13. HDFS on Azure: Tale of two File Systems
     • HDFS API
     • DFS (1 Data Node per Worker Role) and Compute Cluster: NameNode, Data Node, Data Node, …
     • Azure Storage Vault (ASV): Containers on Azure Blob Storage; Front end (×3), Partition Layer, Stream Layer
  14. .NET Map/Reduce Support
     • Install NuGet
     • Use NuGet to install the Microsoft .NET MapReduce API for Hadoop
     • Provide an implementation of a HadoopJob
     • Execute the job via either:
       • MRLib\MRRunner.exe -dll ConsoleAppHadoopJob.exe
       • or HadoopJobExecutor.ExecuteJob<HadoopJobClass>();
     • Collect your result on HDFS
  15. JavaScript Map/Reduce Support
     • Provide a map and a reduce function variable in a JS file
     • Use the JavaScript console with:
       • runJS('/user/myself/MRjob.js', '/path/to/inputfile', '/path/to/output/dir')
     • Collect your result on HDFS
  16. Invoking HiveQL Queries
     • Run queries in the Hadoop Command Shell after invoking hive
     • Through the web console
     • Programmatically through ODBC
     • Coming soon: LINQ to Hive!
  17. Polybase – Enhancing PDW query engine
     • Users: Data Scientists, BI Users, DB Admins
     • Regular T-SQL; Results
     • Traditional schema-based DW applications
     • Data sources: Social Apps, Sensor & RFID, Mobile Apps, Web Apps
     • Enhanced PDW query engine: Hadoop (unstructured data), PDW V2 (structured data)
  18. Microsoft Hadoop Vision
     • Better on Windows and Azure
       • Active Directory
       • System Center
       • .NET Programmability
     • Microsoft Data Connectivity
       • SQL Server / SQL Parallel Data Warehouse
       • Azure Storage / Azure Data Market
     • Microsoft Business Intelligence (BI)
       • Hive ODBC Connectivity
       • BI Tools for Big Data
     • Collaborate with and Contribute to OSS
       • Collaborate with Hortonworks
       • Provide improvements and Windows support back to OSS
  19. Getting started
     • On prem:
       • Single node cluster (onebox) install
       • C:\hadoop
       • Starts local services
       • Can start/stop them with start-onebox.cmd/stop-onebox.cmd
       • Comes with:
         • Hadoop command line (shell)
         • Hadoop Status for name node and map-reduce cluster
         • HDInsight Dashboard
     • On Windows Azure:
       • 3 node cluster running as a service in Azure
       • Can be used for 5 days
       • Provides samples and HDInsight Dashboard
     • TAP Program
  20. Related Content and Links
  21. Thank you
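
Note on slide 15: the slide only says that the JS file must provide a map and a reduce function variable. A minimal word-count sketch of such a file might look like the following. The `(key, value, context)` signature, `context.write()`, and the `hasNext()`/`next()` iterator are assumptions and are not taken from the slides. `runLocally` is a hypothetical stand-in for the cluster so the sketch can be tried without HDInsight; it also makes the map, shuffle, and reduce phases from slide 8 visible.

```javascript
// Contents of a job file such as /user/myself/MRjob.js (path from slide 15).
// ASSUMPTION: the (key, value, context) signature and context.write() are
// assumed for illustration; check the console's documentation before use.
var map = function (key, value, context) {
    // Map phase: emit (word, 1) for every word in one input line.
    var words = value.split(/[^a-zA-Z]+/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

var reduce = function (key, values, context) {
    // Reduce phase: sum all counts emitted for one word.
    // ASSUMPTION: values is an iterator with hasNext()/next().
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};

// Hypothetical local harness (not part of HDInsight): runs map over each
// line, groups the emitted values by key (shuffle), then runs reduce per key.
function runLocally(lines) {
    var groups = Object.create(null); // key -> list of mapped values
    lines.forEach(function (line, offset) {
        map(offset, line, {
            write: function (k, v) {
                (groups[k] = groups[k] || []).push(v);
            }
        });
    });

    var results = Object.create(null);
    Object.keys(groups).sort().forEach(function (k) {
        var vals = groups[k];
        var pos = 0;
        var iterator = {
            hasNext: function () { return pos < vals.length; },
            next: function () { return vals[pos++]; }
        };
        reduce(k, iterator, {
            write: function (outKey, outValue) { results[outKey] = outValue; }
        });
    });
    return results;
}

var counts = runLocally(["Hadoop is for Big Data", "Big Data is big"]);
console.log(JSON.stringify(counts));
// prints {"big":3,"data":2,"for":1,"hadoop":1,"is":2}
```

On a cluster only the `map` and `reduce` variables would go in the job file; the harness and the sample input exist only for local experimentation.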