Apache Hadoop for Windows Server and Windows Azure

  1. APACHE HADOOP ON AZURE AND WINDOWS
     MICROSOFT’S APACHE HADOOP-BASED SERVICES FOR AZURE AND ENTERPRISE
     ELASTIC MAPREDUCE FOR AZURE AND ENTERPRISE PRIVATE CLOUDS
     Brad Sarsfield, Engineering Architect, Microsoft Big Data | Hadoop
     March 2012 | revision 1.02
  2. ISOTOPE BRIDGES BI TO COLLABORATION TO CLOUD
     “The next frontier is all about uniting the power of the cloud with the power of data to gain insights that simply weren’t possible even just a few years ago.”
     Ted Kummert, CVP Business Platforms, SQL PASS, October 2011
  3. BIG DATA IS HERE AND HADOOP IS CENTER STAGE
  4. ECONOMIC CONTEXT AND EXEMPLAR
     • 15 out of 17 sectors in the US have more data stored per company than the US Library of Congress
     • 140,000-190,000 more deep analytical talent positions
     • 1.5 million more data-savvy managers in the US alone
     • 50-60% increase in the number of Hadoop developers within organizations already using Hadoop, within a year
     • €250 billion: potential annual value to Europe’s public sector
     • $300 billion: potential annual value to US healthcare
     Special Report: The CEO’s Guide to Hadoop. Learn how large corporations are coping with the increasing flow of unstructured data by using a free software program called Hadoop.
     http://www.businessweek.com/technology/special-reports/ceo-guide-to-hadoop.html
  5. THE 4Vs OF BIG DATA: VOLUME, VELOCITY, VARIABILITY, AND VARIETY
     • Isotope is designed to enable solution building with all key dimensions in mind
     • Deep integration and coordination with existing Microsoft enterprise, cloud, and BI tools
  6. VIBRANT ECOSYSTEM IN ENTERPRISE AND CLOUD WITH MICROSOFT
     [Diagram: big data technology landscape (MR/GFS, Bigtable, Dynamo, SimpleDB, EC2/EMR/S3, Hadoop, Hive, Pig/Pig Latin, HBase, Cassandra, Oozie, Scribe, BackType, Dremel) alongside Microsoft internal (Dryad, Cosmos) and external (Isotope, Azure, Excel, BI, SQL DW, LTH) technologies]
     • Scalable machine learning and data mining [Mahout]
     • Statistical modeling and analysis [R]
     • Coordination and workflow [Oozie, Cascading]
     • Data integration and transformation [Sqoop, Flume]
     • Social network analytics and petascale graph learning [Pegasus]
     • Real-time stream analytics and business intelligence merged with petascale computation [HStreaming]
     • Scale-out caching and storage [Cassandra, HBase, Riak, Redis, Couchbase, S3]
     • Cloud-oriented data warehousing, pattern discovery, and transformation [Hive, Pig]
  7. ENTER ISOTOPE
     Isotope is the internal codename for Microsoft’s suite of products supporting Hadoop on Windows and Azure.
  8. OUR DIFFERENTIATORS FOR CLOUD AND ENTERPRISE
     [Diagram: un- and semi-structured sources (sensors, crawlers, devices, bots, apps) and structured sources (EIS, ERP, CRM, LOB) connected through Hadoop, SQL Data Warehousing, SQL Reporting (interactive reports with Crescent), SQL Analysis (Excel with PowerPivot), and embedded BI apps for business users]
     • Self-service business intelligence at any scale, on premises or in the cloud
     • Complete integration of information assets, from log files to collaboration artifacts to enterprise data stores
     • Familiar and integrated tools for analytics, insight, exploration, modeling, and strategic decision making
     • Transparent, federated identity and security management for all big data services
     • High-availability data protection and recovery services for enterprises through the cloud
     • Enterprise-grade support for all services, frameworks, and tools
  9. A SEAMLESS OCEAN OF INFORMATION PROCESSING AND ANALYTICS
     [Diagram: Hadoop (Azure and enterprise) with programming models Java OM, Streaming OM, HiveQL, Pig Latin, .NET/C#/F#, and (T)SQL over an ocean of NoSQL data (unstructured, semi-structured, structured) in HDFS, connected via ETL to EIS/ERP, RDBMS, file systems, OData (RSS), and Azure Storage]
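The diagram lists a Streaming object model next to Java, HiveQL, and Pig Latin. Assuming this refers to standard Apache Hadoop Streaming, any executable can act as mapper or reducer: the mapper writes tab-separated key/value lines to stdout, the framework sorts them by key, and the reducer reads the sorted lines from stdin and must detect key boundaries itself. The JavaScript sketch below models that contract with pure functions over arrays of lines; the names streamingMap, shuffleSort, and streamingReduce are hypothetical, and a real job would read process.stdin and write process.stdout instead.

```javascript
// Sketch of the Hadoop Streaming word-count contract in JavaScript (Node.js).
// Assumption: a real job would use two scripts reading process.stdin and
// writing process.stdout; pure functions over arrays of lines are used here.

// Mapper (hypothetical name): emits one "word\t1" line per word of input.
function streamingMap(inputLines) {
  var out = [];
  inputLines.forEach(function (line) {
    line.split(/[^a-zA-Z]+/).forEach(function (word) {
      if (word !== "") out.push(word.toLowerCase() + "\t1");
    });
  });
  return out;
}

// Stand-in for the framework's shuffle: sort lines by the key before the tab.
function shuffleSort(lines) {
  function keyOf(line) { return line.slice(0, line.indexOf("\t")); }
  return lines.slice().sort(function (a, b) {
    var ka = keyOf(a);
    var kb = keyOf(b);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
}

// Reducer (hypothetical name): input is sorted by key, so equal keys are
// adjacent; emit "word\tcount" whenever the key changes, and once at the end.
function streamingReduce(sortedLines) {
  var out = [];
  var currentKey = null;
  var sum = 0;
  sortedLines.forEach(function (line) {
    var tab = line.indexOf("\t");
    var key = line.slice(0, tab);
    var value = parseInt(line.slice(tab + 1), 10);
    if (key !== currentKey) {
      if (currentKey !== null) out.push(currentKey + "\t" + sum);
      currentKey = key;
      sum = 0;
    }
    sum += value;
  });
  if (currentKey !== null) out.push(currentKey + "\t" + sum);
  return out;
}

var result = streamingReduce(shuffleSort(streamingMap(["Hello Hadoop", "hello Azure"])));
console.log(result.join("\n"));
```

Because the reducer only ever sees adjacent runs of one key, it needs constant memory per key, which is why Streaming reducers are written as a running sum rather than a dictionary of all keys.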
  10. PROJECT ISOTOPE OFFERINGS
     • Bi-directional connectors between Hadoop and SQL and PDW
     • ODBC driver for Hadoop
     • Hive plug-in for Excel
     • Hosted elastic Hadoop service on Azure
     • Microsoft’s Apache Hadoop-based solution for Windows Azure
     • Microsoft’s Apache Hadoop-based solution for Windows Server
     • JavaScript support for Hadoop, with a web-based interactive environment
     • Contributions back to the open source community via the Apache Software Foundation
  11. HIVE PLUG-IN FOR EXCEL
     • Connect Excel directly to Hive
     • Browse Hive objects (tables, columns, etc.)
     • Construct and issue queries
  12. HOSTED ELASTIC HADOOP SERVICE ON AZURE
     • Elastic MapReduce, Hive, Pig Latin, .NET, JavaScript, and integration with BI, DW, and Office collaboration tools
     • Simple management UI
     • Full Hadoop compatibility
     • Native support for Azure Blob Storage from HDFS
  13. MICROSOFT’S APACHE HADOOP-BASED SOLUTION FOR WINDOWS AZURE
     • One-click deployment of a Hadoop cluster on Azure
  14. MICROSOFT’S APACHE HADOOP-BASED SOLUTION FOR WINDOWS
     • All standard Hadoop modules supported: Hadoop | HDFS | Pig | Hive | Monitoring Pages
     • One-click installer
     • Simplified cluster configuration
     • Integration with the Microsoft ecosystem: System Center | Active Directory | etc.
  15. ISOTOPE.JS: OUR VB MOMENT FOR BIG DATA
         // MapReduce functions in JavaScript (word count)
         // ------------------------------------------------------------------
         // map: split each input line on non-letter characters and emit (word, 1)
         var map = function (key, value, context) {
             var words = value.split(/[^a-zA-Z]/);
             for (var i = 0; i < words.length; i++) {
                 // consecutive separators produce empty strings; skip them
                 if (words[i] !== "") {
                     context.write(words[i].toLowerCase(), 1);
                 }
             }
         };
         // reduce: sum all counts emitted for one word
         var reduce = function (key, values, context) {
             var sum = 0;
             while (values.hasNext()) {
                 sum += parseInt(values.next(), 10);
             }
             context.write(key, sum);
         };
     • Write MapReduce jobs in JavaScript
     • Interactive development environment
     • Interactive data query and analytics of petascale datasets
     • Hive command line for interactive Hive queries
     • Charting and graphing for insight and analytics visualization
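To trace what the Isotope.js word count does without a cluster, the slide’s map and reduce functions can be driven by a small local harness. This is a sketch, not the Isotope.js runtime: the context object with write(), the values iterator with hasNext()/next(), and the helper names makeIterator and runLocalJob are hypothetical stand-ins that mimic only the interface the slide’s functions use. The harness simulates the shuffle by grouping emitted values by key, then calls reduce once per key in sorted order.

```javascript
// Hypothetical local harness for the slide's word-count job (plain Node.js).
// It mimics only the context.write() and values.hasNext()/next() interface;
// it is not the Isotope.js runtime.

// map and reduce as on the slide (radix added to parseInt).
var map = function (key, value, context) {
  var words = value.split(/[^a-zA-Z]/);
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== "") {
      context.write(words[i].toLowerCase(), 1);
    }
  }
};

var reduce = function (key, values, context) {
  var sum = 0;
  while (values.hasNext()) {
    sum += parseInt(values.next(), 10);
  }
  context.write(key, sum);
};

// Hypothetical helper: iterator over an array exposing hasNext()/next().
function makeIterator(items) {
  var i = 0;
  return {
    hasNext: function () { return i < items.length; },
    next: function () { return items[i++]; }
  };
}

// Hypothetical helper: runs map on every line, groups emitted values by key
// (a simulated shuffle), then runs reduce once per key in sorted key order.
function runLocalJob(lines) {
  var groups = new Map();
  var mapContext = {
    write: function (k, v) {
      if (!groups.has(k)) groups.set(k, []);
      // Assumption: values reach reduce as strings (the slide's reduce parses them).
      groups.get(k).push(String(v));
    }
  };
  // The line index stands in for the input key (Hadoop's text input uses byte offsets).
  lines.forEach(function (line, index) { map(index, line, mapContext); });

  var output = new Map();
  var reduceContext = { write: function (k, v) { output.set(k, v); } };
  Array.from(groups.keys()).sort().forEach(function (k) {
    reduce(k, makeIterator(groups.get(k)), reduceContext);
  });
  return output;
}

var counts = runLocalJob(["The quick brown fox", "the lazy dog; the end"]);
console.log(counts.get("the")); // 3
```

With these two sample lines the harness reports 3 for "the" and 1 for each of the other six words; the "dog;" token shows why map skips the empty strings that split leaves between adjacent separators.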
  16. GIVING BACK AND PARTICIPATING IN THE HADOOP COMMUNITY
     • Microsoft will be working with the community to contribute significant code back to the Apache Software Foundation
     • Microsoft has announced a partnership with Hortonworks to help accelerate its open source support
     “We are excited to work with Microsoft to help make Apache Hadoop a compelling platform for storing and processing data. Hortonworks welcomes Microsoft to the Hadoop ecosystem and looks forward to lending our deep domain expertise to help accelerate the delivery of Microsoft’s Apache Hadoop-based solution for Windows Server and service for Windows Azure.”
     Eric Baldeschwieler, CEO, Hortonworks
  17. SUMMARY
     • Please visit HadoopOnAzure.com to start using Microsoft’s elastic services for Apache Hadoop
     • Please visit www.microsoft.com/bigdata to learn more about project codename “Isotope” and the broader ecosystem of products and services Microsoft is delivering in 2012 and beyond
