Key Message: Big Data is a real problem, and Hadoop’s star is rising. It is economically transformative in the way LAMP was in the previous decade. (Linux, Apache, MySQL, Php/Python)Reference numbers from McKinsey Global Institute – Big Data: The next frontier for innovation competition (http://www.mckinsey.com/mgi/publications/big_data/index.asp)http://www.karmasphere.com/images/documents/Karmasphere-HadoopDeveloperResearch.pdfHadoop is moving into mainstream consciousness now. Businessweek recently had a special report dedicated to Hadoop, with half a dozen articles.http://www.businessweek.com/technology/special-reports/ceo-guide-to-hadoop.html
http://nosql.mypopescu.com/post/9621746531/a-definition-of-big-dataKEY POINT: Hadoop is part of the solution -
Hadoop is an AND, not an OR. But it requires a certain philosophy that MSFT has not historically embraced. A key benefit of Hadoop is the large, vibrant open source community around it. To succeed, Microsoft needs to not only acknowledge but thrive in this community.
BIG self service BIBillions+ of data itemsUnstructured, semi-structured, log dataReal-time feedsNew analysis types leveraging large server clusters Leverage the Hadoop ecosystem and ride its momentumIW centric designGive business users direct access to the Big Data storeDeliver IW-centric experiences optimized for unstructured and semi-structured queriesCreate, enrich, visualize and share big data sets through fun and immersive experiencesDo it all in the tool they already use - ExcelIncrease the number of questions, reduce the cost of exploratory mining to zeroLeverage new class of analytics and visualizationEnable new types of questions with new types of data and visualizationsLeverage analysis of text, sentiment, clickstream, time windows, classification, clusteringVisualize big data in impactful ways: tag clouds, graphs, timelines, tree maps, etc. Natural extension of our BI platformMaintain a consistent semantic model, consistent expression languageProvide an iterative, experimental, business-driven workflow from the desktop to the Big Data clusterBuild on existing IW skills with the Microsoft BI platform (Excel, PowerPivot, Crescent)Optimized for cloudIntegrate with Azure DataMarket to connect to Bing and other public data sourcesHost big data sets on Azure , integrated with MyDataLeverage Isotope to run analytics clusters
Isotope is the all-up effort around Microsoft and Hadoop. It includes several components:A full distribution of Apache Hadoop that runs on standard windows hardware.A full version of Apache Hadoop that runs on the Azure cloudConnectors from Hadoop (any Hadoop, not just Microsoft’s) to Microsoft’s key products – SQL, Excel, PDW, etc.Jscript shell for live scripting of Hadoop from the browserAdmin, monitoring, and authoring tools to make Microsoft Hadoop best-in-class
Apache hadoop for windows server and windwos azure
APACHE HADOOP ON AZURE AND WINDOWS MICROSOFT’S APACHE HADOOP-BASED SERVICES FOR AZURE AND ENTERPRISEELASTIC MAPREDUCE FOR AZURE AND ENTERPRISE PRIVATE CLOUDSBrad SarsfieldEngineering ArchitectMicrosoft Big Data | HaodoopMarch 2012 | revision 1.02
ISOTOPE BRIDGES BI TO COLLABORATION TO CLOUD “The next frontier is all about uniting the power of the cloud with the power of data to gain insights that simply weren’t possible even just a few years ago” Ted Kummert, CVP Business Platforms SQL PASS, October 2011
15 out of 17sectors in the US have more datastored per company than theUS Library of Congress 140,000-190,000 more deep analytical talent positions 1.5 million 50-60% more data savvy managers increase in the number of Hadoop developers in the US alone within organizations already using Hadoop within a year €250 billion Potential annual value to Europe’s public sector $300 billion Potential annual value to US healthcare ECONOMIC CONTEXT AND EXEMPLAR Special Report: The CEO’s Guide to Hadoop Learn how large corporations are coping with the increasing flow of unstructured data by using a free software program called Hadoop http://www.businessweek.com/technology/special-reports/ceo-guide-to-hadoop.html
THE 4Vs OF BIG DATA: VOLUME, VELOCITY, VARIABILITY, AND VARIETYIsotope is designed to enable solution building with all key dimensions in mindDeep integration and coordination with existing Microsoft enterprise, cloud, and BI tools
Cassandra Hadoop BackType MR/GFS SimpleDB Hive Oozie Hadoop Bigtable Dynamo Scribe PigLatin Pig HBase Dremel EC2/EMR/S3 Hadoop … Cassandra … … Internal [ Dryad | Cosmos] and External [ Isotope | Azure | Excel | BI | SQL DW | LTH ]VIBRANT ECOSYSTEM IN ENTERPRISE AND CLOUD WITH MICROSOFTScalable machine learning and data mining [Mahout]Statistical modeling and analysis [R]Coordination and workflow [Oozie, Cascading]Data integration and transformation [SQOOP, Flume]Social network analytics and petascale graph learning [Pegasus]Real-time stream analytics and business intelligence merged with petascale computation[HStreamming]Scale-out caching and storage [Cassandra, HBase, Riak, Redis, Couchbase, S3]Cloud-oriented data warehousing, pattern discovery, and transformation [Hive, Pig]
ENTER ISOTOPEIsotope is the internal codename for Microsoft’s suite of products to support Hadoop in Windows and Azure
Un- and Semi-Structured Sensors Crawlers SQL REPORTING Devices Interactive Reports with Crescent Bots Apps Business HADOOP SQL ANALYSIS Users Excel with PowerPivot EIS ERP SQL DATA WAREHOUSING CRM LOB Embedded BI Apps StructuredOUR DIFFERENTIATORS FOR CLOUD AND ENTERPRISESelf-service business intelligence at any scale on premise or cloudComplete integration of information assets from log files to collaboration artifacts to enterprise data storesFamiliar and integrated tools for analytics, insight, exploration, modeling, and strategic decision makingTransparent, federated identity and security management for all big data servicesHigh availability data protection and recovery services for enterprises through cloudEnterprise-grade support for all service, frameworks, and tools
HADOOP [Azure and Enterprise] Java OM Streaming OM HiveQL PigLatin .NET/C#/F# (T)SQL OCEAN OF DATA NOSQL [unstructured, semi-structured, structured] ETL HDFSA SEAMLESS OCEAN OF INFORMATION PROCESSING AND ANALYTICS EIS / ERP RDBMS File System OData [RSS] Azure Storage
HIVE PLUG-IN FOR EXCEL• Connect Excel directly to Hive• Browse Hive objects – tables, columns, etc.• Construct and issue queries
MICROSOFT’S APACHE HADOOP-BASED SOLUTION FOR WINDOWS AZURE• One-click deployment of Hadoop on Azure cluster
MICROSOFT’S APACHE HADOOP-BASED SOLUTION FOR WINDOWS• All standard Hadoop modules supported: Hadoop | HDFS | Pig | Hive | Monitoring Pages• One-click installer• Simplified cluster configuration• Integration with Microsoft ecosystem System Center | Active Directory | etc.
“We are excited to work with Microsoft to help make Apache Hadoop a compelling platform for storing and processing data. Hortonworks welcomes Microsoft to the Hadoop ecosystem and looks forward to lending our deep domain expertise to help accelerate the delivery of Microsoft’s Apache Hadoop- based solution for Windows Server and service for Windows Azure.” Eric Baldeschwieler CEOGIVING BACK AND PARTICIPATING IN THE HADOOP COMMUNITYMicrosoft will be working with the community to contribute back significant code to the Apache FoundationMicrosoft has announced a partnership with Hortonworks to help accelerate our open source support
APACHE HADOOP ON AZURE AND WINDOWS MICROSOFT’S APACHE HADOOP-BASED SERVICES FOR AZURE AND ENTERPRISESUMMARYPlease visit HadoopOnAzure.com to start using Microsoft’s elastic services for Apache HadoopPlease visit www.microsoft.com/bigdata to learn more about project codename “Isotope” and the broader ecosystem ofproducts and services Microsoft is delivering in 2012 an beyond
A particular slide catching your eye?
Clipping is a handy way to collect important slides you want to go back to later.