Windows Azure - BigData and Hadoop

BigData Dive in Minsk / Altoros conference /
Windows Azure and BigData: autoscale, Linux, HDInsight.
Options for developers and startups: BizSpark, MSDN subscriptions, seed fund.

Published in: Technology
  • Worldwide: almost 100,000 customers. In Russia: several dozen large projects and thousands of subscribers.
  • Slide objectives: explain that different VM instance sizes are available within Windows Azure. Speaking points: one of the key areas of feedback has been to reduce the cost and size of Windows Azure instances. At PDC we will announce.. Notes: (*) 20 GB, with a limitation on VHD size when deploying a VM Role on XSmall: the VHD can be at most 15 GB. Each tenant can support 20 instances, just like regular subscriptions with Small VMs; we do not scale based on core counts. There is no SLA on network bandwidth for each VM size, because this resource is shared among all the VMs. That said, we need to give customers guidance so they can design their applications correctly. From the engineering side, this is what we mean by Low, Moderate and High:
      • Low currently means 0-15 Mbps, with short bursts up to 25-50 Mbps (megabit/s). This is sufficient for some web sites with low traffic.
      • Moderate means 0-100 Mbps, with short bursts up to 200 Mbps (100 Mbps is the norm). This is what we currently reserve for the Small VM.
      • High means 200-800 Mbps. Divide this range into three bands for Medium, Large and XL: Medium sits at the low end, Large hovers around the middle, and XL takes the high end.
    These rates should be used as guidance. Nothing beats a test run to see what the application requires, but these bandwidth ranges should reduce the guesswork for customers.
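As a rough illustration of how the bandwidth guidance above could feed into capacity planning, the sketch below estimates how long a transfer would take at the top of each tier. The tier table copies the sustained ranges from the notes; the function names and the use of decimal units (1 GB = 8,000 megabits) are assumptions made for this example, and burst rates are ignored.

```python
# Sustained bandwidth ranges from the notes, in megabits per second.
# Burst rates are deliberately left out of this sketch.
BANDWIDTH_MBPS = {
    "low": (0, 15),
    "moderate": (0, 100),
    "high": (200, 800),
}


def transfer_seconds(size_gb: float, rate_mbps: float) -> float:
    """Seconds needed to move size_gb gigabytes at rate_mbps megabits/s."""
    if rate_mbps <= 0:
        raise ValueError("rate_mbps must be positive")
    megabits = size_gb * 1000 * 8  # assumption: decimal units, 1 GB = 8000 megabits
    return megabits / rate_mbps


def best_case_seconds(size_gb: float, tier: str) -> float:
    """Transfer time at the upper bound of a tier (hypothetical helper)."""
    _, upper = BANDWIDTH_MBPS[tier]
    return transfer_seconds(size_gb, upper)


if __name__ == "__main__":
    for tier in BANDWIDTH_MBPS:
        print(tier, round(best_case_seconds(10, tier)), "s for 10 GB")
```

Because there is no SLA on these rates, the result is only an optimistic lower bound on transfer time; a test run remains the real measurement.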
  • The Fund's selection criteria are shown on the slide.
  • Since 2010, the Microsoft Seed Fund has operated in Russia, awarding grants for product development. Unlike an investment, a grant does not involve selling a stake in the company, which makes this kind of funding as attractive as it gets for an entrepreneur.
  • Slide objectives: store and analyze. Speaking points: store the log files that are traditionally thrown away after ETL, then run additional analysis on the raw log files.
  • Difference between IoT and the Internet, IPv6: MSDN this month … Slide objectives: huge opportunities in the Internet of things. Speaking points: the Internet of things can help us monitor our environment and optimize our physical world. The tremendous amount of data needs to be stored and analyzed in real time, interactively, and in batch.
  • Slide objectives: collective intelligence and predictive analysis is where big data is going next. Speaking points: by now, the big data industry has begun to understand how to store data at large scale. However, predictive analysis of the data we store is the next difficult problem to tackle. Once again, the 4 Vs of big data do not make this easier. The tremendous amount of data needs to be stored and analyzed in real time, interactively, and in batch, using machine learning and parallel algorithms.
  • Slide objectives: architecture of Hadoop. Speaking points: MapReduce is the programming layer; it resembles the primitives of parallel programming. At the file system layer, the Hadoop Distributed File System (HDFS) takes care of the availability, redundancy and reliability of storage. Each block of your data is copied 3 times for safekeeping, and the MapReduce layer can schedule work onto the node that contains the actual data block.
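The two ideas in this note, three copies of every block and scheduling work on a node that already holds the block, can be sketched in a few lines of Python. This is a toy model, not HDFS code: the round-robin placement is a simplifying assumption (real HDFS placement is rack-aware), and `place_blocks` and `pick_node` are hypothetical names.

```python
REPLICATION = 3  # the notes say each block is copied 3 times


def place_blocks(block_ids, nodes, replication=REPLICATION):
    """Assign every block to `replication` distinct nodes, round-robin.

    Round-robin is an assumption for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [nodes[(i + k) % len(nodes)] for k in range(replication)]
    return placement


def pick_node(block, placement, idle_nodes):
    """Return (node, is_local): prefer an idle node that stores the block."""
    for node in placement[block]:
        if node in idle_nodes:
            return node, True
    # No replica holder is idle: fall back to any idle node (data must travel).
    for node in sorted(idle_nodes):
        return node, False
    return None, False


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]
    placement = place_blocks(["b0", "b1"], nodes)
    print(placement)
    print(pick_node("b0", placement, {"n3", "n4"}))
```

With three replicas there are three candidate nodes for local execution, which is why replication helps scheduling as well as durability.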
  • Speaking points: MapReduce is about minimizing the movement of data inside your cluster. The job tracker knows where all the data blocks are and sends the operation code to the node that contains the data.
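A minimal sketch of "moving code to the data", assuming a cluster modelled as a plain dictionary from node name to the lines stored on that node. The function `local_map` stands in for the operation code the job tracker ships out; only its small partial result crosses the (imaginary) network. All names here are illustrative, not Hadoop APIs.

```python
from collections import Counter


def local_map(lines):
    """Runs on the node that stores `lines`; returns a small partial word count."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def run_job(cluster, task):
    """Ship `task` to every node and merge the partial results (the reduce step).

    cluster: dict mapping node name -> list of text lines held by that node.
    """
    total = Counter()
    for _node, lines in cluster.items():
        total.update(task(lines))  # only the partial counts leave the node
    return dict(total)


if __name__ == "__main__":
    cluster = {"n1": ["big data"], "n2": ["Big Hadoop data"]}
    print(run_job(cluster, local_map))
```

The raw lines never leave their node in this model; what travels is the function and a compact summary, which is the point the speaking notes make.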
  • Slide objectives: understand the HDInsight ecosystem. Speaking points: the biggest buzzword in Big Data right now is Hadoop. It can mean many things, but it always includes HDFS and MapReduce. HDInsight legend: Red = in product now; Blue = planned for product; Green = ecosystem can connect now; Purple = samples available; Orange = ecosystem planned. Flume and HBase are not available in the first release of HDInsight Service. As of 3/15, we don't have an on-premise solution, thus AD integration is not yet available. System Center integration will come later as well. The Green boxes are packages in the ecosystem that have not been included in the service but should work out of the box once downloaded.
  • Slide objectives: a single layer gives access to both the attached/local storage on each node and the remote Windows Azure Blob storage, which is the default. Speaking points: one interface to rule both DFS and Azure Blob storage. Blob storage: Front End: security/auth and scaled-out request handler. Partition Layer: object layer, mapping of objects such as Tables, Blobs, Queues to streams (cached in Front End), CC. Stream Layer: 3-node HA, scale-out stream store. Please see the Windows Azure Storage paper for details. In some ways ASV changes things again: we are now moving data to the compute, since the data is now remote. Blob storage lets you persist your data even when you tear down your cluster.
  • Slide objectives: understand the details of ASV. Speaking points: you will need to create an Azure storage account, and you will need your account name and key. Create the cluster close to where your data is (for storage in the West data center, create the cluster in the West data center).
  • Slide objectives: best of both worlds in terms of programming flexibility. Speaking points: we offer everything the Hadoop distribution offers. In addition, we have made available JavaScript, a browser-hosted console, F#, and C# LINQ to Hive to make life easier for .NET/enterprise developers. DevOps can also use PowerShell and a Node.js-based CLI to control and manage the cluster.
  • Innovate across the stack in terms of developer tools for better experience.
  • Slide objectives: talk from the bottom layer up to discuss the Microsoft big data solution. Speaking points: BI platform: SQL Server Analysis Services and Reporting Services. Self-service BI: Power View, PowerPivot, predictive analysis and embedded BI. Taking in unstructured and structured data sources through Hadoop or PDW.
  • Slide objectives: vision slide. Speaking points: broaden access to Hadoop on the Windows platform. Enterprise-ready through AD and System Center (to come). BI integration and self-service BI.

    1. 1. Windows Azure: BigData, Hadoop and much more. Alexey Bokov, abokov@microsoft.com. BigData Dive: Minsk, 19 September 2012. abokov
    2. 2. Contents: a little about Windows Azure; HDInsight: Hadoop in Azure; Q/A. #bigdataby
    3. 3. Windows Azure: infrastructure. Regions: North Central US, South Central US, East US, West US, North Europe (Dublin), West Europe (Amsterdam), East Asia, South East Asia
    4. 4. Windows Azure: infrastructure. More about Azure datacenters: ou.gs/wadc
    5. 5. Cloud services: compute resources
    6. 6. Cloud services: working with data
    7. 7. Cloud services: applications and networking
    8. 8. Windows Azure: a few numbers. Cloud storage:
        • currently holds more than 4 trillion objects
        • 270,000 requests on average
        • peak load: 860,000 requests per second
    9. 9. A little about how PaaS works
    10. 10. ServiceDefinition.csdef
        <ServiceDefinition name="MyService" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition">
          <WebRole name="WebRole1">
            <Startup>
              <Task commandLine="Startup.cmd" executionContext="limited" taskType="simple">
              </Task>
            </Startup>
          </WebRole>
        </ServiceDefinition>
    11. 11. Modify WorkerRole1\approot\startup.cmd. The example below downloads and installs Tomcat, but nothing stops you from running git clone or svn co instead:
        cscript /B /Nologo %APPROOT%\util\unzip.vbs apache-tomcat-6.0.32-windows-x86.zip %APPROOT%
        cscript "util\download.vbs" "http://tcontepub.blob.core.windows.net/packages/jre6.zip"
        cscript /B /Nologo %APPROOT%\util\unzip.vbs jre6.zip %APPROOT%
        copy %APPROOT%\foo.war %APPROOT%\apache-tomcat-6.0.32\webapps
        cd %APPROOT%\apache-tomcat-6.0.32\bin
        set JRE_HOME=%APPROOT%\jre6
        And now we can do interesting things!
    12. 12. SQL Azure cloud database
    13. 13. Yes, we now have:
        • Oracle database support
        • (Relatively) fast disks
        • Hosting scenarios for desktop Windows applications
        • Flexible and convenient payment options
        • VM Depot with a large selection of images
    14. 14. We now have autoscaling!
        • A service built into the platform
        • Monitoring by CPU utilization and Azure Queues length
        • For more custom settings there is the Wasabi library: ou.gs/wasabi
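To make the autoscaling slide concrete, here is a small sketch of a scaling decision driven by the two signals it names, CPU utilization and queue length. Every threshold and limit below is a hypothetical value chosen for illustration; this is neither the platform's built-in rule set nor the Wasabi API.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    # All values are hypothetical defaults for this sketch.
    cpu_high: float = 80.0   # scale out above this CPU percentage
    cpu_low: float = 30.0    # scale in below this CPU percentage
    queue_high: int = 1000   # scale out above this queue length
    queue_low: int = 100     # scale in below this queue length
    min_instances: int = 2
    max_instances: int = 20


def target_instances(current, cpu_percent, queue_length, rule=Rule()):
    """Return the instance count to aim for, one step at a time."""
    if cpu_percent > rule.cpu_high or queue_length > rule.queue_high:
        wanted = current + 1          # either signal is enough to scale out
    elif cpu_percent < rule.cpu_low and queue_length < rule.queue_low:
        wanted = current - 1          # both signals must be quiet to scale in
    else:
        wanted = current
    return max(rule.min_instances, min(rule.max_instances, wanted))


if __name__ == "__main__":
    print(target_instances(3, cpu_percent=90, queue_length=0))
    print(target_instances(3, cpu_percent=10, queue_length=10))
```

Scaling out on either signal but scaling in only when both are quiet is a deliberately conservative choice in this sketch: it favours availability over cost when the signals disagree.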
    15. 15. A wide choice of tools in IaaS scenarios
        • You can run anything you like on a VM (Windows or Linux), for example Confluence
        • VM Depot (vmdepot.msopentech.com) has Redmine
    16. 16. Open source frameworks
        • SDK for PHP: phpazure.codeplex.com
        • Ruby on Rails SDK: rubyonrailsinazure.codeplex.com
        • DNN - DotNetNuke CMS: dotnetnuke.codeplex.com
        • Lucene.NET on top of blobs: azuredirectory.codeplex.com
        • Python for Visual Studio: pytools.codeplex.com
        • ASP.NET web stack (MVC, Web API): aspnetwebstack.codeplex.com
    17. 17. Cloud Ninja: cloudninja.codeplex.com
        • An open source project: a sample implementation of a multi-tenant application
        • Can be an excellent starting point for your own code
        • Monitoring (including inbound/outbound traffic and storage transactions, with data split by tenant)
        • Automatic scaling
        • Identity using Access Control Services
        • Provisioning (deployment)
        • Nice charts built from the monitoring data
    18. 18. windowsazure.github.io
        • .NET SDK: storage, queues, media services
        • Java SDK: storage, media services, service bus
        • Node.js: storage, resource management, SQL database
        • PHP: storage, compute resources
        • Python: storage, compute resources
        • Ruby: storage, compute resources
        • Mobile Services: iOS, Android, Windows Phone, JavaScript, Windows Store
        • Command-line libraries: PowerShell and Node.js
        • IISNode: hosting Node.js on IIS
    19. 19. How to get started: payment
        • Credit card: pay-as-you-go, billed at the end of the month for the resources actually used
        • Prepayment option: MOSP, a prepayment (commitment) for a fixed amount, with discounts
        • Payment under an enterprise agreement: EA via LARs, with significant discounts
        • If you need invoices or cash payment, this can be arranged through Oblakoteka (azure.oblakoteka.ru) or Softline (azure.softline.ru)
    20. 20. Free options
        • 30-day trial at windowsazure.com
        • Trial for MSDN subscribers
        • For startups: BizSpark for 3 years, which includes 8 MSDN subscriptions!
        • Windows Azure Offer $60K: $60,000 of cloud resources over 2 years (awarded on a competitive basis)
        • BizSpark and MSDN subscribers who need more than the monthly limit get a discount of 25% or more on the additional resources
    21. 21. Microsoft BizSpark programs: MS BizSpark, MS Seed Fund, MS Startup Accelerator
        • Software development and testing tools
        • IT infrastructure
        • Access to the app store
        • Cash grants of up to $100k for product development
        • $60k for Windows Azure
        • Mentoring
        • Technology consulting
        • Joint marketing and PR
    22. 22. Microsoft Seed Fund: the "ideal candidate"
        • The company develops software or an internet service aimed at a large market (over $1 billion) or, better still, creates a new large market niche (so-called disruptive products and technologies)
        • By the time it applies, the company already has a working prototype and needs funding to bring that prototype up to the level of a commercial product
        • There is a clear business plan and an understanding of the market, the product, the target audience and the monetization model. The team's qualifications convince the expert jury that the product will be delivered
        • Use of strategic Microsoft technologies: Windows Azure, Windows 8 and Windows Phone
    23. 23. Seed Fund: 32 Russian startups have already received grants totalling about $1.3M: ePythia, Wobot, ColorPen, PiratePay, Ajatix, SPEEREO, BodyNova, ShopPoints, Alpha Smart Systems, Cloud Health Care, ClipClock, Choister, SportFort, MoosCool, Car-Fin, RealSpeaker, MD.Voice, 10tracks, Ubiq Mobile and others. Applications are accepted quarterly. More details: ms-start.ru/rusfund
    24. 24. KEY TRENDS
    25. 25. What is Big Data? (figure: data sources plotted by volume against velocity, variety and variability)
        • Volume scale: Gigabytes (10^9), Terabytes (10^12), Petabytes (10^15), Exabytes (10^18)
        • ERP / CRM: Sales Pipeline, Payables, Payroll, Inventory, Contacts, Deal Tracking
        • WEB 2.0: Mobile, Advertising, Collaboration, eCommerce, Digital Marketing, Search Marketing, Web Logs, Recommendations
        • Internet of things: Audio / Video, Log Files, Text/Image, Social Sentiment, Data Market Feeds, eGov Feeds, Weather, Wikis / Blogs, Click Stream, Sensors / RFID / Devices, Spatial & GPS Coordinates
        • Storage cost per GB: 1980 $190,000; 1990 $9,000; 2000 $15; 2010 $0.07
    26. 26. Big Data, BIG OPPORTUNITY
        • 49% of CEOs and CIOs are planning big data projects
        • Software growth (chart values): 1.8, 2.5, 3.4, 4.6; axis 0 to 5, label "Billio…"; 34% compound annual growth rate
        • Services growth (chart values): 2.7, 3.9, 5.1, 6.5; axis 0 to 10, label "Billio…"; 39% compound annual growth rate
        Sources: 1. McKinsey&Company, McKinsey Global Survey Results, Minding Your Digital Business, 2012. 2. IDC Market Analysis, Worldwide Big Data Technology and Services 2012-2015 Forecast, 2012.
    27. 27. New workflow in Data Warehousing
    28. 28. Devices: Internet and Internet of things
    29. 29. Collective Intelligence and Predictive analysis: How do I optimize my services based on patterns of weather and traffic? How do I build a recommendation engine? What's the social sentiment of my product? How do I better predict future outcomes?
    30. 30. MapReduce: Move Code to the Data
    31. 31. So How Does It Work?
    32. 32. Traditional RDBMS vs. NoSQL
    33. 33. (diagram) Distributed Storage (HDFS), Distributed Processing (MapReduce), Query (Hive), ODBC
        Legend: Red = Core Hadoop; Blue = Data processing; Purple = Microsoft integration points and value adds; Orange = Data Movement; Green = Packages
    34. 34. HDFS on Azure: Tale of two File Systems (diagram)
        • HDFS API
        • DFS (1 Data Node per Worker Role) and Compute Cluster: Name Node, Data Node, Data Node
        • Azure Storage (ASV) … Azure Blob Storage: Front end, Front end, Front end, Partition Layer, Stream Layer
    35. 35. Azure Storage (ASV)
        • Default file system for HDInsight Service
        • Provides sharable, persistent, highly scalable storage with high availability (Azure Blob Store)
        • Azure storage itself does not provide compute
        • Fast access from compute nodes to data in the same data center
        • Several file systems, addressable via: asv[s]://<container>@<account>.blob.core.windows.net/<path>
        • Requires storage key in core-site.xml:
          <property>
            <name>fs.azure.account.key.accountname</name>
            <value>enterthekeyvaluehere</value>
          </property>
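The ASV address pattern and the core-site.xml entry from this slide can be assembled programmatically. The sketch below is an illustration only: the helper names are hypothetical, it assumes the scheme is written with `//` before the container (as in standard URI syntax), and it assumes the property name is the slide's `fs.azure.account.key.` prefix followed by the account name.

```python
from xml.sax.saxutils import escape


def asv_uri(container: str, account: str, path: str, secure: bool = False) -> str:
    """Build an asv:// (or asvs:// when secure) address for a blob path."""
    scheme = "asvs" if secure else "asv"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def account_key_property(account: str, key: str) -> str:
    """Render the <property> element that carries the storage key.

    Assumption: the name is 'fs.azure.account.key.' + account, following the slide.
    """
    return (
        "<property>\n"
        f"  <name>fs.azure.account.key.{escape(account)}</name>\n"
        f"  <value>{escape(key)}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    # 'logs', 'myaccount' and the key below are placeholder values.
    print(asv_uri("logs", "myaccount", "/2012/09/app.log"))
    print(account_key_property("myaccount", "enterthekeyvaluehere"))
```

Escaping the key matters because storage keys are base64 strings and hand-edited XML is easy to break; keeping the key out of source control is left to the reader.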
    36. 36. Programming HDInsight
        • Hive, Pig, Mahout, Cascading, Scalding, Scoobi, Pegasus…
        • C#, F# Map/Reduce, LINQ to Hive, .NET management clients
        • JavaScript Map/Reduce, browser-hosted console, Node.js management clients
        • PowerShell, cross-platform CLI tools
    37. 37. Building Developer Experiences
    38. 38. Microsoft Big Data Solution
    39. 39. Microsoft Hadoop Vision
    40. 40. http://www.windowsazure.com/ http://hadoop.apache.org/ http://nuget.org/packages?q=hadoop http://hadoopsdk.codeplex.com
    41. 41. Learn and join in!
        • Developer center: azurehub.ru
        • Useful resources: ms-start.ru, rustart@microsoft.com
        • Contact email for all Windows Azure questions: AzureRus@microsoft.com
        • User community: facebook.com/groups/azurerus
        • Latest news: @windowsazure_ru
    42. 42. Your questions…
    43. 43. © 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
