Big time: Introducing Hadoop on Azure

Introduction to HDInsight service (aka Hadoop on Azure)

Presentation Transcript

  • Big Data
  • The problem is simple: while the storage capacities of hard drives have increased massively over the years, access speeds (the rate at which data can be read from drives) have not kept up. One typical drive from 1990 could store 1,370 MB of data and had a transfer speed of 4.4 MB/s, so you could read all the data from a full drive in around five minutes.
  • Over 20 years later, one-terabyte drives are the norm, but the transfer speed is around 100 MB/s, so it takes more than two and a half hours to read all the data off the disk.
  • Go Parallel
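
The read-time figures quoted on the preceding slides can be checked with a few lines of arithmetic. The sketch below is only an illustration: it takes one terabyte as 1,000,000 MB, and the 100-drive parallel case is a hypothetical number chosen to show why going parallel helps; it does not come from the slides.

```python
def read_time_seconds(capacity_mb, transfer_mb_per_s):
    """Time to read a full drive sequentially, in seconds."""
    return capacity_mb / transfer_mb_per_s

# Figures quoted on the slides.
seconds_1990 = read_time_seconds(1_370, 4.4)
seconds_1tb = read_time_seconds(1_000_000, 100)  # assumption: 1 TB = 1,000,000 MB

# Hypothetical: the same terabyte spread evenly over 100 drives read in parallel.
HYPOTHETICAL_DRIVE_COUNT = 100
seconds_parallel = seconds_1tb / HYPOTHETICAL_DRIVE_COUNT

print(f"1990 drive: {seconds_1990 / 60:.1f} minutes")
print(f"1 TB drive: {seconds_1tb / 3600:.1f} hours")
print(f"1 TB over {HYPOTHETICAL_DRIVE_COUNT} drives: {seconds_parallel:.0f} seconds")
```

The single-drive results agree with the slides: roughly five minutes for the 1990 drive and more than two and a half hours for the terabyte drive.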
  • Cloud computing changes the way applications grow http://journals.worldnomads.com/davidsgibson/photo/22804/664941/USA/Elephant-shaped-cloud!
  • BIG-TIME: Introducing Hadoop on Azure
    Yaniv Rodenski, Senior Consultant, Sela Group; http://blogs.microsoft.co.il/blogs/roadan; Twitter: @YRodenski; yanivr@sela.co.il
    David Ginzburg, Big Data infrastructure consultant; Twitter: @David_Ginzburg; davidginzburg@gmail.com
  • AGENDA
  • Apache™ Hadoop™
  • Hadoop Distributed File System (HDFS) HDFS Client
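  • MapReduce via WordCount (diagram: the input lines "Hello World", "Hello Azure", and "Goodbye Cruel World" are split into words, each word is emitted with a count of 1, and the counts are summed per word, giving 2 for repeated words)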
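
The WordCount slide can be followed more easily with a small in-memory sketch. This is plain Python and not Hadoop code: the function names `map_phase`, `shuffle`, and `reduce_phase` are illustrative names chosen here, and the three input lines are the ones that appear on the slide.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in the line."""
    for word in line.split():
        yield word, 1

def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(word, counts):
    """Sum all the counts emitted for one word."""
    return word, sum(counts)

# Input lines taken from the slide.
lines = ["Hello World", "Hello Azure", "Goodbye Cruel World"]

mapped = [pair for line in lines for pair in map_phase(line)]
word_counts = dict(reduce_phase(word, counts)
                   for word, counts in shuffle(mapped).items())
print(word_counts)
```

In a real cluster the map calls run on different nodes and the grouping step moves data across the network; here everything happens in one process purely to show the data flow.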
  • DEMO: A new way to MapReduce
  • Hadoop MapReduce Processing (diagram labels: Input, Split, Merge)
  • Hadoop MapReduce Processing (diagram labels: Job Client)
  • MapReduce TMI (diagram labels: Input Split, Buffer, "Partition, Sort, and spill to disk", Fetch)
  • MapReduce TMI (diagram labels: MapOutput, Sort, Merge result, Output)
  • Partitioners
  • Combiners
  • The TeraSort Use case
  • Beginners' Pitfalls
  • Distinct Values Problem Statement http://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/
  • DEMO: Administering Hadoop in the real world
  • Why did Microsoft choose Hadoop?
  • Hadoop on Azure
  • DEMO: Using hadooponazure.com
  • Windows Azure Compute (diagram labels: Supporting service, Application, Configuration)
  • Hadoop on Azure Roles (diagram labels: Monitoring service (RdAdmin), Hadoop services, Configuration)
  • Hadoop MapReduce Processing (diagram labels: Fabric Controller)
  • The Head Node Template
  • The Worker Node Template
  • Node VM Templates
  • Cloud Storage
  • High Availability on Azure (diagram labels: Azure Storage, Fabric Controller)
  • Elastic MapReduce
  • Elastic MapReduce (diagram labels: Client, Azure Storage, Amazon S3; the final build adds "$" markers)
  • DEMO: Using Elastic MapReduce
  • Azure Blob Considerations
  • Storage Size Limitations
  • IsotopeJS
  • DEMO: Using the JavaScript interactive console
  • DEMO: Using Hive
  • Summary
  • Q&A
  • Resources
    My Blog: http://bit.ly/roadan
    Windows Azure Developer center: http://www.windowsazure.com/en-us/develop/overview
    Apache™ Hadoop™: http://hadoop.apache.org
    Hadoop on Azure: http://www.hadooponazure.com
    Hadoop: The Definitive Guide, Tom White: http://shop.oreilly.com/product/9780596521981.do
    Thanks! Yaniv Rodenski, Twitter: @YRodenski