Big time: Introducing Hadoop on Azure

Introduction to HDInsight service (aka Hadoop on Azure)

Transcript

  • 1. Big Data
  • 2. The problem is simple. While the storage capacities of hard drives have increased massively over the years, access speeds (the rate at which data can be read from drives) have not kept up. One typical drive from 1990 could store 1,370 MB of data and had a transfer speed of 4.4 MB/s,
  • 3. so you could read all the data from a full drive in around five minutes. Over 20 years later, one-terabyte drives are the norm, but the transfer speed is around 100 MB/s, so it takes more than two and a half hours to read all the data off the disk.
  • 4. Go Parallel
  • 5. Cloud computing changes the way applications grow. http://journals.worldnomads.com/davidsgibson/photo/22804/664941/USA/Elephant-shaped-cloud!
  • 6. BIG-TIME: Introducing Hadoop on Azure. Yaniv Rodenski, Senior Consultant, Sela Group; http://blogs.microsoft.co.il/blogs/roadan; Twitter: @YRodenski; yanivr@sela.co.il. David Ginzburg, Big Data infrastructure consultant; Twitter: @David_Ginzburg; davidginzburg@gmail.com
  • 7. AGENDA
  • 8. Apache™ Hadoop™
  • 9. Apache™ Hadoop™
  • 10. Hadoop Distributed File System (HDFS) HDFS Client
  • 11. Hadoop Distributed File System (HDFS) HDFS Client
  • 12. Hadoop Distributed File System (HDFS) HDFS Client
  • 13. MapReduce via WordCount [diagram: input lines "Hello World", "Hello Azure", "Goodbye Cruel World", with per-word counts]
  • 14. DEMO: A new way to MapReduce
  • 15. Hadoop MapReduce Processing [diagram labels: Input, Split, Merge]
  • 16. Hadoop MapReduce Processing [diagram label: Job Client]
  • 17. MapReduce TMI [diagram labels: Input, Split, Buffer; Partition, Sort, and spill to disk; Fetch]
  • 18. MapReduce TMI [diagram labels: Sort, MapOutput, Merge result, Output]
  • 19. Partitioners
  • 20. Combiners
  • 21. The TeraSort Use case
  • 22. The TeraSort Use case
  • 23. Beginners' Pitfalls
  • 24. Beginners' Pitfalls
  • 25. Distinct Values Problem Statement. http://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/
  • 26. Distinct Values Problem Statement. http://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/
  • 27. Distinct Values Problem Statement. http://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/
  • 28. Distinct Values Problem Statement. http://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/
  • 29. DEMO: Administering Hadoop in the real world
  • 30. Why did Microsoft choose Hadoop?
  • 31. Hadoop on Azure
  • 32. DEMO: Using hadooponazure.com
  • 33. Windows Azure Compute [diagram labels: Supporting service, Application, Configuration]
  • 34. Hadoop on Azure Roles [diagram labels: Monitoring service (RdAdmin), Hadoop services, Configuration]
  • 35. Hadoop MapReduce Processing [diagram label: Fabric Controller]
  • 36. Hadoop MapReduce Processing [diagram label: Fabric Controller]
  • 37. Hadoop MapReduce Processing [diagram label: Fabric Controller]
  • 38. The Head Node Template
  • 39. The Worker Node Template
  • 40. Node VM Templates
  • 41. Cloud Storage
  • 42. High Availability on Azure [diagram labels: Azure Storage, Fabric Controller]
  • 43. Elastic MapReduce
  • 44. Elastic MapReduce [diagram labels: Client, Azure Storage, Amazon S3]
  • 45. Elastic MapReduce [diagram labels: Client, Azure Storage, Amazon S3]
  • 46. Elastic MapReduce [diagram labels: Client, Azure Storage, Amazon S3, with $ cost markers]
  • 47. DEMO: Using Elastic MapReduce
  • 48. Azure Blob Considerations
  • 49. Storage Size Limitations
  • 50. IsotopeJS
  • 51. DEMO: Using the JavaScript interactive console
  • 52. DEMO: Using Hive
  • 53. Summary
  • 54. Q&A
  • 55. Resources. My Blog: http://bit.ly/roadan; Windows Azure Developer center: http://www.windowsazure.com/en-us/develop/overview; Apache™ Hadoop™: http://hadoop.apache.org; Hadoop on Azure: http://www.hadooponazure.com; Hadoop: The Definitive Guide, Tom White: http://shop.oreilly.com/product/9780596521981.do. Thanks! Yaniv Rodenski, Twitter: @YRodenski
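
The read times quoted on slides 2 and 3 follow from dividing drive capacity by transfer speed. The sketch below reproduces that arithmetic in Python. It assumes one terabyte means 1,000,000 MB, and the 100-drive parallel case is a hypothetical illustration of the "go parallel" idea, not a figure taken from the slides.

```python
def full_read_seconds(capacity_mb: float, transfer_mb_per_s: float) -> float:
    """Seconds needed to read an entire drive sequentially."""
    return capacity_mb / transfer_mb_per_s


# Figures quoted on slides 2 and 3.
drive_1990 = full_read_seconds(1_370, 4.4)

# Assumption: 1 TB is taken as 1,000,000 MB (decimal units).
drive_1tb = full_read_seconds(1_000_000, 100)

# Hypothetical case (not from the slides): the same terabyte spread
# evenly over 100 drives that are all read at the same time.
parallel_100 = full_read_seconds(1_000_000 / 100, 100)

print(f"1990 drive:            {drive_1990 / 60:.1f} minutes")
print(f"1 TB drive:            {drive_1tb / 3600:.2f} hours")
print(f"1 TB over 100 drives:  {parallel_100 / 60:.1f} minutes")
```

Under these assumptions the 1990 drive takes a little over five minutes and the terabyte drive a little under three hours, which matches the slides' "around five minutes" and "more than two and a half hours".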
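
Slide 13 introduces MapReduce through WordCount. As a rough illustration of the map, shuffle, and reduce steps, here is a single-process sketch in plain Python. It does not use the Hadoop API, the function names are hypothetical, and the grouping of the input into three lines ("Hello World", "Hello Azure", "Goodbye Cruel World") is inferred from the slide text.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by key, as the framework would between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(word: str, counts: list[int]) -> tuple[str, int]:
    """Sum the grouped counts for a single word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence inside one process."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    # Keys are sorted here to mimic the sorted order reducers receive them in.
    return dict(reduce_phase(w, c) for w, c in sorted(grouped.items()))


lines = ["Hello World", "Hello Azure", "Goodbye Cruel World"]
counts = word_count(lines)
print(counts)
```

On a real cluster the map calls run in parallel across input splits and the shuffle moves data between machines; the sketch only shows the data flow.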
