Big time: Introducing Hadoop on Azure

Introduction to HDInsight service (aka Hadoop on Azure)


  1. Big Data
  2. The problem is simple
     • While the storage capacities of hard drives have increased massively over the years, access speeds (the rate at which data can be read from drives) have not kept up.
     • One typical drive from 1990 could store 1,370 MB of data and had a transfer speed of 4.4 MB/s,
  3. (continued)
     • so you could read all the data from a full drive in around five minutes.
     • Over 20 years later, one-terabyte drives are the norm, but the transfer speed is around 100 MB/s, so it takes more than two and a half hours to read all the data off the disk.
  4. Go Parallel
  5. Cloud computing changes the way applications grow (image: http://journals.worldnomads.com/davidsgibson/photo/22804/664941/USA/Elephant-shaped-cloud!)
  6. BIG-TIME: Introducing Hadoop on Azure
     • Yaniv Rodenski, Senior Consultant, Sela Group. http://blogs.microsoft.co.il/blogs/roadan, Twitter: @YRodenski, yanivr@sela.co.il
     • David Ginzburg, Big Data infrastructure consultant. Twitter: @David_Ginzburg, davidginzburg@gmail.com
  7. Agenda
  8. Apache™ Hadoop™
  9. Apache™ Hadoop™
  10. Hadoop Distributed File System (HDFS) (diagram labels: HDFS, Client)
  11. Hadoop Distributed File System (HDFS) (diagram labels: HDFS, Client)
  12. Hadoop Distributed File System (HDFS) (diagram labels: HDFS, Client)
  13. MapReduce via WordCount (diagram: the input lines "Hello World", "Hello Azure", and "Goodbye Cruel World" with per-word counts)
  14. DEMO: A new way to MapReduce
  15. Hadoop MapReduce Processing (diagram labels: Input, Split, Merge)
  16. Hadoop MapReduce Processing (diagram labels: Job Client)
  17. MapReduce TMI (diagram labels: Input Split; Buffer; Partition, Sort, and spill to disk; Fetch)
  18. MapReduce TMI (diagram labels: MapOutput, Sort, Merge, result, Output)
  19. Partitioners
  20. Combiners
  21. The TeraSort Use Case
  22. The TeraSort Use Case
  23. Beginners Pitfalls
  24. Beginners Pitfalls
  25. Distinct Values Problem Statement (http://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/)
  26. Distinct Values Problem Statement (http://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/)
  27. Distinct Values Problem Statement (http://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/)
  28. Distinct Values Problem Statement (http://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/)
  29. DEMO: Administrating Hadoop in the real world
  30. Why did Microsoft choose Hadoop?
  31. Hadoop on Azure
  32. DEMO: Using hadooponazure.com
  33. Windows Azure Compute (diagram labels: Supporting service, Application, Configuration)
  34. Hadoop on Azure Roles (diagram labels: Monitoring service (RdAdmin), Hadoop services, Configuration)
  35. Hadoop MapReduce Processing (diagram labels: Fabric Controller)
  36. Hadoop MapReduce Processing (diagram labels: Fabric Controller)
  37. Hadoop MapReduce Processing (diagram labels: Fabric Controller)
  38. The Head Node Template
  39. The Worker Node Template
  40. Node VM Templates
  41. Cloud Storage
  42. High Availability on Azure (diagram labels: Azure Storage, Fabric Controller)
  43. Elastic MapReduce
  44. Elastic MapReduce Storage (diagram labels: Client, Azure Storage, Amazon S3)
  45. Elastic MapReduce Storage (diagram labels: Client, Azure Storage, Amazon S3)
  46. Elastic MapReduce Storage (diagram labels: Client, Azure Storage, Amazon S3, each marked with $ symbols)
  47. DEMO: Using Elastic MapReduce
  48. Azure Blob Considerations
  49. Storage Size Limitations
  50. IsotopeJS
  51. DEMO: Using the JavaScript interactive console
  52. DEMO: Using Hive
  53. Summary
  54. Q&A
  55. Resources
     • My Blog: http://bit.ly/roadan
     • Windows Azure Developer center: http://www.windowsazure.com/en-us/develop/overview
     • Apache™ Hadoop™: http://hadoop.apache.org
     • Hadoop on Azure: http://www.hadooponazure.com
     • Hadoop: The Definitive Guide, Tom White: http://shop.oreilly.com/product/9780596521981.do
     Thanks! Yaniv Rodenski, Twitter: @YRodenski
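
The drive figures quoted on slides 2 and 3 can be checked with a few lines of arithmetic. The sketch below is not part of the deck; it assumes 1 TB = 1,000,000 MB, and the 100-drive parallel case is a hypothetical illustration of the "Go Parallel" idea on slide 4, not a number from the slides.

```python
def full_read_seconds(capacity_mb: float, transfer_mb_per_s: float) -> float:
    """Seconds needed to read a full drive sequentially."""
    return capacity_mb / transfer_mb_per_s


# Figures quoted on the slides.
drive_1990_s = full_read_seconds(1_370, 4.4)      # 1990 drive: 1,370 MB at 4.4 MB/s
drive_1tb_s = full_read_seconds(1_000_000, 100)   # assumption: 1 TB = 1,000,000 MB

# Hypothetical: spread the same terabyte evenly over 100 drives read in parallel.
NUM_DRIVES = 100
parallel_s = full_read_seconds(1_000_000 / NUM_DRIVES, 100)

print(f"1990 drive: {drive_1990_s / 60:.1f} minutes")          # 5.2 minutes
print(f"1 TB drive: {drive_1tb_s / 3600:.1f} hours")           # 2.8 hours
print(f"1 TB over {NUM_DRIVES} drives: {parallel_s:.0f} seconds")  # 100 seconds
```

The results agree with the slides: roughly five minutes for the 1990 drive and more than two and a half hours for the terabyte drive.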
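
Slide 13 introduces MapReduce through WordCount. As a rough illustration of the idea, here is a minimal in-memory sketch in plain Python. It does not use Hadoop or any Hadoop API; the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical, and the three input lines are an assumption based on how the slide 13 diagram reads.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in one input line."""
    for word in line.split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all values by key, with keys in sorted order."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on in-memory data."""
    pairs = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(w, c) for w, c in shuffle(pairs).items())


# Assumed input, as read from the slide 13 diagram.
lines = ["Hello World", "Hello Azure", "Goodbye Cruel World"]
result = word_count(lines)
print(result)
# {'Azure': 1, 'Cruel': 1, 'Goodbye': 1, 'Hello': 2, 'World': 2}
```

In a real Hadoop job the map and reduce functions run on different nodes and the framework performs the shuffle; here everything runs in one process to keep the data flow visible.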
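
Slides 17 to 20 name partitioning and combiners without showing them. The sketch below is one way to picture both steps in plain Python, under stated assumptions: the `combine` and `partition` helpers are hypothetical, and `zlib.crc32` stands in for a stable key hash. Hadoop's default partitioner likewise takes a hash of the key modulo the number of reduce tasks, but it uses the Java `hashCode` of the key, so the bucket numbers here will not match a real cluster.

```python
import zlib
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def combine(map_output: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Combiner: pre-aggregate one mapper's output locally, sorted by key."""
    combined: Counter = Counter()
    for word, count in map_output:
        combined[word] += count
    return sorted(combined.items())


def partition(key: str, num_reducers: int) -> int:
    """Partitioner: choose a reducer from a stable hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


# Output of one hypothetical mapper that saw "Hello World" and "Hello Azure".
map_output = [("Hello", 1), ("World", 1), ("Hello", 1), ("Azure", 1)]

combined = combine(map_output)
print(combined)  # [('Azure', 1), ('Hello', 2), ('World', 1)]

# Route each combined record to one of two reducers.
NUM_REDUCERS = 2
buckets: Dict[int, List[Tuple[str, int]]] = {r: [] for r in range(NUM_REDUCERS)}
for word, count in combined:
    buckets[partition(word, NUM_REDUCERS)].append((word, count))
print(buckets)
```

The combiner shrinks four map records to three before anything crosses the network, and the partitioner guarantees that every record for a given word reaches the same reducer.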
