HDInsight Hadoop on Windows Azure


An introduction to the HDInsight Hadoop on Windows Azure service, including using the interactive console with JavaScript and running WordCount via other methods (Streaming, Hive, etc.).

Published in: Technology

  • Original reference: Tom White’s Hadoop: The Definitive Guide (I made some modifications based on my experience)
  • Transcript

    • 1. Hadoop on Azure, @LynnLangit
    • 2. Data Expertise / Lynn Langit
        • Practicing Architect: Cloud Deployments (Azure, AWS, Google)
        • Technical author / trainer: Google Cloud Developer Series; SQL Server 2012 Developer Series; Cloudera Certified Developer; 2 books on SQL Server BI
        • Industry awards: Microsoft MVP for SQL Server; Google GDE for Cloud Platform; 10Gen Master for MongoDB
        • Former MSFT FTE, 4 years
    • 3. What is Hadoop?
        • HUGE hype factor in 2011 / 2012
        • Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license
        • Uses HDFS storage to enable applications to work with thousands of nodes and petabytes of data
        • Uses MapReduce to process the data
        • Inspired by Google: MapReduce and the Google File System
    • 4. What is HDInsight?
        • Hadoop on Windows: on Azure or on-premises
        • Microsoft worked with Hortonworks to port Hadoop to Windows (from Linux)
    • 5. Working with HDInsight
    • 6. RDBMS vs. Hadoop
        • Data size: RDBMS handles gigabytes (terabytes); Hadoop handles petabytes (exabytes)
        • Access: RDBMS is interactive and batch; Hadoop is batch, NOT interactive
        • Updates: RDBMS reads and writes many times; Hadoop writes once, reads many times
        • Structure: RDBMS uses a static schema; Hadoop uses a dynamic schema
        • Integrity: RDBMS is high (ACID); Hadoop is low
        • Scaling: RDBMS is nonlinear; Hadoop is linear
        • Query response time: RDBMS can be near immediate; Hadoop has latency (due to batch processing)
    • 7. Setting Up Your Cluster
    • 8. Configuration 1
    • 9. Configuration 2
    • 10. Pricing (during Preview)
    • 11. Demo
    • 12. Basic Administration: Connect via RDP
    • 13. NameNode Utility – Top Level
    • 14. NameNode Utility – Drill Down
    • 15. Understanding Storage
    • 16. Using the Azure Storage Viewer
    • 17. What is MapReduce?
    • 18. MapReduce using Java: WordCount example
    • 19. MapReduce using C# Streaming: WordCount example
    • 20. MapReduce using JavaScript: WordCount example
    • 21. Simple Output Graphing: WordCount example
    • 22. Using HIVE
    • 23. Understanding Pig: Load > Transform > Dump or Store
    • 24. Monitoring Job Results
        • In the portal: Main Console; Job icon (button) status summary; Job History; Interactive Console; JS quick feedback; JS detailed feedback (log)
        • Using RDP: Map/Reduce tool; Hadoop command prompt
    • 25. Monitoring Job Status
    • 26. Download: ODBC for HIVE (includes add-in for Excel)
    • 27. Hadoop Connector to Excel
    • 28. Connecting to PowerPivot
        • Create an ODBC connection to HIVE
        • Connect to ‘other data source’ in PowerPivot
    • 29. Connecting with PowerQuery
    • 30. Pulling it Together - Klout
    • 31. Hadoop To-Do List
        • BigData = Hadoop: use Hadoop when business needs dictate; use other NoSQL if it is a better fit
        • Hadoop on the cloud: quick and cheap; specialized use cases; behavioral data; dev, test, and training environments
        • Hadoop access technologies: learn Map/Reduce; use HIVE via Excel; pay attention to Impala
    • 34. Keep Learning
        • @LynnLangit
        • YouTube: SoCalDevGal
        • Hire Me: Architecture; Best Practices; Performance Tuning
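
Slides 17 through 20 refer to a WordCount example without reproducing the code. The following is a minimal sketch, in plain Python, of the map, shuffle, and reduce steps that a WordCount job goes through. It runs in a single process and does not use any Hadoop or HDInsight API. The function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions made for illustration and are not taken from the slides.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every whitespace-separated word."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by key.

    In a real cluster the framework does this between map and reduce;
    here it is simulated with an in-memory dictionary.
    """
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    """Reduce step: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for a file stored in HDFS.
    sample = ["Hadoop on Azure", "Hadoop on Windows"]
    for word, total in sorted(word_count(sample).items()):
        print(word, total)
```

On a cluster, the map and reduce functions would run in parallel on different nodes and the grouping step would be handled by the framework; only the per-record logic in the map and reduce steps would be written by the developer.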