Hadoop on Azure
@LynnLangit
Data Expertise / Lynn Langit
Practicing Architect
Cloud Deployments (Azure, AWS, Google)
Technical author / trainer
Google Cloud Developer Series
SQL Server 2012 Developer Series
Cloudera Certified Developer
2 books on SQL Server BI
Industry awards
Microsoft – MVP for SQL Server
Google – GDE for Cloud Platform
10Gen – Master for MongoDB
Former MSFT FTE, 4 years
What is Hadoop?
• HUGE hype factor in 2011 / 2012
• Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license
  • Uses HDFS storage to enable applications to work with thousands of nodes and petabytes of data
  • Uses MapReduce to process the data
  • Inspired by Google
    • MapReduce
    • Google File System
What is HDInsight?
• Hadoop on Windows
  • Azure
  • On-premises
• Microsoft worked with Hortonworks to port Hadoop to Windows (from Linux)
Working with HDInsight
RDBMS vs. Hadoop
                      RDBMS                      Hadoop
Data Size             Gigabytes (Terabytes)      Petabytes (Exabytes)
Access                Interactive and Batch      Batch – NOT Interactive
Updates               Read / Write many times    Write once, Read many times
Structure             Static Schema              Dynamic Schema
Integrity             High (ACID)                Low
Scaling               Nonlinear                  Linear
Query Response Time   Can be near immediate      Has latency (due to batch processing)
Setting Up Your Cluster
Configuration 1
Configuration 2
Pricing (during Preview)
Demo
Basic Administration
Connect via RDP
NameNode Utility – Top Level
NameNode Utility – Drill Down
Understanding Storage
Using the Azure Storage Viewer
What is MapReduce?
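To make the map, shuffle, and reduce steps concrete, here is a minimal in-memory sketch in Python. It is not how Hadoop is implemented; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, and everything runs in one process rather than across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

On a real cluster the map calls run in parallel on the nodes that hold the data, and the shuffle moves the grouped pairs across the network to the reducers.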
MapReduce using Java
• WordCount example
MapReduce using C# Streaming
• WordCount example
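Hadoop Streaming lets any executable act as mapper or reducer by reading lines from standard input and writing tab-separated key/value lines to standard output. The sketch below shows that contract for WordCount in Python rather than C#, purely as an illustration; the function names are assumptions, and in a real streaming job each function would be fed `sys.stdin` and its output printed.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum counts per word; input must already be sorted by key,
    which Hadoop guarantees between the map and reduce steps."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Simulate the job on one machine: map, sort by key, then reduce."""
    return list(reducer(sorted(mapper(lines))))
```

Calling `run_locally` with a few lines of text is a cheap way to check the logic before submitting the job to a cluster.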
MapReduce using JavaScript
• WordCount example
Simple Output Graphing
• WordCount example
Using HIVE
Understanding Pig
Load > Transform > Dump or Store
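The Load > Transform > Dump or Store flow can be mimicked in plain Python to show the shape of a Pig script. This is an analogy, not Pig Latin; the record layout (user, score), the threshold of 50, and all function names are hypothetical.

```python
def load(lines, delimiter=","):
    """LOAD step: parse raw text lines into tuples of fields."""
    return (tuple(line.strip().split(delimiter)) for line in lines)


def transform(records, min_score=50):
    """Transform step: filter records, then project and convert fields."""
    for user, score in records:
        if int(score) >= min_score:
            yield user.upper(), int(score)


def dump(records):
    """DUMP step: print results to the console for quick inspection."""
    for record in records:
        print(record)


def store(records, path, delimiter=","):
    """STORE step: write results to a file instead of the console."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(delimiter.join(str(field) for field in record) + "\n")
```

As in Pig, nothing is computed until the final step consumes the records: the generators stay lazy until `dump` or `store` iterates over them.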
Monitoring Job Results
• In the portal
  • Main Console
    • Job icon (button) status summary
    • Job History
  • Interactive Console
    • JS quick feedback
    • JS detailed feedback (log)
• Using RDP
  • Map/Reduce tool
  • Hadoop command prompt
Monitoring Job Status
Download – ODBC for HIVE
• Includes add-in for Excel
Hadoop Connector to Excel
Connecting to PowerPivot
• Create an ODBC connection to HIVE
• Connect to ‘other data source’ in PowerPivot
Connecting with PowerQuery
Pulling it Together - Klout
Hadoop To-Do List
BigData = Hadoop
• Use Hadoop when business needs dictate
• Use other NoSQL if a better fit
Hadoop on the cloud
• Quick and cheap
• Specialized use cases
• Behavioral data
• dev, test, training environments
Hadoop access technologies
• Learn Map/Reduce
• Use HIVE via Excel
• Pay attention to Impala
www.TeachingKidsProgramming.org
VOTE
CONFIRM
SHARE
Keep Learning
• @LynnLangit
• YouTube – SoCalDevGal
• Hire Me
  • Architecture
  • Best Practices
  • Performance Tuning

HDInsight Hadoop on Windows Azure

Editor's Notes

  • #4 http://hadoop.apache.org/ and http://en.wikipedia.org/wiki/Apache_Hadoop
  • #6 http://www.infosysblogs.com/microsoft/2011/12/isotope_hadoop_on_windows_and.html
  • #7 Original reference: Tom White’s Hadoop: The Definitive Guide (I made some modifications based on my experience)
  • #13 https://www.hadooponazure.com/AccountDemo - http://www.youtube.com/watch?v=XcHz8aUDDN8 and http://www.youtube.com/watch?v=c7oHntP8HBI
  • #16 http://www.windowsazure.com/en-us/manage/services/hdinsight/howto-blob-store/
  • #18 http://blog.gopivotal.com/products/hadoop-101-programming-mapreduce-with-native-libraries-hive-pig-and-cascading
  • #19–#23 https://www.hadooponazure.com/AccountDemo - http://www.youtube.com/watch?v=XcHz8aUDDN8 and http://www.youtube.com/watch?v=c7oHntP8HBI
  • #24 http://www.windowsazure.com/en-us/manage/services/hdinsight/using-pig-with-hdinsight/
  • #29 http://www.microsoft.com/en-us/bi/default.aspx and http://dennyglee.com/Demos - http://www.youtube.com/watch?v=djfpPsGwm6A and http://www.youtube.com/watch?v=uh9bKWO1K7U
  • #30 Detailed info - http://dennyglee.com/2012/01/21/connecting-powerpivot-to-hadoop-on-azure-self-service-bi-to-big-data-in-the-cloud/
  • #32 http://www.youtube.com/watch?v=eRXEA9-l2eQ and http://thinknook.com/architectures-for-running-sql-server-analysis-service-ssas-on-data-in-hadoop-hive-2013-02-25/#!prettyPhoto