Highway to the Information Zone (Andy Cross)

With major vendors working hard to simplify provisioning Hadoop, there are now many Hadoop as a Service (HaaS) offerings. So what is the challenge domain for Big Data engineers in 2014? If HaaS is a highway, where does it lead, and how do you travel on it?

In this fast-paced, L300 hands-on session, Andy will demonstrate Hadoop in practice using Microsoft's cloud technologies: building a system from scratch to ingest information into HDInsight, then query and report on that information.

This session presumes prior knowledge of MapReduce, Hadoop, HDFS and HCatalog (HCat).

  1. Premium community conference on Microsoft technologies @itcampro #itcamp14. Highway to the Information Zone: Solving 3 key challenges of building Big Data solutions in the cloud. @andybareweb
  2. Huge thanks to our sponsors & partners!
  3. Big Data core ethos: distribute the workload to achieve throughput on IO-bound operations. Flat files + compute = Azure.
  4. HDInsight: Hadoop as a Service • Generally available (GA) managed Hadoop 2 on Microsoft Azure • Familiar tools such as Hive, Pig and Oozie • Additional BoB Microsoft ecosystem tooling with the .NET SDK • PowerShell and .NET for provisioning; Hive job execution with .NET and PowerShell • Paired with Hortonworks HDP for on-premises Hadoop; compatible with all major Hadoop implementations • Combined with Excel and the traditional Microsoft BI stack for compelling solutions
  5. What is Hadoop? • MapReduce: a simple programming style for efficient distribution • A cluster topology designed for resilience and efficiency: a Name Node & Job Tracker coordinating many Data Node & Task Tracker nodes
  6. Apply innovative expressions of logic over a stored mass of data
  7. Position in the cloud
  8. Blank canvas • Windows Azure subscription – capacity to provision HDInsight – capacity to provision a Storage Account
  9. Challenge 1: Cluster provisioning
  10. We need somewhere to execute! • PowerShell / C# / xplat CLI • All of these give further configuration options, including – boosting performance by increasing IOPS, striping data across many Storage Accounts – managing cluster-specific settings in core-site, mapred-site and hdfs-site
  11. DEMO: Provision a customised HDInsight cluster via PowerShell
  12. Centralised resources
  13. HDFS: mount Azure Blob Storage and consume it from Hadoop
  14. Provision, Execute, De-provision
  15. Shard data to boost performance: shard source data across Azure storage accounts, giving over 5000 IOPS per HDInsight cluster
  16. Best practice: isolate logs. Use a separate state storage account for logs, created automatically at the same time as the cluster
  17. Challenge 2: Data ingress
  18. Explanation of WASB • Windows Azure Storage Blobs – equivalent to Azure Blob Storage • Mounted as an HDFS-compatible file system – Hadoop can read from and write to Azure Blobs directly
  19. DEMO: Upload a file to a new WASB location, then read it with hadoop fs -cat /path/to/file
  20. In reality you will have a file pipeline; my solution is Cloud Data Sync Agent
  21. Challenge 3: Run a query!
  22. • .NET MapReduce SDK – programmatically express logic – implement three main classes – execute the job from a console application • Hive query language – CREATE TABLE myTable LOCATION '/path' – SELECT * FROM myTable – execution via PowerShell
  23. DEMO: Hive and .NET
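
Slide 5 names MapReduce as Hadoop's programming style. The following minimal Python sketch, assuming a single process in place of a cluster, runs the three phases of a word count (map, shuffle, reduce) so the contract between them is visible. The function names are hypothetical; this is neither the Hadoop Java API nor the .NET SDK mentioned on slide 22.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every emitted value by its key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values seen for a single key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # On a cluster each mapper would run on the Data Node holding its block of input;
    # here the three phases simply run one after another in one process.
    emitted = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in sorted(shuffle(emitted).items()))


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog"]))
    # {'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

On a real cluster the shuffle is performed by the framework, which is why a developer normally writes only the mapper and the reducer.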
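
Slide 10 mentions managing cluster-specific settings in core-site, mapred-site and hdfs-site. These are Hadoop's standard XML configuration files, each a flat list of name/value properties. The Python sketch below only renders such overrides in that layout so it is clear what is being customised. It is not the HDInsight PowerShell or SDK provisioning API, and although the property names used are standard Hadoop 2 keys, the values are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_site_xml(properties: Dict[str, str]) -> str:
    """Render properties in the <configuration>/<property> layout used by Hadoop *-site.xml files."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Illustrative values only: suitable settings depend on the workload and cluster size.
    print(build_site_xml({"mapreduce.map.memory.mb": "2048"}))  # would belong in mapred-site
    print(build_site_xml({"dfs.replication": "3"}))             # would belong in hdfs-site
```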
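
Slide 14 describes a Provision, Execute, De-provision lifecycle: because the data lives in Azure Blob Storage rather than on the cluster, the cluster can be torn down once the work is done. A minimal Python sketch of that pattern, assuming the real provisioning and deletion calls are passed in as callables (the `fake_provision` and `fake_deprovision` stand-ins are hypothetical), uses a context manager so de-provisioning happens even when a job fails.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def transient_cluster(provision: Callable[[], str],
                      deprovision: Callable[[str], None]) -> Iterator[str]:
    """Provision a cluster, hand its name to the caller, and always de-provision it afterwards."""
    cluster = provision()
    try:
        yield cluster
    finally:
        # Runs on success and on failure, so a crashed job does not leave a cluster running.
        deprovision(cluster)


if __name__ == "__main__":
    log = []

    def fake_provision() -> str:  # hypothetical stand-in for a real provisioning call
        log.append("provision")
        return "demo-cluster"

    def fake_deprovision(name: str) -> None:  # hypothetical stand-in for a real delete call
        log.append(f"deprovision {name}")

    with transient_cluster(fake_provision, fake_deprovision) as cluster:
        log.append(f"execute on {cluster}")
    print(log)  # ['provision', 'execute on demo-cluster', 'deprovision demo-cluster']
```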
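
Slide 15 recommends sharding source data across several storage accounts to raise aggregate IOPS. The slide does not say how files are assigned, so the following is only a sketch assuming a stable hash of each file path, which keeps the assignment deterministic across runs (a re-uploaded file lands in the same account). The account names are hypothetical, and as slide 10 notes, the accounts would still need to be made available to the cluster when it is provisioned.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List


def assign_shards(paths: Iterable[str], accounts: List[str]) -> Dict[str, List[str]]:
    """Spread source files across storage accounts using a stable hash of each path."""
    if not accounts:
        raise ValueError("need at least one storage account")
    shards: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        # md5 is used only for an even, repeatable spread, not for security.
        digest = hashlib.md5(path.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(accounts)
        shards[accounts[index]].append(path)
    return dict(shards)


if __name__ == "__main__":
    files = [f"logs/2014/part-{i:03d}.csv" for i in range(10)]
    # Hypothetical storage account names.
    print(assign_shards(files, ["examplestore1", "examplestore2", "examplestore3"]))
```

Round-robin would balance counts slightly more evenly, but it depends on the order files arrive in, so the same file could move between accounts from one run to the next.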
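
Slides 18 and 19 rely on WASB, which exposes a blob container to Hadoop through URIs of the form `wasb://<container>@<account>.blob.core.windows.net/<path>` (or `wasbs://` over TLS). A small Python sketch for building and parsing such URIs follows; the container and account names are hypothetical. The resulting URI is the kind of path that could be passed to `hadoop fs -cat`, while on HDInsight, plain paths such as `/path/to/file` resolve against the cluster's default container.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class WasbLocation(NamedTuple):
    container: str
    account: str
    path: str


def to_wasb_uri(loc: WasbLocation, secure: bool = False) -> str:
    """Build wasb[s]://<container>@<account>.blob.core.windows.net/<path>."""
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{loc.container}@{loc.account}.blob.core.windows.net/{loc.path.lstrip('/')}"


def parse_wasb_uri(uri: str) -> WasbLocation:
    """Split a WASB URI back into container, storage account and blob path."""
    parts = urlsplit(uri)
    if parts.scheme not in ("wasb", "wasbs"):
        raise ValueError(f"not a WASB URI: {uri}")
    container, _, host = parts.netloc.partition("@")
    if not host:
        raise ValueError(f"missing container@account in: {uri}")
    account = host.split(".", 1)[0]
    return WasbLocation(container, account, parts.path.lstrip("/"))


if __name__ == "__main__":
    loc = WasbLocation("data", "examplestore", "logs/2014/a.csv")  # hypothetical names
    uri = to_wasb_uri(loc)
    print(uri)  # wasb://data@examplestore.blob.core.windows.net/logs/2014/a.csv
    print(parse_wasb_uri(uri) == loc)  # True
```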
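
Slide 22 abbreviates the Hive table definition. In HiveQL, a table over files that already exist usually also declares its columns and row format, and `EXTERNAL` stops Hive from deleting those files when the table is dropped. The Python sketch below only composes that statement as a string, assuming trusted identifiers (nothing is escaped). Submitting it via PowerShell or the .NET SDK, as the slide describes, is not shown, and the column names are hypothetical.

```python
from typing import List, Tuple


def external_table_ddl(table: str, columns: List[Tuple[str, str]],
                       location: str, delimiter: str = ",") -> str:
    """Compose a Hive CREATE EXTERNAL TABLE statement over delimited text files at `location`."""
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols})\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}';"
    )


if __name__ == "__main__":
    # Hypothetical columns; the location mirrors the slide's placeholder path.
    print(external_table_ddl("myTable", [("id", "INT"), ("name", "STRING")], "/path"))
    print("SELECT * FROM myTable LIMIT 10;")
```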
