Hadoop on azure_july_2012
60 minute webcast for DevelopMentor - Hadoop on Azure


  • http://hortonworks.com/blog/7-key-drivers-for-the-big-data-market/
  • http://hadoop.apache.org/ http://en.wikipedia.org/wiki/Apache_Hadoop
  • http://www.oracle.com/technetwork/bdc/hadoop-loader/overview/index.html http://www.microsoft.com/download/en/details.aspx?id=27584
  • http://bigdatanerd.wordpress.com/2012/01/04/why-nosql-part-2-overview-of-data-modelrelational-nosql/ http://docs.jboss.org/hibernate/ogm/3.0/reference/en-US/html_single/
  • http://stage.hypertable.com/index.php/documentation/architecture/ http://code.google.com/appengine/ http://code.google.com/appengine/articles/datastore/overview.html
  • Original reference: Tom White’s Hadoop: The Definitive Guide (I made some modifications based on my experience)
  • But…is it cheaper? It is, right now on AWS (i.e. MapReduce vs. RDS). However, pricing has not been announced for Hadoop on Azure.
  • https://www.hadooponazure.com/Account Demo - http://www.youtube.com/watch?v=ugi9C6s_sH4
  • http://www.infosysblogs.com/microsoft/2011/12/isotope_hadoop_on_windows_and.html
  • DBA tasks originally from SQL PASS Summit 2011, by Steve Jones, Editor, SQLServerCentral / Red Gate Software
  • https://www.hadooponazure.com/Account Demo - http://www.youtube.com/watch?v=XcHz8aUDDN8 and http://www.youtube.com/watch?v=c7oHntP8HBI
  • When the volume of data is too much for simple human interpretation -> Man PLUS Machine (Data Mining / Statistics)
  • http://www.microsoft.com/en-us/bi/default.aspx http://dennyglee.com/ Demos - http://www.youtube.com/watch?v=djfpPsGwm6A and http://www.youtube.com/watch?v=uh9bKWO1K7U
  • Detailed info - http://dennyglee.com/2012/01/21/connecting-powerpivot-to-hadoop-on-azure-self-service-bi-to-big-data-in-the-cloud/
  • http://en.wikipedia.org/wiki/Apache_Hadoop
  • http://www.slideshare.net/mattaslett/mysql-vs-nosql-and-newsql-survey-results-13073043
  • http://www.monafoundation.org/project/Teaching-Kids-Programming/22

Hadoop on azure_july_2012 Presentation Transcript

  • Hadoop on Azure: BigData on the Azure platform (@LynnLangit)
  • Hadoop = BigData?
      • HUGE hype factor in 2011 / 2012
      • Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license
      • Enables applications to work with thousands of nodes and petabytes of data
      • Was inspired by Google's MapReduce and Google File System (GFS) papers
  • Oracle Loader for Hadoop; SQL Server Connector for Hadoop
  • Flavors of NoSQL
  • Column Database: wide, sparse column sets
  • RDBMS vs. Hadoop

                            | Traditional RDBMS        | Hadoop
      ----------------------+--------------------------+---------------------------------------
      Data Size             | Gigabytes (Terabytes)    | Petabytes (Exabytes)
      Access                | Interactive and Batch    | Batch – NOT Interactive
      Updates               | Read / Write many times  | Write once, Read many times
      Structure             | Static Schema            | Dynamic Schema
      Integrity             | High (ACID)              | Low
      Scaling               | Nonlinear                | Linear
      Query Response Time   | Can be near immediate    | Has latency (due to batch processing)
  • What about the cloud?
  • The reality…two pivots
  • Demo - Setting up Your Cluster
  • Cluster Allocation Process
  • Working with Hadoop on Azure: Tools / Languages
      • MapReduce
          • Map (query/format)
          • Reduce (aggregate)
          • plug-in for Eclipse (Java)
          • JavaScript
          • C# Streaming
      • Pig (ETL -- Java)
      • Hive (HQL Query)
      • HBase tables
      • Others
          • Mahout (analyze)
          • R (analyze)
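The Map (query/format) and Reduce (aggregate) roles listed above can be illustrated with a minimal in-process sketch. This is not Hadoop code: `run_mapreduce`, the mapper and reducer functions, and the sample log lines are hypothetical names and data chosen for illustration, and the shuffle step is simulated with a dictionary rather than a distributed sort.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Simulate map -> shuffle -> reduce in a single process (illustration only)."""
    groups = defaultdict(list)
    # Map phase: each record may emit zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle: collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: aggregate the grouped values for each key.
    return {key: reducer(key, values) for key, values in groups.items()}


def status_mapper(line):
    """Map (query/format): pull the HTTP status code out of a log line."""
    parts = line.split()
    if len(parts) >= 2:  # skip malformed lines
        yield parts[1], 1


def count_reducer(key, values):
    """Reduce (aggregate): sum the counts emitted for one key."""
    return sum(values)


# Hypothetical sample data: "<path> <status>" per line.
sample_log = ["/home 200", "/missing 404", "/home 200", "bad-line"]
status_counts = run_mapreduce(sample_log, status_mapper, count_reducer)
print(status_counts)
```

On a real cluster the map and reduce functions run on different nodes and the framework performs the grouping; only the division of labor between the two functions carries over from this sketch.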
  • Tasks – DBA vs. Hadoop on Azure

      RDBMS                          | Hadoop on Azure
      -------------------------------+---------------------------------------------
      Import Data                    | Upload Data using FTP or import via Sqoop
      Set up Security                | Set up Security
      Scale Compute (up or out)      | Add child nodes to the cluster
      Perform a Backup               | Monitor and replace failed nodes
      Restore a Database             | n/a
      Clean up data via ETL          | Execute a PIG job
      Create an Index – query tune   | Write a HIVE query (HQL)
      Join Tables Together           | Run MapReduce
      n/a                            | Monitor and manage running MapReduce jobs
      Schedule a Job                 | Schedule a (Cron) Job
      Run Database Maintenance       | Monitor space and resources used
      Send an Email from SQL Server  | Set up resource threshold alerts
      Manage License costs           | Manage usage time charges
  • Demo - Basic Administration: Open Ports
  • Demo - Basic Administration: Connect via RDP
  • NameNode Utility – Top Level
  • NameNode Utility – Drill Down
  • Demo - Basic Administration: Configure connections to remote storage
  • Configuring Upload from AWS S3
  • Configuring Upload from Azure
  • Using the Azure Storage Viewer
  • Configuring Upload from DataMarket
  • Asking Questions = MapReduce
  • Samples
  • Demo - MapReduce using Java
      • WordCount example using AWS S3 data
  • Demo - MapReduce using C# Streaming
      • WordCount example
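The C# Streaming demo relies on the Hadoop Streaming contract: the mapper and reducer are ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output, and the framework sorts the mapper output by key before the reducer sees it. The sketch below shows that contract for WordCount in Python rather than C#. It is not the code from the demo; the function names and input text are illustrative, and the `sorted` call stands in for Hadoop's shuffle/sort.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' line per word."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer: input arrives sorted by key, so equal words are adjacent."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def local_pipeline(lines):
    """Local stand-in for: mapper | sort | reducer."""
    return list(reduce_lines(sorted(map_lines(lines))))


if __name__ == "__main__":
    # Hypothetical input text.
    for out in local_pipeline(["the quick fox", "the lazy dog"]):
        print(out)
```

In an actual Streaming job the two functions would live in separate scripts that read from `sys.stdin`; keeping them as generator functions here makes the sketch easy to run and check locally.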
  • Demo - MapReduce using JavaScript
      • WordCount example
  • Demo - Using HIVE
      • WordCount example
  • Demo - Using HIVE
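A Hive WordCount is commonly written by splitting each line into words with `split`, flattening the result with `explode`, and grouping by word. The sketch below only builds such an HQL string in Python and does not execute it; the table name `docs` and column name `line` are assumptions, not names taken from the demo.

```python
def wordcount_hql(table="docs", column="line"):
    """Build a WordCount query in HQL (table and column names are hypothetical)."""
    return (
        "SELECT word, COUNT(*) AS cnt\n"
        f"FROM (SELECT explode(split({column}, ' ')) AS word FROM {table}) w\n"
        "GROUP BY word\n"
        "ORDER BY word"
    )


query = wordcount_hql()
print(query)
```

The inner query plays the role of the map step (one row per word) and the `GROUP BY` plays the role of the reduce step, which is why Hive can compile this kind of statement into MapReduce jobs.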
  • Monitoring Job Results
      • In the portal
          – Main Console
              • Job icon (button) status summary
              • Job History
          – Interactive Console
              • JS quick feedback
              • JS detailed feedback (log)
      • Using RDP
          – Map/Reduce tool
  • Demo – Monitoring Job Status
  • Download – ODBC for HIVE
      • Includes add-in for Excel
  • Demo - Hadoop Connector to Excel
  • Connecting to PowerPivot
      • Create an ODBC connection to HIVE
      • Connect to ‘other data source’ in PowerPivot
  • Real-World – Hadoop and…
  • Hadoop To-Do List
      • BigData = Hadoop
          • Use Hadoop when business needs designate
          • Specialized use cases
          • Behavioral data
      • Hadoop on the cloud
          • Quick and cheap
          • dev, test, training environments
      • Hadoop access technologies
          • Learn Map/Reduce
          • Use HIVE via Excel
  • The Changing Data Landscape: Other Services, RDBMS, Hadoop
  • TeachingKidsProgramming.org
      • Do a Recipe
      • Teach a Kid (Ages 10+)
      • SmallBasic or Java
      • Free Courseware (recipes)
  • Toward Data Craftsmanship…
      • Follow me @LynnLangit
      • RSS my blog www.LynnLangit.com
      • Hire me
          • To help build your BI/Big Data solution
          • To teach your team next gen BI
          • To learn more about using NoSQL solutions