Hadoop on azure_july_2012

60 minute webcast for DevelopMentor - Hadoop on Azure

Usage Rights

© All Rights Reserved

  • http://hortonworks.com/blog/7-key-drivers-for-the-big-data-market/
  • http://hadoop.apache.org/
    http://en.wikipedia.org/wiki/Apache_Hadoop
  • http://www.oracle.com/technetwork/bdc/hadoop-loader/overview/index.html
    http://www.microsoft.com/download/en/details.aspx?id=27584
  • http://bigdatanerd.wordpress.com/2012/01/04/why-nosql-part-2-overview-of-data-modelrelational-nosql/
    http://docs.jboss.org/hibernate/ogm/3.0/reference/en-US/html_single/
  • http://stage.hypertable.com/index.php/documentation/architecture/
    http://code.google.com/appengine/
    http://code.google.com/appengine/articles/datastore/overview.html
  • Original reference: Tom White’s Hadoop: The Definitive Guide (I made some modifications based on my experience)
  • But…is it cheaper? It is, right now on AWS (i.e. MapReduce vs. RDS). However, pricing has not been announced for Hadoop on Azure.
  • https://www.hadooponazure.com/Account
    Demo - http://www.youtube.com/watch?v=ugi9C6s_sH4
  • http://www.infosysblogs.com/microsoft/2011/12/isotope_hadoop_on_windows_and.html
  • DBA tasks originally from SQL PASS Summit 2011, by Steve Jones, Editor, SQLServerCentral / Red Gate Software
  • https://www.hadooponazure.com/Account
    Demo - http://www.youtube.com/watch?v=XcHz8aUDDN8 and http://www.youtube.com/watch?v=c7oHntP8HBI
  • When the volume of data is too much for simple human interpretation -> Man PLUS Machine (Data Mining / Statistics)
  • http://www.microsoft.com/en-us/bi/default.aspx
    http://dennyglee.com/
    Demos - http://www.youtube.com/watch?v=djfpPsGwm6A and http://www.youtube.com/watch?v=uh9bKWO1K7U
  • Detailed info - http://dennyglee.com/2012/01/21/connecting-powerpivot-to-hadoop-on-azure-self-service-bi-to-big-data-in-the-cloud/
  • http://en.wikipedia.org/wiki/Apache_Hadoop
  • http://www.slideshare.net/mattaslett/mysql-vs-nosql-and-newsql-survey-results-13073043
  • http://www.monafoundation.org/project/Teaching-Kids-Programming/22

Hadoop on azure_july_2012 Presentation Transcript

  • 1. Hadoop on Azure
    BigData on the Azure platform
    @LynnLangit
  • 2. Hadoop = BigData?
    • HUGE hype factor in 2011 / 2012
    • Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license
    • Enables applications to work with thousands of nodes and petabytes of data
    • Was inspired by Google's MapReduce and Google File System (GFS) papers
  • 3. Oracle Loader for Hadoop
    SQL Server Connector for Hadoop
  • 4. Flavors of NoSQL
  • 5. Column Database
    Wide, sparse column sets
  • 6. RDBMS vs. Hadoop
                          Traditional RDBMS         Hadoop
    Data Size             Gigabytes (Terabytes)     Petabytes (Exabytes)
    Access                Interactive and Batch     Batch – NOT Interactive
    Updates               Read / Write many times   Write once, Read many times
    Structure             Static Schema             Dynamic Schema
    Integrity             High (ACID)               Low
    Scaling               Nonlinear                 Linear
    Query Response Time   Can be near immediate     Has latency (due to batch processing)
  • 7. What about the cloud?
  • 8. The reality…two pivots
  • 9. Demo - Setting up Your Cluster
  • 10. Cluster Allocation Process
  • 11. Working with Hadoop on Azure
    Tools / Languages
    • MapReduce
      • Map (query/format)
      • Reduce (aggregate)
    • plug-in for Eclipse (Java)
    • JavaScript
    • C# Streaming
    • Pig (ETL -- Java)
    • Hive (HQL Query)
    • HBase tables
    • Others
      • Mahout (analyze)
      • R (analyze)
  • 12. Tasks – DBA vs. Hadoop on Azure
    RDBMS                           Hadoop on Azure
    Import Data                     Upload Data using FTP or import via Sqoop
    Setup Security                  Setup Security
    Scale Compute (up or out)       Add child nodes to the cluster
    Perform a Backup                Monitor and replace failed nodes
    Restore a Database              n/a
    Clean up data via ETL           Execute a PIG job
    Create an Index – query tune    Write a HIVE query (HQL)
    Join Tables Together            Run MapReduce
    n/a                             Monitor and manage running MapReduce jobs
    Schedule a Job                  Schedule a (Cron) Job
    Run Database Maintenance        Monitor space and resources used
    Send an Email from SQL Server   Set up resource threshold alerts
    Manage License costs            Manage usage time charges
  • 13. Demo - Basic Administration
    Open Ports
  • 14. Demo - Basic Administration
    Connect via RDP
  • 15. NameNode Utility – Top Level
  • 16. NameNode Utility – Drill Down
  • 17. Demo - Basic Administration
    Configure connections to remote storage
  • 18. Configuring Upload from AWS S3
  • 19. Configuring Upload from Azure
  • 20. Using the Azure Storage Viewer
  • 21. Configuring Upload from DataMarket
  • 22. Asking Questions = MapReduce
  • 23. Samples
  • 24. Demo - MapReduce using Java
    • WordCount example using AWS S3 data
  • 25. Demo - MapReduce using C# Streaming
    • WordCount example
  • 26. Demo - MapReduce using JavaScript
    • WordCount example
  • 27. Demo - Using HIVE
    • WordCount example
  • 28. Demo - Using HIVE
  • 29. Monitoring Job Results
    • In the portal
      – Main Console
        • Job icon (button) status summary
        • Job History
      – Interactive Console
        • JS quick feedback
        • JS detailed feedback (log)
    • Using RDP
      – Map/Reduce tool
  • 30. Demo – Monitoring Job Status
  • 31. Download – ODBC for HIVE
    • Includes add-in for Excel
  • 32. Demo - Hadoop Connector to Excel
  • 33. Connecting to PowerPivot
    • Create an ODBC connection to HIVE
    • Connect to ‘other data source’ in PowerPivot
  • 34. Real-World – Hadoop and…
  • 35. Hadoop To-Do List
    BigData = Hadoop
    • Use Hadoop when business needs designate
      • Specialized use cases
      • Behavioral data
    Hadoop on the cloud
    • Quick and cheap
      • dev, test, training environments
    Hadoop access technologies
    • Learn Map/Reduce
    • Use HIVE via Excel
  • 36. The Changing Data Landscape
    Other Services, RDBMS, Hadoop
  • 37. TeachingKidsProgramming.org
    • Do a Recipe, Teach a Kid (Ages 10++)
    • SmallBasic or Java, Free Courseware (recipes)
  • 38. Toward Data Craftsmanship…
    • Follow me @LynnLangit
    • RSS my blog www.LynnLangit.com
    • Hire me
      • To help build your BI/Big Data solution
      • To teach your team next gen BI
      • To learn more about using NoSQL solutions
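
Slide 11 describes Map as "query/format" and Reduce as "aggregate", and slides 24 to 27 all demo WordCount. The code from the webcast is not part of this transcript, so what follows is only a minimal in-memory sketch of that pattern in Python. The function names and the sample input are assumptions made for illustration. It does not use the Hadoop API and runs in a single process, so it shows the shape of the computation rather than how a cluster distributes it.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, standing in for the framework's shuffle step."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: aggregate all counts emitted for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, not data from the webcast.
    sample = ["big data on azure", "hadoop on azure"]
    print(word_count(sample))
```

On a real cluster the map and reduce functions would run on different nodes and the framework would perform the grouping between them; here the `shuffle` helper plays that role.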