Hadoop intro


Published on

"Big Data" Hadoop Introduction

Published in: Technology


  2. “Big Data” Hadoop Introduction
     Stefan Bauer
  3. A little about me…
     • Data Warehouse Administrator
         • Architect (logical/physical)
         • DBA (monitoring, space management, etc.)
         • SSIS Developer (build it… run it… support it)
         • SSAS/SSRS (performance tuning, supporting)
         • Performance monitoring (is it all working?)
         • I am a geek (some people have pointed that out about me… judge for yourself)
  4. What we will cover
     • Why do you care (or at least why you should)?
     • General overview
     • Basic terms (get us on the same page)
     • A look at some of the technology (aka demo)
     • All of the technical parts are in a multi-part series on my blog
  5. What kind of data do you sort through?
     • Interesting technology… might not be for you
     • Getting there… might be something interesting to start working out the details…
     • You have big data… and you know it!
  6. What is that Hadoop thing I keep hearing about?
     • A framework (collection of technologies)
     • Complex processing
     • Massively parallel
     • Large amounts of data
     • Commodity hardware
  7. Hadoop… what it is not
     • Ad hoc analytics
     • Low latency between data arrival, analysis, and query usage
     • “Fast” (speed is a relative thing)
         • Facebook has interactive queries on the Hadoop framework
     • Good for small data
  8. 8. Terms Cloud Cluster Hadoop Hadoop Distributed File System (HDFS) Hue (Web Interface for Mapreduce/Oozie) Mapreduce  Job Tracker  Task Trackers (on Data Nodes) Oozie (Workflow Management)
  9. 9. Terms Pig (Distributed Transformation Scripting) Beeswax (Wrapper for Hive) Hive  EDW on (10’s, 100’s, 1000’s servers)  HiveQL (Based on Ansi SQL)  Reporting Tools/Business Analytics Name Node  Data Nodes Zookeeper (Distributed Configuration Management) Cloudera/MapR/Amazon/Hortonworks …
  10. HDFS
  11. Cloudera
  12. Hive
  13. Questions?
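
The MapReduce term from the slides can be illustrated with a minimal word-count sketch. This is not taken from the deck or its demo: it runs in a single Python process and only mimics the map, shuffle, and reduce steps. On a real cluster the Job Tracker would hand map and reduce tasks to Task Trackers on the Data Nodes. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen for illustration; they are not part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    On a cluster the framework does this between map and reduce;
    here it is an in-memory dictionary (an assumption of this sketch).
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the list of counts for one word into a total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big cluster", "data nodes store data"]
    print(word_count(sample))
```

Each `map_phase` call depends only on its own line, which is what lets a framework like Hadoop spread the map work across many commodity machines and merge the results afterwards.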