Hadoop intro

stef-bauer.com/2012/12/10/you-need-a-zetta-what

“Big Data”

Hadoop Introduction

Stefan Bauer

A little about me…

• Data Warehouse Administrator
• Architect (logical/physical)
• DBA (monitoring, space management, etc.)
• SSIS Developer (build it… run it… support it)
• SSAS/SSRS (performance tuning, supporting)
• Performance monitoring (is it all working?)
• I am a geek (some people have pointed that out about me… judge for yourself)

What we will cover
• Why do you care (or at least, why should you)?
• General overview
• Basic terms (to get us on the same page)
• A look at some of the technology (aka demo)

• All of the technical parts are in a multi-part series on my blog

What kind of data do you sort through?

Interesting technology… might not be for you

Getting there… might be something interesting to start working out the details…

You have big data… and you know it!

What is that Hadoop thing I keep hearing about?
• A framework (collection of technologies)
• Complex processing
• Massively parallel
• Large amounts of data
• Commodity hardware
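The bullets above describe Hadoop's processing model, MapReduce: a map step runs in parallel over chunks of the input, the framework groups the intermediate (key, value) pairs by key (the shuffle), and a reduce step combines each group. Below is a single-process Python illustration of that flow using the classic word-count example. It assumes nothing about the Hadoop API, the function names are hypothetical, and on a real cluster each map and reduce call would run as a separate task on a different machine.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group every emitted value under its key, sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key (here, a sum)."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # On Hadoop each line (input split) could be mapped on a different node;
    # here everything runs sequentially in one process.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "hadoop processes big data"]
    print(word_count(docs))
    # {'big': 3, 'data': 2, 'hadoop': 1, 'is': 1, 'processes': 1}
```

The mapper and reducer never see each other's data directly; only the grouped pairs pass between them, which is what lets the framework spread both steps across many commodity machines.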

Hadoop… what it is not

• Ad hoc analytics
• Low latency between data arrival, analysis, and query usage
• “Fast” (speed is a relative thing)
  - Facebook does run interactive queries on the Hadoop framework
• Good for small data

Terms
• Cloud
• Cluster
• Hadoop
• Hadoop Distributed File System (HDFS)
• Hue (web interface for MapReduce/Oozie)
• MapReduce
• Job Tracker
• Task Trackers (on Data Nodes)
• Oozie (workflow management)
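On a Hadoop 1.x cluster, the Job Tracker hands map tasks to the Task Trackers, and it prefers a Task Tracker running on a Data Node that already holds a copy of the task's input block, so the computation moves to the data instead of the data moving over the network. Below is a simplified, hypothetical Python sketch of that preference as a greedy loop. The real scheduler is driven by Task Tracker heartbeats and also weighs rack locality, so treat this as an illustration only; the block and node names are made up.

```python
from typing import Dict, List


def assign_map_tasks(
    block_replicas: Dict[str, List[str]],
    free_slots: Dict[str, int],
) -> Dict[str, str]:
    """Assign one map task per input block to a Task Tracker node.

    Hypothetical, simplified greedy scheduler: prefer a node that already
    stores a replica of the block (data-local); otherwise fall back to any
    node with a free slot (the block is then read over the network).
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignment: Dict[str, str] = {}
    for block, replicas in block_replicas.items():
        local = [node for node in replicas if slots.get(node, 0) > 0]
        if local:
            chosen = local[0]
        else:
            remote = [node for node, free in slots.items() if free > 0]
            if not remote:
                raise RuntimeError(f"no free map slot left for {block}")
            chosen = remote[0]
        slots[chosen] -= 1
        assignment[block] = chosen
    return assignment


if __name__ == "__main__":
    # Which Data Nodes hold a replica of each block (made-up example data).
    replicas = {
        "blk_1": ["dn1", "dn2"],
        "blk_2": ["dn2", "dn3"],
        "blk_3": ["dn1", "dn3"],
    }
    # Free map slots per Task Tracker; dn3 is busy, dn4 holds none of the blocks.
    slots = {"dn1": 1, "dn2": 1, "dn3": 0, "dn4": 1}
    print(assign_map_tasks(replicas, slots))
    # {'blk_1': 'dn1', 'blk_2': 'dn2', 'blk_3': 'dn4'}
```

Here blk_1 and blk_2 run data-local, while blk_3 has to run on dn4 and pull its block across the network because both nodes holding it are out of slots.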

Terms
• Pig (distributed transformation scripting)
• Beeswax (wrapper for Hive)
• Hive
  - EDW on 10s, 100s, or 1000s of servers
  - HiveQL (based on ANSI SQL)
• Reporting tools/business analytics
• Name Node
• Data Nodes
• ZooKeeper (distributed configuration management)
• Cloudera/MapR/Amazon/Hortonworks …
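HDFS stores a large file as fixed-size blocks (64 MB by default in Hadoop 1.x, 128 MB in Hadoop 2.x) and keeps several copies of each block (3 by default) on different Data Nodes; the Name Node holds only the metadata that maps each block to the Data Nodes storing it. Below is a toy Python model, not HDFS code, that splits a file size into blocks and builds such a block map with round-robin placement. Real HDFS placement is rack-aware, and the function names are hypothetical.

```python
from typing import Dict, List

MB = 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = 64 * MB) -> List[int]:
    """Return the size in bytes of each HDFS-style block of a file.

    Every block is `block_size` bytes except possibly the last, which holds
    whatever is left over.
    """
    if file_size <= 0:
        return []
    count = (file_size + block_size - 1) // block_size  # integer ceiling
    last = file_size - block_size * (count - 1)
    return [block_size] * (count - 1) + [last]


def build_block_map(
    num_blocks: int, data_nodes: List[str], replication: int = 3
) -> Dict[int, List[str]]:
    """Toy Name Node block map: block index -> Data Nodes holding a replica.

    Replicas go to `replication` distinct nodes chosen round-robin
    (a simplification; real HDFS placement is rack-aware).
    """
    if replication > len(data_nodes):
        raise ValueError("replication factor exceeds the number of Data Nodes")
    n = len(data_nodes)
    return {
        b: [data_nodes[(b + i) % n] for i in range(replication)]
        for b in range(num_blocks)
    }


if __name__ == "__main__":
    sizes = split_into_blocks(200 * MB)
    print([s // MB for s in sizes])  # [64, 64, 64, 8]
    print(sum(sizes) * 3 // MB)      # 600 (MB of raw storage at replication 3)
    print(build_block_map(len(sizes), ["dn1", "dn2", "dn3", "dn4"]))
```

For a 200 MB file with 64 MB blocks this gives four blocks (64 + 64 + 64 + 8 MB). At replication 3 the cluster stores 600 MB of raw data, because the short last block only occupies its actual size on disk.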

Questions?

Stef-Bauer.com

@stefbauer

Stef_Bauer@hotmail.com
