An introduction to Class Presentation by Damon A. Runion.docx

An introduction to Hadoop
Class Presentation by
Damon A. Runion
MIS 2321 - Spring 2017
Hello and welcome to An Introduction to Hadoop
Data Everywhere
“Every two days now we create as much information as we did
from the dawn of civilization up until 2003”
Eric Schmidt
then CEO of Google
Aug 4, 2010
Read this quote. That amount of data is something like 5 exabytes.
The Hadoop Project
Originally based on papers published by Google in 2003 and 2004
Hadoop started in 2006 at Yahoo!
Top-level Apache Foundation project
Large, active user base, user groups
Very active development, strong development team

One way to do that analysis is through Hadoop
Who Uses Hadoop?
Rackspace for log processing. Netflix for recommendations.
LinkedIn for social graph. SU for page recommendations.
Hadoop Components
Storage: HDFS
Self-healing, high-bandwidth clustered storage
Processing: MapReduce
Fault-tolerant distributed processing
HDFS cluster/healing. MapReduce
HDFS Basics
HDFS is a filesystem written in Java
Sits on top of a native filesystem
Provides redundant storage for massive amounts of data
Uses cheap(ish), unreliable computers

Let’s talk about HDFS
HDFS Data
Data is split into blocks and stored on multiple nodes in the cluster
Each block is usually 64 MB or 128 MB (configurable)
Each block is replicated multiple times (configurable)
Replicas are stored on different data nodes
Best suited to large files, 100 MB+
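To make the block arithmetic concrete, here is a minimal sketch of how a file might be cut into blocks and how replicas might be spread across nodes. It is an illustration only and does not call any Hadoop API. The function name `plan_blocks`, the node names, and the round-robin placement are assumptions made for this example; real HDFS uses its own placement policy. The replication factor of 3 is also just an assumed setting.

```python
import math

MB = 1024 * 1024


def plan_blocks(file_size, block_size=128 * MB, replication=3,
                nodes=("node1", "node2", "node3", "node4")):
    """Return a list of (block_index, block_length, replica_nodes).

    Hypothetical helper: placement is simple round-robin, which only
    guarantees that the replicas of one block land on different nodes.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")

    n_blocks = math.ceil(file_size / block_size)
    plan = []
    for i in range(n_blocks):
        # Every block is full-sized except possibly the last one.
        length = min(block_size, file_size - i * block_size)
        # Pick `replication` distinct nodes, starting at a rotating offset.
        replicas = [nodes[(i + r) % len(nodes)] for r in range(replication)]
        plan.append((i, length, replicas))
    return plan


if __name__ == "__main__":
    for index, length, replicas in plan_blocks(300 * MB):
        print(index, length // MB, "MB", replicas)
```

With a 128 MB block size, a 300 MB file becomes two full blocks plus one 44 MB block, and each block is stored on three different nodes.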
What is MapReduce?
MapReduce is a method for distributing a task across multiple nodes
Automatic parallelization and distribution
Each node processes data stored on that node (processing goes to the data, unlike databases, where data is brought to the query engine)
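The map and reduce idea can be sketched in plain Python with a word count. This is a single-process illustration of the pattern, not Hadoop's MapReduce API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the `partitions` dictionary, which stands in for data already stored on each node, are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(partitions):
    """Run map over each node's local lines, then shuffle and reduce."""
    mapped = []
    for node, lines in partitions.items():
        # In a real cluster this loop body would run on `node` itself,
        # next to the data; here it simply runs in sequence.
        for line in lines:
            mapped.extend(map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    partitions = {
        "node1": ["big data big cluster"],
        "node2": ["data everywhere"],
    }
    print(run_job(partitions))
```

The map step only ever looks at the lines held by one node, which is what lets the work be spread out. Only the small (word, count) pairs need to move between nodes for the reduce step.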
