4. 100% Open Source – Democratized Access to Data
The leaders of Hadoop’s development
We do Hadoop
Drive Innovation in the platform – We lead the roadmap
Community driven, Enterprise Focused
5. We do Hadoop successfully.
Support
Training
Professional Services
6. Enter the Hadoop. ………
http://www.fabulouslybroke.com/2011/05/ninja-elephants-and-other-awesome-stories/
7. Hadoop was created because traditional technologies never cut it for the Internet properties like Google, Yahoo, Facebook, Twitter, and LinkedIn
8. Traditional architecture didn’t scale enough…
[Diagram: rows of App servers on top of groups of three DB servers, each group backed by a SAN]
12. If you could design a system that would handle this, what would it look like?
13. It would probably need a highly resilient, self-healing, cost-efficient, distributed file system…
[Diagram: grid of nine Storage nodes]
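As a rough illustration of what "resilient" and "self-healing" could mean for the file system on this slide, here is a minimal Python sketch. It assumes three replicas per block (the HDFS default) and nine storage nodes as in the diagram. The node names, block names, and the `place_blocks` and `heal` helpers are hypothetical, and real placement policies (rack awareness, for example) are not modeled.

```python
import random

REPLICATION = 3  # assumed replication factor for this sketch


def place_blocks(blocks, nodes, replication=REPLICATION, seed=0):
    """Assign each block to `replication` distinct storage nodes."""
    rng = random.Random(seed)
    return {b: set(rng.sample(nodes, replication)) for b in blocks}


def heal(placement, failed, live_nodes, replication=REPLICATION, seed=1):
    """Forget the failed node and copy under-replicated blocks elsewhere."""
    rng = random.Random(seed)
    healed = {}
    for block, holders in placement.items():
        holders = holders - {failed}  # replicas on the failed node are lost
        candidates = [n for n in live_nodes if n not in holders]
        while len(holders) < replication and candidates:
            pick = rng.choice(candidates)
            candidates.remove(pick)
            holders.add(pick)  # stands in for copying the block to `pick`
        healed[block] = holders
    return healed


# Nine storage nodes, matching the diagram; names are made up.
nodes = [f"node{i}" for i in range(1, 10)]
placement = place_blocks(["blk_a", "blk_b", "blk_c"], nodes)

# Simulate losing node1, then restore full replication on the survivors.
live = [n for n in nodes if n != "node1"]
healed = heal(placement, "node1", live)

for block, holders in sorted(healed.items()):
    print(block, sorted(holders))
```

Because every block lives on several inexpensive nodes, losing one machine costs no data, and the system can restore the replica count on its own.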
14. It would probably need a completely parallel processing framework that took tasks to the data…
[Diagram: grid of nine nodes, each pairing Processing with Storage]
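The idea of taking tasks to the data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `blocks_on_node` layout, `map_task`, and `run_job` are hypothetical names, and the loop runs sequentially where a real framework would run the per-node tasks in parallel.

```python
from collections import Counter

# Hypothetical layout: which node stores which block of text.
blocks_on_node = {
    "node1": ["big data big"],
    "node2": ["data moves slowly"],
    "node3": ["big clusters"],
}


def map_task(block):
    """Runs where the block lives: count words without moving the block."""
    return Counter(block.split())


def run_job(layout):
    """Send one map task per block to its node, then merge the small results."""
    partials = []
    for node, blocks in layout.items():
        for block in blocks:
            # Only the compact partial counts leave the node, not the raw data.
            partials.append((node, map_task(block)))
    total = Counter()
    for _, partial in partials:
        total.update(partial)  # reduce step: sum the per-block counts
    return partials, total


partials, total = run_job(blocks_on_node)
print(total["big"], total["data"])  # prints "3 2"
```

The design choice shown here is that the small piece of code travels to each node, and only the much smaller partial results travel back over the network.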
15. It would probably run on commodity hardware, virtualized machines, and common OS platforms
[Diagram: grid of nine nodes, each pairing Processing with Storage]
16. It would probably be open source so innovation could happen as quickly as possible
24. The Sandbox is ‘Hadoop in a Can’. It contains one copy of each of the Master and Worker node processes used in a cluster, only in a single virtual node.
[Diagram: a cluster of Processing and Storage nodes, with a single Linux VM holding one Processing and one Storage unit]
25. Getting started with Sandbox VM:
- Pick your flavor of VM at… http://www.hortonworks.com/sandbox
- Start the sandbox VM
- Find the IP displayed
- Go to… http://172.16.130.131
- Register
- Click on ‘Start Tutorials’
- On the left-hand nav, click on ‘HCatalog, Basic Pig & Hive Commands’
26. In this tutorial we will:
- Land files in HDFS
- Assign metadata with HCatalog
- Use SQL with Hive
- Learn to process data with Pig