The document discusses the pros and cons of using public cloud computing services versus hosting infrastructure internally for a new startup. Some advantages mentioned include flexibility, avoiding large upfront capital expenditures, and the ability to scale resources up and down as needed. Disadvantages include public cloud services becoming more expensive than internal hosting at large scale, inefficient resource ratios for some workloads, and high costs for intensive disk and SSD usage. The document aims to provide considerations for a startup evaluating whether to use public cloud services.
What if you could get blazing fast queries on your data without having to be on call for a giant, expensive database? By picking the right file format for your data, you can store your data on disk in the cloud and still get the performance you need for modern analytics. We'll discuss benchmarks of four different data storage formats: Parquet, ORC, Avro, and traditional character-separated files like CSV. We'll cover what they are, how they work at a bits-and-bytes level, and why you might choose each one for your use case.
FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastru... – Mohamed Sayed
Using Open Source and Cloud Computing principles, these slides walk through the architectural patterns for building scalable cloud services. The second part of the presentation focuses on profiling common geolocation tasks like importing large datasets and rendering map tiles.
Top 5 mistakes when writing Spark applications – markgrover
This is a talk given at the Advanced Spark meetup in San Francisco (http://www.meetup.com/Advanced-Apache-Spark-Meetup/events/223668878/). It focuses on common mistakes when writing Spark applications and how to avoid them.
AWS vs Azure vs Google Cloud Storage Deep Dive – RightScale
Cloud services keep evolving, and cloud storage is no different. It can be difficult to keep up to date with the latest from each cloud provider and understand how they compare. We’ll drill down on object, block, archival, and file storage for the leading public clouds. We’ll also compare prices for a variety of storage scenarios.
AWS Summit London 2014 | Maximising EC2 and EBC Performance (400) – Amazon Web Services
This advanced technical session is ideal for customers looking to maximise the performance of AWS Elastic Block Store (EBS) storage to support workloads with demanding IO performance requirements. If you need to run high-IO workloads on EBS, such as NoSQL or RDBMS systems, attend this session to find out how to optimise your EBS configuration to enable this.
All these large data sets are so big that it's difficult to manage them with traditional tools. Distributed computing is an approach to solving that problem! First the data needs to be mapped, then it can be analyzed or reduced.
Day 3 - Maintaining Performance & Availability While Lowering Costs with AWS – Amazon Web Services
AWS provides you several pricing options that can help you significantly reduce your overall IT cost, including On-Demand Instances, Spot Instances, and Reserved Instances. This session covers high-level architectures and when to use and not to use each of the pricing models for components of those architectures. We walk through several customer examples to illustrate when to use each pricing option. Additionally, we walk through tools that may be useful to determine when to use each pricing model. This session is aimed at technically savvy managers and engineers who need to reduce their cloud spending.
Reasons to attend:
- Learn about Reserved Instances, On-Demand Instances and Spot Instances.
- Discover ways of running more for less in Amazon EC2.
- If you are already running a workload in AWS, attend this webinar to learn how to run the same workload at reduced costs.
WKS404 7 Things You Must Know to Build Better Alexa Skills – Amazon Web Services
As we add thousands of skills to the skills store, our developers have uncovered some basic and more complex tips for building better skills. Whether you are new to Alexa skill development or have created skills that are live today, this session will help you understand and learn best practices. During this session, you'll build an Alexa skill using more advanced VUI concepts, and we'll cover how to use AWS services like DynamoDB and S3 to implement the best practices we cover.
How to scale up, out or down in Windows Azure - Webinar – Common Sense
Webinar presented on Jan 26th 2011 by Juan De Abreu.
Learn how to achieve:
• Scalability: linear scale, scale up vs. scale out, choosing VM sizes
• Storage Cache
• Elasticity: scale out, scale back, and automation of scaling
Intended for: CIOs, CTOs, IT Managers, IT Developers, Lead Developers
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au... – Amazon Web Services
Running your Amazon EC2 instances in Auto Scaling groups allows you to improve your application's availability right out of the box. Auto Scaling replaces impaired or unhealthy instances automatically to maintain your desired number of instances (even if that number is one). You can also use Auto Scaling to automate the provisioning of new instances and software configurations, as well as to track usage and costs by app, project, or cost center. Of course, you can also use Auto Scaling to adjust capacity as needed - on demand, on a schedule, or dynamically based on demand. In this session, we show you a few of the tools you can use to enable Auto Scaling for the applications you run on Amazon EC2.
Peek behind the scenes to learn about Amazon ElastiCache's design and architecture. See common design patterns of our Memcached and Redis offerings and how customers have used them for in-memory operations and achieved improved latency and throughput for applications. During this session, we review best practices, design patterns, and anti-patterns related to Amazon ElastiCache.
(SDD403) Amazon RDS for MySQL Deep Dive | AWS re:Invent 2014 – Amazon Web Services
Learn about architecting a highly available RDS MySQL implementation to support your high-performance applications and production workloads. We will also talk about best practices in the areas of security, storage, compute configurations, and management that will contribute to your success with Amazon RDS for MySQL. In addition, you will learn about how to effectively move data between Amazon RDS and on-premises instances.
Planning a successful private cloud - CloudStack Collaboration Europe 2013 – Tim Mackey
So your boss just asked you to build a private cloud. Now what? Successful private clouds require a bit of planning, and your existing best practices may need to be adjusted. This deck covers some of the issues you'll face, or need to be aware of, as you migrate from an existing data center operation to one which is more "cloud-like". Some things may seem obvious, but there are aspects of network and storage design which impact success. This deck draws from my experience building my first CloudStack cloud in early 2012 and has applicability to anyone seeking to deliver cloud services.
Slides from the second meeting of the Toronto High Scalability Meetup @ http://www.meetup.com/toronto-high-scalability/
-Basics of High Scalability and High Availability
-Using a CDN to Achieve 99% Offload
-Caching at the Code Layer
Mapping Life Science Informatics to the Cloud – Chris Dagdigian
Infrastructure cloud platforms such as those offered by Amazon Web Services are not designed and built with scientific research as the primary use case. These presentation slides cover the current state of mapping life science research and HPC techniques onto "the cloud" and how to work around the common engineering, orchestration, and data movement problems.
[Note: I've replaced the 2011 version of this talk deck with a slightly updated version as delivered at the AIRI Petabyte Challenge Meeting]
Leveraging Databricks for Spark Pipelines – Rose Toomey
How Coatue Management saved time and money by moving Spark pipelines to Databricks.
Talk given at AWS + Databricks ML Dev Day workshop in NYC on 27 February 2020.
You host Relativity data in SQL Server, and you can't just go buy a new server. You need to figure out where your server's bottleneck is, and find the easiest workaround to make it go faster. Microsoft Certified Master Brent Ozar will show you how to check your SQL Server's vital stats with free tools, and then determine whether you need indexes, memory, or different config settings.
Amazon Redshift in Action: Enterprise, Big Data, and SaaS Use Cases (DAT205) ... – Amazon Web Services
Since Amazon Redshift launched last year, it has been adopted by a wide variety of companies for data warehousing. In this session, learn how customers NASDAQ, HauteLook, and Roundarch Isobar are taking advantage of Amazon Redshift for three unique use cases: enterprise, big data, and SaaS. Learn about their implementations and how they made data analysis faster, cheaper, and easier with Amazon Redshift.
My Site is slow - Drupal Camp London 2013 – hernanibf
Drupal is a powerful and flexible tool for creating web applications without building everything from scratch. This ability can lead developers to build complex websites without understanding what Drupal is doing behind the scenes.
The majority of Drupal performance talks focus on aspects like infrastructure changes, caching strategies, or comparisons between modules and architectures. Unfortunately, when performance problems occur, development teams often follow strategies of replacing different parts of the platform, looking only at standard suspects like slow queries, without understanding and profiling the real problem.
Most of the time it is essential to measure and analyze what the application is actually doing in order to understand the real problems. Drupal is a platform used by millions of websites worldwide, and its performance can in most cases be compared once measured.
At Acquia we do dozens of performance assessments per year, and even though we find the same problems across most clients, we often encounter situations that can only be detected by measuring and analyzing a profiler report.
In this session, I will explain how to detect performance problems by looking at simple data, from logs to profiler output, and provide some useful targets that can be analyzed to understand what is causing a site's unusually bad performance.
Mtc learnings from isv & enterprise interaction – Govind Kanshi
This is one of the dated presentations I keep getting requests for; please do reach out to me for the status of various things, as Azure keeps fixing and innovating a whole range of things every day.
There are a bunch of other things I can help you with to ensure you can take advantage of the Azure platform for OSS, .NET frameworks, and databases.
Mtc learnings from isv & enterprise (dated - Dec 2014) – Govind Kanshi
This is a slightly dated deck of our learnings - I keep getting multiple requests for it. I have removed one slide about access permissions (RBAC - which is now available).
Drupal is a powerful and flexible platform for building websites with rich functionality without building almost anything from scratch. This flexibility, brought by the use of a powerful framework and the work of a super-active community, can keep people from understanding what Drupal is doing behind the scenes.
Most performance talks regarding Drupal focus on aspects like infrastructure changes, caching strategies, and comparisons of performance between modules or platforms. Unfortunately, when performance problems occur, development teams follow strategies of replacing various parts of their platforms and jump directly to looking for slow queries before really trying to understand where the bottleneck is.
However, most of the time what really needs to be done is to look at what the application is doing and understand why it is taking so long. Drupal is a platform used by millions of websites worldwide, and its performance is easy to measure and compare.
At Acquia we have done dozens of performance assessments, and even though we usually face the same problems, we sometimes find unusual situations that can only be detected when measured. Measuring and profiling is the only way to understand performance problems in a site and provide valid fixes.
In this talk I will explain how to detect performance problems in Drupal, using simple modules like Devel, profilers like XHProf, and logs to understand the impact on the application.
Bursting into the public Cloud - Sharing my experience doing it at large scal... – Igor Sfiligoi
When compute workflow needs spike well in excess of the capacity of a local compute resource, capacity should be temporarily provisioned from somewhere else to both meet deadlines and to increase scientific output. Public Clouds have become an attractive option due to their ability to be provisioned with minimal advance notice. I have recently helped IceCube expand their resource pool by a few orders of magnitude, first to 380 PFLOP32s for a few hours and later to 170 PFLOP32s for a whole workday. In the process we moved O(50 TB) of data to and from the clouds, showing that networking is not a limiting factor, either. While there was a non-negligible dollar cost involved with each, the effort involved was quite modest. In this session I will explain what was done and how, alongside an overview of why IceCube needs so much compute.
Where to start? - the first 2 hours of performance troubleshooting
• The performance cheat sheet: cover all the basics before you start
• Data collections and mining the logs
• Common techniques to improve performance
UiPath Test Automation using UiPath Test Suite series, part 3 – DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... – Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes real work. It takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at every stage.
DevOps and Testing slides at DASA Connect – Kari Kakkonen
Slides by me and Rik Marselis at the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We also held a lovely workshop with the participants, trying to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Key Trends Shaping the Future of Infrastructure.pdf – Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Accelerate your Kubernetes clusters with Varnish Caching – Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Essentials of Automations: Optimizing FME Workflows with Parameters – Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... – UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
GraphRAG is All You Need? LLM & Knowledge Graph – Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality – Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
JMeter webinar - integration with InfluxDB and Grafana – RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... – James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
To Cloud or Not To Cloud?
1. To Cloud or Not to Cloud?
Greg Lindahl, CTO
@glindahl – greg@blekko.com
2. About Us
• Web-scale search engine with our own crawl & index
• Public launch, November 2010
• $60M raised
• 800 servers, 16 PB spinning rust, ½ PB flash disk
5. The wiring diagram
[Architecture diagram labeling the components: Web Crawler, Extractor, Ranker, Indexer, Lookup, Query Analyzer, Front End, Query, SERP, DIG, KB.]
6. Hijacking a meetup topic
• Original topic was "virtualization or not"
• But really, virtualization is an implementation detail these days
– cloud => virtual
– virtual => public or private cloud (probably)
• This talk: Public cloud vs. not
• I'm trying to list a bunch of things that you should think about … your situation probably differs from mine
7. The question
• It's 2007, and your CEO asks you: Should our new startup use this newfangled cloud computing stuff or not?
8. Why cloud at all?
• Flexible
– prototyping & development
– testing at scale
– scale up for high usage and back down later
• Turns CapEx into OpEx
– startups prefer paying over time
– "money tomorrow is cheaper than money today", if you're successful
{btw, plenty of banks will loan against equipment.}
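One way to make "money tomorrow is cheaper than money today" concrete is to discount a stream of cloud payments against an upfront hardware purchase. The sketch below is illustrative only; the deck gives no prices, so every figure and discount rate is an assumption.

```python
# Illustrative only: assumed figures, not from the talk.
# Compare an upfront server purchase with renting the same capacity monthly,
# discounting future payments at a startup's (often high) cost of capital.

def present_value(monthly_payment, months, annual_discount_rate):
    """Sum of discounted monthly payments."""
    r = annual_discount_rate / 12
    return sum(monthly_payment / (1 + r) ** m for m in range(1, months + 1))

capex_upfront = 300_000    # hypothetical: buy the servers today
cloud_monthly = 10_000     # hypothetical: rent comparable capacity
months = 36

for rate in (0.0, 0.20, 0.50):
    pv = present_value(cloud_monthly, months, rate)
    print(f"discount rate {rate:.0%}: PV of {months} months of cloud = ${pv:,.0f} "
          f"(vs ${capex_upfront:,} up front)")
```

With these made-up numbers, buying wins at a 0% discount rate ($300k vs. $360k), but as the cost of capital climbs to startup levels the discounted cloud spend comes out cheaper than the upfront purchase.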
9. Cloud win examples
• CommonCrawl.org has a web crawl dataset on EC2
– Map/Reduce job to read the whole thing is ~$50
• Fewer ops people is actually true
• Your company can change direction
10. OK, so what's bad?
• Examine the curve of Amazon's pricing over time and per volume
• People think it's a low-priced product, but it's not.
• It's value priced.
• Not enough competition, yet, to really drive Amazon's margins down
• This is good for Amazon, maybe not for you.
11. 6 Reasons to not use Amazon
• Economy of scale in your favor?
• Your max::min ratio is not large enough
• Cloud IOPs are expensive
• Data is heavy if you use a lot of local disk
• SSDs are overpriced
• Ratio of disk capacity or bandwidth :: ssd :: memory :: compute may not be ideal for you
12. Economy of scale
• "Amazon has 100s of thousands of servers, so they can run them cheaper than I can."
• But:
– you pay retail, not wholesale price
– there are diminishing returns with size
• At some point, it's cheaper to do it yourself
• 100 servers? 50 servers?
{ blekko had 700 at launch… }
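A rough break-even sketch for the retail-vs-wholesale point. Every price and ratio here is an assumption made up for the example, not a figure from the talk.

```python
import math

# Assumed numbers for illustration only.
def self_hosted_monthly(n_servers,
                        server_price=6_000,       # purchase price, amortized
                        amortize_months=36,
                        colo_per_server=100,      # power, space, network per month
                        ops_salary_monthly=12_000,
                        servers_per_op=200):
    hardware = n_servers * server_price / amortize_months
    colo = n_servers * colo_per_server
    ops = max(1, math.ceil(n_servers / servers_per_op)) * ops_salary_monthly
    return hardware + colo + ops

def cloud_monthly(n_servers, instance_per_month=400):   # assumed retail rate
    return n_servers * instance_per_month

for n in (10, 50, 100, 300, 700):
    print(f"{n:>4} servers: self-hosted ${self_hosted_monthly(n):>9,.0f}/mo, "
          f"cloud ${cloud_monthly(n):>9,.0f}/mo")
```

With these assumptions the crossover lands somewhere between 50 and 100 servers, which is the flavor of the "100 servers? 50 servers?" question; your own prices will move it.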
13. Your max::min ratio is not big enough
• Maybe you use 100x as many servers some days?
– Cloud is for you!
• How long do your usage spikes last?
• Can you predict them far enough in advance?
• How long does it take you to spin up a new node?
{blekko's day::night is only 2x}
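A minimal sketch of the max::min argument, with assumed hourly rates (none of these numbers come from the talk): owning means paying for peak capacity around the clock, while on-demand cloud pays a higher rate only while the spike lasts.

```python
# Assumed rates, for illustration only.
def owned_cost(peak_servers, hourly_equivalent=0.15, hours=720):
    # Own enough for the peak; pay for it 24/7 whether you use it or not.
    return peak_servers * hourly_equivalent * hours

def cloud_cost(base_servers, peak_servers, spike_hours,
               hourly_rate=0.50, hours=720):
    # Retail on-demand costs more per hour, but the extra servers run briefly.
    return (base_servers * hours
            + (peak_servers - base_servers) * spike_hours) * hourly_rate

base = 10
for ratio in (2, 10, 100):          # blekko's day::night ratio is only ~2x
    peak = base * ratio
    print(f"max::min {ratio:>3}x: owned ${owned_cost(peak):>9,.0f}/mo, "
          f"cloud ${cloud_cost(base, peak, spike_hours=24):>9,.0f}/mo")
```

At a 2x ratio the always-on fleet wins; at 100x the on-demand spike is far cheaper, which is exactly the "Cloud is for you!" case.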
14. Cloud IOPs are expensive
• I/O Operations are expensive to start with
– "spinning rust" disks only seek so much
• Networked storage has low bandwidth compared to 10 attached disks
– 1 Gbyte/sec sustained – woah!
• Networked disks are more expensive than local
– better failure behavior, whether I want it or not
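Rough arithmetic behind the bandwidth comparison. The per-disk figures below are typical 7200 rpm values assumed for illustration; only the "~1 Gbyte/sec from 10 attached disks" number echoes the slide.

```python
# Assumed per-disk figures; adjust for your hardware.
disks = 10
seq_mb_per_s_per_disk = 100     # sequential throughput of one spinning disk
iops_per_disk = 150             # random seeks per second per disk

local_bandwidth_mb = disks * seq_mb_per_s_per_disk   # ~1 Gbyte/sec aggregate
local_iops = disks * iops_per_disk

network_gbit = 1                                      # path to networked storage
network_bandwidth_mb = network_gbit * 1000 / 8        # ~125 MB/s ceiling

print(f"10 attached disks: ~{local_bandwidth_mb} MB/s, ~{local_iops} IOPS")
print(f"1 Gbit network path: ~{network_bandwidth_mb:.0f} MB/s, "
      f"{local_bandwidth_mb / network_bandwidth_mb:.0f}x less bandwidth")
```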
15. Data is heavy if you use a lot of local disk
• I mean: it takes a loooooong time to copy a few tbytes of data onto your local disk over the network
– 1 gigabit: ½ tbyte/hour
– 10 gigabit: 5 tbytes/hour
– even filling your ½ tbyte SSD is kinda slow
• Slow spin-up/down of nodes hurts your ability to flex up and down
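The slide's transfer rates fall out of simple arithmetic; the sketch below reproduces them (the 90% link-efficiency factor is an assumption).

```python
# Usable throughput of a network link in TB/hour, and the time to stage data
# onto a freshly provisioned node.
def tb_per_hour(link_gbit, efficiency=0.9):
    bytes_per_sec = link_gbit * 1e9 / 8 * efficiency
    return bytes_per_sec * 3600 / 1e12

for link_gbit in (1, 10):
    rate = tb_per_hour(link_gbit)
    print(f"{link_gbit:>2} Gbit/s ≈ {rate:.1f} TB/hour; "
          f"copying 4 TB takes ≈ {4 / rate:.1f} hours")
```

That is roughly the ½ tbyte/hour and 5 tbytes/hour the slide quotes, and it is why nodes that hold a lot of local data cannot spin up or down quickly.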
16. SSDs are overpriced (by cloud providers)
• SSDs are completely awesome for read-heavy analytics queries
• SSDs wear out with writes
• No cloud provider charges a fee for writes?
• Instead, they assume all their customers are average
• … and so they charge way too much to customers who are smart about not writing too much
{ blekko is great at not writing to our SSDs }
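A back-of-the-envelope wear calculation, with assumed endurance ratings and write rates (none of these numbers are from the talk), showing why flat SSD pricing overcharges write-light customers.

```python
# Assumed: a 0.5 TB drive rated at 1 drive-write-per-day over a 5-year warranty.
def years_to_wear_out(capacity_tb, dwpd_rating, warranty_years, writes_tb_per_day):
    endurance_tb = capacity_tb * dwpd_rating * 365 * warranty_years
    return endurance_tb / writes_tb_per_day / 365

workloads = [("write-light, read-mostly analytics", 0.05),   # TB written per day
             ("write-heavy, constant ingest", 2.0)]
for label, tb_per_day in workloads:
    years = years_to_wear_out(0.5, 1, 5, tb_per_day)
    print(f"{label}: ~{years:.1f} years to reach rated endurance")
```

Under one flat price, the write-light customer (decades of headroom) subsidizes the write-heavy one (drive worn out in about a year).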
17. Ratios available might not fit your usage
• Amazon tries pretty hard:
– high memory, high-CPU, GPU, high I/O, high-storage
– weirder ones are less flexible
• It's still easy to not fit into that set of cookie cutters
• Not fitting == wasted money
– idle resources that you've paid for
– moves the break-even point to smaller node count
{ blekko crawler nodes: 10 local disks (capacity, bandwidth, seeks), 2 ssds, 96 gigs ram}
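An illustrative sketch of "not fitting == wasted money". The workload and instance shape below are hypothetical: the scarcest dimension sets the instance count, and everything else you paid for sits idle.

```python
import math

workload = {"disk_tb": 20, "ram_gb": 96, "cores": 8}    # what one of your nodes needs
instance = {"disk_tb": 2,  "ram_gb": 64, "cores": 16}   # closest cloud shape on offer

count = max(math.ceil(workload[k] / instance[k]) for k in workload)
print(f"instances required: {count}")
for k in workload:
    provided = count * instance[k]
    idle = 100 * (1 - workload[k] / provided)
    print(f"  {k:>8}: need {workload[k]}, get {provided} ({idle:.0f}% idle)")
```

Here a disk-heavy workload (modeled loosely on the crawler-node shape above) forces 10 instances, leaving roughly 85% of the RAM and 95% of the cores you are paying for unused.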
18. So…
• For us, it was easy to predict the right answer
• Our SWAG for launch day was 600 servers
– and our entire index in SSD
– and we can't scale down from that
• Amazon wasn't renting SSDs yet
• If you're going to run your own servers, you need to start early
19. How about you?
• RT analytics is a complicated subject
• Two main thrusts
– Pre: pre-compute aggregate numbers, query those
– Mem: stick a subset of your big data that fits into ram or ssd, do complicated queries against those
{ blekko only does Pre }
20. Pre
• Needs to be wired into your stream of data generation, e.g. your webserver
• Summary data can be pretty small
• Doesn't really matter where you put it
• Not much impact on the cloud/no-cloud decision
{ blekko pre-computes a lot of things using "combinators" in our home-grown NoSQL, optionally stuffing them into our SSD caching system }
21. Combinators reduce the total work
[Diagram: per-process increments (+4, +3, +4, +7) on two servers are merged locally into +11 and +7, and only the combined +18 is written to each of the three disks, instead of every individual increment.]
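blekko's combinators live inside their home-grown NoSQL store, so the sketch below is only a minimal stand-in for the idea in the diagram: buffer commutative increments locally and write one merged value per key instead of shipping every individual increment to disk. All names here are illustrative.

```python
from collections import defaultdict

class AddCombinator:
    """Buffer commutative 'add' operations and flush one merged write per key."""
    def __init__(self, storage_write):
        self.pending = defaultdict(int)
        self.storage_write = storage_write

    def add(self, key, n):
        self.pending[key] += n            # cheap in-memory merge, no I/O yet

    def flush(self):
        for key, total in self.pending.items():
            self.storage_write(key, total)  # one write instead of many
        self.pending.clear()

writes = []
combinator = AddCombinator(lambda key, value: writes.append((key, value)))
for n in (4, 3, 4, 7):                    # the per-process increments in the diagram
    combinator.add("pages_crawled", n)
combinator.flush()
print(writes)                              # [('pages_crawled', 18)]
```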
22. Mem
• Even a decimated subset of your fresh data can involve a lot of write bandwidth
– Sometimes referred to as "high velocity"
• High BW probably needs to go nearby your big data store
• Analytics probably isn't going to influence the cloud/not-cloud decision
23. Discuss!
• Discuss
• For more about blekko's setup:
– 3-part blog series at highscalability.com
– Please search [high scalability blekko] in your search engine of choice
– greg@blekko.com – @glindahl