Big data at CallFire

Vijesh Mehta (Co-Founder and CTO)

Agenda

•  A little about CallFire

•  CallFire’s technical challenges

•  How CallFire deals with data

•  Summary

Some background about myself

•  I am one of the founders of CallFire.
–  Started in 2005 in a small apartment
–  Now 28 people
–  Bootstrapped and profitable

•  I’ve been writing software primarily in the
Java space for 12 years. CallFire is all
Java.
–  We use: Wicket, Guice, Hibernate, MySQL,
Cassandra, ActiveMQ, Xen, Puppet

About CallFire

•  We are a cloud telephony provider.
–  Outbound Phone calls
–  Phone Numbers
–  SMS through long and short codes
–  IVR – Interactive Voice Response
–  Power Dialing

•  CallFire’s call volume can get large very quickly.
–  Hurricane Sandy: 1.9 million emergency calls

•  4 engineers and 1 system administrator manage
operations and new features.
•  We hired 7 more engineers this year, and we are still hiring!

Technical Challenges by Numbers

•  1.4 billion calls and texts
–  Growing exponentially
•  Over 50,000 accounts
•  Over 6 million campaigns
•  80 million sound files
•  14 TB in storage (NFS)
•  MySQL: over 10,000 queries per second (qps) at peak

Big data isn’t always a big-company problem!

Growing faster each day

[Chart: Campaigns over Time; vertical axis from 0 to 7,000,000 campaigns]

The first challenge

•  Problem: We outgrew our datacenter. New
systems need access to central storage.
Replication runs across a 1 Gb/s interconnect.

•  Needed Solution:
–  Must work across datacenters
–  Must scale as demand increases
–  Must be fault tolerant
–  Must deal with over 80 million sound files
–  The cheaper the better

Solutions Considered (2010)
Candidates: NFS, Gluster, HDFS, Cassandra

•  Fault tolerant
–  NFS: Yes, if configured
–  Gluster: Yes
–  HDFS: Yes
–  Cassandra: Yes

•  Datacenter replication
–  NFS: Maybe. Rsync isn’t fun with lots of files.
–  Gluster: Not at the time
–  HDFS: Yes
–  Cassandra: Yes

•  Easy to add storage
–  NFS: No
–  Gluster: Not at the time
–  HDFS: Yes
–  Cassandra: Yes

•  No single point of failure
–  NFS: No
–  Gluster: Yes
–  HDFS: Not exactly, NameNode.
–  Cassandra: Yes

•  Data always accessible
–  NFS: No, hard to sort easily through file systems.
–  Gluster: No, same as a file system
–  HDFS: Yes
–  Cassandra: Yes

•  Notes
–  NFS: Not working for us. Too much management and downtime.
–  Gluster: Looks good, tried it for a while. Easy at first because it was a file system.
–  HDFS: Didn’t like the name node issue. May have been a good way to go.
–  Cassandra: Everything we need, quick to learn. We went all in!

* Only LAN solutions considered. Calls had too much latency in the cloud, or even across datacenters.

Cassandra

•  Storage isn’t the best use of Cassandra.

•  Do not exceed 50% of drive space.
–  Compaction needs the space. Hard lesson learned.

•  Fault Tolerance: Replication factor of 3.

•  Result:
–  1 TB of data = 6 TB of storage needed (3 replicas, on drives kept at most 50% full)!
–  CallFire has a 74 TB Cassandra cluster
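
The 6x figure follows from the two rules on this slide taken together: three replicas of every byte, stored on drives that stay at most half full. A minimal sketch of that arithmetic, where the function names and parameters are illustrative and not part of any CallFire tooling:

```python
def raw_storage_needed_tb(data_tb, replication_factor=3, max_disk_fill=0.5):
    """Raw disk capacity (TB) needed to hold data_tb of unreplicated data."""
    if not 0 < max_disk_fill <= 1:
        raise ValueError("max_disk_fill must be in (0, 1]")
    return data_tb * replication_factor / max_disk_fill


def logical_capacity_tb(raw_tb, replication_factor=3, max_disk_fill=0.5):
    """Unreplicated data (TB) that fits on raw_tb of disk under the same rules."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return raw_tb * max_disk_fill / replication_factor


print(raw_storage_needed_tb(1))            # 6.0
print(round(logical_capacity_tb(74), 1))   # 12.3
```

If the 74 TB figure is raw disk capacity (the slide does not say), the same rules would leave room for roughly 12 TB of unique data.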

Extending the scope

•  We like SQL and Hibernate.
–  Pros: Easy, Flexible, Ad-Hoc Queries, Locks
–  Cons: Scaling

•  Solution: Sharding with Cassandra for universal data

[Diagram: Shard 1, Shard 2, and Shard 3, with a Cassandra Cluster]
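
One way to picture this layout: account-specific rows live on exactly one SQL shard, while universal data such as authentication sits in a store that every application server can reach. The sketch below is hypothetical. The modulo routing rule, the `ShardRouter` class, and the dictionary standing in for Cassandra are assumptions for illustration, not CallFire's implementation.

```python
class ShardRouter:
    """Routes account-scoped work to a shard; universal data lives elsewhere."""

    def __init__(self, shard_names, universal_store):
        if not shard_names:
            raise ValueError("at least one shard is required")
        self.shard_names = list(shard_names)
        # Stand-in for a Cassandra column family holding universal data.
        self.universal_store = universal_store

    def shard_for(self, account_id):
        # Assumed routing rule: modulo on the numeric account id.
        return self.shard_names[account_id % len(self.shard_names)]

    def login(self, username):
        # The shard is unknown until the account is known, so
        # authentication reads the universal store first.
        record = self.universal_store.get(username)
        if record is None:
            return None
        account_id = record["account_id"]
        return account_id, self.shard_for(account_id)


router = ShardRouter(
    ["shard1", "shard2", "shard3"],
    {"alice": {"account_id": 7}},
)
print(router.login("alice"))  # (7, 'shard2')
```

Modulo routing is the simplest rule to show, but it makes adding shards painful; keeping an explicit account-to-shard mapping alongside the universal data is a common alternative.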

Sharding + Big Data

•  Cassandra makes sharding easier
–  Easy to store universal data. (Authentication)
–  Performs very well

•  Tungsten Replicator (Big Data with SQL)
–  Sharding makes cross-shard joins impossible, so fan
your data in to central places.
–  NoSQL can’t handle ad-hoc queries. No
worries, you can still have SQL.
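
The fan-in idea can be sketched without Tungsten itself: copy rows from every shard into one central SQL database, tagging each row with its origin, and ad-hoc queries that span shards work again. The example below uses in-memory SQLite databases as stand-ins for the MySQL shards and the central database. The `calls` table and its columns are made up for illustration, and a real replicator would stream changes continuously instead of copying in bulk.

```python
import sqlite3


def make_shard(rows):
    """Create an in-memory database standing in for one MySQL shard."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE calls (account_id INTEGER, duration_s INTEGER)")
    db.executemany("INSERT INTO calls VALUES (?, ?)", rows)
    return db


def fan_in(shards):
    """Copy every shard's rows into one central database, tagged by shard."""
    central = sqlite3.connect(":memory:")
    central.execute(
        "CREATE TABLE calls (shard TEXT, account_id INTEGER, duration_s INTEGER)"
    )
    for name, db in shards.items():
        rows = db.execute("SELECT account_id, duration_s FROM calls").fetchall()
        central.executemany(
            "INSERT INTO calls VALUES (?, ?, ?)",
            [(name, account_id, duration) for account_id, duration in rows],
        )
    return central


shards = {
    "shard1": make_shard([(1, 30), (1, 45)]),
    "shard2": make_shard([(2, 60)]),
}
central = fan_in(shards)

# An ad-hoc query across all shards, which no single shard could answer.
total_calls, total_seconds = central.execute(
    "SELECT COUNT(*), SUM(duration_s) FROM calls"
).fetchone()
print(total_calls, total_seconds)  # 3 135
```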

Big Data Summary

•  Not just for big companies: data grows rapidly in
today’s environment.
–  Nice article about Obama’s Data Crunchers:
–  http://swampland.time.com/2012/11/07/inside-the-secret-world-of-quants-and-data-crunchers-who-helped-obama-win/

•  NoSQL systems have easier scaling and fault
tolerance mechanisms.
–  Not uncommon to see small teams with 10-20 node
clusters.

•  SQL is still a big part of the equation. (Tungsten)
–  Fan in information across partitions
–  Replicate across datacenters
–  Keep your ad-hoc dreams alive!

Passive / Archived Storage
•  Backblaze: $5,300 for an empty case. Holds 45 drives (117 TB usable space).
–  http://www.protocase.com/products/index.php?e=Backblaze
