
Big data at CallFire



Published in: Technology

  1. Big Data at CallFire
     Vijesh Mehta (Co-Founder and CTO)
  2. Agenda
     • A little about CallFire
     • CallFire’s technical challenges
     • How CallFire deals with data
     • Summary
  3. Some background about myself
     • I am one of the founders of CallFire.
       – Started in 2005 in a small apartment
       – Now 28 people
       – Bootstrapped and profitable
     • I’ve been writing software primarily in the Java space for 12 years. CallFire is all Java.
       – We use: Wicket, Guice, Hibernate, MySQL, Cassandra, ActiveMQ, Xen, Puppet
  4. About CallFire
     • We are a cloud telephony provider.
       – Outbound phone calls
       – Phone numbers
       – SMS through long and short codes
       – IVR (Interactive Voice Response)
       – Power dialing
     • CallFire’s call volume can get large very quickly.
       – Hurricane Sandy: 1.9 million emergency calls
     • 4 engineers and 1 system admin manage operations and new features.
     • We hired 7 more engineers this year, and we are still hiring!
  5. Technical Challenges by Numbers
     • 1.4 billion calls and texts
       – Growing exponentially
     • Over 50,000 accounts
     • Over 6 million campaigns
     • 80 million sound files
     • 14 TB in storage (NFS)
     • MySQL: over 10,000 qps at peak
     Big data isn’t always a big-company problem!
  6. Growing faster each day
     [Chart: Campaigns over Time; vertical axis from 0 to 7,000,000]
  7. The first challenge
     • Problem: We outgrew our datacenter. New systems need access to central storage. Replication runs across a 1 Gb/s interconnect.
     • Needed solution:
       – Must work across datacenters
       – Must scale as demand increases
       – Must be fault tolerant
       – Must deal with over 80 million sound files
       – The cheaper the better
  8. Solutions Considered (2010)

     |                               | NFS | Gluster | HDFS | Cassandra |
     |-------------------------------|-----|---------|------|-----------|
     | Fault tolerant                | Yes, if configured | Yes | Yes | Yes |
     | Datacenter replication        | Maybe. Rsync isn’t fun with lots of files. | Not at the time | Yes | Yes |
     | Easy to add storage           | No | Not at the time | Yes | Yes |
     | No single point of failure    | No | Yes | Not exactly, NameNode. | Yes |
     | Data always accessible easily | No, hard to sort through file systems. | No, same as a file system | Yes | Yes |
     | Notes                         | Not working for us. Too much management and downtime. | Looks good, tried it for a while. Easy at first because it was a file system. | Didn’t like the name node issue. May have been a good way to go. | Everything we need, quick to learn. We went all in! |

     * Only LAN solutions considered. Calls had too much latency in the cloud, or even across datacenters.
  9. Cassandra
     • Storage isn’t the best use of Cassandra.
     • Do not exceed 50% of drive space.
       – Compaction needs the space. Hard lesson learned.
     • Fault tolerance: replication factor of 3.
     • Result
       – 1 TB of data = 6 TB of storage needed! (3 replicas, doubled to keep drives at most 50% full)
       – CallFire has a 74 TB Cassandra cluster
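The 1 TB to 6 TB figure on this slide follows from two multipliers: the replication factor and the free-space headroom left for compaction. A minimal sketch of that arithmetic, assuming the simple model raw = data × replication factor ÷ maximum fill fraction. The class and method names are hypothetical, and the model ignores compression and other overhead.

```java
// Hypothetical helper for illustration; not CallFire's code.
final class CassandraCapacity {
    private CassandraCapacity() {}

    /**
     * Raw disk (TB) needed to hold a given amount of logical data.
     * replicationFactor: copies kept of each piece of data (3 on the slide).
     * maxDiskFill: fraction of each drive allowed to fill (0.5 on the slide).
     */
    static double rawStorageTb(double logicalTb, int replicationFactor, double maxDiskFill) {
        validate(logicalTb, replicationFactor, maxDiskFill);
        return logicalTb * replicationFactor / maxDiskFill;
    }

    /** Inverse: logical data (TB) that fits in a cluster of the given raw size. */
    static double usableLogicalTb(double rawTb, int replicationFactor, double maxDiskFill) {
        validate(rawTb, replicationFactor, maxDiskFill);
        return rawTb * maxDiskFill / replicationFactor;
    }

    private static void validate(double tb, int replicationFactor, double maxDiskFill) {
        if (tb < 0 || replicationFactor < 1 || maxDiskFill <= 0 || maxDiskFill > 1) {
            throw new IllegalArgumentException("invalid capacity parameters");
        }
    }

    public static void main(String[] args) {
        // Slide's example: 1 TB of data, replication factor 3, drives at most 50% full.
        System.out.println(rawStorageTb(1.0, 3, 0.5)); // prints 6.0
    }
}
```

If the 74 TB cluster size is raw capacity (an assumption; the slide does not say), the same model gives `usableLogicalTb(74.0, 3, 0.5)`, or roughly 12 TB of logical data.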
  10. Extending the scope
      • We like SQL and Hibernate.
        – Pros: Easy, flexible, ad-hoc queries, locks
        – Cons: Scaling
      • Solution: Sharding with Cassandra for universal data
      [Diagram: Shard 1, Shard 2, Shard 3, and a Cassandra cluster]
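The slides do not say how rows are assigned to shards. As a rough sketch, assuming each account lives entirely on one SQL shard chosen by account ID modulo the shard count, routing could look like this. `ShardRouter`, the choice of key, and the JDBC URLs in the usage note are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical router for illustration; the deck does not describe CallFire's scheme.
final class ShardRouter {
    private final List<String> shardUrls;

    ShardRouter(List<String> shardUrls) {
        if (shardUrls == null || shardUrls.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shardUrls = new ArrayList<>(shardUrls); // defensive copy
    }

    /** Index of the shard that owns this account. floorMod keeps negative IDs in range. */
    int shardFor(long accountId) {
        return (int) Math.floorMod(accountId, (long) shardUrls.size());
    }

    /** Connection URL of the shard that owns this account. */
    String urlFor(long accountId) {
        return shardUrls.get(shardFor(accountId));
    }
}
```

For example, `new ShardRouter(Arrays.asList("jdbc:mysql://shard1.example/app", "jdbc:mysql://shard2.example/app", "jdbc:mysql://shard3.example/app")).urlFor(7L)` selects the second URL. Plain modulo routing is the simplest choice, but adding a shard changes the mapping for most accounts, so a lookup table or consistent hashing is a common alternative.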
  11. Sharding + Big Data
      • Cassandra makes sharding easier
        – Easy to store universal data (authentication)
        – Performs very well
      • Tungsten Replicator (Big Data with SQL)
        – Sharding makes joins impossible, so fan your data into central places.
        – NoSQL can’t handle ad-hoc queries. No worries, you can still have SQL.
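The fan-in idea can be illustrated without any replication tooling. The sketch below is an in-memory analogy only, not Tungsten Replicator's API or configuration: rows from every shard are copied into one central collection, where a cross-shard aggregate becomes possible. The row fields and class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory analogy for fan-in; all names and fields are hypothetical.
final class FanInSketch {
    private FanInSketch() {}

    /** One row as it might exist on a single shard. */
    static final class CallRow {
        final long accountId;
        final String day;
        final int calls;

        CallRow(long accountId, String day, int calls) {
            this.accountId = accountId;
            this.day = day;
            this.calls = calls;
        }
    }

    /** Copy the rows of every shard into one central list. */
    static List<CallRow> fanIn(List<List<CallRow>> shards) {
        List<CallRow> central = new ArrayList<>();
        for (List<CallRow> shard : shards) {
            central.addAll(shard);
        }
        return central;
    }

    /** Cross-shard aggregate, comparable to SELECT day, SUM(calls) ... GROUP BY day. */
    static Map<String, Integer> callsPerDay(List<CallRow> central) {
        Map<String, Integer> totals = new TreeMap<>();
        for (CallRow row : central) {
            totals.merge(row.day, row.calls, Integer::sum);
        }
        return totals;
    }
}
```

No single shard can answer the per-day total on its own, because each holds only its own accounts. Once the rows sit in one central store, the query is ordinary SQL again.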
  12. Big Data Summary
      • Not just for big companies: data grows rapidly in today’s environment.
        – Nice article about Obama’s Data Crunchers:
      • NoSQL systems have easier scaling and fault tolerance mechanisms.
        – Not uncommon to see small teams with 10-20 node clusters.
      • SQL is still a big part of the equation. (Tungsten)
        – Fan in information across partitions
        – Replicate across datacenters
        – Keep your ad-hoc dreams alive!
  13. Passive / Archived Storage
      • Backblaze: $5,300 for an empty case. Holds 45 drives (117 TB usable space). http://