Experiences Streaming Analytics at Petabyte Scale


How do you keep up with the velocity and variety of data streaming in and get analytics on it even before persistence and replication in Hadoop? In this talk, we'll look at common architectural patterns being used today at companies such as Expedia, Groupon and Zynga that take advantage of Splunk to provide real-time collection, indexing and analysis of machine-generated big data with reliable event delivery to Hadoop. We'll also describe how to use Splunk's advanced search language to access data stored in Hadoop and rapidly analyze, report on and visualize results.

Published in: Technology



  • 1. Copyright © 2012 Splunk Inc. Experiences in Streaming Analytics at Petabyte (or larger) Scale. Stephen Sorkin, VP Engineering, Splunk Inc. Eddie Satterly, Chief Big Data Evangelist, Splunk Inc.
  • 2. Big Data Comes from Machines. Volume | Velocity | Variety | Variability. Machine-generated data is one of the fastest growing, most complex, and most valuable segments of big data. Sources: GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops.
  • 3. What Does Machine Data Look Like? Sources: Order Processing, Middleware Error, Care IVR, Twitter.
  • 4. Machine Data Contains Critical Insights. Sources and the fields they carry:
      Order Processing: Customer ID, Order ID, Product ID
      Middleware Error: Order ID, Customer ID
      Care IVR: Time Waiting On Hold, Customer ID
      Twitter: Twitter ID, Customer's Tweet, Company's Twitter ID
  • 5. Big Data Technologies. [Diagram along a time axis] Approaches: Single RDBMS, Bigger RDBMS, Sharding, SQL & Map/Reduce, NoSQL, Map/Reduce. Products named: Aster Data, Greenplum, Cassandra, Voldemort, Big Table, CouchDB, Hadoop. Data categories: Relational Database (highly structured); Key/Value, Tables or Other (semi-structured); Temporal, Unstructured, Heterogeneous.
  • 6. Splunk Turns Machine Data into Real-time Insights. Optimized for real-time, low latency and interactivity. Components: Real-time Collection and Indexing; Splunk storage; Other Stores; Ad hoc search; Monitor and alert; Report and analyze; Custom dashboards; Developer Platform.
  • 7. Splunk Collects and Indexes Any Machine Data. No upfront schema. No RDBMS. No custom connectors.
      Customer Facing Data: Click-stream data; Shopping cart data; Online transaction data
      Outside the Datacenter: Manufacturing, logistics…; CDRs & IPDRs; Power consumption; RFID data; GPS data
      Data types: Logfiles, Configs, Messages, Traps, Alerts, Metrics, Scripts, Changes, Tickets
      Windows: Registry; Event logs; File system; sysinternals
      Linux/Unix: Configurations; syslog; File system; ps, iostat, top
      Virtualization & Cloud: Hypervisor; Guest OS, Apps; Cloud
      Applications: Web logs; Log4J, JMS, JMX; .NET events; Code and scripts
      Databases: Configurations; Audit/query logs; Tables; Schemas
      Networking: Configurations; syslog; SNMP; netflow
  • 8. New Approach to Analyzing Heterogeneous Data. Rapid time-to-deploy: hours or days.
      Universal Indexing: No data normalization; Automatically handles timestamps; Parsers not required; Index every term & pattern "blindly"; No attempt to "understand" up front
      Late Structure Binding: Knowledge applied at search-time; No brittle schema to work around; Multiple views into the same data
      Analysis and Visualization: Normalization as it's needed; Faster implementation; Easy search language; Multiple views into the same data; Find transactions, patterns and trends
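The "index every term blindly, apply knowledge at search-time" idea on slide 8 can be illustrated with a toy sketch. Everything here is an assumption made for illustration (the event formats, the function names, the whitespace tokenizer); it is not Splunk's indexer, only a minimal instance of schema-on-read.

```python
import re
from collections import defaultdict

# Hypothetical raw events; the formats are invented for this example.
RAW_EVENTS = [
    "2012-06-13 10:00:01 order_id=1001 customer_id=42 status=ok",
    "2012-06-13 10:00:05 ERROR middleware order_id=1001 timeout",
    "2012-06-13 10:00:09 ivr customer_id=42 hold_seconds=310",
]

def build_index(events):
    """Index every whitespace-separated term without any schema."""
    index = defaultdict(set)
    for pos, line in enumerate(events):
        for term in line.lower().split():
            index[term].add(pos)
    return index

def search(events, index, term, field_pattern=None):
    """Find events containing `term`; fields are extracted only now, at search time."""
    hits = [events[i] for i in sorted(index.get(term.lower(), ()))]
    if field_pattern is None:
        return hits
    rx = re.compile(field_pattern)
    return [m.groupdict() for line in hits if (m := rx.search(line))]

index = build_index(RAW_EVENTS)

# Two different sources share an order ID even though no schema was declared.
same_order = search(RAW_EVENTS, index, "order_id=1001")

# A field extraction supplied at query time gives a structured view of the raw text.
hold_times = search(
    RAW_EVENTS, index, "ivr",
    r"customer_id=(?P<customer_id>\d+) hold_seconds=(?P<hold_seconds>\d+)",
)
```

Because the extraction pattern is passed in with the query, a second query could apply a different pattern to the same raw events, which is one way to read the slide's "multiple views into the same data".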
  • 9. Splunk Search Processing Language. Lots of random "hypothetical examples" from our Mugs.
  • 10. Operational Intelligence for IT and Business Users. Areas: IT Operations Management; Web Intelligence; Application Management; Business Analytics; Security & Compliance. Users: Customer Support; LOB Owners/Executives; Operations Teams; Website/Business Analysts; System Administrator; IT Executives; Development Teams; Security Analysts; Auditors.
  • 11. Scalability to Tens of TBs/Day on Commodity Servers.
      Offload search load to Splunk Search Heads.
      Auto load-balanced forwarding to as many Splunk Indexers as you need to index terabytes/day.
      Send data from 1000s of servers using a combination of Splunk Forwarders, syslog, WMI, message queues, or other remote protocols.
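The three tiers on slide 11 (forwarders, indexers, search heads) can be sketched in a few lines. This is a toy model under stated assumptions: the class names are hypothetical, round-robin is just one simple load-balancing policy, and the in-memory lists stand in for real indexes. It does not describe how Splunk actually balances or searches.

```python
from itertools import cycle

class Indexer:
    """Holds one shard of the events; a stand-in for a real indexer."""
    def __init__(self, name):
        self.name = name
        self.events = []

    def search(self, term):
        return [e for e in self.events if term in e]

class Forwarder:
    """Spreads events over the indexers round-robin."""
    def __init__(self, indexers):
        self._targets = cycle(indexers)

    def send(self, event):
        next(self._targets).events.append(event)

class SearchHead:
    """Fans a query out to every indexer and merges the partial results."""
    def __init__(self, indexers):
        self.indexers = indexers

    def search(self, term):
        results = []
        for indexer in self.indexers:
            results.extend(indexer.search(term))
        return sorted(results)

indexers = [Indexer(f"idx{i}") for i in range(3)]
forwarder = Forwarder(indexers)
for n in range(9):
    level = "ERROR" if n % 4 == 0 else "INFO"
    forwarder.send(f"2012-06-13T10:00:0{n} {level} request {n}")

errors = SearchHead(indexers).search("ERROR")
```

Adding a fourth `Indexer` to the list spreads the same input over more shards without changing the forwarder or the search head, which is the scaling property the slide points at.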
  • 12. Splunk Big Data Solution.
      Product-based Solution: Easy to download and deploy; Pre-integrated, end-to-end functionality; Enterprise-grade features
      Integrated and End-to-end: Collects data from tens of thousands of sources; Advanced real-time and historical analysis of data; Fast, custom visualizations for IT and business users; Developer APIs, SDKs
      Performance at scale: Proven at multi-terabyte scale per day; Upwards of PB under management; 4,000+ customers
  • 13. Accelerate Games Releases with Big Data Insight. Leading social gaming company globally; 232 million monthly active users; 60 million daily active users.
      Splunk Use: Over 10 TB/day from scaled-out cloud and physical infrastructure; Data indexed includes web server and application logs for games; Splunk for operational visibility, troubleshooting and monitoring; Users include game operations, developers, and corporate IT
      Value Delivered: Faster game releases with real-time visibility into production issues; Reduced fault resolution time from hours to minutes; Scale ops team to manage and monitor growing infrastructure
  • 14. Launched in November 2008; Over 33 million active customers (as of December 2011); More than 11,000 employees worldwide; Active in 48 countries; Running over 1,000 deals/day worldwide.
  • 15. Daily Uses of Splunk. "Cannot have a server that is not sending data into Splunk"
      Key Activities: Guarantee API performance; Monitor API data usage; Early access to key business metrics (conversions, funnel, etc.); End-to-end testing; Ad hoc troubleshooting
      Splunk Use Cases: All log data is available through Splunk; Dashboards; Notifications; Near real-time
  • 16. Dashboards
  • 17. Complementing BI and Hadoop.
      Collection & Operational Intelligence: Daily, weekly, monthly metrics across promotions, offers and acceptance rates; Application Performance Management (APM) and system availability
      Hadoop Integration: Machine Data ETL, with highly reliable data delivery to HDFS
      Data Archival & Batch Data Science: Long-term data warehousing and specialized, batch analytics
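Slide 17 mentions "highly reliable data delivery to HDFS" without saying how it is achieved. One common pattern for reliable delivery, shown here purely as an assumption, is to retry each batch until the sink accepts it and to make writes idempotent so a retry cannot store a batch twice. `FlakySink` and `deliver` are hypothetical names, and the sink is an in-memory stand-in, not an HDFS client.

```python
class FlakySink:
    """Stand-in for a remote store; fails the first `failures` writes."""
    def __init__(self, failures):
        self.failures = failures
        self.stored = {}

    def write(self, batch_id, batch):
        if self.failures > 0:
            self.failures -= 1
            raise IOError("write failed")
        # Idempotent on batch_id: a retried batch is not stored twice.
        self.stored.setdefault(batch_id, batch)

def deliver(batches, sink, max_attempts=5):
    """Hold each batch until the sink accepts it, retrying on failure."""
    for batch_id, batch in enumerate(batches):
        for _ in range(max_attempts):
            try:
                sink.write(batch_id, batch)
                break
            except IOError:
                continue
        else:
            raise RuntimeError(
                f"batch {batch_id} undelivered after {max_attempts} attempts"
            )

sink = FlakySink(failures=2)
deliver([["e1", "e2"], ["e3"]], sink)
```

The sender only moves to the next batch after a successful write, so a transient outage delays delivery but does not drop or reorder batches in this sketch.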
  • 18. Turning Big Data Into Operational Insights at Expedia
  • 19. Who Am I? Eddie Satterly. Formerly Sr. Director, Architecture & Engineering, Expedia.
      Who Is Expedia? The World's Largest Travel Site; Discount travel site Hotwire®; First $1B Quarter in 2011; 4,000+ Technology Workers; 90 localized ...® and ...® sites; Development Team of 1,800; NASDAQ: (EXPE)
  • 20. Where Splunk Comes In. 12,000+ Servers; 27,000+ Hosts; 1,000+ Source Types; 227,000 Sources. 38 Indexers, 16 Search heads. > 6.5TB per day indexed. 20+ Different Solutions for RCA, All Migrated to Splunk in 3 Months.
  • 21. Why Splunk? SDK Integrations built for Cassandra data stores; Archiving Data to Hadoop for batch analysis; Speed of Deployment; Splunkbase Apps Available for Download; Scales via Commodity Hardware; Developers Build Custom Apps and Dashboards; Aggregation of Log Data from Any Device; Simple UI for IT and Business Users.
  • 22. Splunk Adoption Over Ten Months. Stages shown: Initial Pilot; Viral Growth from Demonstrated Value; All Devices, All Data Centers; Big Data Integration.
      Use case: Business Unit. Data: 125GB/day. Systems: 1100. Deployment: Jan. 2011
      Use case: Ecommerce Systems. Data: 1.8TB/day. Systems: 8700. Deployment: March 2011
      Use case: All Devices. Data: ~4TB/day. Systems: ~21000. Deployment: Aug. 2011
      Use case: App Transactions. Data: 3TB/day. Systems: 90TB Data Per Mo. Deployment: 1Q12-2Q12
  • 23. Integrate External Data. Extend search with lookups to external data sources: LDAP, AD; Watch Lists; CMDB; Message Stores; Reference Lookups. Correlate across multiple data sources and data sets using indexes and keys.
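The lookup idea on slide 23, enriching events with fields from an external source by a shared key, can be sketched as a left join. The table contents, field names, and the `enrich` helper are hypothetical and chosen only for illustration; this is not Splunk's lookup mechanism.

```python
# Hypothetical reference table, e.g. exported from a CMDB.
CMDB = {
    "web-01": {"owner": "ecommerce", "datacenter": "phx"},
    "db-07": {"owner": "payments", "datacenter": "chi"},
}

# Hypothetical events that carry only a host name and a status code.
EVENTS = [
    {"host": "web-01", "status": 500},
    {"host": "db-07", "status": 200},
    {"host": "cache-03", "status": 503},
]

def enrich(events, table, key):
    """Left-join each event to the lookup table on `key`; unmatched events pass through unchanged."""
    for event in events:
        yield {**event, **table.get(event.get(key), {})}

enriched = list(enrich(EVENTS, CMDB, "host"))

# With the extra field attached, a query can group failures by owning team.
failing_owners = sorted(
    {e.get("owner", "unknown") for e in enriched if e["status"] >= 500}
)
```

Keeping the reference data outside the events means the table can be updated without touching the stored data, and the next search picks up the new values.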
  • 24. Unique Characteristics of Splunk MapReduce: Real-time temporal MapReduce; Preview in-progress searches; Searching works on any devices; Simplified Search Language.
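Slide 24's "temporal MapReduce" with "preview in-progress searches" can be pictured as a map step over time buckets followed by a reduce that emits its running total after each bucket. The sketch below is a toy under stated assumptions: the 60-second span, the event tuples, and both function names are invented, and it says nothing about how Splunk implements the feature.

```python
from collections import Counter

# Hypothetical (epoch_seconds, status) events.
EVENTS = [
    (0, "ok"), (12, "error"), (61, "ok"),
    (75, "error"), (119, "error"), (130, "ok"),
]

def map_buckets(events, span=60):
    """Map step: group events into fixed time buckets and count per status."""
    buckets = {}
    for ts, status in events:
        buckets.setdefault(ts - ts % span, Counter())[status] += 1
    return buckets

def reduce_with_preview(buckets):
    """Reduce step: merge bucket counts in time order, yielding a preview after each bucket."""
    running = Counter()
    for start in sorted(buckets):
        running += buckets[start]
        yield start, dict(running)

previews = list(reduce_with_preview(map_buckets(EVENTS)))
```

Each yielded pair is a usable partial answer, so a caller could render results while later buckets are still being processed instead of waiting for the final total.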
  • 25. Splunk Impact / Top Takeaways. Splunk helped deliver Expedia an annual ROI of over $11 Million.
      ROI = 5x original Business Case: Tools Consolidation and Retirement; 83% MTTR Reduction; Outage Avoidance
      Splunk usage is viral: 50+ Apps Developed by Our Team; Over 1,400 Users on a Regular Basis
      More data = more benefits: Adding more data to Splunk via weekly deployments; Analyzing more data sets in Splunk UI from Hadoop & Cassandra
  • 26. Questions?
  • 27. Sessions will resume at 11:25am