
Experiences Streaming Analytics at Petabyte Scale

2,436 views

How do you keep up with the velocity and variety of data streaming in and get analytics on it even before persistence and replication in Hadoop? In this talk, we'll look at common architectural patterns being used today at companies such as Expedia, Groupon and Zynga that take advantage of Splunk to provide real-time collection, indexing and analysis of machine-generated big data with reliable event delivery to Hadoop. We'll also describe how to use Splunk's advanced search language to access data stored in Hadoop and rapidly analyze, report on and visualize results.
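
The pattern described above, computing analytics on events as they stream in and only afterwards handing them to long-term storage, can be sketched in plain Python. This is an illustrative toy rather than Splunk's implementation: the `(timestamp, status)` event format, the `tumbling_counts` function and the list standing in for a Hadoop/HDFS writer are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def tumbling_counts(events, window_seconds, sink):
    """Count events per (window, status) while forwarding each raw event to `sink`."""
    counts = defaultdict(Counter)
    for ts, status in events:
        window_start = ts - (ts % window_seconds)
        counts[window_start][status] += 1   # analytics are updated first
        sink.append((ts, status))           # the raw event is persisted afterwards
    return {window: dict(c) for window, c in counts.items()}

# Hypothetical events: (seconds since start, status)
events = [(0, "ok"), (12, "error"), (59, "ok"),
          (61, "ok"), (75, "error"), (119, "error")]
archive = []  # hypothetical stand-in for a Hadoop/HDFS writer
per_minute = tumbling_counts(events, 60, archive)
print(per_minute)
```

The per-window counts are available as soon as each event arrives, while `archive` ends up holding every raw event for later batch analysis.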

Published in: Technology

  1. Experiences in Streaming Analytics at Petabyte (or larger) Scale. Stephen Sorkin, VP Engineering, Splunk Inc. Eddie Satterly, Chief Big Data Evangelist, Splunk Inc. Copyright © 2012 Splunk Inc.
  2. Big Data Comes from Machines. Volume | Velocity | Variety | Variability. Machine-generated data is one of the fastest growing, most complex, and most valuable segments of big data. Sources: GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops.
  3. What Does Machine Data Look Like? Sources: Order Processing, Middleware Error, Care IVR, Twitter.
  4. Machine Data Contains Critical Insights. Fields highlighted per source:
     - Order Processing: Customer ID, Order ID, Product ID
     - Middleware Error: Order ID, Customer ID
     - Care IVR: Time Waiting On Hold, Customer ID
     - Twitter: Twitter ID, Customer's Tweet, Company's Twitter ID
  5. Big Data Technologies. Diagram plotted against time, with these labels:
     - Relational Database (highly structured): Single RDBMS, Bigger RDBMS, Sharding
     - Key/Value, Tables or Other (semi-structured): SQL & Map/Reduce, NoSQL, Map/Reduce
     - Temporal, Unstructured, Heterogeneous
     - Products named: Aster Data, Greenplum, Cassandra, Voldemort, Big Table, CouchDB, Hadoop
  6. Splunk Turns Machine Data into Real-time Insights. Optimized for real-time, low latency and interactivity. Real-time Collection and Indexing; Splunk storage; Other Stores; Ad hoc search; Monitor and alert; Report and analyze; Custom dashboards; Developer Platform.
  7. Splunk Collects and Indexes Any Machine Data. No upfront schema. No RDBMS. No custom connectors.
     - Customer Facing Data: Click-stream data, Shopping cart data, Online transaction data
     - Outside the Datacenter: Manufacturing, logistics…; CDRs & IPDRs; Power consumption; RFID data; GPS data
     - Data types: Logfiles, Configs, Messages, Traps, Alerts, Metrics, Scripts, Changes, Tickets
     - Windows: Registry, Event logs, File system, sysinternals
     - Linux/Unix: Configurations, syslog, File system, ps, iostat, top
     - Virtualization & Cloud: Hypervisor, Guest OS, Apps, Cloud
     - Applications: Web logs; Log4J, JMS, JMX; .NET events; Code and scripts
     - Databases: Configurations, Audit/query logs, Tables, Schemas
     - Networking: Configurations, syslog, SNMP, netflow
  8. New Approach to Analyzing Heterogeneous Data. Rapid time-to-deploy: hours or days.
     - Universal Indexing: No data normalization; Automatically handles timestamps; Parsers not required; Index every term & pattern “blindly”; No attempt to “understand” up front
     - Late Structure Binding: Knowledge applied at search-time; No brittle schema to work around; Multiple views into the same data; Find transactions, patterns and trends
     - Analysis and Visualization: Normalization as it’s needed; Faster implementation; Easy search language; Multiple views into the same data
  9. Splunk Search Processing Language. Lots of random “hypothetical examples” from our Mugs.
  10. Operational Intelligence for IT and Business Users. Solution areas: IT Operations Management, Web Intelligence, Application Management, Business Analytics, Security & Compliance. Users: Customer Support, LOB Owners/Executives, Operations Teams, Website/Business Analysts, System Administrator, IT Executives, Development Teams, Security Analysts, Auditors.
  11. Scalability to Tens of TBs/Day on Commodity Servers. Offload search load to Splunk Search Heads. Auto load-balanced forwarding to as many Splunk Indexers as you need to index terabytes/day. Send data from 1000s of servers using a combination of Splunk Forwarders, syslog, WMI, message queues, or other remote protocols.
  12. Splunk Big Data Solution.
     - Product-based Solution: Easy to download and deploy; Pre-integrated, end-to-end functionality; Enterprise-grade features
     - Integrated and End-to-end: Collects data from tens of thousands of sources; Advanced real-time and historical analysis of data; Fast, custom visualizations for IT and business users; Developer APIs, SDKs
     - Performance at scale: Proven at multi-terabyte scale per day; Upwards of PB under management; 4,000+ customers
  13. Accelerate Games Releases with Big Data Insight.
     - Customer: Leading social gaming company globally; 232 million monthly active users; 60 million daily active users
     - Splunk Use: Over 10 TB/day from scaled-out cloud and physical infrastructure; Data indexed includes web server and application logs for games; Splunk for operational visibility, troubleshooting and monitoring; Users include game operations, developers, and corporate IT
     - Value Delivered: Faster game releases with real-time visibility into production issues; Reduced fault resolution time from hours to minutes; Scale ops team to manage and monitor growing infrastructure
  14. Launched in November 2008; Over 33 million active customers (as of December 2011); More than 11,000 employees worldwide; Active in 48 countries; Running over 1,000 deals/day worldwide.
  15. Daily Uses of Splunk. “Cannot have a server that is not sending data into Splunk”
     - Key Activities: Guarantee API performance; Monitor API data usage; Early access to key business metrics (conversions, funnel, etc.); End-to-end testing; Ad hoc troubleshooting
     - Splunk Use Cases: All log data is available through Splunk; Dashboards; Notifications; Near real-time
  16. Dashboards
  17. Complementing BI and Hadoop.
     - Collection & Operational Intelligence: Daily, weekly, monthly metrics across promotions, offers and acceptance rates; Application Performance Management (APM) and system availability
     - Hadoop Integration: Machine Data ETL, highly reliable data delivery to HDFS
     - Data Archival & Batch Data Science: Long-term data warehousing and specialized, batch analytics
  18. Turning Big Data Into Operational Insights at Expedia
  19. Who Am I? Eddie Satterly. Formerly Sr. Director, Architecture & Engineering, Expedia.
     - Who Is Expedia? The World’s Largest Travel Site; Discount travel site Hotwire®; First $1B Quarter in 2011; 4,000+ Technology Workers; 90 localized Expedia.com® and Hotels.com® sites; Development Team of 1,800; NASDAQ: (EXPE)
  20. Where Splunk Comes In. 12,000+ Servers; 27,000+ Hosts; 1,000+ Source Types; 227,000 Sources. 38 Indexers, 16 Search heads. > 6.5TB per day indexed. 20+ Different Solutions for RCA, All Migrated to Splunk in 3 Months.
  21. Why Splunk? SDK Integrations built for Cassandra data stores; Archiving Data to Hadoop for batch analysis; Speed of Deployment; Splunkbase Apps Available for Download; Scales via Commodity Hardware; Developers Build Custom Apps and Dashboards; Aggregation of Log Data from Any Device; Simple UI for IT and Business Users.
  22. Splunk Adoption Over Ten Months. Stages: Initial Pilot; Viral Growth from Demonstrated Value; All Devices, All Data Centers; Big Data Integration.
     - Use case: Business Unit. Data: 125GB/day. Systems: 1100. Deployment: Jan. 2011
     - Use case: Ecommerce Systems. Data: 1.8TB/day. Systems: 8700. Deployment: March 2011
     - Use case: All Devices. Data: ~4TB/day. Systems: ~21000. Deployment: Aug. 2011
     - Use case: App Transactions. Data: 3TB/day. Systems: 90TB Data Per Mo. Deployment: 1Q12-2Q12
  23. Integrate External Data. Extend search with lookups to external data sources: LDAP, AD; Watch Lists; CMDB; Message Stores; Reference Lookups. Correlate across multiple data sources and data sets using indexes and keys.
  24. Unique Characteristics of Splunk MapReduce: Real-time temporal MapReduce; Preview in-progress searches; Searching works on any devices; Simplified Search Language.
  25. Splunk Impact / Top Takeaways. Splunk helped deliver Expedia an annual ROI of over $11 Million.
     - ROI = 5x original Business Case: Tools Consolidation and Retirement; 83% MTTR Reduction; Outage Avoidance
     - Splunk usage is viral: 50+ Apps Developed by Our Team; Over 1,400 Users on a Regular Basis
     - More data = more benefits: Adding more data to Splunk via weekly deployments; Analyzing more data sets in Splunk UI from Hadoop & Cassandra
  26. splunk.com/bigdata Questions?
  27. Sessions will resume at 11:25am
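
Slide 8's idea of universal indexing with late structure binding (index every term "blindly", apply knowledge at search time) can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not Splunk's indexer: the sample log lines, the tokenizer regex, the `key=value` extraction pattern and the `search` function are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical raw events; no schema is declared anywhere.
raw_events = [
    "2012-06-13 10:00:01 order_id=1001 customer_id=42 status=shipped",
    "2012-06-13 10:00:05 ERROR middleware timeout order_id=1001",
    "2012-06-13 10:01:17 customer_id=42 hold_time=310",
]

# Index time: tokenize every term without trying to interpret it.
index = defaultdict(set)
for pos, line in enumerate(raw_events):
    for term in re.findall(r"[\w.:-]+", line.lower()):
        index[term].add(pos)

# Search time: structure is bound only now, via an extraction pattern.
KEY_VALUE = re.compile(r"(\w+)=(\S+)")

def search(term, extract=KEY_VALUE):
    """Return events containing `term`, with fields extracted at search time."""
    results = []
    for pos in sorted(index.get(term.lower(), ())):
        line = raw_events[pos]
        results.append({"_raw": line, **dict(extract.findall(line))})
    return results

hits = search("1001")
print([h.get("status") for h in hits])
```

Because the extraction pattern is an argument to `search`, a different pattern can give another view of the same raw data without re-indexing.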

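
Slide 24 mentions temporal MapReduce with previews of in-progress searches. One way to picture this, purely as an assumption-laden sketch and not a description of Splunk's internals, is to partition events into time buckets, map each bucket to a partial result, and emit the merged result after every bucket. The bucket layout, the `(timestamp, host)` event format and the function names below are hypothetical.

```python
from collections import Counter

def map_bucket(events):
    """Map step: partial count of events per host within one time bucket."""
    return Counter(host for _, host in events)

def search_with_preview(buckets):
    """Reduce step: merge partial results, yielding a preview after each bucket."""
    total = Counter()
    for start in sorted(buckets, reverse=True):  # newest bucket first
        total.update(map_bucket(buckets[start]))
        yield start, dict(total)

# Hypothetical events grouped by bucket start time: (timestamp, host)
buckets = {
    0:   [(5, "web1"), (30, "web2")],
    60:  [(61, "web1"), (90, "web1")],
    120: [(125, "web2")],
}
previews = list(search_with_preview(buckets))
for start, partial in previews:
    print(start, partial)
```

Each yielded pair is a usable preview; the last one is the final answer over all buckets.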