Vantrix Hunk


Vantrix use case of Hadoop with Hunk - Splunk Analytics for Hadoop

Published in: Software, Technology, Business
  1. Mobile “Big” Data Analytics. Mark Hopper (Vantrix), Raanan Dagan (Splunk)
  2. Confidential ©2014 Vantrix – All rights reserved. Hunk - Integrated Analytics Platform. Full-featured, Integrated Product; Insights for Everyone; Works with What You Have Today. Explore, Visualize, Dashboards, Share, Analyze. Hadoop Clusters; NoSQL and Other Data Stores; Hadoop Client Libraries; Streaming Resource Libraries.
  3. Hunk – Unique. (1) Run Natively in Hadoop: use Hadoop MapReduce. (2) Mixed Mode: allows for data preview. (3) Auto-deploy SplunkD to DataNodes: on-the-fly indexing. (4) Access Control: allows for many users and many Hadoop directories; supports Kerberos. (5) Schema on the Fly.
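"Schema on the fly" means that fields are derived from raw events when a search runs, rather than being declared before the data is loaded. The following is a minimal Python sketch of that idea only, not Hunk's implementation. The key=value log layout, the field names, and the `extract_fields` and `search` helpers are all hypothetical; the deck does not show the real record format.

```python
import re

# Hypothetical raw session-log lines; the actual Vantrix record layout
# is not shown in the deck.
RAW_EVENTS = [
    "2014-02-21T17:05:00Z device=iPhone site=youtube.com bitrate_kbps=800",
    "2014-02-21T17:05:02Z site=instagram.com bitrate_kbps=1200 stall_ms=350",
]

# Any key=value token becomes a field; no schema is declared up front.
FIELD_PATTERN = re.compile(r"(\w+)=(\S+)")


def extract_fields(line):
    """Derive a field dictionary from one raw line at read time."""
    return dict(FIELD_PATTERN.findall(line))


def search(events, **criteria):
    """Yield extracted records whose fields match every criterion."""
    for line in events:
        fields = extract_fields(line)
        if all(fields.get(key) == value for key, value in criteria.items()):
            yield fields


matches = list(search(RAW_EVENTS, site="instagram.com"))
print(matches)
```

Because extraction happens per line at search time, records with inconsistent fields (the second line has `stall_ms`, the first does not) can sit side by side without a load-time schema change.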
  4. What is New in Hunk 6.1. (1) Report Acceleration: get results in seconds. (2) Hive Schema: expose user-created schema. (3) Multiple File Formats: Parquet, Sequence, ORC, RC. (4) Pass-Through Authentication: Splunk users identified in Hadoop. (5) Streaming Resource Libraries.
  6. Our Company, Global Capability. Established 2004. Mobile media experts: transcoding, delivery, optimization. 40+ patent families; leading media university research relationship. HQ in Montreal; offices in Seattle, London, Sydney. 65+ operator deployments globally.
  7. Our Product Lines. Bandwidth Optimizer: optimize delivery and Quality of Experience of ‘Over The Top’ services across mobile networks. Mobile Message Optimizer: assure Quality of Experience and interoperability of mobile multimedia messaging. Multiscreen Video Platform: high-density transcoding and optimized delivery of video across all devices and networks.
  8. The opportunity for Analytics. Vantrix Gateway; KPI Dashboard; Analytics. Average operator: 25 million records / day.
  9. KPIs answer WHAT? Date and time; device; session volumes; data volume; video session quality; bitrates; media codec; media container; media size; web and video sites; video resolution; video frame rates; video dimensions; video length; delivery protocol; location; media types; session length; video stall time.
  10. Analytics explores WHY? Date and time; device; session volumes; data volume; system topology; bitrates; media codec; media container; media size; web and video sites; video resolution; video frame rates; video dimensions; video stall time; video length; delivery protocol; location; media types; session length; video session quality. Helping our customers plan their business strategies.
  11. Product Management requirements to Engineering. Scale to manage lots and lots of data: manage 1 year of data; 80 GB or 25M records/day; 30 TB or 10 billion records/year. High performance: TL <1 hr for a day of data; queries average <30 secs. Low hardware footprint: fit within ½ telco rack. Highly flexible data structure: future-proof design for new use cases. Flexible and simple reporting UI: easily explore data in new ways via a simple UI. Low cost of goods: support solution margin targets. Use cases required. Oh.. and be ready to go commercial in 90 days.
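The yearly targets follow from the daily figures on this slide. A short Python check of the arithmetic, assuming decimal units (1 TB = 1000 GB) and a 365-day year; the bytes-per-record figure is derived here and is not stated in the deck:

```python
# Daily figures quoted on the slide.
GB_PER_DAY = 80
RECORDS_PER_DAY = 25_000_000
DAYS_PER_YEAR = 365  # assumption: retention of one calendar year

# Assumption: decimal units, 1 TB = 1000 GB.
tb_per_year = GB_PER_DAY * DAYS_PER_YEAR / 1000
records_per_year = RECORDS_PER_DAY * DAYS_PER_YEAR

# Derived, not quoted in the deck: average raw size of one record.
bytes_per_record = GB_PER_DAY * 1e9 / RECORDS_PER_DAY

print(f"{tb_per_year:.1f} TB/year")               # slide rounds this to 30 TB
print(f"{records_per_year / 1e9:.2f} billion records/year")  # slide rounds to 10 billion
print(f"{bytes_per_record:.0f} bytes/record on average")
```

Both yearly numbers on the slide are the daily numbers multiplied by 365 and rounded up.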
  12. Enter Splunk – Product Manager’s best friend. Out of the box, worked on our inconsistent record structure. Immediately identified the key record fields. Automatically created new fields (e.g. URL tags). Automatically indexed and counted field value occurrences. Proved invaluable for identifying and exploring use cases.
  13. Example of the exploratory power. No query scripting, no pre-determined searches: point-and-click exploration.
  14. Point-and-click exploration of complex query results. Rapidly explore and uncover the story behind the story.
  15. The Vantrix BigData platform architecture. High-density computing cluster: SymKloud Cluster (16RU), 144 CPU (Hadoop) nodes, 69 TB SSD storage. Private cloud management layer: Virtualization, Orchestration. BigData filesystem and MapReduce architecture: Big Data Processing. Data exploration and reporting application layer: Analytics Application.
  16. Hadoop and Hunk. Diagram: log files, Hadoop cluster, Hunk. Hadoop: an open-source framework; used by Facebook, Google, Yahoo; manages and queries vast amounts of unstructured data; highly scalable; built-in redundancy. 2 key capabilities: distributed filesystem; MapReduce. Transparent UI migration from Splunk to Hunk.
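MapReduce, the second of the two key capabilities listed above, splits a job into a map step that emits key/value pairs, a shuffle that groups values by key, and a reduce step that aggregates each group. The sketch below imitates those three phases in a single Python process to show the data flow; it is not Hadoop code, and the log lines and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical log lines: "<site> <bitrate_kbps>".
LOG_LINES = [
    "youtube.com 800",
    "instagram.com 1200",
    "youtube.com 400",
]


def map_phase(line):
    """Emit one (site, 1) pair per session record."""
    site, _bitrate = line.split()
    yield site, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Aggregate all values for one key into a session count."""
    return key, sum(values)


mapped = [pair for line in LOG_LINES for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

In a real cluster the map and reduce functions run in parallel on the nodes that hold the data blocks, which is where the scalability comes from.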
  17. Performance - Small Scale system. 10 million subscribers generate: 80 GB of raw session log data / day; 26 million video data session records. Transform and Load: 1.6 TB in 27 mins; 1 GB/second; projecting 2 GB/second after tuning. Hunk query: 20 sec to search through 27M events, returning 4.7M events. Platform: SymKloud Cluster (4RU), 28 CPU nodes, 14 TB SSD storage; Virtualization, Orchestration; Big Data Processing; Analytics Application.
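The rates on this slide can be cross-checked from the quoted totals. A short Python calculation, assuming decimal units (1 TB = 1000 GB); the per-day load time, scan rate, and match fraction are derived here and are not quoted in the deck:

```python
# Figures quoted on the slide.
LOAD_GB = 1600            # 1.6 TB, assuming 1 TB = 1000 GB
LOAD_SECONDS = 27 * 60    # 27 minutes
DAILY_GB = 80
EVENTS_SCANNED = 27_000_000
EVENTS_RETURNED = 4_700_000
QUERY_SECONDS = 20

# Transform-and-load throughput; the slide rounds this to 1 GB/second.
load_gb_per_s = LOAD_GB / LOAD_SECONDS

# Derived: how many days of data the 1.6 TB load represents,
# and how long one day of data takes at the measured rate.
days_loaded = LOAD_GB / DAILY_GB
seconds_per_day_of_data = DAILY_GB * LOAD_SECONDS / LOAD_GB

# Derived: query scan rate and the fraction of events returned.
events_per_s = EVENTS_SCANNED / QUERY_SECONDS
fraction_returned = EVENTS_RETURNED / EVENTS_SCANNED

print(f"load rate: {load_gb_per_s:.2f} GB/s")
print(f"days loaded: {days_loaded:.0f}, one day loads in {seconds_per_day_of_data:.0f} s")
print(f"scan rate: {events_per_s:,.0f} events/s, returned {fraction_returned:.1%}")
```

At the measured rate, a single day of data loads in well under the one-hour target, and the 20-second query is inside the 30-second average target stated in the requirements.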
  19. Live Event impact. The key Olympic hockey games drove significant peaks.
  20. Video site characteristics. YouTube dominates the consumption of low bitrate encoded video and higher encoding rates. A number of sites clearly focus on a narrow set of encoding rates, such as Instagram dominating the 1.2 Mbps bracket.
  21. Mobile Network Performance
  22. Video session quality. Generally, subscribers are receiving good video session quality; however, some video sites are problematic.
  23. Session quality distribution by encoding rate. Panels: Off Peak, Peak. On average, 10% of sessions below 1 Mbps (the majority) experience stalling. However comparing off-peak vs peak
  24. Video Stall times. Categories: Low Quality, HD, HQ. Stall time for HD videos is twice that of HQ.
  25. User engagement/abandonment. Watch ratio declines for videos encoded beyond 1.6 Mbps.
  26. Relationship between engagement and video session quality. Subscribers abandon video less than halfway through when video session quality is bad.
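One way to quantify the engagement finding is a watch ratio per session, averaged by quality bucket. The deck does not define its watch ratio, so the sketch below assumes it is seconds watched divided by video length. The session tuples, quality labels, and function names are hypothetical illustration data, not Vantrix results.

```python
from statistics import mean

# Hypothetical sessions: (quality label, seconds watched, video length in seconds).
SESSIONS = [
    ("good", 180, 200),
    ("good", 90, 100),
    ("bad", 40, 200),
    ("bad", 30, 100),
]


def watch_ratio(watched_s, length_s):
    """Assumed definition: fraction of the video that was actually watched."""
    return watched_s / length_s


def mean_watch_ratio_by_quality(sessions):
    """Average the watch ratio within each session-quality bucket."""
    ratios = {}
    for quality, watched_s, length_s in sessions:
        ratios.setdefault(quality, []).append(watch_ratio(watched_s, length_s))
    return {quality: mean(values) for quality, values in ratios.items()}


result = mean_watch_ratio_by_quality(SESSIONS)
print(result)
```

A mean ratio below 0.5 for a bucket corresponds to the slide's "abandon less than halfway through" observation.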
  27. Thank you. Mark Hopper, VP Product Line Management