Using Elasticsearch and Kibana for Big Data Search and Visualization - Suiting Tseng (曾書庭)

Published in: Data & Analytics

  1. Using Elasticsearch and Kibana for Big Data Search and Visualization. Suiting @ DSC 2015
  2. Who am I: Suiting Tseng (曾書庭, @suitingtseng), Data Engineer, Gogolook
  3. Jeff, CEO
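  4. As a data engineer at Gogolook… "Suiting, what's our DAU?"
  5. As a data engineer at Gogolook… "Suiting, what's our DAU?" "Let me calculate..."
  6. As a data engineer at Gogolook… "Suiting, what's our DAU?" "Let me calculate..." "Is the data ready?"
  7. As a data engineer at Gogolook… "Suiting, what's our DAU?" "Let me calculate..." "Is the data ready?" "Still running..."
  8. As a data engineer at Gogolook… "Suiting, what's our DAU?"
  9. As a data engineer at Gogolook… "Suiting, what's our DAU?" "Can you break it down by country?"
  10. As a data engineer at Gogolook… "Suiting, what's our DAU?" "Can you break it down by country?" "Can you break it down by version?"
  11. As a data engineer at Gogolook… "Suiting, what's our DAU?" "Can you break it down by country?" "Can you break it down by version?" "Can I see a whole year of it?"
  12. As a data engineer at Gogolook… "Suiting, what's our DAU?" "Can you break it down by country?" "Can you break it down by version?" "Can I see a whole year of it?" "Can you?" "Can you?" "Can you?"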
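  13. The "Enrage an Engineer in One Sentence" Contest • "Can you break it down by XX?"
  14. The "Enrage an Engineer in One Sentence" Contest • "Can you break it down by XX?" • "Can you turn it into a chart?" • "Can you give me the raw data?"
  15. The "Enrage an Engineer in One Sentence" Contest • "Can you break it down by XX?" • "Can you turn it into a chart?" • "Can you give me the raw data?" • "Is there a way to know where users live?" • "Can we tell which users are wealthier?" • "Do users sleep later on rainy days?"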
  16. Table of Contents • Problems • Solution Requirements • Elasticsearch & Kibana • In Gogolook • Future
  17. Problems • Request-response model https://medium.com/@samson_hu/building-analytics-at-500px-92e9a7005c83
  18. Problems • Request-response model • Long cycle https://medium.com/@samson_hu/building-analytics-at-500px-92e9a7005c83
  19. Problems • Request-response model • Long cycle • EAAB (engineer as a bottleneck) https://medium.com/@samson_hu/building-analytics-at-500px-92e9a7005c83
  20. Problems • Request-response model • Long cycle • EAAB (engineer as a bottleneck) • HDC (HiPPO-driven company, i.e. driven by the Highest Paid Person's Opinion) https://medium.com/@samson_hu/building-analytics-at-500px-92e9a7005c83
  21. Problems • Request-response model • Long cycle • EAAB (engineer as a bottleneck) • HDC (HiPPO-driven company) • Lack of speed https://medium.com/@samson_hu/building-analytics-at-500px-92e9a7005c83
  22. Problems • Request-response model • Long innovation cycle • EAAB (engineer as a bottleneck) • HDC (HiPPO-driven company) • Lack of speed • => We are not alone (500px) https://medium.com/@samson_hu/building-analytics-at-500px-92e9a7005c83
  23. Table of Contents • Problems • Solution Requirements • Elasticsearch & Kibana • In Gogolook • Future
  24. Possible solutions • Approach 1: SQL monkey* zoo http://www.slideshare.net/GloriaLau1/keynote-at-spark-summit/5
  25. Possible solutions • Approach 1: SQL monkey zoo • Approach 2: Provide limited yet easy visualization http://www.slideshare.net/GloriaLau1/keynote-at-spark-summit/5
  26. Requirements • Easy: even the CEO can use it • Fast: must be interactive • Export: provide CSV files • Big: must be scalable • 80-20: solves 80% of the problems
  27. Table of Contents • Problems • Solution Requirements • Elasticsearch & Kibana • In Gogolook • Future
  28. Elasticsearch • Lucene-based search engine • Document storage (JSON) • Distributed, scalable • Serves search requests in milliseconds • Builds an index for every field
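To make "builds an index for every field" concrete, here is a toy inverted index in Python: for each field it maps every value to the list of documents holding it, so an exact-match lookup is a dictionary access rather than a scan over all documents. This is a hypothetical teaching sketch with made-up documents; Lucene's real structures (term dictionaries, compressed posting lists, analyzers that split text into terms) are far more elaborate.

```python
from collections import defaultdict


def build_field_index(docs):
    """Toy inverted index: field -> value -> list of doc ids holding that value."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, doc in enumerate(docs):
        for field, value in doc.items():
            index[field][value].append(doc_id)
    return index


def search(index, field, value):
    """Exact match on one field: a dictionary lookup, not a full scan."""
    return index.get(field, {}).get(value, [])


def search_all(index, **criteria):
    """AND several exact-match conditions by intersecting their posting lists."""
    result = None
    for field, value in criteria.items():
        ids = set(search(index, field, value))
        result = ids if result is None else result & ids
    return sorted(result or [])


# Made-up sample documents
docs = [
    {"country": "TW", "version": "4.1"},
    {"country": "JP", "version": "4.1"},
    {"country": "TW", "version": "4.2"},
]
idx = build_field_index(docs)
print(search(idx, "country", "TW"))                  # [0, 2]
print(search_all(idx, country="TW", version="4.2"))  # [2]
```

Because every field gets its own posting map, combining filters on any fields costs a few set intersections, which is why ad-hoc slicing stays fast.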
  29. Kibana • ES visualization tool • No code required
  30. ES + Kibana • Fast: index every field • Fast: columnar storage* • Big: born distributed/scalable • Easy: GUI without code • Export: CSV
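The slide marks "columnar storage" with an asterisk without expanding on it; in Elasticsearch, the column-oriented on-disk structure is called doc values. The pure-Python toy below only contrasts the two layouts: once documents are pivoted into one list per field, an aggregation over a field reads that field's values alone. It illustrates the idea under made-up data and is not how Elasticsearch actually stores anything.

```python
def rows_to_columns(rows):
    """Pivot row-oriented documents into one value list per field."""
    fields = []
    for row in rows:
        for field in row:
            if field not in fields:
                fields.append(field)
    # None marks a document that lacks the field
    return {field: [row.get(field) for row in rows] for field in fields}


def column_sum(columns, field):
    """Aggregate one column without reading any other field."""
    return sum(v for v in columns.get(field, []) if v is not None)


# Made-up request log rows
rows = [
    {"path": "/a", "latency_ms": 120},
    {"path": "/b", "latency_ms": 80},
    {"path": "/a"},  # latency missing
]
cols = rows_to_columns(rows)
print(cols["latency_ms"])              # [120, 80, None]
print(column_sum(cols, "latency_ms"))  # 200
```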
  31. Kibana • Discover • Visualization • Dashboard
  32. Discover • Raw data • Check data • Find dirty data • Try queries
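Discover is where the talk suggests checking raw data and spotting dirty records. The same kind of check can be sketched outside Kibana: `find_dirty` below is a hypothetical helper (not part of Kibana or Elasticsearch) that flags documents with a missing field or an unexpected type. The field names come from the event example later in the talk; the records and the expected types are assumptions.

```python
def find_dirty(docs, schema):
    """Return (doc index, problem) pairs for missing or mistyped fields.

    `schema` maps each field name to the Python type it is expected to have.
    """
    problems = []
    for i, doc in enumerate(docs):
        for field, expected in schema.items():
            if field not in doc:
                problems.append((i, f"missing {field}"))
            elif not isinstance(doc[field], expected):
                problems.append((i, f"{field} is {type(doc[field]).__name__}"))
    return problems


schema = {"userid": str, "period": int}
docs = [
    {"userid": "suiting", "period": 3500},
    {"userid": "someone", "period": "3500"},  # number logged as a string
    {"period": 1200},                         # userid missing
]
print(find_dirty(docs, schema))
# [(1, 'period is str'), (2, 'missing userid')]
```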
  33. Discover
  34. Discover
  35. Visualization • 8 visualization types • 9 grouping methods • 9 aggregation values
  36. Visualization
  37. Visualization types
  38. Grouping methods • Date histogram • Histogram • Range of a value • Top N • Filter
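Each grouping method is a rule for assigning documents to buckets. The plain-Python functions below are a hypothetical sketch of those rules on in-memory values; in Elasticsearch they correspond roughly to the `date_histogram`, `histogram`, `range`, `terms` and `filter` aggregations, which do the same work on the server. As in Elasticsearch's range aggregation, each range here includes its lower bound and excludes its upper bound. The sample values are made up.

```python
from collections import Counter
from datetime import datetime


def date_histogram_minute(timestamps):
    """Date histogram: count datetimes per minute bucket."""
    return Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)


def histogram(values, interval):
    """Histogram: fixed-width buckets keyed by their lower bound."""
    return Counter((v // interval) * interval for v in values)


def ranges(values, bounds):
    """Range: count values in each half-open [lo, hi) range."""
    return {(lo, hi): sum(lo <= v < hi for v in values) for lo, hi in bounds}


def top_n(terms, n):
    """Top N: the n most frequent terms with their counts."""
    return Counter(terms).most_common(n)


def filter_bucket(items, predicate):
    """Filter: a single bucket holding only the matching items."""
    return [x for x in items if predicate(x)]


# Made-up sample values
periods = [120, 480, 1500, 3500, 7000]
print(histogram(periods, 1000))  # buckets 0, 1000, 3000, 7000
print(ranges(periods, [(0, 1000), (1000, 5000), (5000, 10000)]))
print(top_n(["TW", "JP", "TW", "US", "TW", "JP"], 2))  # [('TW', 3), ('JP', 2)]
print(filter_bucket(periods, lambda p: p > 1000))      # [1500, 3500, 7000]
print(date_histogram_minute([datetime(2015, 8, 23, 11, 48, 0),
                             datetime(2015, 8, 23, 11, 48, 30),
                             datetime(2015, 8, 23, 11, 49, 5)]))
```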
  39. Aggregation values • Count • Avg, Sum, Min, Max, S.D. • Unique count* (HyperLogLog) • Percentile* (T-digest)
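The asterisks mark the two approximate aggregations. Elasticsearch's cardinality aggregation is based on HyperLogLog++, which estimates a distinct count from a fixed-size array of registers instead of remembering every value. The class below is a textbook HyperLogLog, not the ++ variant and not Elasticsearch's code; the precision `p=12` and the SHA-1-based 64-bit hash are arbitrary choices made for this sketch.

```python
import hashlib
import math


class HyperLogLog:
    """Textbook HyperLogLog with 2**p registers (a teaching sketch)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash taken from the first 8 bytes of SHA-1
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)               # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # rank = 1-based position of the leftmost 1-bit in `rest`
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        estimate = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * m and zeros:
            # small-range correction: linear counting on empty registers
            estimate = m * math.log(m / zeros)
        return round(estimate)


hll = HyperLogLog()
for i in range(50_000):
    hll.add(f"user-{i}")
    hll.add(f"user-{i}")  # duplicates do not change any register
print(hll.count())  # close to 50,000 (typical error about 1.6% with p=12)
```

Memory stays at 4,096 small registers no matter how many users are added, which is the trade-off behind an approximate unique count.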
  40. Visualization • Same concept, different graph • FILTER • GROUP • AGGREGATE
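The slide reduces every chart to the same three steps. As a hypothetical plain-Python illustration (Kibana builds Elasticsearch aggregations rather than running anything like this), FILTER keeps the matching documents, GROUP buckets them by a key, and AGGREGATE reduces each bucket to a number. The events are made up, and DAU is computed here as the number of distinct `userid` values per day.

```python
from collections import defaultdict


def filter_group_aggregate(docs, where, group_key, aggregate):
    """FILTER docs with `where`, GROUP by `group_key`, AGGREGATE each group."""
    groups = defaultdict(list)
    for doc in docs:
        if where(doc):                          # FILTER
            groups[group_key(doc)].append(doc)  # GROUP
    return {key: aggregate(members) for key, members in groups.items()}  # AGGREGATE


def unique_users(members):
    return len({m["userid"] for m in members})


# Made-up events
events = [
    {"userid": "a", "day": "2015-08-22", "country": "TW"},
    {"userid": "a", "day": "2015-08-22", "country": "TW"},
    {"userid": "b", "day": "2015-08-22", "country": "JP"},
    {"userid": "a", "day": "2015-08-23", "country": "TW"},
]
dau = filter_group_aggregate(events, lambda d: True, lambda d: d["day"], unique_users)
dau_tw = filter_group_aggregate(events, lambda d: d["country"] == "TW",
                                lambda d: d["day"], unique_users)
print(dau)     # {'2015-08-22': 2, '2015-08-23': 1}
print(dau_tw)  # {'2015-08-22': 1, '2015-08-23': 1}
```

Changing the chart type only changes how the resulting buckets are drawn; the three steps stay the same.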
  41. DAU: "Suiting, what's our DAU?"
  42. DAU by region: "Can you break it down by country?"
  43. DAU by version: "Can you break it down by version?"
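In Elasticsearch terms, these DAU charts map onto one aggregation request: a `date_histogram` per day with a `cardinality` (approximate distinct count) of user IDs inside it, optionally nested under a `terms` aggregation for the split. The Python function below only builds the request body as a dict. The field names `@timestamp` and `userid` come from the event example in this talk, while `country` and `version` are assumed. It uses the 2015-era `interval` parameter (Elasticsearch 7.x replaced it with `calendar_interval`/`fixed_interval`) and assumes `userid` is indexed as an exact value (a not_analyzed string in 1.x/2.x, a keyword field later).

```python
def dau_query(split_field=None, top=10):
    """Build a search body: distinct users per day, optionally split by a field."""
    per_day = {
        "date_histogram": {"field": "@timestamp", "interval": "day"},
        "aggs": {"dau": {"cardinality": {"field": "userid"}}},
    }
    if split_field is None:
        aggs = {"per_day": per_day}
    else:
        # Top `top` values of the split field, each with its own daily histogram
        aggs = {
            "split": {
                "terms": {"field": split_field, "size": top},
                "aggs": {"per_day": per_day},
            }
        }
    return {"size": 0, "aggs": aggs}  # size 0: return aggregations only, no hits


by_country = dau_query("country")
by_version = dau_query("version")
```

The resulting dict can be serialized with `json.dumps` and sent to the index's `_search` endpoint.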
  44. Server request log
  45. Request_total per minute: GROUP BY DATE HISTOGRAM(minute), COUNT(*)
  46. Request_total by path: GROUP BY TOP(path, 5), DATE HISTOGRAM(minute), COUNT(*)
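The pseudo-SQL on these two slides can be spelled out in plain Python over (timestamp, path) pairs. This is a hypothetical in-memory sketch with made-up requests, not how Kibana evaluates it; it shows the order of the nested grouping: first pick the top paths by total count, then build a per-minute histogram inside each, where each bucket's size is the COUNT(*).

```python
from collections import Counter, defaultdict
from datetime import datetime


def minute_of(ts):
    return ts.replace(second=0, microsecond=0)


def request_total_per_minute(requests):
    """GROUP BY DATE HISTOGRAM(minute); COUNT(*)."""
    return Counter(minute_of(ts) for ts, _ in requests)


def request_total_by_path(requests, top=5):
    """GROUP BY TOP(path, top), DATE HISTOGRAM(minute); COUNT(*)."""
    top_paths = [p for p, _ in Counter(path for _, path in requests).most_common(top)]
    table = defaultdict(Counter)
    for ts, path in requests:
        if path in top_paths:
            table[path][minute_of(ts)] += 1
    return {p: dict(table[p]) for p in top_paths}


# Made-up requests
t = datetime(2015, 8, 23, 11, 48)
requests = [
    (t.replace(second=5), "/api/a"),
    (t.replace(second=40), "/api/a"),
    (t.replace(minute=49), "/api/a"),
    (t.replace(second=10), "/api/b"),
    (t.replace(minute=49, second=30), "/api/b"),
    (t.replace(minute=49, second=1), "/api/c"),
]
print(request_total_per_minute(requests))
print(request_total_by_path(requests, top=2))
```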
  47. Dashboard • Collection of visualizations
  48. Community tag in MongoDB
  49. Dashboard
  50. Dashboard - 1st peak
  51. Dashboard - 2nd peak
  52. Table of Contents • Problems • Solution Requirements • Elasticsearch & Kibana • In Gogolook • Future
  53. In Gogolook (Aug. 2015) • 200M+ data points daily • 150GB+ of data daily • 24 dashboards, 160 visualizations • Service status, e.g. requests_total • Application data, e.g. tag_total • Log data, e.g. button_ctr
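A quick back-of-the-envelope reading of these figures, treating the lower bounds as exact and 1 GB as 10^9 bytes (the slide does not say whether 150GB is raw or indexed size): about 750 bytes per data point, and about 2,300 data points per second if ingestion were spread evenly over 24 hours. Real traffic is uneven, so peak rates are higher than this average.

```python
data_points_per_day = 200_000_000  # "200M+ data points daily"
bytes_per_day = 150 * 10**9        # "150GB+ daily", with 1 GB = 10**9 bytes
seconds_per_day = 24 * 60 * 60     # 86,400

avg_bytes_per_point = bytes_per_day / data_points_per_day
avg_points_per_second = data_points_per_day / seconds_per_day

print(avg_bytes_per_point)           # 750.0
print(round(avg_points_per_second))  # 2315
```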
  54. In Gogolook (currently) • Log user behavior on features • Record your own logs yourself (Planner/PM) • Build your own boards yourself (everyone) • Monitor performance from day 1
  55. In Gogolook
  56. In Gogolook • Tracking Kibana usage with Google Analytics
  57. Table of Contents • Problems • Solution Requirements • Elasticsearch & Kibana • In Gogolook • Future
  58. In Gogolook (future) • Log all user events, not just feature-based ones
  59. In Gogolook (future) • Log all user events, not just feature-based ones • {
    "userid": "suiting",
    "@timestamp": "2015-08-23T11:48:00",
    "page": "login",
    "button": "register",
    "period": 3500
  }
  60. In Gogolook (future) • Answer questions: A 40% vs. B 60%
  61. In Gogolook (future) • Answer questions: A 40%, 7000ms vs. B 60%, 1500ms
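Once every click is logged as an event shaped like the JSON example above, questions like these become a filter and a group-by. The sketch below assumes A and B are two buttons on the same page and that `period` is the time before the click in milliseconds; the slides define neither, the `click` helper is hypothetical, and the five events are invented so that the output mirrors the slide's percentages and times.

```python
from collections import defaultdict


def button_stats(events, page):
    """For each button on `page`: share of clicks and mean `period`."""
    periods_by_button = defaultdict(list)
    for e in events:
        if e["page"] == page:                                   # FILTER
            periods_by_button[e["button"]].append(e["period"])  # GROUP
    total = sum(len(p) for p in periods_by_button.values())
    return {                                                    # AGGREGATE
        button: {"share": len(p) / total, "avg_period_ms": sum(p) / len(p)}
        for button, p in periods_by_button.items()
    }


def click(userid, button, period):
    # Hypothetical helper producing events shaped like the slide's example
    return {"userid": userid, "page": "login", "button": button, "period": period}


events = [
    click("u1", "A", 6000), click("u2", "A", 8000),
    click("u3", "B", 1000), click("u4", "B", 1500), click("u5", "B", 2000),
]
print(button_stats(events, "login"))
# {'A': {'share': 0.4, 'avg_period_ms': 7000.0}, 'B': {'share': 0.6, 'avg_period_ms': 1500.0}}
```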
  62. Limits • No SQL JOIN • No subqueries
  63. How about the other 20%? • Powerful engine/tool required • Compute engines: • Google BigQuery • AWS Redshift • Visualization tools: • Tableau • Periscope
  64. Thank you
  65. Questions?
