Big Data Day LA 2016 / Big Data Track - Real Time Analytics with Druid - Guillaume Torche, Big Data Engineer - GumGum

GumGum uses Druid to ingest more than 30 billion events every day, which can be queried almost as soon as they happen with a very low response time. This is a tell-all talk about GumGum's love story with Druid, how Druid works and how GumGum leverages Druid's capabilities.

  1. 1. Real Time Analytics at Scale with Guillaume Torche Saturday 9th July Big Data LA 2016
  2. 2. What is analytics at scale?
  3. 3. {"id":"165a0b33-fd2d-4f45-83e4-a373cc4e6333","t":1435633205023,"cl":"js","ua":"Mozilla/5.0 (iPhone; CPU iPhone OS 8_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) GSA/6.0.51363 Mobile/12F70 Safari/600.1.4","ip":"70.109.46.30","cc":"US","rg":"CA","ct":"Santa Maria","pc":"93455","mc":855,"isp":"Verizon Internet Services","bf":"6c52d0492fa92bd912d3a53a3c864e5a47008779","vst":"62784988-5d30-40a2-a334-43e456499ac6"},{"v":"1.1","e":"view","si":632,"t":"56c89c04","ab":25465,"pv":"6cff666d-92c5-42ac-9f94-39a40f9f297d","pu":"http://www.nytimes.com/2016/01/08/business/international/a-new-economic-era-for-china-goes-off-the-rails.html?hp&action=click&pgtype=Homepage&clickSource=story-heading&module=first-column-region&region=top-news&WT.nav=top-news&_r=0","af":false,"rpm":0.65999997,"pc":0.25,"nc":0,"dsp":27,"dai":"pqdmnbu","dci":"rpz2243","do":"*.nytimes.com","vi":3,"pi":1134,"pri":3,"uti":16,"adi":300,"ei":3,"spi":172,"atmi":208,"ct":"rtb","cli":3,"cid":1256}
  4. 4. What does Real Time mean?
  5. 5. What does Real Time mean? Response time Data freshness Response Time
  6. 6. Data Freshness
  7. 7. About me: Guillaume Torche. Born and raised in France. First experience in Big Data at Aubay in Paris. Started working for GumGum 2 years ago. About a year and a half of experience with Druid.
  8. 8. Invented In-Image advertising in 2008 (http://gumgum.com/gallery). Processing 2.6B image impressions / month; 10B impressions / month; 10B events / day; 2000 premium publishers; 132 employees; 37.5% YOY growth
  9. 9. Interactive data exploration; fast results; access to fresh data
  10. 10. Before we met Big Data MySQL for Real Time reporting Data aggregation with Storm Batch updates of MySQL metrics
  11. 11. Druid in brief: built by another ad tech company; used by top tech companies
  12. 12. Key features: arbitrary slice-and-dice of data; fast aggregations on time series data; highly available; real time; distributed
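To make "slice-and-dice" and "fast aggregations on time series data" more concrete, here is a minimal sketch of what a Druid native timeseries query could look like, built as a Python dictionary. The talk does not show any query, so the data source name `ad_events` and the column names `vertical`, `impressions` and `revenue` are hypothetical; only the overall shape of the query (`queryType`, `dataSource`, `granularity`, `intervals`, `filter`, `aggregations`) follows Druid's native JSON query format.

```python
import json


def timeseries_query(data_source, interval, vertical):
    """Build a Druid-style native timeseries query as a plain dict.

    data_source, and the "vertical" / "impressions" / "revenue" column
    names, are hypothetical and chosen only for illustration.
    """
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        "granularity": "hour",      # one result bucket per hour
        "intervals": [interval],    # ISO-8601 interval, start/end
        # Slice: keep only rows whose vertical matches.
        "filter": {"type": "selector", "dimension": "vertical", "value": vertical},
        # Dice: sum the metrics over the matching rows.
        "aggregations": [
            {"type": "longSum", "name": "impressions", "fieldName": "impressions"},
            {"type": "doubleSum", "name": "revenue", "fieldName": "revenue"},
        ],
    }


query = timeseries_query("ad_events", "2015-01-01/2015-01-05", "Gaming")
print(json.dumps(query, indent=2))
```

In a real deployment a query like this would be sent as JSON to a Broker node, which fans it out to the Realtime and Historical nodes; the sketch only builds and prints the payload.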
  13. 13. Overall Architecture
  14. 14. Queries Broker Realtime Historical Stream Deep storage
  15. 15. Data Balance & Replication
  16. 16. Queries Broker Realtime Historical Stream Deep storage Coordinator
  17. 17. Data Ingestion
  18. 18. Queries Broker Realtime Historical Stream Deep storage Coordinator Batch Data Ingestion Ingestion
  19. 19. Why is Druid fast?
  20. 20. Column compression - Dictionary encoding
      Timestamp             Product    Vertical  Ad impressions  Clicks  Revenue
      2015-01-01T00:01:35Z  In-Image   Gaming    15              3       10
      2015-01-04T00:03:48Z  In-Screen  Gaming    5               1       3
      2015-01-03T00:08:05Z  Native     Fashion   2               0       0.5
      2015-01-02T00:02:35Z  In-Image   Fashion   25              15      50
      ● Create internal ids:
        ○ Gaming => 0, Fashion => 1
      ● Store column:
        ○ Vertical => [ 0 0 1 1 ]
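The dictionary encoding on slide 20 can be sketched in a few lines of Python. This is an illustration of the idea only, not Druid's actual implementation: the function name `dictionary_encode` is made up, and ids are assigned in order of first appearance simply because that reproduces the slide's mapping (Gaming => 0, Fashion => 1).

```python
def dictionary_encode(values):
    """Replace each string with a small integer id.

    Ids are assigned in order of first appearance (an assumption made
    here to match the slide, not a statement about Druid internals).
    """
    ids = {}
    encoded = []
    for value in values:
        if value not in ids:
            ids[value] = len(ids)   # next unused id
        encoded.append(ids[value])
    return ids, encoded


# The Vertical column from the slide's example table.
vertical = ["Gaming", "Gaming", "Fashion", "Fashion"]
dictionary, column = dictionary_encode(vertical)
print(dictionary)  # {'Gaming': 0, 'Fashion': 1}
print(column)      # [0, 0, 1, 1]
```

Storing small integers instead of repeated strings makes the column much more compact, and the integer ids can then be compressed further.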
  21. 21. Inverted Index
      Row  Vertical
      0    Gaming
      1    Gaming
      2    Fashion
      3    Fashion
      Gaming            [1, 1, 0, 0]
      Fashion           [0, 0, 1, 1]
      Gaming OR Fashion [1, 1, 1, 1]
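The inverted index on slide 21 keeps one bitmap per distinct value, with a 1 at every row where that value occurs. A minimal sketch using plain Python lists as bitmaps follows; the helper names `build_bitmaps` and `bitmap_or` are hypothetical, and a real system would use compressed bitmaps rather than lists. The impression counts are taken from the example table on slide 20.

```python
def build_bitmaps(column):
    """Build one bitmap per distinct value: bit i is 1 if row i has that value."""
    bitmaps = {}
    for row, value in enumerate(column):
        bitmaps.setdefault(value, [0] * len(column))[row] = 1
    return bitmaps


def bitmap_or(a, b):
    """Combine two bitmaps: a row matches if it matches either filter."""
    return [x | y for x, y in zip(a, b)]


vertical = ["Gaming", "Gaming", "Fashion", "Fashion"]
impressions = [15, 5, 2, 25]  # Ad impressions column from the example table

bitmaps = build_bitmaps(vertical)
print(bitmaps["Gaming"])                                  # [1, 1, 0, 0]
print(bitmaps["Fashion"])                                 # [0, 0, 1, 1]
print(bitmap_or(bitmaps["Gaming"], bitmaps["Fashion"]))   # [1, 1, 1, 1]

# A filter such as Vertical = Gaming only touches the rows whose bit is set.
gaming_impressions = sum(
    value for bit, value in zip(bitmaps["Gaming"], impressions) if bit
)
print(gaming_impressions)  # 20
```

Because filters reduce to bitwise operations on these bitmaps, the rows to aggregate can be found without scanning the string column itself.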
  22. 22. How does Druid achieve low latency ingestion?
  23. 23. Realtime Stream
  24. 24. Write optimized Read optimized Flush S1 S2 S3 S4 S5 Final Segment Historical
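The diagram labels on slide 24 suggest this flow: events land in a write-optimized in-memory buffer, the buffer is periodically flushed into immutable read-optimized chunks (S1 to S5), and the chunks are finally merged into one segment that is handed to a Historical node. The toy sketch below illustrates that flow under those assumptions. The class `RealtimeBuffer`, its methods, and the `max_rows` flush trigger are hypothetical and are not Druid's API.

```python
class RealtimeBuffer:
    """Toy model of write-optimized ingestion with read-optimized flushes."""

    def __init__(self, max_rows):
        self.max_rows = max_rows  # hypothetical flush threshold
        self.in_memory = []       # write optimized: plain appends
        self.flushed = []         # read optimized: immutable sorted chunks

    def ingest(self, event):
        self.in_memory.append(event)
        if len(self.in_memory) >= self.max_rows:
            self.flush()

    def flush(self):
        """Freeze the in-memory rows into an immutable, time-sorted chunk."""
        if self.in_memory:
            chunk = tuple(sorted(self.in_memory, key=lambda e: e["t"]))
            self.flushed.append(chunk)
            self.in_memory = []

    def query_count(self):
        """Queries see both flushed chunks and not-yet-flushed rows."""
        return len(self.in_memory) + sum(len(c) for c in self.flushed)

    def build_final_segment(self):
        """Merge all chunks into one segment, ready to hand off."""
        self.flush()
        merged = sorted((e for c in self.flushed for e in c), key=lambda e: e["t"])
        self.flushed = []
        return tuple(merged)


buffer = RealtimeBuffer(max_rows=2)
for t in [5, 3, 9, 1, 7]:
    buffer.ingest({"t": t})

print(len(buffer.flushed), len(buffer.in_memory))  # 2 1
print(buffer.query_count())                        # 5
final_segment = buffer.build_final_segment()
print([e["t"] for e in final_segment])             # [1, 3, 5, 7, 9]
```

The point of the split is that ingestion never waits on an expensive index build, while every row stays queryable from the moment it arrives.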
  25. 25. https://www.linkedin.com/in/guillaume-torche-4a209790 http://imply.io/ http://druid.io http://gumgum.com/careers