King hug uk

Dr Relational or: How I Learned to Stop Worrying and Love the Database (Andy Done, Data Warehouse Lead, King)

In the face of explosive growth, King's Hadoop data warehouse simply wasn't scaling fast enough. Find out why King is extending its Big Data platform with the MPP database ExaSol and processing its data hundreds of times faster.

Published in: Technology

  1. © King.com Ltd 2013 – Public. Database Relational
  2. Agenda
      • Welcome!
      • A brief history of King
      • King data platform evolution
      • Enter Hive
      • Hive + DB
      • Hive + better DB
      • Questions?
  3. A brief history of King
  4. Who?
  5. Where?
  6. Web, social, mobile
  7. King in numbers
      • 100 million daily active users
      • 1 billion game plays per day
      • 8 offices
      • 10 billion events per day
      • Lots and lots of data…
  8. A brief history of me (andy.done@king.com)
  9. King data platform evolution
  10. Enter Hive
  11. The road to big [Chart: compressed events in gigabytes/day (y-axis 0 to 350) from 2011-02-16 to 2013-04-26; series: Browser, Mobile; annotations: 10 nodes, 20 nodes, 40 nodes, Qlikview says no, Infobright CE says no]
  12. Scaling accomplished
  13. Hive says…
  14. Data exploration
      • COUNT(*)
      • SELECT DISTINCT
      • COUNT, SUM… GROUP BY date
  15. Hive + DB = ?
  16. Data platform 1.0 [Diagram components: Games, Event data, Hive, ETL, Reports, Data scientists]
  17. Data platform 1.5 [Diagram components: Games, Event data, Hive, DB, ETL, Reports, Data scientists]
  18. Selection criteria
      • ‘Accessible’ pricing (free?)
      • Single node
      • Easy to set up
      • Low maintenance
  19. Contenders ready
      • Infobright
          • Columnar MySQL engine
          • Light tuning and hinting
      • InfiniDB
          • Columnar MySQL engine
          • Tuning-less
          • Faster for our use case
  20. How’s that work out?
      • Paid its way
      • Popular
      • 100s of queries / day
      • Stability
      • Ceilings
      • Screwed by mobile
  21. The road to big [Chart: compressed events in gigabytes/day (y-axis 0 to 350) from 2011-02-16 to 2013-04-26; series: Browser, Mobile; annotations: 10 nodes, 20 nodes, 40 nodes, Qlikview says no, Infobright CE says no, InfiniDB]
  22. ETL?
  23. Hive + better DB = ?
  24. Data platform 2.0 [Diagram components: Game, Event data, Hive, Better DB, ETL, Reports, Data scientists]
  25. State of the market Jan 2013
      • Hadoop on steroids
          • Hadapt…
          • Impala
          • Nouvaeu Data
          • Platfora
          • SIsense
      • MPP analytics databases
          • Vertica
          • ExaSol
  26. Contenders ready
      Feature        | ExaSol        | Vertica
      Processing     | In memory     | Disc optimised
      Administration | Web based     | Command line
      Backup         | Web based     | Command line
      Resiliency     | Hot spare     | Gradual degradation
      Tuning         | Self tuning   | User tuning
      Licensing      | Allocated RAM | Total storage
      Vendor         | Smaller       | Larger
  27. Disclaimers
      • Our data
      • Our queries
      • Our use case
      • Our results
  28. This is our data
      Table            | Row count
      Mobile dimension | 161 m
      Social dimension | 600 m
      Mobile facts     | 1 B
      Social facts     | 6.7 B
  29. Single query
  30. Single query
  31. Single query
  32. Single query
  33. Cluster stats
                          | Vertica | ExaSol | Hive    | InfiniDB
      Nodes               | 4       | 4      | 19      | 1
      Cores               | 64      | 48     | 228     | 32
      RAM                 | 512 GB  | 288 GB | 1216 GB | 300 GB
      Discs               | 96      | 32     | 76      | 4
      Hardware cost / USD | $$$$    | $$     | $$      | $
      Total cost / USD    | $$$$$$  | $$$$$  | $$      | $$
  34. Concurrency 2
  35. Concurrency 4
  36. Concurrency 8
  37. Concurrency 16
  38. Overall run time
  39. Picture:words: $1.9m = 4 ExaSol nodes 420 Hive nodes
  40. This is a test
      • Ad hoc query tests
      • DML
          • INSERTs
          • UPDATEs
          • DELETEs
  41. And in the real world
      • Faster processing times
          • 4.5 hours to 20 minutes
      • Happier analysts
      • Happier data warehouse engineers
      • Happier ops
  42. Conclusions
      • For structured workloads, consider a good analytic database to complement your Hadoop infrastructure
      • ExaSol was an excellent fit for our use case
      • We’ll let you know how we get on!
  43. Questions?
  44. We’re hiring!
  45. Thank you
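
Transcript item 14 (Data exploration) names three query shapes: COUNT(*), SELECT DISTINCT, and COUNT/SUM with GROUP BY date. The sketch below shows roughly what those shapes look like, run against an in-memory SQLite database so that it is self-contained. The `events` table, its columns, and its rows are invented for illustration; they are not King's schema or data, and SQLite stands in for Hive or an analytic database.

```python
import sqlite3

# Hypothetical event table; schema and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (event_date TEXT, user_id INTEGER, amount_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("2013-04-25", 1, 99),
        ("2013-04-25", 2, 199),
        ("2013-04-26", 1, 99),
        ("2013-04-26", 3, 499),
        ("2013-04-26", 3, 99),
    ],
)

# COUNT(*): how many events are there in total?
total_events = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]

# SELECT DISTINCT: which users appear in the data?
distinct_users = [
    row[0]
    for row in conn.execute("SELECT DISTINCT user_id FROM events ORDER BY user_id")
]

# COUNT, SUM ... GROUP BY date: per-day event count and total amount.
daily = conn.execute(
    "SELECT event_date, COUNT(*), SUM(amount_cents) "
    "FROM events GROUP BY event_date ORDER BY event_date"
).fetchall()

print(total_events)
print(distinct_users)
print(daily)
```

Each of these queries scans the whole table, which is the kind of interactive, ad hoc work the talk describes moving from Hive to a database.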

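Transcript items 34 to 37 refer to runs at concurrency 2, 4, 8 and 16, but the charts themselves did not survive in the transcript. As a rough sketch of how a test at those levels could be driven, and not a description of the harness King used, the code below times a batch of queries at each level with a thread pool. `run_query` is a hypothetical stand-in that only sleeps; in practice it would submit a real query to the database under test. The `queries_per_worker` parameter is likewise an assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(query_id: int) -> int:
    """Hypothetical stand-in for submitting one query to a database."""
    time.sleep(0.01)  # pretend the query takes about 10 ms
    return query_id


def time_at_concurrency(level: int, queries_per_worker: int = 3) -> float:
    """Run level * queries_per_worker queries on `level` threads.

    Returns the wall-clock time for the whole batch, in seconds.
    """
    total = level * queries_per_worker
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=level) as pool:
        results = list(pool.map(run_query, range(total)))
    elapsed = time.perf_counter() - start
    if len(results) != total:
        raise RuntimeError("some queries did not complete")
    return elapsed


# The concurrency levels named in the transcript.
timings = {level: time_at_concurrency(level) for level in (2, 4, 8, 16)}
for level, seconds in timings.items():
    print(f"concurrency {level:>2}: {seconds:.3f} s")
```

With a real database behind `run_query`, comparing the batch time across levels shows how well each system holds up as more analysts query it at once.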