Lessons learned from building Demand Side Platform

Published in: Software

  1. 1. LESSONS LEARNED FROM BUILDING A DSP AT BIDLAB Bartosz Bogacki <bbogacki@bidlab.pl>
  2. 2. CTO, CODER, ROCK CLIMBER • current: • Chief Technology Officer at Bidlab • previous: • IT Director at InternetowyKantor.pl SA • Software Architect / Project Manager at Wolters Kluwer Polska • find out more (if you care): • linkedin.com/in/bartoszbogacki
  3. 3. BIDLAB IS A DSP DSP stands for Demand Side Platform
  4. 4. WHAT TOOLS DO WE USE?
  5. 5. HOW ? • Learn how generic solutions work for the biggest in the industry: • highscalability.com • blog.twitter.com • techblog.netflix.com • codeascraft.com • etc.
  6. 6. HOW ? • github.com/facebook • github.com/twitter • github.com/Netflix • github.com/google • github.com/etsy • github.com/Instagram
  7. 7. DON’T BELIEVE EVERYTHING YOU READ!
  8. 8. BE AWARE THAT • There are a lot of crappy blog posts • There are a lot of ugly shortcuts on Stack Overflow • There are a lot of poor-quality comparison tests (comparing apples and oranges, measuring incorrectly, or not properly isolating the measured feature) • Others often have a different environment / data / usage scenarios than you!
  9. 9. DESIGN YOUR CORE ARCHITECTURE
  10. 10. USE A TECHNOLOGY STACK THAT YOU KNOW WELL
  11. 11. CHOOSE GOOD TOOLS AND LIBRARIES • Learn how it works …really! • Learn what the constraints and weak points are • Check whether it is supported by an active community • Use open source or pay for support (startup programs are for kamikaze)
  12. 12. USE CLOUD HOSTING BUT… • Don’t get locked-in !! • Always test machine performance (often) • Monitor CPU steal time • Decide what you need (IO / memory / cpu / storage) • Be aware of roundtrip time
  13. 13. YOU WON’T BE ABLE TO DO ALL PROCESSING ON-LINE
  14. 14. …SO RESPOND NOW, AND PROCESS LATER
  15. 15. DECIDE WHAT YOU CAN DO ASYNCHRONOUSLY
  16. 16. DECIDE WHAT YOU CAN DO AS A BATCH JOB
  17. 17. USE ASYNC WISELY • Programming language mechanisms (threads, futures, reactive pattern, etc.) • Simple queuing systems (Amazon SQS, Redis pub/sub, etc.) • Message brokers and messaging libraries (RabbitMQ, which implements AMQP, the Advanced Message Queuing Protocol; HornetQ; ZeroMQ; etc.)
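The "respond now, process later" idea from the preceding slides can be sketched with nothing more than a thread and a queue. This is a minimal illustration, not the Bidlab implementation: the names `handle_request`, `worker`, `events`, and `processed` are hypothetical, and the in-process queue stands in for whatever queuing system is actually used.

```python
import queue
import threading

# Hypothetical in-process queue; in production this role might be played by
# an external system such as Amazon SQS, Redis, or RabbitMQ.
events = queue.Queue()
processed = []


def handle_request(request_id):
    """Answer immediately and defer the expensive work."""
    events.put(request_id)   # cheap: only enqueue the work item
    return "OK"              # respond without waiting for processing


def worker():
    """Drain the queue in the background until a None sentinel arrives."""
    while True:
        item = events.get()
        if item is None:
            events.task_done()
            break
        processed.append(item * 2)   # placeholder for the slow processing step
        events.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

responses = [handle_request(i) for i in range(5)]
events.put(None)   # ask the worker to stop once the backlog is done
events.join()      # block until every queued item has been processed
```

The request path only pays for an enqueue; how much work can be deferred this way depends on what the response actually needs to contain.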
  18. 18. BATCH! (IF YOU CAN) • Process off-line as much as you can • crontab jobs • job execution frameworks (like quartz)
  19. 19. CACHE AS MUCH AS YOU CAN !
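As a rough sketch of off-line processing, the function below aggregates raw event lines into per-campaign counts in one pass, the kind of job that could be started from crontab. The log format (`campaign_id,event_type`) and the name `aggregate_batch` are assumptions made for illustration only.

```python
from collections import Counter


def aggregate_batch(lines):
    """Count events per (campaign, event type) from 'campaign_id,event_type' lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue   # skip blank lines instead of failing the whole batch
        campaign_id, event_type = line.split(",")
        counts[(campaign_id, event_type)] += 1
    return counts


# Hypothetical sample of a raw event log collected during the day.
sample_log = ["c1,impression", "c1,impression", "c2,click", "", "c1,click"]
totals = aggregate_batch(sample_log)
```

Because the job works on a finished file rather than live traffic, it can be rerun after a failure without affecting response times.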
  20. 20. CACHE GUIDELINES • If you deliver static content, serve it from a cache (like varnish) • Use an in-memory store (like memcached or redis) to cache data or intermediate results for your application • Use a lightweight in-app cache to lower communication cost (like Guava LoadingCache) • Have a strategy for populating and invalidating your cache at each level
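The in-app cache guideline can be illustrated with a small loading cache that fills itself on a miss and expires entries after a time-to-live. This is a simplified Python sketch in the spirit of a loading cache, not Guava's API; the class `LoadingCache`, the loader `load_campaign`, and the 60-second TTL are all hypothetical, and the class is not thread-safe.

```python
import time


class LoadingCache:
    """Tiny in-process cache: loads on miss, expires entries after a TTL."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}   # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                    # fresh hit: no backend call
        value = self._loader(key)              # miss or expired: reload
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        """Drop an entry explicitly, e.g. after the underlying data changed."""
        self._entries.pop(key, None)


calls = []


def load_campaign(key):
    calls.append(key)   # record backend lookups so they can be counted
    return {"id": key}


cache = LoadingCache(load_campaign, ttl_seconds=60)
cache.get("c1")
cache.get("c1")          # served from the cache, no second backend call
cache.invalidate("c1")
cache.get("c1")          # reloaded after explicit invalidation
```

The TTL covers the feeding side of the strategy and `invalidate` covers the invalidation side; which one matters more depends on how stale the data is allowed to be.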
  21. 21. THINK ABOUT HA • Have an HA plan, but do not implement it until you really need it. Until then - do (and test) backups! :) • Most of the technologies you would use have a recommended fault-tolerance solution; do not invent your own!
  22. 22. GATHER PRODUCTION DATA AND THEN TEST YOUR CONCEPTS & TOOLS
  23. 23. DON’T ASSUME ! • Use real, production data if you can! • Real life is often more complex than you first thought (typos, data consistency, exceptions)
  24. 24. DO FUNCTIONAL & PERFORMANCE TESTING • Use great tools, don’t reinvent the wheel • xUnit • soapui • ApacheBench (ab) • jmeter
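Dedicated tools such as ab or jmeter are the better choice for real load tests, but the core of a latency measurement is simple enough to sketch. The helpers below time repeated calls and report a nearest-rank percentile; `measure_latencies` and `percentile` are hypothetical names, and the nearest-rank definition is one of several in common use.

```python
import math
import time


def measure_latencies(func, runs):
    """Call func `runs` times and return each call's duration in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


# Placeholder workload; a real test would call the code path under test.
latencies = measure_latencies(lambda: sum(range(1000)), runs=50)
p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
```

Looking at high percentiles rather than the average matters when every response has a deadline, since a good mean can hide a slow tail.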
  25. 25. PROFILE EARLY & OFTEN • Know your application from the execution perspective • Know your hotspots • VisualVM !!
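VisualVM targets the JVM; to keep the examples here in one language, the sketch below shows the same idea of finding hotspots with Python's standard `cProfile` and `pstats` modules instead. The functions `handle_bid` and `slow_sum` are hypothetical stand-ins for real application code.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately expensive helper that should show up as a hotspot."""
    return sum(i * i for i in range(n))


def handle_bid():
    """Hypothetical request handler whose cost is dominated by slow_sum."""
    return slow_sum(50_000)


profiler = cProfile.Profile()
profiler.enable()
result = handle_bid()
profiler.disable()

# Render the five most expensive entries, sorted by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
profile_text = report.getvalue()
```

Printing `profile_text` shows which functions account for most of the time, which is the "know your hotspots" step in miniature.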
  26. 26. PROFILE ”ENVIRONMENT”
  27. 27. OPTIMIZE YOUR DATA ACCESS redis-faina
  28. 28. TUNE OPERATING SYSTEM • Set sysctls for high load systems • Set system limits for high load systems • Don’t swap
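Before tuning, it helps to see what the current values are. The sketch below reads the open-file limit of the current process and a couple of sysctls through `/proc/sys` on Linux. The slide does not name specific settings, so `vm.swappiness` and `net.core.somaxconn` are chosen here only as examples, and `read_sysctl` is a hypothetical helper.

```python
import resource
from pathlib import Path


def open_file_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def read_sysctl(name):
    """Read a sysctl value via /proc/sys (Linux); return None if unavailable."""
    path = Path("/proc/sys") / name.replace(".", "/")
    try:
        return path.read_text().strip()
    except OSError:
        return None


soft, hard = open_file_limits()
swappiness = read_sysctl("vm.swappiness")        # example: how eagerly to swap
somaxconn = read_sysctl("net.core.somaxconn")    # example: listen backlog cap
```

Checks like these can run at application start-up and log a warning when a machine was provisioned without the expected settings.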
  29. 29. TUNE YOUR GC (IF YOU HAVE ONE ;) • …or at least monitor it, to know how it affects your performance • Monitor & tune the GC of the subsystems you use (Cassandra, Tomcat, Hadoop, Apache Spark, etc.)
  30. 30. MONITOR AS MUCH AS YOU CAN
  31. 31. BUILD MONITORING IN YOUR SOFTWARE
  32. 32. GREAT TOOLS TO USE FOR MONITORING • Graphite (graphite.wikidot.com) • New Relic (newrelic.com)
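Building monitoring into the software can be as small as emitting a counter. The sketch below formats and sends one data point using Graphite's plaintext protocol, in which each line is `metric.path value timestamp`. The metric name `dsp.bids.count`, the host, and the helper names are hypothetical; port 2003 is the usual default for the plaintext listener but may differ in a given setup.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol: 'path value timestamp'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(host, port, path, value):
    """Send a single data point over TCP; raises OSError if the server is unreachable."""
    payload = format_metric(path, value).encode("ascii")
    with socket.create_connection((host, port), timeout=1.0) as conn:
        conn.sendall(payload)


# Hypothetical metric name and fixed timestamp, shown without sending anything.
line = format_metric("dsp.bids.count", 42, timestamp=1400000000)
```

A call such as `send_metric("graphite.example.internal", 2003, "dsp.bids.count", 42)` would deliver the point, assuming a Graphite listener at that hypothetical address. On a hot path, metrics are better aggregated in memory and flushed periodically than sent one connection per event.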
  33. 33. MONITOR ERRORS sentry (github.com/getsentry/sentry)
  34. 34. MONITOR CPU USAGE, LOAD AND STEAL top, htop, atop, etc.
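The steal figure that top and its relatives display comes from the aggregate `cpu` line of `/proc/stat` on Linux, so it can also be tracked from inside an application. The sketch below computes the steal percentage between two samples of that line; the function names are hypothetical and the sample values are invented for illustration.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(value) for value in fields[1:]]


def steal_percent(before, after):
    """Percentage of CPU time stolen by the hypervisor between two samples."""
    # Field order: user nice system idle iowait irq softirq steal [guest ...].
    # Only the first eight are summed, since guest time is already part of user.
    b = (parse_cpu_line(before) + [0] * 8)[:8]
    a = (parse_cpu_line(after) + [0] * 8)[:8]
    total = sum(a) - sum(b)
    if total <= 0:
        return 0.0
    return 100.0 * (a[7] - b[7]) / total


# Invented samples: 1000 ticks elapsed in total, 80 of them stolen.
sample_before = "cpu 100 0 50 800 10 0 0 40 0 0"
sample_after = "cpu 150 0 70 1640 20 0 0 120 0 0"
steal = steal_percent(sample_before, sample_after)
```

In practice the two samples would be the first line of `/proc/stat` read a few seconds apart. A persistently high value suggests a noisy neighbour on the same host, which ties back to the advice to test cloud machine performance often.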
  35. 35. CROSS-MONITOR SYSTEM PARAMETERS dstat
  36. 36. LEARN, EVALUATE & SCALE
  37. 37. PUT EFFORT INTO BUILDING AN AUTOMATED ENVIRONMENT • To build your software (maven, gradle, etc.) • To test your software (junit, soapui, jmeter, etc.) • To deploy your software (jenkins)
  38. 38. UNDERSTAND YOUR DATA Google refine (code.google.com/p/google-refine/)
  39. 39. KEEP THINGS SIMPLE
  40. 40. LOG A LOT :)
  41. 41. WHEN YOU DEPLOY OR MIGRATE… HAVE A PLAN!
  42. 42. …AND ALWAYS KNOW HOW TO ROLL BACK
  43. 43. NEW TECHNOLOGY IS FRAGILE !!
  44. 44. THE BUG IS IN YOUR CODE
  45. 45. YOU WILL FAIL MANY TIMES…
  46. 46. …BUT DON’T GIVE UP !
  47. 47. THANKS!
  48. 48. we’re hiring ! mail me: bbogacki@bidlab.pl