Philipp Kandal, CTO, Skobbler - Big data on a small budget



Lessons learned while building a solution to crunch 100 billion+ positions for better navigation algorithms. This talk highlights how you can employ big data technology on commodity hardware without spending a fortune on it.


Slide 1: Big data on a small budget

Slide 2: What do I know about big data?
- skobbler logs all positions from our users (100 billion+)
- More than 10 TB of data from users
- Products / revenues significantly improved with business intelligence
Big data on a small budget @apphil #2

Slide 3: Why should you learn about big data?
- Harvard Business Review: “Data Scientist: The Sexiest Job of the 21st Century”
- Obama became president of the US in large part due to the use of big data
- World-class sports teams use big data to enhance their performance
- Amazon, Google, Facebook, etc. have by now made all their development processes data-driven

Slide 4: What are some great use cases for big data?
- Analysis of log files and user behavior (and predictions about future behavior)
- A/B testing and automatic optimization of functionality
- Improving monetization (e.g. ad optimization)
- Checking adoption and usage of new features

Slide 5: When is it better not to rely on big data?
- When qualitative feedback is more useful than quantitative feedback (e.g. very early-stage companies)
- When you don't have enough users yet to get statistically significant results
- When you do not know what you are optimizing for

Slide 6: What does a solid and simple workflow for big data analysis look like?
(Diagram: a cycle Log -> Process -> Analyse -> Improve -> Eval / Test, then back to Log)

Slide 7: Tools / technologies for a good big data setup
- Logging: MongoDB, VoltDB, Cassandra
- Processing & analyzing / storing: Hadoop & HBase (batch), Storm (real-time), Samza (real-time)
- Optimizing: Mahout (machine learning)

Slide 8: How can you build this without breaking the bank?
- Analyse / process asynchronously
- Use cheap dedicated servers (vs. cloud)
- Use open / free software

Slide 9: Key cost factor: real-time, near-real-time vs. batch
- Real-time is much more expensive than batch
- Leverage as much preprocessing as possible
- Try using in-memory technology for real-time analytics
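
To make the batch vs. real-time split concrete, here is a minimal sketch, assuming map-matched `(segment_id, speed_kmh)` observations: the expensive baseline is precomputed in a batch job, so the real-time path only keeps small running sums in memory and does a dictionary lookup. `precompute_baseline`, `LiveSegmentMonitor` and the `slow_ratio` threshold are hypothetical names and values, not skobbler's implementation.

```python
from collections import defaultdict
from statistics import median


def precompute_baseline(history):
    """Batch step (e.g. nightly): median speed per segment.

    history: iterable of (segment_id, speed_kmh) tuples.
    """
    speeds = defaultdict(list)
    for segment_id, speed in history:
        speeds[segment_id].append(speed)
    return {seg: median(vals) for seg, vals in speeds.items()}


class LiveSegmentMonitor:
    """Real-time step: cheap in-memory aggregates compared to the batch baseline."""

    def __init__(self, baseline, slow_ratio=0.5):
        self.baseline = baseline
        self.slow_ratio = slow_ratio  # hypothetical threshold
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)

    def observe(self, segment_id, speed):
        self.sums[segment_id] += speed
        self.counts[segment_id] += 1

    def slow_segments(self):
        """Segments whose live average is below slow_ratio * baseline."""
        result = []
        for seg, count in self.counts.items():
            expected = self.baseline.get(seg)
            if expected is None:
                continue  # no historic data for this segment
            if self.sums[seg] / count < self.slow_ratio * expected:
                result.append(seg)
        return sorted(result)
```

The design choice mirrors the slide: everything that can be computed ahead of time stays in the cheap batch path, and the live path does constant work per observation.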

Slide 10: #1 Log: Initially, log as much data as feasible so it's available later
- Define interesting data (if unsure, rather log too much)
- Upload / collect data
- Decide on real-time, near-real-time or batch processing in the chain

Slide 11: #2 Process: Enhance the data and make it as rich and as easy to query as possible
- Move data to the processing environment
- Run logged data through the processing chain so it can be queried
- Enrich the logged data with any additional data available (e.g. geography, social data, user data)
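
The enrichment step can be sketched as a join of raw positions with other data sources. This is a minimal illustration under assumptions: `Position`, `enrich`, the `users` profile dict and the `locate` function (standing in for a real reverse geocoder) are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Iterable, Iterator, Optional


@dataclass(frozen=True)
class Position:
    user_id: str
    lat: float
    lon: float
    timestamp: int
    country: Optional[str] = None  # filled in by enrichment
    device: Optional[str] = None   # filled in by enrichment


def enrich(
    positions: Iterable[Position],
    users: Dict[str, dict],
    locate: Callable[[float, float], Optional[str]],
) -> Iterator[Position]:
    """Attach geography and user data to each raw position.

    users maps user_id -> profile dict; locate maps (lat, lon) -> country code.
    Both are hypothetical stand-ins for real data sources.
    """
    for p in positions:
        profile = users.get(p.user_id, {})
        yield replace(p, country=locate(p.lat, p.lon), device=profile.get("device"))
```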

Slide 12: #3 Analyse: Cluster the data into meaningful groups and compare them
- Define key performance indicators (KPIs)
- Cluster data in a meaningful way (e.g. by geography, time of day, customers' past behaviour)
- Compare data against reference sets

Slide 13: #4 Improve: Use the analysis to learn where your challenges are, then optimize behavior
- Manually / automatically adjust features (e.g. lower prices in certain regions)
- Develop A/B testing scenarios and formulate improvement theories
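
The talk does not say how users are split into A/B groups. One common approach, assumed here, is deterministic hash bucketing: hashing the user ID together with the experiment name gives a stable, roughly uniform split without storing assignments. `assign_variant` and the variant names are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, variants=("control", "treatment")) -> str:
    """Deterministically map a user to one variant of an experiment.

    Including the experiment name in the hash makes splits independent
    across experiments; the same user always lands in the same bucket.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]
```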

Slide 14: #5 Evaluate
- Check whether the KPIs improve after applying the changes
- Accept changes that improved your users' behavior; reject changes that left it unchanged
- Define which additional logs you might need to better cluster / identify behaviour
- Go back to step #1
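
The slides do not name a test for deciding whether a change improved a KPI. As one standard option (swapped in here, not taken from the talk), the sketch applies a one-sided two-proportion z-test to a yes/no KPI such as "destination reached"; the function names and the 5% significance level are assumptions.

```python
import math


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """z statistic for 'B has a higher success rate than A' (pooled variance)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0  # all successes or all failures in both groups: no evidence
    return (p_b - p_a) / se


def one_sided_p_value(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def accept_change(successes_a, n_a, successes_b, n_b, alpha=0.05):
    """Accept variant B only if its improvement over A is significant."""
    z = two_proportion_z(successes_a, n_a, successes_b, n_b)
    return one_sided_p_value(z) < alpha
```

For example, 850 of 1,000 drives reaching the destination with the change versus 800 of 1,000 without gives z of roughly 2.9, so the change would be accepted; identical rates are rejected, matching "reject changes that left it unchanged".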

Slide 15: #1 Log: Practical example of how this works at skobbler
- Software version
- Routing profile used
- Device
- Raw positions
- Geography (e.g. country)
- Rating of the route (optional)
- Destination reached (yes / no)
- Etc.
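
A log record carrying the fields listed above could look like the following sketch, written as one JSON line per drive so it can be uploaded and processed in bulk. The class name `DriveLog`, the field names and the `(lat, lon, unix_ts)` position format are assumptions, not skobbler's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DriveLog:
    software_version: str
    routing_profile: str
    device: str
    country: str
    # Raw positions as (lat, lon, unix_ts) tuples.
    positions: List[Tuple[float, float, int]] = field(default_factory=list)
    route_rating: Optional[int] = None  # optional, only if the user rated the route
    destination_reached: bool = False

    def to_json_line(self) -> str:
        """Serialize to a single JSON line (tuples become JSON arrays)."""
        return json.dumps(asdict(self), sort_keys=True)
```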

Slide 16: #2 Process: Enhance and split the data based on drives and segments
- Combine the data on a per-drive basis (= session)
- Combine the data on a per-segment basis (= how fast people drive on a street versus our estimate)
- Identify key behavior across the route (e.g. reroutings)
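
The per-drive and per-segment views can be sketched as two group-bys. This assumes positions have already been map-matched into hypothetical `(drive_id, segment_id, enter_ts, exit_ts)` records and that an estimated travel time per segment is available; rerouting detection is not covered here.

```python
from collections import defaultdict


def group_by_drive(records):
    """Per-drive view (= session): segment IDs in chronological order."""
    drives = defaultdict(list)
    for drive_id, segment_id, enter_ts, exit_ts in records:
        drives[drive_id].append((enter_ts, segment_id))
    return {d: [seg for _, seg in sorted(segs)] for d, segs in drives.items()}


def segment_time_ratio(records, estimated_seconds):
    """Per-segment view: mean observed/estimated travel time (>1 = slower than estimated)."""
    ratios = defaultdict(list)
    for _, segment_id, enter_ts, exit_ts in records:
        est = estimated_seconds.get(segment_id)
        if est:
            ratios[segment_id].append((exit_ts - enter_ts) / est)
    return {seg: sum(r) / len(r) for seg, r in ratios.items()}
```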

Slide 17: Example: Real-time analysis with the Twitter Storm framework to detect road changes
(Figure: example visualization of drives in the last five minutes, real-time)
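
The sketch below is not the Storm API; it is plain Python illustrating the kind of in-memory five-minute window a Storm bolt could maintain. The "new road" heuristic (live traffic on a segment missing from the map) and the `min_drives` threshold are hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 5 * 60


class SlidingWindowCounter:
    """Distinct drives per segment within the window (now - window, now]."""

    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.events = deque()  # (ts, segment_id, drive_id), assumed in ts order
        self.drives = defaultdict(lambda: defaultdict(int))  # segment -> drive -> hits

    def add(self, ts, segment_id, drive_id):
        self.events.append((ts, segment_id, drive_id))
        self.drives[segment_id][drive_id] += 1
        self._evict(ts)

    def _evict(self, now):
        # Drop events that fell out of the window and clean up empty entries.
        while self.events and self.events[0][0] <= now - self.window:
            _, seg, drive = self.events.popleft()
            self.drives[seg][drive] -= 1
            if self.drives[seg][drive] == 0:
                del self.drives[seg][drive]
            if not self.drives[seg]:
                del self.drives[seg]

    def active_drives(self, segment_id):
        return len(self.drives.get(segment_id, {}))


def candidate_new_roads(counter, known_segments, min_drives=3):
    """Segments with live traffic that are not in the map (hypothetical heuristic)."""
    return sorted(
        seg for seg, drives in counter.drives.items()
        if seg not in known_segments and len(drives) >= min_drives
    )
```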

Slide 18: Example: Historic driving patterns (processed with Hadoop / HBase)
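
Historic patterns like these are a natural fit for a MapReduce-style batch job. The sketch is not Hadoop's API; it emulates the map, shuffle and reduce phases in plain Python to compute an average speed per segment, weekday and hour (UTC). The `(segment_id, unix_ts, speed_kmh)` input format and function names are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone


def map_phase(record):
    """Emit ((segment_id, weekday, hour), speed) for one observation."""
    segment_id, unix_ts, speed_kmh = record
    t = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    yield (segment_id, t.weekday(), t.hour), speed_kmh


def reduce_phase(key, speeds):
    """Average speed for one (segment, weekday, hour) bucket."""
    return key, sum(speeds) / len(speeds)


def run_batch(records):
    # Shuffle: group mapped values by key, as the framework would between phases.
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            grouped[key].append(value)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

Unix time 0 is a Thursday at 00:00 UTC, so observations at 0 s and 60 s fall into bucket `("s1", 3, 0)` while one at 3600 s lands in `("s1", 3, 1)`.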

Slide 19: #3 Analyse: Find the areas in which our routing is not optimal
- KPIs are:
  - Route rating (if given)
  - Number of reroutings (the fewer the better)
  - Time to destination vs. the routing estimate
- Cluster the data by:
  - Routing algorithm (and parameters used)
  - Geography
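
Computing the three KPIs per cluster can be sketched as follows, assuming each drive has already been summarized into a hypothetical `DriveSummary` record. Because the rating is optional, its average only counts drives that were actually rated.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class DriveSummary:
    algorithm: str
    country: str
    reroutings: int
    actual_seconds: float
    estimated_seconds: float
    rating: Optional[int] = None


def kpis_by_cluster(drives):
    """KPIs per (routing algorithm, country) cluster."""
    groups = defaultdict(list)
    for d in drives:
        groups[(d.algorithm, d.country)].append(d)
    result = {}
    for key, ds in groups.items():
        ratings = [d.rating for d in ds if d.rating is not None]
        result[key] = {
            "drives": len(ds),
            "avg_rating": sum(ratings) / len(ratings) if ratings else None,
            "avg_reroutings": sum(d.reroutings for d in ds) / len(ds),
            # >1 means drives took longer than the routing estimated.
            "time_ratio": sum(d.actual_seconds / d.estimated_seconds for d in ds) / len(ds),
        }
    return result
```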

Slide 20: #4 Improve: Come up with strategies to improve the routing experience based on data
- For future routes, improve the estimated time on a segment using the time actually travelled
- Alter routing parameters based on country specifics to get better results (e.g. in Germany people drive faster on the Autobahn)
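
The slide does not say how observed times feed back into the estimates. One simple option, swapped in here as an assumption, is an exponential moving average per segment, plus a per-country multiplier for motorway speeds. `update_estimate`, `alpha`, `MOTORWAY_SPEED_FACTOR` and its value are all hypothetical.

```python
def update_estimate(current_estimate_s, observed_times_s, alpha=0.2):
    """Blend observed travel times into a segment's estimate (exponential moving average).

    alpha (hypothetical value) controls how fast new observations override the old estimate.
    """
    estimate = current_estimate_s
    for observed in observed_times_s:
        estimate = (1 - alpha) * estimate + alpha * observed
    return estimate


# Hypothetical per-country multipliers on the default motorway speed.
MOTORWAY_SPEED_FACTOR = {"DE": 1.15}


def motorway_speed_kmh(country, default_kmh=120.0):
    """Country-specific motorway speed assumption used by the router."""
    return default_kmh * MOTORWAY_SPEED_FACTOR.get(country, 1.0)
```

A small `alpha` keeps estimates stable against single outlier drives, at the cost of reacting more slowly to lasting changes such as new speed limits.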

Slide 21: #5 Evaluate: Deploy the changes and compare them to reference data
- Deploy changes to production and compare ratings / timings against base values (~weekly)
- Verify whether other parameters, such as usage, also improve

Slide 22: Summary: Big data can drive big value but stay affordable
Simple formula: Log -> Process -> Analyze -> Improve -> Evaluate = Success

Slide 23: Thank you for your attention!
Get in touch:
- Phone: +49-172-4597015
- Follow me on .com/apphil