Analyzing Large-Scale User Data with Hadoop and HBase

WibiData's presentation on personalization and large-scale user data at Structure:Data 2012

Published in: Technology, Business

  1. Analyzing Large-Scale User Data with Hadoop and HBase. Aaron Kimball – CTO, WibiData, Inc.
  2. We can now collect more data than at any time in history.
  3. Yesterday’s engineering challenge: Fitting the problem into the hardware.
  4. Today’s constrained resource is understanding.
  5. How do we best apply data … to better serving our users?
  6. The best products are user-centric • Intuitive UI • Continuously learning – Guided search – Smarter recommendations • More effective service
  7. What are we building toward?
  12. Requirements: 1. Understand the user population
  13. Requirements: 2. Respond to users in real time
  14. Requirements: 3. Support graceful data evolution
  15. Large-scale data science is hard • What does a user look like? – What data is available about the user? – Which features are important? – Which features are correlated? • How do I model this in MapReduce? • How do I serve results in a timely fashion?
  16. Tools of the trade • Store all data about a user in one place • Support real-time get/put, as well as MapReduce
  17. Tools of the trade • Use complex data types to model complex data • Support extended data models over time • Retain support for legacy systems using older models
  18. Tools of the trade • Abstract computational model away from MapReduce • Support computation over all users… or one user at a time
  19. : for set-top boxes • Viewing/recording history
  20. : for set-top boxes • Libraries • Device and User Analysis • Viewing/recording history • Personalized offers and recommendations
  21. : for set-top boxes • Libraries • Device and User Analysis • Viewing/recording history • Personalized offers and recommendations • Analysis for product roadmap
  22. : for set-top boxes • Libraries • Device and User Analysis • Viewing/recording history • Personalized offers and recommendations • Analysis for product roadmap • Tech support portal
  23. : for set-top boxes • Libraries • Device and User Analysis • Viewing/recording history • Personalized offers and recommendations • Improved reports for advertisers • Analysis for product roadmap • Tech support portal
  24. The future • More personalization • Adaptive UIs (self-arranging dashboards) • Targeted content, ads • More effective customer service
  25. Conclusions • Applications are becoming increasingly user-centric • Data drives this capability, but harnessing it requires a new distributed architecture • The biggest challenge is allowing data scientists to effectively leverage the data
  26. www.wibidata.com / @wibidata • Aaron Kimball – aaron@wibidata.com
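
Slides 16 to 18 name three properties of the tooling: one row of data per user with real-time get/put, data models that can grow while older records stay readable, and a computation that can run over one user or over all users. The Python sketch below illustrates those three ideas only. It is not WibiData's implementation and does not use the HBase or Hadoop APIs: an in-memory dictionary stands in for the table and a plain loop stands in for a MapReduce job. Every name in it (`UserTable`, `read_view`, `devices_seen`, `run_for_user`, `run_for_all`, the `views` column, the `device` field) is hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Cell = Tuple[int, Dict[str, Any]]  # (timestamp, record)


class UserTable:
    """Hypothetical in-memory stand-in for a table with one row per user."""

    def __init__(self) -> None:
        # user_id -> column name -> list of timestamped cells
        self._rows: Dict[str, Dict[str, List[Cell]]] = defaultdict(
            lambda: defaultdict(list)
        )

    def put(self, user_id: str, column: str, timestamp: int,
            record: Dict[str, Any]) -> None:
        """Real-time write: append one cell to the user's row."""
        self._rows[user_id][column].append((timestamp, record))

    def get(self, user_id: str, column: str) -> List[Cell]:
        """Real-time read: the user's cells for a column, oldest first."""
        row = self._rows.get(user_id, {})
        return sorted(row.get(column, []), key=lambda cell: cell[0])

    def user_ids(self) -> List[str]:
        """All user IDs that have at least one cell, in sorted order."""
        return sorted(self._rows)


def read_view(record: Dict[str, Any]) -> Dict[str, Any]:
    """Read a viewing record under the newer data model.

    Assumption for illustration: the 'device' field was added later, so
    records written under the older model lack it and get a default.
    """
    return {"title": record["title"],
            "device": record.get("device", "unknown")}


def devices_seen(cells: List[Cell]) -> List[str]:
    """Per-user computation: distinct devices in one user's history."""
    return sorted({read_view(record)["device"] for _, record in cells})


def run_for_user(table: UserTable, user_id: str, column: str,
                 fn: Callable[[List[Cell]], Any]) -> Any:
    """Apply the per-user function to a single user's row."""
    return fn(table.get(user_id, column))


def run_for_all(table: UserTable, column: str,
                fn: Callable[[List[Cell]], Any]) -> Dict[str, Any]:
    """Apply the same per-user function to every user (the batch case)."""
    return {uid: run_for_user(table, uid, column, fn)
            for uid in table.user_ids()}
```

The design point mirrored from slide 18 is that `devices_seen` is written once against a single user's row, and the caller decides whether it runs for one user on demand or across the whole population.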
