HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Presented by: Jonathan Natkins (WibiData) and Juliet Hougland (WibiData)

Transcript

  • 1. Real-Time Model Scoring in Recommender Systems
      Juliet Hougland and Jonathan Natkins
      (c) 2013 WibiData, Inc.
  • 2. Why Real-Time?
  • 3. Who Are We?
      • Jon "Natty" Natkins (@nattyice)
          o Field Engineer at WibiData
          o Before that, Cloudera Software Engineer
          o Before that, Vertica Software/Field Engineer
      • Juliet Hougland (@JulietHougland)
          o Platform Engineer at WibiData
          o MS in Applied Math and BA in Math-Physics
  • 4. What is Kiji?
      The Kiji Project is a modular, open-source framework that enables developers and analysts to collect, analyze and use data in real-time applications.
      • kiji.org
      • github.com/kijiproject
  • 5. Generating Recommendations
  • 6. Generating Recommendations
  • 7. Modeling with KijiMR
      Producers
      • Operate on a single row in a table.
      • Generate derived data:
          o Apply a classifier
          o Assign a user to a cluster or segment
          o Recommend new items
      Gatherers
      • Mapper with KijiTable input.
      • Used when training models.
  • 8. Generating Recommendations
  • 9. Generating Recommendations
  • 10. Generating Recommendations
  • 11. Batch Isn't Good For Everything
  • 12. Batch Isn't Good For Everything
  • 13. Batch Isn't Good For Everything
  • 14. Fresheners Compute Lazily
      Diagram: the client reads a column through the KijiScoring API → get from HBase → check the freshness policy → fresh? Yes: return to client.
  • 15. Fresheners Compute Lazily
      Diagram: the client reads a column through the KijiScoring API → get from HBase → check the freshness policy → fresh? Yes: return to client. No: freshen the value with a producer and cache it for next time.
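The lazy read path on slides 14 and 15 behaves like a read-through cache guarded by a freshness policy. The sketch below is plain Python, not the KijiScoring API: an in-memory dict stands in for the HBase table, and `Cell`, `MaxAgePolicy`, and `FreshReader` are hypothetical names. A maximum-age rule is only one possible freshness policy, assumed here for illustration.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass
class Cell:
    value: Any
    timestamp: float  # when the value was written, in seconds


class MaxAgePolicy:
    """Hypothetical freshness policy: a cell is fresh if it is at most max_age seconds old."""

    def __init__(self, max_age: float):
        self.max_age = max_age

    def is_fresh(self, cell: Optional[Cell], now: float) -> bool:
        return cell is not None and now - cell.timestamp <= self.max_age


class FreshReader:
    """Serve a stored cell when the policy says it is fresh; otherwise run the producer and cache its output."""

    def __init__(self, policy: MaxAgePolicy, producer: Callable[[str], Any],
                 clock: Callable[[], float] = time.time):
        # Stands in for an HBase table keyed by (row, column).
        self.store: Dict[Tuple[str, str], Cell] = {}
        self.policy = policy
        self.producer = producer
        self.clock = clock
        self.producer_calls = 0

    def read(self, row: str, column: str) -> Any:
        now = self.clock()
        cell = self.store.get((row, column))
        if self.policy.is_fresh(cell, now):
            return cell.value                         # fresh: return to client
        value = self.producer(row)                    # stale or missing: freshen
        self.producer_calls += 1
        self.store[(row, column)] = Cell(value, now)  # cache for next time
        return value
```

The producer only runs on a miss or a stale read, so repeated reads within the policy window cost a single lookup.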
  • 16. How can we make "freshenable" models?
      • Population interests change slowly
      • Individual interests change quickly
  • 17. How can we make "freshenable" models?
      • Population interests change slowly: models don't need to be retrained frequently
      • Individual interests change quickly: application of a model should be fast
  • 18. How can we make "freshenable" models?
      Individual interests change quickly: application of a model should be fast
      • Train a model over your entire data set
      • Save fitted model parameters to a file, or another table
      • Access the model parameters through a KeyValueStore when scoring new data with a producer.
  • 19. More Modeling with KijiMR
      KeyValueStores
      • Allow access to external data in Producers and Gatherers.
      • Support various file formats as well as tables.
      • Make joining datasets together very easy.
      • Are the mechanism for accessing fitted model parameters when freshening.
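The recipe on slides 18 and 19 (train offline, save the fitted parameters, read them through a key-value lookup while scoring) can be sketched in plain Python. The "model" here is deliberately trivial, a mean rating per product, and is not the KijiShopping model; `JsonKeyValueStore`, `save_params`, and the other names are hypothetical stand-ins, not Kiji's KeyValueStore API.

```python
import json
import os
import tempfile
from typing import Dict, Iterable, List, Optional, Tuple


def train(ratings: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Toy batch training over the whole data set: mean rating per product."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for _user, product, rating in ratings:
        totals[product] = totals.get(product, 0.0) + rating
        counts[product] = counts.get(product, 0) + 1
    return {product: totals[product] / counts[product] for product in totals}


def save_params(params: Dict[str, float], directory: Optional[str] = None) -> str:
    """Write fitted parameters to a JSON file and return its path."""
    directory = directory or tempfile.mkdtemp()
    path = os.path.join(directory, "model_params.json")
    with open(path, "w") as f:
        json.dump(params, f)
    return path


class JsonKeyValueStore:
    """Read-only lookup over the saved parameters (a stand-in for a KeyValueStore)."""

    def __init__(self, path: str):
        with open(path) as f:
            self._params: Dict[str, float] = json.load(f)

    def get(self, key: str, default: float = 0.0) -> float:
        return self._params.get(key, default)


def rank_candidates(candidates: List[str], store: JsonKeyValueStore) -> List[str]:
    """Producer-side scoring: cheap lookups at read time, no retraining."""
    return sorted(candidates, key=store.get, reverse=True)
```

The expensive step (`train`) runs in batch; the scoring step only reads parameters, which is what keeps freshening fast.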
  • 20. KijiShopping
      • A real-time product recommendation system
      • Content-based model using product descriptions and TF-IDF
      Architecture diagram: Users; KijiShopping Web Application; KijiSchema (Avro, HBase); KijiMR (MapReduce); KijiScoring
  • 21. KijiShopping Data Collection
      • User logins
      • Product information
          o Names, descriptions, SKU information
      • User ratings
          o Explicit ratings from users
      How do we go from data to recommendations?
  • 22. Finding Useful Features
      • TF-IDF
  • 23. TF-IDF
      • Term Frequency
          o How often does this term appear in this document?
      • Document Frequency
          o How many documents does this term appear in?
      • TF-IDF
          o How important is this term to this document?
      • In KijiShopping, each is computed by a separate job.
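Written out, the three quantities on slide 23 are as follows, where D is the set of product descriptions. The ratio form of TF-IDF follows the "divide TF by DF" step described for the TF-IDF job; many implementations instead use a log-scaled inverse document frequency, tf(t, d) · log(N / df(t)) with N = |D|.

```latex
\begin{aligned}
\mathrm{tf}(t, d)    &= \text{number of occurrences of term } t \text{ in document } d \\
\mathrm{df}(t)       &= \bigl|\{\, d \in D : t \text{ occurs in } d \,\}\bigr| \\
\mathrm{tfidf}(t, d) &= \frac{\mathrm{tf}(t, d)}{\mathrm{df}(t)}
\end{aligned}
```

For example, if "gourmet" occurs twice in one product description and appears in 4 descriptions overall, its TF-IDF for that product is 2 / 4 = 0.5.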
  • 24. Computing Term Frequency
      • Written as a Producer
          o Executed on the Product table as a map-only job
          o WordCount on a per-record basis
      Flow: HBase → read product description → count words in product description → write word counts back to HBase
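The per-record WordCount on slide 24 can be sketched in plain Python, with an in-memory dict standing in for the Product table. The `description` column name, the regex tokenizer, and the helper names are assumptions for illustration, not KijiShopping's code.

```python
import re
from collections import Counter
from typing import Callable, Dict, List

WORD = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Lowercase and keep runs of letters/digits (a simplifying assumption)."""
    return WORD.findall(text.lower())


def term_frequency_producer(row: Dict[str, object]) -> Dict[str, int]:
    """Per-row word count over the row's product description."""
    return dict(Counter(tokenize(str(row["description"]))))


def run_map_only(table: Dict[str, Dict[str, object]],
                 producer: Callable[[Dict[str, object]], object],
                 output_column: str) -> None:
    """Map-only job: apply the producer to each row independently and write the result back into that row."""
    for row in table.values():
        row[output_column] = producer(row)
```

Because each row is processed in isolation, no shuffle or reduce phase is needed.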
  • 25. Computing Document Frequency
      • Written as a Gatherer
          o Executed on the Product table as a MapReduce job
          o Groups by word
      Flow: HBase → read term frequencies → map: emit (word, 1) → reduce: group by word → write document frequencies to HDFS
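The map and reduce steps on slide 25 can be simulated in a few lines of Python. The explicit `shuffle` function stands in for Hadoop's grouping phase; all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def df_map(term_freqs: Dict[str, int]) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) once for each distinct word in one product."""
    for word in term_freqs:
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Stand-in for the MapReduce shuffle: group emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def df_reduce(word: str, ones: List[int]) -> Tuple[str, int]:
    """Reduce: the number of products containing the word."""
    return word, sum(ones)


def document_frequencies(all_term_freqs: Iterable[Dict[str, int]]) -> Dict[str, int]:
    emitted = (pair for tf in all_term_freqs for pair in df_map(tf))
    return dict(df_reduce(word, ones) for word, ones in shuffle(emitted).items())
```

Emitting 1 per distinct word (rather than the word's count) is what makes the result a document frequency; summing the counts instead would give total occurrences across the collection.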
  • 26. Computing TF-IDF
      • Written as a Producer
          o Executed on the Product table as a map-only job
          o Pulls in document frequencies as a KVStore
      Flow: HBase → read term frequencies; read document frequencies from HDFS via KVStore → divide TF by DF → write TF-IDFs back to HBase
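The per-row step on slide 26 divides each term frequency by that word's document frequency. In this sketch any read-only mapping plays the role of the KVStore backed by the HDFS document-frequency output; `tfidf_producer` is a hypothetical name.

```python
from typing import Dict, Mapping


def tfidf_producer(term_freqs: Mapping[str, int], df_store: Mapping[str, int]) -> Dict[str, float]:
    """Per-row producer: TF divided by DF, looked up from the side-data store."""
    scores: Dict[str, float] = {}
    for word, tf in term_freqs.items():
        df = df_store.get(word)
        if df:  # skip words missing from the store (or with a zero count)
            scores[word] = tf / df
    return scores
```

Switching to a log-scaled IDF would only change the division line, for example to `tf * math.log(num_products / df)`.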
  • 27. Associating Words with Products
      • Batch training process
      • Associations stored in a model table
      Diagram: the words "gourmet" and "knife" each map to a set of products, weighted by tfidf_gourmet and tfidf_knife respectively.
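One way to build the word-to-products associations on slide 27 is to invert the per-product TF-IDF scores into a word-keyed index. The nested-dict layout below is an assumed stand-in for the model table, and `build_word_index` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict


def build_word_index(product_tfidf: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Invert product -> {word: tfidf} into word -> {product: tfidf}."""
    index: Dict[str, Dict[str, float]] = defaultdict(dict)
    for product, scores in product_tfidf.items():
        for word, weight in scores.items():
            index[word][product] = weight
    return dict(index)
```

Keying by word means that, at scoring time, only the rows for a user's preferred words need to be read.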
  • 28. Determine a User's Preferred Words
      • Stored in a user table
      Diagram: user Natty has preferred words "gourmet" and "knife", with weights w_gourmet and w_knife.
  • 29. Combining User Ratings and Models
      • Producers incorporate models using KeyValueStores
      Diagram: Natty's word weights (w_gourmet, w_knife) are combined with the "gourmet" and "knife" product associations (tfidf_gourmet, tfidf_knife).
  • 30. Generating a Recommendation
      • Pick the best products for your user
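Slides 29 and 30 combine a user's word weights with the per-word TF-IDF associations and pick the best products. A weighted sum, score(p) = Σ over words w of w_w · tfidf_w(p), is one plausible rule and is assumed here rather than taken from KijiShopping. Excluding products the user has already rated is also an assumption. The user weights are taken as given input; the slides do not say how they are derived.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def recommend(user_weights: Dict[str, float],
              word_index: Dict[str, Dict[str, float]],
              exclude: Iterable[str] = (),
              n: int = 3) -> List[str]:
    """Rank products by the sum of user word weight times the word's TF-IDF for each product."""
    skip = set(exclude)
    scores: Dict[str, float] = defaultdict(float)
    for word, weight in user_weights.items():
        for product, tfidf in word_index.get(word, {}).items():
            if product not in skip:
                scores[product] += weight * tfidf
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [product for product, _ in ranked[:n]]
```

Only products linked to at least one of the user's words are scored, so the work per request grows with the user's vocabulary, not with the catalog size.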
  • 31. KijiShopping
      The model was built with KijiMR, an extension of Hadoop MapReduce.
  • 32. KijiShopping
      The model was built with KijiMR, an extension of Hadoop MapReduce.
  • 33. KijiExpress Modeling Lifecycle
  • 34. Want to know more?
      • The Kiji Project
          o kiji.org
          o github.com/kijiproject
      • KijiShopping
          o github.com/wibidata/kiji-shopping
      Questions about this presentation?
          o juliet@wibidata.com
          o natty@wibidata.com
  • 35. Want to know more?
      • Come see us at the WibiData booth
      • Join us at KijiCon tomorrow