
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Presented by: Jonathan Natkins (WibiData) and Juliet Hougland (WibiData)

Published in: Technology, Business
Transcript of "HBaseCon 2013: Real-Time Model Scoring in Recommender Systems"

  1. Real-Time Model Scoring in Recommender Systems. (c) 2013 WibiData, Inc. Juliet Hougland and Jonathan Natkins
  2. Why Real-Time?
  3. Who Are We? • Jon "Natty" Natkins (@nattyice) • Field Engineer at WibiData • Before that, Cloudera Software Engineer • Before that, Vertica Software/Field Engineer • Juliet Hougland (@JulietHougland) • Platform Engineer at WibiData • MS in Applied Math and BA in Math-Physics
  4. What is Kiji? The Kiji Project is a modular, open-source framework that enables developers and analysts to collect, analyze and use data in real-time applications. • kiji.org • github.com/kijiproject
  5. Generating Recommendations
  6. Generating Recommendations
  7. Modeling with KijiMR. Producers • Operate on a single row in a table. • Generate derived data: o Apply a classifier o Assign a user to a cluster or segment o Recommend new items. Gatherers • Mapper with KijiTable input. • Used when training models.
  8. Generating Recommendations
  9. Generating Recommendations
  10. Generating Recommendations
  11. Batch Isn't Good For Everything
  12. Batch Isn't Good For Everything
  13. Batch Isn't Good For Everything
  14. Fresheners Compute Lazily [Diagram labels: Freshness Policy; Read a column; Get from HBase; Fresh? Yes, return to client; KijiScoring API; HBase]
  15. Fresheners Compute Lazily [Diagram labels: Freshness Policy; Read a column; Get from HBase; Fresh? Yes, return to client; KijiScoring API; HBase; Producer; Freshen; Cache for next time]
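The read path in the freshener diagram can be sketched in a few lines of plain Python. This is only an illustration of the idea (check freshness on read, run the producer if the value is stale, cache the result for next time) and does not use the KijiScoring API. The names `FreshenedReader`, `Cell`, `max_age_seconds`, and the age-based freshness policy are all assumptions made for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Cell:
    value: object
    written_at: float  # seconds, as reported by the reader's clock


class FreshenedReader:
    """Toy read path: check freshness, recompute lazily, cache the result."""

    def __init__(
        self,
        producer: Callable[[str], object],
        max_age_seconds: float,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.store: Dict[str, Cell] = {}  # stands in for the HBase column (assumption)
        self.producer = producer  # stands in for a Producer (assumption)
        self.max_age_seconds = max_age_seconds  # hypothetical age-based freshness policy
        self.clock = clock
        self.producer_runs = 0  # counts how often we had to freshen

    def is_fresh(self, cell: Optional[Cell]) -> bool:
        # A missing cell is never fresh; otherwise compare its age to the policy.
        return cell is not None and self.clock() - cell.written_at <= self.max_age_seconds

    def read(self, row_key: str) -> object:
        cell = self.store.get(row_key)
        if self.is_fresh(cell):
            return cell.value  # fresh: return to client
        value = self.producer(row_key)  # stale or missing: freshen
        self.producer_runs += 1
        self.store[row_key] = Cell(value, self.clock())  # cache for next time
        return value
```

With a fake clock, two reads inside the freshness window run the producer once, and a read after the window has passed runs it again.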
  16. How can we make "freshenable" models? Population interests change slowly. Individual interests change quickly.
  17. How can we make "freshenable" models? Population interests change slowly. Individual interests change quickly. Models don't need to be retrained frequently. Application of a model should be fast.
  18. How can we make "freshenable" models? Individual interests change quickly. Application of a model should be fast. • Train a model over your entire data set • Save fitted model parameters to a file, or another table • Access the model parameters through a KeyValueStore when scoring new data with a producer.
  19. More Modeling with KijiMR. KeyValueStores • Allow access to external data in Producers and Gatherers. • Support various file formats as well as tables. • Make joining datasets together very easy. • The mechanism for accessing fitted model parameters when freshening.
  20. KijiShopping • A real-time product recommendation system • Content-based model using product descriptions and TF-IDF [Diagram labels: Users; KijiShopping Web Application; KijiSchema; Avro, HBase; KijiMR; MapReduce; KijiScoring]
  21. KijiShopping Data Collection • User Logins • Product Information o Names, descriptions, SKU information • User Ratings o Explicit ratings from users. How do we go from data to recommendations?
  22. Finding Useful Features • TF-IDF
  23. TF-IDF • Term Frequency o How often does this term appear in this document? • Document Frequency o How many documents does this term appear in? • TF-IDF o How important is this term to this document? • In KijiShopping, each is a separate job
  24. Computing Term Frequency • Written as a Producer o Executed on the Product table as a Map-only job o WordCount on a per-record basis [Diagram labels: HBase; Read Product Description; Count Words in Product Description; Write Word Counts Back]
  25. Computing Document Frequency • Written as a Gatherer o Executed on the Product table as a MapReduce job o Groups by words [Diagram labels: HBase; Read Term Frequencies; Map; Emit (Word, 1); Write Document Frequencies; HDFS; Reduce; Group By Word]
  26. Computing TF-IDF • Written as a Producer o Executed on the Product table as a Map-only job o Pulls in Document Frequencies as a KVStore [Diagram labels: HBase; Read Term Frequencies; Divide TF by DF; Write TF-IDFs Back; HDFS; Read Document Frequencies via KVStore]
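The three jobs on slides 24 to 26 can be mimicked on a toy catalog in plain Python. This is a sketch of the arithmetic only, not KijiMR code: the function names, the whitespace tokenizer, and the sample products are assumptions. The last step divides TF by DF, as the slide says; many TF-IDF variants use a logarithm of the inverse document frequency instead.

```python
from collections import Counter
from typing import Dict


def term_frequencies(description: str) -> Dict[str, int]:
    """Job 1 (per product row): count the words in one description."""
    return dict(Counter(description.lower().split()))


def document_frequencies(tf_by_product: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Job 2: emit (word, 1) for each product containing the word, then sum by word."""
    df: Counter = Counter()
    for tf in tf_by_product.values():
        for word in tf:
            df[word] += 1
    return dict(df)


def tf_idf(tf: Dict[str, int], df: Dict[str, int]) -> Dict[str, float]:
    """Job 3 (per product row): divide TF by DF, looked up from the side table."""
    return {word: count / df[word] for word, count in tf.items()}


# Hypothetical product descriptions, invented for this example.
products = {
    "p1": "gourmet chef knife",
    "p2": "gourmet coffee",
    "p3": "pocket knife knife sharpener",
}

tf_by_product = {pid: term_frequencies(text) for pid, text in products.items()}
df = document_frequencies(tf_by_product)
tfidf_by_product = {pid: tf_idf(tf, df) for pid, tf in tf_by_product.items()}
```

Here `df` plays the role of the document frequencies read through a KeyValueStore in the third job: it is computed once over the whole table and then consulted row by row.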
  27. Associating Words with Products • Batch training process • Associations stored in a model table [Diagram labels: gourmet; knife; "gourmet" Products; "knife" Products; tfidf_gourmet; tfidf_knife]
  28. Determine a User's Preferred Words • Stored in a user table [Diagram labels: Natty; gourmet; knife; w_gourmet; w_knife]
  29. Combining User Ratings and Models • Producers incorporate models using KeyValueStores [Diagram labels: Natty; gourmet; knife; "gourmet" Products; "knife" Products; w_gourmet; w_knife; tfidf_gourmet; tfidf_knife]
  30. Generating a Recommendation • Pick the best products for your user
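One plausible way to combine a user's word weights with the word-to-product associations is a weighted sum per product, followed by picking the highest scores. The slides do not state the exact formula, so the weighted sum here is an assumption, as are the function names and the sample numbers. The two dictionaries stand in for the user table and for the model table read through a KeyValueStore.

```python
from typing import Dict, List, Tuple


def score_products(
    user_weights: Dict[str, float],
    word_to_products: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Sum w_word * tfidf_word over every preferred word a product is associated with."""
    scores: Dict[str, float] = {}
    for word, weight in user_weights.items():
        for product, tfidf in word_to_products.get(word, {}).items():
            scores[product] = scores.get(product, 0.0) + weight * tfidf
    return scores


def recommend(
    user_weights: Dict[str, float],
    word_to_products: Dict[str, Dict[str, float]],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Return the k best-scoring products, breaking ties by product id."""
    scores = score_products(user_weights, word_to_products)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical model table: word -> {product: tfidf}.
model = {
    "gourmet": {"p1": 0.5, "p2": 0.5},
    "knife": {"p1": 0.5, "p3": 1.0},
}
# Hypothetical user row: word -> preference weight.
natty = {"gourmet": 1.0, "knife": 2.0}

top_two = recommend(natty, model, k=2)
```

Because the model is a lookup table trained in batch, scoring a single user touches only that user's row and a handful of model entries, which is what keeps it cheap enough to run at read time.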
  31. KijiShopping. The model was built with KijiMR, an extension of Hadoop MapReduce.
  32. KijiShopping. The model was built with KijiMR, an extension of Hadoop MapReduce.
  33. KijiExpress Modeling Lifecycle
  34. Want to know more? • The Kiji Project o kiji.org o github.com/kijiproject • KijiShopping o github.com/wibidata/kiji-shopping. Questions about this presentation? o juliet@wibidata.com o natty@wibidata.com
  35. Want to know more? • Come see us at the WibiData booth • Join us at KijiCon tomorrow