HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

Cloudera, Inc.
Real-Time Model Scoring in Recommender Systems
(c) 2013 WibiData, Inc.
Juliet Hougland and Jonathan Natkins
  
Why Real-Time?
  
Who Are We?
•  Jon "Natty" Natkins (@nattyice)
    o  Field Engineer at WibiData
    o  Before that, Cloudera Software Engineer
    o  Before that, Vertica Software/Field Engineer
•  Juliet Hougland (@JulietHougland)
    o  Platform Engineer at WibiData
    o  MS in Applied Math and BA in Math-Physics
  
What is Kiji?
The Kiji Project is a modular, open-source framework that enables developers and analysts to collect, analyze and use data in real-time applications.
•  kiji.org
•  github.com/kijiproject
  
Generating Recommendations
  
Modeling with KijiMR
Producers
•  Operate on a single row in a table.
•  Generate derived data:
    o  Apply a classifier
    o  Assign a user to a cluster or segment
    o  Recommend new items
Gatherers
•  Mapper with KijiTable input.
•  Used when training models.
  
Generating Recommendations
  
Batch Isn't Good For Everything
  
Fresheners Compute Lazily
[Diagram: Read a column -> KijiScoring API -> Get from HBase -> Freshness Policy: Fresh? -> Yes, return to client; otherwise Freshen with a Producer -> Cache for next time]
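
The lazy read path on this slide can be sketched in a few lines. This is a minimal illustration only, not the KijiScoring API: the class name FreshenedColumn, the age-based policy, and the in-memory dictionary standing in for HBase are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class FreshenedColumn:
    """Hypothetical stand-in for a column that is recomputed lazily on read."""

    def __init__(self, producer: Callable[[str], float], max_age_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self._producer = producer        # computes a new value for a row key
        self._max_age = max_age_seconds  # assumed freshness policy: maximum age
        self._clock = clock              # injectable so the policy can be tested
        # row key -> (value, time written); a dict stands in for the HBase table
        self._store: Dict[str, Tuple[float, float]] = {}

    def _is_fresh(self, written_at: float) -> bool:
        return self._clock() - written_at <= self._max_age

    def get(self, row_key: str) -> float:
        cached = self._store.get(row_key)               # get the stored value
        if cached is not None and self._is_fresh(cached[1]):
            return cached[0]                            # fresh: return to client
        value = self._producer(row_key)                 # stale or missing: freshen
        self._store[row_key] = (value, self._clock())   # cache for next time
        return value
```

Reads inside the freshness window never run the producer, so the cost of scoring is paid only when a client asks for a value that has gone stale.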
  
How can we make "freshenable" models?
Population interests change slowly: models don't need to be retrained frequently.
Individual interests change quickly: application of a model should be fast.
•  Train a model over your entire data set
•  Save fitted model parameters to a file, or another table
•  Access the model parameters through a KeyValueStore when scoring new data with a producer.
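
The train, save, and score steps above might look like the following sketch. It is an illustration under stated assumptions, not KijiMR code: the "model" is simply each item's mean rating, the parameters go to a JSON file, and a plain dictionary plays the role of the KeyValueStore. All function names are hypothetical.

```python
import json
import os
import tempfile
from typing import Dict, Mapping


def train(ratings: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Batch step: fit one parameter per item (its mean rating) over all users."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for user_ratings in ratings.values():
        for item, rating in user_ratings.items():
            totals[item] = totals.get(item, 0.0) + rating
            counts[item] = counts.get(item, 0) + 1
    return {item: totals[item] / counts[item] for item in totals}


def save_params(params: Mapping[str, float], path: str) -> None:
    """Persist the fitted parameters so that scoring never has to retrain."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(dict(params), f)


def load_params(path: str) -> Dict[str, float]:
    """Load parameters into a read-only lookup (the key-value store stand-in)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def score(item: str, new_rating: float, params: Mapping[str, float]) -> float:
    """Scoring step: how far a new rating sits above the population mean.

    Items missing from the model score 0.0.
    """
    return new_rating - params.get(item, new_rating)
```

The expensive pass over the whole data set happens once in train; score is a single lookup and a subtraction, which is what keeps it cheap enough to run at read time.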
  
More Modeling with KijiMR
KeyValueStores
•  Allow access to external data in Producers and Gatherers.
•  Support various file formats as well as tables.
•  Make joining datasets together very easy.
•  The mechanism for accessing fitted model parameters when freshening.
  
KijiShopping
•  A real-time product recommendation system
•  Content-based model using product descriptions and TF-IDF
[Architecture diagram: Users; KijiShopping Web Application; KijiSchema (Avro, HBase); KijiMR (MapReduce); KijiScoring]

KijiShopping Data Collection
  
•  User Logins
•  Product Information
o  Names, descriptions, SKU information
•  User Ratings
o  Explicit ratings from users
How do we go from data to recommendations?
Finding Useful Features
  
•  TF-IDF
TF-IDF
  
•  Term Frequency
o  How often does this term appear in this document?
•  Document Frequency
o  How many documents does this term appear in?
•  TF-IDF
o  How important is this term to this document?
•  In KijiShopping, each is a separate job
Computing Term Frequency
•  Written as a Producer
o  Executed on the Product table as a Map-only job
o  WordCount on a per-record basis
[Diagram: HBase -> Read Product Description -> Count Words in Product Description -> Write Word Counts Back]

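A per-record word count of this kind can be sketched as a function applied to one product description at a time. The function name and the tokenization rule (lowercase runs of letters and digits) are assumptions for illustration; the slides do not say how KijiShopping splits words.

```python
import re
from collections import Counter
from typing import Dict


def term_frequencies(description: str) -> Dict[str, int]:
    """Count how often each word appears in a single product description."""
    # Assumed tokenizer: lowercase, then keep runs of letters and digits.
    words = re.findall(r"[a-z0-9]+", description.lower())
    return dict(Counter(words))


def produce_all(products: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Map-only pass: each row is processed independently of every other row."""
    return {product_id: term_frequencies(text)
            for product_id, text in products.items()}
```

Because no row needs data from any other row, the job has no reduce phase.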
Computing Document Frequency
•  Written as a Gatherer
o  Executed on the Product table as a MapReduce job
o  Groups by words
[Diagram: HBase -> Map: Read Term Frequencies, Emit (Word, 1) -> Reduce: Group By Word -> Write Document Frequencies -> HDFS]

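The map and reduce halves of this job can be sketched in plain Python. This only shows the data flow: the function names are hypothetical, and the shuffle between the two phases is simulated with a dictionary in a single process.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def gather(term_freqs: Dict[str, int]) -> Iterator[Tuple[str, int]]:
    """Map phase for one product row: emit (word, 1) once per distinct word."""
    for word in term_freqs:
        yield (word, 1)


def reduce_by_word(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce phase: group the emitted pairs by word and sum the ones."""
    totals: Dict[str, int] = defaultdict(int)
    for word, one in pairs:
        totals[word] += one
    return dict(totals)


def document_frequencies(rows: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Run both phases over every row of per-product term frequencies."""
    return reduce_by_word(pair for tf in rows.values() for pair in gather(tf))
```

Each product contributes at most 1 per word no matter how often the word occurs in its description, which is what makes the result a document frequency and not a total word count.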
Computing TF-IDF
•  Written as a Producer
o  Executed on the Product table as a Map-only job
o  Pulls in Document Frequencies as a KVStore
[Diagram: HBase -> Read Term Frequencies -> Divide TF by DF -> Write TF-IDFs Back; HDFS -> Read Document Frequencies via KVStore]

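As a sketch of the "Divide TF by DF" step, the function below scores one product row, with a plain mapping standing in for the document-frequency KVStore. The function name is hypothetical. Note that this follows the slide's plain ratio TF / DF; many textbook variants use a logarithmic inverse document frequency instead.

```python
from typing import Dict, Mapping


def tf_idf(term_freqs: Mapping[str, int],
           doc_freqs: Mapping[str, int]) -> Dict[str, float]:
    """Weight each word in one product row by TF / DF.

    doc_freqs plays the role of the key-value store lookup. Words with no
    recorded document frequency are skipped rather than dividing by zero.
    """
    return {word: tf / doc_freqs[word]
            for word, tf in term_freqs.items()
            if doc_freqs.get(word, 0) > 0}
```

For example, a word that appears twice in a description and in only one product scores 2.0, while a word that appears once and occurs in two products scores 0.5.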
Associating Words with Products
•  Batch training process
•  Associations stored in a model table
[Diagram: the words gourmet and knife each link to their products ("gourmet" Products, "knife" Products), weighted by tfidf_gourmet and tfidf_knife]

Determine a User's Preferred Words
•  Stored in a user table
[Diagram: the user Natty links to the words gourmet and knife with weights w_gourmet and w_knife]

Combining User Ratings and Models
•  Producers incorporate models using KeyValueStores
[Diagram: Natty links to the words gourmet and knife with weights w_gourmet and w_knife; each word links to "gourmet" Products and "knife" Products with weights tfidf_gourmet and tfidf_knife]

Generating a Recommendation
  
•  Pick the best products for your user
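
One plausible way to pick the best products is to sum, for each product, the user's word weight times that word's TF-IDF weight for the product, then rank by the total. The slides show the weights but not the exact combination rule, so the weighted sum, the function name, and the tie-breaking order below are all assumptions.

```python
from typing import Dict, List, Mapping, Tuple


def recommend(user_words: Mapping[str, float],
              word_to_products: Mapping[str, Mapping[str, float]],
              top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank products by the sum over words of user weight times TF-IDF weight."""
    scores: Dict[str, float] = {}
    for word, user_weight in user_words.items():
        # word_to_products stands in for the model table read via a KeyValueStore.
        for product, tfidf in word_to_products.get(word, {}).items():
            scores[product] = scores.get(product, 0.0) + user_weight * tfidf
    # Highest score first; ties are broken by product name for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

With user weights {"gourmet": 0.5, "knife": 1.0}, a product linked to both words accumulates a contribution from each, so it can outrank a product that matches only one word strongly.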
KijiShopping
The model was built with KijiMR, an extension of Hadoop MapReduce.
  
KijiExpress Modeling Lifecycle
  
Want to know more?
  
•  The Kiji Project
o  kiji.org
o  github.com/kijiproject
•  KijiShopping
o  github.com/wibidata/kiji-shopping
Questions about this presentation?
o  juliet@wibidata.com
o  natty@wibidata.com
Want to know more?
•  Come see us at the WibiData booth
•  Join us at KijiCon tomorrow
Modern Data Warehouse Fundamentals Part 1
Cloudera, Inc.1.5K views
Extending Cloudera SDX beyond the Platform by Cloudera, Inc.
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
Cloudera, Inc.966 views
Federated Learning: ML with Privacy on the Edge 11.15.18 by Cloudera, Inc.
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
Cloudera, Inc.2.2K views
Analyst Webinar: Doing a 180 on Customer 360 by Cloudera, Inc.
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
Cloudera, Inc.1.4K views
Build a modern platform for anti-money laundering 9.19.18 by Cloudera, Inc.
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
Cloudera, Inc.1K views
Introducing the data science sandbox as a service 8.30.18 by Cloudera, Inc.
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
Cloudera, Inc.1.2K views

Recently uploaded

ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ... by
ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ...ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ...
ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ...Jasper Oosterveld
35 views49 slides
The Power of Heat Decarbonisation Plans in the Built Environment by
The Power of Heat Decarbonisation Plans in the Built EnvironmentThe Power of Heat Decarbonisation Plans in the Built Environment
The Power of Heat Decarbonisation Plans in the Built EnvironmentIES VE
79 views20 slides
Cencora Executive Symposium by
Cencora Executive SymposiumCencora Executive Symposium
Cencora Executive Symposiummarketingcommunicati21
159 views14 slides
VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlue by
VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlueVNF Integration and Support in CloudStack - Wei Zhou - ShapeBlue
VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlueShapeBlue
203 views54 slides
CloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlue by
CloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlueCloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlue
CloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlueShapeBlue
135 views13 slides
Qualifying SaaS, IaaS.pptx by
Qualifying SaaS, IaaS.pptxQualifying SaaS, IaaS.pptx
Qualifying SaaS, IaaS.pptxSachin Bhandari
1K views8 slides

Recently uploaded(20)

ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ... by Jasper Oosterveld
ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ...ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ...
ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ...
The Power of Heat Decarbonisation Plans in the Built Environment by IES VE
The Power of Heat Decarbonisation Plans in the Built EnvironmentThe Power of Heat Decarbonisation Plans in the Built Environment
The Power of Heat Decarbonisation Plans in the Built Environment
IES VE79 views
VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlue by ShapeBlue
VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlueVNF Integration and Support in CloudStack - Wei Zhou - ShapeBlue
VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlue
ShapeBlue203 views
CloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlue by ShapeBlue
CloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlueCloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlue
CloudStack Managed User Data and Demo - Harikrishna Patnala - ShapeBlue
ShapeBlue135 views
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue by ShapeBlue
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlueWhat’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
ShapeBlue263 views
Initiating and Advancing Your Strategic GIS Governance Strategy by Safe Software
Initiating and Advancing Your Strategic GIS Governance StrategyInitiating and Advancing Your Strategic GIS Governance Strategy
Initiating and Advancing Your Strategic GIS Governance Strategy
Safe Software176 views
Transcript: Redefining the book supply chain: A glimpse into the future - Tec... by BookNet Canada
Transcript: Redefining the book supply chain: A glimpse into the future - Tec...Transcript: Redefining the book supply chain: A glimpse into the future - Tec...
Transcript: Redefining the book supply chain: A glimpse into the future - Tec...
BookNet Canada41 views
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or... by ShapeBlue
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...
ShapeBlue198 views
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f... by TrustArc
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...
TrustArc Webinar - Managing Online Tracking Technology Vendors_ A Checklist f...
TrustArc170 views
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ... by ShapeBlue
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...
ShapeBlue119 views
Future of AR - Facebook Presentation by Rob McCarty
Future of AR - Facebook PresentationFuture of AR - Facebook Presentation
Future of AR - Facebook Presentation
Rob McCarty64 views
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue by ShapeBlue
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlueCloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue
ShapeBlue138 views
"Package management in monorepos", Zoltan Kochan by Fwdays
"Package management in monorepos", Zoltan Kochan"Package management in monorepos", Zoltan Kochan
"Package management in monorepos", Zoltan Kochan
Fwdays33 views
"Surviving highload with Node.js", Andrii Shumada by Fwdays
"Surviving highload with Node.js", Andrii Shumada "Surviving highload with Node.js", Andrii Shumada
"Surviving highload with Node.js", Andrii Shumada
Fwdays56 views
Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De... by Moses Kemibaro
Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De...Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De...
Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De...
Moses Kemibaro34 views

HBaseCon 2013: Real-Time Model Scoring in Recommender Systems

  • 1. Real-Time Model Scoring in Recommender Systems (c) 2013 WibiData, Inc. Juliet Hougland and Jonathan Natkins
  • 3. Who Are We? • Jon "Natty" Natkins (@nattyice) • Field Engineer at WibiData • Before that, Cloudera Software Engineer • Before that, Vertica Software/Field Engineer • Juliet Hougland (@JulietHougland) • Platform Engineer at WibiData • MS in Applied Math and BA in Math-Physics
  • 4. What is Kiji? The Kiji Project is a modular, open-source framework that enables developers and analysts to collect, analyze and use data in real-time applications. • kiji.org • github.com/kijiproject
  • 7. Modeling with KijiMR Producers • Operate on a single row in a table. • Generate derived data: o Apply a classifier o Assign a user to a cluster or segment o Recommend new items Gatherers • Mapper with KijiTable input. • Used when training models.
  • 11. Batch Isn't Good For Everything
  • 12. Batch Isn't Good For Everything
  • 13. Batch Isn't Good For Everything
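  • 14. Fresheners Compute Lazily [Diagram: a client reads a column through the KijiScoring API; the value is fetched from HBase and checked against the freshness policy. Fresh? Yes, return to client.]
  • 15. Fresheners Compute Lazily [Diagram: a client reads a column through the KijiScoring API; the value is fetched from HBase and checked against the freshness policy. Fresh? Yes, return to client. Otherwise a producer freshens the value, which is cached for next time.]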
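
The lazy read path on the two Fresheners slides can be sketched in a few lines. This is only an illustration under stated assumptions: `FreshenedColumn`, `max_age_seconds`, and the in-memory dictionary standing in for HBase are hypothetical names, not the KijiScoring API, and the age-based policy is just one possible freshness policy.

```python
import time
from typing import Any, Callable, Dict, Tuple


class FreshenedColumn:
    """Hypothetical read-through column: recompute only when the stored value is stale."""

    def __init__(self, producer: Callable[[str], Any], max_age_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self._producer = producer          # computes a new value for one row
        self._max_age = max_age_seconds    # assumed freshness policy: maximum age
        self._clock = clock
        # Stands in for HBase: row key -> (value, time the value was written).
        self._store: Dict[str, Tuple[Any, float]] = {}

    def _is_fresh(self, written_at: float) -> bool:
        return self._clock() - written_at <= self._max_age

    def read(self, row_key: str) -> Any:
        cached = self._store.get(row_key)
        if cached is not None and self._is_fresh(cached[1]):
            return cached[0]                           # fresh: return to client
        value = self._producer(row_key)                # stale or missing: freshen
        self._store[row_key] = (value, self._clock())  # cache for next time
        return value
```

The producer runs only on the reads that actually find stale data, so rows that nobody reads never cost any compute.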
  • 16. How can we make "freshenable" models? Population interests change slowly. Individual interests change quickly.
  • 17. How can we make "freshenable" models? Population interests change slowly: models don't need to be retrained frequently. Individual interests change quickly: application of a model should be fast.
  • 18. How can we make "freshenable" models? Individual interests change quickly: application of a model should be fast. • Train a model over your entire data set • Save fitted model parameters to a file, or another table • Access the model parameters through a KeyValueStore when scoring new data with a producer.
  • 19. More Modeling with KijiMR KeyValueStores • Allow access to external data in Producers and Gatherers. • Support various file formats as well as tables. • Make joining datasets together very easy. • The mechanism for accessing fitted model parameters when freshening.
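
A minimal sketch of the pattern described on the last two slides: train in batch, save the fitted parameters, then look them up through a key-value interface while scoring a single row. All names here (`FileKeyValueStore`, `save_model`, `score_row`) and the JSON file format are assumptions made for illustration; the real KijiMR KeyValueStore interface is not shown in the slides.

```python
import json
from typing import Dict, Iterable


def save_model(params: Dict[str, float], path: str) -> None:
    """Batch side: persist fitted model parameters to a file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(params, handle)


class FileKeyValueStore:
    """Hypothetical read-only store backed by the saved parameter file."""

    def __init__(self, path: str) -> None:
        with open(path, encoding="utf-8") as handle:
            self._data: Dict[str, float] = json.load(handle)

    def get(self, key: str, default: float = 0.0) -> float:
        return self._data.get(key, default)


def score_row(words: Iterable[str], store: FileKeyValueStore) -> float:
    """Producer-style scoring: sum the fitted weight of each word in one row."""
    return sum(store.get(word) for word in words)
```

Scoring only performs lookups, which keeps the per-row cost small compared with retraining.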
  • 20. KijiShopping • A real-time product recommendation system • Content-based model using product descriptions and TF-IDF [Diagram: users interact with the KijiShopping web application, which is built on KijiSchema (Avro, HBase), KijiMR (MapReduce), and KijiScoring.]
  • 21. KijiShopping Data Collection • User Logins • Product Information o Names, descriptions, SKU information • User Ratings o Explicit ratings from users How do we go from data to recommendations?
  • 22. Finding Useful Features • TF-IDF
  • 23. TF-IDF • Term Frequency o How often does this term appear in this document? • Document Frequency o How many documents does this term appear in? • TF-IDF o How important is this term to this document? • In KijiShopping, each is a separate job
  • 24. Computing Term Frequency • Written as a Producer o Executed on the Product table as a Map-only job o WordCount on a per-record basis [Diagram: read product description from HBase; count words in product description; write word counts back.]
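
The per-record word count on this slide can be sketched as a pure function over one product description. The function name and the tokenization rule (lowercase, runs of letters, digits, and apostrophes) are assumptions; the slides do not say how KijiShopping tokenizes text.

```python
import re
from collections import Counter
from typing import Dict


def term_frequencies(description: str) -> Dict[str, int]:
    """Count how often each word appears in a single product description."""
    # Assumed tokenization: lowercase, then runs of letters, digits, apostrophes.
    words = re.findall(r"[a-z0-9']+", description.lower())
    return dict(Counter(words))
```

Because each call needs only its own row, the step fits a map-only job with no shuffle or reduce phase.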
  • 25. Computing Document Frequency • Written as a Gatherer o Executed on the Product table as a MapReduce job o Groups by words [Diagram: read term frequencies from HBase; map: emit (word, 1); reduce: group by word; write document frequencies to HDFS.]
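
The map and reduce steps on this slide can be mimicked in memory. This is a sketch, not KijiMR code: `map_row`, `reduce_by_word`, and `document_frequencies` are hypothetical names, and each input row is assumed to be the term-frequency dictionary of one product.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def map_row(term_freqs: Dict[str, int]) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) once for every distinct word in one product row."""
    for word in term_freqs:
        yield word, 1


def reduce_by_word(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce: group by word and sum the ones to get a document count."""
    totals: Dict[str, int] = defaultdict(int)
    for word, one in pairs:
        totals[word] += one
    return dict(totals)


def document_frequencies(rows: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Run the map step over every row and feed the pairs to the reduce step."""
    pairs = (pair for term_freqs in rows for pair in map_row(term_freqs))
    return reduce_by_word(pairs)
```

Emitting 1 per distinct word, rather than the word's count, is what makes the result a document frequency instead of a corpus-wide term count.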
  • 26. Computing TF-IDF • Written as a Producer o Executed on the Product table as a Map-only job o Pulls in Document Frequencies as a KVStore [Diagram: read term frequencies from HBase; read document frequencies from HDFS via KVStore; divide TF by DF; write TF-IDFs back.]
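
Following the slide literally ("Divide TF by DF"), the per-row computation might look like the sketch below. The function name is hypothetical and a plain dictionary stands in for the KVStore of document frequencies. Many TF-IDF formulations use a logarithmic inverse document frequency instead; that variant is not shown here because the slide describes a simple division.

```python
from typing import Dict


def tf_idf(term_freqs: Dict[str, int], doc_freqs: Dict[str, int]) -> Dict[str, float]:
    """Weight each word in one row by its term frequency divided by its document frequency."""
    return {
        word: tf / doc_freqs[word]
        for word, tf in term_freqs.items()
        if doc_freqs.get(word, 0) > 0  # skip words with no recorded document frequency
    }
```

Words that appear in many products get a large divisor and therefore a small weight, which is the intent of the weighting.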
  • 27. Associating Words with Products • Batch training process • Associations stored in a model table [Diagram: the words "gourmet" and "knife" each point to a set of products, with weights tfidf_gourmet and tfidf_knife.]
  • 28. Determine a User's Preferred Words • Stored in a user table [Diagram: user Natty points to the words "gourmet" and "knife", with weights w_gourmet and w_knife.]
  • 29. Combining User Ratings and Models • Producers incorporate models using KeyValueStores [Diagram: user Natty points to the words "gourmet" and "knife" with weights w_gourmet and w_knife; each word points to its products with weights tfidf_gourmet and tfidf_knife.]
  • 30. Generating a Recommendation • Pick the best products for your user
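
One way to read the last four slides is as a weighted sum: for each product, add up the user's weight for a word times that word's TF-IDF for the product, then keep the highest-scoring products. The slides do not state the exact combination rule, so the formula, the `recommend` function, and the tie-breaking by product name below are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def recommend(user_weights: Dict[str, float],
              word_to_products: Dict[str, Dict[str, float]],
              k: int = 3) -> List[str]:
    """Rank products by the sum over words of (user weight * product TF-IDF)."""
    scores: Dict[str, float] = defaultdict(float)
    for word, weight in user_weights.items():
        # word_to_products plays the role of the model table: word -> {product: tfidf}.
        for product, tfidf in word_to_products.get(word, {}).items():
            scores[product] += weight * tfidf
    # Highest score first; ties broken by product name so the order is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [product for product, _ in ranked[:k]]
```

Only the words the user cares about are visited, so the work per request scales with the user's preferred words rather than with the whole catalog.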
  • 31. KijiShopping The model was built with KijiMR, an extension of Hadoop MapReduce.
  • 32. KijiShopping The model was built with KijiMR, an extension of Hadoop MapReduce.
  • 34. Want to know more? • The Kiji Project o kiji.org o github.com/kijiproject • KijiShopping o github.com/wibidata/kiji-shopping Questions about this presentation? o juliet@wibidata.com o natty@wibidata.com
  • 35. Want to know more? • Come see us at the WibiData booth • Join us at KijiCon tomorrow