Introducing recommendation engines from Sixth Moment Computing, an affordable solution for small and medium-sized businesses in online retail, digital media, social networking, and mobile apps. We leverage the power of the cloud and GPU accelerators to deliver faster, cheaper solutions.
6. accelerating big data
Shazam: 10M queries per day against 27M content library
eBay / Cortexica: 500+ keypoint fingerprint search of like things
Twitter / Salesforce.com: 500M tweets against 1M expressions daily
Jen-Hsun Huang, Nvidia CEO & co-founder, Annual Investor Day 2013
7. big data platforms
[chart: four platform classes — MapReduce (e.g. Hadoop), traditional database, cluster (e.g. MPI), and multicore + accelerators — compared on data placement (external disk vs internal memory), ease of programming simple analytics, ease of programming complex analytics, performance, and energy]
concept borrowed from slides of David A. Bader
8. efficiency vs other quality metrics
[chart: how focusing on one quality factor affects the others — correctness, usability, efficiency, reliability, integrity, adaptability, accuracy, robustness — each pairing marked "helps it" or "hurts it"]
Efficiency is the hard part. Improving efficiency hurts all the other quality metrics.
Steve McConnell, Code Complete
idea borrowed from slides of Michael A. Heroux
9. case studies
hardware specs:
2x Intel Sandy Bridge CPUs: 8 cores (16 threads) / CPU, 2.6 GHz
2x Nvidia Tesla K20 GPUs: 2688 CUDA cores / GPU, 705 MHz
10. case study: small
1 million users, 20 thousand items, 50 million records (50 items per user on average)
runtime: 2 minutes (120 seconds)
nearest neighbor algorithm (item-based), Tanimoto (a.k.a. Jaccard) similarity, 20 nearest neighbors per item
20 most similar items for each item (400 thousand total similarities)
20 recommendations for each user (20 million total recommendations)
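The item-based approach on this slide can be sketched in a few lines of Python. This brute-force version is purely illustrative — `tanimoto` and `item_based_recommend` are hypothetical helper names, and the production system described here would use GPU-parallel similarity computation rather than nested Python loops — but it shows the Tanimoto similarity and the top-k neighbor / top-n recommendation logic:

```python
from collections import defaultdict

def tanimoto(a, b):
    """Tanimoto (a.k.a. Jaccard) similarity between two sets of user ids."""
    inter = len(a & b)
    if inter == 0:
        return 0.0
    return inter / (len(a) + len(b) - inter)

def item_based_recommend(events, k=20, n=20):
    """events: iterable of (user, item) pairs.
    Returns per-user top-n recommendations using the k nearest
    neighbors of each item under Tanimoto similarity."""
    item_users = defaultdict(set)   # item -> set of users who touched it
    user_items = defaultdict(set)   # user -> set of items they touched
    for u, i in events:
        item_users[i].add(u)
        user_items[u].add(i)

    # k most similar items for each item (brute force over item pairs)
    items = list(item_users)
    neighbors = {}
    for i in items:
        sims = [(tanimoto(item_users[i], item_users[j]), j)
                for j in items if j != i]
        sims.sort(reverse=True)
        neighbors[i] = sims[:k]

    # score each user's unseen items by summing neighbor similarities
    recs = {}
    for u, seen in user_items.items():
        scores = defaultdict(float)
        for i in seen:
            for s, j in neighbors[i]:
                if j not in seen:
                    scores[j] += s
        recs[u] = sorted(scores, key=scores.get, reverse=True)[:n]
    return recs
```

At the slide's scale (20 thousand items), the 400 thousand item-pair similarities are cheap; the 20 million recommendations over 1 million users dominate, which is where the GPU parallelism pays off.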
11. case study: small
1 million users, 20 thousand items, 50 million records (50 items per user on average)
runtime: 2 minutes 20 seconds (140 seconds)
latent variable model, 100 features per user / item
alternating least squares algorithm (10 iterations)
20 most similar items for each item (400 thousand total similarities)
20 recommendations for each user (20 million total recommendations)
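The latent variable model on this slide can be sketched with NumPy. This is a simplified dense-matrix version, not the production solver: it treats missing entries as observed zeros (a real implicit-feedback ALS would weight or mask them), and `als` / `recommend` are illustrative names. The slide's "100 features per user / item" and "10 iterations" map to the `n_features` and `n_iters` parameters:

```python
import numpy as np

def als(R, n_features=100, n_iters=10, reg=0.1, seed=0):
    """Alternating least squares on a dense ratings matrix R
    (n_users x n_items). Returns user and item factor matrices
    U (n_users x n_features) and V (n_items x n_features)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, n_features))
    V = rng.normal(scale=0.1, size=(n_items, n_features))
    I = np.eye(n_features)
    for _ in range(n_iters):
        # fix V, solve a ridge regression for all user factors at once
        U = np.linalg.solve(V.T @ V + reg * I, V.T @ R.T).T
        # fix U, solve for all item factors
        V = np.linalg.solve(U.T @ U + reg * I, U.T @ R).T
    return U, V

def recommend(U, V, R, n=20):
    """Top-n unseen items per user by predicted score U @ V.T."""
    scores = U @ V.T
    scores[R > 0] = -np.inf        # mask items the user already has
    return np.argsort(-scores, axis=1)[:, :n]
```

Each half-iteration reduces to dense linear solves against an `n_features x n_features` matrix, which is exactly the kind of batched linear algebra that maps well onto the Tesla K20s from the hardware slide.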
12. case study: medium
10 million users, 20 thousand items, 500 million records (50 items per user on average)
runtime: 28 minutes
nearest neighbor algorithm (item-based), Tanimoto (a.k.a. Jaccard) similarity, 20 nearest neighbors per item
20 most similar items for each item (400 thousand total similarities)
20 recommendations for each user (200 million total recommendations)
13. case study: medium
10 million users, 20 thousand items, 500 million records (50 items per user on average)
runtime: 32 minutes
latent variable model, 100 features per user / item
alternating least squares algorithm (10 iterations)
20 most similar items for each item (400 thousand total similarities)
20 recommendations for each user (200 million total recommendations)