Recommender Systems: How They Work and Where They Apply
1. Recommender Systems
Marcel Pinheiro Caraciolo
marcel@orygens.com
@marcelcaraciolo
http://www.orygens.com
Thursday, January 26, 2012
2. Who is Marcel?
Marcel Pinheiro Caraciolo - @marcelcaraciolo
Born in Sergipe, but a Recifense at heart.
M.Sc. in Computer Science from CIN/UFPE, in the area of data mining
Director of Research and Development at Orygens
Member and moderator of the Pernambuco Python User Group (PUG-PE)
My areas of interest: mobile computing and intelligent computing
My blogs: http://www.mobideia.com (on mobility, since 2006)
http://aimotion.blogspot.com (on A.I., since 2009)
Still a young apprentice in the Pythonic arts... (since 2007)
5. Web 1.0: Source of Information
Web 2.0: Continuous Flow of Information
VI Encontro do PUG-PE
6. WEB SITES
WEB APPLICATIONS
WEB SERVICES
3.0 SEMANTIC WEB
USERS
7. Use collective information effectively in order to improve an application
8. Intelligence from Mining Data
[Diagram: a network of interconnected users]
A user influences others through reviews, ratings, recommendations and blogs
A user is influenced by others through reviews, ratings, recommendations and blogs
9. Collective Intelligence
Your application
Aggregated information: lists, ratings, user-generated content, reviews, blogs, recommendations, wikis, voting, bookmarking, search, tag clouds, tagging, saving
Natural language processing
Clustering and predictive models
Harnessing external content
10. WEB SITES
WEB APPLICATIONS
WEB SERVICES
3.0 SEMANTIC WEB
USERS
before...
20. “A lot of times, people don’t know what
they want until you show it to them.”
Steve Jobs
“We are leaving the Information age, and
entering into the Recommendation age.”
Chris Anderson, from the book The Long Tail
21. Social Recommendations
Family/Friends
“What should I read?”
“I think you should read these books.”
Ref: Flickr-BlueAlgae
Ref: Flickr photostream: jefield
22. Recommendations by Interaction
Input: rate a few books
“What should I read?”
Output: “Books you may like are …”
25. Netflix
- 2/3 of rented movies come from recommendations
Google News
- 38% of the most-clicked news stories come from recommendations
Amazon
- 38% of sales come from recommendations
Source: Celma & Lamere, ISMIR 2007
26. We are overloaded with information
* Thousands of new articles and posts every day
* Millions of songs, movies and books
* Thousands of offers and promotions
27. What can be recommended?
Social network contacts, articles
Products, advertising messages
E-learning courses, books
Tags, music
Future girlfriends
Clothes, movies
Restaurants
TV shows
Videos, papers
Investment options, professionals
Code modules
29. What do recommender systems actually do?
1. Predict how much you may like a certain product or service
2. Suggest a list of N items ranked according to your interests
3. Suggest a list of N users ranked for a product/service
4. Explain to you why those items were recommended
5. Adjust predictions and recommendations based on feedback from you and from others
30. Content-Based Filtering
Items: Die Hard, Armageddon, Gone with the Wind, Toy Story
Users: Marcel
[Diagram: Marcel likes an item; items similar to it are recommended to him]
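The content-based idea can be sketched in a few lines of plain Python. This is only an illustration, not Crab's implementation: the genre weights and the names `ITEM_FEATURES`, `cosine` and `recommend_similar` are assumptions made up for this example.

```python
from math import sqrt

# Hypothetical item descriptions: one weight per genre
# (action, drama, romance, animation). Values are illustrative only.
ITEM_FEATURES = {
    "Die Hard":           [1.0, 0.2, 0.0, 0.0],
    "Armageddon":         [0.9, 0.4, 0.2, 0.0],
    "Gone with the Wind": [0.0, 0.9, 1.0, 0.0],
    "Toy Story":          [0.1, 0.2, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend_similar(liked_item, features, top_n=2):
    """Rank the other items by similarity to an item the user liked."""
    target = features[liked_item]
    scored = [(cosine(target, vec), name)
              for name, vec in features.items() if name != liked_item]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_n]]


if __name__ == "__main__":
    # A user who liked "Die Hard" gets the items whose descriptions are closest to it.
    print(recommend_similar("Die Hard", ITEM_FEATURES))
```

Only item descriptions are used here; no other user's ratings are needed, which is the defining trait of the content-based approach.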
31. Problems with content-based filtering
1. Limited content analysis
- Items and users are described in little detail. Worse for audio or images
2. Overspecialized data
- Someone with no prior experience of sushi is never recommended the best sushi restaurant in town
3. Portfolio effect
- Just because I watched one Xuxa movie as a child, the system has to recommend all of her movies to me
32. Collaborative Filtering
Items: Thor, Armageddon, Gone with the Wind, Toy Story
Users: Marcel, Rafael, Amanda
[Diagram: users with similar tastes are matched; items liked by similar users are recommended]
33. Problems with collaborative filtering
1. Scalability
- Amazon with 5M users, 50K items, 1.4B ratings
2. Sparse data
- New users and items have no history
3. Cold start
- I have rated only a single book on Amazon!
4. Popularity
- Everybody reads ‘Harry Potter’
5. Hacking
- People who read ‘Harry Potter’ also read the Kama Sutra
34. Hybrid Filtering
Combination of multiple methods
Items: Die Hard, Armageddon, Gone with the Wind, Toy Story
Users: Marcel, Rafael, Luciana
Ontologies
Symbolic data
35. How are they presented?
Highlights
More about this artist...
Someone similar to you also liked this
The most popular in your group...
Since you listened to this one, you may want this one...
New releases
Listen to music from similar artists.
These two items go together..
36. How are they evaluated?
How do we know whether a recommendation is good?
The data is usually split into training/test sets (80/20)
Criteria used:
- Prediction error: RMSE
- ROC curve*, rank-utility, F-measure
*http://code.google.com/p/pyplotmining/
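The evaluation recipe above (an 80/20 split, then an error or ranking metric) can be sketched in plain Python. The function names and the toy data are assumptions for illustration; they are not taken from Crab or pyplotmining.

```python
import random
from math import sqrt


def train_test_split(ratings, test_ratio=0.2, seed=42):
    """Shuffle (user, item, rating) triples and split them 80/20 by default."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1.0 - test_ratio)))
    return shuffled[:cut], shuffled[cut:]


def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings."""
    errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(errors) / len(errors))


def precision_recall_f1(recommended, relevant):
    """Precision, recall and F1 of a recommended list against the relevant items."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy ratings: 10 users x 5 items with made-up values between 1 and 5.
    triples = [(u, i, float((u + i) % 5 + 1)) for u in range(10) for i in range(5)]
    train, test = train_test_split(triples)
    print(len(train), len(test))
    print(rmse([3.0, 4.0], [3.0, 2.0]))
    print(precision_recall_f1([1, 2, 3, 4], [2, 4, 6]))
```

RMSE judges rating predictions, while precision, recall and F1 judge the quality of a top-N list, so the two families of metrics answer different questions.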
38. Why mobile?
More than 1 billion devices
More than 5 billion apps downloaded
A highlight in the mobile segment:
http://foursquare.com
http://vimeo.com/29323612
39. Mobile Recommender Systems
Temporal and spatial information must be taken into account
How do we determine the context the user is in?
And how can ratings be captured on a limited screen?
40. Architecture
Recommendations processed on the mobile device (not feasible today)
- Everything is processed in the back end (server) and sent to the phone via the Web
[Figure: system workflow and architecture of a location-based mobile restaurant recommendation and navigation system: a client, a server with context-based collaborative filtering, a location-based DB and a value-added DB. Source: WSEAS Transactions on Computers, Issue 5, Volume 6, May 2009, p. 810]
47. My Master's work
Offering Products and Services Using Product Reviews from Social Networks in Mobile Decision Aid Systems
Marcel Caraciolo∗ and Germano Vasconcelos†
Informatics Center, Federal University of Pernambuco
WebSite: http://www.cin.ufpe.br/
Email: ∗ mpc@cin.ufpe.br, † gcv@cin.ufpe.br
Abstract: Recommendation engines provide information filtering functions and decision aids that have great potential for application in the mobile context. An aspect that has not yet been extensively exploited in current recommenders is a better explanation of the recommendation. For instance, exploiting the service and product descriptions together with the opinions of users about the recommended products would give the user a better explanation. In this paper we present the foundations for a mobile product/service recommender system which incorporates both structured (supplier-driven) product descriptions and subject...
... extremely used by users to give a more nuanced view about a product in order to make an informed decision [5]. Nonetheless, providing users with relevant recommendation information is a difficult task. Besides the technical components, such as the user model representation and the information filtering techniques used to generate the recommendations, the information must be visualized in a user-friendly way. This is a requirement especially to support the user in the purchase decision process, and to convince him about the utility of the...
48. How can reviews from web service sources be aggregated in the mobile recommendation process?
[Paper excerpt with three figures: Fig. 1. Meta Recommender Architecture; Fig. 2. User Reviews from Foursquare Social Network; Fig. 3. Mobile Recommender System Architecture]
Since one of the goals of this work is to incorporate different data sources of user opinions and descriptions, we have adopted a meta recommendation architecture. By using a meta recommender architecture, the system would provide personalized control over the generated recommendation list formed by the combination of rich data [16].
The content-based filtering approach will be used to filter the product/service repository, while the collaborative-based approach will derive the product review recommendations. In addition we will use text mining techniques to distinguish the polarity of each user review as positive or negative. This summarized information would contribute to the computation of the product recommendation score. The final product recommendation score is computed by integrating the results of both recommenders.
49. Text Mining A Lot!
Sentiment Analysis for Extracting the Polarity
Meta-Recommender Engines
Content-Based Filtering
kNN - Nearest Neighbors
Hybrid Meta Recommender
Symbolic Data Analysis (SDA)
Evaluation in Experimental DataSets
Architectural Proposal for Mobile Recommender
50. Crab
A Python Framework for Building
Recommendation Engines
Marcel Caraciolo Ricardo Caspirro Bruno Melo
@marcelcaraciolo @ricardocaspirro @brunomelo
51. What is Crab ?
A Python framework for building recommendation engines
A Scikit module for collaborative, content and hybrid filtering
Mahout Alternative for Python Developers :D
Open-Source under the BSD license
https://github.com/muricoca/crab
57. The current Crab
>>> # load the dataset
>>> from crab.datasets import load_sample_movies
>>> data = load_sample_movies()
>>> data
{'DESCR': 'sample_movies data set was collected by the book called \nProgramming the Collective Intelligence by Toby Segaran \n\nNotes\n-----\nThis data set consists of\n\t* n ratings with (1-5) from n users to n movies.',
 'data': {1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
          2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
          3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
          4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
          5: {2: 4.5, 3: 1.0, 4: 4.0},
          6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
          7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0}},
 'item_ids': {1: 'Lady in the Water',
              2: 'Snakes on a Planet',
              3: 'You, Me and Dupree',
              4: 'Superman Returns',
              5: 'The Night Listener',
              6: 'Just My Luck'},
 'user_ids': {1: 'Jack Matthews',
              2: 'Mick LaSalle',
              3: 'Claudia Puig',
              4: 'Lisa Rose',
              5: 'Toby',
              6: 'Gene Seymour',
              7: 'Michael Phillips'}}
69. The current Crab
>>> # import pairwise distance
>>> from crab.metrics.pairwise import euclidean_distances
>>> # import similarity
>>> from crab.similarities import UserSimilarity
>>> similarity = UserSimilarity(m, euclidean_distances)
>>> similarity[1]
[(1, 1.0),
 (6, 0.66666666666666663),
 (4, 0.34054242658316669),
 (3, 0.32037724101704074),
 (7, 0.32037724101704074),
 (2, 0.2857142857142857),
 (5, 0.2674788903885893)]
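To make the numbers above less of a black box, here is a plain-Python sketch of one way to turn a Euclidean distance into a similarity: 1 / (1 + distance), computed over the items both users rated. That this is how Crab derives its scores is an assumption; the formula is used here because it yields the values listed for user 1 on the sample data. `euclidean_similarity` and `most_similar_users` are names invented for this example.

```python
from math import sqrt

# Ratings copied from the sample_movies data: user id -> {item id: rating}
RATINGS = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}


def euclidean_similarity(prefs_a, prefs_b):
    """Map the Euclidean distance over co-rated items to a score in (0, 1]."""
    common = set(prefs_a) & set(prefs_b)
    if not common:
        return 0.0  # nothing in common: treat as not similar
    distance = sqrt(sum((prefs_a[i] - prefs_b[i]) ** 2 for i in common))
    return 1.0 / (1.0 + distance)


def most_similar_users(ratings, user_id):
    """All users ranked by similarity to user_id, most similar first."""
    scores = [(other, euclidean_similarity(ratings[user_id], ratings[other]))
              for other in ratings]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for other, score in most_similar_users(RATINGS, 1):
        print(other, score)
```

A distance of zero maps to a similarity of 1.0, and the score shrinks toward 0 as two users disagree more on the items they share.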
74. The current Crab
>>> from crab.recommenders.knn import UserBasedRecommender
>>> recsys = UserBasedRecommender(model=m, similarity=similarity,
...                               capper=True, with_preference=True)
>>> recsys.recommend(5)
array([[ 5.        ,  3.45712869],
       [ 1.        ,  2.78857832],
       [ 6.        ,  2.38193068]])
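As a rough picture of what a user-based recommender does, the sketch below scores each unseen item with a similarity-weighted average of other users' ratings. It is not Crab's algorithm: neighbourhood selection and the `capper` option are not modelled, so the scores need not match the output above. The helper names `similarity` and `recommend` are hypothetical.

```python
from math import sqrt

# Ratings copied from the sample_movies data: user id -> {item id: rating}
RATINGS = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}


def similarity(prefs_a, prefs_b):
    """1 / (1 + Euclidean distance) over co-rated items (an assumed choice)."""
    common = set(prefs_a) & set(prefs_b)
    if not common:
        return 0.0
    distance = sqrt(sum((prefs_a[i] - prefs_b[i]) ** 2 for i in common))
    return 1.0 / (1.0 + distance)


def recommend(ratings, user_id, how_many=3):
    """Rank items the user has not rated by a similarity-weighted average."""
    seen = ratings[user_id]
    totals, weights = {}, {}
    for other, prefs in ratings.items():
        if other == user_id:
            continue
        sim = similarity(seen, prefs)
        if sim <= 0.0:
            continue  # ignore users with nothing in common
        for item, rating in prefs.items():
            if item in seen:
                continue  # only score items the user has not rated yet
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    scored = [(item, totals[item] / weights[item]) for item in totals]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:how_many]


if __name__ == "__main__":
    for item, score in recommend(RATINGS, 5):
        print(item, round(score, 4))
```

Users who agree more with the target user pull the predicted rating harder toward their own rating, which is the core of the neighbourhood approach.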
77. The current Crab
Collaborative Filtering algorithms
User-Based, Item-Based and Slope One
Evaluation of the Recommender Algorithms
Precision, Recall, F1-Score, RMSE
Precision-Recall Charts
85. Distributing the recommendation computations
Use Hadoop and Map-Reduce intensively
Investigating the Yelp mrjob framework https://github.com/pfig/mrjob
Implement the state-of-the-art techniques used in the Netflix Prize, and newer ones
Matrix Factorization, Singular Value Decomposition (SVD), Boltzmann machines
The most commonly used is the Slope One technique.
Simple algebra: a linear predictor y = a*x + b with the slope fixed at a = 1, i.e. y = x + b
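A minimal sketch of weighted Slope One in plain Python, assuming ratings stored as a dict of dicts. The toy users and the function name `slope_one_predict` are made up for this example; the sketch is not taken from Crab.

```python
def slope_one_predict(ratings, user, target_item):
    """Weighted Slope One prediction of `user`'s rating for `target_item`.

    For every item the user has rated, the average difference between the
    target item and that item (over users who rated both) is added to the
    user's own rating; the results are averaged, weighted by support.
    """
    numerator, denominator = 0.0, 0
    for known_item, known_rating in ratings[user].items():
        diffs = [prefs[target_item] - prefs[known_item]
                 for prefs in ratings.values()
                 if target_item in prefs and known_item in prefs]
        if not diffs:
            continue  # nobody rated both items
        deviation = sum(diffs) / len(diffs)  # the "b" in y = x + b
        numerator += (known_rating + deviation) * len(diffs)
        denominator += len(diffs)
    return numerator / denominator if denominator else None


if __name__ == "__main__":
    # Hypothetical ratings for three users and three items.
    toy = {
        "john": {"A": 5.0, "B": 3.0, "C": 2.0},
        "mark": {"A": 3.0, "B": 4.0},
        "lucy": {"B": 2.0, "C": 5.0},
    }
    print(slope_one_predict(toy, "lucy", "A"))
    print(slope_one_predict(toy, "mark", "C"))
```

The only quantities learned are the pairwise average deviations, which is why the method is cheap to update as new ratings arrive.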
92. Cache/Parallelism with joblib
http://packages.python.org/joblib/index.html
import numpy as np
from joblib import Memory

# The slide leaves the cache directory empty; point it at a real folder.
# Recent joblib releases name this argument `location` (formerly `cachedir`).
memory = Memory(location='', verbose=0)

class UserSimilarity(BaseSimilarity):
    ...
    @memory.cache
    def get_similarity(self, source_id, target_id):
        source_preferences = self.model.preferences_from_user(source_id)
        target_preferences = self.model.preferences_from_user(target_id)
        ...
        # NaN when either user has no preferences to compare
        return (self.distance(source_preferences, target_preferences)
                if not source_preferences.shape[1] == 0
                and not target_preferences.shape[1] == 0
                else np.array([[np.nan]]))

    def get_similarities(self, source_id):
        return [(other_id, self.get_similarity(source_id, other_id))
                for other_id, v in self.model]

>>> # Without memory.cache
>>> timeit similarity.get_similarities('marcel_caraciolo')
100 loops, best of 3: 978 ms per loop
>>> # With memory.cache
>>> timeit similarity.get_similarities('marcel_caraciolo')
100 loops, best of 3: 434 ms per loop
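The effect of caching a pairwise similarity can be seen without any third-party package. The sketch below swaps joblib's disk-backed `Memory` for the standard library's in-memory `functools.lru_cache`, purely to show the memoization idea; the preference data and function names are assumptions made up for the example.

```python
from functools import lru_cache
from math import sqrt

# Hypothetical preferences: user -> {item: rating}
PREFERENCES = {
    "marcel": {"a": 5.0, "b": 3.0},
    "rafael": {"a": 4.0, "b": 1.0},
    "amanda": {"a": 1.0, "b": 5.0},
}

# Counts how many times the similarity is really computed
CALLS = {"count": 0}


@lru_cache(maxsize=None)
def get_similarity(source_id, target_id):
    """Similarity between two users; repeated calls are served from the cache."""
    CALLS["count"] += 1
    source, target = PREFERENCES[source_id], PREFERENCES[target_id]
    common = set(source) & set(target)
    distance = sqrt(sum((source[i] - target[i]) ** 2 for i in common))
    return 1.0 / (1.0 + distance)


def get_similarities(source_id):
    """Similarity of source_id to every user in the data."""
    return [(other_id, get_similarity(source_id, other_id))
            for other_id in PREFERENCES]


if __name__ == "__main__":
    get_similarities("marcel")   # computes three similarities
    get_similarities("marcel")   # served entirely from the cache
    print(CALLS["count"], get_similarity.cache_info())
```

Unlike `lru_cache`, joblib's `Memory` persists results on disk, so they survive across interpreter sessions; the trade-off is the cost of hashing arguments and reading files.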
96. Distributed Computing with mrJob
https://github.com/Yelp/mrjob
"""The classic MapReduce job: count the frequency of words.
"""
from mrjob.job import MRJob
import re

# Runs of word characters and apostrophes
WORD_RE = re.compile(r"[\w']+")

class MRWordFreqCount(MRJob):

    def mapper(self, _, line):
        # Emit (word, 1) for every word on the input line
        for word in WORD_RE.findall(line):
            yield (word.lower(), 1)

    def reducer(self, word, counts):
        # Sum the partial counts emitted for each word
        yield (word, sum(counts))

if __name__ == '__main__':
    MRWordFreqCount.run()
It supports Amazon’s Elastic MapReduce (EMR) service, your own Hadoop cluster or
local (for testing)
97. Distributed Computing with mrJob
https://github.com/Yelp/mrjob
Elsayed et al: Pairwise Document Similarity in Large Collections with MapReduce
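The general pattern behind MapReduce pairwise similarity is: build an inverted index from terms to the documents containing them, then emit one partial product per pair of documents sharing a term, and sum the partials per pair. The sketch below simulates both phases in plain Python, with no Hadoop or mrjob involved; the function names and the toy documents are assumptions for illustration. In a recommender, "documents" could be users and "terms" the items they rated.

```python
from collections import defaultdict
from itertools import combinations


def map_index(doc_id, weights):
    """Phase 1 map: emit (term, (doc_id, weight)) for every term of a document."""
    for term, weight in weights.items():
        yield term, (doc_id, weight)


def map_pairs(term, postings):
    """Phase 2 map: emit a partial dot product for each pair sharing the term."""
    for (doc_a, w_a), (doc_b, w_b) in combinations(sorted(postings), 2):
        yield (doc_a, doc_b), w_a * w_b


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def pairwise_similarity(documents):
    """Dot-product similarity for every pair of documents sharing a term."""
    index = shuffle(kv for doc_id, weights in documents.items()
                    for kv in map_index(doc_id, weights))
    partials = shuffle(kv for term, postings in index.items()
                       for kv in map_pairs(term, postings))
    # Reduce: sum the partial products of each pair
    return {pair: sum(values) for pair, values in partials.items()}


if __name__ == "__main__":
    docs = {
        "d1": {"a": 1, "b": 2},
        "d2": {"a": 3, "c": 1},
        "d3": {"b": 1, "c": 2},
    }
    print(pairwise_similarity(docs))
```

Pairs that share no term never produce a key, so the work grows with the length of the postings lists rather than with the total number of document pairs.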
101. Future studies with Sparse Matrices
Real datasets come with lots of empty values
http://aimotion.blogspot.com/2011/05/evaluating-recommender-systems.html
Solutions:
scipy.sparse package
Sharding operations
Matrix Factorization techniques (SVD)
Crab implements a Matrix Factorization with Expectation Maximization algorithm
scikits.crab.svd package
Apontador Reviews Dataset
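To give a feel for matrix factorization on sparse ratings, here is a small pure-Python sketch that learns user and item factors by stochastic gradient descent over the observed ratings only. It is not the Expectation Maximization variant mentioned above, nor the `scikits.crab.svd` code; SGD is swapped in because it fits in a few lines. All names and hyperparameters are assumptions.

```python
import random
from math import sqrt

# Observed ratings only, as (user, item, rating) triples from the sample data
RATINGS = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}
TRIPLES = [(u, i, r) for u, prefs in RATINGS.items() for i, r in prefs.items()]


def predict(p_u, q_i):
    """Predicted rating: dot product of a user vector and an item vector."""
    return sum(a * b for a, b in zip(p_u, q_i))


def rmse(triples, P, Q):
    """Root mean squared error of the factor model on the given ratings."""
    errors = [(r - predict(P[u], Q[i])) ** 2 for u, i, r in triples]
    return sqrt(sum(errors) / len(errors))


def factorize(triples, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and item factors Q from observed ratings by SGD."""
    rng = random.Random(seed)
    P = {u: [rng.uniform(0.0, 1.0) for _ in range(n_factors)]
         for u in sorted({u for u, _, _ in triples})}
    Q = {i: [rng.uniform(0.0, 1.0) for _ in range(n_factors)]
         for i in sorted({i for _, i, _ in triples})}
    for _ in range(n_epochs):
        for u, i, r in triples:
            err = r - predict(P[u], Q[i])
            for f in range(n_factors):
                p_uf, q_if = P[u][f], Q[i][f]
                # Gradient step with L2 regularization on both factors
                P[u][f] += lr * (err * q_if - reg * p_uf)
                Q[i][f] += lr * (err * p_uf - reg * q_if)
    return P, Q


if __name__ == "__main__":
    P0, Q0 = factorize(TRIPLES, n_epochs=0)   # untrained starting point
    P, Q = factorize(TRIPLES)
    print(rmse(TRIPLES, P0, Q0), rmse(TRIPLES, P, Q))
```

Missing entries are simply never visited, so the cost per epoch depends on the number of observed ratings rather than on the size of the full user-by-item matrix.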
105. Benchmarks
Time elapsed (recommend 5 items)
Dataset          Old Crab (pure Python w/ dicts)   New Crab (Python w/ Scipy and Numpy)
MovieLens 100k   15.32 s                           9.56 s
http://www.grouplens.org/node/73
106. Why migrate?
Old Crab ran on pure Python only
Recommendations demand heavy math and lots of processing
Compatible with the Numpy and Scipy libraries
High-quality, popular libraries optimized for scientific computing in Python
Scikits projects are amazing!
Active communities, scientific conferences and up-to-date projects (e.g. scikit-learn)
Make the Crab framework visible to the community
Join the scientific researchers and machine learning developers around the globe coding in
Python to help us with this project
Be Fast and Furious
107. How are we working ?
Sprints, Online Discussions and Issues
https://github.com/muricoca/crab/wiki/UpcomingEvents
108. How are we working ?
Our Project’s Home Page
http://muricoca.github.com/crab
109. Future Releases
Planned Release 0.1
Collaborative Filtering Algorithms working, sample datasets to load and test
Planned Release 0.11
Evaluation of Recommendation Algorithms and Database Models support
Planned Release 0.12
Recommendation as Services with REST APIs
....
110. Join us!
1. Read our Wiki Page
https://github.com/muricoca/crab/wiki/Developer-Resources
2. Check out our current sprints and open issues
https://github.com/muricoca/crab/issues
3. Forks and pull requests are mandatory
4. Join us at irc.freenode.net #muricoca or on our
discussion list (in the works :( )
112. Recommendation in social networks
Integrated this engine with the popular Brazilian social network AtéPassar
More than 70,000 students registered, studying for public examinations
Recommends study groups, friends, video classes, questions and concursos (public examinations)
More than 70,000 items available for recommendation
Written in Python using the open-source framework Crab
Framework available for building recommender systems (my contribution)
It has been running since January 2011
In March 2011, a questionnaire was performed.
[Pie chart: Liked vs. Not Liked, with slices of 77% and 23%]
[Figure 3: AtePassar Recommender System]
115. Social Recommendations
1. The user logs in via Facebook
2. The user visits an e-commerce store partnered with LikeStore.
3. The user gets personalized recommendations right at the entrance.
4. The user gets recommendations in the shopping cart
5. The user gets recommendations on the product page.
Similar products
Customers who bought this also bought
Friends who liked/bought this
119. Join us!
1. Read our Wiki Page
https://github.com/muricoca/crab/wiki/Developer-Resources
2. Check out our current sprints and open issues
https://github.com/muricoca/crab/issues
3. Forks and pull requests are mandatory
4. Join us at irc.freenode.net #muricoca or on our
discussion list at scikit-crab@googlegroups.com
124. Recommended Items
Toby Segaran, Programming Collective Intelligence, O'Reilly, 2007
Satnam Alag, Collective Intelligence in Action, Manning Publications, 2009
Sites such as TechCrunch and ReadWriteWeb
125. Recommended Conferences
- ACM RecSys
- ICWSM: Weblogs and Social Media
- WebKDD: Web Knowledge Discovery and Data Mining
- WWW: The original WWW conference
- SIGIR: Information Retrieval
- ACM KDD: Knowledge Discovery and Data Mining
- ICML: Machine Learning
126. Where will you be in all of this?
Source: Hunch.com
Thank you!!
HUNCH was sold to eBay for $80M
130. Optimizations with Cython
http://cython.org/
Cython is a Python extension that lets developers annotate functions so they can be compiled to C.
# setup.py
# setuptools replaces distutils, which was removed from the standard library in Python 3.12
from setuptools import setup, Extension
from Cython.Distutils import build_ext

# for notes on compiler flags see:
# http://docs.python.org/install/index.html
setup(
    cmdclass={'build_ext': build_ext},
    ext_modules=[Extension("spearman_correlation_cython",
                           ["spearman_correlation_cython.pyx"])],
)
http://aimotion.blogspot.com/2011/09/high-performance-computation-with_17.html
131. Cache/Parallelism with joblib
http://packages.python.org/joblib/index.html
Investigate how to use multiprocessing and parallel packages with similarities
computation
from joblib import Parallel, delayed
...
def get_similarities(self, source_id):
    # Compute the similarities in parallel, then pair each score with its user id
    other_ids = [other_id for other_id, v in self.model]
    scores = Parallel(n_jobs=3)(
        delayed(self.get_similarity)(source_id, other_id)
        for other_id in other_ids)
    return list(zip(other_ids, scores))