Recommender Systems: How Do They Work and Where Are They Applied?
 

Talk given at the Recife Summer School 2012 at Porto Digital, Recife, Pernambuco


    Presentation Transcript

    • Recommender Systems. Marcel Pinheiro Caraciolo, marcel@orygens.com, @marcelcaraciolo, http://www.orygens.com. Thursday, January 26, 2012
    • Who is Marcel? Marcel Pinheiro Caraciolo (@marcelcaraciolo). Born in Sergipe, but a Recife local. Master's in Computer Science from CIN/UFPE, in the area of data mining. Director of Research and Development at Orygens. Member and moderator of the Pernambuco Python User Group (PUG-PE). Areas of interest: mobile computing and intelligent computing. Blogs: http://www.mobideia.com (on mobility, since 2006) and http://aimotion.blogspot.com (on A.I., since 2009). Still a young apprentice in the Pythonic arts (since 2007).
    • WEB
    • 1.0: source of information. 2.0: continuous flow of information. (VI Encontro do PUG-PE)
    • 3.0: web sites, web applications, web services, semantic web, users
    • Use collective information effectively in order to improve an application
    • Intelligence from mining data. A user influences others through reviews, ratings, recommendations, and blogs. A user is influenced by others through reviews, ratings, recommendations, and blogs.
    • [Diagram: Collective Intelligence and "Your application". Labels: aggregation information (lists), ratings, user-generated content, reviews, blogs, recommendations, wikis, voting, bookmarking, search, tag cloud, tagging, saving, natural language processing, clustering and predictive models, harness external content.]
    • Before... 3.0: web sites, web applications, web services, semantic web, users. Friday, October 1, 2010
    • Nowadays
    • we are overloaded with information
    • often useless
    • sometimes we look for this...
    • and find this!
    • Meee... Google? Social media?
    • Recommender Systems
    • “A lot of times, people don’t know what they want until you show it to them.” Steve Jobs. “We are leaving the Information age, and entering into the Recommendation age.” Chris Anderson, in the book The Long Tail
    • Social recommendations. Family/friends. “What should I read?” “I think you should read these books.” (Photo refs: Flickr, BlueAlgae; Flickr photostream, jefield)
    • Recommendations by interaction. “What should I read?” Input: rate some books. Output: “Books you may like are …”
    • Systems designed to suggest something of interest to me!
    • Why recommendation?
    • Netflix: 2/3 of rented movies come from recommendations. Google News: 38% of the most-clicked news stories come from recommendations. Amazon: 38% of sales come from recommendations. Source: Celma & Lamere, ISMIR 2007
    • We are overloaded with information: thousands of new articles and posts every day; millions of songs, movies, and books; thousands of offers and promotions.
    • What can be recommended? Contacts on social networks, articles, products, advertising messages, e-learning courses, books, tags, music, future girlfriends, clothes, movies, restaurants, TV shows, videos, papers, investment options, professionals, code modules.
    • And how does recommendation work?
    • What do recommender systems actually do? 1. Predict how much you may like a certain product or service. 2. Suggest a list of N items ordered by your interest. 3. Suggest a list of N users ordered for a product/service. 4. Explain to you why these items were recommended. 5. Adjust the prediction and the recommendation based on your feedback and that of others.
    • Content-based filtering. [Diagram: items Duro de Matar, O Vento Levou, Toy Store, Armagedon; user Marcel; arrows labelled "likes", "similar" (between items), and "recommends".]
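
The content-based idea on this slide can be sketched in a few lines. This is a minimal illustration, not the speaker's implementation: the genre tags attached to each item and the function names are assumptions made up for the example, and Jaccard overlap is just one possible way to measure item similarity.

```python
# Hypothetical item descriptions: the genre tags are illustrative assumptions.
ITEM_TAGS = {
    "Duro de Matar": {"action", "thriller"},
    "Armagedon": {"action", "sci-fi"},
    "O Vento Levou": {"drama", "romance"},
    "Toy Store": {"animation", "family"},
}


def jaccard(a, b):
    """Overlap between two tag sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_by_content(liked, item_tags, top_n=2):
    """Rank unseen items by their best tag similarity to any liked item."""
    scores = {}
    for item, tags in item_tags.items():
        if item in liked:
            continue
        scores[item] = max(jaccard(tags, item_tags[seen]) for seen in liked)
    # Sort by score (highest first), then by name for a deterministic order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


print(recommend_by_content({"Duro de Matar"}, ITEM_TAGS))
```

Only the item descriptions and the user's own history are used here; no other user's ratings are needed, which is what separates this approach from collaborative filtering.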
    • Problems with content-based filtering. 1. Restricted data analysis: items and users are described in little detail; worse for audio or images. 2. Specialized data: a person with no experience of sushi is not recommended the best sushi restaurant in town. 3. Portfolio effect: just because I watched one Xuxa movie as a child, it has to recommend all of hers.
    • Collaborative filtering. [Diagram: items O Vento Levou, Toy Store, Thor, Armagedon; users Marcel, Rafael, Amanda; arrows labelled "likes", "recommends", and "similar" (between users).]
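
A minimal sketch of the collaborative idea in the diagram: find the users most similar to the target user and suggest what they liked. The "likes" data below is invented for illustration (the slide does not say who liked what), and the function names are hypothetical.

```python
# Hypothetical "likes" data: who liked what is an illustrative assumption.
LIKES = {
    "Marcel": {"O Vento Levou", "Toy Store"},
    "Rafael": {"O Vento Levou", "Toy Store", "Thor"},
    "Amanda": {"Armagedon"},
}


def overlap(a, b):
    """Jaccard overlap between two sets of liked items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_from_neighbours(user, likes, k=1):
    """Suggest items liked by the k users most similar to `user`."""
    others = [u for u in likes if u != user]
    # Most similar users first; ties broken by name.
    neighbours = sorted(others, key=lambda u: (-overlap(likes[user], likes[u]), u))[:k]
    suggestions = set()
    for neighbour in neighbours:
        suggestions |= likes[neighbour] - likes[user]
    return sorted(suggestions)


print(recommend_from_neighbours("Marcel", LIKES))
```

Unlike the content-based approach, nothing here looks at what the items are; only the pattern of who liked what matters.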
    • Problems with collaborative filtering. 1. Scalability: Amazon with 5M users, 50K items, 1.4B ratings. 2. Sparse data: new users and items that have no history. 3. Cold start: I have rated only a single book on Amazon! 4. Popularity: everyone reads ‘Harry Potter’. 5. Hacking: the person who reads ‘Harry Potter’ reads the Kama Sutra.
    • Hybrid filtering. Combination of multiple methods. [Diagram: items Duro de Matar, O Vento Levou, Toy Store, Armagedon; users Marcel, Rafael, Luciana; labels "ontologies" and "symbolic data".]
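
One simple way to combine multiple methods is a weighted average of their scores. The sketch below assumes that approach; it is not the ontology or symbolic-data technique named on the slide, and the scores and the `weight` parameter are illustrative.

```python
def hybrid_scores(content_scores, collaborative_scores, weight=0.5):
    """Blend two score dictionaries; a missing score counts as 0.0."""
    items = set(content_scores) | set(collaborative_scores)
    return {
        item: weight * content_scores.get(item, 0.0)
        + (1.0 - weight) * collaborative_scores.get(item, 0.0)
        for item in items
    }


# Hypothetical scores produced by two separate recommenders.
content = {"Armagedon": 0.8, "Toy Store": 0.1}
collaborative = {"Armagedon": 0.4, "Thor": 0.9}

blended = hybrid_scores(content, collaborative, weight=0.5)
# Highest blended score first; ties broken by name.
ranking = sorted(blended, key=lambda item: (-blended[item], item))
print(ranking)
```

Raising `weight` favours the content-based scores, which can help for new users with little rating history.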
    • How are they presented? Highlights. More about this artist... Someone similar to you also liked this. The most popular in your group... Since you listened to this one, you may want this one... New releases. Listen to music from similar artists. These two items go together.
    • How are they evaluated? How do we know whether a recommendation is good? The data is usually split into training and test sets (80/20). Criteria used: prediction error (RMSE); ROC curve*, rank-utility, F-measure. *http://code.google.com/p/pyplotmining/
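
The 80/20 split and the RMSE criterion can be sketched as follows. The ratings are synthetic and the "model" is a deliberately trivial baseline (always predict the training mean); both are assumptions made only to keep the example self-contained.

```python
import math
import random


def train_test_split(ratings, test_fraction=0.2, seed=42):
    """Shuffle (user, item, rating) triples and hold out a fraction for testing."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1.0 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic ratings, for illustration only.
ratings = [(u, i, float((u + i) % 5 + 1)) for u in range(1, 6) for i in range(1, 5)]
train, test = train_test_split(ratings)

# Baseline "model": always predict the mean rating of the training set.
mean_rating = sum(r for _, _, r in train) / len(train)
error = rmse([mean_rating] * len(test), [r for _, _, r in test])
print(len(train), len(test), round(error, 3))
```

A real recommender would replace the mean-rating baseline; a lower RMSE on the held-out 20% then indicates better rating predictions.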
    • Mobile Recommenders
    • Why mobile? More than 1 billion devices. More than 5 billion apps downloaded. Highlight in the mobile segment: http://foursquare.com, http://vimeo.com/29323612
    • Mobile recommender systems. Temporal and spatial information must be taken into account. How do we determine which context the user is in? And how can ratings be captured on a limited screen?
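
As one possible reading of "taking spatial information into account", the sketch below filters candidate places by distance from the user before ranking them by predicted score. The places, coordinates, scores, and the 2 km radius are all illustrative assumptions; the haversine formula is the standard great-circle distance.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearby_ranked(places, user_location, max_km=2.0):
    """Keep places within max_km of the user, best predicted score first."""
    lat, lon = user_location
    close = [p for p in places
             if haversine_km(lat, lon, p["lat"], p["lon"]) <= max_km]
    return sorted(close, key=lambda p: -p["score"])


# Hypothetical places and scores; the coordinates are illustrative assumptions.
places = [
    {"name": "Restaurant A", "lat": -8.0631, "lon": -34.8711, "score": 3.9},
    {"name": "Restaurant B", "lat": -8.0640, "lon": -34.8720, "score": 4.6},
    {"name": "Restaurant C", "lat": -8.2000, "lon": -35.0000, "score": 4.9},
]
print([p["name"] for p in nearby_ranked(places, (-8.0630, -34.8710))])
```

The highest-scored place overall is dropped because it is too far away, which is the kind of trade-off a location-aware recommender has to make.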
    • Architecture. Recommendations processed on the mobile device (not feasible today). Everything is processed on the back end (server) and sent to the phone over the web. [Background figures: "System Workflow" and "Architecture of the Mobile Information ..." of a location-based mobile restaurant recommendation and navigation system, from WSEAS Transactions on Computers, Issue 5, Volume 6, May 2009.]
    • Available information: location, tags, context
    • Available information: implicit rating
    • One of the most popular mobile location systems. Check-ins: say where you are! Place recommendations.
    • Conversational mobile virtual assistant. Already uses information from social networks. Restaurant recommendations.
    • Google HotPot. Review repository. Place recommendations.
    • My contributions
    • My master's work: "Offering Products and Services Using Product Reviews from Social Networks in Mobile Decision Aid Systems", Marcel Caraciolo and Germano Vasconcelos, Informatics Center, Federal University of Pernambuco. Website: http://www.cin.ufpe.br/. Email: mpc@cin.ufpe.br, gcv@cin.ufpe.br. Abstract: "Recommendation engines provide information filtering functions and decision aids that have a great potential application the mobile context. An aspect that hasn’t been extensively exploited yet in the current recommendations is the improvement in the explanation of the recommendation. For instance, exploiting the service and product description and the opinion of users about the recommended products, where associated would bring a better explanation for the user. In this paper we will present the foundations for a mobile product/service recommender system which incorporate both structured (supplier driven) product descriptions and subject [...]" Fragment of the introduction also visible on the slide: "[...] extremely used by users to give a more nuanced view about a product in order to make an informed decision [5]. Nonetheless, providing users with relevant recommendation information it is a difficult task. Besides the technical components such as the user model representation and information filtering techniques to generate the recommendations, the information must be user-friendly visualized. This is a requirement specially to support the user in the purchase decision process, and to convince him about the utility of the [...]"
    • How can reviews from web service sources be aggregated in the mobile recommendation process? [Background: a page of the paper, section III "System Design", with Fig. 1 "Meta Recommender Architecture", Fig. 2 "User Reviews from Foursquare Social Network", and Fig. 3 "Mobile Recommender System Architecture".] Recoverable text from the page: "The content-based filtering approach will be used to filter the product/service repository, while the collaborative based approach will derive the product review recommendations. In addition we will use text mining techniques to distinct the polarity of the user review between positive or negative one. This information summarized would contribute in the product score recommendation computation. The final product recommendation score is computed by integrating the result of both recommenders."
    • A lot of text mining; sentiment analysis for extracting the polarity; meta-recommender engines; content-based filtering; kNN (nearest neighbors); hybrid meta recommender; symbolic data analysis (SDA); evaluation on experimental data sets; architectural proposal for a mobile recommender.
    • Crab: A Python Framework for Building Recommendation Engines. Marcel Caraciolo (@marcelcaraciolo), Ricardo Caspirro (@ricardocaspirro), Bruno Melo (@brunomelo)
    • What is Crab? A Python framework for building recommendation engines. A scikit module for collaborative, content-based, and hybrid filtering. A Mahout alternative for Python developers :D Open source under the BSD license. https://github.com/muricoca/crab
    • The current Crab
      >>> # load the dataset
      >>> from crab.datasets import load_sample_movies
      >>> data = load_sample_movies()
      >>> data
      {'DESCR': 'sample_movies data set was collected by the book called \nProgramming the Collective Intelligence by Toby Segaran \n\nNotes\n----- \nThis data set consists of\n\t* n ratings with (1-5) from n users to n movies.',
       'data': {1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
                2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
                3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
                4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
                5: {2: 4.5, 3: 1.0, 4: 4.0},
                6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
                7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0}},
       'item_ids': {1: 'Lady in the Water',
                    2: 'Snakes on a Planet',
                    3: 'You, Me and Dupree',
                    4: 'Superman Returns',
                    5: 'The Night Listener',
                    6: 'Just My Luck'},
       'user_ids': {1: 'Jack Matthews',
                    2: 'Mick LaSalle',
                    3: 'Claudia Puig',
                    4: 'Lisa Rose',
                    5: 'Toby',
                    6: 'Gene Seymour',
                    7: 'Michael Phillips'}}
    • The current Crab
      >>> from crab.models import MatrixPreferenceDataModel
      >>> m = MatrixPreferenceDataModel(data.data)
      >>> print(m)
      MatrixPreferenceDataModel (7 by 6)
               1         2         3         4         5   ...
      1     3.000000  4.000000  3.500000  5.000000  3.000000
      2     3.000000  4.000000  2.000000  3.000000  3.000000
      3     ---       3.500000  2.500000  4.000000  4.500000
      4     2.500000  3.500000  2.500000  3.500000  3.000000
      5     ---       4.500000  1.000000  4.000000  ---
      6     3.000000  3.500000  3.500000  5.000000  3.000000
      7     2.500000  3.000000  ---       3.500000  4.000000
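
To make the "7 by 6" matrix view concrete, here is a plain-Python sketch that turns the nested ratings dictionary from the sample data set into rows with gaps for missing ratings. It does not use Crab and is not how `MatrixPreferenceDataModel` is implemented; `to_matrix` is a hypothetical helper written for this illustration.

```python
# Ratings copied from the sample data set: {user_id: {item_id: rating}}.
DATA = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}


def to_matrix(preferences):
    """Return (user_ids, item_ids, rows); a missing rating is stored as None."""
    user_ids = sorted(preferences)
    item_ids = sorted({item for prefs in preferences.values() for item in prefs})
    rows = [[preferences[u].get(i) for i in item_ids] for u in user_ids]
    return user_ids, item_ids, rows


user_ids, item_ids, rows = to_matrix(DATA)
print("%d by %d" % (len(user_ids), len(item_ids)))
for user_id, row in zip(user_ids, rows):
    cells = ["---" if value is None else "%.6f" % value for value in row]
    print(user_id, " ".join(cells))
```

The `---` cells are exactly the ratings a recommender tries to predict.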
    • The current Crab
      >>> # import pairwise distance
      >>> from crab.metrics.pairwise import euclidean_distances
      >>> # import similarity
      >>> from crab.similarities import UserSimilarity
      >>> similarity = UserSimilarity(m, euclidean_distances)
      >>> similarity[1]
      [(1, 1.0),
       (6, 0.66666666666666663),
       (4, 0.34054242658316669),
       (3, 0.32037724101704074),
       (7, 0.32037724101704074),
       (2, 0.2857142857142857),
       (5, 0.2674788903885893)]
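
The scores on this slide are consistent with similarity = 1 / (1 + d), where d is the Euclidean distance over the items both users rated. That formula is inferred from the numbers shown, not taken from Crab's source code; the plain-Python sketch below applies it to the sample ratings, and its function names are hypothetical.

```python
import math

# Ratings copied from the sample data set: {user_id: {item_id: rating}}.
DATA = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}


def euclidean_similarity(prefs_a, prefs_b):
    """1 / (1 + Euclidean distance) over the items both users rated."""
    shared = set(prefs_a) & set(prefs_b)
    if not shared:
        return 0.0
    distance = math.sqrt(sum((prefs_a[i] - prefs_b[i]) ** 2 for i in shared))
    return 1.0 / (1.0 + distance)


def most_similar(user_id, preferences):
    """All users (including user_id itself) ordered by similarity to user_id."""
    scores = [(other, euclidean_similarity(preferences[user_id], prefs))
              for other, prefs in preferences.items()]
    return sorted(scores, key=lambda pair: -pair[1])


for other, score in most_similar(1, DATA):
    print(other, round(score, 4))
```

Users 1 and 6 differ on a single shared item by 0.5, so d = 0.5 and the similarity is 1 / 1.5, which agrees with the 0.666... shown for user 6.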
    • The current Crab
      >>> from crab.recommenders.knn import UserBasedRecommender
      >>> recsys = UserBasedRecommender(model=m, similarity=similarity,
      ...                               capper=True, with_preference=True)
      >>> recsys.recommend(5)
      array([[ 5.        ,  3.45712869],
             [ 1.        ,  2.78857832],
             [ 6.        ,  2.38193068]])
      >>> recsys.recommended_because(user_id=5, item_id=1)
      array([[ 2. ,  3. ],
             [ 1. ,  3. ],
             [ 6. ,  3. ],
             [ 7. ,  2.5],
             [ 4. ,  2.5]])
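
A user-based recommender can be sketched as a similarity-weighted average of the neighbours' ratings. The code below is a plain-Python illustration under that assumption, using every other user as a neighbour; Crab's `UserBasedRecommender` may select neighbours and cap scores differently, so its numbers need not match these.

```python
import math

# Ratings copied from the sample data set: {user_id: {item_id: rating}}.
DATA = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}


def euclidean_similarity(prefs_a, prefs_b):
    """1 / (1 + Euclidean distance) over the items both users rated."""
    shared = set(prefs_a) & set(prefs_b)
    if not shared:
        return 0.0
    distance = math.sqrt(sum((prefs_a[i] - prefs_b[i]) ** 2 for i in shared))
    return 1.0 / (1.0 + distance)


def predict(user_id, item_id, preferences):
    """Similarity-weighted average of the other users' ratings for item_id."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, prefs in preferences.items():
        if other == user_id or item_id not in prefs:
            continue
        weight = euclidean_similarity(preferences[user_id], prefs)
        weighted_sum += weight * prefs[item_id]
        weight_total += weight
    return weighted_sum / weight_total if weight_total else None


def recommend(user_id, preferences, top_n=3):
    """Top-N unseen items for user_id, best predicted rating first."""
    all_items = {item for prefs in preferences.values() for item in prefs}
    unseen = sorted(all_items - set(preferences[user_id]))
    scored = [(item, predict(user_id, item, preferences)) for item in unseen]
    scored = [(item, score) for item, score in scored if score is not None]
    return sorted(scored, key=lambda pair: -pair[1])[:top_n]


print(recommend(5, DATA))
```

The list of neighbours who rated a recommended item, together with their ratings, is also the raw material for an explanation such as the one `recommended_because` returns.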
    • The current Crab. Collaborative filtering algorithms: user-based, item-based, and Slope One. Evaluation of the recommender algorithms: precision, recall, F1-score, RMSE. Precision-recall charts.
    • Evaluating your recommender
      >>> from crab.metrics.classes import CfEvaluator
      >>> evaluator = CfEvaluator()
      >>> evaluator.evaluate(recommender=recsys, metric='rmse')
      {'rmse': 0.69467177857026907}
      >>> evaluator.evaluate_on_split(recommender=recsys, at=2)
      ({'error': [{'mae': 0.345, 'nmae': 0.4567, 'rmse': 0.568},
                  {'mae': 0.456, 'nmae': 0.356778, 'rmse': 0.6788},
                  {'mae': 0.456, 'nmae': 0.356778, 'rmse': 0.6788}],
        'ir': [{'f1score': 0.456, 'precision': 0.78557, 'recall': 0.55677},
               {'f1score': 0.64567, 'precision': 0.67865, 'recall': 0.785955},
               {'f1score': 0.45070, 'precision': 0.74744, 'recall': 0.858585}]},
       {'final_score': {'avg': {'f1score': 0.495955, 'mae': 0.429292, 'nmae': 0.373739,
                                'precision': 0.63932929, 'recall': 0.729939393, 'rmse': 0.3466868},
                        'stdev': {'f1score': 0.09938383, 'mae': 0.0593933, 'nmae': 0.03393939,
                                  'precision': 0.0192929, 'recall': 0.031293939, 'rmse': 0.234949494}}})
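
The information-retrieval metrics in the `ir` part of that output (precision, recall, F1) can be computed for a single top-N list as sketched below. This is a generic definition of the metrics, not Crab's `CfEvaluator` code; the item ids in the example are made up.

```python
def precision_recall_f1(recommended, relevant):
    """Precision, recall, and F1 of a recommendation list against relevant items."""
    recommended = list(recommended)
    relevant = set(relevant)
    hits = sum(1 for item in recommended if item in relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Top-2 recommendations (as with at=2) versus the items the user really liked.
print(precision_recall_f1([5, 1], [5, 6, 2]))
```

Here one of the two recommended items is relevant (precision 1/2) and one of the three relevant items was found (recall 1/3).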
    • Distributing the recommendation computations. Use Hadoop and MapReduce intensively. Investigating the Yelp mrjob framework: https://github.com/pfig/mrjob. Develop the Netflix and other novel state-of-the-art techniques in use: matrix factorization, singular value decomposition (SVD), Boltzmann machines. The most commonly used is the Slope One technique: simple algebra with a linear predictor y = a*x + b whose slope a is fixed at 1, that is, y = x + b.
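
A minimal sketch of weighted Slope One: for each item the user has already rated, add the average rating difference b between the target item and that item (y = x + b), then weight each estimate by how many users rated both. The ratings and user names below are invented for illustration, and the function name is hypothetical.

```python
def slope_one_predict(user_prefs, target_item, all_prefs):
    """Weighted Slope One prediction of user_prefs' rating for target_item."""
    numerator = 0.0
    denominator = 0
    for known_item, known_rating in user_prefs.items():
        if known_item == target_item:
            continue
        # Rating differences (target - known) over users who rated both items.
        diffs = [prefs[target_item] - prefs[known_item]
                 for prefs in all_prefs.values()
                 if target_item in prefs and known_item in prefs]
        if not diffs:
            continue
        b = sum(diffs) / len(diffs)  # average offset between the two items
        numerator += (known_rating + b) * len(diffs)  # y = x + b, weighted
        denominator += len(diffs)
    return numerator / denominator if denominator else None


# Hypothetical ratings, for illustration only.
ratings = {
    "ana": {"A": 5.0, "B": 3.0, "C": 2.0},
    "bob": {"A": 3.0, "B": 4.0},
    "eve": {"B": 2.0, "C": 5.0},
}
print(slope_one_predict(ratings["bob"], "C", ratings))
```

For bob and item C, item A contributes (3 - 3) with weight 1 and item B contributes (4 + 1) with weight 2, so the prediction is 10 / 3.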
    • Cache/Parallelism with joblib
      http://packages.python.org/joblib/index.html

          import numpy as np
          from joblib import Memory

          # cachedir is the argument name used on the slide;
          # newer joblib releases call it location.
          memory = Memory(cachedir='', verbose=0)

          class UserSimilarity(BaseSimilarity):
              ...
              @memory.cache
              def get_similarity(self, source_id, target_id):
                  source_preferences = self.model.preferences_from_user(source_id)
                  target_preferences = self.model.preferences_from_user(target_id)
                  ...
                  # No preferences on either side: similarity is undefined.
                  if (source_preferences.shape[1] == 0
                          or target_preferences.shape[1] == 0):
                      return np.array([[np.nan]])
                  return self.distance(source_preferences, target_preferences)

              def get_similarities(self, source_id):
                  return [(other_id, self.get_similarity(source_id, other_id))
                          for other_id, v in self.model]

      Without memory.cache:
          >>> timeit similarity.get_similarities('marcel_caraciolo')
          100 loops, best of 3: 978 ms per loop
      With memory.cache:
          >>> timeit similarity.get_similarities('marcel_caraciolo')
          100 loops, best of 3: 434 ms per loop
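      The caching idea on this slide can be tried without Crab or joblib. The sketch below is only an illustration under stated assumptions: it swaps joblib's disk-backed Memory for the standard library's in-memory functools.lru_cache, and the PREFERENCES data, the CALLS counter and the cosine-over-co-rated-items measure are hypothetical stand-ins, not Crab's code.

```python
from functools import lru_cache
from math import sqrt

# Hypothetical toy preferences: user -> {item: rating}.
PREFERENCES = {
    "marcel": {"a": 5.0, "b": 3.0, "c": 4.0},
    "ana": {"a": 4.0, "b": 3.0, "d": 2.0},
    "luis": {"d": 5.0},
}

CALLS = {"count": 0}  # counts how often the real computation runs


@lru_cache(maxsize=None)
def _cosine(user_a, user_b):
    """Cosine similarity over co-rated items; NaN when there is no overlap."""
    CALLS["count"] += 1
    prefs_a, prefs_b = PREFERENCES[user_a], PREFERENCES[user_b]
    shared = set(prefs_a) & set(prefs_b)
    if not shared:
        return float("nan")
    dot = sum(prefs_a[i] * prefs_b[i] for i in shared)
    norm_a = sqrt(sum(prefs_a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(prefs_b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def get_similarity(source_id, target_id):
    # Similarity is symmetric, so sort the pair to share one cache entry.
    return _cosine(*sorted((source_id, target_id)))


def get_similarities(source_id):
    return [(other, get_similarity(source_id, other)) for other in PREFERENCES]
```

      Calling get_similarities("marcel") twice runs the underlying computation only on the first call. Sorting the pair also lets get_similarities("ana") reuse the marcel/ana entry.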
    • Distributed Computing with mrjob
      https://github.com/Yelp/mrjob

          """The classic MapReduce job: count the frequency of words."""
          from mrjob.job import MRJob
          import re

          # Matches runs of word characters and apostrophes.
          WORD_RE = re.compile(r"[\w']+")

          class MRWordFreqCount(MRJob):

              def mapper(self, _, line):
                  # Emit (word, 1) for every word on the input line.
                  for word in WORD_RE.findall(line):
                      yield (word.lower(), 1)

              def reducer(self, word, counts):
                  # Sum the partial counts collected for each word.
                  yield (word, sum(counts))

          if __name__ == '__main__':
              MRWordFreqCount.run()

      It supports Amazon's Elastic MapReduce (EMR) service, your own Hadoop cluster, or local runs (for testing).
    • Distributed Computing with mrjob
      https://github.com/Yelp/mrjob
      Elsayed et al.: Pairwise Document Similarity in Large Collections with MapReduce
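      The two-phase idea behind the pairwise similarity paper cited above can be sketched in plain Python, with no cluster and no mrjob. This is a simplified in-process simulation under stated assumptions: the DOCS corpus and all function names are hypothetical, and raw term frequencies stand in for the term weights a real system would use.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy corpus: document id -> text.
DOCS = {
    "d1": "crab recommends items",
    "d2": "crab recommends friends",
    "d3": "python is fun",
}


def index_mapper(doc_id, text):
    """Phase 1 map: emit (term, (doc_id, term_frequency))."""
    for term, tf in Counter(text.split()).items():
        yield term, (doc_id, tf)


def pair_mapper(term, postings):
    """Phase 2 map: emit a partial product for each document pair sharing the term."""
    for (doc_a, tf_a), (doc_b, tf_b) in combinations(sorted(postings), 2):
        yield (doc_a, doc_b), tf_a * tf_b


def run(docs):
    # Phase 1 shuffle/reduce: group postings by term (an inverted index).
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for term, posting in index_mapper(doc_id, text):
            index[term].append(posting)
    # Phase 2 shuffle/reduce: sum the partial products per document pair.
    scores = defaultdict(int)
    for term, postings in index.items():
        for pair, partial in pair_mapper(term, postings):
            scores[pair] += partial
    return dict(scores)
```

      Only documents that share at least one term ever produce a pair, which is what keeps the approach tractable on sparse data. Here run(DOCS) scores the pair ("d1", "d2") and never touches "d3".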
    • Future studies with Sparse Matrices
      Real datasets come with lots of empty values
      http://aimotion.blogspot.com/2011/05/evaluating-recommender-systems.html
      [Figure: Apontador Reviews Dataset]
      Solutions:
        - scipy.sparse package
        - Sharding operations
        - Matrix Factorization techniques (SVD)
      Crab implements a Matrix Factorization with Expectation Maximization algorithm (scikits.crab.svd package)
    • Benchmarks
      Dataset: MovieLens 100k (http://www.grouplens.org/node/73)
      Time elapsed (recommend 5 items):
        Old Crab (pure Python w/ dicts):        15.32 s
        New Crab (Python w/ SciPy and NumPy):    9.56 s
    • Why migrate?
      - Old Crab ran on pure Python only: recommendations demand heavy math and lots of processing
      - Compatible with the NumPy and SciPy libraries: high-standard, popular libraries optimized for scientific calculations in Python
      - Scikits projects are amazing: active communities, scientific conferences and up-to-date projects (e.g. scikit-learn)
      - Make the Crab framework visible to the community: join the scientific researchers and machine learning developers around the globe coding with Python to help us in this project
      - Be Fast and Furious
    • How are we working?
      Sprints, online discussions and issues
      https://github.com/muricoca/crab/wiki/UpcomingEvents
    • How are we working?
      Our project's home page: http://muricoca.github.com/crab
    • Future Releases
      - Planned Release 0.1: Collaborative Filtering algorithms working, sample datasets to load and test
      - Planned Release 0.11: Evaluation of recommendation algorithms and support for database models
      - Planned Release 0.12: Recommendation as a service with REST APIs
      ....
    • Join us!
      1. Read our Wiki page: https://github.com/muricoca/crab/wiki/Developer-Resources
      2. Check out our current sprints and open issues: https://github.com/muricoca/crab/issues
      3. Forks and pull requests are mandatory
      4. Join us at irc.freenode.net #muricoca or on our discussion list (still in the works :( )
    • Recommendation in social networks
      - Integrated this engine with the popular Brazilian social network AtéPassar
      - More than 70,000 registered students preparing for public examinations
      - Recommends study groups, friends, video classes, questions and Concursos (public examinations)
      - More than 70,000 items available for recommendation
      - Written in Python using the open-source framework Crab, a framework for building recommender systems (my contribution)
      - Running since January 2011
      - In March 2011 a questionnaire was conducted; the chart shows Liked / Not Liked shares of 77% and 23%
      [Figure: AtéPassar recommender system screenshot]
    • Collect discounts: www.favoritoz.com
    • Social Recommendations
      1. The user logs in via Facebook
      2. The user visits an e-commerce site partnered with LikeStore
      3. The user gets personalized recommendations on arrival
      4. The user gets recommendations in the shopping cart
      5. The user gets recommendations on the product page
      Recommendation types: similar products; customers who bought this also bought; friends who liked/bought this
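      The "customers who bought this also bought" recommendation listed above is often built from item co-occurrence counts. The sketch below is a minimal illustration under stated assumptions, not LikeStore's implementation: the ORDERS baskets and both function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical purchase baskets: one set of product ids per order.
ORDERS = [
    {"camera", "tripod", "sd_card"},
    {"camera", "sd_card"},
    {"camera", "bag"},
    {"tripod", "bag"},
]


def build_cooccurrence(orders):
    """Count, for every product, how often each other product shares a basket with it."""
    co = defaultdict(Counter)
    for basket in orders:
        for a, b in permutations(basket, 2):
            co[a][b] += 1
    return co


def also_bought(co, product, top_n=3):
    """Return the products most often bought together with `product`."""
    # Sort by count (descending), then by name for a deterministic order.
    ranked = sorted(co[product].items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]
```

      A product page would call also_bought(co, "camera") on a table built once from the order history. Raw counts favor popular items, so production systems usually normalize them, which this sketch leaves out.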
    • Building the Social Genome
    • Does anyone still doubt it?
      http://www.shopycat.com/
    • Tips
    • Join us!
      1. Read our Wiki page: https://github.com/muricoca/crab/wiki/Developer-Resources
      2. Check out our current sprints and open issues: https://github.com/muricoca/crab/issues
      3. Forks and pull requests are mandatory
      4. Join us at irc.freenode.net #muricoca or on our discussion list at scikit-crab@googlegroups.com
    • Tips for Recommendation Architecture
    • Recommended Items
      - Toby Segaran, Programming Collective Intelligence, O'Reilly, 2007
      - Satnam Alag, Collective Intelligence in Action, Manning Publications, 2009
      - Sites such as TechCrunch and ReadWriteWeb
    • Recommended Conferences
      - ACM RecSys
      - ICWSM: Weblogs and Social Media
      - WebKDD: Web Knowledge Discovery and Data Mining
      - WWW: The original WWW conference
      - SIGIR: Information Retrieval
      - ACM KDD: Knowledge Discovery and Data Mining
      - ICML: Machine Learning
    • Where will you be in all this?
      Source: Hunch.com
      HUNCH: sold to eBay for $80M
      Thank you!!
    • Optimizations with Cython
      http://cython.org/
      Cython is a Python extension that lets developers annotate functions so they can be compiled to C.

          # setup.py
          from distutils.core import setup
          from distutils.extension import Extension
          from Cython.Distutils import build_ext

          # for notes on compiler flags see:
          # http://docs.python.org/install/index.html
          setup(
              # Use Cython's build_ext command to compile the .pyx source.
              cmdclass={'build_ext': build_ext},
              ext_modules=[Extension("spearman_correlation_cython",
                                     ["spearman_correlation_cython.pyx"])]
          )

      http://aimotion.blogspot.com/2011/09/high-performance-computation-with_17.html
    • Cache/Parallelism with joblib
      http://packages.python.org/joblib/index.html
      Investigate how to use the multiprocessing and parallel packages for the similarity computation

          from joblib import Parallel, delayed
          ...
          def get_similarities(self, source_id):
              other_ids = [other_id for other_id, v in self.model]
              # Parallel takes an iterable of delayed calls and returns
              # their results in the same order.
              similarities = Parallel(n_jobs=3)(
                  delayed(self.get_similarity)(source_id, other_id)
                  for other_id in other_ids)
              return list(zip(other_ids, similarities))