The current Crab: >>> from crab.models import MatrixPreferenceDataModel >>> m = MatrixPreferenceDataModel(data.data...
The current Crab: >>> #import pairwise distance >>> from crab.metrics.pairwise import ...
The current Crab: >>> from crab.recommenders.knn import UserBasedRecommender >>> recsys = UserBas...
The current Crab: Collaborative Filtering algorithms: User-Based, Item-Based and Slo...
Evaluating your recommender: >>> from crab.metrics.classes import CfEvaluator >>> evaluator = CfEvaluato...
Distributing the recommendation computations: Use Hadoop and Map-Reduce intensively. Investigating the ...
Cache/Parallelism with joblib: http://packages.python.org/joblib/index.html fr...
Distributed Computing with mrJob: https://github.com/Yelp/mrjob It suppor...
Distributed Computing with mrJob: https://github.com/Yelp/mrjob Elsayed et al:...
Future studies with Sparse Matrices: Real datasets come with lots of empty values. http://aimo...
Benchmarks: Pure Python w/ Python w/ Scipy Datase...
Why migrate? Old Crab running only using Pure Python. Recommendations demand heavy maths calculati...
How are we working? Sprints, Online Discussions and Issues. https://github.c...
How are we working? Our Project’s Home Page: http://muricoca...
Future Releases: Planned Release 0.1. Collaborative Filtering Algorithms wo...
Join us! 1. Read our Wiki Page https://github.com/muricoca/crab/wiki/Dev...
Recommendation in social networks: this engine with the popular brazilian social network AtéPa...
Collect discounts: WWW. FAVORITOZ. CO...
Social Recommendations: 1. The user logs in via Facebook. 2. The user visits LikeStore's partner e-commerce site. 3. User...
Building the Social Genome
Does anyone still doubt it? http://www.shopycat.com/
Tips
Tips for Recommendation Architecture
Recommended Items: Toby Segaran, Programming Collective Intelligence,... Satnam Alag, Collective Intelligence in ...
Recommended Conferences: - ACM RecSys. - ICWSM: Weblog and Social Media - WebKDD: Web Knowledge Discove...
Where will you be in all this? Source: Hunc...
Recommender Systems. Marcel Pinheiro Caraciolo, marcel@orygens.com, @marcelcaraci...
Optimizations with Cython: http://cython.org/ Cython is a Python extens...
Cache/Parallelism with joblib: http://packages.python.org/joblib/index.html Inv...
Recommender Systems: How They Work and Where They Apply

Talk given at the Recife Summer School 2012 at Porto Digital, Recife, Pernambuco

Published in: Technology, Business

  1. Recommender Systems. Marcel Pinheiro Caraciolo, marcel@orygens.com, @marcelcaraciolo, http://www.orygens.com. Thursday, January 26, 2012
  2. Who is Marcel? Marcel Pinheiro Caraciolo (@marcelcaraciolo). From Sergipe, but a Recifense. Master's in Computer Science from CIN/UFPE in the area of data mining. Director of Research and Development at Orygens. Member and moderator of the Pernambuco Python User Group (PUG-PE). Areas of interest: mobile computing and intelligent computing. Blogs: http://www.mobideia.com (on mobility, since 2006) and http://aimotion.blogspot.com (on AI, since 2009). Still a young apprentice in the Pythonic arts (since 2007).
  3. WEB
  4. WEB
  5. Web 1.0: source of information. Web 2.0: continuous flow of information. VI Encontro do PUG-PE
  6. WEB SITES, WEB APPLICATIONS, WEB SERVICES, 3.0, SEMANTIC WEB, USERS
  7. Use collective information effectively in order to improve an application.
  8. Intelligence from Mining Data. A user influences others through reviews, ratings, recommendations, and blogs. A user is influenced by others through reviews, ratings, recommendations, and blogs.
  9. Collective Intelligence in your application: aggregation information (lists), ratings, user-generated content, reviews, blogs, recommendations, wikis, voting, bookmarking, search, tag cloud, tagging, saving, Natural Language Processing, clustering and predictive models, harness external content.
  10. WEB SITES, WEB APPLICATIONS, WEB SERVICES, 3.0, SEMANTIC WEB, USERS. Before... Friday, October 1, 2010
  11. Nowadays
  12. we are overloaded with information
  13. much of it useless
  14. sometimes we look for this...
  15. and find this!
  16. Google?
  17. Google? Social media?
  18. Meee... Google? Social media?
  19. Recommender Systems
  20. “A lot of times, people don’t know what they want until you show it to them.” Steve Jobs. “We are leaving the Information age, and entering into the Recommendation age.” Chris Anderson, from the book The Long Tail.
  21. Social Recommendations. Family/Friends. “What should I read?” “I think you should read these books.” Ref: Flickr-BlueAlgae. Ref: Flickr photostream: jefield.
  22. Recommendations by Interaction. Input: rate some books. “What should I read?” Output: “Books you might like are …”
  23. Systems designed to suggest something of interest to me!
  24. Why recommendation?
  25. Netflix: 2/3 of rented movies come from recommendations. Google News: 38% of the most-clicked news stories come from recommendations. Amazon: 38% of sales come from recommendations. Source: Celma & Lamere, ISMIR 2007.
  26. We are overloaded with information. Thousands of new articles and posts every day. Millions of songs, movies, and books. Thousands of offers and promotions.
  27. What can be recommended? Contacts in social networks, articles, products, advertising messages, e-learning courses, books, tags, music, future girlfriends, clothes, movies, restaurants, TV shows, videos, papers, investment options, professionals, code modules.
  28. And how does recommendation work?
  29. What do recommender systems actually do? 1. Predict how much you may like a given product or service. 2. Suggest a list of N items ordered by your interest. 3. Suggest an ordered list of N users for a product/service. 4. Explain to you why those items were recommended. 5. Adjust the prediction and the recommendation based on your feedback and that of others.
  30. Content-Based Filtering. Items: Duro de Matar (Die Hard), O Vento Levou (Gone with the Wind), Toy Story, Armageddon. Users: Marcel. Diagram: Marcel likes an item, and items similar to it are recommended to him.
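The content-based idea on slide 30 can be sketched as follows: describe each item by a set of features, then recommend the unseen items most similar to what the user already likes. This is only an illustrative sketch. The genre tags, the use of Jaccard similarity, and the helper names `jaccard` and `recommend_by_content` are assumptions made for the example, not something shown in the talk.

```python
# Hypothetical genre tags: the slide names the films but lists no features.
ITEM_FEATURES = {
    "Die Hard": {"action", "thriller"},
    "Gone with the Wind": {"drama", "romance"},
    "Toy Story": {"animation", "family", "comedy"},
    "Armageddon": {"action", "sci-fi", "thriller"},
}


def jaccard(a, b):
    """Jaccard similarity between two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_by_content(liked, item_features, top_n=2):
    """Rank items the user has not liked by their best similarity to a liked item."""
    if not liked:
        return []
    scores = {}
    for candidate, features in item_features.items():
        if candidate in liked:
            continue
        scores[candidate] = max(
            jaccard(features, item_features[item]) for item in liked
        )
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:top_n]


# Marcel likes "Die Hard"; the closest item by shared tags comes first.
recommendations = recommend_by_content({"Die Hard"}, ITEM_FEATURES)
```

Because the score depends only on item descriptions, no other users are needed, which is also why poorly described items (the first problem on the next slide) hurt this approach.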
  31. Problems with content-based filtering. 1. Restricted data analysis: items and users are poorly described; it is worse for audio or images. 2. Specialized data: a person with no experience of sushi is not recommended the best sushi restaurant in town. 3. Portfolio effect: just because I watched one Xuxa movie as a child, the system insists on recommending all of her movies.
  32. Collaborative Filtering. Items: O Vento Levou (Gone with the Wind), Toy Story, Thor, Armageddon. Users: Marcel, Rafael, Amanda. Diagram: a user likes some items, similar users are found, and items those users like are recommended.
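A minimal user-based collaborative filtering sketch for slide 32 is given below. It assumes Pearson correlation as the user similarity and a similarity-weighted average over positively correlated neighbours as the prediction; the talk does not say which similarity it has in mind, and the ratings are invented for illustration.

```python
from math import sqrt

# Hypothetical 1-5 ratings: the slide only names the users and the films.
RATINGS = {
    "Marcel": {"Gone with the Wind": 4.0, "Toy Story": 5.0},
    "Rafael": {"Gone with the Wind": 4.0, "Toy Story": 5.0,
               "Thor": 4.5, "Armageddon": 2.0},
    "Amanda": {"Gone with the Wind": 2.0, "Toy Story": 1.0,
               "Thor": 2.0, "Armageddon": 5.0},
}


def pearson(a, b):
    """Pearson correlation over co-rated items (0.0 if undefined)."""
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        return 0.0
    mean_a = sum(a[i] for i in shared) / len(shared)
    mean_b = sum(b[i] for i in shared) / len(shared)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in shared)
    var_a = sum((a[i] - mean_a) ** 2 for i in shared)
    var_b = sum((b[i] - mean_b) ** 2 for i in shared)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def predict(user, item, ratings):
    """Similarity-weighted average rating from positively correlated users."""
    weighted_sum = total_weight = 0.0
    for other, prefs in ratings.items():
        if other == user or item not in prefs:
            continue
        weight = pearson(ratings[user], prefs)
        if weight <= 0:
            continue  # ignore dissimilar users
        weighted_sum += weight * prefs[item]
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None


def recommend(user, ratings, top_n=3):
    """Rank the items `user` has not rated by predicted rating."""
    seen = set(ratings[user])
    candidates = {i for prefs in ratings.values() for i in prefs} - seen
    scored = [(item, predict(user, item, ratings)) for item in candidates]
    scored = [(item, score) for item, score in scored if score is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


top_for_marcel = recommend("Marcel", RATINGS)
```

With these made-up numbers Rafael agrees with Marcel and Amanda disagrees, so only Rafael's ratings drive the predictions. No item features are used at all, which is the key contrast with content-based filtering.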
  33. Problems with collaborative filtering. 1. Scalability: Amazon with 5M users, 50K items, 1.4B ratings. 2. Sparse data: new users and items that have no history. 3. Cold start: I have rated only a single book on Amazon! 4. Popularity: everybody reads 'Harry Potter'. 5. Hacking: the person who reads 'Harry Potter' reads the Kama Sutra.
  34. Hybrid Filtering. Combination of multiple methods. Items: Duro de Matar (Die Hard), O Vento Levou (Gone with the Wind), Toy Story, Armageddon. Ontologies. Symbolic data. Users: Marcel, Rafael, Luciana.
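One simple way to combine multiple methods is a weighted blend of the scores each recommender produces. The sketch below assumes both recommenders already output scores on a common 0-1 scale; the function name `hybrid_scores`, the weight, and the example scores are hypothetical. It is a plain linear blend, not the ontology or symbolic-data approach the slide mentions.

```python
def hybrid_scores(content_scores, collaborative_scores, content_weight=0.5):
    """Linear blend of two score dicts; an item missing from one dict scores 0.0 there."""
    if not 0.0 <= content_weight <= 1.0:
        raise ValueError("content_weight must be between 0 and 1")
    items = set(content_scores) | set(collaborative_scores)
    return {
        item: content_weight * content_scores.get(item, 0.0)
        + (1.0 - content_weight) * collaborative_scores.get(item, 0.0)
        for item in items
    }


# Hypothetical scores on a 0-1 scale from each recommender.
content = {"Armageddon": 0.8, "Toy Story": 0.1}
collaborative = {"Armageddon": 0.2, "Toy Story": 0.9, "Thor": 0.6}

blended = hybrid_scores(content, collaborative, content_weight=0.25)
ranked = sorted(blended, key=blended.get, reverse=True)
```

Raising `content_weight` is one way to lean on item descriptions when collaborative data is sparse, for example for new users or new items.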
  35. How are they presented? Highlights. More about this artist... Someone similar to you also liked this. The most popular in your group... Since you listened to this one, you may want this one... New releases. Listen to songs by similar artists. These two items go together.
  36. How are they evaluated? How do we know whether a recommendation is good? The data is usually split into training and test sets (80/20). Criteria used: prediction error (RMSE); ROC curve*, rank-utility, F-measure. *http://code.google.com/p/pyplotmining/
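The 80/20 split and the RMSE criterion from slide 36 can be sketched in a few lines. The rating triples and the global-mean baseline predictor below are assumptions for illustration; any recommender's predictions could be passed to `rmse` in their place.

```python
import random
from math import sqrt


def train_test_split(ratings, test_fraction=0.2, seed=42):
    """Shuffle a copy of the ratings and split it into (train, test)."""
    shuffled = ratings[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]


def rmse(predicted, actual):
    """Root mean squared error between two equally long rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return sqrt(squared / len(actual))


# Hypothetical (user, item, rating) triples.
ratings = [("u%d" % (i % 5), "i%d" % (i % 4), float(1 + i % 5)) for i in range(20)]
train, test = train_test_split(ratings)

# Baseline: predict the mean training rating for every held-out pair.
global_mean = sum(r for _, _, r in train) / len(train)
baseline_error = rmse([global_mean] * len(test), [r for _, _, r in test])
```

A recommender is only interesting if its RMSE on the held-out 20% is lower than a trivial baseline such as this one.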
  37. Mobile Recommenders
  38. Why mobile? More than 1 billion devices. More than 5 billion apps downloaded. A highlight in the mobile segment: http://foursquare.com http://vimeo.com/29323612
  39. Mobile Recommender Systems. Temporal and spatial information must be taken into account. How do we determine which context the user is in? And how can ratings be captured on a limited screen?
  40. Architecture. Recommendations processed on the mobile device (not feasible today): everything is processed on the back end (server) and sent to the phone via the Web. [Background figure: "Fig. 1. System Workflow" and the system architecture of a mobile restaurant recommendation and navigation system, from a paper in WSEAS TRANSACTIONS on COMPUTERS, Issue 5, Volume 6, May 2009; components shown include Multi-Mode Location Information Index, Context-based Collaborative Filtering Algorithm, and Location-based Personalized Recommendation and Navigation.]
  41. Available Information: location, tags, context.
  42. Available Information: implicit rating.
  43. One of the most popular mobile location systems. Check-ins: say where you are! Place recommendations.
  44. Conversational mobile virtual assistant. Already uses information from social networks. Restaurant recommendations.
  45. Google HotPot. Review repository. Place recommendations.
  46. My contributions
  47. My Master's work. "Offering Products and Services Using Product Reviews from Social Networks in Mobile Decision Aid Systems", Marcel Caraciolo (mpc@cin.ufpe.br) and Germano Vasconcelos (gcv@cin.ufpe.br), Informatics Center, Federal University of Pernambuco, http://www.cin.ufpe.br/. Abstract, as far as shown on the slide: "Recommendation engines provide information filtering functions and decision aids that have a great potential application the mobile context. An aspect that hasn’t been extensively exploited yet in the current recommendations is the improvement in the explanation of the recommendation. For instance, exploiting the service and product description and the opinion of users about the recommended products, where associated would bring a better explanation for the user. In this paper we will present the foundations for a mobile product/service recommender system which incorporate both structured (supplier driven) product descriptions and subject [...]" Second column, as far as shown: "[...] extremely used by users to give a more nuanced view about a product in order to make an informed decision [5]. Nonetheless, providing users with relevant recommendation information it is a difficult task. Besides the technical components such as the user model representation and information filtering techniques to generate the recommendations, the information must be user-friendly visualized. This is a requirement specially to support the user in the purchase decision process, and to convince him about the utility of the [...]"
  48. How can reviews from web service sources be aggregated in the mobile recommendation process? [Page excerpt from the paper, section III "System Design", showing Fig. 1 "Meta Recommender Architecture", Fig. 2 "User Reviews from Foursquare Social Network", and Fig. 3 "Mobile Recommender System Architecture".] Legible passage: "The content-based filtering approach will be used to filter the product/service repository, while the collaborative based approach will derive the product review recommendations. In addition we will use text mining techniques to distinct the polarity of the user review between positive or negative one. This information summarized would contribute in the product score recommendation computation. The final product recommendation score is computed by integrating the result of both recommenders." The review sources named on the page are the location-based social network Foursquare [17] and the location recommendation engine from Google, Google HotPot [18].
  49. Text Mining
      - A Lot!
      - Sentiment Analysis for Extracting the Polarity
      - Meta-Recommender Engines
      - Content-Based Filtering
      - kNN - Nearest Neighbors
      - Hybrid Meta Recommender
      - Symbolic Data Analysis (SDA)
      - Evaluation in Experimental Datasets
      - Architectural Proposal for Mobile Recommender
  50. Crab: A Python Framework for Building Recommendation Engines
      Marcel Caraciolo (@marcelcaraciolo), Ricardo Caspirro (@ricardocaspirro), Bruno Melo (@brunomelo)
  51. What is Crab?
      - A Python framework for building recommendation engines
      - A Scikit module for collaborative, content and hybrid filtering
      - Mahout alternative for Python developers :D
      - Open source under the BSD license
      https://github.com/muricoca/crab
  57. The current Crab
      >>> # load the dataset
      >>> from crab.datasets import load_sample_movies
      >>> data = load_sample_movies()
      >>> data
      {'DESCR': 'sample_movies data set was collected by the book called \nProgramming the Collective Intelligence by Toby Segaran \n\nNotes\n----- \nThis data set consists of\n\t* n ratings with (1-5) from n users to n movies.',
       'data': {1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
                2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
                3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
                4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
                5: {2: 4.5, 3: 1.0, 4: 4.0},
                6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
                7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0}},
       'item_ids': {1: 'Lady in the Water',
                    2: 'Snakes on a Planet',
                    3: 'You, Me and Dupree',
                    4: 'Superman Returns',
                    5: 'The Night Listener',
                    6: 'Just My Luck'},
       'user_ids': {1: 'Jack Matthews',
                    2: 'Mick LaSalle',
                    3: 'Claudia Puig',
                    4: 'Lisa Rose',
                    5: 'Toby',
                    6: 'Gene Seymour',
                    7: 'Michael Phillips'}}
  61. The current Crab
      >>> from crab.models import MatrixPreferenceDataModel
      >>> m = MatrixPreferenceDataModel(data.data)
      >>> print m
      MatrixPreferenceDataModel (7 by 6)
                  1          2          3          4          5   ...
      1    3.000000   4.000000   3.500000   5.000000   3.000000
      2    3.000000   4.000000   2.000000   3.000000   3.000000
      3         ---   3.500000   2.500000   4.000000   4.500000
      4    2.500000   3.500000   2.500000   3.500000   3.000000
      5         ---   4.500000   1.000000   4.000000        ---
      6    3.000000   3.500000   3.500000   5.000000   3.000000
      7    2.500000   3.000000        ---   3.500000   4.000000
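The printed model is a dense user-by-item table in which unrated cells show as `---`. A minimal pure-Python sketch of that conversion, assuming `None` as the missing-value marker; the helper name `to_matrix` is hypothetical and this is not Crab's implementation.

```python
# Ratings copied from the sample_movies output shown on the dataset slide.
ratings = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}


def to_matrix(preferences):
    """Turn {user: {item: rating}} into row ids, column ids and a dense table."""
    user_ids = sorted(preferences)
    item_ids = sorted({item for prefs in preferences.values() for item in prefs})
    # dict.get returns None for items the user has not rated.
    matrix = [[preferences[user].get(item) for item in item_ids]
              for user in user_ids]
    return user_ids, item_ids, matrix


user_ids, item_ids, matrix = to_matrix(ratings)
```

Rows follow the sorted user ids and columns the sorted item ids, so `matrix[2][0]` is the (missing) rating of user 3 for item 1.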
  69. The current Crab
      >>> # import pairwise distance
      >>> from crab.metrics.pairwise import euclidean_distances
      >>> # import similarity
      >>> from crab.similarities import UserSimilarity
      >>> similarity = UserSimilarity(m, euclidean_distances)
      >>> similarity[1]
      [(1, 1.0),
       (6, 0.66666666666666663),
       (4, 0.34054242658316669),
       (3, 0.32037724101704074),
       (7, 0.32037724101704074),
       (2, 0.2857142857142857),
       (5, 0.2674788903885893)]
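The similarity values above are consistent with computing `1 / (1 + d)`, where `d` is the Euclidean distance over the items both users rated. A pure-Python sketch of that computation follows; the function names are hypothetical and this is not Crab's own code, only a reading of the numbers on the slide.

```python
import math

# Ratings copied from the sample_movies output shown on the dataset slide.
ratings = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}


def euclidean_similarity(prefs_a, prefs_b):
    """Return 1 / (1 + Euclidean distance) over the co-rated items."""
    common = set(prefs_a) & set(prefs_b)
    if not common:
        return float("nan")  # nothing to compare
    distance = math.sqrt(sum((prefs_a[i] - prefs_b[i]) ** 2 for i in common))
    return 1.0 / (1.0 + distance)


def most_similar(all_ratings, user_id):
    """Rank every user (including user_id itself) by similarity, best first."""
    scores = [(other, euclidean_similarity(all_ratings[user_id], prefs))
              for other, prefs in all_ratings.items()]
    # Every pair of users in this sample shares at least one item, so no NaN
    # reaches the sort; a real implementation would filter NaN values first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = most_similar(ratings, 1)
```

For user 1 this ranks the users as 1, 6, 4, 3, 7, 2, 5, the same order as the slide; users 1 and 6 differ on a single item by 0.5, which gives 1 / 1.5.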
  75. The current Crab
      >>> from crab.recommenders.knn import UserBasedRecommender
      >>> recsys = UserBasedRecommender(model=m, similarity=similarity,
      ...                               capper=True, with_preference=True)
      >>> recsys.recommend(5)
      array([[ 5.        ,  3.45712869],
             [ 1.        ,  2.78857832],
             [ 6.        ,  2.38193068]])
      >>> recsys.recommended_because(user_id=5, item_id=1)
      array([[ 2. ,  3. ],
             [ 1. ,  3. ],
             [ 6. ,  3. ],
             [ 7. ,  2.5],
             [ 4. ,  2.5]])
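A user-based recommender of this kind typically scores an unseen item with a similarity-weighted average of the ratings given by other users. The sketch below illustrates that idea on toy data; the function names and the data are hypothetical, and it is not claimed to reproduce Crab's exact scores shown above.

```python
def predict_rating(ratings, similarities, user_id, item_id):
    """Similarity-weighted average of other users' ratings for item_id."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other_id, similarity in similarities.items():
        if other_id == user_id or item_id not in ratings[other_id]:
            continue
        weighted_sum += similarity * ratings[other_id][item_id]
        weight_total += similarity
    if weight_total == 0.0:
        return None  # no neighbor rated this item
    return weighted_sum / weight_total


def recommend(ratings, similarities, user_id, how_many=3):
    """Return up to how_many unseen items with the highest predicted rating."""
    all_items = {item for prefs in ratings.values() for item in prefs}
    unseen = all_items - set(ratings[user_id])
    scored = []
    for item in unseen:
        score = predict_rating(ratings, similarities, user_id, item)
        if score is not None:
            scored.append((item, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:how_many]


# Toy data (hypothetical): user "a" has not rated item "y" yet.
toy_ratings = {
    "a": {"x": 4.0},
    "b": {"x": 4.0, "y": 5.0},
    "c": {"x": 1.0, "y": 1.0},
}
similarities_to_a = {"b": 1.0, "c": 0.25}  # assumed precomputed
recommendations = recommend(toy_ratings, similarities_to_a, "a")
```

Here the prediction for item "y" is (1.0 * 5.0 + 0.25 * 1.0) / 1.25 = 4.2: the closer neighbor "b" dominates the estimate.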
  77. The current Crab
      - Collaborative Filtering algorithms: User-Based, Item-Based and Slope One
      - Evaluation of the Recommender Algorithms: Precision, Recall, F1-Score, RMSE
      - Precision-Recall Charts
  84. Evaluating your recommender
      >>> from crab.metrics.classes import CfEvaluator
      >>> evaluator = CfEvaluator()
      >>> evaluator.evaluate(recommender=recsys, metric='rmse')
      {'rmse': 0.69467177857026907}
      >>> evaluator.evaluate_on_split(recommender=recsys, at=2)
      ({'error': [{'mae': 0.345, 'nmae': 0.4567, 'rmse': 0.568},
                  {'mae': 0.456, 'nmae': 0.356778, 'rmse': 0.6788},
                  {'mae': 0.456, 'nmae': 0.356778, 'rmse': 0.6788}],
        'ir': [{'f1score': 0.456, 'precision': 0.78557, 'recall': 0.55677},
               {'f1score': 0.64567, 'precision': 0.67865, 'recall': 0.785955},
               {'f1score': 0.45070, 'precision': 0.74744, 'recall': 0.858585}]},
       {'final_score': {'avg': {'f1score': 0.495955, 'mae': 0.429292,
                                'nmae': 0.373739, 'precision': 0.63932929,
                                'recall': 0.729939393, 'rmse': 0.3466868},
                        'stdev': {'f1score': 0.09938383, 'mae': 0.0593933,
                                  'nmae': 0.03393939, 'precision': 0.0192929,
                                  'recall': 0.031293939, 'rmse': 0.234949494}}})
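The metrics reported by the evaluator have standard definitions, sketched below in plain Python. These helper functions are illustrative and hypothetical; they are not Crab's `CfEvaluator` and do not reproduce how it splits the data.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equally long rating lists."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mae(predicted, actual):
    """Mean absolute error between two equally long rating lists."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def precision_recall_f1(recommended, relevant):
    """Information-retrieval metrics for one recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


error = rmse([3.0, 4.0], [3.0, 2.0])
precision, recall, f1 = precision_recall_f1([1, 2, 3, 4], [2, 4, 6])
```

In the second call two of the four recommended items are relevant, so precision is 0.5, recall is 2/3 and F1 is their harmonic mean, 4/7.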
  85. Distributing the recommendation computations
      - Use Hadoop and MapReduce intensively
        Investigating the Yelp mrjob framework: https://github.com/pfig/mrjob
      - Implement the state-of-the-art techniques used at Netflix and elsewhere:
        Matrix Factorization, Singular Value Decomposition (SVD), Boltzmann machines
      - The most commonly used is the Slope One technique:
        simple algebra with predictors of the form y = a*x + b, where the slope a is fixed at 1
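Slope One predicts a rating as `y = x + b`, where `b` is the average difference between the ratings of two items. A minimal sketch of the weighted variant on toy data follows; the function names and the data are hypothetical and this is not Crab's implementation.

```python
from collections import defaultdict


def slope_one_deviations(ratings):
    """Accumulate rating differences and co-rating counts per item pair."""
    diff_sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for prefs in ratings.values():
        for item_a, rating_a in prefs.items():
            for item_b, rating_b in prefs.items():
                if item_a != item_b:
                    diff_sums[item_a][item_b] += rating_a - rating_b
                    counts[item_a][item_b] += 1
    return diff_sums, counts


def slope_one_predict(ratings, diff_sums, counts, user_id, item_id):
    """Weighted Slope One: average of (rating + deviation), weighted by support."""
    numerator = 0.0
    denominator = 0
    for other_item, rating in ratings[user_id].items():
        support = counts[item_id].get(other_item, 0)
        if support:
            deviation = diff_sums[item_id][other_item] / support  # the "b" term
            numerator += (rating + deviation) * support
            denominator += support
    return numerator / denominator if denominator else None


# Toy data (hypothetical): predict lucy's rating for item "a".
toy = {
    "john": {"a": 5.0, "b": 3.0, "c": 2.0},
    "mark": {"a": 3.0, "b": 4.0},
    "lucy": {"b": 2.0, "c": 5.0},
}
diff_sums, counts = slope_one_deviations(toy)
prediction = slope_one_predict(toy, diff_sums, counts, "lucy", "a")
```

Item "a" is rated 0.5 above "b" on average (two users) and 3 above "c" (one user), so the prediction is ((2 + 0.5) * 2 + (5 + 3) * 1) / 3 = 13/3.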
  92. Cache/Parallelism with joblib
      http://packages.python.org/joblib/index.html

      import numpy as np
      from joblib import Memory

      memory = Memory(cachedir='', verbose=0)

      class UserSimilarity(BaseSimilarity):
          ...
          @memory.cache
          def get_similarity(self, source_id, target_id):
              source_preferences = self.model.preferences_from_user(source_id)
              target_preferences = self.model.preferences_from_user(target_id)
              ...
              # Return NaN when either user has no preferences to compare.
              return (self.distance(source_preferences, target_preferences)
                      if not source_preferences.shape[1] == 0
                      and not target_preferences.shape[1] == 0
                      else np.array([[np.nan]]))

          def get_similarities(self, source_id):
              return [(other_id, self.get_similarity(source_id, other_id))
                      for other_id, v in self.model]

      >>> # Without memory.cache
      >>> timeit similarity.get_similarities('marcel_caraciolo')
      100 loops, best of 3: 978 ms per loop

      >>> # With memory.cache
      >>> timeit similarity.get_similarities('marcel_caraciolo')
      100 loops, best of 3: 434 ms per loop
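The speed-up comes from memoization: a similarity that was already computed for a pair of ids is looked up instead of recomputed. The sketch below shows the same idea with the standard library's `functools.lru_cache`, used here as a stand-in for joblib. Unlike `joblib.Memory` it keeps results in memory only and does not persist them to disk. The similarity formula is a placeholder.

```python
import functools

call_count = 0  # counts how often the expensive body actually runs


@functools.lru_cache(maxsize=None)
def get_similarity(source_id, target_id):
    """Placeholder for an expensive similarity computation."""
    global call_count
    call_count += 1
    return 1.0 / (1.0 + abs(source_id - target_id))


first = get_similarity(1, 3)   # computed
second = get_similarity(1, 3)  # served from the cache
```

After both calls `call_count` is still 1, because the second call never enters the function body. Arguments must be hashable for this kind of cache to work.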
  96. Distributed Computing with mrjob
      https://github.com/Yelp/mrjob
      It supports Amazon's Elastic MapReduce (EMR) service, your own Hadoop cluster, or local runs (for testing).

      """The classic MapReduce job: count the frequency of words."""
      from mrjob.job import MRJob
      import re

      WORD_RE = re.compile(r"[\w']+")

      class MRWordFreqCount(MRJob):

          def mapper(self, _, line):
              # Emit (word, 1) for every word on the input line.
              for word in WORD_RE.findall(line):
                  yield (word.lower(), 1)

          def reducer(self, word, counts):
              # Sum the partial counts collected for each word.
              yield (word, sum(counts))

      if __name__ == '__main__':
          MRWordFreqCount.run()
  97. Distributed Computing with mrjob
      https://github.com/Yelp/mrjob
      Elsayed et al.: Pairwise Document Similarity in Large Collections with MapReduce
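The pairwise-similarity idea can be illustrated without a cluster: build an inverted index, emit a partial product for every pair of documents that share a term, then sum the partial products per pair. The sketch below simulates the map and reduce phases in plain Python. It is a simplified reading of the approach, not the paper's code and not an mrjob job; all names are hypothetical. For a recommender, "documents" could be users and "terms" the items they rated.

```python
from collections import defaultdict
from itertools import combinations


def build_postings(vectors):
    """Phase 1: invert {doc: {term: weight}} into {term: [(doc, weight)]}."""
    postings = defaultdict(list)
    for doc_id, weights in vectors.items():
        for term, weight in weights.items():
            postings[term].append((doc_id, weight))
    return postings


def map_pairs(postings):
    """Phase 2 map: for each term, emit one partial product per document pair."""
    for entries in postings.values():
        for (doc_a, weight_a), (doc_b, weight_b) in combinations(sorted(entries), 2):
            yield (doc_a, doc_b), weight_a * weight_b


def reduce_pairs(pairs):
    """Phase 2 reduce: sum the partial products of each document pair."""
    totals = defaultdict(float)
    for key, partial in pairs:
        totals[key] += partial
    return dict(totals)


vectors = {
    "d1": {"a": 1.0, "b": 2.0},
    "d2": {"a": 3.0, "b": 1.0},
    "d3": {"b": 4.0},
}
similarities = reduce_pairs(map_pairs(build_postings(vectors)))
```

The result is the dot product of every pair of documents that share at least one term; pairs with nothing in common are never emitted, which is what keeps the computation tractable on sparse data.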
  101. Future studies with Sparse Matrices
      Real datasets come with lots of empty values (e.g. the Apontador Reviews Dataset)
      http://aimotion.blogspot.com/2011/05/evaluating-recommender-systems.html
      Solutions:
      - scipy.sparse package
      - Sharding operations
      - Matrix Factorization techniques (SVD)
      Crab implements a Matrix Factorization with Expectation Maximization algorithm: scikits.crab.svd package
  105. Benchmarks
      Dataset: MovieLens 100k (http://www.grouplens.org/node/73)

      Implementation                            Time elapsed (recommend 5 items)
      Old Crab (pure Python w/ dicts)           15.32 s
      New Crab (Python w/ SciPy and NumPy)       9.56 s
  106. Why migrate?
      - Old Crab ran on pure Python only
        Recommendations demand heavy maths calculations and lots of processing
      - Compatible with the NumPy and SciPy libraries
        High-standard, popular libraries optimized for scientific calculations in Python
      - Scikits projects are amazing!
        Active communities, scientific conferences and updated projects (e.g. scikit-learn)
      - Make the Crab framework visible to the community
        Join the scientific researchers and machine learning developers around the globe coding with Python to help us in this project
      - Be Fast and Furious
  107. How are we working?
      Sprints, Online Discussions and Issues
      https://github.com/muricoca/crab/wiki/UpcomingEvents
  108. How are we working?
      Our project's home page: http://muricoca.github.com/crab
  109. Future Releases
      - Planned Release 0.1: Collaborative Filtering algorithms working, sample datasets to load and test
      - Planned Release 0.11: Evaluation of Recommendation Algorithms and Database Models support
      - Planned Release 0.12: Recommendation as Services with REST APIs
      ....
  110. Join us!
      1. Read our Wiki Page: https://github.com/muricoca/crab/wiki/Developer-Resources
      2. Check out our current sprints and open issues: https://github.com/muricoca/crab/issues
      3. Forks, Pull Requests mandatory
      4. Join us at irc.freenode.net #muricoca or on our discussion list (still in the works :( )
  112. Recommendation in social networks
      - Integrated this engine with the popular Brazilian social network AtéPassar
      - More than 70,000 registered students studying for public examinations
      - Recommends study groups, friends, video classes, questions and concursos (public examinations)
      - More than 70,000 items available for recommendation
      - Written in Python using the open-source framework Crab
      - Framework available for building recommender systems (my contribution)
      - Running since January 2011
      - In March 2011, a questionnaire was performed (chart: Liked vs. Not Liked; values shown: 77% and 23%)
      [Figure 3: AtePassar Recommender System ...]
  113. Collect discounts
      www.favoritoz.com
  115. Social Recommendations
      1. The user logs in via Facebook
      2. The user visits an e-commerce site that partners with LikeStore
      3. The user receives personalized recommendations right on arrival
      4. The user receives recommendations in the shopping cart
      5. The user receives recommendations on the product page
      - Similar products
      - People who bought this also bought
      - Friends who liked/bought this
  116. Building the Social Genome
  117. Does anyone still doubt it?
      http://www.shopycat.com/
  118. Tips
  119. Join us!
      1. Read our Wiki Page: https://github.com/muricoca/crab/wiki/Developer-Resources
      2. Check out our current sprints and open issues: https://github.com/muricoca/crab/issues
      3. Forks, Pull Requests mandatory
      4. Join us at irc.freenode.net #muricoca or on our discussion list: scikit-crab@googlegroups.com
  120. Tips for Recommendation Architecture
  124. Recommended items
      - Toby Segaran, Programming Collective Intelligence, O'Reilly, 2007
      - Satnam Alag, Collective Intelligence in Action, Manning Publications, 2009
      - Sites such as TechCrunch and ReadWriteWeb
  125. Recommended conferences
      - ACM RecSys
      - ICWSM: Weblog and Social Media
      - WebKDD: Web Knowledge Discovery and Data Mining
      - WWW: The original WWW conference
      - SIGIR: Information Retrieval
      - ACM KDD: Knowledge Discovery and Data Mining
      - ICML: Machine Learning
  126. Where will you be in all of this?
      Source: Hunch.com
      Hunch was sold to eBay for $80M
      Thank you!!
  127. Recommender Systems
      Marcel Pinheiro Caraciolo
      marcel@orygens.com
      @marcelcaraciolo
      http://www.orygens.com
  129. Optimizations with Cython
      http://cython.org/
      Cython is a Python extension that lets developers annotate functions so they can be compiled to C.
      http://aimotion.blogspot.com/2011/09/high-performance-computation-with_17.html

      # setup.py
      from distutils.core import setup
      from distutils.extension import Extension
      from Cython.Distutils import build_ext

      # for notes on compiler flags see:
      # http://docs.python.org/install/index.html
      setup(
          cmdclass={'build_ext': build_ext},
          ext_modules=[Extension("spearman_correlation_cython",
                                 ["spearman_correlation_cython.pyx"])]
      )
  131. Cache/Parallelism with joblib
      http://packages.python.org/joblib/index.html
      Investigate how to use the multiprocessing and parallel packages for the similarities computation

      from joblib import Parallel, delayed
      ...
      def get_similarities(self, source_id):
          other_ids = [other_id for other_id, v in self.model]
          # Parallel takes an iterable of delayed calls and returns results in order.
          similarities = Parallel(n_jobs=3)(
              delayed(self.get_similarity)(source_id, other_id)
              for other_id in other_ids)
          return list(zip(other_ids, similarities))