Recommender Systems

        Marcel Pinheiro Caraciolo
           marcel@orygens.com
              @marcelcaraciolo




                                 http://www.orygens.com
Thursday, January 26, 2012
Who is Marcel?
 Marcel Pinheiro Caraciolo - @marcelcaraciolo

              From Sergipe, but a Recife local.
              M.Sc. in Computer Science from CIN/UFPE, in the area of data mining
              Director of Research and Development at Orygens
              Member and moderator of the Pernambuco Python Users Group (PUG-PE)

              My areas of interest: mobile computing and intelligent computing

              My blogs: http://www.mobideia.com (on mobility, since 2006)
                        http://aimotion.blogspot.com (on AI, since 2009)

              Still a young apprentice in the Pythonic arts... (since 2007)

WEB




1.0                     2.0

Information Source      Continuous Flow of Information
                                                      6th PUG-PE Meeting

WEB SITES
      WEB APPLICATIONS
       WEB SERVICES
                             3.0          SEMANTIC WEB




                             USERS

Use collective information
effectively in order to
improve an application

Intelligence from
                                                 Mining Data




(Diagram: one user connected to several other users)

A user influences others
through reviews, ratings, recommendations and blogs

A user is influenced by others
through reviews, ratings, recommendations and blogs

aggregation information: lists
                                                               ratings
              user-generated content
                                                  reviews
               blogs                                          recommendations

                         wikis      Collective Intelligence      voting
                                      Your application             bookmarking
                                 Search
                                              tag cloud        tagging
                                                                          saving
             Natural Language Processing

                Clustering and                       Harness external content
               predictive models

WEB SITES
       WEB APPLICATIONS
        WEB SERVICES
                              3.0            SEMANTIC WEB




                              USERS
                                      before...

Nowadays

we are overloaded
      with information

much of it useless

sometimes
    we look for
       this...

and we find this!

google?

     social media?

meeee...

Recommender Systems

“A lot of times, people don’t know what
                     they want until you show it to them.”
                                                         Steve Jobs

                “We are leaving the Information age, and
                entering into the Recommendation age.”
                                      Chris Anderson, from the book The Long Tail

Social Recommendations

                                                                  Family/Friends
                          “What should
                           I read?”

                                                                 “I think you
                                                                  should read
                                                                  these books.”

                             Ref: Flickr-BlueAlgae
                             Ref: Flickr photostream: jefield

Recommendations through Interaction

                                        Input: Rate a few books

                         “What should
                          I read?”

                                                                        Output:
                                                                        “Books you
                                                                          may like
                                                                          are …”

Systems designed to suggest something that
                             interests me!

Why Recommendation?

Netflix
               - 2/3 of rented movies come from recommendations

Google News
               - 38% of the most-clicked news stories come from recommendations

Amazon
               - 38% of sales come from recommendations

                                                  Source: Celma & Lamere, ISMIR 2007

We are overloaded with information
   * Thousands of new articles and posts every day
   * Millions of songs, movies and books
   * Thousands of offers and promotions

What can be recommended?

   Social network contacts, Articles, Products, Advertising messages,
   E-learning courses, Books, Tags, Music, Future girlfriends,
   Clothes, Movies, Restaurants, TV shows, Videos, Papers,
   Investment options, Professionals, Code modules

And how does
                              recommendation work?

What do recommender systems
                        actually do?
   1. Predict how much you may like a given product or service
   2. Suggest a list of N items ordered by your interest
   3. Suggest a list of N users ordered for a given product or service
   4. Explain to you why those items were recommended
   5. Adjust the prediction and the recommendation based on your
      feedback and that of others

Content-Based Filtering

   Items: Die Hard, Gone with the Wind, Armageddon, Toy Story
   Users: Marcel

   (Diagram: Marcel likes an item; the system recommends an item similar to it.)

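The idea on the content-based filtering slide can be sketched in a few lines of Python. This is only an illustration, not the author's implementation: the three genre features and their values are assumptions, and the helper names `cosine` and `recommend_similar` are hypothetical.

```python
from math import sqrt

# Hypothetical item features (action, romance, animation); the values are
# illustrative assumptions, not data from the slides.
ITEM_FEATURES = {
    "Die Hard": [1.0, 0.0, 0.0],
    "Gone with the Wind": [0.0, 1.0, 0.0],
    "Armageddon": [0.9, 0.3, 0.0],
    "Toy Story": [0.1, 0.0, 1.0],
}


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend_similar(liked, features, top_n=1):
    """Rank the items the user has not liked by similarity to the liked ones."""
    scores = {}
    for candidate, vec in features.items():
        if candidate in liked:
            continue
        # Score a candidate by its best match among the liked items.
        scores[candidate] = max(cosine(vec, features[item]) for item in liked)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n]


print(recommend_similar({"Die Hard"}, ITEM_FEATURES))
```

With these made-up features, a user who liked only Die Hard is pointed to Armageddon, because no other user's opinion is involved: only the item descriptions matter.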
Problems with content-based
                                      filtering
   1. Restricted data analysis
      - Items and users are poorly described. Worse for audio or images
   2. Specialized data
      - Someone with no experience of sushi is never shown the
        best sushi restaurant in town
   3. Portfolio effect
      - Just because I watched one Xuxa movie as a child, the system
        insists on recommending all of her movies

Collaborative Filtering

   Items: Thor, Gone with the Wind, Armageddon, Toy Story
   Users: Marcel, Rafael, Amanda

   (Diagram: a user likes some items; the system finds similar users and recommends items they liked.)

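A user-based variant of collaborative filtering can be sketched as follows. This is a minimal illustration under stated assumptions: the ratings are invented, cosine similarity over co-rated items is just one common choice, and the function names `user_similarity`, `predict` and `recommend` are hypothetical.

```python
from math import sqrt

# Hypothetical 1-5 ratings; the numbers are illustrative assumptions.
RATINGS = {
    "Marcel": {"Thor": 5, "Gone with the Wind": 2},
    "Rafael": {"Thor": 5, "Gone with the Wind": 1, "Armageddon": 5},
    "Amanda": {"Thor": 1, "Gone with the Wind": 5, "Toy Story": 4},
}


def user_similarity(a, b, ratings):
    """Cosine similarity computed over the items both users rated."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    dot = sum(ratings[a][i] * ratings[b][i] for i in common)
    norm_a = sqrt(sum(ratings[a][i] ** 2 for i in common))
    norm_b = sqrt(sum(ratings[b][i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = user_similarity(user, other, ratings)
        num += sim * their[item]
        den += sim
    return num / den if den else None


def recommend(user, ratings, top_n=1):
    """Rank the items the user has not rated by predicted rating."""
    seen = set(ratings[user])
    candidates = {i for r in ratings.values() for i in r} - seen
    scored = {i: predict(user, i, ratings) for i in candidates}
    scored = {i: s for i, s in scored.items() if s is not None}
    return sorted(scored, key=scored.get, reverse=True)[:top_n]


print(recommend("Marcel", RATINGS))
```

Here Marcel's tastes line up with Rafael's more than with Amanda's, and the item Rafael rated highly comes out on top. Note that no item description is used at all, only the rating behaviour of other users.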
Problems with collaborative filtering
   1. Scalability
      - Amazon with 5M users, 50K items, 1.4B ratings
   2. Sparse data
      - New users and items that have no history
   3. Cold start
      - I have rated only a single book on Amazon!
   4. Popularity
      - Everybody reads 'Harry Potter'
   5. Hacking
      - The person who reads 'Harry Potter' reads Kama Sutra

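A quick back-of-the-envelope calculation shows why scalability and sparsity go hand in hand. It uses only the Amazon figures quoted on the slide (5M users, 50K items, 1.4B ratings); the variable names are arbitrary.

```python
# Figures quoted on the slide for the Amazon example.
users = 5_000_000
items = 50_000
ratings = 1_400_000_000

cells = users * items        # size of the full user x item matrix
density = ratings / cells    # fraction of cells that actually hold a rating

print(f"matrix cells: {cells:,}")
print(f"density:      {density:.2%}")
print(f"avg ratings per user: {ratings / users:.0f}")
```

Even with 1.4 billion ratings, well under 1% of the user-item matrix is filled, so a dense representation wastes almost all of its space and most user pairs share few co-rated items.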
Hybrid Filtering
                             Combination of multiple methods

   Items: Die Hard, Gone with the Wind, Armageddon, Toy Story
   Users: Marcel, Rafael, Luciana
   Additional sources: ontologies, symbolic data

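The slide only says that hybrid filtering combines multiple methods. One simple way to do that, shown here purely as an assumption, is a weighted blend of the scores produced by a content-based and a collaborative recommender. The function name `hybrid_scores`, the weight `alpha` and all score values are hypothetical.

```python
def hybrid_scores(content, collaborative, alpha=0.5):
    """Blend two score dictionaries; alpha weights the content-based side."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    items = set(content) | set(collaborative)
    return {
        item: alpha * content.get(item, 0.0)
        + (1.0 - alpha) * collaborative.get(item, 0.0)
        for item in items
    }


# Hypothetical normalized scores in [0, 1] from two separate recommenders.
content_scores = {"Armageddon": 0.9, "Toy Story": 0.1}
collab_scores = {"Armageddon": 0.6, "Gone with the Wind": 0.8}

blended = hybrid_scores(content_scores, collab_scores, alpha=0.5)
best = max(blended, key=blended.get)
print(best, round(blended[best], 2))
```

An item missing from one recommender simply contributes zero on that side, which lets the other method cover for cold-start or poorly described items. Other combination strategies (switching, cascading) are equally possible.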
How are they
                                presented?
   Highlights
   More about this artist...
   Someone similar to you also liked this
   The most popular in your group...
   Since you listened to this one, you might want this one...
   New releases
   Listen to music from similar artists.
   These two items go together..

How are they evaluated?
        How do we know whether a recommendation is good?
        The data is usually split into training and test sets (80/20)

        Criteria used:
           - Prediction error: RMSE
           - ROC curve*, rank-utility, F-measure
                                       *http://code.google.com/p/pyplotmining/

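The evaluation recipe on the slide (an 80/20 split, RMSE for prediction error, F-measure for list quality) can be sketched with the standard library alone. The helper names and the sample numbers below are assumptions for illustration; ROC curves and rank-utility are left out.

```python
import random
from math import sqrt


def train_test_split(samples, test_fraction=0.2, seed=42):
    """Shuffle a copy of the samples and split it, 80/20 by default."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1.0 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]


def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return sqrt(squared / len(actual))


def f_measure(recommended, relevant):
    """F1: harmonic mean of precision and recall of a recommendation list."""
    hits = len(set(recommended) & set(relevant))
    if hits == 0:
        return 0.0
    precision = hits / len(set(recommended))
    recall = hits / len(set(relevant))
    return 2 * precision * recall / (precision + recall)


train, test = train_test_split(range(10))
print(len(train), len(test))
print(rmse([4.0, 3.0, 5.0], [5.0, 3.0, 4.0]))
print(f_measure(["a", "b", "c", "d"], ["a", "b", "e"]))
```

RMSE judges how close predicted ratings are to the held-out ones, while F-measure judges whether the recommended list contains the items the user actually cared about; the two can disagree, which is why the slide lists several criteria.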
Mobile Recommenders




Why mobile?

                 More than 1 billion devices
                 More than 5 billion apps downloaded

                 A standout in the mobile segment
                 http://foursquare.com

                                                  http://vimeo.com/29323612

Mobile Recommender Systems

   Temporal and spatial information must be taken into account

   How do we determine which context the user is in?

   And how can ratings be captured on a small screen?

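One way to bring spatial context into a recommender, sketched here as an assumption rather than as the approach used in the talk, is to filter candidates by distance from the user before ranking them. The haversine formula is standard; the place names, coordinates, ratings, and the helper `nearby_ranked` are made up, and temporal context is not covered.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearby_ranked(user_pos, places, max_km=2.0):
    """Keep places within max_km of the user and rank them by rating."""
    close = [
        p for p in places
        if haversine_km(user_pos[0], user_pos[1], p["lat"], p["lon"]) <= max_km
    ]
    return sorted(close, key=lambda p: p["rating"], reverse=True)


# Hypothetical places with made-up coordinates and predicted ratings.
places = [
    {"name": "Place A", "lat": -8.060, "lon": -34.880, "rating": 4.1},
    {"name": "Place B", "lat": -8.065, "lon": -34.875, "rating": 4.7},
    {"name": "Place C", "lat": -8.300, "lon": -35.000, "rating": 4.9},
]
user = (-8.063, -34.871)

print([p["name"] for p in nearby_ranked(user, places)])
```

The best-rated place overall is dropped because it is too far away, which is the kind of trade-off a mobile recommender has to make that a desktop one does not.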
Architecture

   Recommendations processed on the mobile device (not feasible today)

   - Everything is processed on the back end (server)
     and sent to the phone over the Web

   (Background figure: system workflow and architecture of a mobile restaurant recommendation and navigation system, from WSEAS Transactions on Computers, Issue 5, Volume 6, May 2009.)

Available Information

                              Location, Tags, Context

Available Information

                              Implicit Rating

One of the most popular
                             mobile location systems

                             Check-ins: say where you are!

                             Place recommendations

Conversational Mobile Virtual Assistant
              Already makes use of information from social networks
              Restaurant recommendations

Google HotPot



                     Review repository
                     Place recommendations

My Contributions

My Master's Work

Offering Products and Services Using Product Reviews from Social Networks in Mobile Decision Aid Systems
Marcel Caraciolo∗ and Germano Vasconcelos†
Informatics Center, Federal University of Pernambuco
WebSite: http://www.cin.ufpe.br/
Email: ∗ mpc@cin.ufpe.br, † gcv@cin.ufpe.br

Abstract: Recommendation engines provide information filtering functions and decision aids that have a great potential application the mobile context. An aspect that hasn’t been extensively exploited yet in the current recommendations is the improvement in the explanation of the recommendation. For instance, exploiting the service and product description and the opinion of users about the recommended products, where associated would bring a better explanation for the user. In this paper we will present the foundations for a mobile product/service recommender system which incorporate both structured (supplier driven) product descriptions and subject ...

... extremely used by users to give a more nuanced view about a product in order to make an informed decision [5].
Nonetheless, providing users with relevant recommendation information it is a difficult task. Besides the technical components such as the user model representation and information filtering techniques to generate the recommendations, the information must be user-friendly visualized. This is a requirement specially to support the user in the purchase decision process, and to convince him about the utility of the ...

How can reviews from web service sources be aggregated in the
mobile recommendation process?

(Background: excerpts from the paper.)

... source, the recommendation architecture that we propose will aggregate the results of such filtering techniques.
We aim at integrating the previously mentioned hybrid product recommendation approach in a mobile application so the users could benefit from useful and logical recommendations. Moreover, we aim at providing a suited explanation for each recommendation to the user, since the current approaches just only deliver product recommendations with a overall score without pointing out the appropriateness of such recommendation [13]. Besides the basic information provided by the suppliers, the system will deliver the explanation, providing relevant reviews of similar users, we believe that it will increase the confidence in the buying decision process and the product acceptance rate. In the mobile context this approach could help the users in this process and showing the user opinions could contribute to achieve this task.

... would rely more on collaborative-filtering techniques, that is, the reviews from similar users.
Figure 1 shows a overview of our meta recommender approach. By combining the content-based filtering and the collaborative-based one into a hybrid recommender system, it would use the services/products repositories which catalogues the services to be recommended, and the review repository that contains the user opinions about those services. All this data can be extracted from data source containers in the web such as the location-based social network Foursquare [17] as displayed at the Figure 2 and the location recommendation engine from Google: Google HotPot [18].

Bezerra and Carvalho proposed approaches where the results achieved showed to be very promising [19].

III. SYSTEM DESIGN
Application data information our mobile recommender system can be divided into two parts: the product description (such as location, description and its attributes) and the user reviews or ratings provided by user (such as rating, comments, tags, etc.). The Figure 3 gives the system’s architecture and relative components.
                               !"*+#,$+'-)                              !"*+#,$+'-)                                                                +,-*.&$
                                                                                                                                  !(#$()&'*&%$
                                                                                                                                                  /01&'234&$          !6#$6,00&41&7$
                                                                                                                                                                                                                      wh
                                                                                                                                                                                                                      res
                                                                                                                                                          !<#$<'&2&'&04&%A$B,431*,0A$&14C$
                                                                                                                                                                                                                      ves
                                                                     0+44%6+'%$,.")1%#"2)
                             0+($"($)1%#"2)
                                                                           3,4$"',(5)
                                                                                                                                                                                                                      ou
                               3,4$"',(5)
                                                                    )))67,8,#%)+,4%$91$'%4)-1":))))
                                                                                                                                                                                                                      suc
                         !"#$%&"'()*+,#&-,.)
                         /$%,0"12()*3$4%)3""5.)
                                                                    ))))1,;&,<4)<1&%%,')=2)4&:&8$1))
                                                                    )))))))))))%$4%,5)94,14>?)                                                                                                    <',7)41$
                                                                                                                                                                                                                      pro
                                                                                                                                                                                                 8&=,%*1,'>$
                                                                                                                                                                                                                      exp
                                                                                                                                         8&4,99&0731*,0$:0;*0&$                        !B#$B*%1$,2$D4,'&7$<',7)41%$
                                                                                                                                                                                       !(#$()&'*&%$
                                                                                                                                                                                                                      ma
                                                                                                                                                                                                  8&?*&@$
                                                                                                                                                                                                                      we
                                                                                                              Fig. 2.   User Reviews from Foursquare Social Network                              8&=,%*1,'>$
                                                                                                                                                                                                                      com
                                                         7"$%)
                                                     !"8+99"(2"'))
                                                                                                                                                            !8#$830E&7$<',7)41%$
                                                                                                          The content-based filtering approach will be used to filter                                                   ext
                                                                                                       the product/service repository, while the collaborative based
                                                                                                                                               8&%).1%$                                                               B.
                                                                                                       approach will derive the product review recommendations. In
                                                                                                       addition we will use text mining techniques to distinct the
                                                      !"8+99"(2%$,+(#)                                 polarity of the user review between positive or negative one.
                                                                                                       This information summarized would contribute in the product Architecture
                                                                                                                          Fig. 3. Mobile Recommender System                                                           rat
                                                                                                       score recommendation computation. The final product recom-
                                       Fig. 1.    Meta Recommender Architecture
                                                                                                       mendation score is computed by integrating the result of both
                                                                                                                                                                                                                      me
                                                                                                       recommenders. By now, weproduct/service recommender, the user could
                                                                                                               In our mobile are considering to use different                                                         and
                         Since one of the goals of this work is to incorporate                         options regarding this integration approach, one and get a list of recommen-
                      different data sources of user opinions and descriptions, we                          filter some products or services at special                                                                oth
                                                                                                       is the symbolic data analysis approach (SDA) [19], which
                      have addopted an meta recommendation architecture. By using                      eachtations. The user user ratings/reviews arehis preferences or give his
                                                                                                             product description and also can enter modeled                                                           ow
                      a meta recommender architecture, the system would provide
                      a personalized control over the generated recommendation list
                                                                                                            feedback to some offered product recommendation.
                                                                                                       as set of modal symbolic descriptions that summarizes the                                                      Re
                                                                                                       information provided by the corresponding data sources. It is
                      formed by the combination of rich data [16]. The influence
Thursday, January 26, 2012 specific data sources could be explicitly controlled by                      a novel Other functionalities are systems which,i n of the next ve best
                                                                                                                approach in hybrid recommender the retrieval                                                          the
                      of the
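The paper excerpt on this slide says the final score integrates the content-based and the collaborative results. A minimal sketch of one possible integration, a weighted linear blend, is given here. This is an assumption for illustration only: the paper itself considers symbolic data analysis for this step, and the names `hybrid_score`, `rank_items`, the `alpha` weight and the sample scores are all hypothetical.

```python
def hybrid_score(content_score, collaborative_score, alpha=0.5):
    """Blend two scores; alpha is the weight given to the content-based score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * content_score + (1.0 - alpha) * collaborative_score


def rank_items(content_scores, collaborative_scores, alpha=0.5):
    """Rank the items scored by both recommenders, best blended score first."""
    shared_items = set(content_scores) & set(collaborative_scores)
    scored = [
        (item, hybrid_score(content_scores[item], collaborative_scores[item], alpha))
        for item in shared_items
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores on a 1-5 scale for two venues.
content = {"cafe": 4.0, "bar": 1.0}
collaborative = {"cafe": 4.0, "bar": 2.0}
print(rank_items(content, collaborative, alpha=0.5))
```

Setting `alpha` closer to 1 favors the product descriptions, closer to 0 favors the opinions of similar users.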
Text Mining A Lot!

    Sentiment Analysis for Extracting the Polarity
    Meta-Recommender Engines
        Content-Based Filtering
        kNN - Nearest Neighbors
        Hybrid Meta Recommender
        Symbolic Data Analysis (SDA)
    Evaluation in Experimental Datasets
    Architectural Proposal for Mobile Recommender

Crab
    A Python Framework for Building Recommendation Engines

    Marcel Caraciolo (@marcelcaraciolo)
    Ricardo Caspirro (@ricardocaspirro)
    Bruno Melo (@brunomelo)

What is Crab?

    A Python framework for building recommendation engines
    A Scikit module for collaborative, content and hybrid filtering
    Mahout Alternative for Python Developers :D
    Open-Source under the BSD license

    https://github.com/muricoca/crab




The current Crab

    >>> # load the dataset
    >>> from crab.datasets import load_sample_movies
    >>> data = load_sample_movies()
    >>> data
    {'DESCR': 'sample_movies data set was collected by the book called
              \nProgramming the Collective Intelligence by Toby Segaran \n\nNotes\n-----
              \nThis data set consists of\n\t* n ratings with (1-5) from n users to n movies.',
     'data': {1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
      2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
      3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
      4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
      5: {2: 4.5, 3: 1.0, 4: 4.0},
      6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
      7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0}},
     'item_ids': {1: 'Lady in the Water',
      2: 'Snakes on a Planet',
      3: 'You, Me and Dupree',
      4: 'Superman Returns',
      5: 'The Night Listener',
      6: 'Just My Luck'},
     'user_ids': {1: 'Jack Matthews',
      2: 'Mick LaSalle',
      3: 'Claudia Puig',
      4: 'Lisa Rose',
      5: 'Toby',
      6: 'Gene Seymour',
      7: 'Michael Phillips'}}

The current Crab

    >>> from crab.models import MatrixPreferenceDataModel
    >>> m = MatrixPreferenceDataModel(data.data)

    >>> print m
    MatrixPreferenceDataModel (7 by 6)
             1          2          3          4          5        ...
    1        3.000000   4.000000   3.500000   5.000000   3.000000
    2        3.000000   4.000000   2.000000   3.000000   3.000000
    3           ---     3.500000   2.500000   4.000000   4.500000
    4        2.500000   3.500000   2.500000   3.500000   3.000000
    5           ---     4.500000   1.000000   4.000000      ---
    6        3.000000   3.500000   3.500000   5.000000   3.000000
    7        2.500000   3.000000      ---     3.500000   4.000000
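The printed model is essentially a dense user-by-item view of the nested ratings dictionary, with gaps where a user has not rated an item. A minimal pure-Python sketch of that conversion follows. It is not Crab's implementation: `to_matrix` is a hypothetical helper, and `None` stands in for the `---` placeholder.

```python
# Ratings copied from the 'data' entry of the sample dataset.
ratings = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}


def to_matrix(prefs):
    """Return (user_ids, item_ids, rows); a missing rating becomes None."""
    user_ids = sorted(prefs)
    item_ids = sorted({item for row in prefs.values() for item in row})
    rows = [[prefs[user].get(item) for item in item_ids] for user in user_ids]
    return user_ids, item_ids, rows


user_ids, item_ids, matrix = to_matrix(ratings)
for user, row in zip(user_ids, matrix):
    cells = ["   ---  " if value is None else "%8.6f" % value for value in row]
    print(user, " ".join(cells))
```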




The current Crab

    >>> # import pairwise distance
    >>> from crab.metrics.pairwise import euclidean_distances
    >>> # import similarity
    >>> from crab.similarities import UserSimilarity
    >>> similarity = UserSimilarity(m, euclidean_distances)
    >>> similarity[1]
    [(1, 1.0),
     (6, 0.66666666666666663),
     (4, 0.34054242658316669),
     (3, 0.32037724101704074),
     (7, 0.32037724101704074),
     (2, 0.2857142857142857),
     (5, 0.2674788903885893)]
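To make the numbers in `similarity[1]` easier to read, here is a small pure-Python sketch that turns the Euclidean distance over co-rated items into a similarity of 1 / (1 + distance). On the sample ratings it gives the same ranking and values as the output on this slide, which suggests, but does not confirm, that Crab computes it this way. The function names are hypothetical, and returning 0.0 for users with no items in common is an assumption.

```python
from math import sqrt

# Ratings copied from the 'data' entry of the sample dataset.
ratings = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}


def euclidean_similarity(prefs, a, b):
    """Similarity in (0, 1]: 1 / (1 + Euclidean distance over co-rated items)."""
    shared = set(prefs[a]) & set(prefs[b])
    if not shared:
        return 0.0  # assumption: no co-rated items means no similarity
    distance = sqrt(sum((prefs[a][i] - prefs[b][i]) ** 2 for i in shared))
    return 1.0 / (1.0 + distance)


def most_similar(prefs, user):
    """All users (including `user` itself) sorted by similarity, best first."""
    scores = [(other, euclidean_similarity(prefs, user, other)) for other in prefs]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


for other, score in most_similar(ratings, 1):
    print(other, score)
```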




The current Crab

    >>> from crab.recommenders.knn import UserBasedRecommender
    >>> recsys = UserBasedRecommender(model=m, similarity=similarity,
    ...                               capper=True, with_preference=True)

    >>> recsys.recommend(5)
    array([[ 5.        , 3.45712869],
           [ 1.        , 2.78857832],
           [ 6.        , 2.38193068]])

    >>> recsys.recommended_because(user_id=5, item_id=1)
    array([[ 2. , 3. ],
           [ 1. , 3. ],
           [ 6. , 3. ],
           [ 7. , 2.5],
           [ 4. , 2.5]])
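The idea behind a user-based recommender can be sketched in a few lines: score every item the user has not rated by the similarity-weighted average of the other users' ratings. This is a simplified illustration, not Crab's algorithm. The function names are hypothetical, every other user counts as a neighbor, and the similarity is assumed to be 1 / (1 + Euclidean distance). On the sample data it yields the same item order (5, 1, 6) and the same top score as the slide, while the other two scores differ somewhat, so Crab's neighborhood handling presumably differs from this plain weighted average.

```python
from math import sqrt

# Ratings copied from the 'data' entry of the sample dataset.
ratings = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}


def user_similarity(prefs, a, b):
    """1 / (1 + Euclidean distance over co-rated items); 0.0 without overlap."""
    shared = set(prefs[a]) & set(prefs[b])
    if not shared:
        return 0.0
    distance = sqrt(sum((prefs[a][i] - prefs[b][i]) ** 2 for i in shared))
    return 1.0 / (1.0 + distance)


def recommend(prefs, user):
    """Rank unseen items by the similarity-weighted mean of other users' ratings."""
    totals, weights = {}, {}
    for other in prefs:
        if other == user:
            continue
        sim = user_similarity(prefs, user, other)
        if sim <= 0.0:
            continue
        for item, rating in prefs[other].items():
            if item in prefs[user]:
                continue  # only score items the user has not rated yet
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    ranked = [(item, totals[item] / weights[item]) for item in totals]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


recommendations = recommend(ratings, 5)
for item, score in recommendations:
    print(item, score)
```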




The current Crab

                   Collaborative Filtering algorithms
                       User-Based, Item-Based and Slope One

                   Evaluation of the Recommender Algorithms
                      Precision, Recall, F1-Score, RMSE




                                                Precision-Recall Charts
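For reference, the evaluation metrics named on this slide can be written out directly. The sketch below uses the textbook definitions in plain Python. It does not reproduce Crab's evaluator, and the function names and sample values are hypothetical.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def precision_recall_f1(recommended, relevant):
    """Precision, recall and F1-score of a recommendation list against relevant items."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical predictions against held-out ratings.
print(rmse([3.0, 4.0], [2.0, 4.0]), mae([3.0, 4.0], [2.0, 4.0]))
# Hypothetical top-2 recommendation list against four relevant items.
print(precision_recall_f1([5, 1], [5, 6, 2, 3]))
```

Error metrics (RMSE, MAE) judge the predicted rating values, while precision, recall and F1 judge whether the right items make it into the list.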

Evaluating your recommender

    >>> from crab.metrics.classes import CfEvaluator
    >>> evaluator = CfEvaluator()

    >>> evaluator.evaluate(recommender=recsys, metric='rmse')
    {'rmse': 0.69467177857026907}
    >>> evaluator.evaluate_on_split(recommender=recsys, at=2)
    ({'error': [{'mae': 0.345, 'nmae': 0.4567, 'rmse': 0.568},
                {'mae': 0.456, 'nmae': 0.356778, 'rmse': 0.6788},
                {'mae': 0.456, 'nmae': 0.356778, 'rmse': 0.6788}],
      'ir': [{'f1score': 0.456, 'precision': 0.78557, 'recall': 0.55677},
             {'f1score': 0.64567, 'precision': 0.67865, 'recall': 0.785955},
             {'f1score': 0.45070, 'precision': 0.74744, 'recall': 0.858585}]},
     {'final_score': {'avg': {'f1score': 0.495955,
                              'mae': 0.429292,
                              'nmae': 0.373739,
                              'precision': 0.63932929,
                              'recall': 0.729939393,
                              'rmse': 0.3466868},
                      'stdev': {'f1score': 0.09938383,
                                'mae': 0.0593933,
                                'nmae': 0.03393939,
                                'precision': 0.0192929,
                                'recall': 0.031293939,
                                'rmse': 0.234949494}}})

Distributing the recommendation computations

    Use Hadoop and Map-Reduce intensively
        Investigating the Yelp mrjob framework     https://github.com/pfig/mrjob

    Develop the novel state-of-the-art techniques used in the Netflix Prize
        Matrix Factorization, Singular Value Decomposition (SVD), Boltzmann machines

    The most commonly used is the Slope One technique
        Simple linear predictor y = x + b (the line y = a*x + b with the slope a fixed at one)
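As an illustration of the Slope One idea, the sketch below implements the weighted variant in plain Python: for each pair of items it stores the average rating difference b, then predicts a rating as x + b averaged over the items the user has rated, weighted by how many users co-rated each pair. It is a sketch under these assumptions, not Crab's code. The function names and the three-user sample data are hypothetical.

```python
from collections import defaultdict


def slope_one_deviations(prefs):
    """Return (devs, counts): devs[i][j] is the mean of (r_ui - r_uj) over users
    who rated both i and j, and counts[i][j] is the number of such users."""
    diffs = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for user_ratings in prefs.values():
        for i, r_i in user_ratings.items():
            for j, r_j in user_ratings.items():
                if i != j:
                    diffs[i][j] += r_i - r_j
                    counts[i][j] += 1
    devs = {i: {j: diffs[i][j] / counts[i][j] for j in diffs[i]} for i in diffs}
    return devs, counts


def predict(prefs, devs, counts, user, item):
    """Weighted Slope One prediction, or None when no co-rated item exists."""
    numerator = denominator = 0.0
    for j, r_j in prefs[user].items():
        if j in devs.get(item, {}):
            n = counts[item][j]
            numerator += (r_j + devs[item][j]) * n  # y = x + b, weighted by support
            denominator += n
    return numerator / denominator if denominator else None


# Hypothetical ratings: Lucy has not rated item "A" yet.
sample = {
    "John": {"A": 5.0, "B": 3.0, "C": 2.0},
    "Mark": {"A": 3.0, "B": 4.0},
    "Lucy": {"B": 2.0, "C": 5.0},
}
devs, counts = slope_one_deviations(sample)
print(predict(sample, devs, counts, "Lucy", "A"))
```

Because the deviations depend only on item pairs, they can be precomputed once and reused for every user, which is part of why the technique suits batch and distributed computation.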




Cache/Parallelism with joblib
                                      http://packages.python.org/joblib/index.html


          import numpy as np
          from joblib import Memory
          memory = Memory(cachedir='', verbose=0)

          class UserSimilarity(BaseSimilarity):
              ...

              @memory.cache
              def get_similarity(self, source_id, target_id):
                  source_preferences = self.model.preferences_from_user(source_id)
                  target_preferences = self.model.preferences_from_user(target_id)
                  ...
                  # Return NaN when either user has no preferences to compare.
                  return (self.distance(source_preferences, target_preferences)
                          if source_preferences.shape[1] != 0
                          and target_preferences.shape[1] != 0
                          else np.array([[np.nan]]))

              def get_similarities(self, source_id):
                  return [(other_id, self.get_similarity(source_id, other_id))
                          for other_id, v in self.model]


 >>> # Without memory.cache
 >>> timeit similarity.get_similarities('marcel_caraciolo')
 100 loops, best of 3: 978 ms per loop

 >>> # With memory.cache
 >>> timeit similarity.get_similarities('marcel_caraciolo')
 100 loops, best of 3: 434 ms per loop
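
The effect of caching a similarity function can be sketched without joblib. This is an illustration only: it swaps joblib's disk-backed Memory for the standard library's in-memory functools.lru_cache, and the names PREFERENCES, CALLS and cosine_similarity are hypothetical, not part of Crab.

```python
from functools import lru_cache
from math import sqrt

# Hypothetical toy preferences: user -> {item: rating}.
PREFERENCES = {
    "marcel": {"a": 5.0, "b": 3.0},
    "maria": {"a": 4.0, "b": 1.0, "c": 2.0},
    "joao": {"c": 5.0},
}

CALLS = {"count": 0}  # counts how often the expensive body really runs


@lru_cache(maxsize=None)
def cosine_similarity(source_id, target_id):
    """Cosine similarity over the items both users rated (NaN if none)."""
    CALLS["count"] += 1
    source, target = PREFERENCES[source_id], PREFERENCES[target_id]
    shared = sorted(set(source) & set(target))
    if not shared:
        return float("nan")
    dot = sum(source[i] * target[i] for i in shared)
    norm_s = sqrt(sum(source[i] ** 2 for i in shared))
    norm_t = sqrt(sum(target[i] ** 2 for i in shared))
    return dot / (norm_s * norm_t)


def get_similarities(source_id):
    return [(other, cosine_similarity(source_id, other)) for other in PREFERENCES]


first = get_similarities("marcel")   # computes every pair
second = get_similarities("marcel")  # served entirely from the cache
```

After both calls the function body has run only once per pair. Unlike lru_cache, joblib's Memory stores results on disk, so they can also be reused between separate runs.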


Distributed Computing with mrJob
                                       https://github.com/Yelp/mrjob


             """The classic MapReduce job: count the frequency of words."""
             from mrjob.job import MRJob
             import re

             # Match runs of word characters and apostrophes.
             WORD_RE = re.compile(r"[\w']+")

             class MRWordFreqCount(MRJob):

                 def mapper(self, _, line):
                     # Emit (word, 1) for every word on the input line.
                     for word in WORD_RE.findall(line):
                         yield (word.lower(), 1)

                 def reducer(self, word, counts):
                     # Sum the counts emitted for each distinct word.
                     yield (word, sum(counts))

             if __name__ == '__main__':
                 MRWordFreqCount.run()


             It supports Amazon’s Elastic MapReduce (EMR) service, your own Hadoop cluster, or
                                              local execution (for testing)
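
What a MapReduce runner does with a mapper and a reducer can be sketched in plain Python. The run_local driver below is a hypothetical single-process stand-in written for illustration; it is not mrjob's API and makes no claim about how mrjob's own local mode is implemented.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"[\w']+")


def mapper(line):
    """Map step: emit (word, 1) for every word on the line."""
    for word in WORD_RE.findall(line):
        yield word.lower(), 1


def reducer(word, counts):
    """Reduce step: sum all counts collected for one word."""
    yield word, sum(counts)


def run_local(lines):
    """Hypothetical single-process driver: map, group by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)   # the "shuffle": group values by key
    results = {}
    for key in sorted(grouped):
        for out_key, out_value in reducer(key, grouped[key]):
            results[out_key] = out_value
    return results


counts = run_local(["Crab is fast", "crab is written in Python"])
```

On a cluster the grouping step is what the framework distributes: every value for a given key is routed to the same reducer.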


Distributed Computing with mrJob
                                            https://github.com/Yelp/mrjob

   Elsayed et al.: Pairwise Document Similarity in Large Collections with MapReduce
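
The inverted-index idea behind pairwise similarity with MapReduce can be sketched in two phases. This is a minimal single-process sketch, assuming raw term frequency as the weight and an inner product as the similarity; build_index and pairwise_similarity are hypothetical names, and the code is not taken from the paper or from Crab.

```python
from collections import defaultdict
from itertools import combinations


def build_index(documents):
    """Phase 1: term -> list of (doc_id, weight) postings."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        weights = defaultdict(int)
        for term in text.lower().split():
            weights[term] += 1            # raw term frequency as the weight
        for term, weight in weights.items():
            index[term].append((doc_id, weight))
    return index


def pairwise_similarity(index):
    """Phase 2: sum per-term weight products for every co-occurring pair."""
    scores = defaultdict(int)
    for postings in index.values():
        for (doc_a, w_a), (doc_b, w_b) in combinations(sorted(postings), 2):
            scores[(doc_a, doc_b)] += w_a * w_b
    return dict(scores)


docs = {
    "d1": "python crab python",
    "d2": "crab recommender python",
    "d3": "hadoop cluster",
}
scores = pairwise_similarity(build_index(docs))
```

Only documents that share at least one term ever produce a pair, so d3 never appears in the output. For a recommender, users or items take the place of documents and ratings take the place of term weights.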




Future studies with Sparse Matrices
               Real datasets come with lots of empty values
                 http://aimotion.blogspot.com/2011/05/evaluating-recommender-systems.html

            Solutions:

                         scipy.sparse package

                         Sharding operations

                         Matrix Factorization techniques (SVD)

            Crab implements a Matrix Factorization with Expectation Maximization algorithm
            (scikits.crab.svd package)

            Figure: Apontador Reviews Dataset

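
Why sparse storage helps can be shown with a small dictionary-of-keys store. This is a sketch under stated assumptions: SparseRatings is a hypothetical class written with plain dictionaries for illustration; scipy.sparse provides optimized equivalents (for example dok_matrix), and nothing here reflects Crab's internals.

```python
class SparseRatings:
    """Hypothetical dictionary-of-keys store: only observed ratings use memory."""

    def __init__(self, n_users, n_items):
        self.shape = (n_users, n_items)
        self._data = {}                    # (user_index, item_index) -> rating

    def set(self, user, item, rating):
        self._data[(user, item)] = rating

    def get(self, user, item, default=0.0):
        return self._data.get((user, item), default)

    def sparsity(self):
        """Fraction of user/item cells that hold no rating."""
        total = self.shape[0] * self.shape[1]
        return 1.0 - len(self._data) / total


ratings = SparseRatings(n_users=4, n_items=5)
ratings.set(0, 1, 4.0)
ratings.set(2, 3, 5.0)
ratings.set(3, 0, 1.0)
```

Here 3 of 20 cells are filled, so the sparsity is 0.85 and memory grows with the number of ratings rather than with users times items.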
Benchmarks

            Dataset                                Old Crab                   New Crab
                                                   (Pure Python w/ dicts)     (Python w/ Scipy and Numpy)
            MovieLens 100k                         15.32 s                    9.56 s
                http://www.grouplens.org/node/73

    Time elapsed (Recommend 5 items)
    [Bar chart comparing Old Crab and New Crab; axis from 0 to 16]

Why migrate?
         The old Crab ran on pure Python only
                 Recommendations demand heavy math calculations and lots of processing

        Compatible with the Numpy and Scipy libraries
               High-standard, popular libraries optimized for scientific calculations in Python

        Scikits projects are amazing!
               Active communities, scientific conferences and updated projects (e.g. scikit-learn)

       Make the Crab framework visible to the community
          Join the scientific researchers and machine learning developers around the globe coding with
                                          Python to help us in this project


                                          Be Fast and Furious

How are we working?
                               Sprints, Online Discussions and Issues

                 https://github.com/muricoca/crab/wiki/UpcomingEvents

How are we working?
                                   Our Project’s Home Page

                             http://muricoca.github.com/crab

Future Releases
                             Planned Release 0.1
                      Collaborative Filtering Algorithms working, sample datasets to load and test


                             Planned Release 0.11
                        Evaluation of Recommendation Algorithms and Database Models support


                             Planned Release 0.12
                      Recommendation as a Service with REST APIs




                 ....



Join us!

                       1. Read our Wiki Page
                             https://github.com/muricoca/crab/wiki/Developer-Resources

                       2. Check out our current sprints and open issues
                             https://github.com/muricoca/crab/issues

                       3. Forks and pull requests are mandatory

                       4. Join us at irc.freenode.net #muricoca or on our
                          discussion list (still in the works :( )



Recommendation in social networks
   Integrated this engine with the popular Brazilian social network AtéPassar
             More than 70,000 registered students studying for public examinations

    Recommends StudyGroups, Friends, Video Classes, Questions and Concursos
           More than 70,000 items available to recommend

    Written in Python using the open-source framework Crab
                Framework for building recommender systems (my contribution)

     Running since January 2011
            In March 2011, a questionnaire was conducted.
            [Pie chart of Liked vs. Not Liked responses; the two slices are 77% and 23%]

Collect discounts

                             WWW.FAVORITOZ.COM

Social Recommendations
   1. The user logs in via Facebook.
   2. The user visits a LikeStore partner e-commerce site.
   3. The user receives personalized recommendations on arrival.
   4. The user receives recommendations in the shopping cart.
   5. The user receives recommendations on the product page.

                                    Similar products

                             Customers who bought this also bought

                                  Friends who liked/bought this

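
A "customers who bought this also bought" list can be sketched with simple co-occurrence counts over past orders. This is a minimal illustration, assuming each order is just a list of product names; build_also_bought, also_bought and the sample orders are hypothetical and do not describe LikeStore's actual implementation.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_also_bought(orders):
    """Count, for each product, how often every other product shares an order."""
    co_counts = defaultdict(Counter)
    for basket in orders:
        for item, other in permutations(set(basket), 2):
            co_counts[item][other] += 1
    return co_counts


def also_bought(co_counts, item, top_n=3):
    """Products most frequently bought together with `item`."""
    return [other for other, _ in co_counts[item].most_common(top_n)]


orders = [
    ["book", "lamp"],
    ["book", "lamp", "pen"],
    ["book", "pen"],
    ["lamp", "mug"],
]
co_counts = build_also_bought(orders)
```

Calling also_bought(co_counts, "mug") returns ["lamp"]. Raw counts favor popular products, so a production system would usually normalize them before ranking.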
Building the Social Genome


Does anyone still doubt it?

                              http://www.shopycat.com/

Tips

Join us!

                       1. Read our Wiki Page
                             https://github.com/muricoca/crab/wiki/Developer-Resources

                       2. Check out our current sprints and open issues
                             https://github.com/muricoca/crab/issues

                       3. Forks and pull requests are mandatory
                       4. Join us at irc.freenode.net #muricoca or on our
                          discussion list at scikit-crab@googlegroups.com




Tips for Recommendation Architecture

Recommended Items

       Toby Segaran, Programming Collective Intelligence, O'Reilly, 2007
       Satnam Alag, Collective Intelligence in Action, Manning Publications, 2009

         Sites such as TechCrunch and ReadWriteWeb

Recommended Conferences
        - ACM RecSys
        - ICWSM: Weblogs and Social Media
        - WebKDD: Web Knowledge Discovery and Data Mining
        - WWW: The original WWW conference
        - SIGIR: Information Retrieval
        - ACM KDD: Knowledge Discovery and Data Mining
        - ICML: Machine Learning

Where will you be in all this?

                                                      Source: Hunch.com

                                   Thank you!
                             Hunch sold to eBay for $80M

Optimizations with Cython
                                                      http://cython.org/

     Cython is a superset of Python that lets developers annotate functions with static types
     so they can be compiled to C.

   # setup.py
   from distutils.core import setup
   from distutils.extension import Extension
   from Cython.Distutils import build_ext

   # For notes on compiler flags see:
   # http://docs.python.org/install/index.html
   setup(
       # Use Cython's build_ext so .pyx sources are compiled to C extensions.
       cmdclass={'build_ext': build_ext},
       ext_modules=[Extension("spearman_correlation_cython",
                              ["spearman_correlation_cython.pyx"])]
   )

                               http://aimotion.blogspot.com/2011/09/high-performance-computation-with_17.html

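
The setup script above builds a module named spearman_correlation_cython. As an illustration of the kind of numeric loop that benefits from compilation, here is a pure-Python sketch of Spearman's rank correlation. It is an assumption about what such a module computes, not the author's .pyx source; average_ranks and spearman_correlation are hypothetical names, and the code assumes neither input is constant.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over every value tied with order[start].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman_correlation(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


rho_same = spearman_correlation([1.0, 2.5, 4.0, 5.0], [10, 20, 30, 40])
rho_reversed = spearman_correlation([1.0, 2.5, 4.0, 5.0], [40, 30, 20, 10])
```

Identical orderings give a rho of 1 and reversed orderings give -1. In a Cython version, the loops over ranks are where static type annotations would typically be added.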
Cache/Parallelism with joblib
                                   http://packages.python.org/joblib/index.html

            Investigate how to use multiprocessing and parallel packages with similarities
                                             computation

                 from joblib import Parallel, delayed
                 ...

            def get_similarities(self, source_id):
                other_ids = [other_id for other_id, v in self.model]
                # Run one get_similarity call per user across 3 workers.
                similarities = Parallel(n_jobs=3)(
                    delayed(self.get_similarity)(source_id, other_id)
                    for other_id in other_ids)
                return list(zip(other_ids, similarities))

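
The same fan-out pattern can be sketched with the standard library alone. This illustration swaps joblib's Parallel for concurrent.futures.ThreadPoolExecutor so that it runs without extra packages; LIKES and jaccard are hypothetical toy stand-ins for Crab's model and distance function.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy data: user -> set of liked items.
LIKES = {
    "marcel": {"a", "b", "c"},
    "maria": {"b", "c", "d"},
    "joao": {"x"},
}


def jaccard(source_id, target_id):
    """Jaccard similarity between the item sets of two users."""
    source, target = LIKES[source_id], LIKES[target_id]
    return len(source & target) / len(source | target)


def get_similarities(source_id, n_jobs=3):
    other_ids = list(LIKES)
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # map() preserves input order, so results line up with other_ids.
        scores = list(pool.map(lambda other: jaccard(source_id, other), other_ids))
    return list(zip(other_ids, scores))


similarities = get_similarities("marcel")
```

Threads keep the sketch simple but rarely speed up CPU-bound pure-Python code. Process-based workers, which joblib can provide, are the usual choice for heavy similarity computations.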

More Related Content

Viewers also liked

Sistemas de Recomendação: Conceitos, Técnicas, Ferramentas e Aplicações
Sistemas de Recomendação: Conceitos, Técnicas, Ferramentas e AplicaçõesSistemas de Recomendação: Conceitos, Técnicas, Ferramentas e Aplicações
Sistemas de Recomendação: Conceitos, Técnicas, Ferramentas e AplicaçõesJonathas Magalhães
 
Sistemas Recomendação em Redes Sociais
Sistemas Recomendação em Redes SociaisSistemas Recomendação em Redes Sociais
Sistemas Recomendação em Redes SociaisNatã Melo
 
Python e Aprendizagem de Máquina (Inteligência Artificial)
Python e Aprendizagem de Máquina (Inteligência Artificial)Python e Aprendizagem de Máquina (Inteligência Artificial)
Python e Aprendizagem de Máquina (Inteligência Artificial)Marcel Caraciolo
 
Playcenter - abr.07
Playcenter - abr.07Playcenter - abr.07
Playcenter - abr.07Jubrac Jacui
 
Niver Caio - 17.06.07
Niver Caio - 17.06.07Niver Caio - 17.06.07
Niver Caio - 17.06.07Jubrac Jacui
 
Presentatie Transmedia
Presentatie TransmediaPresentatie Transmedia
Presentatie TransmediaSjef Kerkhofs
 
ZFS - Zettabyte File System
ZFS - Zettabyte File SystemZFS - Zettabyte File System
ZFS - Zettabyte File SystemCataldo Cigliola
 
Cha Bar Tati - 12.05.07
Cha Bar Tati - 12.05.07Cha Bar Tati - 12.05.07
Cha Bar Tati - 12.05.07Jubrac Jacui
 
Social media school 2011 webversie
Social media school 2011 webversieSocial media school 2011 webversie
Social media school 2011 webversieSjef Kerkhofs
 
Lcu14 wrap up meeting. Summary of Core Develoment teams achievements
Lcu14 wrap up meeting. Summary of Core Develoment teams achievementsLcu14 wrap up meeting. Summary of Core Develoment teams achievements
Lcu14 wrap up meeting. Summary of Core Develoment teams achievementsAgustin Benito Bethencourt
 
Pelajaran 1 Bm
Pelajaran 1 BmPelajaran 1 Bm
Pelajaran 1 Bmamoi286
 
IT User Apprenticeship at Happy Computers
IT User Apprenticeship at Happy ComputersIT User Apprenticeship at Happy Computers
IT User Apprenticeship at Happy ComputersPaul McElvaney
 
נוכחות אונליין - המכללה לעסקים קטנים, המכללה למנהל
נוכחות אונליין - המכללה לעסקים קטנים, המכללה למנהל    נוכחות אונליין - המכללה לעסקים קטנים, המכללה למנהל
נוכחות אונליין - המכללה לעסקים קטנים, המכללה למנהל Udi Salant
 
090703 Sns Miyamura
090703 Sns Miyamura090703 Sns Miyamura
090703 Sns Miyamurayuu_2003
 
The Mighty Gabby Embodying Resistance in the Creative Process
The Mighty Gabby Embodying Resistance in the Creative Process The Mighty Gabby Embodying Resistance in the Creative Process
The Mighty Gabby Embodying Resistance in the Creative Process Ian Walcott-Skinner
 
The Eastern Origins Of Western Civilization Editted
The Eastern Origins Of Western Civilization EdittedThe Eastern Origins Of Western Civilization Editted
The Eastern Origins Of Western Civilization Edittedguestecd0c6
 

Viewers also liked (20)

Sistemas de Recomendação: Conceitos, Técnicas, Ferramentas e Aplicações
Sistemas de Recomendação: Conceitos, Técnicas, Ferramentas e AplicaçõesSistemas de Recomendação: Conceitos, Técnicas, Ferramentas e Aplicações
Sistemas de Recomendação: Conceitos, Técnicas, Ferramentas e Aplicações
 
Sistemas Recomendação em Redes Sociais
Sistemas Recomendação em Redes SociaisSistemas Recomendação em Redes Sociais
Sistemas Recomendação em Redes Sociais
 
Python e Aprendizagem de Máquina (Inteligência Artificial)
Python e Aprendizagem de Máquina (Inteligência Artificial)Python e Aprendizagem de Máquina (Inteligência Artificial)
Python e Aprendizagem de Máquina (Inteligência Artificial)
 
Scmad Chapter14
Scmad Chapter14Scmad Chapter14
Scmad Chapter14
 
Playcenter - abr.07
Playcenter - abr.07Playcenter - abr.07
Playcenter - abr.07
 
Niver Caio - 17.06.07
Niver Caio - 17.06.07Niver Caio - 17.06.07
Niver Caio - 17.06.07
 
我们的故事
我们的故事我们的故事
我们的故事
 
Presentatie Transmedia
Presentatie TransmediaPresentatie Transmedia
Presentatie Transmedia
 
ZFS - Zettabyte File System
ZFS - Zettabyte File SystemZFS - Zettabyte File System
ZFS - Zettabyte File System
 
Cha Bar Tati - 12.05.07
Cha Bar Tati - 12.05.07Cha Bar Tati - 12.05.07
Cha Bar Tati - 12.05.07
 
Social media school 2011 webversie
Social media school 2011 webversieSocial media school 2011 webversie
Social media school 2011 webversie
 
Lcu14 wrap up meeting. Summary of Core Develoment teams achievements
Lcu14 wrap up meeting. Summary of Core Develoment teams achievementsLcu14 wrap up meeting. Summary of Core Develoment teams achievements
Lcu14 wrap up meeting. Summary of Core Develoment teams achievements
 
Cradle To Cradle Kort
Cradle To Cradle KortCradle To Cradle Kort
Cradle To Cradle Kort
 
Pelajaran 1 Bm
Pelajaran 1 BmPelajaran 1 Bm
Pelajaran 1 Bm
 
IT User Apprenticeship at Happy Computers
IT User Apprenticeship at Happy ComputersIT User Apprenticeship at Happy Computers
IT User Apprenticeship at Happy Computers
 
נוכחות אונליין - המכללה לעסקים קטנים, המכללה למנהל
נוכחות אונליין - המכללה לעסקים קטנים, המכללה למנהל    נוכחות אונליין - המכללה לעסקים קטנים, המכללה למנהל
נוכחות אונליין - המכללה לעסקים קטנים, המכללה למנהל
 
Circuitos De Potencia
Circuitos De PotenciaCircuitos De Potencia
Circuitos De Potencia
 
090703 Sns Miyamura
090703 Sns Miyamura090703 Sns Miyamura
090703 Sns Miyamura
 
The Mighty Gabby Embodying Resistance in the Creative Process
The Mighty Gabby Embodying Resistance in the Creative Process The Mighty Gabby Embodying Resistance in the Creative Process
The Mighty Gabby Embodying Resistance in the Creative Process
 
The Eastern Origins Of Western Civilization Editted
The Eastern Origins Of Western Civilization EdittedThe Eastern Origins Of Western Civilization Editted
The Eastern Origins Of Western Civilization Editted
 

Similar to Sistemas de Recomendação: Como funciona e Onde Se aplica?

UX: What Not to Do
UX: What Not to DoUX: What Not to Do
UX: What Not to DoRob Surrency
 
Rapid Prototype the User Experience
Rapid Prototype the User ExperienceRapid Prototype the User Experience
Rapid Prototype the User ExperienceHong Qu
 
Researcher online 1 Building an Online Identity
Researcher online 1 Building an Online IdentityResearcher online 1 Building an Online Identity
Researcher online 1 Building an Online IdentityHelen Webster
 
LUXr 1-day workshop, July 18, 2012 [San Francisco]
LUXr 1-day workshop, July 18, 2012 [San Francisco]LUXr 1-day workshop, July 18, 2012 [San Francisco]
LUXr 1-day workshop, July 18, 2012 [San Francisco]LUXr
 
LUXr 1-day workshop, Wed November 07, 2012 [San Francisco]
LUXr 1-day workshop, Wed November 07, 2012 [San Francisco]LUXr 1-day workshop, Wed November 07, 2012 [San Francisco]
LUXr 1-day workshop, Wed November 07, 2012 [San Francisco]LUXr
 
LUXr 1-day workshop, June 13, 2012 [San Francisco]
LUXr 1-day workshop, June 13, 2012 [San Francisco]LUXr 1-day workshop, June 13, 2012 [San Francisco]
LUXr 1-day workshop, June 13, 2012 [San Francisco]LUXr
 
Demystifying User Experience
Demystifying User ExperienceDemystifying User Experience
Demystifying User ExperienceUday Shankar
 
Social Media Workshop
Social Media WorkshopSocial Media Workshop
Social Media WorkshopNick Betts
 
LUXr User Experience in Lean Startups : 2-day workshop for Startup Hawaii, Ju...
LUXr User Experience in Lean Startups : 2-day workshop for Startup Hawaii, Ju...LUXr User Experience in Lean Startups : 2-day workshop for Startup Hawaii, Ju...
LUXr User Experience in Lean Startups : 2-day workshop for Startup Hawaii, Ju...LUXr
 
Pukunui Moodle Intro
Pukunui Moodle IntroPukunui Moodle Intro
Pukunui Moodle IntroShane Elliott
 
Rally Roundtable : Lean Startup + User Experience = Awesome, July 11, 2012 [S...
Rally Roundtable : Lean Startup + User Experience = Awesome, July 11, 2012 [S...Rally Roundtable : Lean Startup + User Experience = Awesome, July 11, 2012 [S...
Rally Roundtable : Lean Startup + User Experience = Awesome, July 11, 2012 [S...LUXr
 
Rally Roundtable : Lean Startup + User Experience = Awesome, July 11, 2012 [S...
Rally Roundtable : Lean Startup + User Experience = Awesome, July 11, 2012 [S...Rally Roundtable : Lean Startup + User Experience = Awesome, July 11, 2012 [S...
Rally Roundtable : Lean Startup + User Experience = Awesome, July 11, 2012 [S...Kate Rutter
 
LUXr 1-day workshop, August 15, 2012 [San Francisco]
LUXr 1-day workshop, August 15, 2012 [San Francisco]LUXr 1-day workshop, August 15, 2012 [San Francisco]
LUXr 1-day workshop, August 15, 2012 [San Francisco]LUXr
 
Digi foot 2012
Digi foot 2012Digi foot 2012
Digi foot 2012tpoelzer
 
Julio presentation at_celtech
Julio presentation at_celtechJulio presentation at_celtech
Julio presentation at_celtechlabmeetings
 
LUXr 1-day workshop, April 27, 2012 [San Francisco]
LUXr 1-day workshop, April 27, 2012 [San Francisco]LUXr 1-day workshop, April 27, 2012 [San Francisco]
LUXr 1-day workshop, April 27, 2012 [San Francisco]LUXr
 
Pac 021113
Pac 021113Pac 021113
Pac 021113tlokey
 

Similar to Sistemas de Recomendação: Como funciona e Onde Se aplica? (20)

UX: What Not to Do
UX: What Not to DoUX: What Not to Do
UX: What Not to Do
 
Rapid Prototype the User Experience
Rapid Prototype the User ExperienceRapid Prototype the User Experience
Rapid Prototype the User Experience
 
Technology in Education, 4 10-12
Technology in Education, 4 10-12Technology in Education, 4 10-12
Technology in Education, 4 10-12
 
Researcher online 1 Building an Online Identity
Researcher online 1 Building an Online IdentityResearcher online 1 Building an Online Identity
Researcher online 1 Building an Online Identity
 
LUXr 1-day workshop, July 18, 2012 [San Francisco]
LUXr 1-day workshop, July 18, 2012 [San Francisco]LUXr 1-day workshop, July 18, 2012 [San Francisco]
LUXr 1-day workshop, July 18, 2012 [San Francisco]
 
LUXr 1-day workshop, Wed November 07, 2012 [San Francisco]
LUXr 1-day workshop, Wed November 07, 2012 [San Francisco]LUXr 1-day workshop, Wed November 07, 2012 [San Francisco]
LUXr 1-day workshop, Wed November 07, 2012 [San Francisco]
 
LUXr 1-day workshop, June 13, 2012 [San Francisco]
LUXr 1-day workshop, June 13, 2012 [San Francisco]LUXr 1-day workshop, June 13, 2012 [San Francisco]
LUXr 1-day workshop, June 13, 2012 [San Francisco]
 
Demystifying User Experience
Demystifying User ExperienceDemystifying User Experience
Demystifying User Experience
 
Social Media Workshop
Social Media WorkshopSocial Media Workshop
Social Media Workshop
 
LUXr User Experience in Lean Startups : 2-day workshop for Startup Hawaii, Ju...
LUXr User Experience in Lean Startups : 2-day workshop for Startup Hawaii, Ju...LUXr User Experience in Lean Startups : 2-day workshop for Startup Hawaii, Ju...
LUXr User Experience in Lean Startups : 2-day workshop for Startup Hawaii, Ju...
 
Pukunui Moodle Intro
Pukunui Moodle IntroPukunui Moodle Intro
Pukunui Moodle Intro
 
Rally Roundtable : Lean Startup + User Experience = Awesome, July 11, 2012 [S...
Rally Roundtable : Lean Startup + User Experience = Awesome, July 11, 2012 [S...Rally Roundtable : Lean Startup + User Experience = Awesome, July 11, 2012 [S...
Rally Roundtable : Lean Startup + User Experience = Awesome, July 11, 2012 [S...
 
Rally Roundtable : Lean Startup + User Experience = Awesome, July 11, 2012 [S...
Rally Roundtable : Lean Startup + User Experience = Awesome, July 11, 2012 [S...Rally Roundtable : Lean Startup + User Experience = Awesome, July 11, 2012 [S...
Rally Roundtable : Lean Startup + User Experience = Awesome, July 11, 2012 [S...
 
LUXr 1-day workshop, August 15, 2012 [San Francisco]
LUXr 1-day workshop, August 15, 2012 [San Francisco]LUXr 1-day workshop, August 15, 2012 [San Francisco]
LUXr 1-day workshop, August 15, 2012 [San Francisco]
 
Digi foot 2012
Digi foot 2012Digi foot 2012
Digi foot 2012
 
Julio presentation at_celtech
Julio presentation at_celtechJulio presentation at_celtech
Julio presentation at_celtech
 
Social Monitoring, Intelligence and Brand Chatter
Social Monitoring, Intelligence and Brand ChatterSocial Monitoring, Intelligence and Brand Chatter
Social Monitoring, Intelligence and Brand Chatter
 
LUXr 1-day workshop, April 27, 2012 [San Francisco]
LUXr 1-day workshop, April 27, 2012 [San Francisco]LUXr 1-day workshop, April 27, 2012 [San Francisco]
LUXr 1-day workshop, April 27, 2012 [San Francisco]
 
Pac 021113
Pac 021113Pac 021113
Pac 021113
 
Ecology Online Class
Ecology Online ClassEcology Online Class
Ecology Online Class
 

Recommender Systems: How They Work and Where They Apply

  • 1. Recommender Systems. Marcel Pinheiro Caraciolo, marcel@orygens.com, @marcelcaraciolo, http://www.orygens.com. Thursday, January 26, 2012
  • 2. Who is Marcel? Marcel Pinheiro Caraciolo (@marcelcaraciolo). Born in Sergipe, but a Recife local. Master's in Computer Science from CIN/UFPE, in the area of data mining. Director of Research and Development at Orygens. Member and moderator of the Python User Group of Pernambuco (PUG-PE). Areas of interest: mobile computing and intelligent computing. Blogs: http://www.mobideia.com (on mobility, since 2006) and http://aimotion.blogspot.com (on AI, since 2009). Still a young apprentice in the Pythonic arts (since 2007).
  • 5. Web 1.0: source of information. Web 2.0: continuous flow of information. VI Encontro do PUG-PE
  • 6. Web 3.0: web sites, web applications, web services, semantic web, users.
  • 7. Using collective information effectively in order to improve an application.
  • 8. Intelligence from mining data. A user influences others through reviews, ratings, recommendations, and blogs. A user is influenced by others through reviews, ratings, recommendations, and blogs.
  • 9. Collective intelligence in your application (diagram labels): aggregated information (lists), ratings, user-generated content, reviews, blogs, recommendations, wikis, voting, bookmarking, search, tag cloud, tagging, saving, natural language processing, clustering and predictive models, harnessing external content.
  • 10. Web 3.0: web sites, web applications, web services, semantic web, users. Before...
  • 12. We are overloaded with information
  • 13. much of it useless
  • 14. sometimes we look for this...
  • 15. and find this!
  • 16. google?
  • 17. google? social media?
  • 18. meeee... google? social media?
  • 20. “A lot of times, people don’t know what they want until you show it to them.” Steve Jobs. “We are leaving the Information age, and entering into the Recommendation age.” Chris Anderson, from the book The Long Tail
  • 21. Social recommendations. Family/friends. "What should I read?" "I think you should read these books." Photo credits: Flickr (BlueAlgae); Flickr photostream (jefield).
  • 22. Recommendations through interaction. "What should I read?" Input: rate some books. Output: "Books you may like are ..."
  • 23. Systems designed to suggest things that interest me!
  • 24. Why recommendation?
  • 25. Netflix: 2/3 of rented movies come from recommendations. Google News: 38% of the most-clicked news stories come from recommendations. Amazon: 38% of sales come from recommendations. Source: Celma & Lamere, ISMIR 2007
  • 26. We are overloaded with information: thousands of new articles and posts every day; millions of songs, movies, and books; thousands of offers and promotions.
  • 27. What can be recommended? Social network contacts, articles, products, advertising messages, e-learning courses, books, tags, music, future girlfriends, clothes, movies, restaurants, TV shows, videos, papers, investment options, professionals, code modules.
  • 28. And how does recommendation work?
  • 29. What do recommender systems actually do? 1. Predict how much you may like a given product or service. 2. Suggest a list of N items ordered by your interest. 3. Suggest an ordered list of N users for a product/service. 4. Explain why those items were recommended. 5. Adjust the prediction and the recommendation based on your feedback and that of others.
  • 30. Content-based filtering. Items: Die Hard, Gone with the Wind, Toy Story, Armageddon. The user (Marcel) likes one item, and the system recommends a similar item.
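The content-based idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the genre vectors and the names `ITEM_FEATURES`, `cosine`, and `recommend_by_content` are hypothetical and do not come from the talk or from any library; the movie titles are the English titles of the films on the slide.

```python
from math import sqrt

# Hypothetical genre vectors (action, romance, animation, sci-fi).
# These features are illustrative assumptions, not data from the talk.
ITEM_FEATURES = {
    "Die Hard": [1, 0, 0, 0],
    "Gone with the Wind": [0, 1, 0, 0],
    "Toy Story": [0, 0, 1, 0],
    "Armageddon": [1, 0, 0, 1],
}


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend_by_content(liked, features, top_n=1):
    """Rank unseen items by their best similarity to any liked item."""
    scores = {
        item: max(
            (cosine(vec, features[liked_item]) for liked_item in liked),
            default=0.0,
        )
        for item, vec in features.items()
        if item not in liked
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Marcel likes "Die Hard"; the item with the most genre overlap is suggested.
print(recommend_by_content({"Die Hard"}, ITEM_FEATURES))
```

Only item descriptions are used here, never other users' opinions, which is why the quality of the features limits this approach.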
  • 31. Problems with content-based filtering. 1. Restricted data analysis: items and users are described in little detail; worse for audio or images. 2. Specialized data: a person with no experience of sushi is not shown the best sushi restaurant in town. 3. Portfolio effect: just because I watched one Xuxa movie as a child, the system insists on recommending all of her movies.
  • 32. Collaborative filtering. Items: Gone with the Wind, Toy Story, Thor, Armageddon. Users: Marcel, Rafael, Amanda. Marcel likes some items; items liked by a similar user are recommended to him.
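A minimal user-based collaborative filtering sketch follows, assuming a small in-memory ratings dictionary. The users are the ones named on the slide, but the rating values, the 1-5 scale, and the helper names `user_similarity` and `recommend_for` are hypothetical choices made for illustration. Cosine similarity with a similarity-weighted average is one common option, not necessarily the one used in the talk.

```python
from math import sqrt

# Hypothetical 1-5 ratings; the users come from the slide, the numbers do not.
RATINGS = {
    "Marcel": {"Toy Story": 5, "Thor": 4},
    "Rafael": {"Toy Story": 5, "Thor": 4, "Armageddon": 5},
    "Amanda": {"Gone with the Wind": 2, "Toy Story": 1},
}


def user_similarity(a, b):
    """Cosine similarity between two rating dicts (unrated items count as 0)."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend_for(user, ratings, top_n=3):
    """Predict ratings for unseen items as a similarity-weighted average."""
    weighted, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = user_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # only score items the target user has not rated
            weighted[item] = weighted.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predictions = {item: weighted[item] / weights[item] for item in weighted}
    return sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Rafael rates like Marcel, so Rafael's favourite unseen item ranks first.
print(recommend_for("Marcel", RATINGS))
```

No item description is needed, only the ratings, which is also why this approach struggles when users or items have little rating history.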
  • 33. Problems with collaborative filtering. 1. Scalability: Amazon with 5M users, 50K items, 1.4B ratings. 2. Sparse data: new users and items have no history. 3. Cold start: I have rated only a single book on Amazon! 4. Popularity: everybody reads 'Harry Potter'. 5. Hacking: people who read 'Harry Potter' also read the Kama Sutra.
  • 34. Hybrid filtering: a combination of multiple methods. Items: Die Hard, Gone with the Wind, Toy Story, Armageddon. Users: Marcel, Rafael, Luciana. Ontologies, symbolic data.
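One simple way to combine multiple methods is a weighted sum of the scores each recommender produces. The sketch below assumes both recommenders return scores on the same 0-1 scale; the function name `hybrid_scores`, the `content_weight` parameter, and the example scores are hypothetical. The slide also mentions ontologies and symbolic data, which this sketch does not cover.

```python
def hybrid_scores(content_scores, collab_scores, content_weight=0.5):
    """Blend two score dicts; an item missing from one source scores 0 there."""
    if not 0.0 <= content_weight <= 1.0:
        raise ValueError("content_weight must be between 0 and 1")
    items = set(content_scores) | set(collab_scores)
    return {
        item: content_weight * content_scores.get(item, 0.0)
        + (1.0 - content_weight) * collab_scores.get(item, 0.0)
        for item in items
    }


# Illustrative scores only (assumed to be normalised to the 0-1 range).
content = {"Armageddon": 0.7, "Toy Story": 0.1}
collab = {"Armageddon": 0.9, "Thor": 0.8}

combined = hybrid_scores(content, collab, content_weight=0.5)
best = max(combined, key=combined.get)
print(best, round(combined[best], 2))
```

The weight lets the content-based score carry more of the decision when ratings are sparse, which is one way to soften the cold-start problem.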
  • 35. How are they presented? Highlights. More about this artist... Someone similar to you also liked this. The most popular in your group... Since you listened to this one, you may want this one... New releases. Listen to songs by similar artists. These two items go together...
  • 36. How are they evaluated? How do we know whether a recommendation is good? The data is usually split into training and test sets (80/20). Criteria used: prediction error (RMSE); ROC curve*, rank-utility, F-measure. *http://code.google.com/p/pyplotmining/
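The 80/20 split and the RMSE criterion can be sketched with the standard library alone. The helper names `train_test_split` and `rmse`, the fixed seed, and the sample ratings are assumptions made for illustration; RMSE is the square root of the mean squared difference between predicted and actual ratings.

```python
import random
from math import sqrt


def train_test_split(ratings, test_ratio=0.2, seed=42):
    """Shuffle the ratings and hold out `test_ratio` of them for testing."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - int(len(shuffled) * test_ratio)
    return shuffled[:cut], shuffled[cut:]


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return sqrt(squared / len(actual))


# Ten hypothetical (user, item, rating) triples.
ratings = [("u%d" % i, "item%d" % (i % 3), 1 + i % 5) for i in range(10)]
train, test = train_test_split(ratings)
print(len(train), len(test))

# A predictor that is off by one star on every rating has an RMSE of 1.0.
print(rmse([4, 2], [3, 3]))
```

A model would be fitted on `train` only, and its predictions for the held-out `test` ratings fed to `rmse`; lower is better.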
  • 38. Why mobile? More than 1 billion devices. More than 5 billion apps downloaded. A highlight in the mobile segment: http://foursquare.com, http://vimeo.com/29323612
  • 39. Mobile recommender systems. Temporal and spatial information must be taken into account. How do we determine the context the user is in? And how can ratings be captured on a limited screen?
  • 40. Architecture. Recommendations processed on the mobile device (not feasible today): everything is processed in the back end (server) and sent to the phone over the web. [Background image: a page from WSEAS Transactions on Computers, Issue 5, Volume 6, May 2009, with "Fig. 1. System Workflow" and the architecture diagram of a location-based mobile restaurant recommendation and navigation system.]
  • 41. Available information: location, tags, context
  • 42. Available information: implicit ratings
  • 43. One of the most popular mobile location systems. Check-ins: say where you are! Place recommendations.
  • 44. Conversational mobile virtual assistant. Already uses information from social networks. Restaurant recommendations.
  • 45. Google HotPot. Review repository. Place recommendations.
  • 47. My master's work. Paper: "Offering Products and Services Using Product Reviews from Social Networks in Mobile Decision Aid Systems", Marcel Caraciolo (mpc@cin.ufpe.br) and Germano Vasconcelos (gcv@cin.ufpe.br), Informatics Center, Federal University of Pernambuco, http://www.cin.ufpe.br/. Abstract (truncated on the slide): Recommendation engines provide information filtering functions and decision aids that have a great potential application in the mobile context. An aspect that hasn't been extensively exploited yet in the current recommendations is the improvement in the explanation of the recommendation. For instance, exploiting the service and product description and the opinion of users about the recommended products, where associated would bring a better explanation for the user. In this paper we will present the foundations for a mobile product/service recommender system which incorporate both structured (supplier driven) product descriptions and subject... Introduction (fragment): ...extremely used by users to give a more nuanced view about a product in order to make an informed decision [5]. Nonetheless, providing users with relevant recommendation information is a difficult task. Besides the technical components such as the user model representation and information filtering techniques to generate the recommendations, the information must be user-friendly visualized. This is a requirement specially to support the user in the purchase decision process, and to convince him about the utility of the...
  • 48. How can reviews from web service sources be aggregated in the mobile recommendation process? [Overlay on a page of the paper, section III (System Design), with Fig. 1 (Meta Recommender Architecture), Fig. 2 (User Reviews from Foursquare Social Network), and Fig. 3 (Mobile Recommender System Architecture).] Text recoverable from the page: We aim at integrating the previously mentioned hybrid product recommendation approach in a mobile application so the users could benefit from useful and logical recommendations. Moreover, we aim at providing a suited explanation for each recommendation to the user, since the current approaches only deliver product recommendations with an overall score without pointing out the appropriateness of such recommendation [13]. Besides the basic information provided by the suppliers, the system will deliver the explanation, providing relevant reviews of similar users; we believe that it will increase the confidence in the buying decision process and the product acceptance rate. Figure 1 shows an overview of our meta recommender approach. By combining the content-based filtering and the collaborative-based one into a hybrid recommender system, it would use the services/products repositories for the services to be recommended, and the review repository that contains the user opinions about those services. All this data can be extracted from data source containers in the web such as the location-based social network Foursquare [17], as displayed in Figure 2, and the location recommendation engine from Google: Google HotPot [18]. The content-based filtering approach will be used to filter the product/service repository, while the collaborative-based approach will derive the product review recommendations. In addition we will use text mining techniques to distinguish the polarity of the user review between positive and negative. This information, summarized, would contribute to the product recommendation score computation. The final product recommendation score is computed by integrating the result of both recommenders. For now, we are considering different options regarding this integration approach; one is the symbolic data analysis approach (SDA) [19], in which each product description and also user ratings/reviews are modeled as a set of modal symbolic descriptions that summarizes the information provided by the corresponding data sources. Since one of the goals of this work is to incorporate different data sources of user opinions and descriptions, we have adopted a meta recommendation architecture. By using a meta recommender architecture, the system would provide a personalized control over the generated recommendation list formed by the combination of rich data [16]. Application data information in our mobile recommender system can be divided into two parts: the product description (such as location, description and its attributes) and the user reviews or ratings provided by users (such as rating, comments, tags, etc.). Figure 3 gives the system's architecture and relative components. The user also can enter his preferences or give his feedback to some offered product recommendation.
  • 49. Text Mining
      A Lot!
      - Sentiment Analysis for Extracting the Polarity
      - Meta-Recommender Engines
      - Content-Based Filtering
      - kNN - Nearest Neighbors
      - Hybrid Meta Recommender
      - Symbolic Data Analysis (SDA)
      - Evaluation in Experimental DataSets
      - Architectural Proposal for Mobile Recommender
  • 50. Crab: A Python Framework for Building Recommendation Engines
      Marcel Caraciolo (@marcelcaraciolo), Ricardo Caspirro (@ricardocaspirro), Bruno Melo (@brunomelo)
  • 51. What is Crab?
      - A Python framework for building recommendation engines
      - A Scikit module for collaborative, content and hybrid filtering
      - Mahout alternative for Python developers :D
      - Open source under the BSD license
      https://github.com/muricoca/crab
  • 57. The current Crab
      >>> # load the dataset
      >>> from crab.datasets import load_sample_movies
      >>> data = load_sample_movies()
      >>> data
      {'DESCR': 'sample_movies data set was collected by the book called \nProgramming the Collective Intelligence by Toby Segaran \n\nNotes\n----- \nThis data set consists of\n\t* n ratings with (1-5) from n users to n movies.',
       'data': {1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
                2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
                3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
                4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
                5: {2: 4.5, 3: 1.0, 4: 4.0},
                6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
                7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0}},
       'item_ids': {1: 'Lady in the Water',
                    2: 'Snakes on a Planet',
                    3: 'You, Me and Dupree',
                    4: 'Superman Returns',
                    5: 'The Night Listener',
                    6: 'Just My Luck'},
       'user_ids': {1: 'Jack Matthews',
                    2: 'Mick LaSalle',
                    3: 'Claudia Puig',
                    4: 'Lisa Rose',
                    5: 'Toby',
                    6: 'Gene Seymour',
                    7: 'Michael Phillips'}}
  • 61. The current Crab
      >>> from crab.models import MatrixPreferenceDataModel
      >>> m = MatrixPreferenceDataModel(data.data)
      >>> print m
      MatrixPreferenceDataModel (7 by 6)
                1          2          3          4          5       ...
      1      3.000000   4.000000   3.500000   5.000000   3.000000
      2      3.000000   4.000000   2.000000   3.000000   3.000000
      3      ---        3.500000   2.500000   4.000000   4.500000
      4      2.500000   3.500000   2.500000   3.500000   3.000000
      5      ---        4.500000   1.000000   4.000000   ---
      6      3.000000   3.500000   3.500000   5.000000   3.000000
      7      2.500000   3.000000   ---        3.500000   4.000000
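The printed model is a user-by-item matrix in which unrated cells show up as `---`. As a rough illustration of that idea (not Crab's actual implementation), the sketch below turns a `user -> {item: rating}` dictionary into a dense row list with NaN for missing ratings. The helper names `to_matrix` and `render` are hypothetical, and only three users from the sample data are copied to keep it short.

```python
import math

# Ratings in the same user -> {item: rating} shape as the sample dataset
# (three users copied from it; the helper names below are hypothetical).
ratings = {
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}


def to_matrix(prefs):
    """Return (user_ids, item_ids, rows); missing ratings become NaN."""
    user_ids = sorted(prefs)
    item_ids = sorted({item for row in prefs.values() for item in row})
    rows = [[prefs[u].get(i, math.nan) for i in item_ids] for u in user_ids]
    return user_ids, item_ids, rows


def render(user_ids, item_ids, rows):
    """Format the matrix as text, printing '---' for missing cells."""
    lines = ["    " + " ".join(f"{i:>8}" for i in item_ids)]
    for user, row in zip(user_ids, rows):
        cells = ("     ---" if math.isnan(v) else f"{v:8.4f}" for v in row)
        lines.append(f"{user:>3} " + " ".join(cells))
    return "\n".join(lines)


user_ids, item_ids, rows = to_matrix(ratings)
print(render(user_ids, item_ids, rows))
```

A dense layout like this is what makes vectorised NumPy operations possible, at the cost of storing every empty cell.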
  • 70. The current Crab
      >>> # import pairwise distance
      >>> from crab.metrics.pairwise import euclidean_distances
      >>> # import similarity
      >>> from crab.similarities import UserSimilarity
      >>> similarity = UserSimilarity(m, euclidean_distances)
      >>> similarity[1]
      [(1, 1.0),
       (6, 0.66666666666666663),
       (4, 0.34054242658316669),
       (3, 0.32037724101704074),
       (7, 0.32037724101704074),
       (2, 0.2857142857142857),
       (5, 0.2674788903885893)]
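The similarity scores for user 1 can be reproduced by hand with the formula 1 / (1 + d), where d is the Euclidean distance over the items both users rated. The sketch below is a plain-Python illustration under that assumption, not Crab's code; `euclidean_similarity` and `most_similar` are hypothetical names, and returning 0.0 for users with no items in common is a choice made here.

```python
import math

# Sample ratings from the slides: user -> {item: rating}.
ratings = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}


def euclidean_similarity(prefs, a, b):
    """Similarity in (0, 1]: 1 / (1 + Euclidean distance on co-rated items)."""
    shared = prefs[a].keys() & prefs[b].keys()
    if not shared:
        return 0.0  # assumption of this sketch: nothing in common means no similarity
    distance = math.sqrt(sum((prefs[a][i] - prefs[b][i]) ** 2 for i in shared))
    return 1.0 / (1.0 + distance)


def most_similar(prefs, user):
    """All users (including `user` itself) sorted by decreasing similarity."""
    scores = [(other, euclidean_similarity(prefs, user, other)) for other in prefs]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


for other, score in most_similar(ratings, 1):
    print(other, round(score, 4))
```

For example, users 1 and 6 differ only on item 2 (4.0 vs 3.5), so d = 0.5 and the similarity is 1 / 1.5 = 0.667, which matches the value listed for user 6.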
  • 76. The current Crab
      >>> from crab.recommenders.knn import UserBasedRecommender
      >>> recsys = UserBasedRecommender(model=m, similarity=similarity,
      ...                               capper=True, with_preference=True)
      >>> recsys.recommend(5)
      array([[ 5.        ,  3.45712869],
             [ 1.        ,  2.78857832],
             [ 6.        ,  2.38193068]])
      >>> recsys.recommended_because(user_id=5, item_id=1)
      array([[ 2. ,  3. ],
             [ 1. ,  3. ],
             [ 6. ,  3. ],
             [ 7. ,  2.5],
             [ 4. ,  2.5]])
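To make the user-based step concrete, here is a simplified sketch of the idea: score each item the target user has not rated by the similarity-weighted average of the other users' ratings. It is an illustration only, with hypothetical function names, and it is not Crab's algorithm. For user 5 it yields the same ranking (items 5, 1, 6) and the same top score as the output above, but the other two scores come out slightly different, so Crab evidently applies further steps that are not reproduced here.

```python
import math

# Sample ratings from the slides: user -> {item: rating}.
ratings = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}


def euclidean_similarity(prefs, a, b):
    """1 / (1 + Euclidean distance over co-rated items); 0.0 if none are shared."""
    shared = prefs[a].keys() & prefs[b].keys()
    if not shared:
        return 0.0
    distance = math.sqrt(sum((prefs[a][i] - prefs[b][i]) ** 2 for i in shared))
    return 1.0 / (1.0 + distance)


def recommend(prefs, user):
    """Rank unseen items by the similarity-weighted mean of other users' ratings."""
    totals, weights = {}, {}
    for other, other_prefs in prefs.items():
        if other == user:
            continue
        sim = euclidean_similarity(prefs, user, other)
        if sim <= 0:
            continue
        for item, rating in other_prefs.items():
            if item in prefs[user]:
                continue  # only score items the user has not rated yet
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    scored = [(item, totals[item] / weights[item]) for item in totals]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for item, score in recommend(ratings, 5):
    print(item, round(score, 4))
```

Dividing by the sum of similarities keeps the estimate on the original 1-5 rating scale regardless of how many neighbours rated the item.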
  • 77. The current Crab
      - Collaborative Filtering algorithms: User-Based, Item-Based and Slope One
      - Evaluation of the Recommender Algorithms: Precision, Recall, F1-Score, RMSE
      - Precision-Recall Charts
  • 84. Evaluating your recommender
      >>> from crab.metrics.classes import CfEvaluator
      >>> evaluator = CfEvaluator()
      >>> evaluator.evaluate(recommender=recsys, metric='rmse')
      {'rmse': 0.69467177857026907}
      >>> evaluator.evaluate_on_split(recommender=recsys, at=2)
      ({'error': [{'mae': 0.345, 'nmae': 0.4567, 'rmse': 0.568},
                  {'mae': 0.456, 'nmae': 0.356778, 'rmse': 0.6788},
                  {'mae': 0.456, 'nmae': 0.356778, 'rmse': 0.6788}],
        'ir': [{'f1score': 0.456, 'precision': 0.78557, 'recall': 0.55677},
               {'f1score': 0.64567, 'precision': 0.67865, 'recall': 0.785955},
               {'f1score': 0.45070, 'precision': 0.74744, 'recall': 0.858585}]},
       {'final_score': {'avg': {'f1score': 0.495955, 'mae': 0.429292,
                                'nmae': 0.373739, 'precision': 0.63932929,
                                'recall': 0.729939393, 'rmse': 0.3466868},
                        'stdev': {'f1score': 0.09938383, 'mae': 0.0593933,
                                  'nmae': 0.03393939, 'precision': 0.0192929,
                                  'recall': 0.031293939, 'rmse': 0.234949494}}})
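For readers unfamiliar with the metric names in that output, the sketch below shows how MAE, RMSE, precision, recall and F1 are commonly defined. It is a generic illustration with made-up toy numbers and hypothetical function names; how `CfEvaluator` splits the data and averages the folds is not shown on the slides and is not modelled here.

```python
import math


def mae(pairs):
    """Mean absolute error over (predicted, actual) rating pairs."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


def rmse(pairs):
    """Root mean squared error over (predicted, actual) rating pairs."""
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


def precision_recall_f1(recommended, relevant):
    """Top-N quality: overlap between recommended items and relevant items."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Toy values, made up for illustration: (predicted, actual) ratings.
pairs = [(3.0, 3.5), (4.0, 4.0), (2.0, 3.0)]
print("mae ", mae(pairs))
print("rmse", rmse(pairs))

# Two items were recommended; three held-out items are considered relevant.
print(precision_recall_f1(recommended=[5, 1], relevant=[5, 6, 2]))
```

Error metrics such as RMSE judge the predicted rating values, while precision and recall judge only whether the right items made it into the top-N list.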
  • 85. Distributing the recommendation computations
      - Use Hadoop and MapReduce intensively
        Investigating the Yelp mrjob framework: https://github.com/pfig/mrjob
      - Develop the state-of-the-art techniques used for Netflix, as well as novel ones
        Matrix Factorization, Singular Value Decomposition (SVD), Boltzmann machines
      - The most commonly used technique is Slope One
        Simple algebra: predictors of the form y = x + b, that is, the line y = a*x + b with slope a = 1
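Since Slope One is named as the most commonly used technique, a small sketch may help. It shows the weighted Slope One scheme: for each pair of items, store the average rating difference b and the number of users behind it, then predict with y = x + b weighted by those counts. The toy ratings and all function names are made up for illustration; this is not Crab's implementation.

```python
# Toy ratings made up for illustration: user -> {item: rating}.
ratings = {
    "John": {"A": 5.0, "B": 3.0, "C": 2.0},
    "Mark": {"A": 3.0, "B": 4.0},
    "Lucy": {"B": 2.0, "C": 5.0},
}


def slope_one_deviations(prefs):
    """devs[j][i]: mean of (r_j - r_i) over users who rated both; freqs[j][i]: count."""
    devs, freqs = {}, {}
    for user_ratings in prefs.values():
        for j, rating_j in user_ratings.items():
            for i, rating_i in user_ratings.items():
                if i == j:
                    continue
                devs.setdefault(j, {}).setdefault(i, 0.0)
                freqs.setdefault(j, {}).setdefault(i, 0)
                devs[j][i] += rating_j - rating_i
                freqs[j][i] += 1
    for j in devs:
        for i in devs[j]:
            devs[j][i] /= freqs[j][i]
    return devs, freqs


def predict(prefs, devs, freqs, user, item):
    """Weighted Slope One: average of (r_i + dev[item][i]), weighted by support."""
    numerator = denominator = 0.0
    for i, rating_i in prefs[user].items():
        if i in devs.get(item, {}):
            support = freqs[item][i]
            numerator += (rating_i + devs[item][i]) * support
            denominator += support
    return numerator / denominator if denominator else None


devs, freqs = slope_one_deviations(ratings)
print(predict(ratings, devs, freqs, "Lucy", "A"))
```

Here item A is rated 0.5 higher than B on average (two users) and 3.0 higher than C (one user), so Lucy's prediction for A is (2 * (2.0 + 0.5) + 1 * (5.0 + 3.0)) / 3, roughly 4.33.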
  • 92. Cache/Parallelism with joblib
      http://packages.python.org/joblib/index.html

      import numpy as np
      from joblib import Memory

      # cachedir is left empty on the slide; newer joblib releases name this argument `location`
      memory = Memory(cachedir='', verbose=0)

      class UserSimilarity(BaseSimilarity):
          ...
          @memory.cache
          def get_similarity(self, source_id, target_id):
              source_preferences = self.model.preferences_from_user(source_id)
              target_preferences = self.model.preferences_from_user(target_id)
              ...
              # fall back to NaN when either user has no preferences
              return self.distance(source_preferences, target_preferences) \
                  if not source_preferences.shape[1] == 0 \
                      and not target_preferences.shape[1] == 0 else np.array([[np.nan]])

          def get_similarities(self, source_id):
              return [(other_id, self.get_similarity(source_id, other_id))
                      for other_id, v in self.model]

      >>> # Without memory.cache
      >>> timeit similarity.get_similarities('marcel_caraciolo')
      100 loops, best of 3: 978 ms per loop

      >>> # With memory.cache
      >>> timeit similarity.get_similarities('marcel_caraciolo')
      100 loops, best of 3: 434 ms per loop
  • 96. Distributed Computing with mrJob
      https://github.com/Yelp/mrjob

      """The classic MapReduce job: count the frequency of words."""
      from mrjob.job import MRJob
      import re

      # a word is a run of word characters or apostrophes
      WORD_RE = re.compile(r"[\w']+")

      class MRWordFreqCount(MRJob):

          def mapper(self, _, line):
              # emit (word, 1) for every word in the input line
              for word in WORD_RE.findall(line):
                  yield (word.lower(), 1)

          def reducer(self, word, counts):
              # sum the counts emitted for each word
              yield (word, sum(counts))

      if __name__ == '__main__':
          MRWordFreqCount.run()

      It supports Amazon's Elastic MapReduce (EMR) service, your own Hadoop cluster, or local runs (for testing).
  • 98. Distributed Computing with mrJob
      https://github.com/Yelp/mrjob
      Elsayed et al.: Pairwise Document Similarity in Large Collections with MapReduce
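The pairwise-similarity idea from Elsayed et al. can be sketched in two MapReduce passes: first build an inverted index (term to postings), then, for every term, emit the product of weights for each pair of documents sharing it and sum those products per pair. The code below simulates both passes in plain Python so it runs without Hadoop or mrjob; the `run` helper, the mapper names and the toy weights are all hypothetical, and it is a simplified reading of the paper rather than a reference implementation.

```python
from collections import defaultdict
from itertools import combinations

# Toy term weights made up for illustration: document -> {term: weight}.
docs = {
    "d1": {"python": 2, "crab": 1},
    "d2": {"python": 1, "hadoop": 3},
    "d3": {"crab": 2, "hadoop": 1},
}


def index_mapper(doc_id, terms):
    """Pass 1 map: emit one posting (doc_id, weight) per term."""
    for term, weight in terms.items():
        yield term, (doc_id, weight)


def pair_mapper(term, postings):
    """Pass 2 map: emit the weight product for every document pair sharing the term."""
    for (a, weight_a), (b, weight_b) in combinations(sorted(postings), 2):
        yield (a, b), weight_a * weight_b


def run(mapper, reducer, records):
    """Tiny in-memory stand-in for map, shuffle (group by key) and reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Pass 1: inverted index. Pass 2: sum the partial products per document pair.
postings = run(index_mapper, lambda term, values: values, docs.items())
similarity = run(pair_mapper, lambda pair, products: sum(products), postings.items())

for pair, score in sorted(similarity.items()):
    print(pair, score)
```

In a recommender the "documents" would be users (or items) and the "terms" the items they rated, so only pairs that actually share a rating ever generate work.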
  • 101. Future studies with Sparse Matrices
      Real datasets come with lots of empty values
      http://aimotion.blogspot.com/2011/05/evaluating-recommender-systems.html
      Solutions:
      - scipy.sparse package
      - Sharding operations
      - Matrix Factorization techniques (SVD)
      Crab implements a Matrix Factorization with Expectation Maximization algorithm (scikits.crab.svd package)
      [Figure: Apontador Reviews Dataset]
  • 105. Benchmarks
      Time elapsed to recommend 5 items:

      Dataset          Old Crab (pure Python w/ dicts)   New Crab (Python w/ SciPy and NumPy)
      MovieLens 100k   15.32 s                           9.56 s

      Dataset source: http://www.grouplens.org/node/73
  • 106. Why migrate?
      - Old Crab ran on pure Python only
        Recommendations demand heavy math calculations and lots of processing
      - Compatible with the NumPy and SciPy libraries
        High-standard, popular libraries optimized for scientific calculations in Python
      - Scikits projects are amazing!
        Active communities, scientific conferences and up-to-date projects (e.g. scikit-learn)
      - Make the Crab framework visible to the community
        Join the scientific researchers and machine learning developers around the globe coding with Python to help us in this project
      - Be Fast and Furious
  • 107. How are we working?
      Sprints, online discussions and issues
      https://github.com/muricoca/crab/wiki/UpcomingEvents
  • 108. How are we working?
      Our project's home page
      http://muricoca.github.com/crab
  • 109. Future Releases
      - Planned Release 0.1: Collaborative Filtering algorithms working, sample datasets to load and test
      - Planned Release 0.11: Evaluation of Recommendation Algorithms and Database Models support
      - Planned Release 0.12: Recommendation as Services with REST APIs
      ....
  • 110. Join us!
      1. Read our Wiki page: https://github.com/muricoca/crab/wiki/Developer-Resources
      2. Check out our current sprints and open issues: https://github.com/muricoca/crab/issues
      3. Forks and pull requests are mandatory
      4. Join us at irc.freenode.net #muricoca or on our discussion list (still in the works)
  • 112. Recommendation in social networks
      - This engine was integrated with the popular Brazilian social network AtéPassar
      - More than 70,000 registered students preparing for public examinations
      - Recommends study groups, friends, video classes, questions and concursos (public examinations)
      - More than 70,000 items available for recommendation
      - Written in Python using the open-source framework Crab
      - Framework available for building recommender systems (my contribution)
      - Running since January 2011
      - A questionnaire was carried out in March 2011 (chart: Liked / Not Liked, 77% and 23%)
      [Figure 3: AtePassar Recommender System]
  • 113. Collect discounts: www.favoritoz.com
  • 115. Social Recommendations
      1. The user logs in via Facebook
      2. The user visits an e-commerce site partnered with LikeStore
      3. The user gets personalized recommendations right on the landing page
      4. The user gets recommendations in the shopping cart
      5. The user gets recommendations on the product page
      Recommendation types:
      - Similar products
      - People who bought this also bought
      - Friends who liked/bought this
  • 116. Building the Social Genome
  • 117. Does anyone still doubt it?
      http://www.shopycat.com/
  • 119. Join us!
      1. Read our Wiki page: https://github.com/muricoca/crab/wiki/Developer-Resources
      2. Check out our current sprints and open issues: https://github.com/muricoca/crab/issues
      3. Forks and pull requests are mandatory
      4. Join us at irc.freenode.net #muricoca or on our discussion list at scikit-crab@googlegroups.com
  • 120. Tips for Recommendation Architecture
  • 124. Recommended Items
      - Toby Segaran, Programming Collective Intelligence, O'Reilly, 2007
      - Satnam Alag, Collective Intelligence in Action, Manning Publications, 2009
      - Sites such as TechCrunch and ReadWriteWeb
  • 125. Recommended Conferences
      - ACM RecSys
      - ICWSM: Weblogs and Social Media
      - WebKDD: Web Knowledge Discovery and Data Mining
      - WWW: The original WWW conference
      - SIGIR: Information Retrieval
      - ACM KDD: Knowledge Discovery and Data Mining
      - ICML: Machine Learning
  • 126. Where will you be in all of this?
      Source: Hunch.com
      Hunch: sold to eBay for $80M
      Thank you!!
  • 127. Recommender Systems
      Marcel Pinheiro Caraciolo
      marcel@orygens.com
      @marcelcaraciolo
      http://www.orygens.com
  • 130. Optimizations with Cython
      http://cython.org/
      Cython is a Python extension that lets developers annotate functions so they can be compiled to C.

      # setup.py
      from distutils.core import setup
      from distutils.extension import Extension
      from Cython.Distutils import build_ext

      # for notes on compiler flags see:
      # http://docs.python.org/install/index.html
      setup(
          cmdclass={'build_ext': build_ext},
          ext_modules=[Extension("spearman_correlation_cython",
                                 ["spearman_correlation_cython.pyx"])]
      )

      http://aimotion.blogspot.com/2011/09/high-performance-computation-with_17.html
  • 131. Cache/Parallelism with joblib
      http://packages.python.org/joblib/index.html
      Investigate how to use multiprocessing and parallel packages with the similarities computation

      from joblib import Parallel, delayed
      ...
      def get_similarities(self, source_id):
          other_ids = [other_id for other_id, v in self.model]
          # each delayed(...) call is one task; Parallel returns results in order
          scores = Parallel(n_jobs=3)(
              delayed(self.get_similarity)(source_id, other_id)
              for other_id in other_ids)
          return list(zip(other_ids, scores))