Building Recommender Systems with Python

In this tutorial I presented, using basic Python, the concepts behind building a collaborative-filtering recommender system.

PyCursos workshop:
Video at: https://plus.google.com/u/0/events/c3hqbk20omt3r5uoq13gpk82i9g



  1. Recommender Systems using Python. Marcel Pinheiro Caraciolo, marcel@pingmind.com, @marcelcaraciolo, http://www.pycursos.com
  2. Who is Marcel? Marcel Pinheiro Caraciolo (@marcelcaraciolo). Born in Sergipe, living in Recife. M.Sc. in Computer Science at CIn/UFPE, in the data mining area. Director of Research and Development at Atépassar. CEO and co-founder of PyCursos/Pingmind. Member and moderator of the Pernambuco Python Users Group (PUG-PE). Areas of interest: mobile computing and intelligent computing. My blogs: http://www.mobideia.com (on mobility, since 2006) and http://aimotion.blogspot.com (on A.I., since 2009).
  3. WEB
  4. WEB
  5. Web 1.0: a source of information. Web 2.0: a continuous flow of information. (VI Encontro do PUG-PE)
  6. Web sites, web applications, web services. Web 3.0: the Semantic Web and its users.
  7. Use collective information effectively in order to improve an application.
  8. Intelligence from mining data: a user influences other users through reviews, ratings, recommendations and blogs, and is in turn influenced by them in the same way.
  9. Collective intelligence around your application: information aggregation (lists, ratings, reviews, recommendations, voting, bookmarking, tagging, tag clouds, saving), user-generated content (blogs, wikis), plus search, natural language processing, clustering and predictive models, and harnessing external content.
  10. Web sites, web applications, web services; Web 3.0, the Semantic Web, its users: that was before...
  11. Nowadays
  12. we are overloaded with information
  13. much of it useless
  14. sometimes we search for this...
  15. and we find this!
  16. google?
  17. google? social media?
  18. meeee... google? social media?
  19. Recommender Systems
  20. "A lot of times, people don't know what they want until you show it to them." (Steve Jobs) "We are leaving the Information Age and entering the Recommendation Age." (Chris Anderson, from the book The Long Tail)
  21. Social recommendations, from family and friends: "What should I read?" "I think you should read these books." (Photo credits: Flickr, BlueAlgae and jefield)
  22. Recommendations by interaction. Input: rate a few books. "What should I read?" Output: "Books you may like are ..."
  23. Systems designed to suggest something of interest to me!
  24. Why recommendation?
  25. Netflix: 2/3 of rented movies come from recommendations. Google News: 38% of the most-clicked news items come from recommendations. Amazon: 38% of sales come from recommendations. Source: Celma & Lamere, ISMIR 2007.
  26. We are overloaded with information: thousands of new articles and posts every day; millions of songs, movies and books; thousands of offers and promotions.
  27. What can be recommended? Contacts in social networks, articles, products, advertising messages, e-learning courses, books, tags, songs, future girlfriends, clothes, movies, restaurants, TV shows, videos, papers, investment options, professionals, code modules.
  28. And how does recommendation work?
  29. What do recommender systems actually do? 1. Predict how much you may like a certain product or service. 2. Suggest a list of N items ordered by your interest. 3. Suggest a ranked list of N users for a product/service. 4. Explain to you why those items were recommended. 5. Adjust the prediction and the recommendation based on your feedback and that of others.
  30. Content-based filtering: items similar to the ones a user likes (e.g. Die Hard, Gone with the Wind, Toy Story, Armageddon) are recommended to that user.
  31. Problems with content-based filtering: 1. Limited data analysis: items and users are poorly described; it is worse for audio or images. 2. Specialized data: someone with no sushi experience never gets the best sushi restaurant in town. 3. Portfolio effect: just because I watched one Xuxa movie as a child, it keeps recommending all of them.
  32. Collaborative filtering: users similar to me (e.g. Rafael, Amanda) liked certain items (Thor, Gone with the Wind, Toy Story, Armageddon), so those items are recommended to me.
  33. Problems with collaborative filtering: 1. Scalability: Amazon with 5M users, 50K items, 1.4B ratings. 2. Sparse data: new users and items with no history. 3. Cold start: I rated only a single book on Amazon! 4. Popularity: everybody reads 'Harry Potter'. 5. Hacking: people who read 'Harry Potter' also read the Kama Sutra.
  34. Hybrid filtering: a combination of multiple methods (e.g. ontologies and symbolic data) over the same items and users.
  35. How are they presented? Highlights. More from this artist... Someone similar to you also liked this. The most popular in your group... Since you listened to this one, you may want this one... New releases. Listen to songs from similar artists. These two items go together...
  36. How are they evaluated? How do we know a recommendation is good? Usually the data is split into training/test sets (80/20). Criteria used: prediction error (RMSE); ROC curves*, rank utility, F-measure. *http://code.google.com/p/pyplotmining/
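As a concrete illustration of the RMSE criterion above (the rating values below are made up, and this helper is a sketch, not part of the original deck), prediction error over a held-out test split can be computed like this:

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and held-out ratings."""
    errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(errors) / len(errors))

# Hypothetical predictions vs. the ratings held out in the 20% test split
predicted = [3.5, 4.0, 2.0, 5.0]
actual = [3.0, 4.5, 2.0, 4.0]
print(round(rmse(predicted, actual), 4))  # → 0.6124
```

Lower is better; a model predicting every held-out rating exactly would score 0.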
  37. How to build a recommender system with Python? There is one option: Crab, a Python framework for building recommendation engines. https://github.com/python-recsys/crab
  38. ...but it is still in development!
  39. But here we will create one from zero with Python! Step one: find someone similar to you.
  40. Find someone similar to you: a movie ratings dataset.
  41. Mr. X gave Snow Crash a 4 and The Girl with the Dragon Tattoo a 2. What should we recommend to him?
  43. We find that Amy is the most similar among the candidates, so we can recommend a movie she rated 5 stars :)
  44. One more similarity metric: the Euclidean distance.
  47. Show me the code!

      >>> # Representing the data in Python
      >>> users = {
      ...     "Angelica": {"Blues Traveler": 3.5, "Broken Bells": 2.0, "Norah Jones": 4.5, "Phoenix": 5.0,
      ...                  "Slightly Stoopid": 1.5, "The Strokes": 2.5, "Vampire Weekend": 2.0},
      ...     "Bill": {"Blues Traveler": 2.0, "Broken Bells": 3.5, "Deadmau5": 4.0, "Phoenix": 2.0,
      ...              "Slightly Stoopid": 3.5, "Vampire Weekend": 3.0},
      ...     "Chan": {"Blues Traveler": 5.0, "Broken Bells": 1.0, "Deadmau5": 1.0, "Norah Jones": 3.0,
      ...              "Phoenix": 5.0, "Slightly Stoopid": 1.0},
      ...     "Dan": {"Blues Traveler": 3.0, "Broken Bells": 4.0, "Deadmau5": 4.5, "Phoenix": 3.0,
      ...             "Slightly Stoopid": 4.5, "The Strokes": 4.0, "Vampire Weekend": 2.0},
      ...     "Hailey": {"Broken Bells": 4.0, "Deadmau5": 1.0, "Norah Jones": 4.0, "The Strokes": 4.0,
      ...                "Vampire Weekend": 1.0},
      ...     "Jordyn": {"Broken Bells": 4.5, "Deadmau5": 4.0, "Norah Jones": 5.0, "Phoenix": 5.0,
      ...                "Slightly Stoopid": 4.5, "The Strokes": 4.0, "Vampire Weekend": 4.0},
      ...     "Sam": {"Blues Traveler": 5.0, "Broken Bells": 2.0, "Norah Jones": 3.0, "Phoenix": 5.0,
      ...             "Slightly Stoopid": 4.0, "The Strokes": 5.0},
      ...     "Veronica": {"Blues Traveler": 3.0, "Norah Jones": 5.0, "Phoenix": 4.0,
      ...                  "Slightly Stoopid": 2.5, "The Strokes": 3.0}}
  53. Coding the Manhattan distance:

      def manhattan(rating1, rating2):
          """Computes the Manhattan distance. Both rating1 and rating2
          are dictionaries of the form {'The Strokes': 3.0, 'Slightly Stoopid': 2.5}."""
          distance = 0
          commonRatings = False
          for key in rating1:
              if key in rating2:
                  distance += abs(rating1[key] - rating2[key])
                  commonRatings = True
          if commonRatings:
              return distance
          return -1  # indicates no ratings in common

      >>> manhattan(users["Hailey"], users["Veronica"])
      2.0
      >>> manhattan(users["Hailey"], users["Jordyn"])
      7.5
  58. Coding the Euclidean distance:

      def euclidean(rating1, rating2):
          """Computes the Euclidean distance. Both rating1 and rating2
          are dictionaries of the form {'The Strokes': 3.0, 'Slightly Stoopid': 2.5}."""
          distance = 0.0
          commonRatings = False
          for key in rating1:
              if key in rating2:
                  distance += pow(rating1[key] - rating2[key], 2.0)
                  commonRatings = True
          if commonRatings:
              return pow(distance, 0.5)
          return -1  # indicates no ratings in common

      >>> euclidean(users["Hailey"], users["Veronica"])
      1.4142135623730951
  63. Find the closest users:

      def computeNearestNeighbor(username, users):
          """Creates a sorted list of users based on their distance to username."""
          distances = []
          for user in users:
              if user != username:
                  distance = manhattan(users[user], users[username])
                  distances.append((distance, user))
          # sort based on distance -- closest first
          distances.sort()
          return distances

      >>> computeNearestNeighbor("Hailey", users)
      [(2.0, 'Veronica'), (4.0, 'Chan'), (4.0, 'Sam'), (4.5, 'Dan'), (5.0, 'Angelica'), (5.5, 'Bill'), (7.5, 'Jordyn')]
  68. The recommender:

      def recommend(username, users):
          """Give a list of recommendations."""
          # first find the nearest neighbor
          nearest = computeNearestNeighbor(username, users)[0][1]
          recommendations = []
          # now find bands the neighbor rated that this user didn't
          neighborRatings = users[nearest]
          userRatings = users[username]
          for artist in neighborRatings:
              if artist not in userRatings:
                  recommendations.append((artist, neighborRatings[artist]))
          recommendations.sort(key=lambda artistTuple: artistTuple[1], reverse=True)
          return recommendations

      >>> recommend("Hailey", users)
      [('Phoenix', 4.0), ('Blues Traveler', 3.0), ('Slightly Stoopid', 2.5)]
      >>> recommend("Chan", users)
      [('The Strokes', 4.0), ('Vampire Weekend', 1.0)]
      >>> recommend("Angelica", users)
      []
  73. Why did Angelica get an empty list? Her nearest neighbor, Veronica, has rated only artists that Angelica has already rated:

      >>> computeNearestNeighbor("Angelica", users)
      [(3.5, 'Veronica'), (4.5, 'Chan'), (5.0, 'Hailey'), (8.0, 'Sam'), (9.0, 'Bill'), (9.0, 'Dan'), (9.5, 'Jordyn')]
  76. But we need to improve it further...
  77. The Pearson correlation coefficient: a similarity measure that corrects for users who rate on different scales. Output ranges from -1 (perfect disagreement) to +1 (perfect agreement).
  83. Coding the Pearson correlation:

      from math import sqrt

      def pearson(rating1, rating2):
          sum_xy = sum_x = sum_y = sum_x2 = sum_y2 = 0
          n = 0
          for key in rating1:
              if key in rating2:
                  n += 1
                  x = rating1[key]
                  y = rating2[key]
                  sum_xy += x * y
                  sum_x += x
                  sum_y += y
                  sum_x2 += x ** 2
                  sum_y2 += y ** 2
          if n == 0:
              return 0  # no ratings in common
          # now compute the denominator
          denominator = sqrt(sum_x2 - (sum_x ** 2) / n) * sqrt(sum_y2 - (sum_y ** 2) / n)
          if denominator == 0:
              return 0
          return (sum_xy - (sum_x * sum_y) / n) / denominator

      >>> pearson(users["Angelica"], users["Bill"])
      -0.90405349906826993
      >>> pearson(users["Angelica"], users["Hailey"])
      0.42008402520840293
      >>> pearson(users["Angelica"], users["Jordyn"])
      0.76397486054754316
  88. Which one to choose?
  89. k-Nearest Neighbors (kNN): find the k users most similar to you.
  90. kNN: a challenge for you, try implementing it!
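One possible answer to the challenge, sketched here as an assumption rather than the deck's official solution: pool the ratings of the k nearest neighbors instead of a single one, weighting each neighbor's ratings by 1 / (1 + distance) so that closer users count more (knnRecommend is an illustrative name):

```python
def manhattan(rating1, rating2):
    """Manhattan distance over the items both users rated (-1 if none)."""
    common = [key for key in rating1 if key in rating2]
    if not common:
        return -1
    return sum(abs(rating1[key] - rating2[key]) for key in common)

def knnRecommend(username, users, k=3):
    """Recommend items rated by the k nearest neighbors but not by the user,
    scored by a similarity-weighted average of the neighbors' ratings."""
    neighbors = sorted((manhattan(users[other], users[username]), other)
                       for other in users if other != username)[:k]
    scores, weights = {}, {}
    for dist, name in neighbors:
        weight = 1.0 / (1.0 + dist)  # closer neighbors get more weight
        for item, rating in users[name].items():
            if item in users[username]:
                continue  # skip items the user already rated
            scores[item] = scores.get(item, 0.0) + weight * rating
            weights[item] = weights.get(item, 0.0) + weight
    ranked = [(scores[item] / weights[item], item) for item in scores]
    ranked.sort(reverse=True)
    return ranked
```

With k=1 this degenerates into the single-nearest-neighbor recommender shown earlier; larger k makes the output less sensitive to one odd neighbor.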
  91. Final code: recsys.py
  92. Item-based filtering
  93. Change people to items:

      {'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5},
       'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5}}

      becomes

      {'Lady in the Water': {'Lisa Rose': 2.5, 'Gene Seymour': 3.0},
       'Snakes on a Plane': {'Lisa Rose': 3.5, 'Gene Seymour': 3.5}}

      def transformPrefs(prefs):
          result = {}
          for person in prefs:
              for item in prefs[person]:
                  result.setdefault(item, {})
                  # flip item and person
                  result[item][person] = prefs[person][item]
          return result

      >>> movies = recommendations.transformPrefs(recommendations.critics)
      >>> recommendations.computeNearestNeighbor('Superman Returns', movies)
      [(0.657, 'You, Me and Dupree'), (0.487, 'Lady in the Water'), (0.111, 'Snakes on a Plane'), (-0.179, 'The Night Listener'), (-0.422, 'Just My Luck')]
  103. That was user-based filtering so far! It suffers from scalability and sparsity problems.
  104. Item-based filtering: find the k items most similar to the item.
  105. Find the closest items:

      def calculateSimilarItems(prefs):
          # Create a dictionary of items showing which other items
          # they are most similar to.
          result = {}
          # Invert the preference matrix to be item-centric
          itemPrefs = transformPrefs(prefs)
          c = 0
          for item in itemPrefs:
              # Status updates for large datasets
              c += 1
              if c % 100 == 0:
                  print("%d / %d" % (c, len(itemPrefs)))
              # Find the most similar items to this one
              result[item] = computeNearestNeighbor(item, itemPrefs)
          return result

      >>> itemsim = calculateSimilarItems(critics)
      >>> itemsim
      {'Lady in the Water': [(0.4, 'You, Me and Dupree'), (0.2857142857142857, 'The Night Listener'), ...],
       'Snakes on a Plane': [(0.2222222222222222, 'Lady in the Water'), (0.18181818181818182, 'The Night Listener'), ...], ...}
  108. The item-based recommender:

      def recommend(username, users, similarities, n=3):
          scores = {}
          totalSim = {}
          # get the ratings for this user
          userRatings = users[username]
          # loop over items rated by this user
          for item, rating in userRatings.items():
              # loop over items similar to this one
              for sim, other_item in similarities[item]:
                  # ignore items this user has already rated
                  if other_item in userRatings:
                      continue
                  # weighted sum of rating times similarity
                  scores.setdefault(other_item, 0.0)
                  scores[other_item] += sim * rating
                  # sum of all the similarities
                  totalSim.setdefault(other_item, 0.0)
                  totalSim[other_item] += sim
          # divide each total score by the total weighting to get an average
          recommendations = [(score / totalSim[item], item) for item, score in scores.items()]
          # sort with the highest score first and return the first n items
          recommendations.sort(reverse=True)
          return recommendations[:n]

      >>> recommend("Hailey", users, similarities, 3)
      [(3.1176470588235294, 'Slightly Stoopid'), (2.64476386036961, 'Blues Traveler'), (2.639207507820647, 'Phoenix')]
  112. Content-based filtering: recommend items similar to the ones the user likes (Die Hard, Gone with the Wind, Toy Story, Armageddon).
  113. (Excerpt from our paper: a hybrid meta recommender architecture for mobile applications that combines content-based filtering over product/service catalogues with collaborative filtering over user reviews, e.g. from Foursquare, and explains each recommendation to the user. Figures: Meta Recommender Architecture; User Reviews from the Foursquare Social Network; Mobile Recommender System Architecture.)
  114. Crab is already in production: the Brazilian social network Atepassar.com, an educational network with more than 60,000 students and 120 video classes. Running on Python + NumPy + SciPy and Django. Recommendation backend: MongoDB (mongoengine). Daily recommendations with explanations.
  115. Distributing the recommendation computations: use Hadoop and MapReduce intensively. We are investigating Yelp's mrjob framework (https://github.com/pfig/mrjob) and developing the state-of-the-art techniques used in the Netflix Prize: matrix factorization, singular value decomposition (SVD), Boltzmann machines. The most commonly used is the Slope One technique: simple algebra, with predictors of the form y = x + b (a line with slope one).
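To make the Slope One idea concrete, here is a minimal sketch, assuming the same dict-of-dicts rating format used earlier in the deck (function names are illustrative, not from any library): each item pair gets an average rating deviation b, and a prediction for an unseen item adds that deviation to a known rating x, i.e. y = x + b, weighted by how many users co-rated the pair.

```python
def slopeOneDeviations(users):
    """Average rating deviation dev[i][j] = mean(r_i - r_j) over all
    users who rated both item i and item j, plus the co-rating counts."""
    devs, freqs = {}, {}
    for ratings in users.values():
        for i, ri in ratings.items():
            for j, rj in ratings.items():
                if i == j:
                    continue
                devs.setdefault(i, {}).setdefault(j, 0.0)
                freqs.setdefault(i, {}).setdefault(j, 0)
                devs[i][j] += ri - rj
                freqs[i][j] += 1
    for i in devs:
        for j in devs[i]:
            devs[i][j] /= freqs[i][j]
    return devs, freqs

def slopeOnePredict(userRatings, item, devs, freqs):
    """Predict a rating for `item`: each known rating x contributes
    x + dev, weighted by the number of users behind that deviation."""
    num = den = 0.0
    for j, rj in userRatings.items():
        if item in devs and j in devs[item]:
            num += (rj + devs[item][j]) * freqs[item][j]
            den += freqs[item][j]
    return num / den if den else None
```

The deviation table is cheap to update incrementally, which is one reason Slope One suits a MapReduce-style precomputation step.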
  116. Distributed computing with mrjob: https://github.com/Yelp/mrjob
      It supports Amazon's Elastic MapReduce (EMR) service, your own Hadoop cluster, or local mode (for testing).

      """The classic MapReduce job: count the frequency of words."""
      from mrjob.job import MRJob
      import re

      WORD_RE = re.compile(r"[\w']+")

      class MRWordFreqCount(MRJob):

          def mapper(self, _, line):
              for word in WORD_RE.findall(line):
                  yield (word.lower(), 1)

          def reducer(self, word, counts):
              yield (word, sum(counts))

      if __name__ == '__main__':
          MRWordFreqCount.run()

      http://aimotion.blogspot.com/2012/08/introduction-to-recommendations-with.html
  120. Future studies with sparse matrices. Real datasets come with lots of empty values (e.g. the Apontador reviews dataset). Solutions: the scipy.sparse package, sharding operations, matrix factorization techniques (SVD). Crab implements matrix factorization with an Expectation-Maximization algorithm in the scikits.crab.svd package. http://aimotion.blogspot.com/2011/05/evaluating-recommender-systems.html
  123. How are we working? Our project's home page: http://github.com/python-recsys/crab
  124. Future releases. Planned release 0.1: collaborative filtering algorithms working, sample datasets to load and test. Planned release 0.11: sparse matrix and database model support. Planned release 0.12: Slope One algorithm, new factorization techniques implemented...
  125. Join us! 1. Read our wiki page: https://github.com/python-recsys/crab/wiki/Developer-Resources 2. Check out our current sprints and open issues: https://github.com/python-recsys/crab/issues 3. Contribute via forks and pull requests. 4. Join us at irc.freenode.net #muricoca or on our discussion list: http://groups.google.com/group/scikit-crab
  126. Building the Social Genome
  127. Collect discounts! http://aimotion.blogspot.com.br/2013/01/how-recommend-deals-on-line-for-coupon.html www.favoritoz.com
  128. Recommended books: Toby Segaran, Programming Collective Intelligence, O'Reilly, 2007; Satnam Alag, Collective Intelligence in Action, Manning Publications, 2009. Also: ACM RecSys, KDD, SBSC...
  129. Recommended conferences: ACM RecSys; ICWSM (Weblogs and Social Media); WebKDD (Web Knowledge Discovery and Data Mining); WWW (the original WWW conference); SIGIR (Information Retrieval); ACM KDD (Knowledge Discovery and Data Mining); ICML (Machine Learning).
  130. Recommender Systems using Python. Marcel Pinheiro Caraciolo, marcel@pingmind.com, @marcelcaraciolo, http://www.pycursos.com
