In this tutorial I used basic Python to present the concepts behind building a recommender system based on collaborative filtering.
Mutirão PyCursos:
Video at: https://plus.google.com/u/0/events/c3hqbk20omt3r5uoq13gpk82i9g
2. Who is Marcel?
Marcel Pinheiro Caraciolo - @marcelcaraciolo
From Sergipe, but a Recifense.
M.Sc. in Computer Science from CIN/UFPE, in the area of data mining
Director of Research and Development at Atépassar
CEO and co-founder of PyCursos / Pingmind
Member and moderator of the Python Users Group of Pernambuco (PUG-PE)
My areas of interest: mobile computing and intelligent computing
My blogs: http://www.mobideia.com (about mobility, since 2006)
http://aimotion.blogspot.com (about A.I., since 2009)
8. Intelligence from Mining Data
[Diagram: a network of connected users]
A user influences others through reviews, ratings, recommendations, and blogs
A user is influenced by others through reviews, ratings, recommendations, and blogs
9. Collective Intelligence in your application
[Diagram: sources of collective intelligence surrounding "Your application"]
Aggregation information: lists, ratings, user-generated content, reviews, blogs, recommendations, wikis, voting, bookmarking, search, tag cloud, tagging, saving
Natural Language Processing
Clustering and predictive models
Harness external content
10. WEB SITES
WEB APPLICATIONS
WEB SERVICES
3.0 SEMANTIC WEB
USERS
before...
VI PUG-PE Meeting
20. “A lot of times, people don’t know what they want until you show it to them.”
Steve Jobs
“We are leaving the Information age, and entering into the Recommendation age.”
Chris Anderson, from the book The Long Tail
21. Social Recommendations
Family/Friends
“What should I read?”
“I think you should read these books.”
Ref: Flickr-BlueAlgae
Ref: Flickr photostream: jefield
22. Recommendations by Interaction
Input: rate some books
“What should I read?”
Output: “Books you may like are …”
25. Netflix
- 2/3 of rented movies come from recommendations
Google News
- 38% of the most-clicked news stories come from recommendations
Amazon
- 38% of sales come from recommendations
Source: Celma & Lamere, ISMIR 2007
26. We are overloaded with information
Thousands of new articles and posts every day
Millions of songs, movies, and books
Thousands of offers and promotions
27. What can be recommended?
Social network contacts, articles, products, advertising messages, e-learning courses, books, tags, music, future girlfriends, clothes, movies, restaurants, TV shows, videos, papers, investment options, professionals, code modules
29. What do recommender systems actually do?
1. Predict how much you may like a given product or service
2. Suggest a list of N items ranked by your interest
3. Suggest a list of N users ranked for a given product or service
4. Explain why those items were recommended to you
5. Adjust predictions and recommendations based on your feedback and that of others
30. Content-Based Filtering
[Diagram: the user Marcel likes an item, and items similar to it are recommended to him. Items shown: Die Hard, Gone with the Wind, Toy Story, Armageddon]
31. Problems with content-based filtering
1. Limited data analysis
- Items and users are described in little detail. Worse for audio or images
2. Specialized data
- Someone with no experience of sushi is never recommended the best sushi restaurant in town
3. Portfolio effect
- Just because I watched one Xuxa movie as a child, the system recommends all of her movies
32. Collaborative Filtering
[Diagram: the users Marcel, Rafael, and Amanda are linked to the items they like (Thor, Armageddon, Gone with the Wind, Toy Story); items liked by similar users are recommended]
33. Problems with collaborative filtering
1. Scalability
- Amazon with 5M users, 50K items, 1.4B ratings
2. Sparse data
- New users and items have no history
3. Cold start
- I have rated only a single book on Amazon!
4. Popularity
- Everybody reads ‘Harry Potter’
5. Hacking
- Whoever reads ‘Harry Potter’ reads the Kama Sutra
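To make the scalability and sparsity points concrete, the short sketch below computes what fraction of the user-item matrix is actually filled for the Amazon figures quoted on the slide. The function name rating_density is an assumption made for this illustration.

```python
def rating_density(n_users, n_items, n_ratings):
    """Fraction of all possible (user, item) pairs that have a rating."""
    possible = n_users * n_items
    return n_ratings / possible


# Figures quoted on the slide: 5M users, 50K items, 1.4B ratings.
density = rating_density(5_000_000, 50_000, 1_400_000_000)
print(f"{density:.2%}")  # 0.56%
```

Under these figures fewer than 1 in 100 cells of the matrix holds a rating, which is why most pairs of users share few or no rated items.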
34. Hybrid Filtering
Combination of multiple methods
[Diagram: the users Marcel, Rafael, and Luciana and the items Die Hard, Gone with the Wind, Toy Story, and Armageddon, combined with ontologies and symbolic data]
35. How are they presented?
Highlights
More about this artist...
Someone similar to you also liked this
Most popular in your group...
Since you listened to this one, you may want this one...
New releases
Listen to music by similar artists.
These two items go together...
36. How are they evaluated?
How do we know whether a recommendation is good?
The data is usually split into training and test sets (80/20)
Criteria used:
- Prediction error: RMSE
- ROC curve*, rank-utility, F-measure
*http://code.google.com/p/pyplotmining/
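As a minimal sketch of the evaluation idea above, the code below splits a list of ratings 80/20 and computes the RMSE between predicted and actual ratings. The helper names train_test_split and rmse, the fixed seed, and the sample ratings are assumptions made for this illustration; they do not come from the talk.

```python
import math
import random


def train_test_split(ratings, test_fraction=0.2, seed=42):
    """Shuffle (user, item, rating) triples and split them, 80/20 by default."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]


def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical ratings, invented for illustration only.
ratings = [("u%d" % i, "item%d" % (i % 3), float(1 + i % 5)) for i in range(10)]
train, test = train_test_split(ratings)
print(len(train), len(test))         # 8 2
print(rmse([4.0, 2.0], [5.0, 2.0]))  # about 0.707
```

A lower RMSE on the held-out 20% means the predicted ratings are closer to the ones users actually gave.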
37. How to build a recommender system with Python?
There is one option...
Crab
A Python Framework for Building Recommendation Engines
https://github.com/python-recsys/crab
38. How to build a recommender system with Python?
There is one option... But it’s still in development!
39. But here we will create one from Zero with Python!
Find someone similar to you
[Diagram: the users Marcel, Rafael, and Amanda are linked to the items they like (Thor, Armageddon, Gone with the Wind, Toy Story); items liked by similar users are recommended]
40. But here we will create one from
Step Zero with Python!
Find someone similar to you
Movies Ratings Dataset
41. Movies Ratings Dataset
Mr. X gave Snow Crash a 4 and The Girl with the Dragon Tattoo a 2.
What should we recommend to him?
43. Find someone similar to you
We found that Amy is the most similar of the candidates,
so we can recommend a movie she rated 5 stars :)
44. But here we will create one from
Step Zero with Python!
One more similarity metric: Euclidean distance
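For reference, the two distances used in this part of the talk can be written as follows. The notation is an assumption made for this note: x and y are the ratings of two users, and I_{xy} is the set of items both of them rated.

```latex
d_{\text{Manhattan}}(x, y) = \sum_{i \in I_{xy}} \lvert x_i - y_i \rvert
\qquad
d_{\text{Euclidean}}(x, y) = \sqrt{\sum_{i \in I_{xy}} (x_i - y_i)^2}
```

For example, two users whose ratings differ by 1 on each of two shared items are at Manhattan distance 1 + 1 = 2 and at Euclidean distance \sqrt{1 + 1} \approx 1.414.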
54. Coding the Manhattan distance
def manhattan(rating1, rating2):
    """Computes the Manhattan distance. Both rating1 and rating2 are
    dictionaries of the form {'The Strokes': 3.0, 'Slightly Stoopid': 2.5}"""
    distance = 0
    commonRatings = False
    # only items rated by both users contribute to the distance
    for key in rating1:
        if key in rating2:
            distance += abs(rating1[key] - rating2[key])
            commonRatings = True
    if commonRatings:
        return distance
    else:
        return -1  # Indicates no ratings in common
57. Coding the Manhattan distance
>>> manhattan(users['Hailey'], users['Veronica'])
2.0
>>> manhattan(users['Hailey'], users['Jordyn'])
7.5
59. Coding the Euclidean distance
def euclidean(rating1, rating2):
    """Computes the euclidean distance.
    Both rating1 and rating2 are dictionaries of the form
    {'The Strokes': 3.0, 'Slightly Stoopid': 2.5}"""
    distance = 0.0
    commonRatings = False
    # sum the squared differences over items rated by both users
    for key in rating1:
        if key in rating2:
            distance += pow(abs(rating1[key] - rating2[key]), 2.0)
            commonRatings = True
    if commonRatings:
        return pow(distance, 1/2.0)
    else:
        return -1  # Indicates no ratings in common
62. Coding the Euclidean distance
>>> euclidean(users['Hailey'], users['Veronica'])
1.4142135623730951
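The Manhattan and Euclidean functions differ only in the exponent they apply, so both can be seen as special cases of the Minkowski distance. The sketch below is not part of the original slides; the function name minkowski and the sample ratings are assumptions made for this illustration.

```python
def minkowski(rating1, rating2, r):
    """Minkowski distance over the items both users rated.

    r=1 gives the Manhattan distance and r=2 the Euclidean distance.
    """
    common = [key for key in rating1 if key in rating2]
    if not common:
        return -1  # same "no ratings in common" convention as the slides
    total = sum(abs(rating1[key] - rating2[key]) ** r for key in common)
    return total ** (1.0 / r)


# Hypothetical ratings, invented for illustration only.
a = {'The Strokes': 4.0, 'Norah Jones': 4.0, 'Phoenix': 2.0}
b = {'The Strokes': 3.0, 'Norah Jones': 5.0}
print(minkowski(a, b, 1))  # 2.0
print(minkowski(a, b, 2))  # about 1.414
```

Larger values of r make a single large disagreement between two users weigh more heavily in the distance.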
64. Find the closest users
def computeNearestNeighbor(username, users):
    """creates a sorted list of users based on their distance to
    username"""
    distances = []
    for user in users:
        if user != username:
            distance = manhattan(users[user], users[username])
            distances.append((distance, user))
    # sort based on distance -- closest first
    distances.sort()
    return distances
67. Find the closest users
>>> computeNearestNeighbor('Hailey', users)
[(2.0, 'Veronica'), (4.0, 'Chan'), (4.0, 'Sam'), (4.5, 'Dan'), (5.0, 'Angelica'), (5.5, 'Bill'), (7.5, 'Jordyn')]
69. The recommender
def recommend(username, users):
    """Give list of recommendations"""
    # first find nearest neighbor
    nearest = computeNearestNeighbor(username, users)[0][1]
    recommendations = []
    # now find bands neighbor rated that user didn't
    neighborRatings = users[nearest]
    userRatings = users[username]
    for artist in neighborRatings:
        if artist not in userRatings:
            recommendations.append((artist, neighborRatings[artist]))
    # highest-rated artists first
    recommendations.sort(key=lambda artistTuple: artistTuple[1], reverse=True)
    return recommendations
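The pieces above can be tried end to end with a small sketch like the one below. It is a self-contained variant rather than the talk's exact code: the snake_case names and the ratings are invented for this illustration, and it skips neighbors with no items in common, because a distance of -1 would otherwise sort ahead of every real distance.

```python
def manhattan(rating1, rating2):
    """Manhattan distance over shared items, or -1 if none are shared."""
    shared = [key for key in rating1 if key in rating2]
    if not shared:
        return -1
    return sum(abs(rating1[key] - rating2[key]) for key in shared)


def nearest_neighbors(username, users):
    """Sorted (distance, user) pairs, skipping users with no shared items."""
    distances = []
    for other in users:
        if other == username:
            continue
        distance = manhattan(users[other], users[username])
        if distance >= 0:  # assumption: ignore the -1 "nothing in common" case
            distances.append((distance, other))
    distances.sort()
    return distances


def recommend(username, users):
    """Items the nearest neighbor rated that username has not, best first."""
    neighbors = nearest_neighbors(username, users)
    if not neighbors:
        return []
    nearest = neighbors[0][1]
    seen = users[username]
    picks = [(item, score) for item, score in users[nearest].items()
             if item not in seen]
    picks.sort(key=lambda pair: pair[1], reverse=True)
    return picks


# Hypothetical ratings, invented for this example.
users = {
    'Ana': {'Band A': 4.0, 'Band B': 2.0},
    'Bruno': {'Band A': 4.5, 'Band B': 2.0, 'Band C': 5.0, 'Band D': 3.0},
    'Carla': {'Band A': 1.0, 'Band B': 5.0, 'Band E': 4.0},
    'Davi': {'Band F': 5.0},
}
print(nearest_neighbors('Ana', users))  # [(0.5, 'Bruno'), (6.0, 'Carla')]
print(recommend('Ana', users))          # [('Band C', 5.0), ('Band D', 3.0)]
```

Bruno is closest to Ana, so the bands he rated and she has not are returned, ordered by his rating. Davi shares no bands with anyone and is left out of the neighbor list.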
94. Change people to items
{'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5},
'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5}}
to
{'Lady in the Water':{'Lisa Rose':2.5,'Gene Seymour':3.0},
'Snakes on a Plane':{'Lisa Rose':3.5,'Gene Seymour':3.5}} etc.
97. Change people to items
def transformPrefs(prefs):
    result = {}
    for person in prefs:
        for item in prefs[person]:
            result.setdefault(item, {})
            # Flip item and person
            result[item][person] = prefs[person][item]
    return result
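As a quick check of the transformation, the sketch below applies it to the two critics shown on the slide and then measures the distance between two movies instead of two people. The names transform_prefs and manhattan are assumptions for this self-contained example; only the two critics' ratings come from the slide.

```python
def transform_prefs(prefs):
    """Flip {person: {item: rating}} into {item: {person: rating}}."""
    result = {}
    for person, ratings in prefs.items():
        for item, rating in ratings.items():
            result.setdefault(item, {})[person] = rating
    return result


def manhattan(rating1, rating2):
    """Manhattan distance over shared keys, or -1 if none are shared."""
    shared = [key for key in rating1 if key in rating2]
    if not shared:
        return -1
    return sum(abs(rating1[key] - rating2[key]) for key in shared)


critics = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5},
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5},
}
movies = transform_prefs(critics)
print(movies['Lady in the Water'])  # {'Lisa Rose': 2.5, 'Gene Seymour': 3.0}
print(manhattan(movies['Lady in the Water'], movies['Snakes on a Plane']))  # 1.5
```

Because the flipped dictionary has the same shape as the original, the same distance and nearest-neighbor functions work on items without any change.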
>>> movies = recommendations.transformPrefs(recommendations.users)
>>> recommendations.computeNearestNeighbor('Blues Traveler', movies)
[(0.657, 'You, Me and Dupree'), (0.487, 'Lady in the Water'),
(0.111, 'Snakes on a Plane'), (-0.179, 'The Night Listener'),
(-0.422, 'Just My Luck')]
>>> movies = recommendations.transformPrefs(recommendations.critics)
>>> recommendations.computeNearestNeighbor('Superman Returns', movies)
[(0.657, 'You, Me and Dupree'), (0.487, 'Lady in the Water'),
(0.111, 'Snakes on a Plane'), (-0.179, 'The Night Listener'),
(-0.422, 'Just My Luck')]
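The inversion from people to items can be checked on the two-critic dictionary from the slide. This is a minimal, self-contained sketch. The variable names `critics` and `movies` simply mirror the session above, and the round-trip check at the end is an addition for illustration.

```python
def transformPrefs(prefs):
    """Invert {person: {item: rating}} into {item: {person: rating}}."""
    result = {}
    for person, ratings in prefs.items():
        for item, rating in ratings.items():
            result.setdefault(item, {})[person] = rating
    return result


critics = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5},
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5},
}

movies = transformPrefs(critics)
print(movies['Lady in the Water'])

# Applying the transform twice gives back the original dictionary.
print(transformPrefs(movies) == critics)
```

Because the function only swaps the two key levels, any user-based routine that takes a `{person: {item: rating}}` dictionary can be reused for items by passing it the inverted dictionary.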
106. Find the closest items
def calculateSimilarItems(prefs, sim_distance=manhattan):
    # Create a dictionary of items showing which other items they
    # are most similar to.
    result = {}
    # Invert the preference matrix to be item-centric
    itemPrefs = transformPrefs(prefs)
    c = 0
    for item in itemPrefs:
        # Status updates for large datasets
        c += 1
        if c % 100 == 0:
            print("%d / %d" % (c, len(itemPrefs)))
        # Find the most similar items to this one
        scores = computeNearestNeighbor(item, itemPrefs, distance=sim_distance)
        result[item] = scores
    return result
>>> itemsim=recommendations.calculateSimilarItems(users)
>>> itemsim
{'Lady in the Water': [(0.40000000000000002, 'You, Me and Dupree'), (0.2857142857142857, 'The
Night Listener'),... 'Snakes on a Plane': [(0.22222222222222221, 'Lady in the Water'),
(0.18181818181818182, 'The Night Listener'),... etc.
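The scores in the output above (0.4, 0.2857..., 0.2222..., 0.1818...) are consistent with a similarity of 1 / (1 + distance), for example 1 / (1 + 1.5) = 0.4, although the slides do not say so. The sketch below builds the item-to-item table under that assumption. `manhattanSimilarity` and the `toy_users` data are hypothetical names and values introduced for this example.

```python
def transformPrefs(prefs):
    """Invert {person: {item: rating}} into {item: {person: rating}}."""
    result = {}
    for person, ratings in prefs.items():
        for item, rating in ratings.items():
            result.setdefault(item, {})[person] = rating
    return result


def manhattanSimilarity(ratings1, ratings2):
    """Assumed conversion: 1 / (1 + Manhattan distance over shared raters)."""
    shared = [person for person in ratings1 if person in ratings2]
    if not shared:
        return 0.0
    distance = sum(abs(ratings1[p] - ratings2[p]) for p in shared)
    return 1.0 / (1.0 + distance)


def calculateSimilarItems(prefs, similarity=manhattanSimilarity):
    """Map every item to a list of (similarity, other_item), most similar first."""
    itemPrefs = transformPrefs(prefs)
    result = {}
    for item in itemPrefs:
        scores = [(similarity(itemPrefs[item], itemPrefs[other]), other)
                  for other in itemPrefs if other != item]
        scores.sort(reverse=True)
        result[item] = scores
    return result


# Hypothetical toy ratings, invented for this sketch.
toy_users = {
    "Ana": {"Film A": 4.0, "Film B": 3.0, "Film C": 1.0},
    "Bia": {"Film A": 5.0, "Film B": 4.5, "Film C": 2.0},
}

itemsim = calculateSimilarItems(toy_users)
print(itemsim["Film A"])
```

Film A and Film B differ by 1.0 and 0.5 across the two raters, so their distance is 1.5 and their similarity is 0.4. Film C is much further away and ranks last.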
111. The recommender
def recommend(username, users, similarities, n=3):
    scores = {}
    totalSim = {}
    # get the ratings for the user
    userRatings = users[username]
    # Loop over items rated by this user
    for item, rating in userRatings.items():
        # Loop over items similar to this one
        for sim, other_item in similarities[item]:
            # Ignore if this user has already rated this item
            if other_item in userRatings:
                continue
            # Weighted sum of rating times similarity
            scores.setdefault(other_item, 0.0)
            scores[other_item] += sim * rating
            # Sum of all the similarities
            totalSim.setdefault(other_item, 0.0)
            totalSim[other_item] += sim
    # Divide each total score by total weighting to get an average
    recommendations = [(score / totalSim[item], item)
                       for item, score in scores.items()]
    # finally sort by predicted score (first tuple element), best first
    recommendations.sort(key=lambda scoreTuple: scoreTuple[0], reverse=True)
    # Return the first n items
    return recommendations[:n]
>>> recommend('Hailey', users, similarities, 3)
[(3.1176470588235294, 'Slightly Stoopid'),
(2.64476386036961, 'Blues Traveler'), (2.639207507820647, 'Phoenix')]
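The prediction for each unrated item is a similarity-weighted average of the user's own ratings, which is easy to verify by hand on a small case. The sketch below is a self-contained variant of the item-based recommender. The `toy_users` and `toy_similarities` values are invented, and the guard against a zero similarity total is an addition that the slide code does not have.

```python
def recommend(username, users, similarities, n=3):
    """Item-based recommendations as (predicted_rating, item), best first."""
    scores, totalSim = {}, {}
    userRatings = users[username]
    for item, rating in userRatings.items():
        for sim, other_item in similarities.get(item, []):
            if other_item in userRatings:
                continue
            scores[other_item] = scores.get(other_item, 0.0) + sim * rating
            totalSim[other_item] = totalSim.get(other_item, 0.0) + sim
    # Skip items whose similarities sum to zero to avoid dividing by zero.
    ranked = [(scores[item] / totalSim[item], item)
              for item in scores if totalSim[item] > 0]
    ranked.sort(reverse=True)
    return ranked[:n]


# Hypothetical data: one user and a hand-written similarity table.
toy_users = {"Ana": {"Film A": 4.0, "Film B": 2.0}}
toy_similarities = {
    "Film A": [(0.5, "Film C"), (0.25, "Film B"), (0.25, "Film D")],
    "Film B": [(0.5, "Film D"), (0.25, "Film A"), (0.25, "Film C")],
}

print(recommend("Ana", toy_users, toy_similarities))
```

For Film C the computation is (0.5 * 4.0 + 0.25 * 2.0) / (0.5 + 0.25) = 2.5 / 0.75, about 3.33. For Film D it is (0.25 * 4.0 + 0.5 * 2.0) / 0.75, about 2.67, so Film C is ranked first.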
112. Content Based Filtering
[Diagram: user Marcel likes an item; items similar to it are recommended to him. Items shown: Duro de Matar, O Vento Levou, Toy Story, Armagedon.]
113. Crab is already in production
[Slide shows a page from a paper describing a hybrid meta recommender for mobile product and service recommendation. Recoverable text and figure captions follow.]
We aim at integrating the previously mentioned hybrid product recommendation approach in a mobile application so that users can benefit from useful and logical recommendations. Moreover, we aim at providing a suitable explanation for each recommendation, since current approaches only deliver product recommendations with an overall score, without pointing out the appropriateness of the recommendation [13]. Besides the basic information provided by the suppliers, the system will deliver the explanation by providing relevant reviews from similar users. We believe this will increase confidence in the buying decision process and the product acceptance rate.
The content-based filtering approach will be used to filter the product/service repository, while the collaborative-based approach will derive the product review recommendations. In addition, we will use text mining techniques to distinguish the polarity of each user review as positive or negative. This summarized information contributes to the computation of the product recommendation score. The final product recommendation score is computed by integrating the results of both recommenders.
Since one of the goals of this work is to incorporate different data sources of user opinions and descriptions, we have adopted a meta recommendation architecture. By using a meta recommender architecture, the system would provide personalized control over the generated recommendation list.
Fig. 1. Meta Recommender Architecture
Fig. 2. User Reviews from Foursquare Social Network
Fig. 3. Mobile Recommender System Architecture
114. Crab is already in production
Brazilian social network called Atepassar.com
Educational network with more than 60,000 students and 120 video classes
Running on Python + NumPy + SciPy and Django
Backend for recommendations: MongoDB with mongoengine
Daily recommendations with explanations
115. Distributing the recommendation computations
Use Hadoop and MapReduce intensively
Investigating the Yelp mrjob framework https://github.com/pfig/mrjob
Develop the techniques used by Netflix and other recent state-of-the-art methods:
Matrix Factorization, Singular Value Decomposition (SVD), Boltzmann machines
The most commonly used is the Slope One technique.
Simple algebra: a linear predictor y = a*x + b with the slope fixed at a = 1, that is, y = x + b
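Slope One fits in a few lines of plain Python. The sketch below uses the weighted variant, where each pairwise deviation is weighted by the number of users who rated both items. It is an illustration, not Crab's implementation. The function names `computeDeviations` and `slopeOnePredict` and the three users are hypothetical.

```python
def computeDeviations(users):
    """Return (dev, freq): dev[i][j] is the average of rating_i - rating_j
    over users who rated both items, freq[i][j] is how many such users exist."""
    deviations, frequencies = {}, {}
    for ratings in users.values():
        for item_i, rating_i in ratings.items():
            for item_j, rating_j in ratings.items():
                if item_i == item_j:
                    continue
                deviations.setdefault(item_i, {}).setdefault(item_j, 0.0)
                frequencies.setdefault(item_i, {}).setdefault(item_j, 0)
                deviations[item_i][item_j] += rating_i - rating_j
                frequencies[item_i][item_j] += 1
    for item_i in deviations:
        for item_j in deviations[item_i]:
            deviations[item_i][item_j] /= frequencies[item_i][item_j]
    return deviations, frequencies


def slopeOnePredict(userRatings, target, deviations, frequencies):
    """Predict a rating for target as a weighted mean of (rating + deviation)."""
    numerator, denominator = 0.0, 0
    for item, rating in userRatings.items():
        if item == target or item not in deviations.get(target, {}):
            continue
        count = frequencies[target][item]
        numerator += (rating + deviations[target][item]) * count
        denominator += count
    return numerator / denominator if denominator else None


# Hypothetical toy ratings, invented for this sketch.
toy_users = {
    "Ana": {"Item A": 5.0, "Item B": 3.0, "Item C": 2.0},
    "Bia": {"Item A": 3.0, "Item B": 4.0},
    "Caio": {"Item B": 2.0, "Item C": 5.0},
}

dev, freq = computeDeviations(toy_users)
print(slopeOnePredict(toy_users["Caio"], "Item A", dev, freq))
```

Here Item A is rated on average 0.5 above Item B (two co-raters) and 3.0 above Item C (one co-rater), so Caio's prediction for Item A is ((2.0 + 0.5) * 2 + (5.0 + 3.0) * 1) / 3 = 13 / 3, about 4.33.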
119. Distributed Computing with mrJob
https://github.com/Yelp/mrjob
"""The classic MapReduce job: count the frequency of words.
"""
from mrjob.job import MRJob
import re

# Match runs of word characters and apostrophes
WORD_RE = re.compile(r"[\w']+")


class MRWordFreqCount(MRJob):

    def mapper(self, _, line):
        # Emit (word, 1) for every word on the input line
        for word in WORD_RE.findall(line):
            yield (word.lower(), 1)

    def reducer(self, word, counts):
        # Sum the partial counts collected for each word
        yield (word, sum(counts))


if __name__ == '__main__':
    MRWordFreqCount.run()
mrjob supports Amazon's Elastic MapReduce (EMR) service, your own Hadoop cluster, or local runs (for testing).
http://aimotion.blogspot.com/2012/08/introduction-to-recommendations-with.html
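The same mapper and reducer pattern carries over to recommendation data. The sketch below counts how often two items are rated by the same user, a common first step for item-based similarity. It deliberately does not use mrjob: `runLocal` is a hypothetical stand-in that imitates the map, group-by-key and reduce phases in plain Python, and the ratings are invented.

```python
from collections import defaultdict
from itertools import combinations


def mapper(user, ratings):
    """Emit ((item_a, item_b), 1) for every pair of items one user rated."""
    for pair in combinations(sorted(ratings), 2):
        yield pair, 1


def reducer(pair, counts):
    """Sum the partial counts for one item pair."""
    yield pair, sum(counts)


def runLocal(users):
    """Stand-in for the framework: map, group values by key, then reduce."""
    grouped = defaultdict(list)
    for user, ratings in users.items():
        for key, value in mapper(user, ratings):
            grouped[key].append(value)
    results = {}
    for key in sorted(grouped):
        for out_key, out_value in reducer(key, grouped[key]):
            results[out_key] = out_value
    return results


# Hypothetical toy ratings, invented for this sketch.
toy_users = {
    "Ana": {"Film A": 4.0, "Film B": 3.0},
    "Bia": {"Film A": 5.0, "Film B": 4.5, "Film C": 2.0},
}

print(runLocal(toy_users))
```

Sorting the item names before pairing makes (Film A, Film B) and (Film B, Film A) land on the same key, so each pair reaches a single reducer call.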
122. Future studies with Sparse Matrices
Real datasets come with lots of empty values
http://aimotion.blogspot.com/2011/05/evaluating-recommender-systems.html
Solutions:
scipy.sparse package
Sharding operations
Matrix Factorization techniques (SVD)
Crab implements a Matrix Factorization with Expectation Maximization algorithm
scikits.crab.svd package
Apontador Reviews Dataset
123. How are we working?
Our Project’s Home Page
http://github.com/python-recsys/crab
124. Future Releases
Planned Release 0.1
Collaborative Filtering Algorithms working, sample datasets to load and test
Planned Release 0.11
Sparse matrices and database models support
Planned Release 0.12
Slope One algorithm, new factorization techniques implemented
....
125. Join us!
1. Read our Wiki Page
https://github.com/python-recsys/crab/wiki/Developer-Resources
2. Check out our current sprints and open issues
https://github.com/python-recsys/crab/issues
3. Forks and pull requests are mandatory
4. Join us at irc.freenode.net #muricoca or on our discussion list
http://groups.google.com/group/scikit-crab
130. Recommended Conferences
- ACM RecSys
- ICWSM: Weblogs and Social Media
- WebKDD: Web Knowledge Discovery and Data Mining
- WWW: The original WWW conference
- SIGIR: Information Retrieval
- ACM KDD: Knowledge Discovery and Data Mining
- ICML: Machine Learning