This document discusses collaborative filtering and data mining techniques for making recommendations, including user-user and item-item collaborative filtering methods such as nearest neighbor approaches, clustering, correlation analysis, and linear regression; it also covers higher-order recommendation models like Bayesian belief networks and association rule mining.
2. 04/18/13 Data Mining: Principles and Algorithms
3. Outline
Motivation
Systems in Action
A Conceptual Framework
User-User Methods
Item-Item Methods
Recent Advances and Open Problems
4. Motivation
User Perspective
Lots of online products, books, movies, etc.
Reduce my choices…please…
Manager Perspective
“If I have 3 million customers on the web, I should have 3 million stores on the web.”
CEO of Amazon.com [SCH01]
5. Example: Recommendation
Customers who bought this book also bought:
•Data Preparation for Data Mining: by Dorian Pyle (Author)
•The Elements of Statistical Learning: by T. Hastie, et al
•Data Mining: Introductory and Advanced Topics: by Margaret H. Dunham
•Mining the Web: Analysis of Hypertext and Semi Structured Data
7. Other Examples
Movielens: movies
Moviecritic: movies again
My launch: music
Gustos starrater: web pages
Jester: Jokes
TV Recommender: TV shows
Suggest 1.0 : different products
And much more…
8. How it Works?
Each user has a profile
Users rate items
Explicitly: score from 1..5
Implicitly: web usage mining
Time spent in viewing the item
Navigation path
Etc…
System does the rest, How?
This is what we will show today
9. Basic Approaches
Collaborative Filtering (CF)
Look at users' collective behavior
Look at the active user's history
Combine!
Content-based Filtering
Recommend items based on key-words
More appropriate for information retrieval
10. Collaborative Filtering: A Framework
Users: U = {u1, u2, …, ui, …, um}
Items: I = {i1, i2, …, ij, …, in}
Ratings form an m x n user-item matrix: a few cells are known (for example, row u2 holds 3, 1.5 and 2) and most, such as rij, are unknown
Unknown function f: U x I → R
The task:
Q1: Find unknown ratings?
Q2: Which items should we recommend to this user?
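To make the framework concrete, here is a minimal sketch of the partial function f: U x I → R as a nested dictionary. The user names, item names and scores are hypothetical toy data, not taken from the slides.

```python
# Hypothetical toy data: ratings[user][item] = score; missing pairs are unknown.
ratings = {
    "u1": {"i1": 4.0, "i3": 2.0},
    "u2": {"i1": 3.0, "i2": 1.5, "i4": 2.0},
    "u3": {"i2": 5.0},
}
items = ["i1", "i2", "i3", "i4"]


def unknown_pairs(ratings, items):
    """Return every (user, item) pair whose rating is not yet known (Q1)."""
    return [(u, i) for u in sorted(ratings) for i in items if i not in ratings[u]]


def density(ratings, items):
    """Fraction of the user-item matrix that is actually filled in."""
    known = sum(len(r) for r in ratings.values())
    return known / (len(ratings) * len(items))


print(unknown_pairs(ratings, items))
print(density(ratings, items))
```

Real rating matrices are usually far sparser than this toy one, which is why the later slides spend so much time on sparsity.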
12. User-User Similarity: Intuition
Target customer
Q1: How to measure similarity?
Q2: How to select neighbors?
Q3: How to combine?
13. How to Measure Similarity?
Pearson correlation coefficient (C: items commonly rated by users a and i)
$$w_p(a,i) = \frac{\sum_{j \in C} (r_{aj} - \bar{r}_a)(r_{ij} - \bar{r}_i)}{\sqrt{\sum_{j \in C} (r_{aj} - \bar{r}_a)^2 \, \sum_{j \in C} (r_{ij} - \bar{r}_i)^2}}$$
Cosine measure
Users are vectors in product-dimension space
$$w_c(a,i) = \frac{\vec{r}_a \cdot \vec{r}_i}{\|\vec{r}_a\|_2 \, \|\vec{r}_i\|_2}$$
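The two similarity measures can be sketched in a few lines of Python. This is an illustration under stated assumptions: users are dictionaries mapping item to rating, the Pearson means are taken over the commonly rated items only (one common convention; the slide does not say which mean it uses), and unrated items count as 0 in the cosine measure.

```python
import math


def pearson(ra, ri):
    """Pearson correlation between two users over their commonly rated items."""
    common = sorted(set(ra) & set(ri))
    if len(common) < 2:
        return 0.0  # too little overlap to measure anything
    mean_a = sum(ra[j] for j in common) / len(common)
    mean_i = sum(ri[j] for j in common) / len(common)
    num = sum((ra[j] - mean_a) * (ri[j] - mean_i) for j in common)
    den = math.sqrt(sum((ra[j] - mean_a) ** 2 for j in common)
                    * sum((ri[j] - mean_i) ** 2 for j in common))
    return num / den if den else 0.0


def cosine(ra, ri):
    """Cosine of the angle between two rating vectors; unrated items count as 0."""
    dot = sum(ra[j] * ri[j] for j in set(ra) & set(ri))
    norm_a = math.sqrt(sum(v * v for v in ra.values()))
    norm_i = math.sqrt(sum(v * v for v in ri.values()))
    return dot / (norm_a * norm_i) if norm_a and norm_i else 0.0


alice = {"a": 1, "b": 2, "c": 3}
bob = {"a": 2, "b": 4, "c": 6}      # same taste, more generous scale
carol = {"a": 3, "b": 2, "c": 1}    # opposite taste
print(pearson(alice, bob), pearson(alice, carol), cosine(alice, bob))
```

Pearson subtracts each user's mean, so it ignores differences in rating scale, while the cosine measure works on the raw vectors.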
14. Nearest Neighbor Approaches [SAR00a]
Offline phase:
Do nothing…just store transactions
Online phase:
Identify highly similar users to the active one
Best K ones
All with a measure greater than a threshold
Prediction
$$r_{aj} = \bar{r}_a + \frac{\sum_i w(a,i)\,(r_{ij} - \bar{r}_i)}{\sum_i w(a,i)}$$
$\bar{r}_a$: user a’s neutral rating
$(r_{ij} - \bar{r}_i)$: user i’s deviation
The fraction as a whole: user a’s estimated deviation
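The prediction rule can be sketched as follows. The weight function `agreement` is a hypothetical stand-in for w(a, i), included only to keep the example self-contained; a Pearson or cosine weight could be passed instead. Keeping only positively weighted neighbors is also an assumption of this sketch.

```python
def agreement(ra, ri):
    """Stand-in weight w(a, i): 1 / (1 + mean absolute difference on common items)."""
    common = set(ra) & set(ri)
    if not common:
        return 0.0
    return 1.0 / (1.0 + sum(abs(ra[j] - ri[j]) for j in common) / len(common))


def predict(active, others, item, similarity=agreement, k=2):
    """Predict the active user's rating of `item` from the k best neighbors."""
    mean_a = sum(active.values()) / len(active)  # user a's neutral rating
    # Only users who rated the target item can contribute a deviation.
    candidates = [(similarity(active, r), r) for r in others.values() if item in r]
    neighbors = sorted((c for c in candidates if c[0] > 0),
                       key=lambda c: c[0], reverse=True)[:k]
    norm = sum(w for w, _ in neighbors)
    if norm == 0:
        return mean_a  # no usable neighbor: fall back to the neutral rating
    # Weighted sum of each neighbor's deviation from their own mean.
    deviation = sum(w * (r[item] - sum(r.values()) / len(r)) for w, r in neighbors)
    return mean_a + deviation / norm


active = {"x": 4, "y": 2}
others = {
    "u1": {"x": 4, "y": 2, "z": 5},
    "u2": {"x": 1, "y": 5, "z": 1},
}
print(predict(active, others, "z", k=1))
```

With k=1 only u1 is used, so the estimate is user a's mean plus u1's deviation on item z.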
15. Horting Method [ AGG99 ]
K-NN is not transitive
Horting takes advantage of transitivity
Uses new similarity measure: Predictability
User i predicts user a if
They have rated sufficiently common items
There is an error-bounded linear
transformation from user i’s ratings to a’s ones
16. How Horting Works?
Offline phase: build neighborhood graph
Online phase: Compute raj
1- Identify users who predict ua
2- Identify users who rated j
3- Find shortest paths from group 1 to group 2
4- Backward propagation and averaging
- Better for sparse environments
- Not well evaluated
17. Clustering [BRE98]
Offline phase:
Build clusters: k-mean, k-medoid, etc.
Online phase:
Identify the nearest cluster to the active user
Prediction:
Use the center of the cluster (faster)
Weighted average between cluster members (slower but a little more accurate)
Weights depend on the active user
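A minimal sketch of the faster online variant, assuming the offline phase has already produced cluster centers. The centers here are hypothetical, dense dictionaries with a mean rating for every item, and distance is measured only on the items the active user has rated.

```python
def nearest_centroid(active, centroids):
    """Index of the center closest to the active user on the items they rated."""
    def dist(center):
        return sum((active[j] - center[j]) ** 2 for j in active)
    return min(range(len(centroids)), key=lambda idx: dist(centroids[idx]))


def predict_from_cluster(active, centroids, item):
    """Fast variant: read the prediction straight off the nearest cluster center."""
    return centroids[nearest_centroid(active, centroids)][item]


# Hypothetical centers produced offline (e.g. by k-means).
centroids = [
    {"a": 5.0, "b": 4.0, "c": 1.0},
    {"a": 1.0, "b": 2.0, "c": 5.0},
]
active = {"a": 4.5, "b": 4.0}
print(predict_from_cluster(active, centroids, "c"))
```

The online cost depends on the number of clusters rather than the number of users, which is where the scalability comes from.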
18. Clustering vs. k-NN Approaches
K-NN using Pearson measure is slower but more accurate
Clustering is more scalable
[Figure labels: Active user; Bad recommendations]
We can use soft clustering but will lose computational edge
19. Did We Answer the Questions?
Target customer
Q1: How to measure similarity?
Q2: How to select neighbors?
Q3: How to combine?
20. Are We Done?
Q1: How to measure similarity? Done... Really??
What about sparsity?
Not enough common items implies spurious neighbors and hence bad recommendations
Sparsity results from the poor representation!
Example from [SAR00P]:
U1 rates recycled letter pads High
U2 rates recycled memo pads High
Both of them like recycled office products
They are similar but the math won’t work for that: w_p(a, i) sums only over commonly rated items
By working at the right level of abstraction we can eliminate sparsity
21. The Power of Representation [UNG98]
Action Foreign Classic
Q1-B: How can we formalize this intuition?
22. How to Abstract?
Semi-manual Methods
Use product features
Cluster products first, then cluster users
Works only if we have descriptive features
Automatic Methods
Adjusted Product Taxonomy
Latent Semantic Indexing
23. Adjusted Product Taxonomy [CHO04]
• Input: product taxonomy
• Output: modified taxonomy with even distribution
24. Adjusted Product Taxonomy (2)
[Figure: number of transactions having each category, using the original taxonomy vs. using the adjusted taxonomy]
25. Latent Semantic Indexing [SAR00b]
$$R_{m \times n} = U_{m \times r} \, S_{r \times r} \, I'_{r \times n}$$
$$R_k = U_k \, S_k \, I_k' \qquad (U_k: m \times k, \quad S_k: k \times k, \quad I_k': k \times n)$$
The reconstructed matrix Rk = Uk.Sk.Ik’ is the closest rank-k matrix to the original matrix R.
• Captures latent associations
• Reduced space is less noisy
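One way to make "closest" precise, assuming it is meant in the Frobenius norm (the slide does not name the norm), is the Eckart-Young result. Writing s_1 ≥ s_2 ≥ … ≥ s_r for the diagonal entries of S, the truncation error is the size of the discarded singular values:

```latex
\| R - R_k \|_F
  \;=\; \min_{\operatorname{rank}(X) \le k} \| R - X \|_F
  \;=\; \sqrt{\sum_{t=k+1}^{r} s_t^{2}}
```

So if the singular values decay quickly, a small k keeps most of the rating matrix and drops what is mostly noise.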
26. Are We Done? (2)
Q2: How to select neighbors? Not adequately answered
We don’t expect to use the same neighbors
for all products
Neighbors should be product-category
specific
Q2-B. How can we determine whether or not a
user is relevant to a given product?
27. Selecting Relevant Instances [YU01]
[Figure: rating table with the target cell labeled “Predict this”]
Superman and Batman are correlated
Titanic and Batman are negatively correlated
“Dances with Wolves” has nothing to do with Batman’s rating
Karen is not a good instance to consider
How can we formalize this? Mutual Information
MI(X;Y) = H(X) - H(X|Y)
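As an illustration of the formula, mutual information can be estimated from paired observations using the identity H(X|Y) = H(X,Y) - H(Y). The binary like/dislike vectors below are hypothetical, not data from [YU01].

```python
import math
from collections import Counter


def entropy(values):
    """Empirical Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """MI(X;Y) = H(X) - H(X|Y), with H(X|Y) = H(X,Y) - H(Y)."""
    return entropy(xs) - (entropy(list(zip(xs, ys))) - entropy(ys))


batman = [1, 1, 0, 0]     # hypothetical like/dislike votes of four users
superman = [1, 1, 0, 0]   # agrees with batman
titanic = [0, 0, 1, 1]    # always disagrees with batman
wolves = [1, 0, 1, 0]     # unrelated to batman

print(mutual_information(batman, superman))
print(mutual_information(batman, titanic))
print(mutual_information(batman, wolves))
```

A negatively correlated item is as informative as a positively correlated one, while an unrelated item scores zero; this matches the intuition on the slide.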
28. Selecting Relevant Instances (2)
Offline phase:
Estimate mutual information between items
For each item:
Find users who rated it
Compute their strength (how many relevant items
they also rated)
Retain subset of them (10% works fine)
Online phase:
To predict the target item’s rating, run k-NN on
its reduced instance space
Better results with less data… quality, not quantity, is what matters
29. Are We Done? (3)
Q3:How to combine?
Weighted average
Discover association rules in neighbors’ transactions
[LEE01, WAN04]
For every x in this group:
like(x, Item1) ^ like(x, Item2) ⇒ like(x, Item3)
Use confidence and support to judge the quality of the
prediction
Prediction is done on the binary level (like, dislike)
Costly to run online
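A small sketch of how support and confidence could be computed for one such rule over the neighbors' transactions. The baskets are hypothetical sets of liked items, one per neighbor.

```python
def support_and_confidence(baskets, antecedent, consequent):
    """Support and confidence of the rule: antecedent items => consequent item.

    baskets: list of sets of liked items, one set per neighbor.
    """
    antecedent = set(antecedent)
    both = antecedent | {consequent}
    n_ante = sum(1 for b in baskets if antecedent <= b)
    n_both = sum(1 for b in baskets if both <= b)
    support = n_both / len(baskets)                      # how often the rule fires
    confidence = n_both / n_ante if n_ante else 0.0      # how often it is right
    return support, confidence


# Hypothetical neighbors of the active user.
baskets = [
    {"Item1", "Item2", "Item3"},
    {"Item1", "Item2"},
    {"Item1", "Item2", "Item3"},
    {"Item3"},
]
print(support_and_confidence(baskets, {"Item1", "Item2"}, "Item3"))
```

Here the rule holds for two of the three neighbors who like both Item1 and Item2, so its confidence is 2/3 at a support of 0.5.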
30. User-User Methods Evaluation
Achieve good quality in practice
The more processing we push offline, the better
the method scales
However:
User preference is dynamic
High update frequency of offline-calculated
information
No recommendation for new users
We don’t know much about them yet
32. Item-Item Similarity: The Intuition
Search for similarities among items
All computations can be done offline
Item-Item similarity is more stable than user-user
similarity
No need for frequent updates
First Order Models
Correlation Analysis
Linear Regression
Higher Order Models
Belief Network
Association Rule Mining
33. Correlation-based Methods [SAR01]
Same as in user-user similarity but on item vectors
Pearson correlation coefficient
Look for users who rated both items (U_ij: users who rated both item i and item j)
$$s_{ij} = \frac{\sum_{u \in U_{ij}} (r_{uj} - \bar{r}_j)(r_{ui} - \bar{r}_i)}{\sqrt{\sum_{u \in U_{ij}} (r_{uj} - \bar{r}_j)^2} \, \sqrt{\sum_{u \in U_{ij}} (r_{ui} - \bar{r}_i)^2}}$$
34. Correlation-based Methods (2)
Offline phase:
Calculate n(n-1) similarity measures
For each item
Determine its k-most similar items
Online phase:
Predict rating for a given user-item pair as a weighted sum over similar items that he rated
$$r_{aj} = \frac{\sum_{i \in \text{similar items}} s_{ij} \, r_{ai}}{\sum_{i \in \text{similar items}} s_{ij}}$$
[Figure: user ua’s row with ratings 2, 3, ?, 4, where ? is the rating to predict for item j]
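The online step can be sketched as a weighted average, assuming the similarities s_ij for the target item were computed offline. The ratings 2, 3 and 4 echo the figure; the similarity values are hypothetical.

```python
def predict_item_based(user_ratings, similar_items):
    """Weighted sum over the similar items that the user has rated.

    user_ratings: dict item -> rating given by the active user.
    similar_items: dict item i -> s_ij for the items most similar to target j.
    """
    rated = [(s, user_ratings[i]) for i, s in similar_items.items() if i in user_ratings]
    norm = sum(s for s, _ in rated)
    if norm == 0:
        return None  # the user rated none of the similar items
    return sum(s * r for s, r in rated) / norm


user_a = {"i1": 2, "i2": 3, "i4": 4}               # the target item is unrated
similar_to_j = {"i1": 0.5, "i2": 0.25, "i4": 0.25}  # hypothetical s_ij values
print(predict_item_based(user_a, similar_to_j))
```

Only a lookup and a short sum happen online, which is why the item-item approach responds quickly.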
35. Regression Based Methods [VUC00]
Offline phase:
Fit n(n-1) linear regressions
f_ij(x) is a linear transformation of a user’s rating on item i to his rating on item j
Online phase:
Same as previous method
The weights are inversely proportional to the regression error rates
$$r_{aj} = \frac{\sum_{i \in \text{items rated by } a} w_{ij} \, f_{ij}(r_{ai})}{\sum_{i \in \text{items rated by } a} w_{ij}}$$
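A sketch of both phases for a single target item j, under stated assumptions: each f_ij is an ordinary least-squares line, the error rate is taken to be the mean squared error of that fit, and a small `eps` (a hypothetical safeguard) avoids division by zero. This illustrates the idea and is not the exact procedure of [VUC00].

```python
def fit_line(xs, ys):
    """Least-squares fit y = slope * x + intercept; returns (slope, intercept, mse)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    mse = sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / n
    return slope, intercept, mse


def predict_regression(user_ratings, models, eps=1e-6):
    """Combine the per-item predictions f_ij(r_ai), weighting each by 1 / error.

    models: dict item i -> (slope, intercept, mse) of f_ij for the target item j.
    """
    num = den = 0.0
    for i, (slope, intercept, mse) in models.items():
        if i in user_ratings:
            w = 1.0 / (mse + eps)  # weight inversely proportional to the error
            num += w * (slope * user_ratings[i] + intercept)
            den += w
    return num / den if den else None


# Offline: hypothetical co-ratings of item i1 (xs) and the target item j (ys).
models = {"i1": fit_line([1, 2, 3], [2, 4, 6])}
# Online: the active user rated i1 with 2.
print(predict_regression({"i1": 2}, models))
```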
36. Higher Order Models
Previous approaches used the Naïve Bayes
assumption
Item effects on a given one are independent
Not always true
Higher order models can do better
Belief Network
Association Rule Mining
37. Bayesian Belief Network: introduction
Bayesian belief network allows a subset of the variables to
be conditionally independent
A graphical model of causal relationships
Represents dependency among the variables
Gives a specification of joint probability distribution
Nodes: random variables
Links: dependency
Example: X and Y are the parents of Z, and Y is the parent of P
No direct dependency between Z and P
Has no loops or cycles
38. Bayesian Belief Network: An Example
Nodes: FamilyHistory (FH), Smoker (S), LungCancer (LC), Emphysema, PositiveXRay, Dyspnea
The conditional probability table for the variable LungCancer:
        (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
LC       0.8       0.5        0.7        0.1
~LC      0.2       0.5        0.3        0.9
Shows the conditional probability for each possible combination of its parents
Bayesian Belief Networks:
$$P(z_1, \ldots, z_n) = \prod_{i=1}^{n} P(z_i \mid \mathrm{Parents}(Z_i))$$
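The factorization can be tried out on the three nodes whose table is given. The CPT values come from the slide; the priors for FamilyHistory and Smoker are assumptions invented here so the example is computable, since the slides do not give them.

```python
# CPT from the slide: P(LC = true | FH, S), keyed by (family_history, smoker).
P_LC = {(True, True): 0.8, (True, False): 0.5, (False, True): 0.7, (False, False): 0.1}

# Assumed priors, NOT from the slides.
P_FH = 0.1
P_S = 0.3


def joint(fh, s, lc):
    """P(FH=fh, S=s, LC=lc) = P(fh) * P(s) * P(lc | fh, s).

    FH and S have no parents in this fragment, so their factors are plain priors.
    """
    p_fh = P_FH if fh else 1 - P_FH
    p_s = P_S if s else 1 - P_S
    p_lc = P_LC[(fh, s)] if lc else 1 - P_LC[(fh, s)]
    return p_fh * p_s * p_lc


def marginal_lc():
    """P(LC = true), obtained by summing the joint over both parents."""
    return sum(joint(fh, s, True) for fh in (True, False) for s in (True, False))


print(joint(True, True, True))
print(marginal_lc())
```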
39. Belief Network for CF [BRE98]
Every item is a node
Binary rating (like, dislike)
Learn offline a belief network over the training data
CPT table at each node is represented as a decision tree
Use greedy algorithms to determine the best network
structure
Use probabilistic inference for online prediction
40. Belief Network for CF: An Example
[Figure: CPT shown as a probability decision tree for the random variable “Melrose Place” in the movie domain; node labels: Friends, B.H, M.P]
41. Association Rule Mining
Offline processing:
Work on the binary level (like, dislike)
View each user as a market basket containing the items that user liked
Discover association rules between items
Online processing:
Match the items that the active user likes against the rules’ left-hand sides
Recommend the rules’ consequents based on support and confidence
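The online matching step can be sketched as below, assuming the rules were mined offline and stored as (antecedent, consequent, support, confidence) tuples. Ranking by confidence first and support second is an assumption of this sketch; the slides only say both are used.

```python
def recommend(liked, rules, top_n=3):
    """Recommend consequents of rules whose left-hand side the user satisfies.

    liked: set of items the active user likes.
    rules: list of (antecedent_set, consequent, support, confidence).
    """
    best = {}
    for antecedent, consequent, support, confidence in rules:
        # A rule fires if every antecedent item is liked and the consequent is new.
        if antecedent <= liked and consequent not in liked:
            score = (confidence, support)
            if score > best.get(consequent, (0.0, 0.0)):
                best[consequent] = score
    ranked = sorted(best, key=lambda item: best[item], reverse=True)
    return ranked[:top_n]


# Hypothetical rules discovered offline.
rules = [
    ({"A", "B"}, "C", 0.3, 0.9),
    ({"A"}, "D", 0.5, 0.6),
    ({"E"}, "F", 0.4, 0.95),   # does not fire: the user does not like E
    ({"A"}, "B", 0.5, 0.8),    # fires, but B is already liked
]
print(recommend({"A", "B"}, rules))
```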
42. Association Rule Mining : Problems
High support threshold leads to low coverage and may
eliminate important, but infrequent items from
consideration
Low support thresholds result in very large model sizes,
computationally expensive offline pattern discovery phase
and slower online matching phase
Solution:
Adaptive Association Rule Mining
43. Adaptive Association Rule Mining [LIN01]
Given:
transaction dataset
target item
desired range for number of rules
specified minimum confidence
Find: set S of association rules for target item such that
number of rules in S is in given range
rules in S satisfy minimum confidence constraint
rules in S have higher support than rules not in S that satisfy above constraints
[Figure labels: desired number of rules, minConfidence, minSupport]
44. Adaptive Association Rule Mining (2)
Discover rules with one item in the head
Like(x, item1) ^ Like(x, item2) ⇒ Like(x, target)
The miner discovers association rules iteratively
(for each target item) until the desired number of
rules is extracted
Support is adjusted per-item
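The idea of adjusting support per target item can be illustrated with a simple bisection on the support threshold. This is a stand-in written for illustration, not the mining procedure of [LIN01]: the brute-force `mine_rules` helper, the `max_len` limit on rule bodies and the `steps` bound are all assumptions.

```python
from itertools import combinations


def mine_rules(baskets, target, min_support, min_confidence, max_len=2):
    """Brute-force rules body => target with bodies of at most max_len items."""
    items = sorted(set().union(*baskets) - {target})
    rules = []
    for size in range(1, max_len + 1):
        for body in combinations(items, size):
            body = frozenset(body)
            n_body = sum(1 for b in baskets if body <= b)
            n_rule = sum(1 for b in baskets if body <= b and target in b)
            if n_body == 0:
                continue
            support, confidence = n_rule / len(baskets), n_rule / n_body
            if support >= min_support and confidence >= min_confidence:
                rules.append((body, support, confidence))
    return rules


def adaptive_rules(baskets, target, min_rules, max_rules, min_confidence, steps=20):
    """Bisect the support threshold until the rule count falls in the given range."""
    lo, hi = 0.0, 1.0
    rules = []
    for _ in range(steps):
        mid = (lo + hi) / 2
        rules = mine_rules(baskets, target, mid, min_confidence)
        if len(rules) > max_rules:
            lo = mid   # too many rules: raise the support threshold
        elif len(rules) < min_rules:
            hi = mid   # too few rules: lower the support threshold
        else:
            break
    return rules


# Hypothetical baskets of liked items; "t" is the target item.
baskets = [{"a", "b", "t"}, {"a", "t"}, {"a", "b"}, {"b", "t"}, {"c"}]
found = adaptive_rules(baskets, "t", min_rules=1, max_rules=2, min_confidence=0.5)
print(sorted(sorted(body) for body, _, _ in found))
```

Because the threshold is searched separately for each target item, infrequent but important items are not cut off by a single global support value.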
45. Item-Item Methods: Why It Works?
Like(x, Book1) ^ Like(x, Book2) ⇒ Like(x, Book3) (support: the book gang)
Like(x, Movie1) ⇒ Like(x, Movie2) (support: the movie gang)
We use the right neighbors for each item without discovering the groups themselves, thus eliminating costly online matching
In general better quality than user-user methods and better response time [LIN03]
46. Recent Work and Open Problems
Order-based methods
Ordering items is more informative than rating them
[KAM03] developed k-o’means to work on orders
Preference-based methods
Total ordering of items is not feasible
Work on partial orders (preferences) [COH99]
Integrating background knowledge
User demographic information, item-features, etc..
Modeling time
Sequential patterns
47. References (1)
Charu C. Aggarwal, Joel L. Wolf, Kun-Lung Wu, Philip S. Yu: Horting Hatches
an Egg: A New Graph-Theoretic Approach to Collaborative Filtering. KDD
1999: 201-212
J. Breese, D. Heckerman, C. Kadie Empirical Analysis of Predictive
Algorithms for Collaborative Filtering. In Proc. 14th Conf. Uncertainty in
Artificial Intelligence, Madison, July 1998.
Yoon Ho Cho and Jae Kyeong Kim: Application of Web usage mining and
product taxonomy to collaborative recommendations in e-commerce. Expert
Systems with Applications, 26(2), 2004
William W. Cohen, Robert E. Schapire, and Yoram Singer. Learning to order
things. In Advances in Neural Information Processing Systems 10, Denver, CO, 1997
Jiawei Han, Fall 2003 online course notes available at:
http://www-courses.cs.uiuc.edu/~cs397han/slides/05.ppt
Toshihiro Kamishima: Nantonac collaborative filtering: recommendation
based on order responses. KDD 2003: 583-588
Lee, C.-H, Kim, Y.-H., Rhee, P.-K. Web personalization expert with combining
collaborative filtering and association rule mining technique. Expert Systems
with Applications, v 21, n 3, October, 2001, p 131-137
48. References (2)
W. Lin, 2001P, online presentation available at:
http://www.wiwi.hu-berlin.de/~myra/WEBKDD2000/WEBKDD2000_ARCHIVE/LinAlvarezRuiz_WebKDD2000.ppt
Weiyang Lin, Sergio A. Alvarez, and Carolina Ruiz. Efficient adaptive-support
association rule mining for recommender systems. Data Mining and
Knowledge Discovery, 6:83--105, 2002
G. Linden, B. Smith, and J. York, "Amazon.com Recommendations: Item-to-item
collaborative filtering", IEEE Internet Computing, Vol. 7, No. 1, pp. 76-80,
Jan. 2003.
Badrul M. Sarwar, George Karypis, Joseph A. Konstan, John Riedl: Analysis
of recommendation algorithms for e-commerce. ACM Conf. Electronic
Commerce 2000: 158-167
B. Sarwar, G. Karypis, J. Konstan, and J. Riedl: Application of dimensionality
reduction in recommender systems--a case study. In ACM WebKDD 2000
Web Mining for E-Commerce Workshop, 2000.
B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl. Item-based
collaborative filtering recommendation algorithms. WWW’01
49. References (3)
B. Sarwar, 2000P, online presentation available at:
http://www.wiwi.hu-berlin.de/~myra/WEBKDD2000/WEBKDD2000_ARCHIVE/badrul.ppt
J. Ben Schafer, Joseph A. Konstan, John Riedl: E-Commerce
Recommendation Applications. Data Mining and Knowledge Discovery 5(1/2):
115-153, 2001
L.H. Ungar and D.P. Foster: Clustering Methods for Collaborative Filtering,
AAAI Workshop on Recommendation Systems, 1998.
Yi-Fan Wang, Yu-Liang Chuang, Mei-Hua Hsu and Huan-Chao Keh: A
personalized recommender system for the cosmetic business. Expert
Systems with Applications, v 26, n 3, April, 2004 Pages 427-434
S. Vucetic and Z. Obradovic. A regression-based approach for scaling-up
personalized recommender systems in e-commerce. In ACM WebKDD 2000
Web Mining for E-Commerce Workshop, 2000.
Kai Yu, Xiaowei Xu, Martin Ester, and Hans-Peter Kriegel: Selecting relevant
instances for efficient accurate collaborative filtering. In Proceedings of the
10th CIKM, pages 239--246. ACM Press, 2001.
Cheng Zhai, Spring 2003 online course notes available at:
http://sifaka.cs.uiuc.edu/course/2003-497CXZ/loc/cf.ppt