1. (In)Formal Concept Analysis
Prof. Kim Mens
Louvain School of Engineering
Department of Computing Science and Engineering
UCL
http://www.info.ucl.ac.be/~km
Lecture Notes : (In)Formal concept analysis 30/03/2009
2. Information explosion
IT advances in the last decade(s)
have caused an explosion of
information
E.g., growth of the internet
This leads to a real information
overload
How to manage (i.e., search,
structure) all that information?
3. (Small) example
Dataset = someone’s iTunes™ music library
≥ 5000 songs each having a name, artist, rating, genre, ...
How to manage all that data?
How to find a song we like?
Can we find interesting relations between songs?
which songs are similar?
in what way are they similar?
4. Managing large data sets
In general ...
Given a data set with many thousands of elements:
web pages, text or other documents
data libraries (books, songs, movies, ...)
customer and personnel databases
having certain properties:
indexes, relevant keywords, tags, genres, ...
5. Managing large data sets
Given a data set with many thousands of elements:
web pages, text or other documents
data libraries (books, songs, movies, ...)
customer and personnel databases
Questions
1. How to find relevant data?
2. How to discover (hidden) structure in that data?
6. Running example (revisited)
Songs Genres
7. Running example
How to manage all those songs?
Three concrete applications
1. Finding a song based on its genre
2. Discover (un)expected dependencies between genres
• as well as absence of expected dependencies
3. Discover a user profile
• e.g., what songs does she like most
8. A Google-like search engine for songs
Galois
Genres (separated by spaces) : party dance
search
9. A Google-like search engine for songs
Galois
Genres (separated by spaces) : party dance
search
Search results [ party, dance ] :
• Technologic – Daft Punk
• Whole Again - Atomic Kitten
• Get Busy - Sean Paul
• Destination Calabria – Alex Gaudino
• Rock This Party – Bob Sinclar
Refine search by genres :
• [ slow, pop, soft ]
• [ beat ]
Remove genres from search :
• party
• dance
10. A Google-like search engine for songs
Galois
Genres (separated by spaces) : party dance beat
search
Search results [ party, dance, beat ] :
• Technologic – Daft Punk
• Get Busy - Sean Paul
• Destination Calabria – Alex Gaudino
• Rock This Party – Bob Sinclar
Refine search by genres :
• [ electronic ]
• [ reggae ]
Remove genres from search :
• party
• dance
• beat
11. A Google-like search engine for songs
Galois
Genres (separated by spaces) : party dance beat reggae
search
Search results [ party, dance, beat, reggae ] :
• Get Busy - Sean Paul
Remove genres from search :
• party
• dance
• beat
• reggae
12. A Google-like search engine for songs
Galois
Genres (separated by spaces) : party reggae
search
Search results [ party, reggae ] :
• Could You Be Loved - Bob Marley
• Get Busy - Sean Paul
Refine search by genres :
• [ dance, beat ]
Remove genres from search :
• party
• reggae
13. Running example
How to manage all those songs?
Three concrete applications:
1. Finding a song based on its genre
2. Discover (un)expected dependencies between genres
• as well as absence of expected dependencies
3. Discover a user profile
• what songs does she like most
14. Structure of the world-wide music scene
?
http://sixdegrees.hu/last.fm/index.html
15. Dependencies between genres
New wave is so eighties
Dance music is party music
Disco is from the seventies
Classical music and slows are for softies
...
16. Running example
How to manage all those songs?
Three concrete applications:
1. Finding a song based on its genre
2. Discover (un)expected dependencies between genres
• as well as absence of expected dependencies
3. Discover a user profile
• what songs does she like most
17. Discover a user profile
To analyse the preferred genres of a user
for match-making or publicity purposes
For example,
most of her music is party music
she likes background music
she’s not such a big fan of classical
none of her music is hard
18. Running example
How to manage all those songs?
Three concrete applications:
1. Finding a song based on its genre
2. Discover (un)expected dependencies between genres
• as well as absence of expected dependencies
3. Discover a user profile
• what songs does she like most
So how can we achieve all this?
19. Formal concept analysis...
... may be of help
FCA was invented around 1980 in Darmstadt as a
mathematical theory for modelling the notion of a “concept”
Since then it has been applied in many domains of computer
science dealing with large data sets
data analysis
knowledge discovery
software engineering
20. Data set is represented by a “context”
Objects Attributes
Relation
21. Formal concept analysis...
Starts from a context C
a set G of objects
a set M of attributes
a relation I between the objects and the attributes
Determines concepts
Maximal groups of objects and attributes
Plus hierarchical relationships
Subset relationships between those groups
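A minimal sketch of such a context in Python, to make G, M and I concrete. The song-to-genre assignments are illustrative assumptions, and the helper names common_attributes and common_objects are hypothetical, not part of any FCA library.

```python
# Toy context (illustrative data, not the lecture's actual music library).
SONGS = {"Technologic", "Get Busy", "Whole Again", "A Forest"}                # G: objects
GENRES = {"party", "dance", "beat", "reggae", "pop", "new wave", "eighties"}  # M: attributes
HAS_GENRE = {                                                                 # I: subset of G x M
    ("Technologic", "party"), ("Technologic", "dance"), ("Technologic", "beat"),
    ("Get Busy", "party"), ("Get Busy", "dance"), ("Get Busy", "beat"),
    ("Get Busy", "reggae"),
    ("Whole Again", "party"), ("Whole Again", "dance"), ("Whole Again", "pop"),
    ("A Forest", "party"), ("A Forest", "new wave"), ("A Forest", "eighties"),
}


def common_attributes(objects):
    """All attributes in M that every object in `objects` has."""
    return {m for m in GENRES if all((g, m) in HAS_GENRE for g in objects)}


def common_objects(attributes):
    """All objects in G that have every attribute in `attributes`."""
    return {g for g in SONGS if all((g, m) in HAS_GENRE for m in attributes)}


print(sorted(common_attributes({"Technologic", "Get Busy"})))  # ['beat', 'dance', 'party']
print(sorted(common_objects({"party", "dance"})))  # ['Get Busy', 'Technologic', 'Whole Again']
```

These two mappings are all that is needed to talk about concepts: a concept is a pair of an object set and an attribute set that each mapping sends to the other.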
22. A “concept” represents a group of related objects and attributes
Intuitively, we look for maximal
“rectangles” in the binary relation I
23. A concept
Objects Attributes
New Wave Party Eighties
Alice - Sisters of Mercy
A Forest - The Cure
A concept is a maximal group of objects and attributes
Group:
Every object of the concept has those attributes
Every attribute of the concept holds for those objects
Maximal
No other object (outside the concept) has those same attributes
No other attribute (outside the concept) is shared by these objects
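The "group" and "maximal" conditions can be checked mechanically. The sketch below does so on a small made-up context; the genre tags per song are assumptions for illustration, and is_concept, shared_genres and songs_with are hypothetical helper names.

```python
# Toy context: song -> set of genres (illustrative tags only).
CONTEXT = {
    "Alice": {"new wave", "party", "eighties"},
    "A Forest": {"new wave", "party", "eighties"},
    "Technologic": {"party", "electronic", "dance", "beat"},
    "Get Busy": {"party", "dance", "beat", "reggae"},
}
ALL_GENRES = set().union(*CONTEXT.values())


def shared_genres(songs):
    """Genres that every song in `songs` has."""
    return {m for m in ALL_GENRES if all(m in CONTEXT[s] for s in songs)}


def songs_with(genres):
    """Songs that have every genre in `genres`."""
    return {s for s, has in CONTEXT.items() if set(genres) <= has}


def is_concept(songs, genres):
    """True if (songs, genres) is a maximal group: neither side can be extended."""
    songs, genres = set(songs), set(genres)
    return shared_genres(songs) == genres and songs_with(genres) == songs


# A maximal group: both new wave songs with all the genres they share.
print(is_concept({"Alice", "A Forest"}, {"new wave", "party", "eighties"}))  # True
# Not maximal: "A Forest" has the same genres but is left out.
print(is_concept({"Alice"}, {"new wave", "party", "eighties"}))  # False
# Not maximal: both songs also share "beat".
print(is_concept({"Technologic", "Get Busy"}, {"party", "dance"}))  # False
```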
24. Not a concept
[Figure annotations: “Need to include this as well” and “Need to include this”]
Intuitively, we look for maximal
“rectangles” in the binary relation I
25. Formal concept analysis...
... derives hierarchies of concepts from data sets
It generates and visualizes hierarchies of concepts on a
mathematically founded basis
FCA
26. A concept hierarchy
27. Yet another concept
28. A subconcept
The blue concept is a subconcept of the green one.
29. A subconcept
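Superconcept
Attributes: Party, Dance, Beat
Objects: Technologic, In Da Club, Get Busy, Destination Calabria, Rock This Party
Subconcept
Attributes: Party, Electronic, Dance, Beat
Objects: Technologic, Destination Calabria, Rock This Party
The second concept is a subconcept of the first:
its objects are a subset of the first concept's objects
the first concept's attributes are a subset of its attributes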
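The subconcept test is just two subset checks running in opposite directions: fewer objects, more attributes. A minimal sketch using the two concepts from this slide; the function name is_subconcept and the tuple representation are assumptions for illustration.

```python
# Concepts as (objects, attributes) pairs, using the data shown on this slide.
PARTY_DANCE_BEAT = (
    frozenset({"Technologic", "In Da Club", "Get Busy",
               "Destination Calabria", "Rock This Party"}),
    frozenset({"party", "dance", "beat"}),
)
PARTY_ELECTRONIC_DANCE_BEAT = (
    frozenset({"Technologic", "Destination Calabria", "Rock This Party"}),
    frozenset({"party", "electronic", "dance", "beat"}),
)


def is_subconcept(lower, upper):
    """True if `lower` has a subset of the objects and a superset of the attributes."""
    lower_objects, lower_attributes = lower
    upper_objects, upper_attributes = upper
    return lower_objects <= upper_objects and upper_attributes <= lower_attributes


print(is_subconcept(PARTY_ELECTRONIC_DANCE_BEAT, PARTY_DANCE_BEAT))  # True
print(is_subconcept(PARTY_DANCE_BEAT, PARTY_ELECTRONIC_DANCE_BEAT))  # False
```

For genuine concepts of one context the two subset checks are equivalent, so testing either one would be enough; checking both simply makes the duality explicit.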
30. Concept lattice
For a given context, the set of all formal concepts, together
with the partial order “is subconcept of”, forms a lattice
A lattice is a mathematical structure with some interesting
properties:
for any two concepts there is always a greatest common
subconcept and a least common superconcept
it is even a complete lattice, i.e. every set of concepts has a
greatest common subconcept and a least common superconcept;
in particular, a unique top element (the least common
superconcept of all concepts) and a unique bottom element
(their greatest common subconcept) exist
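To see these properties on a small case, the sketch below enumerates all concepts of a toy context by brute force (closing every subset of attributes) and computes greatest common subconcepts and least common superconcepts. The data and all function names are assumptions for illustration; practical FCA tools use far more efficient algorithms than this exhaustive loop.

```python
from itertools import combinations

# Toy context: song -> set of genres (illustrative tags only).
CONTEXT = {
    "Technologic": {"party", "dance", "beat", "electronic"},
    "Get Busy": {"party", "dance", "beat", "reggae"},
    "Could You Be Loved": {"party", "reggae"},
    "A Forest": {"party", "new wave", "eighties"},
}
ALL_GENRES = set().union(*CONTEXT.values())


def extent(attributes):
    """Objects that have every attribute in `attributes`."""
    return frozenset(g for g, has in CONTEXT.items() if attributes <= has)


def intent(objects):
    """Attributes shared by every object in `objects`."""
    return frozenset(m for m in ALL_GENRES if all(m in CONTEXT[g] for g in objects))


def all_concepts():
    """Brute force: close every attribute subset (exponential, toy sizes only)."""
    concepts = set()
    for size in range(len(ALL_GENRES) + 1):
        for subset in combinations(sorted(ALL_GENRES), size):
            objects = extent(set(subset))
            concepts.add((objects, intent(objects)))
    return concepts


def meet(a, b):
    """Greatest common subconcept: intersect the object sets."""
    objects = a[0] & b[0]
    return (objects, intent(objects))


def join(a, b):
    """Least common superconcept: intersect the attribute sets."""
    attributes = a[1] & b[1]
    return (extent(attributes), attributes)


concepts = all_concepts()
top = max(concepts, key=lambda c: len(c[0]))     # concept with all objects
bottom = min(concepts, key=lambda c: len(c[0]))  # concept with fewest objects
print(len(concepts))       # 7
print(sorted(top[1]))      # ['party']
print(sorted(bottom[0]))   # []
```

In this toy context the top concept contains every song and only the genre "party"; the bottom concept contains every genre and no song.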
31. A concept lattice
32. A concept lattice
Attributes: Party, Dance, Beat
Objects: Technologic, In Da Club, Get Busy, Destination Calabria, Rock This Party
Attributes: Party, Electronic, Dance, Beat (is subconcept of the concept above)
Objects: Technologic, Destination Calabria, Rock This Party
Attributes: New Wave, Party, Eighties
Objects: Alice - Sisters of Mercy, A Forest - The Cure
34. A concept lattice in detail
(sparse labelling)
35. Running example revisited
How does it work?
How to manage all those songs?
Three concrete applications
1. Finding a song based on its genre
2. Discover (un)expected dependencies between genres
• as well as absence of expected dependencies
3. Discover a user profile
• e.g., what songs does she like most
36. A Google-like search engine for songs
Galois
Genres (separated by spaces) : party dance
search
Search results [ party, dance ] :
• Technologic – Daft Punk
• Whole Again - Atomic Kitten
• Get Busy - Sean Paul
• Destination Calabria – Alex Gaudino
• Rock This Party – Bob Sinclar
Refine search by genres :
• [ slow, pop, soft ]
• [ beat ]
Remove genres from search :
• party
• dance
37. A Google-like search engine for songs
Galois
Genres (separated by spaces) : party dance beat
search
Search results [ party, dance, beat ] :
• Technologic – Daft Punk
• Get Busy - Sean Paul
• Destination Calabria – Alex Gaudino
• Rock This Party – Bob Sinclar
Refine search by genres :
• [ electronic ]
• [ reggae ]
Remove genres from search :
• party
• dance
• beat
38. A Google-like search engine for songs
Galois
Genres (separated by spaces) : party dance beat reggae
search
Search results [ party, dance, beat, reggae ] :
• Get Busy - Sean Paul
Remove genres from search :
• party
• dance
• beat
• reggae
39. A Google-like search engine for songs
Galois
Genres (separated by spaces) : party reggae
search
Search results [ party, reggae ] :
• Could You Be Loved - Bob Marley
• Get Busy - Sean Paul
Refine search by genres :
• [ dance, beat ]
Remove genres from search :
• party
• reggae
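One way such a search could work on top of a concept lattice: the results are the objects of the concept generated by the query, and the "refine" suggestions are the extra genres of its direct subconcepts. The sketch below is an assumption about the mechanism, not the actual Galois implementation; the library data is modelled on the search slides and the function names are hypothetical.

```python
# Toy library modelled on the search slides (genre tags are assumptions).
LIBRARY = {
    "Technologic": {"party", "dance", "beat", "electronic"},
    "Whole Again": {"party", "dance", "slow", "pop", "soft"},
    "Get Busy": {"party", "dance", "beat", "reggae"},
    "Destination Calabria": {"party", "dance", "beat", "electronic"},
    "Rock This Party": {"party", "dance", "beat", "electronic"},
    "Could You Be Loved": {"party", "reggae"},
}
ALL_GENRES = set().union(*LIBRARY.values())


def search(query):
    """Songs that have every genre in the query."""
    wanted = set(query)
    return sorted(s for s, genres in LIBRARY.items() if wanted <= genres)


def shared_genres(songs):
    """Genres common to all given songs (all genres if no song is given)."""
    return {m for m in ALL_GENRES if all(m in LIBRARY[s] for s in songs)}


def refinements(query):
    """Genre groups leading to the largest strictly smaller result sets."""
    base = set(search(query))
    known = shared_genres(base)  # genres already implied by the query
    candidates = {}
    for genre in ALL_GENRES - known:
        narrower = frozenset(s for s in base if genre in LIBRARY[s])
        if narrower:  # skip genres that would leave no results
            candidates[narrower] = shared_genres(narrower) - known
    # Keep only result sets not contained in a larger candidate (direct subconcepts).
    maximal = [e for e in candidates if not any(e < other for other in candidates)]
    return sorted(sorted(candidates[e]) for e in maximal)


print(refinements(["party", "dance"]))          # [['beat'], ['pop', 'slow', 'soft']]
print(refinements(["party", "dance", "beat"]))  # [['electronic'], ['reggae']]
print(search(["party", "dance", "beat", "reggae"]))  # ['Get Busy']
```

Removing a genre from the search corresponds to moving up in the lattice, adding a suggested group to moving down.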
40. Running example revisited
How does it work?
How to manage all those songs?
Three concrete applications
1. Finding a song based on its genre
2. Discover (un)expected dependencies between genres
• as well as absence of expected dependencies
3. Discover a user profile
• e.g., what songs does she like most
41. Implications
Dance music is party music
Slows are soft
Disco is from the seventies
Classical music is soft
New wave is from the eighties
42. Implications
Dance music is party music
Slows are soft
New wave is from the eighties
Disco is from the seventies
Classical music is soft
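An implication such as "dance music is party music" holds in a context when every object that has the premise genres also has the conclusion genres. A minimal sketch of that check; the library contents, including the two placeholder entries, are assumptions for illustration, and implication_holds is a hypothetical name.

```python
# Toy library (illustrative tags; the last two entries are placeholders).
LIBRARY = {
    "Technologic": {"dance", "party", "beat"},
    "Whole Again": {"dance", "party", "slow", "soft"},
    "A Forest": {"new wave", "eighties", "party"},
    "Alice": {"new wave", "eighties", "party"},
    "some disco track": {"disco", "seventies", "party"},
    "some classical piece": {"classical", "soft"},
}


def implication_holds(premise, conclusion):
    """True if every song with all `premise` genres also has all `conclusion` genres."""
    premise, conclusion = set(premise), set(conclusion)
    return all(conclusion <= genres
               for genres in LIBRARY.values() if premise <= genres)


print(implication_holds({"dance"}, {"party"}))         # True: dance music is party music
print(implication_holds({"new wave"}, {"eighties"}))   # True
print(implication_holds({"party"}, {"dance"}))         # False: "A Forest" is a counterexample
```

Note that an implication whose premise matches no song at all holds vacuously, so implications found this way are only as meaningful as the data is representative.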
43. Associations
Most dance music has a beat
Most of her music is party music
A lot of music from the eighties is party music
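Unlike implications, associations only hold for a fraction of the objects. A common way to quantify "most" is the confidence of a rule: among the objects matching the premise, the share that also match the conclusion. The sketch below computes it on made-up data; the tags, the placeholder entry and the function name confidence are assumptions.

```python
# Toy library (illustrative tags; the last entry is a placeholder).
LIBRARY = {
    "Technologic": {"dance", "beat", "party"},
    "Get Busy": {"dance", "beat", "party"},
    "Destination Calabria": {"dance", "beat", "party"},
    "Whole Again": {"dance", "party", "slow"},
    "A Forest": {"new wave", "eighties", "party"},
    "Alice": {"new wave", "eighties", "party"},
    "some eighties ballad": {"eighties", "slow"},
}


def confidence(premise, conclusion):
    """Share of songs matching `premise` that also match `conclusion`.

    Returns None when no song matches the premise.
    """
    premise, conclusion = set(premise), set(conclusion)
    matching = [genres for genres in LIBRARY.values() if premise <= genres]
    if not matching:
        return None
    return sum(conclusion <= genres for genres in matching) / len(matching)


print(confidence({"dance"}, {"beat"}))      # 0.75: most dance music has a beat
print(confidence({"eighties"}, {"party"}))  # two out of three eighties songs
print(confidence(set(), {"party"}))         # share of party music in the whole library
```

An implication is the special case where the confidence equals 1.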
44. Running example revisited
How does it work?
How to manage all those songs?
Three concrete applications
1. Finding a song based on its genre
2. Discover (un)expected dependencies between genres
• as well as absence of expected dependencies
3. Discover a user profile
• e.g., what songs does she like most
45. Concept lattice
(with number of objects)
Preferred music is party music
Also likes some background music
Not such a big fan of classical
and so on ...
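Labelling concepts with their number of objects amounts to counting how many songs fall under each genre. A rough sketch of such a profile for single genres; the user's library below is entirely made up, and genre_profile is a hypothetical name.

```python
from collections import Counter

# Made-up library of one user: song -> genre tags.
HER_LIBRARY = {
    "song 1": {"party", "dance"},
    "song 2": {"party", "background"},
    "song 3": {"party"},
    "song 4": {"party", "background"},
    "song 5": {"classical"},
}


def genre_profile(library, genres):
    """Share of the (non-empty) library tagged with each genre of interest."""
    counts = Counter(genre for tags in library.values() for genre in tags)
    total = len(library)
    # Counter returns 0 for genres that never occur, e.g. "hard" below.
    return {genre: counts[genre] / total for genre in genres}


profile = genre_profile(HER_LIBRARY, ["party", "background", "classical", "hard"])
print(profile)  # {'party': 0.8, 'background': 0.4, 'classical': 0.2, 'hard': 0.0}
```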
46. Some problems...
Concept lattice can get very dense for large data sets
Concept lattice can grow exponentially in the size of the context
Attributes are not always binary
What if data is incomplete or imprecise?
False positives and negatives
...
(Some solutions have been proposed to overcome these
problems)
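The exponential growth can be seen on a standard worst case: n objects and n attributes, where object i has every attribute except attribute i. Such a context has 2^n concepts. The sketch below builds it and counts concepts by brute force; the function names are hypothetical and the counting loop is only meant for very small n.

```python
from itertools import combinations


def worst_case_context(n):
    """n objects, n attributes; object i has every attribute except attribute i."""
    return {f"g{i}": {f"m{j}" for j in range(n) if j != i} for i in range(n)}


def count_concepts(context):
    """Count concepts by closing every attribute subset (toy sizes only)."""
    attributes = sorted(set().union(*context.values()))
    intents = set()
    for size in range(len(attributes) + 1):
        for subset in combinations(attributes, size):
            chosen = set(subset)
            objects = [g for g, has in context.items() if chosen <= has]
            closed = frozenset(m for m in attributes
                               if all(m in context[g] for g in objects))
            intents.add(closed)  # one concept per distinct closed attribute set
    return len(intents)


for n in range(2, 6):
    print(n, count_concepts(worst_case_context(n)))  # 4, 8, 16, 32 concepts
```

Real data sets are usually far from this worst case, but dense contexts can still produce lattices too large to inspect by eye.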
47. Conclusion
FCA is an interesting technique to analyse large data sets
especially to discover interesting concepts, relations and
structures in the data
Can be applied to many application domains
Based on a formal mathematical theory
Yet easy to use and understand intuitively
Quality of results depends on size and quality of the data
48. Sources
B. Ganter, R. Wille: Formal Concept Analysis - Mathematical Foundations. Springer, Heidelberg, 1999
Uta Priss’ Formal Concept Analysis Homepage
http://www.upriss.org.uk/fca/fca.html
Gerd Stumme’s course “Formale Begriffsanalyse”
http://www.kde.cs.uni-kassel.de/lehre/ss2005/formale_begriffsanalyse
Context Explorer (ConExp)
http://conexp.sourceforge.net/
J. Fallon: Application des treillis de Galois à la recherche d’informations. Master’s thesis, Université catholique de Louvain, Département d’Ingénierie Informatique, 2004