December 14-16, 2015, Serena Hotel, Islamabad
13th International Conference on Frontiers of Information Technology (FIT), 2015
Multi-View Clustering
Algorithms and Applications
Presented by
Syed Fawad Hussain, PhD
Ghulam Ishaq Khan Institute of Engineering Sciences
and Technology.
Invited Talk, FIT 2015
Outline
1. Introduction
   1. Data generation
   2. Motivation
2. Clustering and Co-Clustering
   1. Traditional Clustering
   2. Co-clustering
3. Multi-View Multi-Dimensional Clustering
   1. Multiview data
   2. Knowledge transfer between views
   3. Experimental results
4. Application Areas of Multi-View Clustering
1. Introduction
Information Generation
• A huge amount of information is generated, most of it unstructured: documents, journals, web pages, emails...
• Information is usually generated from different sources
  – Different languages (for web pages)
  – Different feature extractors (e.g. images)
  – Different links (citation data)
  – Different sections (movie data from IMDb)
  – Etc.
Views
• Data is described by a set of variables/features
  – Words describing documents
  – Keywords describing movies
  – Links describing webpages
  – Actors describing movies
  – Features describing images
  – Sound describing video clips, etc.
• What is a view?
  – A set of features/attributes/variables describing a set of objects/instances
  – Each view is independent of the others and individually sufficient for learning
2. Clustering and Co-Clustering
Clustering
• Division of data into groups of ‘similar objects’
• Classical clustering algorithms are based on “similarities” and organize data into classes such that there is
  – high intra-class similarity
  – low inter-class similarity
• Example: points P1(1,2), P2(2,2), P3(4,5), P4(5,7), with pairwise squared Euclidean distances
       P1   P2   P3   P4
  P1    0    1   18   41
  P2    1    0   13   34
  P3   18   13    0    5
  P4   41   34    5    0
• Resulting clusters: C1 = {P1, P2}, C2 = {P3, P4}
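The distance table in this example is consistent with squared Euclidean distance between the points. A minimal NumPy sketch reproduces it; the nearest-seed grouping at the end is only an illustrative assumption (P1 and P4 are picked as seeds by hand) and is not claimed to be the clustering algorithm used in the talk.

```python
import numpy as np

# The four example points P1..P4 from the slide.
points = np.array([[1, 2], [2, 2], [4, 5], [5, 7]])

# Pairwise squared Euclidean distances via broadcasting.
diff = points[:, None, :] - points[None, :, :]
dist = (diff ** 2).sum(axis=-1)
print(dist)

# Assumption for illustration: use P1 and P4 as seeds and assign
# each point to the closer seed (0 -> C1, 1 -> C2).
labels = (dist[:, 0] > dist[:, 3]).astype(int)
print(labels)
```

With these seeds, P1 and P2 fall into one group and P3 and P4 into the other, matching C1 and C2.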
Co-Clustering
• How to automatically find semantic relationships in the data?
• How to calculate similarity between documents?
• Basic idea:
  – Two documents are similar if they contain similar words
  – Two words are similar if they occur in similar documents
• Solution?
  – Create similarity matrices R (between documents) and C (between words)
  – Iteratively update R and C, each using the other
• Example documents:
  – d1: “Boeing recently unveiled its new B787 aircraft dubbed the ‘Dreamliner’.”
  – d2: “Airbus’ latest A350 is a next generation plane due to fly in 2013”
Co-Clustering
[Hussain et al, 2010]
• The algorithm is as follows
  – Step 1: Given A, define R(0) = I, C(0) = I
  – Step 2: For k = 1 to t, update R(k) using C(k-1) and C(k) using R(k-1)
  – Step 3: Output R(t) and C(t)
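The update equations of Step 2 did not survive in this transcript. The sketch below is one way such an iteration could look, assuming updates of the form R = A C Aᵀ and C = Aᵀ R A, each normalized element-wise by products of row or column sums, with the diagonals reset to 1. The function name `co_similarity`, the normalization and the diagonal reset are all assumptions for illustration, not the published algorithm.

```python
import numpy as np

def co_similarity(A, t=4):
    """Iteratively build row (R) and column (C) similarity matrices.

    Assumes A is non-negative with no all-zero row or column.
    """
    m, n = A.shape
    R = np.eye(m)  # Step 1: R(0) = I
    C = np.eye(n)  #         C(0) = I
    row_sums = A.sum(axis=1)
    col_sums = A.sum(axis=0)
    norm_R = np.outer(row_sums, row_sums)  # assumed normalization
    norm_C = np.outer(col_sums, col_sums)
    for _ in range(t):  # Step 2
        R_new = (A @ C @ A.T) / norm_R  # rows compared through C(k-1)
        C_new = (A.T @ R @ A) / norm_C  # columns compared through R(k-1)
        np.fill_diagonal(R_new, 1.0)
        np.fill_diagonal(C_new, 1.0)
        R, C = R_new, C_new
    return R, C  # Step 3

# Hypothetical toy matrix: d1 and d3 share no word, but both share one with d2.
A = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1]], dtype=float)
R1, _ = co_similarity(A, t=1)
R2, _ = co_similarity(A, t=2)
print(R1[0, 2], R2[0, 2])
```

In this toy case d1 and d3 have zero similarity after one iteration and a positive similarity after two, because the second iteration sees that their words are themselves similar.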
Co-Clustering
• Bipartite graph G = (V1, V2, E)
  – V1 = {d1, d2, …, dm}
  – V2 = {w1, w2, …, wn}
  – E = Aij, i ∈ V1, j ∈ V2
• In practice, 4 iterations are enough
• Iteration 1:
  – R(1): Sim(d1,d2), Sim(d1,d3), …
  – C(1): Sim(w1,w2), Sim(w1,w3), …
• Iteration 2:
  – R(2): Sim(d1,d4) via C24 and C34 …
  – …
• Successive iterations take into account paths of increasing length
[Figure: bipartite graph linking documents d1 to d4 with words w1 to w6, edges weighted by Aij]
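The link between iterations and path length can be illustrated by counting paths in a bipartite graph with matrix products. The incidence matrix below is hypothetical (the edges of the slide's figure are not recoverable); it is chosen so that the documents form a chain d1, d2, d3, d4 where only neighbours share a word.

```python
import numpy as np

# Hypothetical document-by-word incidence matrix (4 documents, 6 words).
A = np.array([
    [1, 1, 0, 0, 0, 0],  # d1: w1, w2
    [0, 1, 1, 0, 0, 0],  # d2: w2, w3
    [0, 0, 1, 1, 0, 0],  # d3: w3, w4
    [0, 0, 0, 1, 1, 1],  # d4: w4, w5, w6
])

# Entry (i, j) counts document-word-document paths of length 2.
paths2 = A @ A.T
# Chaining the product counts longer paths of length 4 and 6.
paths4 = paths2 @ paths2
paths6 = paths4 @ paths2

print(paths2[0, 2], paths4[0, 2])  # d1-d3: unreachable in 2 steps, reachable in 4
print(paths4[0, 3], paths6[0, 3])  # d1-d4: unreachable in 4 steps, reachable in 6
```

Each extra product plays the role of one more iteration: document pairs that share no word directly become connected through intermediate documents and words.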
Co-Clustering
[Figure: three panels of gene-cluster expression levels, x-axis from 0 to 60]
• Colon Cancer dataset
  – 1096 genes
  – 62 tissues: Normal (42) + Tumor (20)
Source: Hussain S.F., 2011
3. Multi-view Clustering
Single view vs Multiple views
• Are these “researchers” similar?
  – Is their publication text similar?
  – Do they often cite the same (group of) authors?
  – Do they often publish in the same venue?
• Are these “movies” similar?
  – Are they described by similar text in their plot?
  – Do they have similar/same actors?
  – Are they described by similar keywords (genre)?
What are the natural groupings in this data?
Single view vs Multiple views
Movie: Titanic
• Movie by actors: Leonardo DiCaprio, Kate Winslet, …
• Movie by plot: ship, iceberg, Europe, voyage, …
• Movie by genre: romantic, tragedy, adventure, …
Source: IMDb
Multi-view data
Movies-by-Actors Matrix
  Movies/actors   DiCaprio   Kate   Keanu   Jolie
  Titanic         1          1      0       0
  Matrix          0          0      1       0
  …               …          …      …       …
Movies-by-Keywords Matrix
  Movies/plot     ship   iceberg   Sci-fi   murder
  Titanic         1      1         0        0
  Matrix          0      0         1        1
  …               …      …         …        …
Movies-by-Genre Matrix
  Movies/genre    romantic   tragedy   war   Sci-fi
  Titanic         1          1         0     0
  Matrix          0          0         0     1
  …               …          …         …     …
The same rows (movies) appear in every view!
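One simple way to hold such data in code is a dictionary of matrices that all share the same row index. The sketch below uses only the two rows shown on the slide (the elided rows are left out); the `views` layout and key names are assumptions made for illustration.

```python
import numpy as np

# Shared row index: every view describes the same movies in the same order.
movies = ["Titanic", "Matrix"]

views = {
    "actors": {
        "features": ["DiCaprio", "Kate", "Keanu", "Jolie"],
        "matrix": np.array([[1, 1, 0, 0],
                            [0, 0, 1, 0]]),
    },
    "keywords": {
        "features": ["ship", "iceberg", "Sci-fi", "murder"],
        "matrix": np.array([[1, 1, 0, 0],
                            [0, 0, 1, 1]]),
    },
    "genre": {
        "features": ["romantic", "tragedy", "war", "Sci-fi"],
        "matrix": np.array([[1, 1, 0, 0],
                            [0, 0, 0, 1]]),
    },
}

# Each view has its own columns, but the rows line up across views.
row_counts = {name: view["matrix"].shape[0] for name, view in views.items()}
print(row_counts)
```

Keeping the row order identical across views is what lets a similarity or cluster label computed in one view be applied directly to the others.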
Clustering on multiple views
• Movies-by-Actors Matrix → Clustering 1 → Intermediate result
• Movies-by-Keywords Matrix → Clustering 2 → Intermediate result
• Movies-by-Genre Matrix → Clustering 3 → Intermediate result
• The intermediate results are merged into a Combined Clustering of the movies, better than each individual clustering
Multi-View Learning
• SIAM-Similar dataset: 1690 articles published in SIAM J MATRIX ANAL A, SIAM J NUMER ANAL and SIAM J SCI COMPUT.
  View       Spectral
  Abstract   0.2037
  Title      0.2021
  Keywords   0.2502
  Authors    0.0017
  Citation   0.0078
  All views combined: Sum 0.630, LMF 0.714
[Wang et al, 2010]
Why does it work?
• The probability of disagreement is bounded by the probability of error in the individual views
• Each view must have complementary information
• A single view is quite sparse (curse of dimensionality)
• The more informative the single views, the better the results.
Multi-view co-clustering
• M: a single data view
• R: row-row similarity matrix
• C: column-column similarity matrix
• χ-SIM: co-clustering algorithm
[Hussain et al, 2015]
Experimental setup
Datasets used: Cora, Citeseer, Cornell, Movies and Texas
Experiments:
• Single view clustering
• Single view co-clustering
• Multi-view co-clustering
Results
  Dataset    Single View        Co-Clustering      Multi-View, R(t+1) = RA(t)   Multi-View, R(t+1) = RB(t)
             VA       VB        VA       VB        VA       VB                  VA       VB
  Cora       0.3209   0.3678    0.6004   0.3109    0.6004   0.7146              0.4453   0.3109
  Citeseer   0.2503   0.3489    0.3783   0.3998    0.3783   0.5047              0.5897   0.3998
  Cornell    0.3487   0.58974   0.3846   0.6051    0.3846   0.6051              0.4872   0.6051
  Movies     0.2561   0.19125   0.2723   0.2253    0.2723   0.2853              0.2771   0.2253
  Texas      0.3623   0.4670    0.4813   0.6791    0.4813   0.5508              0.6578   0.6791
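The results chart in this talk labels its axis as NMI score (normalized mutual information). The talk does not say which normalization was used, so the sketch below assumes the geometric-mean variant, NMI = I(U;V) / sqrt(H(U) H(V)); the function name `nmi` is chosen here for illustration.

```python
import numpy as np

def nmi(labels_true, labels_pred):
    """Normalized mutual information between two flat label lists.

    Assumption: normalization by the geometric mean of the two entropies.
    """
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    n = labels_true.size
    # Map arbitrary labels to 0..k-1 indices.
    _, t_idx = np.unique(labels_true, return_inverse=True)
    _, p_idx = np.unique(labels_pred, return_inverse=True)
    # Contingency table of co-occurring labels.
    contingency = np.zeros((t_idx.max() + 1, p_idx.max() + 1))
    np.add.at(contingency, (t_idx, p_idx), 1)
    joint = contingency / n
    pt = joint.sum(axis=1)  # marginal of the true labels
    pp = joint.sum(axis=0)  # marginal of the predicted labels
    nz = joint > 0
    mi = (joint[nz] * np.log(joint[nz] / np.outer(pt, pp)[nz])).sum()
    ht = -(pt * np.log(pt)).sum()
    hp = -(pp * np.log(pp)).sum()
    if ht == 0 or hp == 0:
        # Degenerate case: at least one labelling has a single cluster.
        return 1.0 if ht == hp else 0.0
    return mi / np.sqrt(ht * hp)

same_partition = nmi([0, 0, 1, 1], [1, 1, 0, 0])   # same grouping, renamed labels
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])        # statistically independent grouping
print(same_partition, unrelated)
```

Because NMI compares partitions rather than label names, a clustering that recovers the true groups under different cluster ids still scores 1.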
Results
Single vs Multi-View Clustering (NMI score by dataset)
  Dataset    Single    Co-clustering   Multi-View   % Increase (Multi-View over Single)
  Cora       0.3678    0.6004          0.7754       110.82
  Citeseer   0.3489    0.3998          0.7135       104.5
  Cornell    0.58974   0.6051          0.7231       22.61
  Movies     0.2561    0.2723          0.363        41.74
  Texas      0.467     0.6791          0.7754       66.04
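The "% Increase" figures agree with the relative gain of the Multi-View score over the Single score, 100 × (multi − single) / single. A short check with the chart values:

```python
datasets = ["Cora", "Citeseer", "Cornell", "Movies", "Texas"]
single = [0.3678, 0.3489, 0.58974, 0.2561, 0.467]
multi_view = [0.7754, 0.7135, 0.7231, 0.363, 0.7754]

# Relative improvement of multi-view over single-view clustering, in percent.
pct_increase = [100.0 * (m - s) / s for s, m in zip(single, multi_view)]

for name, pct in zip(datasets, pct_increase):
    print(f"{name}: {pct:.2f}%")
```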
Co-Clustering of multi-view data
[Figure: Cora dataset, original matrix and co-clustered matrix]
  Mideast Politics   Motorcycles   Baseball   Computer Graphics   Space
  Jewish             Ride          Pitching   Graphics            NASA
  Israel             Harleys       Players    Image               Flight
  Arab               Camping       Season     Color               Shuttle
  Palestinian        Bikers        Yankees    Display             Orbital
4. Application Areas of Multi-View Data
Success Stories
• Million-dollar prize
  – Improve the baseline movie recommendation approach of Netflix by 10% in accuracy
  – The top submissions all combine several teams and algorithms as an ensemble
Information Retrieval
IBM’s Watson
• Watson uses a variety of techniques, with deep learning as just one element in a very complicated ensemble that ranges from the statistical technique of Bayesian inference to deductive reasoning.
• Example clue: “Keanu Reeves had a Nokia phone, but it took a land line to slip in & out of this, the title of a 1999 sci-fi flick”
• Watson: around 6 million rules, access to 10 billion web pages, massively parallel computing power (6000 computers), complex machine learning algorithms.
Self-Driving Google Cars
• Have so far driven 300,000 miles without an accident
• An average American has an accident every 165,000 miles
• Use multiple sources of information:
  – Many cameras (for situational awareness)
  – Laser range finder (for other traffic)
  – GPS
  – Google Maps, radar sensor, etc.
5. Conclusion
• Data is growing at an enormous rate
• Capturing data is easy… using it is not!
“There are known knowns i.e. things we know that we know; then there are known unknowns i.e. things we know that we don’t know; and then we have the unknown unknowns i.e. things we do not know that we do not know.”
Donald Rumsfeld, former US Secretary of Defense
Conclusion
• No Free Lunch theorem
  – No classifier is inherently superior to any other
  – If we make no prior assumption about the nature of the classification task, is any classification method superior overall?
  – Is any algorithm overall superior to random guessing?
  – The answer to both questions is… NO!
• The Ugly Duckling theorem
  – In the absence of assumptions there is no “best” feature representation.
• You need to try a variety of methods, and
• You need to know your data, and
• You need to experiment a bit,
and finally
You need to contact and work with a machine learning expert
Thank You
References
[Xu, 2013] C. Xu, D. Tao and C. Xu, A survey on multi-view learning, arXiv preprint arXiv:1304.5634 (2013).
[Andrew et al, 2013] G. Andrew, R. Arora, J. Bilmes and K. Livescu, Deep canonical correlation analysis, in ICML, pp. 1247–1255, 2013.
[Wang, 2009] W. Tang, Z. Lu and I. Dhillon, Clustering with multiple graphs, Ninth IEEE International Conference on Data Mining (ICDM'09), IEEE, 2009.
[Wang, 2015] W. Wang, R. Arora, K. Livescu and J. Bilmes, On Deep Multi-View Representation Learning, in Proc. of the 32nd Int. Conf. on Machine Learning (ICML 2015), 2015.
[Hussain, 2010] S.F. Hussain, C. Grimal and G. Bisson, An improved co-similarity measure for document clustering, Ninth International Conference on Machine Learning and Applications (ICMLA), IEEE, 2010.
[Hussain, 2011] S.F. Hussain, Bi-clustering gene expression data using co-similarity, Advanced Data Mining and Applications, Springer Berlin Heidelberg, 2011, pp. 190–200.
[Hussain, 2015] S.F. Hussain and S. Bashir, Co-clustering of multi-view datasets, Knowledge and Information Systems (2015): 1–26.
Co-Clustering
• Traditional clustering equates to finding groups in data “under all features/attributes”. In co-clustering (also called bi-clustering), the pattern/behavior is usually observed under “a specified subset of attributes/conditions”
• Preferred when
  – Things behave differently under different subsets, e.g. gene expression data
  – The goal is to improve clustering results
  – The goal is to minimize the effect of the “curse of dimensionality”
Direct multi-view constrained clustering
• Factorize all matrices at the same time under some constraint, where A(m) is a single view, P is the common factor shared between all graphs, Λ(m) captures the characteristics of each graph, and α is a weighting factor [Wang et. al, 2009]
• Deep Canonical Correlation Analysis [Andrew et al, 2013]
• Deep multi-view representation learning [Wang et al, 2015]
• Survey on multi-view learning [Xu et al, 2013]
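The factorization objective itself is not preserved in this transcript. Assuming it follows the usual form of a linked matrix factorization with a shared factor P, per-view factors Λ(m) and a regularization weight α, it could be written as below. The number of views M and the exact placement of the 1/2 factors are assumptions; the cited paper should be consulted for the definitive form.

```latex
\min_{P,\;\Lambda^{(1)},\dots,\Lambda^{(M)}}
\;\frac{1}{2}\sum_{m=1}^{M}
\left\| A^{(m)} - P\,\Lambda^{(m)} P^{\top} \right\|_F^2
\;+\;\frac{\alpha}{2}\left(
\sum_{m=1}^{M}\left\|\Lambda^{(m)}\right\|_F^2 + \left\|P\right\|_F^2
\right)
```

The first term asks every view to be reconstructed from the same P, which is what ties the views together; the second term keeps the factors small.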
Clustering on multiple views
• Movies-by-Actors Matrix → Clustering 1 → Intermediate result
• Movies-by-Keywords Matrix → Clustering 2 → Intermediate result
• Movies-by-Genre Matrix → Clustering 3 → Intermediate result
Using Intermediate Integration
• Combine information between views at the intermediate step
• Combine intermediate results (e.g. similarity matrices) from the views
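As a rough illustration of intermediate integration, the sketch below builds one similarity matrix per view and merges them with a weighted average before any clustering is run. Cosine similarity, the equal default weights and the names `cosine_similarity` and `combine_views` are assumptions for illustration; the talk does not specify how the intermediate results are combined.

```python
import numpy as np

def cosine_similarity(X):
    """Row-by-row cosine similarity of a data matrix."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty rows
    Xn = X / norms
    return Xn @ Xn.T

def combine_views(views, weights=None):
    """Weighted average of the per-view similarity matrices."""
    sims = [cosine_similarity(X) for X in views]
    if weights is None:
        weights = np.full(len(sims), 1.0 / len(sims))
    return sum(w * S for w, S in zip(weights, sims))

# Hypothetical data: three movies described by an actor view and a plot view.
actors = np.array([[1, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]])
plot = np.array([[1, 0],
                 [1, 0],
                 [0, 1]])

combined = combine_views([actors, plot])
print(np.round(combined, 4))
```

The combined matrix can then be passed to any similarity-based clustering method; movies 0 and 1 end up close because both views agree on them.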
Using Late Integration
• Combine information between views through the predictions made on each view
• Given 2 views of the data, X(1) and X(2)
• Cluster the views to generate two predictions P(1) and P(2)
• Use P(1) as a training label for the next iteration on X(2) and vice versa
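The exchange of predictions between two views can be sketched with a nearest-centroid step: the labels predicted on one view define the centroids used to relabel the other view. The centroid-based learner, the function names and the toy data are assumptions for illustration, and the sketch assumes no cluster becomes empty.

```python
import numpy as np

def assign(X, centroids):
    """Label each row of X with the index of its nearest centroid."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

def centroids_from(X, labels, k):
    """Mean of the rows of X in each cluster (clusters must be non-empty)."""
    return np.stack([X[labels == c].mean(axis=0) for c in range(k)])

def late_integration(X1, X2, init_labels, k, n_iter=5):
    p1 = init_labels.copy()  # predictions P(1) on view X(1)
    p2 = init_labels.copy()  # predictions P(2) on view X(2)
    for _ in range(n_iter):
        # P(2) acts as training labels for X(1), and P(1) for X(2).
        new_p1 = assign(X1, centroids_from(X1, p2, k))
        new_p2 = assign(X2, centroids_from(X2, p1, k))
        p1, p2 = new_p1, new_p2
    return p1, p2

# Hypothetical two-view data for four objects, with a poor initial labelling.
X1 = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
X2 = np.array([[1.0], [1.2], [9.0], [9.5]])
init = np.array([0, 1, 1, 1])

p1, p2 = late_integration(X1, X2, init, k=2)
print(p1, p2)
```

On this toy data both views converge to the same grouping of the first two and last two objects, even though the initial labels put three objects in one cluster.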

2015_FIT_Talk.pptx

  • 1. December 14-16, 2015, Serena Hotel, Islamabad 13th International Conference on Frontiers of Information Technology (FIT), 2015 Multi-View Clustering Algorithms and Applications Presented by Syed Fawad Hussain, PhD Ghulam Ishaq Khan Institute of Engineering Sciences and Technology. Invited Talk, FIT 2015
  • 2. Outline 13th Internaitonal Conference on Frontiers of IT, December 14-16, 2015 Syed Fawad Hussain, PhD Multi-View Clustering: Algorithms and Applications 2 1. Introduction 1. Data generation 2. Motivation 2. Clustering and Co-Clustering 1. Traditional Clustering 2. Co-clustering 3. Multi-View Multi-Dimensional Clustering 1. Multiview data 2. Knowledge transfer between views 3. Experimental results 4. Application Areas of Multi-View Clustering
  • 3. 13th Internaitonal Conference on Frontiers of IT, December 14-16, 2015 Information Generation  A huge percentage of information is generated (mostly un-structured) documents, journals, web pages, emails...  Information is usually generated from different sources  Different languages (for web pages)  Different feature extractors (e.g. images)  Different links (citation data)  Different sections (movie data from imdb)  Etc. 1. Introduction Syed Fawad Hussain, PhD
  • 4. 13th Internaitonal Conference on Frontiers of IT, December 14-16, 2015 Views  Data is described by a set of variables/features  Words describing documents  Keywords describing movies  Links describing webpages  Actors describing movies  Features describing images  Sound describing video clips, etc.  A view?  A set of features/attributes/variables describing a set of objects/instances.  Is independent, and individually sufficient for learning 4 1. Introduction Syed Fawad Hussain, PhD
  • 5. Clustering 5 2. Clustering and Co-Clustering 13th Internaitonal Conference on Frontiers of IT, December 14-16, 2015 Syed Fawad Hussain, PhD  Division of data into groups of ‘similar objects’  Classical clustering algorithms are based on “similarities” and organize data into classes such that there is  high intra-class similarity  low inter-class similarity  Example: P1(1,2), P2(2,2) P3(4,5), P4(5,7), P1 P2 P3 P4 P1 0 1 18 41 P2 1 0 13 34 P3 18 13 0 5 P4 41 34 5 0 C1 {P1,P2} C2 {P3,P4}
  • 6. Co-Clustering 6 2. Clustering and Co-Clustering 13th Internaitonal Conference on Frontiers of IT, December 14-16, 2015 Syed Fawad Hussain, PhD  How to automatically find semantic relationship in the data?  How to calculate similarity between documents? Basic Idea:  Two documents are similar if they contain similar words  Two words are similar if they occur in similar documents  Solution?  Create similarity matrices R – between docs, and C – between words  Iteratively update R and C using the other. Boeing recently unveiled its new B787 aircraft dubbed the “Dreamliner”. Airbus’ latest A350 is a next generation plane is due to fly in 2013 d1 d2
  • 7. Co-Clustering 7 2. Clustering and Co-Clustering 13th Internaitonal Conference on Frontiers of IT, December 14-16, 2015 Syed Fawad Hussain, PhD Hussain et al, 2010  The algorithm is as follows  Step 1 - Given A, define R(0)=I, C(0)=I  Step 2 – for k=1 to t, do Step 3: Output R(t) and C(t)
  • 8. Co-Clustering 8 2. Clustering and Co-Clustering 13th Internaitonal Conference on Frontiers of IT, December 14-16, 2015 Syed Fawad Hussain, PhD  Bipartite Graph  G=(V1,V2,E)  V1={d1,d2,…,dm}  V2={w1,w2,…,wn}  E =Aij , iV1, j V2 Practically 4 iterations are enough  Iteration 1:  R(1) : Sim(d1,d2), Sim(d1,d3), …  C(1): Sim(w1,w2), Sim(w1,w3), …  Iteration 2:  R(2) : Sim(d1,d4) via C24 and C34 …  … Successive iterations means paths of increasing length d1 d2 d3 d4 w1 w2 w3 w4 w5 w6 Aij
  • 9. Co-Clustering 9 2. Clustering and Co-Clustering 13th Internaitonal Conference on Frontiers of IT, December 14-16, 2015 Syed Fawad Hussain, PhD Gene clusters 0 10 20 30 40 50 60 0 1 2 ( co- 62 ) 0 10 20 30 40 50 60 -1 0 1 2 ( co- 42 ) 0 10 20 30 40 50 60 0 1 2 3 ( co- 63 ) Expression level Expression Expression Expression  Colon Cancer dataset  1096 genes  62 tissues  Normal (42) + Tumor (20) Source: Hussain S.F, 2011
  • 10. Single view vs Multiple views  Are these “researchers” similar?  Are their publication text similar?  Do they often cite the same (group of) authors?  Do they often publish in the same venue?  Are these “movies” similar?  Are they described by similar text in their plot?  Do they have similar/same actors?  Are they being described by similar keywords (genre)? 10 3. Multi-view Clustering 13th Internaitonal Conference on Frontiers of IT, December 14-16, 2015 Syed Fawad Hussain, PhD
  • 11. What are the natural grouping in this data? 11 3. Multi-view Clustering 13th Internaitonal Conference on Frontiers of IT, December 14-16, 2015 Syed Fawad Hussain, PhD
  • 12. Single view vs Multiple views 12 3. Multi-view Clustering 13th Internaitonal Conference on Frontiers of IT, December 14-16, 2015 Syed Fawad Hussain, PhD Movie: Titanic Leonardo diCaprio Kate Winslet … … ship Iceberg europe voyage … romantic tragedy adventure … … Movie by Actors Movie by plot Movie by genre Source: imdb
  • 13. Multi-view data 13 3. Multi-view Clustering 13th Internaitonal Conference on Frontiers of IT, December 14-16, 2015 Syed Fawad Hussain, PhD Movies-by-Actors Matrix Movies/ actors DiCaprio Kate Keanu Jolie Titanic 1 1 0 0 Matrix 0 0 1 0 … … … … … Movies-by-Keywords Matrix Movies/ plot ship iceberg Sci-fi murder Titanic 1 1 0 0 Matrix 0 0 1 1 … … … … … Movies-by-Genre Matrix Movies/ genre romantic tragedy war Sci-fi Titanic 1 1 0 0 Matrix 0 0 0 1 … … … … … Rows are similar across all views!
  • 14. Clustering on multiple views 14 3. Multi-view Clustering 13th Internaitonal Conference on Frontiers of IT, December 14-16, 2015 Syed Fawad Hussain, PhD Movies-by-Keywords Matrix Movies Clustering 2 Intermediate result Movies-by-Actors Matrix Clustering 1 Intermediate result Movies-by-Genre Matrix Clustering 3 Intermediate result Combined Clustering Better than each individual clustering
  • 15. Multi-View Learning  SIAM-Similar dataset: containing 1690 articles published in SIAM J MATRIX ANAL A, SIAM J NUMER ANAL and SIAM J SCI COMPUT. 15 3. Multi-view Clustering 13th Internaitonal Conference on Frontiers of IT, December 14-16, 2015 Syed Fawad Hussain, PhD View Spectral Sum LMF Abstract 0.2037 0.630 0.714 Title 0.2021 Keywords 0.2502 Authors 0.0017 citation 0.0078 [Wang et al, 2010]
  • 16. Why it works?  The probability of disagreement is bound by the probability of error in the individual views  Each view (must) have complementary information  A single view is quite sparse (curse of dimensionality)  The more informative the single views, the better the results. 16 3. Multi-view Clustering 13th Internaitonal Conference on Frontiers of IT, December 14-16, 2015 Syed Fawad Hussain, PhD
  • 17. Multi-view co-clustering (3. Multi-view Clustering)
    M: a single data view
    R: row-row similarity matrix
    C: column-column similarity matrix
    χ-SIM: co-clustering algorithm [Hussain et al, 2015]
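To make the roles of M, R and C concrete, here is a minimal co-similarity sketch for a single view. It is written in the spirit of χ-Sim but is not the published algorithm: the update rule, the normalisation by products of row and column sums, and the function name `co_similarity` are all assumptions chosen for illustration.

```python
import numpy as np

def co_similarity(M, n_iter=3):
    """Alternately refine row-row (R) and column-column (C) similarities.

    Rows are similar when they contain similar columns, and columns are
    similar when they occur in similar rows. Simplified sketch only.
    """
    M = np.asarray(M, dtype=float)
    n_rows, n_cols = M.shape
    R, C = np.eye(n_rows), np.eye(n_cols)

    # Assumed normalisation: products of row sums (resp. column sums),
    # which keeps similarities in [0, 1] for non-negative M.
    row_sums, col_sums = M.sum(axis=1), M.sum(axis=0)
    norm_R = np.outer(row_sums, row_sums)
    norm_C = np.outer(col_sums, col_sums)
    norm_R[norm_R == 0] = 1.0
    norm_C[norm_C == 0] = 1.0

    for _ in range(n_iter):
        R_new = (M @ C @ M.T) / norm_R   # rows compared through column similarity
        C_new = (M.T @ R @ M) / norm_C   # columns compared through row similarity
        np.fill_diagonal(R_new, 1.0)     # an object is fully similar to itself
        np.fill_diagonal(C_new, 1.0)
        R, C = R_new, C_new
    return R, C

# Rows 0 and 2 share no column, but both overlap with row 1.
M = np.array([[1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
R1, _ = co_similarity(M, n_iter=1)
R2, _ = co_similarity(M, n_iter=2)
print(R1[0, 2], R2[0, 2])  # zero after one pass, positive after two
```

The example shows why iterating matters: two rows with no shared feature still receive a non-zero similarity once the column similarities have been learned.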
  • 18. Experimental setup (3. Multi-view Clustering)
    Dataset used
    Experiments:
    - Single view clustering
    - Single view co-clustering
    - Multi-view co-clustering
  • 19. Results (3. Multi-view Clustering)
    The two Multi-View column groups correspond to the updates $R^{(t+1)} = R_A^{t}$ and $R^{(t+1)} = R_B^{t}$.
                 Single View        Co-Clustering      Multi-View, R_A    Multi-View, R_B
      Dataset    VA       VB        VA       VB        VA       VB        VA       VB
      Cora       0.3209   0.3678    0.6004   0.3109    0.6004   0.7146    0.4453   0.3109
      Citeseer   0.2503   0.3489    0.3783   0.3998    0.3783   0.5047    0.5897   0.3998
      Cornell    0.3487   0.58974   0.3846   0.6051    0.3846   0.6051    0.4872   0.6051
      Movies     0.2561   0.19125   0.2723   0.2253    0.2723   0.2853    0.2771   0.2253
      Texas      0.3623   0.4670    0.4813   0.6791    0.4813   0.5508    0.6578   0.6791
  • 20. Results (3. Multi-view Clustering)
    [Bar chart: single vs multi-view clustering, NMI score per dataset]
      Dataset    Single    Co-clustering   Multi-View   % Increase
      Cora       0.3678    0.6004          0.7754       110.82
      Citeseer   0.3489    0.3998          0.7135       104.5
      Cornell    0.58974   0.6051          0.7231       22.61
      Movies     0.2561    0.2723          0.363        41.74
      Texas      0.467     0.6791          0.7754       66.04
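The "% Increase" figures on this slide are consistent with the relative gain of the multi-view NMI over the single-view NMI. The short check below recomputes them from the values on the slide; the variable names are illustrative only.

```python
# NMI scores as reported on the slide.
single = {"Cora": 0.3678, "Citeseer": 0.3489, "Cornell": 0.58974,
          "Movies": 0.2561, "Texas": 0.467}
multi = {"Cora": 0.7754, "Citeseer": 0.7135, "Cornell": 0.7231,
         "Movies": 0.363, "Texas": 0.7754}

# Relative improvement of multi-view over single-view, in percent.
increase = {name: round(100 * (multi[name] - single[name]) / single[name], 2)
            for name in single}

for name, pct in increase.items():
    print(f"{name}: {pct}%")
```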
  • 21. Co-Clustering of multi-view data (3. Multi-view Clustering)
    [Figure: original matrix and co-clustered matrix, Cora dataset]
      Mideast Politics   Motorcycles   Baseball   Computer Graphics   Space
      Jewish             Ride          Pitching   Graphics            Nasa
      Israel             Harleys       Players    Image               Flight
      Arab               Camping       Season     Color               Shuttle
      Palestinian        Bikers        Yankees    Display             Orbital
  • 22. Success Stories (4. Application Areas of Multi-View Data)
    - Million-dollar prize
      - Improve the baseline movie recommendation approach of Netflix by 10% in accuracy
      - The top submissions all combine several teams and algorithms as an ensemble
  • 23. Information Retrieval (4. Application Areas of Multi-View Data)
  • 24. IBM’s Watson (4. Application Areas of Multi-View Data)
    - Watson uses a variety of techniques, with deep learning as just one element in a very complicated ensemble that ranges from the statistical technique of Bayesian inference to deductive reasoning.
    - Example clue: “Keanu Reeves had a Nokia phone, but it took a land line to slip in & out of this, the title of a 1999 sci-fi flick”
    - Watson: around 6 million rules, access to 10 billion web pages, massively parallel computing power (6000 computers), complex machine learning algorithms.
  • 25. Self Driving Google Cars (4. Application Areas of Multi-View Data)
    - Have so far driven 300,000 miles without an accident
    - An average American has an accident every 165,000 miles
    - Use multiple sources of information:
      - many cameras (for situational awareness)
      - laser range finder (for other traffic)
      - GPS
      - Google Maps, radar sensor, etc.
  • 26. Conclusion (5. Conclusion)
    - Data is growing at an enormous rate
    - Capturing data is easy… using it is not!
    “There are known knowns, i.e. things we know that we know; then there are known unknowns, i.e. things we know that we don’t know; and then we have the unknown unknowns, i.e. things we do not know that we do not know.”
    Donald Rumsfeld, former US Secretary of Defence
  • 27. Conclusion (5. Conclusion)
    - No Free-Lunch theorem
      - There is a lack of inherent superiority of any classifier
      - If we make no prior assumption about the nature of the classification task, is any classification method superior overall?
      - Is any algorithm overall superior to random guessing?
      - The answer to both questions is… NO!
    - The Ugly-duckling theorem
      - In the absence of assumptions there is no “best” feature representation.
    - You need to try a variety of methods, and
    - You need to know your data, and
    - You need to experiment a bit, and finally
    - You need to contact and work with a machine learning expert
  • 29. References
    [Xu, 2013] C. Xu, D. Tao and C. Xu, A survey on multi-view learning, arXiv preprint arXiv:1304.5634 (2013).
    [Andrew et al., 2013] G. Andrew, R. Arora, J. Bilmes and K. Livescu, Deep canonical correlation analysis, in ICML, pp. 1247–1255, 2013.
    [Wang, 2009] W. Tang, Z. Lu and I. Dhillon, Clustering with multiple graphs, in Ninth IEEE International Conference on Data Mining (ICDM'09), IEEE, 2009.
    [Wang, 2015] W. Wang, R. Arora, K. Livescu and J. Bilmes, On deep multi-view representation learning, in Proc. of the 32nd Int. Conf. on Machine Learning (ICML 2015), 2015.
  • 30. References
    [Hussain, 2010] S.F. Hussain, C. Grimal and G. Bisson, An improved co-similarity measure for document clustering, in Ninth International Conference on Machine Learning and Applications (ICMLA), IEEE, 2010.
    [Hussain, 2011] S.F. Hussain, Bi-clustering gene expression data using co-similarity, in Advanced Data Mining and Applications, Springer Berlin Heidelberg, 2011, pp. 190-200.
    [Hussain, 2015] S.F. Hussain and S. Bashir, Co-clustering of multi-view datasets, Knowledge and Information Systems (2015): 1-26.
  • 31. Co-Clustering (3. Multi-View Multi-Dimensional Clustering)
    - Traditional clustering equates to finding groups in data “under all features/attributes”. In co-clustering (also called bi-clustering), the pattern/behavior is usually observed under “a specified subset of attributes/conditions”
    - Preferred
      - when things behave differently under different subsets, e.g. gene expression data
      - to improve clustering results
      - to minimize the effect of the “curse of dimensionality”
  • 32. Direct multi-view constrained clustering (2. Techniques for knowledge transfer)
    - Factorize all matrices at the same time under some constraint [Wang et. al, 2009], where $A^{(m)}$ is a single view, $P$ is the common factor shared between all graphs, $\Lambda^{(m)}$ captures the characteristics of each graph, and $\alpha$ is a weighting factor
    - Deep Canonical Correlation Analysis [Andrew et al., 2013]
    - Deep multi-view learning representation [Wang et al, 2015]
    - Survey of Multi-View Clustering [Xu et al., 2013]
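The objective that the symbols on this slide refer to is not reproduced in the transcript. As a hedged reconstruction from the symbol definitions alone, and assuming a squared Frobenius-norm reconstruction loss with a regulariser weighted by α (both are assumptions, not taken from the slide), a joint factorization of this kind could be written as:

```latex
\min_{P,\ \{\Lambda^{(m)}\}} \;
\sum_{m=1}^{M} \left\lVert A^{(m)} - P\,\Lambda^{(m)} P^{\top} \right\rVert_F^2
\;+\; \alpha \left( \sum_{m=1}^{M} \left\lVert \Lambda^{(m)} \right\rVert_F^2 + \left\lVert P \right\rVert_F^2 \right)
```

Here the shared factor P ties the views together, so the rows of P can be clustered once to obtain a single partition that reflects all M graphs.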
  • 33. Clustering on multiple views (1. Introduction)
    [Diagram: the movies are described by three matrices, each clustered separately. Movies-by-Actors Matrix → Clustering 1 → intermediate result; Movies-by-Keywords Matrix → Clustering 2 → intermediate result; Movies-by-Genre Matrix → Clustering 3 → intermediate result.]
  • 34. Using Intermediate Integration (2. Techniques for knowledge transfer)
    - Combine information between views at the intermediate step
    - Combine intermediate results (e.g. similarity matrices) from the views
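One way to picture intermediate integration is to compute a similarity matrix per view, average them, and cluster the combined matrix. The sketch below does this with cosine similarity and a simple spectral two-way split; these specific choices, the function names and the toy data are assumptions for illustration and are not taken from the talk.

```python
import numpy as np

def cosine_similarity(X):
    """Row-by-row cosine similarity of one view."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    Xn = X / norms
    return Xn @ Xn.T

def combined_similarity(views, weights=None):
    """Weighted average of the per-view similarity matrices."""
    sims = [cosine_similarity(X) for X in views]
    if weights is None:
        weights = [1.0 / len(sims)] * len(sims)
    return sum(w * S for w, S in zip(weights, sims))

def two_way_split(S):
    """Split objects in two using the sign of the Fiedler vector."""
    L = np.diag(S.sum(axis=1)) - S      # unnormalised graph Laplacian
    _, vecs = np.linalg.eigh(L)         # eigenvalues in ascending order
    return (vecs[:, 1] > 0).astype(int)

# Toy data: objects 0 and 1 belong together, as do objects 2 and 3.
view_a = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])
view_b = np.array([[1.0, 0.2], [0.8, 0.1], [0.2, 0.9], [0.1, 1.0]])

S = combined_similarity([view_a, view_b])
labels = two_way_split(S)
print(labels)  # which group is called 0 or 1 is arbitrary
```

Averaging is the simplest combination rule; per-view weights can be passed through `weights` when some views are known to be more informative.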
  • 35. Using Late Integration (2. Techniques for knowledge transfer)
    - Combine information between views at the intermediate step
    - Given 2 views of the data, $X^{(1)}$ and $X^{(2)}$
    - Cluster the views to generate two predictions $P^{(1)}$ and $P^{(2)}$
    - Use $P^{(1)}$ as a training label for the next iteration of $X^{(2)}$ and vice versa
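The label exchange described on this slide can be sketched as a loop in which the labels predicted from one view define the cluster centroids in the other view. The nearest-centroid assignment, the function names and the toy data below are assumptions chosen to keep the example short; the slide does not specify which clustering method is used.

```python
import numpy as np

def assign(X, centroids):
    """Label each row of X with the index of its nearest centroid."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def centroids_from(X, labels, k):
    """Centroids of X under the given labels (zeros for an empty cluster)."""
    return np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                     else np.zeros(X.shape[1]) for c in range(k)])

def cross_view_clustering(X1, X2, k, init_labels, n_iter=5):
    """Alternate: labels from one view supervise the other view."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    P1 = np.asarray(init_labels)
    for _ in range(n_iter):
        P2 = assign(X2, centroids_from(X2, P1, k))  # P1 acts as labels for view 2
        P1 = assign(X1, centroids_from(X1, P2, k))  # P2 acts as labels for view 1
    return P1, P2

# Toy data: objects 0, 1, 4 form one group and 2, 3, 5 the other in both views.
X1 = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0],
               [5.1, 5.0], [0.05, 0.1], [4.9, 5.2]])
X2 = np.array([[1.0, 0.0], [1.1, 0.0], [9.0, 9.0],
               [9.2, 9.0], [0.9, 0.1], [9.1, 8.8]])

# Start from a labelling with two mistakes (objects 4 and 5 swapped).
P1, P2 = cross_view_clustering(X1, X2, k=2, init_labels=[0, 0, 1, 1, 1, 0])
print(P1, P2)
```

In this toy run the two initial mistakes are corrected because each view pulls the other toward a labelling that both can agree on.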