SlideShare a Scribd company logo
1 of 19
Two Birds with One Stone: A Graph-based Framework for Disambiguating and Tagging People Names in Web Search Lili Jiang1, Jianyong Wang2, Ning An3, Shengyuan Wang2, Jian Zhan2, Lian Li1 1School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China 2Department of Computer Science and Technology, Tsinghua University, Beijing, China 3Oracle Corporation, Nashua, NH, USA jianglili06@lzu.cn {jianyong,wwssyy,zhanjian}@tsinghua.edu.cn;ning.an@oracle.com;lil@lzu.edu.cn Presenter: Kuan-HuaHuo (霍冠樺) Date: 2010/04/13 WWW 2009 MADRID! Poster Sessions: Friday, April 24, 2009 1
2 Outline  INTRODUCTION THE FRAMEWORK FOR WEB PEOPLE NAME DISAMBIGUATION AND TAGGING Graph Modeling Clustering Algorithm Tagging a Namesake EXPERIMENTAL EVALUATION CONCLUSION 2
INTRODUCTION (1/2) We have observed that people tag information, such as  locations and  e-mail addresses,  appears to be informative,  and a combination of such tags can almost identify a unique target people. 3
INTRODUCTION (2/2) The contributions of this work include:  a) A novel weighted graph representation was proposed to model the relationships between tags;  b) An effective graph clustering algorithm was devised to disambiguate and tag people names; c) An extensive performance study was conducted, which shows that the proposed framework achieves much better performance than previous approaches. 4
THE FRAMEWORK FOR WEB PEOPLE NAME DISAMBIGUATION AND TAGGING Graph Modeling Clustering Algorithm Tagging a Namesake 5
Graph Modeling (1/6) Document corpus D = {d1, d2, . . . , dk} be the top k returned results from a search engine. We extract seven types of tags T = {people name, organization, location, email address, phone number, date, occupation} from chunk windows around the query name in each cleaned document A = S7 i=1, is a union of seven tag sets where Ti contains the non-repetitive tags with the type  6
Graph Modeling (2/6) Regarding a people name, the proposed framework  views the tag corpus A as an undirected labeled graph G=(V,E), The node set V = {v1, v2, . . .} is a set of unique tags.  Each edge (vi, vj) 2 E        represents that the tags cor-responding to viand vj co-occur in a same  document in D  7
Graph Modeling (3/6) Bridge-tags: the tags occurring in multiple documents. The maximal clique subgraph containing all tags in a single document as a micro-cluster. 8 ,[object Object]
Nodes 3, 4 and 5: bridge-tag.
There are in total four micro-clusters.,[object Object]
Graph Modeling (5/6) Node Weighting The higher the number of unique tags of a certain type, the smaller the weight of each node with this tag type. v denotes any tag with the type  ti2 T.  |A| is the number of non-repetitive tags in D |Ti| is the number of unique tags of the type “ti”. 10
Graph Modeling (6/6) Edge Weighting Assuming the conditional independence between any two unique tags, we have the joint probability for two connected nodes v and w, p(vw) = p(v)p(w). Then the edge between v and w, is weighted as where n is the number of documents in which tags v and w co-occur, pi(v) is the probability distribution obtained from the frequency of tag v divided by the total number of all tag occurrences in the ith document di. 11
Clustering Algorithm (1/2) We develop a two-step clustering approach for the problem of name disambiguation. The first step: To identify the complete set of micro-clusters (i.e., the maximal clique subgraphs). The second step: To use a single link clustering method to further cluster the set of micro-clusters generated from the previous step into a final set of macro-clusters. 12
Clustering Algorithm (2/2) Each micro-cluster M is stored as a vector of tags The similarity between any two micro-clusters, ConnectivityStrength, is estimated using cosine distance.  The weight of any tag v in      is defined by the following Cohesion formula. 13
Tagging a Namesake Tags with higher Cohesion weight in a cluster are selected to describe the corresponding namesake effectively. 14
EXPERIMENTAL EVALUATION (1/2) The measures used in the experiments are FEB (the Extended B-cubed measure based on Precision and Recall) and FPI (the harmonic mean of Purity and Inverse Purity) [1]. The threshold for ConnectivityStrength used in clustering was empirically estimated based on the training data. 15
EXPERIMENTAL EVALUATION (2/2) The experimental results show that the approach achieves much better overall performance (measured by FEB and FPI ) in name disambiguation than the state-of-the-art systems. In addition, it can output a set of selected tags to describe each cluster (corresponding to a namesake). 16
CONCLUSION (1/2) A novel weighted-graph based framework Type  weights for tags and a new way (i.e., Cohesion) to measure the importance of a tag in an unsupervised clustering algorithm 17
CONCLUSION (2/2) Experimental results show that the proposed ap-proach outperforms all the solutions in a recent Web people search task. In the future, we plan to apply the proposed framework in a real Web people search system. 18

More Related Content

What's hot

Investigating the 3D structure of the genome with Hi-C data analysis
Investigating the 3D structure of the genome with Hi-C data analysisInvestigating the 3D structure of the genome with Hi-C data analysis
Investigating the 3D structure of the genome with Hi-C data analysistuxette
 
Home Page Live(Www2007)
Home Page Live(Www2007)Home Page Live(Www2007)
Home Page Live(Www2007)tomelf2007
 
Fuzzy c means clustering protocol for wireless sensor networks
Fuzzy c means clustering protocol for wireless sensor networksFuzzy c means clustering protocol for wireless sensor networks
Fuzzy c means clustering protocol for wireless sensor networksmourya chandra
 
Finding Neighbors in Images Represented By Quadtree
Finding Neighbors in Images Represented By QuadtreeFinding Neighbors in Images Represented By Quadtree
Finding Neighbors in Images Represented By Quadtreeiosrjce
 
What is Odd about Binary Parseval Frames
What is Odd about Binary Parseval FramesWhat is Odd about Binary Parseval Frames
What is Odd about Binary Parseval FramesMicah Bullock
 
AN EMPIRICAL COMPARISON OF WEIGHTING FUNCTIONS FOR MULTI-LABEL DISTANCEWEIGHT...
AN EMPIRICAL COMPARISON OF WEIGHTING FUNCTIONS FOR MULTI-LABEL DISTANCEWEIGHT...AN EMPIRICAL COMPARISON OF WEIGHTING FUNCTIONS FOR MULTI-LABEL DISTANCEWEIGHT...
AN EMPIRICAL COMPARISON OF WEIGHTING FUNCTIONS FOR MULTI-LABEL DISTANCEWEIGHT...cscpconf
 
The Relation between a Framework for Collaborative Ontology Engineering and N...
The Relation between a Framework for Collaborative Ontology Engineering and N...The Relation between a Framework for Collaborative Ontology Engineering and N...
The Relation between a Framework for Collaborative Ontology Engineering and N...Christophe Debruyne
 
A survey on methods and applications of meta-learning with GNNs
A survey on methods and applications of meta-learning with GNNsA survey on methods and applications of meta-learning with GNNs
A survey on methods and applications of meta-learning with GNNsShreya Goyal
 
Machine Learning with Go
Machine Learning with GoMachine Learning with Go
Machine Learning with GoJames Bowman
 

What's hot (13)

Investigating the 3D structure of the genome with Hi-C data analysis
Investigating the 3D structure of the genome with Hi-C data analysisInvestigating the 3D structure of the genome with Hi-C data analysis
Investigating the 3D structure of the genome with Hi-C data analysis
 
Home Page Live(Www2007)
Home Page Live(Www2007)Home Page Live(Www2007)
Home Page Live(Www2007)
 
Fuzzy c means clustering protocol for wireless sensor networks
Fuzzy c means clustering protocol for wireless sensor networksFuzzy c means clustering protocol for wireless sensor networks
Fuzzy c means clustering protocol for wireless sensor networks
 
Finding Neighbors in Images Represented By Quadtree
Finding Neighbors in Images Represented By QuadtreeFinding Neighbors in Images Represented By Quadtree
Finding Neighbors in Images Represented By Quadtree
 
Bcs 011
Bcs 011Bcs 011
Bcs 011
 
What is Odd about Binary Parseval Frames
What is Odd about Binary Parseval FramesWhat is Odd about Binary Parseval Frames
What is Odd about Binary Parseval Frames
 
AN EMPIRICAL COMPARISON OF WEIGHTING FUNCTIONS FOR MULTI-LABEL DISTANCEWEIGHT...
AN EMPIRICAL COMPARISON OF WEIGHTING FUNCTIONS FOR MULTI-LABEL DISTANCEWEIGHT...AN EMPIRICAL COMPARISON OF WEIGHTING FUNCTIONS FOR MULTI-LABEL DISTANCEWEIGHT...
AN EMPIRICAL COMPARISON OF WEIGHTING FUNCTIONS FOR MULTI-LABEL DISTANCEWEIGHT...
 
The Relation between a Framework for Collaborative Ontology Engineering and N...
The Relation between a Framework for Collaborative Ontology Engineering and N...The Relation between a Framework for Collaborative Ontology Engineering and N...
The Relation between a Framework for Collaborative Ontology Engineering and N...
 
Impquest2
Impquest2Impquest2
Impquest2
 
Self-organizing map
Self-organizing mapSelf-organizing map
Self-organizing map
 
A survey on methods and applications of meta-learning with GNNs
A survey on methods and applications of meta-learning with GNNsA survey on methods and applications of meta-learning with GNNs
A survey on methods and applications of meta-learning with GNNs
 
Fuzzy OWL-2 Annotation for MetOcean Ontology
Fuzzy OWL-2 Annotation for MetOcean OntologyFuzzy OWL-2 Annotation for MetOcean Ontology
Fuzzy OWL-2 Annotation for MetOcean Ontology
 
Machine Learning with Go
Machine Learning with GoMachine Learning with Go
Machine Learning with Go
 

Viewers also liked

Exploring Identities of Bengali - Indian Women in Jhumpa Lahiri's The Namesake
Exploring Identities of Bengali - Indian Women in Jhumpa Lahiri's The NamesakeExploring Identities of Bengali - Indian Women in Jhumpa Lahiri's The Namesake
Exploring Identities of Bengali - Indian Women in Jhumpa Lahiri's The Namesakeemkrispi
 
The Term of " Diaspora ".
The Term of " Diaspora ".The Term of " Diaspora ".
The Term of " Diaspora ".Ankita Gohel
 
Diaspora & Hybridity
Diaspora & Hybridity  Diaspora & Hybridity
Diaspora & Hybridity Ahsan Ovi
 
Of Love and Betrayal, Sin and Redemption, Exile and Return: Recapturing the S...
Of Love and Betrayal, Sin and Redemption, Exile and Return: Recapturing the S...Of Love and Betrayal, Sin and Redemption, Exile and Return: Recapturing the S...
Of Love and Betrayal, Sin and Redemption, Exile and Return: Recapturing the S...iosrjce
 
Diaspora powerpoint avp
Diaspora powerpoint avpDiaspora powerpoint avp
Diaspora powerpoint avpCassBrion
 
Women in the diaspora
Women in the diasporaWomen in the diaspora
Women in the diasporaDolly Bhasin
 
Hybridity in Postcolonialism
Hybridity in PostcolonialismHybridity in Postcolonialism
Hybridity in PostcolonialismAnis Zulaikha
 

Viewers also liked (12)

Exploring Identities of Bengali - Indian Women in Jhumpa Lahiri's The Namesake
Exploring Identities of Bengali - Indian Women in Jhumpa Lahiri's The NamesakeExploring Identities of Bengali - Indian Women in Jhumpa Lahiri's The Namesake
Exploring Identities of Bengali - Indian Women in Jhumpa Lahiri's The Namesake
 
Dissertation
DissertationDissertation
Dissertation
 
The Term of " Diaspora ".
The Term of " Diaspora ".The Term of " Diaspora ".
The Term of " Diaspora ".
 
Diaspora & Hybridity
Diaspora & Hybridity  Diaspora & Hybridity
Diaspora & Hybridity
 
Diaspora
DiasporaDiaspora
Diaspora
 
Of Love and Betrayal, Sin and Redemption, Exile and Return: Recapturing the S...
Of Love and Betrayal, Sin and Redemption, Exile and Return: Recapturing the S...Of Love and Betrayal, Sin and Redemption, Exile and Return: Recapturing the S...
Of Love and Betrayal, Sin and Redemption, Exile and Return: Recapturing the S...
 
The namesake
The namesake The namesake
The namesake
 
Diaspora powerpoint avp
Diaspora powerpoint avpDiaspora powerpoint avp
Diaspora powerpoint avp
 
Diaspora as Postcolonial context
Diaspora as Postcolonial contextDiaspora as Postcolonial context
Diaspora as Postcolonial context
 
Women in the diaspora
Women in the diasporaWomen in the diaspora
Women in the diaspora
 
diaspora
diasporadiaspora
diaspora
 
Hybridity in Postcolonialism
Hybridity in PostcolonialismHybridity in Postcolonialism
Hybridity in Postcolonialism
 

Similar to Namesake

論文紹介:Learning With Neighbor Consistency for Noisy Labels
論文紹介:Learning With Neighbor Consistency for Noisy Labels論文紹介:Learning With Neighbor Consistency for Noisy Labels
論文紹介:Learning With Neighbor Consistency for Noisy LabelsToru Tamaki
 
Paper Explained: Understanding the wiring evolution in differentiable neural ...
Paper Explained: Understanding the wiring evolution in differentiable neural ...Paper Explained: Understanding the wiring evolution in differentiable neural ...
Paper Explained: Understanding the wiring evolution in differentiable neural ...Devansh16
 
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSCONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSijseajournal
 
Hierarchical topics in texts generated by a stream
Hierarchical topics in texts generated by a streamHierarchical topics in texts generated by a stream
Hierarchical topics in texts generated by a streamkevig
 
Hierarchical topics in texts generated by a stream
Hierarchical topics in texts generated by a streamHierarchical topics in texts generated by a stream
Hierarchical topics in texts generated by a streamkevig
 
240401_JW_labseminar[LINE: Large-scale Information Network Embeddin].pptx
240401_JW_labseminar[LINE: Large-scale Information Network Embeddin].pptx240401_JW_labseminar[LINE: Large-scale Information Network Embeddin].pptx
240401_JW_labseminar[LINE: Large-scale Information Network Embeddin].pptxthanhdowork
 
A simple and fast exact clustering algorithm defined for complex
A simple and fast exact clustering algorithm defined for complexA simple and fast exact clustering algorithm defined for complex
A simple and fast exact clustering algorithm defined for complexAlexander Decker
 
A simple and fast exact clustering algorithm defined for complex
A simple and fast exact clustering algorithm defined for complexA simple and fast exact clustering algorithm defined for complex
A simple and fast exact clustering algorithm defined for complexAlexander Decker
 
Efficiency of TreeMatch Algorithm in XML Tree Pattern Matching
Efficiency of TreeMatch Algorithm in XML Tree Pattern  MatchingEfficiency of TreeMatch Algorithm in XML Tree Pattern  Matching
Efficiency of TreeMatch Algorithm in XML Tree Pattern MatchingIOSR Journals
 
Mapping Subsets of Scholarly Information
Mapping Subsets of Scholarly InformationMapping Subsets of Scholarly Information
Mapping Subsets of Scholarly InformationPaul Houle
 
Tatyana Makhalova - 2015 - News clustering approach based on discourse text s...
Tatyana Makhalova - 2015 - News clustering approach based on discourse text s...Tatyana Makhalova - 2015 - News clustering approach based on discourse text s...
Tatyana Makhalova - 2015 - News clustering approach based on discourse text s...Association for Computational Linguistics
 
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...ijseajournal
 
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...ijcseit
 
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ijcseit
 
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...ijcseit
 
Comparision of methods for combination of multiple classifiers that predict b...
Comparision of methods for combination of multiple classifiers that predict b...Comparision of methods for combination of multiple classifiers that predict b...
Comparision of methods for combination of multiple classifiers that predict b...IJERA Editor
 
FAST DETECTION OF DDOS ATTACKS USING NON-ADAPTIVE GROUP TESTING
FAST DETECTION OF DDOS ATTACKS USING NON-ADAPTIVE GROUP TESTINGFAST DETECTION OF DDOS ATTACKS USING NON-ADAPTIVE GROUP TESTING
FAST DETECTION OF DDOS ATTACKS USING NON-ADAPTIVE GROUP TESTINGIJNSA Journal
 
Using spectral radius ratio for node degree
Using spectral radius ratio for node degreeUsing spectral radius ratio for node degree
Using spectral radius ratio for node degreeIJCNCJournal
 
4 image segmentation through clustering
4 image segmentation through clustering4 image segmentation through clustering
4 image segmentation through clusteringIAEME Publication
 

Similar to Namesake (20)

論文紹介:Learning With Neighbor Consistency for Noisy Labels
論文紹介:Learning With Neighbor Consistency for Noisy Labels論文紹介:Learning With Neighbor Consistency for Noisy Labels
論文紹介:Learning With Neighbor Consistency for Noisy Labels
 
Paper Explained: Understanding the wiring evolution in differentiable neural ...
Paper Explained: Understanding the wiring evolution in differentiable neural ...Paper Explained: Understanding the wiring evolution in differentiable neural ...
Paper Explained: Understanding the wiring evolution in differentiable neural ...
 
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSCONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
 
Hierarchical topics in texts generated by a stream
Hierarchical topics in texts generated by a streamHierarchical topics in texts generated by a stream
Hierarchical topics in texts generated by a stream
 
Hierarchical topics in texts generated by a stream
Hierarchical topics in texts generated by a streamHierarchical topics in texts generated by a stream
Hierarchical topics in texts generated by a stream
 
240401_JW_labseminar[LINE: Large-scale Information Network Embeddin].pptx
240401_JW_labseminar[LINE: Large-scale Information Network Embeddin].pptx240401_JW_labseminar[LINE: Large-scale Information Network Embeddin].pptx
240401_JW_labseminar[LINE: Large-scale Information Network Embeddin].pptx
 
A simple and fast exact clustering algorithm defined for complex
A simple and fast exact clustering algorithm defined for complexA simple and fast exact clustering algorithm defined for complex
A simple and fast exact clustering algorithm defined for complex
 
A simple and fast exact clustering algorithm defined for complex
A simple and fast exact clustering algorithm defined for complexA simple and fast exact clustering algorithm defined for complex
A simple and fast exact clustering algorithm defined for complex
 
Entropy 19-00079
Entropy 19-00079Entropy 19-00079
Entropy 19-00079
 
Efficiency of TreeMatch Algorithm in XML Tree Pattern Matching
Efficiency of TreeMatch Algorithm in XML Tree Pattern  MatchingEfficiency of TreeMatch Algorithm in XML Tree Pattern  Matching
Efficiency of TreeMatch Algorithm in XML Tree Pattern Matching
 
Mapping Subsets of Scholarly Information
Mapping Subsets of Scholarly InformationMapping Subsets of Scholarly Information
Mapping Subsets of Scholarly Information
 
Tatyana Makhalova - 2015 - News clustering approach based on discourse text s...
Tatyana Makhalova - 2015 - News clustering approach based on discourse text s...Tatyana Makhalova - 2015 - News clustering approach based on discourse text s...
Tatyana Makhalova - 2015 - News clustering approach based on discourse text s...
 
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...
 
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...
 
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES
 
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...
A COMPARATIVE ANALYSIS OF PROGRESSIVE MULTIPLE SEQUENCE ALIGNMENT APPROACHES ...
 
Comparision of methods for combination of multiple classifiers that predict b...
Comparision of methods for combination of multiple classifiers that predict b...Comparision of methods for combination of multiple classifiers that predict b...
Comparision of methods for combination of multiple classifiers that predict b...
 
FAST DETECTION OF DDOS ATTACKS USING NON-ADAPTIVE GROUP TESTING
FAST DETECTION OF DDOS ATTACKS USING NON-ADAPTIVE GROUP TESTINGFAST DETECTION OF DDOS ATTACKS USING NON-ADAPTIVE GROUP TESTING
FAST DETECTION OF DDOS ATTACKS USING NON-ADAPTIVE GROUP TESTING
 
Using spectral radius ratio for node degree
Using spectral radius ratio for node degreeUsing spectral radius ratio for node degree
Using spectral radius ratio for node degree
 
4 image segmentation through clustering
4 image segmentation through clustering4 image segmentation through clustering
4 image segmentation through clustering
 

Recently uploaded

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Namesake

  • 1. Two Birds with One Stone: A Graph-based Framework for Disambiguating and Tagging People Names in Web Search Lili Jiang1, Jianyong Wang2, Ning An3, Shengyuan Wang2, Jian Zhan2, Lian Li1 1School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China 2Department of Computer Science and Technology, Tsinghua University, Beijing, China 3Oracle Corporation, Nashua, NH, USA jianglili06@lzu.cn {jianyong,wwssyy,zhanjian}@tsinghua.edu.cn;ning.an@oracle.com;lil@lzu.edu.cn Presenter: Kuan-HuaHuo (霍冠樺) Date: 2010/04/13 WWW 2009 MADRID! Poster Sessions: Friday, April 24, 2009 1
  • 2. 2 Outline INTRODUCTION THE FRAMEWORK FOR WEB PEOPLE NAME DISAMBIGUATION AND TAGGING Graph Modeling Clustering Algorithm Tagging a Namesake EXPERIMENTAL EVALUATION CONCLUSION 2
  • 3. INTRODUCTION (1/2) We have observed that people tag information, such as locations and e-mail addresses, appears to be informative, and a combination of such tags can almost identify a unique target people. 3
  • 4. INTRODUCTION (2/2) The contributions of this work include: a) A novel weighted graph representation was proposed to model the relationships between tags; b) An effective graph clustering algorithm was devised to disambiguate and tag people names; c) An extensive performance study was conducted, which shows that the proposed framework achieves much better performance than previous approaches. 4
  • 5. THE FRAMEWORK FOR WEB PEOPLE NAME DISAMBIGUATION AND TAGGING Graph Modeling Clustering Algorithm Tagging a Namesake 5
  • 6. Graph Modeling (1/6) Document corpus D = {d1, d2, . . . , dk} be the top k returned results from a search engine. We extract seven types of tags T = {people name, organization, location, email address, phone number, date, occupation} from chunk windows around the query name in each cleaned document A = S7 i=1, is a union of seven tag sets where Ti contains the non-repetitive tags with the type 6
  • 7. Graph Modeling (2/6) Regarding a people name, the proposed framework views the tag corpus A as an undirected labeled graph G=(V,E), The node set V = {v1, v2, . . .} is a set of unique tags. Each edge (vi, vj) 2 E represents that the tags cor-responding to viand vj co-occur in a same document in D 7
  • 8.
  • 9. Nodes 3, 4 and 5: bridge-tag.
  • 10.
  • 11. Graph Modeling (5/6) Node Weighting The higher the number of unique tags of a certain type, the smaller the weight of each node with this tag type. v denotes any tag with the type ti2 T. |A| is the number of non-repetitive tags in D |Ti| is the number of unique tags of the type “ti”. 10
  • 12. Graph Modeling (6/6) Edge Weighting Assuming the conditional independence between any two unique tags, we have the joint probability for two connected nodes v and w, p(vw) = p(v)p(w). Then the edge between v and w, is weighted as where n is the number of documents in which tags v and w co-occur, pi(v) is the probability distribution obtained from the frequency of tag v divided by the total number of all tag occurrences in the ith document di. 11
  • 13. Clustering Algorithm (1/2) We develop a two-step clustering approach for the problem of name disambiguation. The first step: To identify the complete set of micro-clusters (i.e., the maximal clique subgraphs). The second step: To use a single link clustering method to further cluster the set of micro-clusters generated from the previous step into a final set of macro-clusters. 12
  • 14. Clustering Algorithm (2/2) Each micro-cluster M is stored as a vector of tags The similarity between any two micro-clusters, ConnectivityStrength, is estimated using cosine distance. The weight of any tag v in is defined by the following Cohesion formula. 13
  • 15. Tagging a Namesake Tags with higher Cohesion weight in a cluster are selected to describe the corresponding namesake effectively. 14
  • 16. EXPERIMENTAL EVALUATION (1/2) The measures used in the experiments are FEB (the Extended B-cubed measure based on Precision and Recall) and FPI (the harmonic mean of Purity and Inverse Purity) [1]. The threshold for ConnectivityStrength used in clustering was empirically estimated based on the training data. 15
  • 17. EXPERIMENTAL EVALUATION (2/2) The experimental results show that the approach achieves much better overall performance (measured by FEB and FPI ) in name disambiguation than the state-of-the-art systems. In addition, it can output a set of selected tags to describe each cluster (corresponding to a namesake). 16
  • 18. CONCLUSION (1/2) A novel weighted-graph based framework Type weights for tags and a new way (i.e., Cohesion) to measure the importance of a tag in an unsupervised clustering algorithm 17
  • 19. CONCLUSION (2/2) Experimental results show that the proposed ap-proach outperforms all the solutions in a recent Web people search task. In the future, we plan to apply the proposed framework in a real Web people search system. 18
  • 20. 19 ~ Thank you for your listening ~