Unsupervised Learning of Social Networks
from a Multiple-Source News Corpus

Hristo Tanev
European Commission
Joint Research Centre
hristo.tanev@jrc.it
Introduction
Social networks provide an intuitive
picture of inferred relationships between
entities, such as people and organizations.
Social network analysis uses social
networks to identify underlying groups,
communication patterns, and other
information.
Manual construction of a social network is
a very laborious task. Algorithms for
automatic detection of relations can
save time and human effort.
Introduction
We present an unsupervised methodology
for automatic learning of social networks.
We use a multiple-source, syntactically
parsed news corpus.
To overcome the efficiency problems that
arise when syntactic information is used
on real-world data, we put forward an
efficient graph matching algorithm.
Related work
Learning social networks from
Friend-Of-A-Friend links (Mika 2005)
or statistical co-occurrences
Disadvantage: these approaches cannot
detect the type of the relation
Related work
Support Vector Machines (SVM)
provide a more accurate means of
relation extraction (Zelenko et al.
2003)
Disadvantages:
• they require a sufficient amount of
  annotated data
• each pair of named entities has to be
  evaluated separately, which slows down
  the relation extraction
Related work
Romano et al. (2006) propose a generic
unsupervised method for learning
syntactic patterns for relation extraction
Disadvantages:
• they use the Web as a training corpus, which
  makes the learning very slow
• they match each pattern against each
  sentence, which is inefficient when matching
  many templates against a big corpus
Unsupervised learning of social networks
Our algorithm is unsupervised: as input it accepts
a small number (one, two, or a few) of two-slot
seed syntactic templates which express a certain
semantic relation.
The algorithm uses news clusters to learn new
syntactic patterns expressing the same semantic
relation.
Once the patterns are learned, we apply a novel,
efficient pattern-matching methodology to
extract related person names from the text.
Extracted relations are aggregated into a social
network.
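The final aggregation step can be pictured with a minimal sketch. The slides do not say how the network is stored, so the edge representation below (a counter keyed by person, relation, person) and the function name `aggregate` are assumptions made for illustration, and the mentions are illustrative rather than system output.

```python
from collections import Counter

def aggregate(relation_mentions):
    """Count how often each (person, relation, person) edge was extracted."""
    return Counter(relation_mentions)

# Illustrative mentions only, not output of the system described in the slides.
mentions = [
    ("George W. Bush", "support", "Hamid Karzai"),
    ("Prodi", "meeting", "Bush"),
    ("Prodi", "meeting", "Bush"),
]

network = aggregate(mentions)
for (x, relation, y), count in sorted(network.items()):
    print(f"{x} --{relation}--> {y} (mentions: {count})")
```

Repeated mentions collapse into a single typed edge, which fits an evaluation that asks only whether a relation between two people was found at least once.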
EMM news clusters
Europe Media Monitor (EMM) downloads
news from different sources around
the clock.
Every day, 4000-5000 English-language
news articles are downloaded.
The news articles are grouped into
topic clusters.
Parsing the corpus
The training and the test corpus
consist of English-language news
articles from 200 sources.
Articles are parsed with a full
dependency parser, MiniPar.
Example dependency tree (figure):
  meet
    subj: Bush
    obj: Blair
    in
      March
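One possible in-memory form for such a dependency tree is sketched below. The `Node` class and the `arcs` helper are hypothetical names, not MiniPar's output format. The figure labels only the subj and obj arcs, so the labels `mod` and `pcomp-n` on the prepositional branch are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    word: str
    children: list = field(default_factory=list)  # list of (relation, Node)

    def add(self, relation, child):
        """Attach a dependent under the given relation and return it."""
        self.children.append((relation, child))
        return child

def arcs(node):
    """Yield (head, relation, dependent) triples in depth-first order."""
    for relation, child in node.children:
        yield (node.word, relation, child.word)
        yield from arcs(child)

# Tree from the figure: "meet" governs Bush (subj), Blair (obj) and "in".
meet = Node("meet")
meet.add("subj", Node("Bush"))
meet.add("obj", Node("Blair"))
prep = meet.add("mod", Node("in"))   # assumed label, not shown in the figure
prep.add("pcomp-n", Node("March"))   # assumed label, not shown in the figure

for arc in arcs(meet):
    print(arc)
```

Flattening the tree into (head, relation, dependent) triples gives a form that two-slot templates can be compared against directly.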
Learning patterns
Manually provide a very small
number of seed syntactic templates
which express the main relation.
For example, for the relation “X
supports Y” we use the syntactic
patterns:
  X subj support obj Y
  X subj praise obj Y
Learning patterns
Match these templates against the
news clusters in the corpus. Each
pair of person names which fill the
slots X and Y is called an anchor
pair.
From “Bush praised the Prime
Minister Hamid Karzai”, the
algorithm will extract the anchor
pair (X:Bush; Y:Hamid Karzai)
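The anchor-pair extraction can be sketched as follows, assuming each parsed sentence is available as (head lemma, relation, dependent) triples and that person names have already been collapsed into single nodes. Both are simplifications: a real parse of the example would attach "Minister" rather than the name as the object. `SEED_VERBS` and `extract_anchor_pairs` are hypothetical names.

```python
# Seed templates "X subj support obj Y" and "X subj praise obj Y",
# reduced here to their head verbs (an assumption of this sketch).
SEED_VERBS = ["support", "praise"]

def extract_anchor_pairs(sentence_arcs, seed_verbs=SEED_VERBS):
    """Return (X, Y) fillers for every seed verb with both a subj and an obj."""
    pairs = []
    for verb in seed_verbs:
        subjects = [d for h, r, d in sentence_arcs if h == verb and r == "subj"]
        objects = [d for h, r, d in sentence_arcs if h == verb and r == "obj"]
        pairs.extend((x, y) for x in subjects for y in objects)
    return pairs

# Hand-written, simplified parse of
# "Bush praised the Prime Minister Hamid Karzai".
sentence = [("praise", "subj", "Bush"), ("praise", "obj", "Hamid Karzai")]
anchor_pairs = extract_anchor_pairs(sentence)
print(anchor_pairs)
```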
Learning patterns
Normalize the anchor pairs using
the information in the EMM
database.
After this step, the example anchor
pair will become (X:George W.
Bush; Y:Hamid Karzai).
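The normalization step might look roughly like the sketch below. The slides do not describe how the EMM database is queried, so a small hand-written alias table stands in for it. `ALIASES`, `normalize_name` and `normalize_pair` are hypothetical.

```python
# Hypothetical alias table standing in for the EMM database lookup.
ALIASES = {
    "Bush": "George W. Bush",
    "George Bush": "George W. Bush",
}

def normalize_name(name, aliases=ALIASES):
    """Map a name variant to its canonical form; unknown names pass through."""
    return aliases.get(name, name)

def normalize_pair(pair, aliases=ALIASES):
    """Normalize both slots of an anchor pair."""
    x, y = pair
    return (normalize_name(x, aliases), normalize_name(y, aliases))

normalized = normalize_pair(("Bush", "Hamid Karzai"))
print(normalized)
```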
Learning patterns
For each extracted anchor pair,
search the same cluster for all the
sentences in which both names of the
anchor pair occur.
The assumption is that the same
relation holds between the same
pair of names throughout the news
cluster, since all articles in it share
the same topic.
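A minimal sketch of this search, assuming every sentence in a cluster carries the set of normalized person names it mentions. The record layout and the name `supporting_sentences` are assumptions, and the sentence texts are placeholders rather than real news text.

```python
def supporting_sentences(cluster, anchor_pair):
    """Return the sentences of one cluster that mention both anchor names."""
    x, y = anchor_pair
    return [s["text"] for s in cluster if x in s["persons"] and y in s["persons"]]

# Placeholder cluster; "persons" holds normalized names per sentence.
cluster = [
    {"text": "sentence 1", "persons": {"George W. Bush", "Hamid Karzai"}},
    {"text": "sentence 2", "persons": {"George W. Bush"}},
    {"text": "sentence 3", "persons": {"Hamid Karzai", "George W. Bush", "Tony Blair"}},
]

hits = supporting_sentences(cluster, ("George W. Bush", "Hamid Karzai"))
print(hits)
```

The search is restricted to one cluster because the same-topic assumption is what justifies treating each co-occurrence as evidence of the same relation.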
Learning patterns
From all the sentences in which at least
one anchor pair appears, learn syntactic
patterns using our pattern-learning
algorithm, which is similar to the General
Structure Learning (GSL) algorithm
described in Szpektor et al. (2006)
Example: X subj-agree-with Y
The score of each pattern is the
number of different anchor pairs which
support it
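The scoring rule can be sketched as below, assuming the learner emits one (pattern, anchor pair) observation per supporting sentence. The pattern induction itself is not shown, and the observations and pair names are placeholders.

```python
from collections import defaultdict

def score_patterns(observations):
    """Score each pattern by the number of distinct anchor pairs supporting it."""
    support = defaultdict(set)
    for pattern, anchor_pair in observations:
        support[pattern].add(anchor_pair)
    return {pattern: len(pairs) for pattern, pairs in support.items()}

# Placeholder observations; "A".."D" stand for normalized person names.
observations = [
    ("X subj-agree-with Y", ("A", "B")),
    ("X subj-agree-with Y", ("A", "B")),  # same pair again, counted once
    ("X subj-agree-with Y", ("C", "D")),
    ("X say Y", ("A", "B")),
]

scores = score_patterns(observations)
print(scores)
```

Counting distinct pairs rather than raw occurrences keeps one heavily reported event from inflating a pattern's score.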
Learning patterns
Pattern selection and filtering
• Filter out all templates which appear for
  fewer than 2 anchor pairs.
• Remove generic patterns like “X say Y”,
  “X have Y”, “X is Y”, etc. using a
  predefined template list
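Both filters can be sketched in a few lines. The stop list below holds only the three generic patterns named on the slide, whereas the full predefined list is not given. The scores are placeholder values and `select_patterns` is a hypothetical name.

```python
# Only the generic patterns named on the slide; the real list is longer.
GENERIC_PATTERNS = {"X say Y", "X have Y", "X is Y"}

def select_patterns(scores, min_pairs=2, generic=GENERIC_PATTERNS):
    """Keep patterns supported by at least min_pairs anchor pairs and not generic."""
    return sorted(
        pattern
        for pattern, n_pairs in scores.items()
        if n_pairs >= min_pairs and pattern not in generic
    )

# Placeholder scores (pattern -> number of distinct anchor pairs).
scores = {
    "X subj-agree-with Y": 3,
    "X say Y": 40,
    "X subj praise obj Y": 1,
}

selected = select_patterns(scores)
print(selected)
```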
Syntactic Network model
Example sentences (figure):
“Prodi met President Bush in September”
“Berlusconi met President Chirac”
Syntactic Network model
Adding syntactic templates
Efficiency
The worst-case time complexity of building
SyntNet is O(|w| log |w|), where |w| is the
number of words in the parsed corpus
The worst-case time complexity of the syntactic
matching algorithm is bounded by
O((|s| + |t|) log MaxArcO), where |s| is the
number of sentences in the corpus, |t| is the
number of templates, and MaxArcO is the maximum
number of occurrences of a SyntNet arc, i.e. the
size of the largest index set of a SyntNet arc
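The idea of arcs that carry index sets can be illustrated with a simplified inverted index. This is not the SyntNet structure or the author's matching algorithm, and it makes no claim to the complexity bounds above. It only shows how one index lets a template be matched by intersecting the occurrence sets of its arcs instead of scanning every sentence. All names are hypothetical, and the index is keyed per sentence, so two occurrences of the same verb in one sentence would be conflated.

```python
from collections import defaultdict

def build_arc_index(parsed_sentences):
    """Map (head, relation) -> {sentence_id: [dependents]}."""
    index = defaultdict(dict)
    for sentence_id, sentence_arcs in enumerate(parsed_sentences):
        for head, relation, dependent in sentence_arcs:
            index[(head, relation)].setdefault(sentence_id, []).append(dependent)
    return index

def match_template(index, verb):
    """Match 'X subj <verb> obj Y' by intersecting the two arcs' index sets."""
    subjects = index.get((verb, "subj"), {})
    objects = index.get((verb, "obj"), {})
    hits = []
    for sentence_id in sorted(subjects.keys() & objects.keys()):
        for x in subjects[sentence_id]:
            for y in objects[sentence_id]:
                hits.append((sentence_id, x, y))
    return hits

# Hand-written, simplified parses of the two example sentences
# ("President Bush" and "President Chirac" collapsed to the surname).
corpus = [
    [("meet", "subj", "Prodi"), ("meet", "obj", "Bush"), ("meet", "mod", "in")],
    [("meet", "subj", "Berlusconi"), ("meet", "obj", "Chirac")],
]

index = build_arc_index(corpus)
matches = match_template(index, "meet")
print(matches)
```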
Evaluation schema

To evaluate our algorithm we learned syntactic
patterns for “meeting” and “support”
relationships between people
We evaluate how well the algorithm captures
relationships between the top 33 VIPs in our
database
We do not evaluate how well it captures individual
relation mentions
If a specific relation (e.g. “meeting”) holds
between a pair of people X and Y, it is sufficient
that the algorithm finds at least one mention of
this relation between X and Y
Experiments
For paraphrase learning we used a training
corpus of 98'000 English-language news articles,
clustered in 22'000 EMM topic clusters and published
in the period 01/May/2006 to 03/Oct/2006.
For testing the method, we used 125'000
English-language news articles published in the
period 03/Oct/2006 to 31/Oct/2006.
Reading the test corpus and the templates into
memory and building SyntNet+ took 9 min and
3 sec. It took only 45 seconds to match the 101
syntactic templates against the test corpus of
about 1'080'000 parsed sentences.
We normalized the extracted names using the EMM
database
Relationship extraction evaluation on the top
33 VIPs from the EMM DB

Relation   Precision  Recall  F1
meeting    0.61       0.56    0.58
support    0.57       0.10    0.17
overall    0.60       0.32    0.42
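The F1 column is consistent with the harmonic mean of the reported precision and recall, as the short check below shows. The helper `f1` is a generic implementation of the standard formula, not code from the original work.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the table.
reported = {
    "meeting": (0.61, 0.56),
    "support": (0.57, 0.10),
    "overall": (0.60, 0.32),
}

f1_scores = {name: round(f1(p, r), 2) for name, (p, r) in reported.items()}
print(f1_scores)
```

The low recall for "support" pulls its F1 down to 0.17 even though its precision is close to that of "meeting".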
Using the social network view
We ran the PageRank algorithm on
the automatically extracted
“meeting” network and took the 5
top-ranked people
We compared this ranking with a
simple frequency-based ranking of
people
Comparing two people-ranking schemes

PageRank         Frequency
C. Rice          G.W. Bush
G.W. Bush        T. Blair
V. Putin         C. Rice
E. Olmert        N. al-Maliki
T. Blair         S. Hussein
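For readers unfamiliar with PageRank on a meeting network, a minimal power-iteration sketch follows. It treats meetings as undirected edges and uses a damping factor of 0.85 with a fixed iteration count. These choices, and the toy graph with placeholder nodes A to D, are assumptions. The slides do not give the settings or the extracted network.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """PageRank by power iteration on an undirected graph given as edge pairs."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    rank = {node: 1.0 / n for node in neighbours}
    for _ in range(iterations):
        new_rank = {}
        for node in neighbours:
            # Each neighbour shares its current rank equally among its own links.
            incoming = sum(rank[u] / len(neighbours[u]) for u in neighbours[node])
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank

# Toy "meeting" network with placeholder people, not extracted data.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
ranks = pagerank(toy_edges)
top = sorted(ranks, key=ranks.get, reverse=True)
print(top)
```

Unlike a frequency count, this ranking depends on who a person meets, not only on how often they appear.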
Conclusions and future work
We presented an unsupervised method for
learning social networks from news clusters
We presented a very efficient syntactic
pattern matching algorithm
Automatically learned social networks can
be used for some analyst tasks
In future work we will try to cover
more types of relations
We are also considering learning and using
more abstract patterns
THANK YOU!

More Related Content

What's hot

DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...
DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...
DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...IJNSA Journal
 
Link prediction with the linkpred tool
Link prediction with the linkpred toolLink prediction with the linkpred tool
Link prediction with the linkpred toolRaf Guns
 
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING cscpconf
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network AnalysisFred Stutzman
 
712201907
712201907712201907
712201907IJRAT
 
IRJET- A Survey on Link Prediction Techniques
IRJET-  	  A Survey on Link Prediction TechniquesIRJET-  	  A Survey on Link Prediction Techniques
IRJET- A Survey on Link Prediction TechniquesIRJET Journal
 
Ijarcet vol-2-issue-7-2252-2257
Ijarcet vol-2-issue-7-2252-2257Ijarcet vol-2-issue-7-2252-2257
Ijarcet vol-2-issue-7-2252-2257Editor IJARCET
 
08 Exponential Random Graph Models (ERGM)
08 Exponential Random Graph Models (ERGM)08 Exponential Random Graph Models (ERGM)
08 Exponential Random Graph Models (ERGM)dnac
 
ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...Daniel Katz
 
Interpreting sslar
Interpreting sslarInterpreting sslar
Interpreting sslarRatzman III
 
Modelling and Analyzing Complex Networks"
Modelling and Analyzing Complex Networks"Modelling and Analyzing Complex Networks"
Modelling and Analyzing Complex Networks"butest
 
Contextual Ontology Alignment - ESWC 2011
Contextual Ontology Alignment - ESWC 2011Contextual Ontology Alignment - ESWC 2011
Contextual Ontology Alignment - ESWC 2011Mariana Damova, Ph.D
 
EXTRACTING ARABIC RELATIONS FROM THE WEB
EXTRACTING ARABIC RELATIONS FROM THE WEBEXTRACTING ARABIC RELATIONS FROM THE WEB
EXTRACTING ARABIC RELATIONS FROM THE WEBijcsit
 
02 Network Data Collection
02 Network Data Collection02 Network Data Collection
02 Network Data Collectiondnac
 
01 Introduction to Networks Methods and Measures
01 Introduction to Networks Methods and Measures01 Introduction to Networks Methods and Measures
01 Introduction to Networks Methods and Measuresdnac
 
Current trends of opinion mining and sentiment analysis in social networks
Current trends of opinion mining and sentiment analysis in social networksCurrent trends of opinion mining and sentiment analysis in social networks
Current trends of opinion mining and sentiment analysis in social networkseSAT Publishing House
 
06 Regression with Networks – EGO Networks and Randomization (2017)
06 Regression with Networks – EGO Networks and Randomization (2017)06 Regression with Networks – EGO Networks and Randomization (2017)
06 Regression with Networks – EGO Networks and Randomization (2017)Duke Network Analysis Center
 
Using content and interactions for discovering communities in
Using content and interactions for discovering communities inUsing content and interactions for discovering communities in
Using content and interactions for discovering communities inmoresmile
 

What's hot (20)

DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...
DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...
DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...
 
Link prediction with the linkpred tool
Link prediction with the linkpred toolLink prediction with the linkpred tool
Link prediction with the linkpred tool
 
Ijetcas14 639
Ijetcas14 639Ijetcas14 639
Ijetcas14 639
 
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network Analysis
 
712201907
712201907712201907
712201907
 
IRJET- A Survey on Link Prediction Techniques
IRJET-  	  A Survey on Link Prediction TechniquesIRJET-  	  A Survey on Link Prediction Techniques
IRJET- A Survey on Link Prediction Techniques
 
mlss
mlssmlss
mlss
 
Ijarcet vol-2-issue-7-2252-2257
Ijarcet vol-2-issue-7-2252-2257Ijarcet vol-2-issue-7-2252-2257
Ijarcet vol-2-issue-7-2252-2257
 
08 Exponential Random Graph Models (ERGM)
08 Exponential Random Graph Models (ERGM)08 Exponential Random Graph Models (ERGM)
08 Exponential Random Graph Models (ERGM)
 
ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...
 
Interpreting sslar
Interpreting sslarInterpreting sslar
Interpreting sslar
 
Modelling and Analyzing Complex Networks"
Modelling and Analyzing Complex Networks"Modelling and Analyzing Complex Networks"
Modelling and Analyzing Complex Networks"
 
Contextual Ontology Alignment - ESWC 2011
Contextual Ontology Alignment - ESWC 2011Contextual Ontology Alignment - ESWC 2011
Contextual Ontology Alignment - ESWC 2011
 
EXTRACTING ARABIC RELATIONS FROM THE WEB
EXTRACTING ARABIC RELATIONS FROM THE WEBEXTRACTING ARABIC RELATIONS FROM THE WEB
EXTRACTING ARABIC RELATIONS FROM THE WEB
 
02 Network Data Collection
02 Network Data Collection02 Network Data Collection
02 Network Data Collection
 
01 Introduction to Networks Methods and Measures
01 Introduction to Networks Methods and Measures01 Introduction to Networks Methods and Measures
01 Introduction to Networks Methods and Measures
 
Current trends of opinion mining and sentiment analysis in social networks
Current trends of opinion mining and sentiment analysis in social networksCurrent trends of opinion mining and sentiment analysis in social networks
Current trends of opinion mining and sentiment analysis in social networks
 
06 Regression with Networks – EGO Networks and Randomization (2017)
06 Regression with Networks – EGO Networks and Randomization (2017)06 Regression with Networks – EGO Networks and Randomization (2017)
06 Regression with Networks – EGO Networks and Randomization (2017)
 
Using content and interactions for discovering communities in
Using content and interactions for discovering communities inUsing content and interactions for discovering communities in
Using content and interactions for discovering communities in
 

Similar to Unsupervised Learning of a Social Network from a Multiple-Source News Corpus

IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATAIDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATAijistjournal
 
Identifying the semantic relations on
Identifying the semantic relations onIdentifying the semantic relations on
Identifying the semantic relations onijistjournal
 
Context Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word PairsContext Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word PairsIJCSIS Research Publications
 
Textual Document Categorization using Bigram Maximum Likelihood and KNN
Textual Document Categorization using Bigram Maximum Likelihood and KNNTextual Document Categorization using Bigram Maximum Likelihood and KNN
Textual Document Categorization using Bigram Maximum Likelihood and KNNRounak Dhaneriya
 
Text Mining: (Asynchronous Sequences)
Text Mining: (Asynchronous Sequences)Text Mining: (Asynchronous Sequences)
Text Mining: (Asynchronous Sequences)IJERA Editor
 
An Efficient Semantic Relation Extraction Method For Arabic Texts Based On Si...
An Efficient Semantic Relation Extraction Method For Arabic Texts Based On Si...An Efficient Semantic Relation Extraction Method For Arabic Texts Based On Si...
An Efficient Semantic Relation Extraction Method For Arabic Texts Based On Si...CSCJournals
 
Using NLP to find contextual relationships between fashion houses
Using NLP to find contextual relationships between fashion housesUsing NLP to find contextual relationships between fashion houses
Using NLP to find contextual relationships between fashion housesSushant Shankar
 
A framework for emotion mining from text in online social networks(final)
A framework for emotion mining from text in online social networks(final)A framework for emotion mining from text in online social networks(final)
A framework for emotion mining from text in online social networks(final)es712
 
Adaptive named entity recognition for social network analysis and domain onto...
Adaptive named entity recognition for social network analysis and domain onto...Adaptive named entity recognition for social network analysis and domain onto...
Adaptive named entity recognition for social network analysis and domain onto...Cuong Tran Van
 
Extraction of Data Using Comparable Entity Mining
Extraction of Data Using Comparable Entity MiningExtraction of Data Using Comparable Entity Mining
Extraction of Data Using Comparable Entity Miningiosrjce
 
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual SimilaritySemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual Similaritypathsproject
 
Secured Ontology Mapping
Secured Ontology Mapping Secured Ontology Mapping
Secured Ontology Mapping dannyijwest
 
Automatic multiple choice question generation system for
Automatic multiple choice question generation system forAutomatic multiple choice question generation system for
Automatic multiple choice question generation system forAlexander Decker
 
Sentence compression via clustering of dependency graph nodes - NLP-KE 2012
Sentence compression via clustering of dependency graph nodes - NLP-KE 2012Sentence compression via clustering of dependency graph nodes - NLP-KE 2012
Sentence compression via clustering of dependency graph nodes - NLP-KE 2012Ayman El-Kilany
 
Beyond Word2Vec: Embedding Words and Phrases in Same Vector Space
Beyond Word2Vec: Embedding Words and Phrases in Same Vector SpaceBeyond Word2Vec: Embedding Words and Phrases in Same Vector Space
Beyond Word2Vec: Embedding Words and Phrases in Same Vector SpaceVijay Prakash Dwivedi
 
Improving Question Answering by Bridging Linguistic Structures with Statistic...
Improving Question Answering by Bridging Linguistic Structures with Statistic...Improving Question Answering by Bridging Linguistic Structures with Statistic...
Improving Question Answering by Bridging Linguistic Structures with Statistic...Jinho Choi
 
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSA COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSgerogepatton
 

Similar to Unsupervised Learning of a Social Network from a Multiple-Source News Corpus (20)

IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATAIDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA
 
Identifying the semantic relations on
Identifying the semantic relations onIdentifying the semantic relations on
Identifying the semantic relations on
 
Context Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word PairsContext Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word Pairs
 
Textual Document Categorization using Bigram Maximum Likelihood and KNN
Textual Document Categorization using Bigram Maximum Likelihood and KNNTextual Document Categorization using Bigram Maximum Likelihood and KNN
Textual Document Categorization using Bigram Maximum Likelihood and KNN
 
Text Mining: (Asynchronous Sequences)
Text Mining: (Asynchronous Sequences)Text Mining: (Asynchronous Sequences)
Text Mining: (Asynchronous Sequences)
 
Supervised Approach to Extract Sentiments from Unstructured Text
Supervised Approach to Extract Sentiments from Unstructured TextSupervised Approach to Extract Sentiments from Unstructured Text
Supervised Approach to Extract Sentiments from Unstructured Text
 
An Efficient Semantic Relation Extraction Method For Arabic Texts Based On Si...
An Efficient Semantic Relation Extraction Method For Arabic Texts Based On Si...An Efficient Semantic Relation Extraction Method For Arabic Texts Based On Si...
An Efficient Semantic Relation Extraction Method For Arabic Texts Based On Si...
 
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
[IJET-V1I6P17] Authors : Mrs.R.Kalpana, Mrs.P.Padmapriya
 
Using NLP to find contextual relationships between fashion houses
Using NLP to find contextual relationships between fashion housesUsing NLP to find contextual relationships between fashion houses
Using NLP to find contextual relationships between fashion houses
 
A framework for emotion mining from text in online social networks(final)
A framework for emotion mining from text in online social networks(final)A framework for emotion mining from text in online social networks(final)
A framework for emotion mining from text in online social networks(final)
 
Adaptive named entity recognition for social network analysis and domain onto...
Adaptive named entity recognition for social network analysis and domain onto...Adaptive named entity recognition for social network analysis and domain onto...
Adaptive named entity recognition for social network analysis and domain onto...
 
Extraction of Data Using Comparable Entity Mining
Extraction of Data Using Comparable Entity MiningExtraction of Data Using Comparable Entity Mining
Extraction of Data Using Comparable Entity Mining
 
E017252831
E017252831E017252831
E017252831
 
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual SimilaritySemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
 
Secured Ontology Mapping
Secured Ontology Mapping Secured Ontology Mapping
Secured Ontology Mapping
 
Automatic multiple choice question generation system for
Automatic multiple choice question generation system forAutomatic multiple choice question generation system for
Automatic multiple choice question generation system for
 
Sentence compression via clustering of dependency graph nodes - NLP-KE 2012
Sentence compression via clustering of dependency graph nodes - NLP-KE 2012Sentence compression via clustering of dependency graph nodes - NLP-KE 2012
Sentence compression via clustering of dependency graph nodes - NLP-KE 2012
 
Beyond Word2Vec: Embedding Words and Phrases in Same Vector Space
Beyond Word2Vec: Embedding Words and Phrases in Same Vector SpaceBeyond Word2Vec: Embedding Words and Phrases in Same Vector Space
Beyond Word2Vec: Embedding Words and Phrases in Same Vector Space
 
Improving Question Answering by Bridging Linguistic Structures with Statistic...
Improving Question Answering by Bridging Linguistic Structures with Statistic...Improving Question Answering by Bridging Linguistic Structures with Statistic...
Improving Question Answering by Bridging Linguistic Structures with Statistic...
 
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMSA COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
A COMPARISON OF DOCUMENT SIMILARITY ALGORITHMS
 

Recently uploaded

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 

Recently uploaded (20)

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024

Unsupervised Learning of a Social Network from a Multiple-Source News Corpus

  • 1. Unsupervised Learning of Social Networks from a Multiple-Source News Corpus Hristo Tanev European Commission Joint Research Centre hristo.tanev@jrc.it
  • 2. Introduction Social networks provide an intuitive picture of inferred relationships between entities, such as people and organizations. Social network analysis uses social networks to identify underlying groups, communication patterns, and other information. Manual construction of a social network is a very laborious task. Algorithms for automatic detection of relations can save time and human effort.
  • 3. Introduction We present an unsupervised methodology for automatic learning of social networks. We use a multiple-source, syntactically parsed news corpus. To overcome the efficiency problems which emerge from using syntactic information on real-world data, we put forward an efficient graph matching algorithm.
  • 4. Related work Learning social networks from Friend-Of-A-Friend links (Mika 2005) or statistical co-occurrences. Disadvantage: these approaches cannot detect the type of the relation.
  • 5. Related work Support Vector Machines (SVM) provide a more accurate means of relation extraction (Zelenko et al. 2003). Disadvantages:
      • they require a sufficient amount of annotated data
      • each pair of named entities must be evaluated separately, which slows down the relation extraction
  • 6. Related work (Romano et al. 2006) propose a generic unsupervised method for learning syntactic patterns for relation extraction. Disadvantages:
      • they use the Web as a training corpus, which makes the learning very slow
      • they match each pattern against each sentence, which is inefficient when matching many templates against a big corpus
  • 7. Unsupervised learning of social networks Our algorithm is unsupervised: it takes as input one, two, or some other small number of two-slot seed syntactic templates which express a certain semantic relation. The algorithm uses news clusters to learn new syntactic patterns expressing the same semantic relation. Once the patterns are learned, we apply a novel, efficient pattern-matching methodology to extract related person names from the text. The extracted relations are aggregated into a social network.
  • 8. EMM news clusters European Media Monitor downloads news from different sources around the clock. Every day, 4000-5000 English-language news articles are downloaded. The news articles are grouped into topic clusters.
  • 9. Parsing the corpus The training and the test corpus consist of English-language news articles from 200 sources. Articles are parsed with a full dependency parser, MiniPar. Example dependency tree: the verb “meet” with the arcs subj: Bush, obj: Blair, in: March.
  • 10. Learning patterns Manually provide a very small number of seed syntactic templates which express the main relation. For example, for the relation “X supports Y” we use the syntactic patterns:
      • X subj support obj Y
      • X subj praise obj Y
  • 11. Learning patterns Match these templates against the news clusters in the corpus. Each pair of person names which fill the slots X and Y is called an anchor pair. From “Bush praised the Prime Minister Hamid Karzai”, the algorithm will extract the anchor pair (X:Bush; Y:Hamid Karzai).
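The template-matching step on slides 10 and 11 can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the `Arc` tuple, the `SEED_VERBS` set, and the `anchor_pairs` function are hypothetical names, the parse is written by hand rather than produced by MiniPar, and a template of the form "X subj <verb> obj Y" is reduced to its verb lemma.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Arc(NamedTuple):
    head: str  # lemma of the governing word
    rel: str   # dependency label, e.g. "subj" or "obj"
    dep: str   # lemma or full name of the dependent word


# Assumption: each seed template "X subj <verb> obj Y" is stored as its verb lemma.
SEED_VERBS = {"support", "praise"}


def anchor_pairs(sentence: List[Arc], persons: Set[str]) -> List[Tuple[str, str]]:
    """Return (X, Y) person pairs filling the subj/obj slots of a seed verb."""
    subjects: Dict[str, List[str]] = {}
    objects: Dict[str, List[str]] = {}
    for arc in sentence:
        # Only arcs leaving a seed verb and ending in a known person name count.
        if arc.head in SEED_VERBS and arc.dep in persons:
            if arc.rel == "subj":
                subjects.setdefault(arc.head, []).append(arc.dep)
            elif arc.rel == "obj":
                objects.setdefault(arc.head, []).append(arc.dep)
    pairs = []
    for verb, subs in subjects.items():
        for x in subs:
            for y in objects.get(verb, []):
                pairs.append((x, y))
    return pairs


# "Bush praised the Prime Minister Hamid Karzai", parsed by hand for illustration.
sentence = [Arc("praise", "subj", "Bush"), Arc("praise", "obj", "Hamid Karzai")]
print(anchor_pairs(sentence, {"Bush", "Hamid Karzai"}))
```

Keying the slots by verb lemma merges two occurrences of the same verb within one sentence; a fuller version would key them by token position.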
  • 12. Learning patterns Normalize the anchor pairs using the information in the EMM database. After this step, the example anchor pair will become (X:George W. Bush; Y:Hamid Karzai).
  • 13. Learning patterns For each extracted anchor pair, search the same cluster for all sentences in which both names of the anchor pair occur. The assumption is that the same relation holds between the same pair of names throughout the news cluster, since all articles in it share the same topic.
  • 14. Learning patterns From all the sentences in which at least one anchor pair appears, learn syntactic patterns using our pattern-learning algorithm, which is similar to the General Structure Learning (GSL) algorithm described in (Szpektor et al. 2006). Example: X subj-agree-with Y. Each pattern receives as its score the number of different anchor pairs which support it.
  • 15. Learning patterns Pattern selection and filtering:
      • Filter out all templates which appear for fewer than 2 anchor pairs.
      • Remove generic patterns like “X say Y”, “X have Y”, “X is Y”, etc. using a predefined template list.
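The scoring rule from slide 14 and the two filters from slide 15 can be sketched together in Python. This is an illustration under stated assumptions: `select_patterns` and `GENERIC_PATTERNS` are hypothetical names, patterns are represented as plain strings, and the observations are made-up toy data rather than output of the system.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

AnchorPair = Tuple[str, str]

# Assumption: the predefined template list holds the three generic patterns
# named on slide 15; the real list is not given in the slides.
GENERIC_PATTERNS = {"X say Y", "X have Y", "X is Y"}


def select_patterns(observations: Iterable[Tuple[str, AnchorPair]],
                    min_pairs: int = 2) -> Dict[str, int]:
    """Score each pattern by its number of distinct supporting anchor pairs,
    then drop generic patterns and patterns with fewer than min_pairs."""
    support: Dict[str, Set[AnchorPair]] = defaultdict(set)
    for pattern, pair in observations:
        support[pattern].add(pair)  # a set, so a repeated pair counts once
    return {pattern: len(pairs) for pattern, pairs in support.items()
            if len(pairs) >= min_pairs and pattern not in GENERIC_PATTERNS}


# Toy observations (pattern, anchor pair), invented for this example.
observations = [
    ("X subj-agree-with Y", ("George W. Bush", "Hamid Karzai")),
    ("X subj-agree-with Y", ("Person A", "Person B")),
    ("X subj-agree-with Y", ("Person A", "Person B")),
    ("X say Y", ("George W. Bush", "Hamid Karzai")),
    ("X say Y", ("Person A", "Person B")),
    ("X subj-thank obj Y", ("Person A", "Person B")),
]
print(select_patterns(observations))
```

Here "X say Y" is dropped because it is on the generic list even though two pairs support it, and "X subj-thank obj Y" (a hypothetical pattern) is dropped because only one anchor pair supports it.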
  • 16. Syntactic Network model Example sentences: “Prodi met President Bush in September” and “Berlusconi met President Chirac”.
  • 19. Efficiency The worst-case time complexity of building SyntNet is O(|w| log |w|), where |w| is the number of words in the parsed corpus. The worst-case time complexity of the syntactic matching algorithm is bounded by O((|s|+|t|) log MaxArcO), where |s| is the number of sentences in the corpus, |t| is the number of templates, and MaxArcO is the maximum number of occurrences of a SyntNet arc, i.e. the size of the largest index set of a SyntNet arc.
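The slides that define SyntNet are not part of this transcript, so the following Python sketch shows only the general idea suggested by slides 16 and 19: identical arcs from different sentences are stored once, each with an index set of the sentences it occurs in, and a template is matched by intersecting index sets instead of scanning every sentence. It is a swapped-in inverted-index technique, not the author's algorithm, and it makes no attempt to reproduce the stated complexity bounds. The names `build_index` and `match` are hypothetical, and the parses are written by hand.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Arc = Tuple[str, str, str]  # (head lemma, relation, dependent)
Index = Dict[Tuple[str, str], Dict[int, str]]


def build_index(corpus: List[List[Arc]]) -> Index:
    """Map each (head, relation) to {sentence id: dependent}.
    The keys of the inner dict act as the index set of that arc.
    Assumption: at most one such arc per sentence."""
    index: Index = defaultdict(dict)
    for sid, sentence in enumerate(corpus):
        for head, rel, dep in sentence:
            index[(head, rel)][sid] = dep
    return index


def match(index: Index, verb: str) -> List[Tuple[int, str, str]]:
    """Match the template 'X subj <verb> obj Y' by intersecting the index
    sets of its two arcs; returns (sentence id, X, Y) triples."""
    subj = index.get((verb, "subj"), {})
    obj = index.get((verb, "obj"), {})
    shared = sorted(subj.keys() & obj.keys())
    return [(sid, subj[sid], obj[sid]) for sid in shared]


# The two example sentences from slide 16, parsed by hand.
corpus = [
    [("meet", "subj", "Prodi"), ("meet", "obj", "Bush"), ("meet", "in", "September")],
    [("meet", "subj", "Berlusconi"), ("meet", "obj", "Chirac")],
]
index = build_index(corpus)
print(match(index, "meet"))
```

Both sentences share the entries for ("meet", "subj") and ("meet", "obj"), so matching the "meet" template touches two index sets rather than every sentence in the corpus.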
  • 20. Evaluation schema To evaluate our algorithm, we learned syntactic patterns for “meeting” and “support” relationships between people. We evaluate how well the algorithm captures relationships between the top 33 VIPs in our database. We do not evaluate how well it captures individual relation mentions: if a specific relation (e.g. “meeting”) holds between a pair of people X and Y, it is sufficient that the algorithm finds at least one mention of this relation between X and Y.
  • 21. Experiments For paraphrase learning we used a training corpus of 98,000 English-language news articles, clustered in 22,000 EMM topic clusters and published in the period 01/May/2006 to 03/Oct/2006. For testing the method, we used 125,000 English-language news articles published in the period 03/Oct/2006 to 31/Oct/2006. Reading the test corpus and the templates into memory and building SyntNet+ took 9 min 3 sec. Matching the 101 syntactic templates against the test corpus of about 1,080,000 parsed sentences took only 45 seconds. We normalized the extracted names using the EMM database.
  • 22. Relationship extraction evaluation on the top 33 VIP from the EMM DB
      Relation   Precision   Recall   F1
      meeting    0.61        0.56     0.58
      support    0.57        0.10     0.17
      overall    0.60        0.32     0.42
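The F1 column on slide 22 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). A short Python check, with the hypothetical helper `f1`, recomputes each value from the reported precision and recall:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as given on slide 22.
reported = {
    "meeting": (0.61, 0.56, 0.58),
    "support": (0.57, 0.10, 0.17),
    "overall": (0.60, 0.32, 0.42),
}
for relation, (p, r, reported_f1) in reported.items():
    print(relation, round(f1(p, r), 2), reported_f1)
```

All three recomputed values agree with the reported ones to two decimal places. The low F1 for “support” comes almost entirely from its recall of 0.10.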
  • 23. Using the social network view We ran the PageRank algorithm on the automatically extracted “meeting” network and found the top 5 ranked people. We compared this ranking with a simple frequency-based ranking of people.
  • 24. Comparing two people ranking schemas
      PageRank     Frequency
      C. Rice      G.W. Bush
      G.W. Bush    T. Blair
      V. Putin     C. Rice
      E. Olmert    N. al-Maliki
      T. Blair     S. Hussein
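The PageRank ranking on slides 23 and 24 can be illustrated with a small power-iteration sketch in Python. The slides do not say which implementation or parameters were used, so the damping factor of 0.85, the fixed iteration count, the treatment of “meeting” as an undirected relation, and the function name `pagerank` are all assumptions. The edge list is toy data with placeholder names, not the extracted network.

```python
from typing import Dict, List, Set, Tuple


def pagerank(edges: List[Tuple[str, str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank on an undirected graph given as an edge list."""
    neighbours: Dict[str, Set[str]] = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    rank = {node: 1.0 / n for node in neighbours}
    for _ in range(iterations):
        new_rank = {}
        for node in neighbours:
            # Each neighbour passes on its rank, split evenly over its own edges.
            incoming = sum(rank[m] / len(neighbours[m]) for m in neighbours[node])
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Toy "meeting" network: A met B, C and D; B also met C.
meetings = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
ranks = pagerank(meetings)
for person in sorted(ranks, key=ranks.get, reverse=True):
    print(person, round(ranks[person], 3))
```

Unlike a frequency count, the score of a person here depends on who they met as well as how often they appear, which is one reason the two columns on slide 24 can differ.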
  • 25. Conclusions and future work We presented an unsupervised method for learning social networks from news clusters. We presented a very efficient syntactic pattern matching algorithm. Automatically learned social networks can be used for some analyst tasks. In future work we will consider more types of relations, as well as learning and using more abstract patterns.