Unsupervised Learning
        of Social Networks
from a Multiple-Source News Corpus

             Hristo Tanev

          European Commission
          Joint Research Centre
            hristo.tanev@jrc.it
Introduction
Social networks provide an intuitive
picture of inferred relationships between
entities, such as people and organizations.
Social network analysis uses Social
Networks to identify underlying groups,
communication patterns, and other
information.
Manual construction of a social network is
a very laborious task. Algorithms for
automatic detection of relations may be
used to save time and human effort.
Introduction
We present an unsupervised methodology
for automatic learning of social networks.
We use a multiple-source, syntactically
parsed news corpus.
In order to overcome the efficiency
problems which emerge from using
syntactic information on real-world data,
we put forward an efficient graph
matching algorithm.
Related work
Learning social networks from
Friend-Of-A-Friend links (Mika 2005)
or statistical co-occurrences
Disadvantage: cannot detect the
type of the relation
Related work
Support Vector Machines (SVM)
provide a more accurate means of
relation extraction (Zelenko et al.
2003)
Disadvantages:
• require a sufficient amount of annotated
  data
• each pair of named entities should be
  evaluated separately, which slows down
  the relation extraction
Related work
(Romano et al. 2006) propose a generic
unsupervised method for learning
syntactic patterns for relation extraction
Disadvantages:
• they use the Web as a training corpus, which
  makes the learning very slow
• they match each pattern against each
  sentence, which is not efficient when matching
  many templates against a big corpus
Unsupervised learning of social
          networks
Our algorithm is unsupervised: as input it accepts
one, two, or some other small number of two-slot
seed syntactic templates which express a certain
semantic relation.
The algorithm uses news clusters to learn new
syntactic patterns expressing the same semantic
relation.
Once the patterns are learned, we apply a novel,
efficient pattern matching methodology to
extract related person names from the text.
Extracted relations are aggregated into a social
network.
EMM news clusters
The European Media Monitor (EMM) downloads
news from different sources around
the clock.
Every day 4000-5000 English-language
news articles are downloaded.
The news articles are grouped into
topic clusters.
Parsing the corpus
The training and the test corpus
consist of English-language news
articles from 200 sources.
Articles are parsed with a full
dependency parser, MiniPar.
meet
  subj → Bush
  obj  → Blair
  in
    March
Learning patterns
Manually provide a very small
number of seed syntactic templates
which express the main relation.
For example, for the relation “X
supports Y” we use the syntactic
patterns:
   X subj support obj Y
   X subj praise obj Y
Learning patterns
Match these templates against the
news clusters in the corpus. Each
pair of person names which fill the
slots X and Y is called an anchor
pair.
From “Bush praised the Prime
Minister Hamid Karzai”, the
algorithm will extract the anchor
pair (X:Bush; Y:Hamid Karzai)
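A minimal sketch of this matching step, assuming each parsed sentence is available as a list of (head lemma, dependency label, dependent) arcs and that person names have already been merged into single tokens. The `Arc` type and the `match_seed` helper are hypothetical names introduced for illustration; they are not part of MiniPar or of the system described here.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    head: str       # lemma of the governing word
    label: str      # dependency label, e.g. "subj" or "obj"
    dependent: str  # dependent word (assumed to be a full person name here)


def match_seed(arcs, verb):
    """Return (X, Y) pairs for the two-slot template 'X subj <verb> obj Y'."""
    subjects = [a.dependent for a in arcs if a.head == verb and a.label == "subj"]
    objects = [a.dependent for a in arcs if a.head == verb and a.label == "obj"]
    return [(x, y) for x in subjects for y in objects]


# Hand-written arcs for "Bush praised the Prime Minister Hamid Karzai".
sentence = [
    Arc("praise", "subj", "Bush"),
    Arc("praise", "obj", "Hamid Karzai"),
]

seed_verbs = ["support", "praise"]
anchor_pairs = [pair for verb in seed_verbs for pair in match_seed(sentence, verb)]
print(anchor_pairs)
```

Only the "praise" seed fires on this sentence, so a single anchor pair is produced.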
Learning patterns
Normalize the anchor pairs using
the information in the EMM
database.
After this step, the example anchor
pair will become (X:George W.
Bush; Y:Hamid Karzai).
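Normalization can be pictured as a lookup from name variants to a canonical form. The alias table below is a hypothetical stand-in for the EMM database, whose actual interface is not shown in these slides.

```python
# Hypothetical alias table; the real system consults the EMM database instead.
ALIASES = {
    "Bush": "George W. Bush",
    "G.W. Bush": "George W. Bush",
}


def normalize(name):
    """Map a name variant to its canonical form, or keep it if unknown."""
    return ALIASES.get(name, name)


def normalize_pair(pair):
    """Normalize both slots of an anchor pair."""
    x, y = pair
    return (normalize(x), normalize(y))


normalized = normalize_pair(("Bush", "Hamid Karzai"))
print(normalized)
```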
Learning patterns
For each extracted anchor pair,
search the same cluster for all
sentences in which both names of the
anchor pair occur.
The assumption is that the same
relation will hold between the same
pair of names throughout the news
cluster, since all articles in it have
the same topic.
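The cluster search can be sketched as a simple filter, assuming every sentence carries the set of normalized person names it mentions. The dictionary layout and the toy sentences below are made up for illustration.

```python
def sentences_with_pair(cluster, pair):
    """Return the texts of all sentences in a cluster that mention both names."""
    x, y = pair
    return [s["text"] for s in cluster if x in s["people"] and y in s["people"]]


# Toy cluster: "people" holds the normalized names found in each sentence.
cluster = [
    {"text": "Bush praised the Prime Minister Hamid Karzai",
     "people": {"George W. Bush", "Hamid Karzai"}},
    {"text": "Bush agreed with Karzai",
     "people": {"George W. Bush", "Hamid Karzai"}},
    {"text": "Karzai spoke to reporters",
     "people": {"Hamid Karzai"}},
]

candidates = sentences_with_pair(cluster, ("George W. Bush", "Hamid Karzai"))
print(candidates)
```

The first two sentences are kept as candidate sources of new patterns; the third mentions only one of the two names and is skipped.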
Learning patterns
From all the sentences in which at least
one anchor pair appears, learn syntactic
patterns using our pattern-learning
algorithm, which is similar to the General
Structure Learning (GSL) algorithm
described in (Szpektor et al. 2006).
Example: X subj-agree-with Y
The score of each pattern is the
number of different anchor pairs which
support it.
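The scoring rule can be sketched as counting distinct anchor pairs per pattern. The input format (a list of pattern/anchor-pair observations) and the placeholder names are assumptions; the pattern-learning step itself is not reproduced here.

```python
from collections import defaultdict


def score_patterns(observations):
    """Score each pattern by the number of distinct anchor pairs supporting it."""
    support = defaultdict(set)
    for pattern, pair in observations:
        support[pattern].add(pair)  # a set ignores repeated mentions of a pair
    return {pattern: len(pairs) for pattern, pairs in support.items()}


observations = [
    ("X subj-agree-with Y", ("George W. Bush", "Hamid Karzai")),
    ("X subj-agree-with Y", ("George W. Bush", "Hamid Karzai")),  # counted once
    ("X subj-agree-with Y", ("Person A", "Person B")),
    ("X subj-visit obj Y", ("Person A", "Person B")),
]

scores = score_patterns(observations)
print(scores)
```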
Learning patterns
Pattern selection and filtering
• Filter out all templates which appear for
  fewer than 2 anchor pairs.
• Remove generic patterns like “X say Y”,
  “X have Y”, “X is Y”, etc. using a
  predefined template list
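Both filters can be sketched in a few lines, assuming pattern scores are stored in a dictionary. The scores and the contents of the stop list below are illustrative only.

```python
# Illustrative stop list of generic patterns (the real list is predefined by hand).
GENERIC_PATTERNS = {"X say Y", "X have Y", "X is Y"}


def select_patterns(scores, min_pairs=2, generic=GENERIC_PATTERNS):
    """Keep patterns supported by at least min_pairs anchor pairs and not generic."""
    return {
        pattern: n
        for pattern, n in scores.items()
        if n >= min_pairs and pattern not in generic
    }


example_scores = {
    "X subj-agree-with Y": 5,
    "X say Y": 40,             # frequent but generic, so removed
    "X subj-visit obj Y": 1,   # supported by a single anchor pair, so removed
}

selected = select_patterns(example_scores)
print(selected)
```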
Syntactic Network model
“Prodi met President Bush in September”
“Berlusconi met President Chirac”
Syntactic Network model
Adding syntactic templates
Efficiency
The worst-case time complexity of building
SyntNet is O(|w| log |w|), where |w| is the
number of words in the parsed corpus.
The worst-case time complexity of the syntactic
matching algorithm is bounded by O((|s|+|t|)
(log MaxArcO)), where |s| is the number of
sentences in the corpus, |t| is the number of
templates, and MaxArcO is the maximum
number of occurrences of a SyntNet arc, i.e. the
size of the maximal index set of a SyntNet arc.
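The slides do not show the internal layout of SyntNet, so the following is only an illustration of where a factor logarithmic in the size of an arc's index set can come from. It assumes each arc, reduced here to a (head, label) pair, stores a sorted list of the sentence ids in which it occurs, so that membership can be tested by binary search. All names are hypothetical and this is not the SyntNet algorithm itself.

```python
from bisect import bisect_left
from collections import defaultdict


def build_arc_index(parsed_sentences):
    """Map each (head, label) arc to the sorted list of sentence ids containing it."""
    index = defaultdict(list)
    for sent_id, arcs in enumerate(parsed_sentences):
        for arc in set(arcs):
            index[arc].append(sent_id)  # ids arrive in increasing order
    return index


def contains(sorted_ids, sent_id):
    """Binary search: O(log n) membership test in a sorted id list."""
    pos = bisect_left(sorted_ids, sent_id)
    return pos < len(sorted_ids) and sorted_ids[pos] == sent_id


def sentences_matching(index, template_arcs):
    """Return ids of sentences that contain every arc of the template."""
    id_lists = sorted((index.get(arc, []) for arc in template_arcs), key=len)
    if not id_lists:
        return []
    shortest, rest = id_lists[0], id_lists[1:]
    return [sid for sid in shortest if all(contains(ids, sid) for ids in rest)]


parsed = [
    [("meet", "subj"), ("meet", "obj"), ("meet", "in")],
    [("say", "subj")],
    [("meet", "subj"), ("meet", "obj")],
]
arc_index = build_arc_index(parsed)
matches = sentences_matching(arc_index, [("meet", "subj"), ("meet", "obj")])
print(matches)
```

Scanning the shortest id list and probing the others keeps each probe logarithmic in the size of the largest index set.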
Evaluation schema

To evaluate our algorithm, we learned syntactic
patterns for “meeting” and “support”
relationships between people.
We evaluate how well the algorithm captures
relationships between the top 33 VIPs in our
database.
We do not evaluate how well it captures
individual relation mentions.
If a specific relation (e.g. “meeting”) holds
between a pair of people X and Y, it is sufficient
that the algorithm finds at least one mention of
this relation between X and Y.
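Pair-level evaluation can be sketched by collapsing extracted mentions into a set of pairs before comparing against a gold standard. The mentions and gold pairs below are placeholders, not data from the experiments.

```python
def precision_recall(extracted_pairs, gold_pairs):
    """Pair-level precision and recall over sets of related person pairs."""
    correct = extracted_pairs & gold_pairs
    precision = len(correct) / len(extracted_pairs) if extracted_pairs else 0.0
    recall = len(correct) / len(gold_pairs) if gold_pairs else 0.0
    return precision, recall


# Several mentions of the same pair collapse into one extracted relation.
mentions = [("A", "B"), ("A", "B"), ("C", "D")]
extracted = set(mentions)

gold = {("A", "B"), ("A", "C"), ("B", "D")}  # placeholder gold standard

precision, recall = precision_recall(extracted, gold)
print(precision, recall)
```

Because mentions are deduplicated first, finding a pair ten times counts the same as finding it once.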
Experiments
For paraphrase learning we used a training
corpus of 98,000 English-language news articles
clustered in 22,000 EMM topic clusters, published
in the period 01/May/2006 – 03/Oct/2006.
For testing the method, we used 125,000
English-language news articles published in the
period 03/Oct/2006 – 31/Oct/2006.
Reading the test corpus and the templates into
memory and building SyntNet+ took 9 min and
3 sec. It took only 45 seconds to match the 101
syntactic templates against the test corpus of
about 1,080,000 parsed sentences.
We normalized the extracted names using the EMM
database.
Relationship extraction evaluation on the top 33 VIP from the EMM DB
           Precision   Recall   F1
meeting    0.61        0.56     0.58
support    0.57        0.10     0.17
overall    0.60        0.32     0.42
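As a quick consistency check, the F1 column can be recomputed from the reported precision and recall, assuming the standard definition F1 = 2PR / (P + R).

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table.
reported = {
    "meeting": (0.61, 0.56),
    "support": (0.57, 0.10),
    "overall": (0.60, 0.32),
}

recomputed = {name: round(f1(p, r), 2) for name, (p, r) in reported.items()}
print(recomputed)
```

The recomputed values agree with the F1 column to two decimal places.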
Using the social network view
We ran the PageRank algorithm on
the automatically extracted
“meeting” network and found the top
5 ranked people.
We compared this ranking with a
simple frequency-based ranking
of people.
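A minimal power-iteration sketch of PageRank on an undirected network such as the “meeting” graph. The damping factor 0.85, the fixed iteration count, and the toy edges are assumptions; the slides do not state the settings actually used.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on an undirected graph given as a list of edges."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    rank = {node: 1.0 / n for node in neighbours}
    for _ in range(iterations):
        new_rank = {}
        for node in neighbours:
            # Each neighbour shares its current rank equally among its own links.
            incoming = sum(rank[m] / len(neighbours[m]) for m in neighbours[node])
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Toy "meeting" network with placeholder names.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
ranks = pagerank(toy_edges)
top = sorted(ranks, key=ranks.get, reverse=True)
print(top[0])
```

Every node in an undirected edge list has at least one neighbour, so no special handling of dangling nodes is needed in this sketch.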
Comparing two people ranking schemas
PageRank         Frequency
C. Rice          G.W. Bush
G.W. Bush        T. Blair
V. Putin         C. Rice
E. Olmert        N. al-Maliki
T. Blair         S. Hussein
Conclusions and future work
We presented an unsupervised method for
learning social networks from news clusters.
We presented a very efficient syntactic
pattern matching algorithm.
Automatically learned social networks can
be used for some analyst tasks.
In future work we will try to consider
more types of relations.
We are also considering learning and using
more abstract patterns.
THANK YOU!

A Comprehensive Guide to DeFi Development Services in 2024
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 

Unsupervised Learning of a Social Network from a Multiple-Source News Corpus

  • 1. Unsupervised Learning of Social Networks from a Multiple-Source News Corpus Hristo Tanev European Commission Joint Research Centre hristo.tanev@jrc.it
  • 2. Introduction Social networks provide an intuitive picture of inferred relationships between entities, such as people and organizations. Social network analysis uses social networks to identify underlying groups, communication patterns, and other information. Manual construction of a social network is a very laborious task. Algorithms for automatic detection of relations may be used to save time and human effort.
  • 3. Introduction We present an unsupervised methodology for automatic learning of social networks. We use a multiple-source, syntactically parsed news corpus. To overcome the efficiency problems which emerge from using syntactic information on real-world data, we put forward an efficient graph matching algorithm.
  • 4. Related work Learning social networks from Friend-Of-A-Friend links (Mika 2005) or statistical co-occurrences. Disadvantage: cannot detect the type of the relation.
  • 5. Related work Support Vector Machines (SVM) provide a more accurate means of relation extraction (Zelenko et al. 2003). Disadvantages: • they require a sufficient amount of annotated data • each pair of named entities has to be evaluated separately, which slows down the relation extraction
  • 6. Related work (Romano et al. 2006) propose a generic unsupervised method for learning syntactic patterns for relation extraction. Disadvantages: • they use the Web as a training corpus, which makes the learning very slow • they match each pattern against each sentence, which is not efficient when matching many templates against a big corpus
  • 7. Unsupervised learning of social networks Our algorithm is unsupervised: it takes as input one, two, or another small number of two-slot seed syntactic templates which express a certain semantic relation. The algorithm uses news clusters to learn new syntactic patterns expressing the same semantic relation. Once the patterns are learned, we apply a novel, efficient pattern-matching methodology to extract related person names from the text. Extracted relations are aggregated in a social network.
  • 8. EMM news clusters European Media Monitor downloads news from different sources around the clock. Every day 4000-5000 English-language news articles are downloaded. The news articles are grouped into topic clusters.
  • 9. Parsing the corpus The training and the test corpus consist of English-language news articles from 200 sources. Articles are parsed with a full dependency parser, MiniPar. [Figure: dependency tree rooted at "meet", with arcs subj to "Bush", obj to "Blair", and in to "March"]
  • 10. Learning patterns Manually provide a very small number of seed syntactic templates which express the main relation. For example, for the relation “X supports Y” we use the syntactic patterns “X subj support obj Y” and “X subj praise obj Y”.
  • 11. Learning patterns Match these templates against the news clusters in the corpus. Each pair of person names which fill the slots X and Y is called an anchor pair. From “Bush praised the Prime Minister Hamid Karzai”, the algorithm will extract the anchor pair (X:Bush; Y:Hamid Karzai).
  • 12. Learning patterns Normalize the anchor pairs using the information in the EMM database. After this step, the example anchor pair will become (X:George W. Bush; Y:Hamid Karzai).
  • 13. Learning patterns For each extracted anchor pair, search the same cluster for all the sentences in which both names of the anchor pair occur. The assumption is that the same relation holds between the same pair of names throughout the news cluster, since all articles in it have the same topic.
  • 14. Learning patterns From all the sentences in which at least one anchor pair appears, learn syntactic patterns using our pattern-learning algorithm, which is similar to the General Structure Learning algorithm (GSL) described in (Szpektor et al. 2006). Example: X subj-agree-with Y. Each pattern receives as its score the number of different anchor pairs which support it.
  • 15. Learning patterns Pattern selection and filtering • Filter out all templates which appear for fewer than 2 anchor pairs. • Remove generic patterns like “X say Y”, “X have Y”, “X is Y”, etc. using a predefined template list.
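
The scoring and filtering steps on slides 14 and 15 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: the function names, the representation of patterns as plain strings, and the sample observations are all hypothetical, and the stop list contains only the three generic patterns named on the slide.

```python
from collections import defaultdict

# Hypothetical stop list: only the three generic patterns named on the slide.
GENERIC_PATTERNS = {"X say Y", "X have Y", "X is Y"}


def score_patterns(observations):
    """Score each pattern by the number of distinct anchor pairs that support it."""
    support = defaultdict(set)
    for pattern, anchor_pair in observations:
        support[pattern].add(anchor_pair)  # a set ignores repeated pairs
    return {pattern: len(pairs) for pattern, pairs in support.items()}


def select_patterns(scores, min_pairs=2, generic=GENERIC_PATTERNS):
    """Keep patterns with at least min_pairs anchor pairs that are not generic."""
    return {
        pattern: score
        for pattern, score in scores.items()
        if score >= min_pairs and pattern not in generic
    }


# Invented observations: (learned pattern, normalized anchor pair).
observations = [
    ("X subj-agree-with Y", ("George W. Bush", "Hamid Karzai")),
    ("X subj-agree-with Y", ("Tony Blair", "George W. Bush")),
    ("X subj-agree-with Y", ("Tony Blair", "George W. Bush")),  # same pair again
    ("X say Y", ("George W. Bush", "Hamid Karzai")),
    ("X say Y", ("Tony Blair", "George W. Bush")),
    ("X subj-back-obj Y", ("Tony Blair", "George W. Bush")),
]

scores = score_patterns(observations)
selected = select_patterns(scores)
print(selected)
```

In this toy input, “X say Y” is dropped because it is on the generic list, and “X subj-back-obj Y” is dropped because only one anchor pair supports it.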
  • 16. Syntactic Network model [Figure: syntactic network built from the example sentences “Prodi met President Bush in September” and “Berlusconi met President Chirac”]
  • 19. Efficiency The worst-case time complexity of building SyntNet is O(|w| log |w|), where |w| is the number of words in the parsed corpus. The worst-case time complexity of the syntactic matching algorithm is bounded by O((|s|+|t|) (log MaxArcO)), where |s| is the number of sentences in the corpus, |t| is the number of templates, and MaxArcO is the maximum number of occurrences of a SyntNet arc, i.e. the size of the largest index set of a SyntNet arc.
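
The slides that define SyntNet are not part of this transcript, so the following is not the author's algorithm. It is a generic inverted-index sketch, under the assumption that each arc keeps a sorted set of the sentences it occurs in, showing how a logarithmic factor in the size of an arc's index set can arise from binary search. The arc representation (head, relation, dependent) and all function names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def build_arc_index(parsed_sentences):
    """Map each (head, relation) arc key to a sorted list of sentence ids."""
    index = defaultdict(list)
    for sent_id, arcs in enumerate(parsed_sentences):
        keys = {(head, rel) for head, rel, _dependent in arcs}
        for key in keys:
            index[key].append(sent_id)  # ids arrive in increasing order
    return index


def contains(sorted_ids, sent_id):
    """Binary search: logarithmic in the size of the arc's index set."""
    pos = bisect_left(sorted_ids, sent_id)
    return pos < len(sorted_ids) and sorted_ids[pos] == sent_id


def match_template(index, template_arcs):
    """Return ids of sentences that contain every arc key of the template."""
    postings = sorted((index.get(key, []) for key in template_arcs), key=len)
    if not postings:
        return []
    shortest, rest = postings[0], postings[1:]
    return [sid for sid in shortest if all(contains(ids, sid) for ids in rest)]


# Toy parsed corpus: one list of dependency arcs per sentence.
parsed_sentences = [
    [("meet", "subj", "Bush"), ("meet", "obj", "Blair"), ("meet", "in", "March")],
    [("praise", "subj", "Bush"), ("praise", "obj", "Karzai")],
    [("meet", "subj", "Prodi"), ("meet", "obj", "Bush"), ("meet", "in", "September")],
]

index = build_arc_index(parsed_sentences)
meet_hits = match_template(index, [("meet", "subj"), ("meet", "obj")])
print(meet_hits)
```

The point of the design is that a template is never compared with a sentence that lacks its rarest arc, which avoids matching each pattern against each sentence.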
  • 20. Evaluation schema To evaluate our algorithm we learned syntactic patterns for “meeting” and “support” relationships between people. We evaluate how well the algorithm captures relationships between the top 33 VIPs from our database. We do not evaluate how well it captures individual relation mentions: if a specific relation (e.g. “meeting”) holds between a pair of people X and Y, it is sufficient that the algorithm finds at least one mention of this relation between X and Y.
  • 21. Experiments For paraphrase learning we used a training corpus of 98'000 English-language news articles clustered in 22'000 EMM topic clusters, published in the period 01/May/2006 – 03/Oct/2006. For testing the method, we used 125'000 English-language news articles published in the period 03/Oct/2006 – 31/Oct/2006. Reading the test corpus and the templates into memory and building SyntNet+ took 9 min and 3 sec. It took only 45 seconds to match the 101 syntactic templates against the test corpus of about 1'080'000 parsed sentences. We normalized extracted names using the EMM database.
  • 22. Relationship extraction evaluation on the top 33 VIPs from the EMM DB
        Relation   Precision   Recall   F1
        meeting    0.61        0.56     0.58
        support    0.57        0.10     0.17
        overall    0.60        0.32     0.42
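
The F1 values on slide 22 are consistent with the usual harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below assumes that definition, which the slides do not state explicitly, and recomputes the F1 column from the reported precision and recall.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported on the slide.
reported = {
    "meeting": (0.61, 0.56),
    "support": (0.57, 0.10),
    "overall": (0.60, 0.32),
}

recomputed = {name: round(f1_score(p, r), 2) for name, (p, r) in reported.items()}
print(recomputed)  # {'meeting': 0.58, 'support': 0.17, 'overall': 0.42}
```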
  • 23. Using the social network view We ran the PageRank algorithm on the automatically extracted “meeting” network and found the top 5 ranked people. We compared this ranking with a simple frequency-based ranking of people.
  • 24. Comparing two people ranking schemas
        Rank   PageRank     Frequency
        1      C. Rice      G.W. Bush
        2      G.W. Bush    T. Blair
        3      V. Putin     C. Rice
        4      E. Olmert    N. al-Maliki
        5      T. Blair     S. Hussein
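
As a rough illustration of the ranking step, here is a minimal power-iteration PageRank over an undirected “meeting” graph. It is a sketch only: the slides do not say which PageRank variant or parameters were used, so the damping factor of 0.85, the fixed iteration count, and the toy edge list with placeholder names are all assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on an undirected graph given as node pairs."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    nodes = sorted(neighbours)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            # Each neighbour shares its rank equally among its own neighbours.
            incoming = sum(rank[m] / len(neighbours[m]) for m in neighbours[node])
            updated[node] = (1 - damping) / len(nodes) + damping * incoming
        rank = updated
    return rank


# Invented meeting relations between placeholder people A-D.
meetings = [("A", "B"), ("A", "C"), ("A", "D"), ("C", "D")]

ranks = pagerank(meetings)
ranking = sorted(ranks, key=ranks.get, reverse=True)
print(ranking[0])  # "A", the person with the most meeting partners here
```

Unlike a frequency count of name mentions, this score depends on who a person is linked to in the extracted network, which is one reason the two columns above can differ.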
  • 25. Conclusions and future work We presented an unsupervised method for learning social networks from news clusters. We presented a very efficient syntactic pattern matching algorithm. Automatically learned social networks can be used for some analyst tasks. In future work we will try to cover more types of relations, and we are considering learning and using more abstract patterns.