SlideShare a Scribd company logo
Department of Informatics, University of Rijeka
Radmile Matejčić 2, 51000 Rijeka, Croatia
Tel.: + 385 51 584 700 Fax: + 385 51 584 749
www.langnet.uniri.hr
Extracting keywords
from texts
Sanda Martinčić-Ipšić
smarti@uniri.hr
langnet.uniri.hr
Data Science 4.0 Conference
Introduction 1/4
Keyword extraction
automatically identify a that best describe the
document
the most representative features (terms) of the source text
huge amount of multi topic textual sources
applications:
summarization, indexing, labeling, categorization, clustering
document-oriented task (single documents)
collection-oriented task (i.e. entire web site)
more demanding
1
Data Science 4.0 Conference
2
How KE works?
Data Science 4.0 Conference
Introduction 2/4
keyword extraction is traditionally
based on statistical methods
learning from
or graph, since the number of words in isolated documents is limited
the source (document, text) is modelled in a network
words are nodes and their co-occurrence is represented with links
links can model different linguistic relations (syntactic; semantic)
3
directed and weighted co-occurrence network for two
sentences:
“Selectivity-based keyword extraction method is fully
unsupervised. It consists of two phases: keyword extraction
and keyword expansion.”
Data Science 4.0 Conference
Introduction 3/4
can exploit:
or level measures:
coreness, clustering coefficient
PageRank motivated ranking score or
HITS motivated hub and authority score
communities
strength, centrality
and eigenvector
centrality
the average strength of the node
4
Data Science 4.0 Conference
S. Beliga, A. Meštrović, S. Martinčić-Ipšić. "An Overview of Graph-Based
Keyword Extraction Methods and Approaches". Journal of Information
and Organizational Sciences, 39(1), pp. 1-20, 2015.
Complex network analysis 1/2
Directed and weighted network [Newman, 2010]
N – number of nodes
M – number of links
wij – weight for link which connects nodes i and j
Ki – node degree: number of links incident upon a node
5
𝒅𝒄𝒊
(𝒊𝒏/𝒐𝒖𝒕)
=
𝒌𝒊
(𝒊𝒏/𝒐𝒖𝒕)
𝑵 − 𝟏
𝒄𝒄𝒊 =
𝑵 − 𝟏
σ𝒊≠𝒋 𝒅𝒊𝒋
𝒃𝒄𝒊 =
σ𝒊≠𝒋≠𝒌
𝝈𝒋𝒌(𝒊)
𝝈𝒋𝒌
(𝑵 − 𝟏)(𝑵 − 𝟐)
Data Science 4.0 Conference
3. Complex network analysis 1/2
strength and selectivity
selectivity
average weight distribution on the links of the single node
generalized selectivity
6
𝒔𝒊
𝒊𝒏/𝒐𝒖𝒕
= ෍
𝒋
𝒘𝒋𝒊/𝒊𝒋
𝒈𝒆𝒊
α 𝒊𝒏/𝒐𝒖𝒕
= 𝒌𝒊
𝒊𝒏/𝒐𝒖𝒕 𝒔𝒊
𝒊𝒏/𝒐𝒖𝒕
𝒌𝒊
𝒊𝒏/𝒐𝒖𝒕
α
[Masucci &
Rodgers, 2009]𝒆𝒊
𝒊𝒏/𝒐𝒖𝒕
=
𝒔𝒊
𝒊𝒏/𝒐𝒖𝒕
𝒌𝒊
𝒊𝒏/𝒐𝒖𝒕
adapted from [Opsahl et al., 2010] in
[Beliga, Meštrović, Martinčić-Ipšić, 2016]
α∈ [0,1] – prefers nodes with higher
degrees
α ∈ [1, +∞] – prefers nodes with
lower degrees
𝛼 = 1– node strength
Data Science 4.0 Conference
Data: Croatian and English texts
collections with manually annotated keywords by
human experts
HINA Croatian News Agency
Wikipedia: technical reports covering different aspects of CS
8
HINA WIKI-20
8 human experts 15 teams (2 undergrad. stud.)
60 texts 20 texts
Croatian English
10 keywords on AVG per human 5,7 keywords on AVG per team
human consistency≈ 39.5% team consistency ≈ 30.5%
Data Science 4.0 Conference
Network construction
network per document
from all documents (collection extraction)
each word
two words are linked if they are adjacent in the sentence
proportional to the overall co-occurrence frequencies of
the corresponding word pairs
9
Data Science 4.0 Conference
Comparison of centrality measures
10
DATA
SET
MEASURE
TOP 5 TOP 10
Ravg Pavg F1avg Ravg Pavg F1avg
HINA
Closeness: 𝑐𝑐𝑖 4.3 39.3 7.2 8.4 42.9 12.8
Betweenness: 𝑏𝑐𝑖 3.6 40.8 6.5 9.2 52.8 13.9
in/out-degree: 𝑑𝑐𝑖
𝑖𝑛/𝑜𝑢𝑡 23.0 38.4 17.0 62.1 25.1 26.6
in/out-selectivity: 𝑒𝑖
𝑖𝑛/𝑜𝑢𝑡 14.0 35.9 19.3 16.8 38.0 22.1
WIKI-20
Closeness: 𝑐𝑐𝑖 1.6 42.5 3.1 5.3 63.3 9.7
Betweenness: 𝑏𝑐𝑖 0.3 10.0 0.5 2.6 55.0 4.9
in/out-degree: 𝑑𝑐𝑖
𝑖𝑛/𝑜𝑢𝑡 0.1 5.0 0.3 2.5 57.5 4.8
in/out-selectivity: 𝑒𝑖
𝑖𝑛/𝑜𝑢𝑡 2.3 17.8 3.9 8.6 18.2 10.8
TOP 10 ranked words:
in-degree: to be, and, in, on, which, for, but, this, self, of
betweenness: to be, and, in, on, self, this, which, for, Croatian, but
in/out selectivity: Bratislava, area, Tuesday, inland, revolution, verification, decade, Balkan,
freedom, Universe Data Science 4.0 Conference
Selectivity
differentiate between two types of nodes (words)
high strength and high degree values
: stop-words, conjunctions, prepositions
high strength and low degree
: nouns, adjectives, verbs and
words that are part of collocations, keyphrases, names, etc.
efficiently
extract
11
Data Science 4.0 Conference
SBKE method
12
Data Science 4.0 Conference
SBKE method
13
Data Science 4.0 Conference
14
Data Science 4.0 Conference
SBKE with Generalized Selectivity
SBKE method can be easily adjusted with new
measures
generalized selectivity measure: 𝑔𝑒𝑖
α 𝑖𝑛/𝑜𝑢𝑡
α adjust the relationship between the node degree and strength –
provide different set of keywords
Find the α parameter which best fits the experimental settings
according to 𝐼𝐼𝐶 𝑂 scores
tuning to the data set
15
Data Science 4.0 Conference
SBKE with Generalized Selectivity
The performance of the first step (SET1) of the SBKE method in terms of 𝐼𝐼𝐶 𝑂 scores for the
TOP 5 and TOP 10 keywords measured for generalized selectivity 𝑔𝑒𝑖
α 𝑖𝑛/𝑜𝑢𝑡
with different
values of parameter α (plotted as lines) and selectivity (𝑒
𝑖𝑛/𝑜𝑢𝑡
) values (plotted as dots on the
y-axis) for HINA and WIKI-20 datasets
16
Data Science 4.0 Conference
Document vs. Collection-Oriented
Extraction Results
17
HINA
60 individual networks integral network
Ravg Pavg F1avg R P F1
SET1 18.90 39.35 23.60 30.71 35.80 33.06
SET2 19.74 39.15 24.76 33.46 33.97 33.71
SET3 29.55 22.96 22.71 60.47 19.89 28.89
WIKI-
20
20 individual
networks
(ei
in/out >1)
integral network
( ei
in/out >1)
integral network
( ei
in/out >2)
Ravg Pavg F1avg R P F1 R P F1
SET1 59.06 13.40 21.48 76.17 19.08 30.51 32.71 30.30 31.46
SET2 60.12 12.33 20.46 76.64 18.87 30.29 36.45 31.97 34.06
SET3 62.17 12.05 20.19 77.50 15.75 26.18 36.68 32.04 34.21
Data Science 4.0 Conference
Keyword Extraction from Parallel
Abstracts
scientific journal “Underground Mining Engineering”
published by the University of Belgrade, Faculty of Mining and Geology
papers from diffrent fields of mining, geology, and geosciences (2002-2014)
bilinguall papers written in Serbian and in English
available online as aligned parallel text in the
Biblisha digital library: http://jerteh.rs/biblisha/ListaDokumenata.aspx?JCID=2&lng=en
18
Data Science 4.0 Conference
Data
Experiment
collection of 50 bilingual documents
with approximately 4,800 aligned sentences
originally written in Serbian and then translated into English by professionals
translators
Bilingual Serbian-English KE dataset introduced in this paper is
publicly available from http://langnet.uniri.hr/resources.html
19
Data Science 4.0 Conference
S. Beliga, Kitanović, O., Stanković, R., S. Martinčić-Ipšić.
"Keyword Extraction from Parallel Abstracts of Scientific
Publications".
Semantic Keyword-Based Search on Structured Data Sources,
LNCS 10546, Springer, Gdańsk, Poland, pp. 44-45, 2018.
Results 1/2
Evaluation according to the
set of annotated keywords; provided by the author(s) of the abstracts
set of annotated keywords without OOV words
𝑅, 𝑃 and 𝐹1 score achieves higher values for KW without OOV words
this is expected because people tagged KW that did not apear in texts
~18% of the KWs in English
~19% of the KWs in Serbian
all scores better for Serbian than for English 20
~3%
OOV
Previous research:
holds for
Croatian language.
Data Science 4.0 Conference
Results 2/2
Open questions??:
performs better on longer texts??
containing more information on the structural properties of the
input text
performs better for morphologically rich languages??
Italian ?? 21
Language Text type #words
(AVG)
F1 score
English Wikipedia
articles
5,919 24,8%
Croatian Newspap
er articles
335 34,21%
English Parallel
abstracts
of SP
110 46,73%
Serbian 100 49,57%
Data Science 4.0 Conference
SBKE according to text size
SBKE has been tested on
longer English Wikipedia texts
shorter Croatian newspaper article
short Serbian and English abstracts
SBKE in this study
can be applied to shorter documents
portable to new domain (geology)
SBKE Conclusion 1/2
Selectivity-Based Keyword Extraction - SBKE
purely from statistical and structural information
the source text is reflected into the structure of the network
comparable with existing methods
low precision vs. high recall
beside keywords, personal names and entities - not marked as keywords
for document and for collection extraction tasks
Keyword annotation is highly subjective task
even human experts have difficulties to agree upon keyphrases
inter-annotator consistency (agreement) (IIC):
Croatian HINA 40% (SBKE 26%)
English Wiki-20 30% (SBKE 17%)
22
Data Science 4.0 Conference
SBKE Conclusion 2/2
SBKE method
does not require linguistic knowledge (from statistical and structural
information )
main advantages:
networks constructed from co-occurrence of words in the input texts
the method achieves results which are above the TFxIDF but below
humans
documet or collection oriented tasks
preprocessing is not necessary (works without stemming/lemmatization)
easily portable to new languages
23
Data Science 4.0 Conference
S. Beliga, A. Meštrović, S. Martinčić-Ipšić.
”Selectivity-Based Keyword Extraction Method”, International Journal on
Semantic Web and Information Systems, vol. 12, No. 3, pp. 1-26, 2016.
more @ langnet.uniri.hr
• Text genres differentiation
• Can we determine text genre using only network structure properties?
• Legislation or literature? Blogs vs. literature?
• Authorship attribution and language detection
• Text classification
• dimensionality reduction in BoW model using structural properties of
text
• Wikipedia
• knowledge extraction and linking concepts
• Twitter
• polarity (positive/negative) detection
• prediction of missing links
• Social networks
• Coautorship networks analyis
• scientometrics 24
Data Science 4.0 Conference
LangNet team: langnet.uniri.hr/members.html
Department of informatics:
Sanda Martinčić-Ipšić,
Ana Meštrović,
Slobodan Beliga,
Lucia Načinović Prskalo
Collaborators:
Tajana Ban Kirigin,
Mihaela Matešić,
Benedikt Perak,
Zoran Levnajić
Past members:
Tanja Miličić, Domagoj Margan, Edvin Močibob, Neven Matas, Kristina Ban, Hana Rizvić, Sabina Šišović,
Luka Vretenar
25* University of Rijeka grant 13.13.2.2.07
Data Science 4.0 Conference
Sanda Martinčić-Ipšić
smarti@uniri.hr
langnet.uniri.hr
?

More Related Content

What's hot

Signature files
Signature filesSignature files
Signature files
Deepali Raikar
 
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Bhaskar Mitra
 
Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017
Vinoth Chandar
 
Big Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should KnowBig Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should Know
Bernard Marr
 
Web indexing finale
Web indexing finaleWeb indexing finale
Web indexing finale
Ajit More
 
Flink and NiFi, Two Stars in the Apache Big Data Constellation
Flink and NiFi, Two Stars in the Apache Big Data ConstellationFlink and NiFi, Two Stars in the Apache Big Data Constellation
Flink and NiFi, Two Stars in the Apache Big Data Constellation
Matthew Ring
 
Ir models
Ir modelsIr models
Ir models
Ambreen Angel
 
Amazon Product Sentiment review
Amazon Product Sentiment reviewAmazon Product Sentiment review
Amazon Product Sentiment review
Lalit Jain
 
Boolean,vector space retrieval Models
Boolean,vector space retrieval Models Boolean,vector space retrieval Models
Boolean,vector space retrieval Models
Primya Tamil
 
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesHaystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Max Irwin
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Simplilearn
 
AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012
AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012
AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012
Amazon Web Services
 
Automatic indexing
Automatic indexingAutomatic indexing
Automatic indexing
dhatchayaninandu
 
SE-IT JAVA LAB SYLLABUS
SE-IT JAVA LAB SYLLABUSSE-IT JAVA LAB SYLLABUS
SE-IT JAVA LAB SYLLABUS
nikshaikh786
 
Web mining
Web mining Web mining
Web mining
TeklayBirhane
 
Information retrieval s
Information retrieval sInformation retrieval s
Information retrieval s
silambu111
 
Evaluation in Information Retrieval
Evaluation in Information RetrievalEvaluation in Information Retrieval
Evaluation in Information Retrieval
Dishant Ailawadi
 
Text summarization
Text summarizationText summarization
Text summarization
Akash Karwande
 
Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
Mahmood Reza Esmaili Zand
 
Natural Language Processing with Python
Natural Language Processing with PythonNatural Language Processing with Python
Natural Language Processing with Python
Benjamin Bengfort
 

What's hot (20)

Signature files
Signature filesSignature files
Signature files
 
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)
 
Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017
 
Big Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should KnowBig Data - 25 Amazing Facts Everyone Should Know
Big Data - 25 Amazing Facts Everyone Should Know
 
Web indexing finale
Web indexing finaleWeb indexing finale
Web indexing finale
 
Flink and NiFi, Two Stars in the Apache Big Data Constellation
Flink and NiFi, Two Stars in the Apache Big Data ConstellationFlink and NiFi, Two Stars in the Apache Big Data Constellation
Flink and NiFi, Two Stars in the Apache Big Data Constellation
 
Ir models
Ir modelsIr models
Ir models
 
Amazon Product Sentiment review
Amazon Product Sentiment reviewAmazon Product Sentiment review
Amazon Product Sentiment review
 
Boolean,vector space retrieval Models
Boolean,vector space retrieval Models Boolean,vector space retrieval Models
Boolean,vector space retrieval Models
 
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesHaystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
 
AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012
AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012
AWS Customer Presentation: Freie Univerisitat - Berlin Summit 2012
 
Automatic indexing
Automatic indexingAutomatic indexing
Automatic indexing
 
SE-IT JAVA LAB SYLLABUS
SE-IT JAVA LAB SYLLABUSSE-IT JAVA LAB SYLLABUS
SE-IT JAVA LAB SYLLABUS
 
Web mining
Web mining Web mining
Web mining
 
Information retrieval s
Information retrieval sInformation retrieval s
Information retrieval s
 
Evaluation in Information Retrieval
Evaluation in Information RetrievalEvaluation in Information Retrieval
Evaluation in Information Retrieval
 
Text summarization
Text summarizationText summarization
Text summarization
 
Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
 
Natural Language Processing with Python
Natural Language Processing with PythonNatural Language Processing with Python
Natural Language Processing with Python
 

Similar to Extracting keywords from texts - Sanda Martincic Ipsic

kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.ppt
butest
 
Linked Open Data Visualization
Linked Open Data VisualizationLinked Open Data Visualization
Linked Open Data Visualization
Laura Po
 
The Nature of Information
The Nature of InformationThe Nature of Information
The Nature of Information
Adrian Paschke
 
Profiling Linked Open Data
Profiling Linked Open DataProfiling Linked Open Data
Profiling Linked Open Data
Blerina Spahiu
 
The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects
Carole Goble
 
Web Archive Research Skills and Tools Survey (WARST)
 Web Archive Research Skills and Tools Survey (WARST) Web Archive Research Skills and Tools Survey (WARST)
Web Archive Research Skills and Tools Survey (WARST)
WARCnet
 
ACT Talk, Giuseppe Totaro: High Performance Computing for Distributed Indexin...
ACT Talk, Giuseppe Totaro: High Performance Computing for Distributed Indexin...ACT Talk, Giuseppe Totaro: High Performance Computing for Distributed Indexin...
ACT Talk, Giuseppe Totaro: High Performance Computing for Distributed Indexin...
Advanced-Concepts-Team
 
Cyberistructure
CyberistructureCyberistructure
Cyberistructure
Lab Southwest
 
euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)
Besnik Fetahu
 
Semantic Linking & Retrieval for Digital Libraries
Semantic Linking & Retrieval for Digital LibrariesSemantic Linking & Retrieval for Digital Libraries
Semantic Linking & Retrieval for Digital Libraries
Stefan Dietze
 
Retrieval, Crawling and Fusion of Entity-centric Data on the Web
Retrieval, Crawling and Fusion of Entity-centric Data on the WebRetrieval, Crawling and Fusion of Entity-centric Data on the Web
Retrieval, Crawling and Fusion of Entity-centric Data on the Web
Stefan Dietze
 
Data Provenance and Scientific Workflow Management
Data Provenance and Scientific Workflow ManagementData Provenance and Scientific Workflow Management
Data Provenance and Scientific Workflow Management
NeuroMat
 
Online Index Extraction from Linked Open Data Sources
Online Index Extraction from Linked Open Data SourcesOnline Index Extraction from Linked Open Data Sources
Online Index Extraction from Linked Open Data Sources
Fabio Benedetti
 
IRJET- Automated Document Summarization and Classification using Deep Lear...
IRJET- 	  Automated Document Summarization and Classification using Deep Lear...IRJET- 	  Automated Document Summarization and Classification using Deep Lear...
IRJET- Automated Document Summarization and Classification using Deep Lear...
IRJET Journal
 
SoftwareInformationTechnology
SoftwareInformationTechnologySoftwareInformationTechnology
SoftwareInformationTechnology
Salhi Fadhel
 
A Distributed Audio Personalization Framework over Android
A Distributed Audio Personalization Framework over AndroidA Distributed Audio Personalization Framework over Android
A Distributed Audio Personalization Framework over Android
University of Piraeus
 
research unveiling connections and recommendations.pptx
research unveiling connections and recommendations.pptxresearch unveiling connections and recommendations.pptx
research unveiling connections and recommendations.pptx
Rishikeshreddy25
 
Linked Open Data about Springer Nature conferences. The story so far
Linked Open Data about Springer Nature conferences. The story so farLinked Open Data about Springer Nature conferences. The story so far
Linked Open Data about Springer Nature conferences. The story so far
Aliaksandr Birukou
 
Semantic annotation of biomedical data
Semantic annotation of biomedical dataSemantic annotation of biomedical data
Semantic annotation of biomedical data
INRAE (MISTEA) and University of Montpellier (LIRMM)
 
EKAW 2016 - TechMiner: Extracting Technologies from Academic Publications
EKAW 2016 - TechMiner: Extracting Technologies from Academic PublicationsEKAW 2016 - TechMiner: Extracting Technologies from Academic Publications
EKAW 2016 - TechMiner: Extracting Technologies from Academic Publications
Francesco Osborne
 

Similar to Extracting keywords from texts - Sanda Martincic Ipsic (20)

kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.ppt
 
Linked Open Data Visualization
Linked Open Data VisualizationLinked Open Data Visualization
Linked Open Data Visualization
 
The Nature of Information
The Nature of InformationThe Nature of Information
The Nature of Information
 
Profiling Linked Open Data
Profiling Linked Open DataProfiling Linked Open Data
Profiling Linked Open Data
 
The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects
 
Web Archive Research Skills and Tools Survey (WARST)
 Web Archive Research Skills and Tools Survey (WARST) Web Archive Research Skills and Tools Survey (WARST)
Web Archive Research Skills and Tools Survey (WARST)
 
ACT Talk, Giuseppe Totaro: High Performance Computing for Distributed Indexin...
ACT Talk, Giuseppe Totaro: High Performance Computing for Distributed Indexin...ACT Talk, Giuseppe Totaro: High Performance Computing for Distributed Indexin...
ACT Talk, Giuseppe Totaro: High Performance Computing for Distributed Indexin...
 
Cyberistructure
CyberistructureCyberistructure
Cyberistructure
 
euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)
 
Semantic Linking & Retrieval for Digital Libraries
Semantic Linking & Retrieval for Digital LibrariesSemantic Linking & Retrieval for Digital Libraries
Semantic Linking & Retrieval for Digital Libraries
 
Retrieval, Crawling and Fusion of Entity-centric Data on the Web
Retrieval, Crawling and Fusion of Entity-centric Data on the WebRetrieval, Crawling and Fusion of Entity-centric Data on the Web
Retrieval, Crawling and Fusion of Entity-centric Data on the Web
 
Data Provenance and Scientific Workflow Management
Data Provenance and Scientific Workflow ManagementData Provenance and Scientific Workflow Management
Data Provenance and Scientific Workflow Management
 
Online Index Extraction from Linked Open Data Sources
Online Index Extraction from Linked Open Data SourcesOnline Index Extraction from Linked Open Data Sources
Online Index Extraction from Linked Open Data Sources
 
IRJET- Automated Document Summarization and Classification using Deep Lear...
IRJET- 	  Automated Document Summarization and Classification using Deep Lear...IRJET- 	  Automated Document Summarization and Classification using Deep Lear...
IRJET- Automated Document Summarization and Classification using Deep Lear...
 
SoftwareInformationTechnology
SoftwareInformationTechnologySoftwareInformationTechnology
SoftwareInformationTechnology
 
A Distributed Audio Personalization Framework over Android
A Distributed Audio Personalization Framework over AndroidA Distributed Audio Personalization Framework over Android
A Distributed Audio Personalization Framework over Android
 
research unveiling connections and recommendations.pptx
research unveiling connections and recommendations.pptxresearch unveiling connections and recommendations.pptx
research unveiling connections and recommendations.pptx
 
Linked Open Data about Springer Nature conferences. The story so far
Linked Open Data about Springer Nature conferences. The story so farLinked Open Data about Springer Nature conferences. The story so far
Linked Open Data about Springer Nature conferences. The story so far
 
Semantic annotation of biomedical data
Semantic annotation of biomedical dataSemantic annotation of biomedical data
Semantic annotation of biomedical data
 
EKAW 2016 - TechMiner: Extracting Technologies from Academic Publications
EKAW 2016 - TechMiner: Extracting Technologies from Academic PublicationsEKAW 2016 - TechMiner: Extracting Technologies from Academic Publications
EKAW 2016 - TechMiner: Extracting Technologies from Academic Publications
 

More from Institute of Contemporary Sciences

First 5 years of PSI:ML - Filip Panjevic
First 5 years of PSI:ML - Filip PanjevicFirst 5 years of PSI:ML - Filip Panjevic
First 5 years of PSI:ML - Filip Panjevic
Institute of Contemporary Sciences
 
Building valuable (online and offline) Data Science communities - Experience ...
Building valuable (online and offline) Data Science communities - Experience ...Building valuable (online and offline) Data Science communities - Experience ...
Building valuable (online and offline) Data Science communities - Experience ...
Institute of Contemporary Sciences
 
Data Science Master 4.0 on Belgrade University - Drazen Draskovic
Data Science Master 4.0 on Belgrade University - Drazen DraskovicData Science Master 4.0 on Belgrade University - Drazen Draskovic
Data Science Master 4.0 on Belgrade University - Drazen Draskovic
Institute of Contemporary Sciences
 
Deep learning fast and slow, a responsible and explainable AI framework - Ahm...
Deep learning fast and slow, a responsible and explainable AI framework - Ahm...Deep learning fast and slow, a responsible and explainable AI framework - Ahm...
Deep learning fast and slow, a responsible and explainable AI framework - Ahm...
Institute of Contemporary Sciences
 
Solving churn challenge in Big Data environment - Jelena Pekez
Solving churn challenge in Big Data environment  - Jelena PekezSolving churn challenge in Big Data environment  - Jelena Pekez
Solving churn challenge in Big Data environment - Jelena Pekez
Institute of Contemporary Sciences
 
Application of Business Intelligence in bank risk management - Dimitar Dilov
Application of Business Intelligence in bank risk management - Dimitar DilovApplication of Business Intelligence in bank risk management - Dimitar Dilov
Application of Business Intelligence in bank risk management - Dimitar Dilov
Institute of Contemporary Sciences
 
Trends and practical applications of AI/ML in Fin Tech industry - Milos Kosan...
Trends and practical applications of AI/ML in Fin Tech industry - Milos Kosan...Trends and practical applications of AI/ML in Fin Tech industry - Milos Kosan...
Trends and practical applications of AI/ML in Fin Tech industry - Milos Kosan...
Institute of Contemporary Sciences
 
Recommender systems for personalized financial advice from concept to product...
Recommender systems for personalized financial advice from concept to product...Recommender systems for personalized financial advice from concept to product...
Recommender systems for personalized financial advice from concept to product...
Institute of Contemporary Sciences
 
Advanced tools in real time analytics and AI in customer support - Milan Sima...
Advanced tools in real time analytics and AI in customer support - Milan Sima...Advanced tools in real time analytics and AI in customer support - Milan Sima...
Advanced tools in real time analytics and AI in customer support - Milan Sima...
Institute of Contemporary Sciences
 
Complex AI forecasting methods for investments portfolio optimization - Pawel...
Complex AI forecasting methods for investments portfolio optimization - Pawel...Complex AI forecasting methods for investments portfolio optimization - Pawel...
Complex AI forecasting methods for investments portfolio optimization - Pawel...
Institute of Contemporary Sciences
 
From Zero to ML Hero for Underdogs - Amir Tabakovic
From Zero to ML Hero for Underdogs  - Amir TabakovicFrom Zero to ML Hero for Underdogs  - Amir Tabakovic
From Zero to ML Hero for Underdogs - Amir Tabakovic
Institute of Contemporary Sciences
 
Data and data scientists are not equal to money david hoyle
Data and data scientists are not equal to money   david hoyleData and data scientists are not equal to money   david hoyle
Data and data scientists are not equal to money david hoyle
Institute of Contemporary Sciences
 
The price is right - Tomislav Krizan
The price is right - Tomislav KrizanThe price is right - Tomislav Krizan
The price is right - Tomislav Krizan
Institute of Contemporary Sciences
 
When it's raining gold, bring a bucket - Andjela Culibrk
When it's raining gold, bring a bucket - Andjela CulibrkWhen it's raining gold, bring a bucket - Andjela Culibrk
When it's raining gold, bring a bucket - Andjela Culibrk
Institute of Contemporary Sciences
 
Reality and traps of real time data engineering - Milos Solujic
Reality and traps of real time data engineering - Milos SolujicReality and traps of real time data engineering - Milos Solujic
Reality and traps of real time data engineering - Milos Solujic
Institute of Contemporary Sciences
 
Sensor networks for personalized health monitoring - Vladimir Brusic
Sensor networks for personalized health monitoring - Vladimir BrusicSensor networks for personalized health monitoring - Vladimir Brusic
Sensor networks for personalized health monitoring - Vladimir Brusic
Institute of Contemporary Sciences
 
Improving Data Quality with Product Similarity Search
Improving Data Quality with Product Similarity SearchImproving Data Quality with Product Similarity Search
Improving Data Quality with Product Similarity Search
Institute of Contemporary Sciences
 
Prediction of good patterns for future sales using image recognition
Prediction of good patterns for future sales using image recognitionPrediction of good patterns for future sales using image recognition
Prediction of good patterns for future sales using image recognition
Institute of Contemporary Sciences
 
Using data to fight corruption: full budget transparency in local government
Using data to fight corruption: full budget transparency in local governmentUsing data to fight corruption: full budget transparency in local government
Using data to fight corruption: full budget transparency in local government
Institute of Contemporary Sciences
 
Geospatial Analysis and Open Data - Forest and Climate
Geospatial Analysis and Open Data - Forest and ClimateGeospatial Analysis and Open Data - Forest and Climate
Geospatial Analysis and Open Data - Forest and Climate
Institute of Contemporary Sciences
 

More from Institute of Contemporary Sciences (20)

First 5 years of PSI:ML - Filip Panjevic
First 5 years of PSI:ML - Filip PanjevicFirst 5 years of PSI:ML - Filip Panjevic
First 5 years of PSI:ML - Filip Panjevic
 
Building valuable (online and offline) Data Science communities - Experience ...
Building valuable (online and offline) Data Science communities - Experience ...Building valuable (online and offline) Data Science communities - Experience ...
Building valuable (online and offline) Data Science communities - Experience ...
 
Data Science Master 4.0 on Belgrade University - Drazen Draskovic
Data Science Master 4.0 on Belgrade University - Drazen DraskovicData Science Master 4.0 on Belgrade University - Drazen Draskovic
Data Science Master 4.0 on Belgrade University - Drazen Draskovic
 
Deep learning fast and slow, a responsible and explainable AI framework - Ahm...
Deep learning fast and slow, a responsible and explainable AI framework - Ahm...Deep learning fast and slow, a responsible and explainable AI framework - Ahm...
Deep learning fast and slow, a responsible and explainable AI framework - Ahm...
 
Solving churn challenge in Big Data environment - Jelena Pekez
Solving churn challenge in Big Data environment  - Jelena PekezSolving churn challenge in Big Data environment  - Jelena Pekez
Solving churn challenge in Big Data environment - Jelena Pekez
 
Application of Business Intelligence in bank risk management - Dimitar Dilov
Application of Business Intelligence in bank risk management - Dimitar DilovApplication of Business Intelligence in bank risk management - Dimitar Dilov
Application of Business Intelligence in bank risk management - Dimitar Dilov
 
Trends and practical applications of AI/ML in Fin Tech industry - Milos Kosan...
Trends and practical applications of AI/ML in Fin Tech industry - Milos Kosan...Trends and practical applications of AI/ML in Fin Tech industry - Milos Kosan...
Trends and practical applications of AI/ML in Fin Tech industry - Milos Kosan...
 
Recommender systems for personalized financial advice from concept to product...
Recommender systems for personalized financial advice from concept to product...Recommender systems for personalized financial advice from concept to product...
Recommender systems for personalized financial advice from concept to product...
 
Advanced tools in real time analytics and AI in customer support - Milan Sima...
Advanced tools in real time analytics and AI in customer support - Milan Sima...Advanced tools in real time analytics and AI in customer support - Milan Sima...
Advanced tools in real time analytics and AI in customer support - Milan Sima...
 
Complex AI forecasting methods for investments portfolio optimization - Pawel...
Complex AI forecasting methods for investments portfolio optimization - Pawel...Complex AI forecasting methods for investments portfolio optimization - Pawel...
Complex AI forecasting methods for investments portfolio optimization - Pawel...
 
From Zero to ML Hero for Underdogs - Amir Tabakovic
From Zero to ML Hero for Underdogs  - Amir TabakovicFrom Zero to ML Hero for Underdogs  - Amir Tabakovic
From Zero to ML Hero for Underdogs - Amir Tabakovic
 
Data and data scientists are not equal to money david hoyle
Data and data scientists are not equal to money   david hoyleData and data scientists are not equal to money   david hoyle
Data and data scientists are not equal to money david hoyle
 
The price is right - Tomislav Krizan
The price is right - Tomislav KrizanThe price is right - Tomislav Krizan
The price is right - Tomislav Krizan
 
When it's raining gold, bring a bucket - Andjela Culibrk
When it's raining gold, bring a bucket - Andjela CulibrkWhen it's raining gold, bring a bucket - Andjela Culibrk
When it's raining gold, bring a bucket - Andjela Culibrk
 
Reality and traps of real time data engineering - Milos Solujic
Reality and traps of real time data engineering - Milos SolujicReality and traps of real time data engineering - Milos Solujic
Reality and traps of real time data engineering - Milos Solujic
 
Sensor networks for personalized health monitoring - Vladimir Brusic
Sensor networks for personalized health monitoring - Vladimir BrusicSensor networks for personalized health monitoring - Vladimir Brusic
Sensor networks for personalized health monitoring - Vladimir Brusic
 
Improving Data Quality with Product Similarity Search
Improving Data Quality with Product Similarity SearchImproving Data Quality with Product Similarity Search
Improving Data Quality with Product Similarity Search
 
Prediction of good patterns for future sales using image recognition
Prediction of good patterns for future sales using image recognitionPrediction of good patterns for future sales using image recognition
Prediction of good patterns for future sales using image recognition
 
Using data to fight corruption: full budget transparency in local government
Using data to fight corruption: full budget transparency in local governmentUsing data to fight corruption: full budget transparency in local government
Using data to fight corruption: full budget transparency in local government
 
Geospatial Analysis and Open Data - Forest and Climate
Geospatial Analysis and Open Data - Forest and ClimateGeospatial Analysis and Open Data - Forest and Climate
Geospatial Analysis and Open Data - Forest and Climate
 

Recently uploaded

一比一原版南昆士兰大学毕业证如何办理
一比一原版南昆士兰大学毕业证如何办理一比一原版南昆士兰大学毕业证如何办理
一比一原版南昆士兰大学毕业证如何办理
ugydym
 
Call Girls Hyderabad (india) ☎️ +91-7426014248 Hyderabad Call Girl
Call Girls Hyderabad  (india) ☎️ +91-7426014248 Hyderabad  Call GirlCall Girls Hyderabad  (india) ☎️ +91-7426014248 Hyderabad  Call Girl
Call Girls Hyderabad (india) ☎️ +91-7426014248 Hyderabad Call Girl
sapna sharmap11
 
Econ3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdfEcon3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdf
blueshagoo1
 
Do People Really Know Their Fertility Intentions? Correspondence between Sel...
Do People Really Know Their Fertility Intentions?  Correspondence between Sel...Do People Really Know Their Fertility Intentions?  Correspondence between Sel...
Do People Really Know Their Fertility Intentions? Correspondence between Sel...
Xiao Xu
 
Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)
GeorgiiSteshenko
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
Vietnam Cotton & Spinning Association
 
High Profile Call Girls Navi Mumbai ✅ 9833363713 FULL CASH PAYMENT
High Profile Call Girls Navi Mumbai ✅ 9833363713 FULL CASH PAYMENTHigh Profile Call Girls Navi Mumbai ✅ 9833363713 FULL CASH PAYMENT
High Profile Call Girls Navi Mumbai ✅ 9833363713 FULL CASH PAYMENT
ranjeet3341
 
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
osoyvvf
 
Call Girls Hyderabad (india) ☎️ +91-7426014248 Hyderabad Call Girl
Call Girls Hyderabad  (india) ☎️ +91-7426014248 Hyderabad  Call GirlCall Girls Hyderabad  (india) ☎️ +91-7426014248 Hyderabad  Call Girl
Call Girls Hyderabad (india) ☎️ +91-7426014248 Hyderabad Call Girl
sapna sharmap11
 
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
agdhot
 
Call Girls Goa👉9024918724👉Low Rate Escorts in Goa 💃 Available 24/7
Call Girls Goa👉9024918724👉Low Rate Escorts in Goa 💃 Available 24/7Call Girls Goa👉9024918724👉Low Rate Escorts in Goa 💃 Available 24/7
Call Girls Goa👉9024918724👉Low Rate Escorts in Goa 💃 Available 24/7
nitachopra
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
TeukuEriSyahputra
 
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
9gr6pty
 
Health care analysis using sentimental analysis
Health care analysis using sentimental analysisHealth care analysis using sentimental analysis
Health care analysis using sentimental analysis
krishnasrigannavarap
 
Senior Software Profiles Backend Sample - Sheet1.pdf
Senior Software Profiles  Backend Sample - Sheet1.pdfSenior Software Profiles  Backend Sample - Sheet1.pdf
Senior Software Profiles Backend Sample - Sheet1.pdf
Vineet
 
Digital Marketing Performance Marketing Sample .pdf
Digital Marketing Performance Marketing  Sample .pdfDigital Marketing Performance Marketing  Sample .pdf
Digital Marketing Performance Marketing Sample .pdf
Vineet
 
一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理
zsafxbf
 
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
Rebecca Bilbro
 
Ahmedabad Call Girls 7339748667 With Free Home Delivery At Your Door
Ahmedabad Call Girls 7339748667 With Free Home Delivery At Your DoorAhmedabad Call Girls 7339748667 With Free Home Delivery At Your Door
Ahmedabad Call Girls 7339748667 With Free Home Delivery At Your Door
Russian Escorts in Delhi 9711199171 with low rate Book online
 
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdfNamma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
22ad0301
 

Recently uploaded (20)

一比一原版南昆士兰大学毕业证如何办理
一比一原版南昆士兰大学毕业证如何办理一比一原版南昆士兰大学毕业证如何办理
一比一原版南昆士兰大学毕业证如何办理
 
Call Girls Hyderabad (india) ☎️ +91-7426014248 Hyderabad Call Girl
Call Girls Hyderabad  (india) ☎️ +91-7426014248 Hyderabad  Call GirlCall Girls Hyderabad  (india) ☎️ +91-7426014248 Hyderabad  Call Girl
Call Girls Hyderabad (india) ☎️ +91-7426014248 Hyderabad Call Girl
 
Econ3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdfEcon3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdf
 
Do People Really Know Their Fertility Intentions? Correspondence between Sel...
Do People Really Know Their Fertility Intentions?  Correspondence between Sel...Do People Really Know Their Fertility Intentions?  Correspondence between Sel...
Do People Really Know Their Fertility Intentions? Correspondence between Sel...
 
Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
 
High Profile Call Girls Navi Mumbai ✅ 9833363713 FULL CASH PAYMENT
High Profile Call Girls Navi Mumbai ✅ 9833363713 FULL CASH PAYMENTHigh Profile Call Girls Navi Mumbai ✅ 9833363713 FULL CASH PAYMENT
High Profile Call Girls Navi Mumbai ✅ 9833363713 FULL CASH PAYMENT
 
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
 
Call Girls Hyderabad (india) ☎️ +91-7426014248 Hyderabad Call Girl
Call Girls Hyderabad  (india) ☎️ +91-7426014248 Hyderabad  Call GirlCall Girls Hyderabad  (india) ☎️ +91-7426014248 Hyderabad  Call Girl
Call Girls Hyderabad (india) ☎️ +91-7426014248 Hyderabad Call Girl
 
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
 
Call Girls Goa👉9024918724👉Low Rate Escorts in Goa 💃 Available 24/7
Call Girls Goa👉9024918724👉Low Rate Escorts in Goa 💃 Available 24/7Call Girls Goa👉9024918724👉Low Rate Escorts in Goa 💃 Available 24/7
Call Girls Goa👉9024918724👉Low Rate Escorts in Goa 💃 Available 24/7
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
 
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
一比一原版(uob毕业证书)伯明翰大学毕业证如何办理
 
Health care analysis using sentimental analysis
Health care analysis using sentimental analysisHealth care analysis using sentimental analysis
Health care analysis using sentimental analysis
 
Senior Software Profiles Backend Sample - Sheet1.pdf
Senior Software Profiles  Backend Sample - Sheet1.pdfSenior Software Profiles  Backend Sample - Sheet1.pdf
Senior Software Profiles Backend Sample - Sheet1.pdf
 
Digital Marketing Performance Marketing Sample .pdf
Digital Marketing Performance Marketing  Sample .pdfDigital Marketing Performance Marketing  Sample .pdf
Digital Marketing Performance Marketing Sample .pdf
 
一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理
 
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
 
Ahmedabad Call Girls 7339748667 With Free Home Delivery At Your Door
Ahmedabad Call Girls 7339748667 With Free Home Delivery At Your DoorAhmedabad Call Girls 7339748667 With Free Home Delivery At Your Door
Ahmedabad Call Girls 7339748667 With Free Home Delivery At Your Door
 
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdfNamma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
 

Extracting keywords from texts - Sanda Martincic Ipsic

  • 1. Department of Informatics, University of Rijeka Radmile Matejčić 2, 51000 Rijeka, Croatia Tel.: + 385 51 584 700 Fax: + 385 51 584 749 www.langnet.uniri.hr Extracting keywords from texts Sanda Martinčić-Ipšić smarti@uniri.hr langnet.uniri.hr Data Science 4.0 Conference
  • 2. Introduction 1/4 Keyword extraction automatically identify a that best describe the document the most representative features (terms) of the source text huge amount of multi topic textual sources applications: summarization, indexing, labeling, categorization, clustering document-oriented task (single documents) collection-oriented task (i.e. entire web site) more demanding 1 Data Science 4.0 Conference
  • 3. 2 How KE works? Data Science 4.0 Conference
  • 4. Introduction 2/4 keyword extraction is traditionally based on statistical methods learning from or graph, since the number of words in isolated documents is limited the source (document, text) is modelled in a network words are nodes and their co-occurrence is represented with links links can model different linguistic relations (syntactic; semantic) 3 directed and weighted co-occurrence network for two sentences: “Selectivity-based keyword extraction method is fully unsupervised. It consists of two phases: keyword extraction and keyword expansion.” Data Science 4.0 Conference
  • 5. Introduction 3/4 can exploit: or level measures: coreness, clustering coefficient PageRank motivated ranking score or HITS motivated hub and authority score communities strength, centrality and eigenvector centrality the average strength of the node 4 Data Science 4.0 Conference S. Beliga, A. Meštrović, S. Martinčić-Ipšić. "An Overview of Graph-Based Keyword Extraction Methods and Approaches". Journal of Information and Organizational Sciences, 39(1), pp. 1-20, 2015.
  • 6. Complex network analysis 1/2 Directed and weighted network [Newman, 2010] N – number of nodes M – number of links wij – weight for link which connects nodes i and j Ki – node degree: number of links incident upon a node 5 𝒅𝒄𝒊 (𝒊𝒏/𝒐𝒖𝒕) = 𝒌𝒊 (𝒊𝒏/𝒐𝒖𝒕) 𝑵 − 𝟏 𝒄𝒄𝒊 = 𝑵 − 𝟏 σ𝒊≠𝒋 𝒅𝒊𝒋 𝒃𝒄𝒊 = σ𝒊≠𝒋≠𝒌 𝝈𝒋𝒌(𝒊) 𝝈𝒋𝒌 (𝑵 − 𝟏)(𝑵 − 𝟐) Data Science 4.0 Conference
  • 7. 3. Complex network analysis 1/2 strength and selectivity selectivity average weight distribution on the links of the single node generalized selectivity 6 𝒔𝒊 𝒊𝒏/𝒐𝒖𝒕 = ෍ 𝒋 𝒘𝒋𝒊/𝒊𝒋 𝒈𝒆𝒊 α 𝒊𝒏/𝒐𝒖𝒕 = 𝒌𝒊 𝒊𝒏/𝒐𝒖𝒕 𝒔𝒊 𝒊𝒏/𝒐𝒖𝒕 𝒌𝒊 𝒊𝒏/𝒐𝒖𝒕 α [Masucci & Rodgers, 2009]𝒆𝒊 𝒊𝒏/𝒐𝒖𝒕 = 𝒔𝒊 𝒊𝒏/𝒐𝒖𝒕 𝒌𝒊 𝒊𝒏/𝒐𝒖𝒕 adapted from [Opsahl et al., 2010] in [Beliga, Meštrović, Martinčić-Ipšić, 2016] α∈ [0,1] – prefers nodes with higher degrees α ∈ [1, +∞] – prefers nodes with lower degrees 𝛼 = 1– node strength Data Science 4.0 Conference
  • 8. Data: Croatian and English texts collections with manually annotated keywords by human experts HINA Croatian News Agency Wikipedia: technical reports covering different aspects of CS 8 HINA WIKI-20 8 human experts 15 teams (2 undergrad. stud.) 60 texts 20 texts Croatian English 10 keywords on AVG per human 5,7 keywords on AVG per team human consistency≈ 39.5% team consistency ≈ 30.5% Data Science 4.0 Conference
  • 9. Network construction network per document from all documents (collection extraction) each word two words are linked if they are adjacent in the sentence proportional to the overall co-occurrence frequencies of the corresponding word pairs 9 Data Science 4.0 Conference
  • 10. Comparison of centrality measures 10 DATA SET MEASURE TOP 5 TOP 10 Ravg Pavg F1avg Ravg Pavg F1avg HINA Closeness: 𝑐𝑐𝑖 4.3 39.3 7.2 8.4 42.9 12.8 Betweenness: 𝑏𝑐𝑖 3.6 40.8 6.5 9.2 52.8 13.9 in/out-degree: 𝑑𝑐𝑖 𝑖𝑛/𝑜𝑢𝑡 23.0 38.4 17.0 62.1 25.1 26.6 in/out-selectivity: 𝑒𝑖 𝑖𝑛/𝑜𝑢𝑡 14.0 35.9 19.3 16.8 38.0 22.1 WIKI-20 Closeness: 𝑐𝑐𝑖 1.6 42.5 3.1 5.3 63.3 9.7 Betweenness: 𝑏𝑐𝑖 0.3 10.0 0.5 2.6 55.0 4.9 in/out-degree: 𝑑𝑐𝑖 𝑖𝑛/𝑜𝑢𝑡 0.1 5.0 0.3 2.5 57.5 4.8 in/out-selectivity: 𝑒𝑖 𝑖𝑛/𝑜𝑢𝑡 2.3 17.8 3.9 8.6 18.2 10.8 TOP 10 ranked words: in-degree: to be, and, in, on, which, for, but, this, self, of betweenness: to be, and, in, on, self, this, which, for, Croatian, but in/out selectivity: Bratislava, area, Tuesday, inland, revolution, verification, decade, Balkan, freedom, Universe Data Science 4.0 Conference
  • 11. Selectivity differentiate between two types of nodes (words) high strength and high degree values : stop-words, conjunctions, prepositions high strength and low degree : nouns, adjectives, verbs and words that are part of collocations, keyphrases, names, etc. efficiently extract 11 Data Science 4.0 Conference
  • 12. SBKE method 12 Data Science 4.0 Conference
  • 13. SBKE method 13 Data Science 4.0 Conference
  • 14. 14 Data Science 4.0 Conference
  • 15. SBKE with Generalized Selectivity SBKE method can be easily adjusted with new measures generalized selectivity measure: 𝑔𝑒𝑖 α 𝑖𝑛/𝑜𝑢𝑡 α adjust the relationship between the node degree and strength – provide different set of keywords Find the α parameter which best fits the experimental settings according to 𝐼𝐼𝐶 𝑂 scores tuning to the data set 15 Data Science 4.0 Conference
  • 16. SBKE with Generalized Selectivity The performance of the first step (SET1) of the SBKE method in terms of 𝐼𝐼𝐶 𝑂 scores for the TOP 5 and TOP 10 keywords measured for generalized selectivity 𝑔𝑒𝑖 α 𝑖𝑛/𝑜𝑢𝑡 with different values of parameter α (plotted as lines) and selectivity (𝑒 𝑖𝑛/𝑜𝑢𝑡 ) values (plotted as dots on the y-axis) for HINA and WIKI-20 datasets 16 Data Science 4.0 Conference
  • 17. Document vs. Collection-Oriented Extraction Results 17 HINA 60 individual networks integral network Ravg Pavg F1avg R P F1 SET1 18.90 39.35 23.60 30.71 35.80 33.06 SET2 19.74 39.15 24.76 33.46 33.97 33.71 SET3 29.55 22.96 22.71 60.47 19.89 28.89 WIKI- 20 20 individual networks (ei in/out >1) integral network ( ei in/out >1) integral network ( ei in/out >2) Ravg Pavg F1avg R P F1 R P F1 SET1 59.06 13.40 21.48 76.17 19.08 30.51 32.71 30.30 31.46 SET2 60.12 12.33 20.46 76.64 18.87 30.29 36.45 31.97 34.06 SET3 62.17 12.05 20.19 77.50 15.75 26.18 36.68 32.04 34.21 Data Science 4.0 Conference
  • 18. Keyword Extraction from Parallel Abstracts scientific journal “Underground Mining Engineering” published by the University of Belgrade, Faculty of Mining and Geology papers from diffrent fields of mining, geology, and geosciences (2002-2014) bilinguall papers written in Serbian and in English available online as aligned parallel text in the Biblisha digital library: http://jerteh.rs/biblisha/ListaDokumenata.aspx?JCID=2&lng=en 18 Data Science 4.0 Conference
  • 19. Data Experiment collection of 50 bilingual documents with approximately 4,800 aligned sentences originally written in Serbian and then translated into English by professionals translators Bilingual Serbian-English KE dataset introduced in this paper is publicly available from http://langnet.uniri.hr/resources.html 19 Data Science 4.0 Conference S. Beliga, Kitanović, O., Stanković, R., S. Martinčić-Ipšić. "Keyword Extraction from Parallel Abstracts of Scientific Publications". Semantic Keyword-Based Search on Structured Data Sources, LNCS 10546, Springer, Gdańsk, Poland, pp. 44-45, 2018.
  • 20. Results 1/2 Evaluation according to the set of annotated keywords; provided by the author(s) of the abstracts set of annotated keywords without OOV words 𝑅, 𝑃 and 𝐹1 score achieves higher values for KW without OOV words this is expected because people tagged KW that did not apear in texts ~18% of the KWs in English ~19% of the KWs in Serbian all scores better for Serbian than for English 20 ~3% OOV Previous research: holds for Croatian language. Data Science 4.0 Conference
  • 21. Results 2/2 Open questions??: performs better on longer texts?? containing more information on the structural properties of the input text performs better for morphologically rich languages?? Italian ?? 21 Language Text type #words (AVG) F1 score English Wikipedia articles 5,919 24,8% Croatian Newspap er articles 335 34,21% English Parallel abstracts of SP 110 46,73% Serbian 100 49,57% Data Science 4.0 Conference SBKE according to text size SBKE has been tested on longer English Wikipedia texts shorter Croatian newspaper article short Serbian and English abstracts SBKE in this study can be applied to shorter documents portable to new domain (geology)
  • 22. SBKE Conclusion 1/2 Selectivity-Based Keyword Extraction - SBKE purely from statistical and structural information the source text is reflected into the structure of the network comparable with existing methods low precision vs. high recall beside keywords, personal names and entities - not marked as keywords for document and for collection extraction tasks Keyword annotation is highly subjective task even human experts have difficulties to agree upon keyphrases inter-annotator consistency (agreement) (IIC): Croatian HINA 40% (SBKE 26%) English Wiki-20 30% (SBKE 17%) 22 Data Science 4.0 Conference
  • 23. SBKE Conclusion 2/2 SBKE method does not require linguistic knowledge (from statistical and structural information ) main advantages: networks constructed from co-occurrence of words in the input texts the method achieves results which are above the TFxIDF but below humans documet or collection oriented tasks preprocessing is not necessary (works without stemming/lemmatization) easily portable to new languages 23 Data Science 4.0 Conference S. Beliga, A. Meštrović, S. Martinčić-Ipšić. ”Selectivity-Based Keyword Extraction Method”, International Journal on Semantic Web and Information Systems, vol. 12, No. 3, pp. 1-26, 2016.
  • 24. more @ langnet.uniri.hr • Text genres differentiation • Can we determine text genre using only network structure properties? • Legislation or literature? Blogs vs. literature? • Authorship attribution and language detection • Text classification • dimensionality reduction in BoW model using structural properties of text • Wikipedia • knowledge extraction and linking concepts • Twitter • polarity (positive/negative) detection • prediction of missing links • Social networks • Coautorship networks analyis • scientometrics 24 Data Science 4.0 Conference
  • 25. LangNet team: langnet.uniri.hr/members.html Department of informatics: Sanda Martinčić-Ipšić, Ana Meštrović, Slobodan Beliga, Lucia Načinović Prskalo Collaborators: Tajana Ban Kirigin, Mihaela Matešić, Benedikt Perak, Zoran Levnajić Past members: Tanja Miličić, Domagoj Margan, Edvin Močibob, Neven Matas, Kristina Ban, Hana Rizvić, Sabina Šišović, Luka Vretenar 25* University of Rijeka grant 13.13.2.2.07 Data Science 4.0 Conference