G. Futia, F. Cairo, F. Morando, L. Leschiutta
Exploiting Linked Open Data and Natural Language Processing for Classification of Political Speech
Krems, 22nd May 2014
22nd
May 2014 Giuseppe Futia – Politecnico di Torino 2
Introduction
● Our goal:
  ● help anyone interested in the automatic categorization of political speeches to identify unambiguously the main political trends addressed by the White House
● What we use to achieve our goal:
  ● TellMeFirst (http://tellmefirst.polito.it/), a topic extraction tool that:
    – leverages the DBpedia knowledge base and the English Wikipedia linguistic corpus
    – exploits Linked Open Data (LOD) and Natural Language Processing (NLP) techniques
DBpedia
● A crowd-sourced community effort to extract structured information from Wikipedia and a central interlinking hub for the Linking Open Data project.
● It is a suitable knowledge base for text classification (Mendes et al., 2012; Hellmann et al., 2013; Steinmetz et al., 2013).
Why DBpedia for US political speeches?
[Figure: comparison between the coverage of US politics and the coverage of politics of other countries]
The coverage of politics in Wikipedia is “often very good for recent or prominent topics but is lacking on older or more obscure topics” (Brown, 2011).
Text Categorization Approach
● An instance-based approach: TellMeFirst assigns target documents to classes based on a local comparison between a set of pre-classified documents and the target document itself
● The training set consists of all the Wikipedia paragraphs in which a wikilink occurs. These paragraphs are stored in a Lucene index, where each document represents a DBpedia resource
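The training-set construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not TellMeFirst's code: it assumes paragraphs in MediaWiki markup, extracts `[[Target|anchor]]` wikilinks with a regular expression, and concatenates every paragraph that links to a page into one training document keyed by that page's DBpedia resource URI. The function names and toy paragraphs are hypothetical, and a plain dictionary stands in for the Lucene index.

```python
import re
from collections import defaultdict

# Matches [[Target]] or [[Target|anchor]]; namespaced links such as [[File:...]] never match
# because the target may not contain ':'.
WIKILINK = re.compile(r"\[\[([^\]|:#]+)(?:#[^\]|]*)?(?:\|([^\]]*))?\]\]")
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def to_dbpedia_uri(title):
    """Map a Wikipedia page title to a DBpedia resource URI (spaces -> underscores, first letter upper)."""
    title = title.strip().replace(" ", "_")
    return DBPEDIA_PREFIX + title[:1].upper() + title[1:]


def strip_markup(paragraph):
    """Replace each wikilink with its visible text: the anchor if present, else the target."""
    return WIKILINK.sub(lambda m: m.group(2) or m.group(1), paragraph)


def build_training_set(paragraphs):
    """Group each paragraph containing a wikilink under the URI of every page it links to."""
    docs = defaultdict(list)
    for paragraph in paragraphs:
        targets = {to_dbpedia_uri(m.group(1)) for m in WIKILINK.finditer(paragraph)}
        if not targets:
            continue  # paragraphs without wikilinks are not used as training data
        text = strip_markup(paragraph)
        for uri in targets:
            docs[uri].append(text)
    return {uri: " ".join(texts) for uri, texts in docs.items()}


# Hypothetical sample paragraphs.
paragraphs = [
    "The [[Affordable Care Act|health care law]] was signed by [[Barack Obama]].",
    "[[Barack Obama]] was the 44th President of the [[United States]].",
    "This paragraph has no links.",
]
index = build_training_set(paragraphs)
print(sorted(index))
```

Each key then plays the role of one indexed document whose text is the union of the contexts in which Wikipedia editors linked to that resource.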
Success rate (%) of the TellMeFirst classification process on US Presidents' profiles

Input text                                      | 1st topic | Within the first 2 topics | Within the first 7 topics
Full text of the Presidents' profiles           | 95.4%     | 100%                      | 100%
Presidents' profiles without name and surname   | 45.4%     | 61.3%                     | 90.9%

TellMeFirst outputs the seven most relevant topics of a document, as DBpedia URIs sorted by relevance.
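The success rates above count how often the expected DBpedia resource (the President's own URI) appears at rank 1, within the first 2, or within the first 7 of TellMeFirst's ranked topics. A minimal sketch of that metric follows; the function name and the two ranked lists are invented for illustration and are not the authors' evaluation code.

```python
from typing import Sequence


def success_at_k(rankings: Sequence[Sequence[str]], expected: Sequence[str], k: int) -> float:
    """Percentage of documents whose expected URI appears among their first k ranked topics."""
    if len(rankings) != len(expected):
        raise ValueError("one expected URI per ranked list is required")
    hits = sum(1 for ranked, gold in zip(rankings, expected) if gold in ranked[:k])
    return 100.0 * hits / len(rankings)


# Hypothetical output for two profiles (TellMeFirst returns up to seven URIs per document).
R = "http://dbpedia.org/resource/"
rankings = [
    [R + "Abraham_Lincoln", R + "American_Civil_War"],
    [R + "New_Deal", R + "Franklin_D._Roosevelt", R + "Great_Depression"],
]
expected = [R + "Abraham_Lincoln", R + "Franklin_D._Roosevelt"]

for k in (1, 2, 7):
    print(k, success_at_k(rankings, expected, k))
# prints: 1 50.0, then 2 100.0, then 7 100.0
```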
whitehouse.gov
● As of 24 November 2013, 3173 videos in English were available on the White House website
● These videos are categorized according to a taxonomy that is not related to the subject of the speeches
● They need a semantic layer that points out the content of the speeches, so that questions such as “what is the First Lady talking about?” can be answered automatically
Not just a bag-of-words tool
[Figure: results obtained with TellMeFirst (left) and with TagCrowd (right) for «President Obama Speaks on the Affordable Care Act», http://1.usa.gov/1jR4Ky2]
Results (i)
Topic                                            | Occurrences | % overall | % 2013 | % 2012 | % 2011 | % 2010 | % 2009
Barack Obama                                     | 607         | 4.88%     | 5.68%  | 4.52%  | 5.51%  | 4.45%  | 3.88%
Patient Protection and Affordable Care Act       | 286         | 2.30%     | 3.06%  | 1.35%  | 1.91%  | 2.47%  | 2.71%
American Recovery and Reinvestment Act of 2009   | 278         | 2.23%     | 1.09%  | 1.82%  | 2.88%  | 2.84%  | 1.88%
Social Security                                  | 272         | 2.19%     | 2.58%  | 1.77%  | 3.54%  | 1.61%  | 0.78%

Number and percentage of topic occurrences extracted with TellMeFirst
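Each percentage in the table is a topic's occurrences divided by the total number of topic occurrences in the period (overall or per year). The sketch below shows that computation on a handful of invented (year, topic) pairs; the input format and the function name are assumptions, not the authors' pipeline.

```python
from collections import Counter, defaultdict


def topic_shares(occurrences):
    """Return {period: {topic: percentage}} from (year, topic) pairs.

    Each pair is one topic extracted from one speech; 'overall' pools all years.
    """
    per_period = defaultdict(Counter)
    for year, topic in occurrences:
        per_period[str(year)][topic] += 1
        per_period["overall"][topic] += 1
    shares = {}
    for period, counts in per_period.items():
        total = sum(counts.values())
        shares[period] = {topic: 100.0 * n / total for topic, n in counts.items()}
    return shares


# Hypothetical extraction results (invented for illustration).
sample = [
    (2010, "Deepwater Horizon oil spill"), (2010, "Barack Obama"),
    (2010, "Social Security"), (2010, "Barack Obama"),
    (2011, "Libya"), (2011, "Barack Obama"),
]
shares = topic_shares(sample)
print(shares["2010"]["Barack Obama"])        # 50.0
print(round(shares["overall"]["Libya"], 2))  # 16.67
```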
Results (ii)
● “New Deal” (141 occurrences), probably used as a metaphor within the political speeches of President Obama
● “Libya” accounts for 1.00% of the topics in 2011. This result can be related to the full-scale revolt that began in Libya on 17 February 2011
● “Deepwater Horizon oil spill” reaches 1.05% in 2010. This result is related to the marine oil spill in the Gulf of Mexico that began on 20 April 2010
Correlation among topics
A focus on the First Lady (i)
● According to Michelle Obama’s page on the White House website, the First Lady “looks forward to continuing her work on the issues close to her heart”:
  ● supporting military families
  ● helping working women balance career and family
  ● encouraging national service
  ● promoting the arts and arts education
  ● fostering healthy eating and healthy living for children and families across the country
A focus on the First Lady (ii)
● We tested whether TellMeFirst confirms these claims by manually selecting nine Wikipedia categories that seemed related to these issues
● We then queried the DBpedia SPARQL endpoint to collect all the topics belonging to these categories
● Finally, we associated each topic with one or more of the nine high-level categories: together, these categories covered almost 75% of the topics extracted from the First Lady's speeches
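A query of the kind described above could look like the sketch below. It is an illustration under assumptions, not the query the authors ran: it targets the public DBpedia endpoint (https://dbpedia.org/sparql), uses the `dct:subject` property with which DBpedia links resources to Wikipedia categories, follows no subcategories, and the category names passed in are placeholders. It only builds the SPARQL-protocol GET URL and does not send it.

```python
from urllib.parse import urlencode

ENDPOINT = "https://dbpedia.org/sparql"


def category_topics_query(categories):
    """SPARQL query for every resource filed directly under the given Wikipedia categories."""
    values = " ".join(
        f"<http://dbpedia.org/resource/Category:{c.replace(' ', '_')}>" for c in categories
    )
    return (
        "PREFIX dct: <http://purl.org/dc/terms/>\n"
        "SELECT DISTINCT ?topic ?category WHERE {\n"
        f"  VALUES ?category {{ {values} }}\n"
        "  ?topic dct:subject ?category .\n"
        "}"
    )


def request_url(query):
    """GET request URL following the SPARQL protocol's 'query' parameter."""
    return ENDPOINT + "?" + urlencode({"query": query})


# Placeholder category names (hypothetical selection).
q = category_topics_query(["Nutrition", "Gender equality"])
print(q)
print(request_url(q))
```

Each returned (topic, category) row can then be mapped onto the high-level category it was selected for.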
A focus on the First Lady (iii)
Wikipedia category                | First Lady speeches (9 categories) | All speeches (9 categories)
Government of the United States   | 26.68%                             | 32.68%
Education                         | 21.64%                             | 5.40%
Nutrition                         | 19.96%                             | 1.61%
Social issues                     | 14.71%                             | 28.38%
Barack Obama                      | 13.66%                             | 14.00%
Health care                       | 11.34%                             | 7.57%
Arts                              | 8.61%                              | 1.11%
Military personnel                | 3.99%                              | 3.16%
Gender equality                   | 2.73%                              | 0.84%
Others (unclassified topics)      | 25.63%                             | 38.34%

A topic can belong to more than one category, so each column sums to more than 100%.
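The percentages in the table can be read as the share of extracted topic occurrences falling in each category, where a topic mapped to several categories counts once for each and a topic mapped to none falls into "Others". The sketch below assumes a precomputed topic-to-categories mapping, such as the SPARQL results could yield; the mapping, the topics and the function name are invented.

```python
from collections import Counter


def category_shares(topic_occurrences, topic_categories):
    """Percentage of topic occurrences per high-level category, plus an 'Others' bucket.

    topic_occurrences: one entry per extracted topic (repeats allowed).
    topic_categories: dict topic -> set of high-level categories (missing means unclassified).
    """
    if not topic_occurrences:
        return {}
    total = len(topic_occurrences)
    counts = Counter()
    for topic in topic_occurrences:
        cats = topic_categories.get(topic, set())
        if not cats:
            counts["Others"] += 1
        for cat in cats:
            counts[cat] += 1  # a topic in two categories counts once for each
    return {cat: 100.0 * n / total for cat, n in counts.items()}


# Hypothetical mapping and extracted topics.
mapping = {
    "School meal": {"Education", "Nutrition"},
    "Obesity": {"Nutrition", "Health care"},
    "Military family": {"Military personnel"},
}
speech_topics = ["School meal", "Obesity", "Obesity", "Military family", "Michelle Obama"]
cat_shares = category_shares(speech_topics, mapping)
print(cat_shares["Nutrition"], cat_shares["Others"])  # 60.0 20.0
```

Because of the overlaps, the category shares in this toy example add up to 160%, mirroring why the table's columns exceed 100%.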
Conclusions (i)
● The ability of citizens to easily retrieve the content of political speeches and decisions is a crucial factor in e-participation
  ● This is not guaranteed by traditional keyword search, as used on most public administration websites (the White House website included)
● Example: in a keyword-based system, typing the word "education" returns only the videos that have the word "education" in their title
  ● Videos described by other terms that belong to the semantic area of education are missed
Conclusions (ii)
● When documents are semantically classified through DBpedia URIs, all synonyms, hypernyms and hyponyms of a lemma are traced back to the same concept, making user search more effective
  ● Leveraging Wikipedia categories would allow us to go a step further, taking advantage of the links between concepts as designed by the Wikipedia community
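The difference between title keyword search and concept-based search can be illustrated with a toy example. Everything here is hypothetical (video titles, topic annotations, the category map): videos are annotated with DBpedia URIs, a query is resolved to a concept rather than a string, and an optional category map widens the match to related concepts.

```python
R = "http://dbpedia.org/resource/"

# Hypothetical videos annotated with topic URIs.
videos = {
    "v1": {"title": "Remarks on Education Reform", "topics": {R + "Education"}},
    "v2": {"title": "The President Talks with Teachers", "topics": {R + "Education"}},
    "v3": {"title": "Let's Move! Healthier School Lunches", "topics": {R + "School_meal"}},
}

# Hypothetical category map: concept -> related concepts sharing a Wikipedia category.
CATEGORY_MEMBERS = {R + "Education": {R + "School_meal"}}


def keyword_search(term):
    """Traditional search: the literal word must appear in the title."""
    return sorted(v for v, meta in videos.items() if term.lower() in meta["title"].lower())


def concept_search(concept_uri, expand=False):
    """Semantic search: match videos annotated with the concept (optionally its category siblings)."""
    wanted = {concept_uri}
    if expand:
        wanted |= CATEGORY_MEMBERS.get(concept_uri, set())
    return sorted(v for v, meta in videos.items() if meta["topics"] & wanted)


print(keyword_search("education"))                   # ['v1']
print(concept_search(R + "Education"))               # ['v1', 'v2']
print(concept_search(R + "Education", expand=True))  # ['v1', 'v2', 'v3']
```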
Next steps
● Building a content search/navigation layer around the scraping/classification module
● Integration with other Linked Open Data repositories on the Web, combining the extracted topics with other information (President Obama's federal budget proposal?)
Thank you!
Giuseppe Futia (giuseppe.futia@polito.it)
This paper was drafted in the context of the Network of Excellence in Internet Science EINS (GA n°288021) and, in particular, in relation to the activities concerning Evidence and Experimentation (JRA3).
Appendix - The algorithm
● The classifier needs to hold in memory all the instances of the training set and to calculate, during the classification stage, the vector distance between the training documents and the target document.
● Specifically, the algorithm used by TMF is k-Nearest Neighbor (kNN), a memory-based approach that selects the categories for a target document on the basis of the k most similar documents within the vector space.
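The kNN scheme described above can be sketched in a few lines. This is a simplified stand-in, not TellMeFirst's implementation: it uses plain TF-IDF weights (raw term frequency times log(N/df)) and cosine similarity instead of Lucene's scoring, a naive regex tokenizer, and labels each training document with a DBpedia URI. The class name, `k`, `top_n` and the toy training set are assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class KNNClassifier:
    """Memory-based classifier: keeps every training vector and compares each one with the target."""

    def __init__(self, training):  # training: list of (text, label_uri)
        self.labels = [label for _, label in training]
        token_lists = [tokenize(text) for text, _ in training]
        df = Counter(term for tokens in token_lists for term in set(tokens))
        n = len(token_lists)
        self.idf = {t: math.log(n / d) for t, d in df.items()}
        self.vectors = [self._vector(tokens) for tokens in token_lists]

    def _vector(self, tokens):
        # TF-IDF weights; terms unseen in training (or present in every document) are dropped.
        tf = Counter(tokens)
        return {t: f * self.idf[t] for t, f in tf.items() if self.idf.get(t, 0.0) > 0}

    @staticmethod
    def _cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
        return dot / norm if norm else 0.0

    def classify(self, text, k=3, top_n=2):
        target = self._vector(tokenize(text))
        scored = [(self._cosine(target, v), label) for v, label in zip(self.vectors, self.labels)]
        votes = defaultdict(float)
        for sim, label in sorted(scored, reverse=True)[:k]:  # the k nearest neighbours
            if sim > 0:
                votes[label] += sim  # similarity-weighted vote per label
        return sorted(votes, key=votes.get, reverse=True)[:top_n]


R = "http://dbpedia.org/resource/"
training = [  # hypothetical pre-classified snippets
    ("health insurance coverage for families and patients", R + "Patient_Protection_and_Affordable_Care_Act"),
    ("insurance premiums and patients with preexisting conditions", R + "Patient_Protection_and_Affordable_Care_Act"),
    ("oil spill in the gulf cleanup of the coast", R + "Deepwater_Horizon_oil_spill"),
    ("retirement benefits paid to seniors", R + "Social_Security_(United_States)"),
]
clf = KNNClassifier(training)
print(clf.classify("new protections for patients and lower insurance premiums", top_n=1))
```

Summing similarities per label lets several moderately similar neighbours outweigh a single close one, which is one common way to turn the k neighbours into a ranked list of categories.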
Appendix - Scoring formula
● In a Lucene query, both the target document and the training set become weighted term vectors, where terms are weighted by means of the TF-IDF scheme. The query returns a list of documents in the form of DBpedia URIs, ordered by similarity score. The scoring formula is:
[Scoring formula: image not recoverable]
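For reference, Lucene's documented classic TF-IDF scoring function (the practical scoring function of `TFIDFSimilarity`, with the `DefaultSimilarity` definitions of Lucene 3.x and 4.x) is reproduced below. We assume, but cannot confirm from the extracted text, that this is the formula the slide showed.

```latex
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot\sum_{t \in q}\Big(\mathrm{tf}(t \in d)\cdot\mathrm{idf}(t)^{2}\cdot\mathrm{boost}(t)\cdot\mathrm{norm}(t,d)\Big)

\mathrm{tf}(t \in d) = \sqrt{\mathrm{freq}(t,d)}, \qquad
\mathrm{idf}(t) = 1 + \ln\frac{\mathrm{numDocs}}{\mathrm{docFreq}(t) + 1}
```

Here coord(q,d) is the fraction of query terms found in d, queryNorm(q) normalizes by the query's summed squared weights (so it does not change the ranking for a given query), and norm(t,d) carries the field-length normalization (1/√(number of terms in the field) by default) together with any index-time boosts.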
Appendix - Basic concepts
● Natural Language Processing - A field of computer science concerned with the interactions between computers and human (natural) languages.
● Linked Data - A recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.
● DBpedia - A crowd-sourced community effort to extract structured information from Wikipedia and a central interlinking hub for the Linking Open Data project.
Hogan Comes Home: an MIA WWII crewman is returned
 
Letter-from-ECI-to-MeiTY-21st-march-2024.pdf
Letter-from-ECI-to-MeiTY-21st-march-2024.pdfLetter-from-ECI-to-MeiTY-21st-march-2024.pdf
Letter-from-ECI-to-MeiTY-21st-march-2024.pdf
 
EED - The Container Port PERFORMANCE INDEX 2023
EED - The Container Port PERFORMANCE INDEX 2023EED - The Container Port PERFORMANCE INDEX 2023
EED - The Container Port PERFORMANCE INDEX 2023
 
Hindustan Insider 2nd edition release now
Hindustan Insider 2nd edition release nowHindustan Insider 2nd edition release now
Hindustan Insider 2nd edition release now
 
Codes n Conventionss copy (1).paaaaaaptx
Codes n Conventionss copy (1).paaaaaaptxCodes n Conventionss copy (1).paaaaaaptx
Codes n Conventionss copy (1).paaaaaaptx
 
01062024_First India Newspaper Jaipur.pdf
01062024_First India Newspaper Jaipur.pdf01062024_First India Newspaper Jaipur.pdf
01062024_First India Newspaper Jaipur.pdf
 
Resolutions-Key-Interventions-28-May-2024.pdf
Resolutions-Key-Interventions-28-May-2024.pdfResolutions-Key-Interventions-28-May-2024.pdf
Resolutions-Key-Interventions-28-May-2024.pdf
 
What Ukraine Has Lost During Russia’s Invasion
What Ukraine Has Lost During Russia’s InvasionWhat Ukraine Has Lost During Russia’s Invasion
What Ukraine Has Lost During Russia’s Invasion
 
2015pmkemenhub163.pdf 2015pmkemenhub163.pdf
2015pmkemenhub163.pdf 2015pmkemenhub163.pdf2015pmkemenhub163.pdf 2015pmkemenhub163.pdf
2015pmkemenhub163.pdf 2015pmkemenhub163.pdf
 
Gabriel Whitley's Motion Summary Judgment
Gabriel Whitley's Motion Summary JudgmentGabriel Whitley's Motion Summary Judgment
Gabriel Whitley's Motion Summary Judgment
 
Preview of Court Document for Iseyin community
Preview of Court Document for Iseyin communityPreview of Court Document for Iseyin community
Preview of Court Document for Iseyin community
 

Cedem futia-2014

  • 1. Exploiting Linked Open Data and Natural Language Processing for Classification of Political Speech
    G. Futia, F. Cairo, F. Morando, L. Leschiutta. Krems, 22nd May 2014
  • 2. Introduction [22nd May 2014 Giuseppe Futia – Politecnico di Torino]
    ● Our goal: to help anyone interested in the automatic categorization of political speeches to identify unambiguously the main political trends addressed by the White House.
    ● What we use to achieve our goal: TellMeFirst (http://tellmefirst.polito.it/), a topic extraction tool that
      – leverages the DBpedia knowledge base and the English Wikipedia linguistic corpus;
      – exploits Linked Open Data (LOD) and Natural Language Processing (NLP) techniques.
  • 3. DBpedia
    ● A crowd-sourced community effort to extract structured information from Wikipedia and a central interlinking hub for the Linking Open Data project.
    ● A suitable knowledge base for text classification (Mendes et al., 2012; Hellmann et al., 2013; Steinmetz et al., 2013).
  • 4. Why DBpedia for US political speeches?
    (Figure: comparison between the coverage of US politics and the coverage of politics of other countries.)
    The coverage of politics in Wikipedia is “often very good for recent or prominent topics but is lacking on older or more obscure topics” (Brown, 2011).
  • 5. Text Categorization Approach
    ● An instance-based approach: TellMeFirst assigns target documents to classes based on a local comparison between a set of pre-classified documents and the target document itself.
    ● The training set consists of all the Wikipedia paragraphs in which a wikilink occurs. These paragraphs are stored in a Lucene index, where each document represents a DBpedia resource.
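A minimal sketch of how such a training set could be grouped, assuming each paragraph arrives together with the titles of the Wikipedia pages it links to. The function name `build_resource_index`, the input format and the toy sentences are hypothetical, and a plain dictionary stands in for the Lucene index. The only convention relied on is that DBpedia resource URIs are formed, roughly, from the Wikipedia title under http://dbpedia.org/resource/ with spaces replaced by underscores.

```python
from collections import defaultdict

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def build_resource_index(paragraphs):
    """Group Wikipedia paragraphs by the DBpedia resource each wikilink targets.

    `paragraphs` is a list of (text, linked_titles) pairs (hypothetical format).
    Returns {dbpedia_uri: training text}: one "document" per resource, made of
    every paragraph that links to it.
    """
    grouped = defaultdict(list)
    for text, linked_titles in paragraphs:
        # dict.fromkeys de-duplicates titles while keeping their order.
        for title in dict.fromkeys(linked_titles):
            uri = DBPEDIA_RESOURCE + title.replace(" ", "_")
            grouped[uri].append(text)
    return {uri: "\n".join(texts) for uri, texts in grouped.items()}


paragraphs = [  # hypothetical paragraphs and their wikilink targets
    ("Obama signed the health care reform into law.",
     ["Barack Obama", "Patient Protection and Affordable Care Act"]),
    ("Obama spoke about the future of Social Security.",
     ["Barack Obama", "Social Security (United States)"]),
]
index = build_resource_index(paragraphs)
for uri in index:
    print(uri)
```

Because a paragraph is copied into every resource it links to, popular resources accumulate large training documents; a real Lucene setup would store this text in an indexed field instead of a dictionary value.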
  • 6. Success rate (%) of the TellMeFirst classification process on US Presidents' profiles
    ● Full text of the Presidents' profiles: 95.4% (1st topic), 100% (within the first 2 topics), 100% (within the first 7 topics)
    ● Presidents' profiles without name and surname: 45.4% (1st topic), 61.3% (within the first 2 topics), 90.9% (within the first 7 topics)
    ● TellMeFirst outputs the seven most relevant topics of a document (as DBpedia URIs), sorted by relevance.
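A "within the first k topics" success rate can be computed as sketched below. This assumes that each profile has exactly one topic counted as correct (for example the President's own DBpedia resource), which the slide does not state; the function name and the toy rankings are hypothetical.

```python
def success_rate_at_k(ranked_topics, expected_topic, k):
    """Percentage of documents whose expected topic is among the top-k topics.

    ranked_topics: {doc_id: [topic URIs sorted by relevance]}
    expected_topic: {doc_id: the topic URI counted as a correct answer}
    """
    hits = sum(1 for doc_id, topics in ranked_topics.items()
               if expected_topic[doc_id] in topics[:k])
    return 100.0 * hits / len(ranked_topics)


# Hypothetical outputs for four profiles (dbr: = http://dbpedia.org/resource/).
ranked = {
    "p1": ["dbr:Abraham_Lincoln", "dbr:American_Civil_War"],
    "p2": ["dbr:New_Deal", "dbr:Franklin_D._Roosevelt"],
    "p3": ["dbr:Watergate_scandal", "dbr:Vietnam_War", "dbr:China",
           "dbr:Gerald_Ford", "dbr:Richard_Nixon"],
    "p4": ["dbr:Panama_Canal", "dbr:Rough_Riders"],
}
expected = {
    "p1": "dbr:Abraham_Lincoln",
    "p2": "dbr:Franklin_D._Roosevelt",
    "p3": "dbr:Richard_Nixon",
    "p4": "dbr:Theodore_Roosevelt",
}
for k in (1, 2, 7):
    print(k, success_rate_at_k(ranked, expected, k))  # 25.0, 50.0, 75.0
```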
  • 7. whitehouse.gov
    ● 3173 videos in English were available on the White House website on 24 November 2013.
    ● These videos are categorized according to a taxonomy that is not related to the subject of the speeches.
    ● They need a semantic layer that points out the content of the speeches, so that questions such as “What is the First Lady talking about?” can be answered automatically.
  • 8. Not just a bag-of-words tool
    (Figure: results obtained with TellMeFirst (left) and with TagCrowd (right) for «President Obama Speaks on the Affordable Care Act», http://1.usa.gov/1jR4Ky2)
  • 9. Results (i): amount and percentage of topic occurrences extracted with TellMeFirst
    ● Barack Obama: 607 occurrences; 4.88% overall; 5.68% (2013), 4.52% (2012), 5.51% (2011), 4.45% (2010), 3.88% (2009)
    ● Patient Protection and Affordable Care Act: 286 occurrences; 2.30% overall; 3.06% (2013), 1.35% (2012), 1.91% (2011), 2.47% (2010), 2.71% (2009)
    ● American Recovery and Reinvestment Act of 2009: 278 occurrences; 2.23% overall; 1.09% (2013), 1.82% (2012), 2.88% (2011), 2.84% (2010), 1.88% (2009)
    ● Social Security: 272 occurrences; 2.19% overall; 2.58% (2013), 1.77% (2012), 3.54% (2011), 1.61% (2010), 0.78% (2009)
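Percentages like those in slide 9 can be reproduced under one assumption the slide does not state: that each percentage is a topic's occurrences divided by all topic occurrences extracted in the same period. The sketch below takes hypothetical input (one list of extracted topics per speech, tagged with its year) and computes overall and per-year shares; the function name is illustrative.

```python
from collections import Counter, defaultdict


def topic_shares(extractions):
    """Count topic occurrences and convert them to percentages.

    extractions: iterable of (year, [topic URIs]) pairs, one per speech.
    Returns (overall_counts, shares), where shares["overall"] and shares[year]
    map each topic to its percentage of all topic occurrences in that period
    (assumed definition).
    """
    overall = Counter()
    by_year = defaultdict(Counter)
    for year, topics in extractions:
        overall.update(topics)
        by_year[year].update(topics)

    def to_percentages(counter):
        total = sum(counter.values())
        return {topic: 100.0 * n / total for topic, n in counter.items()}

    shares = {"overall": to_percentages(overall)}
    for year, counter in sorted(by_year.items()):
        shares[year] = to_percentages(counter)
    return overall, shares


speeches = [  # hypothetical TellMeFirst outputs
    (2010, ["dbr:Barack_Obama", "dbr:Deepwater_Horizon_oil_spill"]),
    (2010, ["dbr:Barack_Obama", "dbr:Social_Security_(United_States)"]),
    (2011, ["dbr:Barack_Obama", "dbr:Libya"]),
]
counts, shares = topic_shares(speeches)
print(counts["dbr:Barack_Obama"])  # 3
print(shares[2011]["dbr:Libya"])  # 50.0
```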
  • 10. Results (ii)
    ● “New Deal” (141 occurrences), probably used as a metaphor in President Obama's political speeches.
    ● “Libya” reaches 1.00% in 2011. This result can be related to the full-scale revolt that began in Libya on 17 February 2011.
    ● “Deepwater Horizon oil spill” reaches 1.05% in 2010. This result is related to the marine oil spill in the Gulf of Mexico that began on 20 April 2010.
  • 11. Correlation among topics
  • 12. A focus on the First Lady (i)
    ● According to Michelle Obama's page on the White House website, the First Lady “looks forward to continuing her work on the issues close to her heart”:
      – supporting military families
      – helping working women balance career and family
      – encouraging national service
      – promoting the arts and arts education
      – fostering healthy eating and healthy living for children and families across the country
  • 13. A focus on the First Lady (ii)
    ● We tested whether TellMeFirst confirms these impressions and claims by manually selecting nine Wikipedia categories that seemed to be related to these issues.
    ● We then queried the DBpedia SPARQL endpoint to collect all the topics belonging to these categories.
    ● Finally, we associated each topic with one or more of the nine high-level categories; together, these categories encompassed almost 75% of the topics.
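A sketch of the kind of SPARQL query slide 13 describes, written against DBpedia's usual vocabulary: resources are linked to their Wikipedia categories with dct:subject, and categories to their parent categories with skos:broader. The authors' actual query is not given. The category names below are taken from the table on slide 14 and are assumed to correspond to DBpedia "Category:" resources; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

CATEGORY_NS = "http://dbpedia.org/resource/Category:"


def build_category_query(category_names, include_subcategories=False):
    """Build a query returning (topic, category) pairs for the given categories."""
    values = " ".join(
        f"<{CATEGORY_NS}{name.replace(' ', '_')}>" for name in category_names)
    if include_subcategories:
        # skos:broader* also accepts the category itself (zero-length path).
        membership = "?topic dct:subject ?sub .\n  ?sub skos:broader* ?category ."
    else:
        membership = "?topic dct:subject ?category ."
    return (
        "PREFIX dct: <http://purl.org/dc/terms/>\n"
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT DISTINCT ?topic ?category WHERE {\n"
        f"  VALUES ?category {{ {values} }}\n"
        f"  {membership}\n"
        "}"
    )


def run_query(query, endpoint="https://dbpedia.org/sparql"):
    """Send the query via HTTP GET (SPARQL protocol); needs network access."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(request, timeout=60) as response:
        results = json.load(response)
    return [(b["topic"]["value"], b["category"]["value"])
            for b in results["results"]["bindings"]]


query = build_category_query(["Nutrition", "Gender equality", "Military personnel"])
print(query)
```

Calling `run_query(query)` would contact the public endpoint. Following `skos:broader*` through subcategories can return very large result sets on DBpedia, so direct membership is the default here.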
  • 14. A focus on the First Lady (iii): percentage of topics per Wikipedia category, First Lady's speeches vs. all speeches
    ● Government of the United States: 26.68% vs. 32.68%
    ● Education: 21.64% vs. 5.40%
    ● Nutrition: 19.96% vs. 1.61%
    ● Social issues: 14.71% vs. 28.38%
    ● Barack Obama: 13.66% vs. 14.00%
    ● Health care: 11.34% vs. 7.57%
    ● Arts: 8.61% vs. 1.11%
    ● Military personnel: 3.99% vs. 3.16%
    ● Gender equality: 2.73% vs. 0.84%
    ● Others (unclassified topics): 25.63% vs. 38.34%
    Since a topic can be associated with more than one category, the percentages in each column do not sum to 100%.
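Percentages that overlap, as in slide 14, arise naturally when each topic can belong to several categories. The sketch below assumes that shares are weighted by topic occurrences (the slides do not say whether occurrences or distinct topics are counted); the function name and the toy mapping are hypothetical.

```python
from collections import Counter

UNCLASSIFIED = "Others (unclassified topics)"


def category_shares(topic_occurrences, topic_categories):
    """Percentage of topic occurrences falling into each high-level category.

    topic_occurrences: Counter {topic URI: occurrences in a set of speeches}
    topic_categories: {topic URI: set of high-level categories}; topics missing
    from this mapping are counted as unclassified.
    """
    total = sum(topic_occurrences.values())
    per_category = Counter()
    for topic, count in topic_occurrences.items():
        # A topic contributes its full count to every category it belongs to.
        for category in topic_categories.get(topic) or {UNCLASSIFIED}:
            per_category[category] += count
    return {category: 100.0 * n / total for category, n in per_category.items()}


occurrences = Counter({"dbr:School_meal": 2, "dbr:Teacher": 1, "dbr:Chicago": 1})
mapping = {  # hypothetical topic-to-category associations
    "dbr:School_meal": {"Education", "Nutrition"},
    "dbr:Teacher": {"Education"},
}
shares = category_shares(occurrences, mapping)
print(shares["Education"], shares["Nutrition"], shares[UNCLASSIFIED])  # 75.0 50.0 25.0
```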
  • 15. Conclusions (i)
    ● The ability of citizens to easily retrieve the content of political speeches and decisions is a crucial factor in e-participation.
    ● This ability is not guaranteed by traditional keyword search, which is what most public administration websites (the White House website included) offer.
    ● Example: in a keyword-based system, users who type the word "education" only get videos that have the word "education" in their title.
    ● Content expressed through other terms belonging to the semantic area of education is missed.
  • 16. Conclusions (ii)
    ● When documents are semantically classified through DBpedia URIs, the synonyms, hypernyms and hyponyms of a lemma are all traced back to the same concept, making user search more effective.
    ● Leveraging Wikipedia categories would make it possible to go a step further, taking advantage of the links between concepts as designed by the Wikipedia community.
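The difference between keyword search and concept search described in slides 15 and 16 can be illustrated with a toy example. The video titles, extracted topics and the `label_to_uri` lexicon are hypothetical; in practice such a lexicon might be derived from DBpedia labels and redirects, which is an assumption. The sketch only covers synonyms mapping to one concept, not hypernyms or category navigation.

```python
DBR = "http://dbpedia.org/resource/"

videos = [  # hypothetical titles and extracted topics
    {"title": "Remarks on Education Reform",
     "topics": {DBR + "Education", DBR + "Barack_Obama"}},
    {"title": "The First Lady Visits a High School",
     "topics": {DBR + "Education", DBR + "Michelle_Obama"}},
    {"title": "Remarks on the Recovery Act",
     "topics": {DBR + "American_Recovery_and_Reinvestment_Act_of_2009"}},
]

# Hypothetical surface-form lexicon: different words, same DBpedia concept.
label_to_uri = {
    "education": DBR + "Education",
    "schooling": DBR + "Education",
}


def keyword_search(videos, word):
    """Match the query word against the words of each title only."""
    return [v["title"] for v in videos
            if word.lower() in v["title"].lower().split()]


def concept_search(videos, word, lexicon):
    """Map the query word to a concept URI, then match the extracted topics."""
    uri = lexicon.get(word.lower())
    return [v["title"] for v in videos
            if uri is not None and uri in v["topics"]]


print(keyword_search(videos, "education"))
print(concept_search(videos, "schooling", label_to_uri))
```

Keyword search finds only the first video, while both "education" and "schooling" resolve to the same URI and retrieve the two education-related videos.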
  • 17. Next steps
    ● Building a content search and navigation layer around the scraping/classification module.
    ● Integrating other Linked Open Data repositories on the Web, to combine the extracted topics with other information (President Obama's federal budget proposal?).
  • 18. Thank you! Giuseppe Futia (giuseppe.futia@polito.it)
    This paper was drafted in the context of the Network of Excellence in Internet Science EINS (GA n°288021), in particular in relation to the activities concerning Evidence and Experimentation (JRA3).
  • 19. Appendix - The algorithm
    ● The classifier needs to hold all the instances of the training set in memory and to calculate, during the classification stage, the vector distance between the training documents and the target document.
    ● Specifically, the algorithm used by TellMeFirst is k-Nearest Neighbor (kNN), a memory-based approach that selects the categories for a target document on the basis of the k most similar documents in the vector space.
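A minimal kNN sketch over TF-IDF vectors, shown only to make the idea concrete. It swaps in plain cosine similarity with idf = ln(N/df) instead of Lucene's scoring, and uses similarity-weighted voting over the labels of the k nearest training documents. All function names, the toy corpus and the labels are hypothetical. In TellMeFirst, as slide 5 describes, each indexed document already stands for a DBpedia resource, so the neighbors themselves are the candidate topics.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def tfidf_fit(train_docs):
    """Return TF-IDF vectors for the training set plus document frequencies."""
    tokenized = [tokenize(doc) for doc in train_docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(tokenized)
    vectors = [{t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
               for tokens in tokenized]
    return vectors, df, n


def tfidf_transform(text, df, n):
    # Terms never seen in training have no document frequency and are dropped.
    return {t: c * math.log(n / df[t])
            for t, c in Counter(tokenize(text)).items() if t in df}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(train_docs, train_labels, target, k=3):
    """Rank labels by the summed similarity of the k nearest training documents."""
    vectors, df, n = tfidf_fit(train_docs)
    query = tfidf_transform(target, df, n)
    neighbors = sorted(((cosine(query, v), i) for i, v in enumerate(vectors)),
                       reverse=True)[:k]
    votes = Counter()
    for similarity, i in neighbors:
        if similarity > 0:
            for label in train_labels[i]:
                votes[label] += similarity
    return [label for label, _ in votes.most_common()]


train_docs = ["health insurance care act", "health care reform insurance",
              "oil spill gulf", "recovery act jobs stimulus"]
train_labels = [["Health care"], ["Health care"], ["Environment"], ["Economy"]]
print(knn_classify(train_docs, train_labels, "insurance and health care", k=2))
```

The whole training set is re-vectorized at classification time, which mirrors the memory-based nature of kNN; an index such as Lucene avoids recomputing this for every query.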
  • 20. Appendix - Scoring formula
    ● In a Lucene query, both the target document and the training set become weighted term vectors, where terms are weighted by means of the TF-IDF algorithm. The query returns a list of documents in the form of DBpedia URIs, ordered by similarity score. The scoring formula is:
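The formula itself is not preserved in this transcript. As a point of reference only, the sketch below implements Lucene's classic TF-IDF practical scoring function (DefaultSimilarity, later renamed ClassicSimilarity): score(q,d) = coord(q,d) · queryNorm(q) · Σ over t in q of tf(t in d) · idf(t)² · boost(t) · norm(t,d), with tf = √freq, idf = 1 + ln(numDocs / (docFreq + 1)) and norm = 1 / √(number of terms in the field). Whether TellMeFirst used exactly this similarity is an assumption. The sketch also sets all boosts to 1, treats the query as a set of terms, and skips Lucene's lossy one-byte encoding of norms; the function names and the toy index are hypothetical.

```python
import math
from collections import Counter


def classic_idf(doc_freq, num_docs):
    # Lucene classic similarity: idf = 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def classic_scores(query_terms, docs):
    """Score each tokenized document against a set of query terms.

    score(q,d) = coord(q,d) * queryNorm(q) * sum_t sqrt(tf) * idf(t)^2 * norm(d)
    (all boosts = 1, no lossy norm encoding: a simplification).
    """
    num_docs = len(docs)
    doc_freq = Counter(t for tokens in docs for t in set(tokens))
    terms = set(query_terms)
    idf = {t: classic_idf(doc_freq[t], num_docs) for t in terms}
    sum_sq = sum(w * w for w in idf.values())  # sum of squared query weights
    query_norm = 1.0 / math.sqrt(sum_sq) if sum_sq else 0.0
    scores = []
    for tokens in docs:
        tf = Counter(tokens)
        norm = 1.0 / math.sqrt(len(tokens)) if tokens else 0.0
        matched = [t for t in terms if tf[t] > 0]
        coord = len(matched) / len(terms) if terms else 0.0
        raw = sum(math.sqrt(tf[t]) * idf[t] ** 2 * norm for t in matched)
        scores.append(coord * query_norm * raw)
    return scores


def rank_resources(query_terms, index, top_n=7):
    """Return up to top_n resource URIs with a positive score, best first."""
    uris = list(index)
    scores = classic_scores(query_terms, [index[u] for u in uris])
    ranked = sorted(zip(scores, uris), key=lambda pair: pair[0], reverse=True)
    return [uri for score, uri in ranked[:top_n] if score > 0]


index = {  # hypothetical resource "documents", already tokenized
    "dbr:Health_care": ["health", "care", "insurance"],
    "dbr:Deepwater_Horizon_oil_spill": ["oil", "spill"],
    "dbr:Economy": ["health", "health", "jobs", "economy"],
}
print(rank_resources(["health", "care"], index))
```

The default of seven results mirrors the seven topics TellMeFirst returns, per slide 6.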
  • 21. Appendix - Basic concepts
    ● Natural Language Processing: a field of computer science concerned with the interactions between computers and human (natural) languages.
    ● Linked Data: a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.
    ● DBpedia: a crowd-sourced community effort to extract structured information from Wikipedia and a central interlinking hub for the Linking Open Data project.