Pablo Mendes' Defense: Adaptive Semantic Annotation of Entity and Concept Mentions in Text
PhD defense held at Kno.e.sis Center, Wright State University, December 03, 2013.


  1. Adaptive Semantic Annotation of Entity and Concept Mentions in Text
     Pablo N. Mendes
     PhD dissertation defense
     Ohio Center of Excellence in Knowledge-enabled Computing (kno.e.sis)
     Wright State University, Dayton, OH
  2. Introductions and Thank you!
  3. Outline
     ● Introduction, Motivation, Background
       – KB Tagging, Annotation as a Service
     ● Conceptual Model
     ● Knowledge Base: DBpedia
     ● System: DBpedia Spotlight
     ● Core Evaluations
     ● Case Studies
       – tweets, audio transcripts, educational material
  4. Outline
     ● Introduction, Motivation, Background
       – KBT: Knowledge Base Tagging of Text
       – AaaS: Annotation as a Service
       – Adaptability
     ● Conceptual Model
     ● Knowledge Base: DBpedia
     ● System: DBpedia Spotlight
     ● Core Evaluations
     ● Case Studies
  5. KBT, informally
     ● Knowledge Base Tagging (KBT)
     ● A developer needs to
       – "extract entities",
       – "identify what is mentioned",
       – "connect to knowledge bases".
     ● He/she is not an NLP or IE expert
     ● Would like to reuse as much as possible
     ● May have limited computational resources → Annotation as a Service (AaaS)
  6. Related tasks, by example — all over the same news paragraph:
     "On Thursday, April 11, 1996, a fire in an occupied passenger terminal at the airport in Düsseldorf, Germany, killed 17 people and injured 62. The fire began at approximately 3:31 p.m., about the time someone reported seeing sparks falling from the ceiling in the vicinity of a flower shop at the east end of the arrivals hall on the first floor."
     ● Named Entity Recognition (NER): LOCATION, DATE, TIME labels (e.g. Düsseldorf, Germany → LOCATION; April 11, 1996 → DATE; 3:31 p.m. → TIME)
     ● Keyphrase Extraction (KE): fire, airport, Düsseldorf, Germany
     ● Automatic Term Recognition (ATR): fire, passenger terminal, sparks, arrivals hall, ceiling
     ● Wikification (WKF): mentions linked to Wikipedia pages
     ● Entity Linking (EL): Düsseldorf → LOCATION, ID:4213421
  7. Related Work
     (Figure: related systems and tasks — SCORE, Voquette, Semagix Freedom, SemTag, AIDA / YAGO, Illinois Wikifier, TagMe; NER, ATR, KE, Wikification — positioned along dimensions such as syntactic vs. semantic, domain-specific vs. cross-domain, Web content, auto-extracted facts, community-generated, multilingual; "My work" is placed among them)
  8. Related Work (commercial)
  9. Adaptability
     ● Each developer may have a different application in mind
       – different input: news, scientific literature, tweets, audio transcripts, query keywords
       – different output: new terms, important phrases, named entities, concepts related to an objective
       – "get key topics for summarization?", "exhaustive tagging for semantic search?"
     ● There is no one-size-fits-all. But can we support adaptation to different "fits"?
 10. Requirements
     ● Transparent process
       – Clear understanding of where things are working or failing
     ● Adaptable process
       – Ability to exchange individual components in order to achieve different goals
       – Ability to modify the behavior of existing components
     ● Adaptable to different inputs
 11. Outline
     ● Introduction, Motivation, Background
     ● Conceptual Model
     ● Knowledge Base: DBpedia
     ● System: DBpedia Spotlight
     ● Core Evaluations
     ● Case Studies
     ● Conclusion
 12. A Conceptual Model of KBT
     (Figure: the example news paragraph flows from a User (Creator) through the System — Phrase Recognition, Candidate Selection, Disambiguation, Tagging — backed by the KB and an Editor, producing Annotations for a User (Consumer) with an Objective; a feedback arrow runs from the consumer back into the system)
 13. KBT and Related Tasks
     (Example: the news paragraph annotated with LOCATION, DATE, Spark_(fire) and a confidence score of 0.87)
     (Table, structure not fully recoverable: the extraction tasks ATR, KE, NER, WSD, WKF, EL and KBT are compared on their outcomes — recognize known terms, recognize new terms (NIL), classify ontological type, resolve ambiguity, measure importance/relevance (to domain / to text), tag each occurrence)
 14. Novelty in the model
     ● Users and objective are explicit in the model
       – Knowledge about content creators provides context for new types of KBT
       – Knowledge about consumer and objective for customizing output
       – Using feedback to learn from mistakes
 15. Outline
     ● Introduction, Motivation, Background
     ● Conceptual Model
     ● Knowledge Base: DBpedia
     ● System: DBpedia Spotlight
     ● Core Evaluations
     ● Case Studies
 16. Wikipedia Extraction
 17. Knowledge Base
     ● DBpedia is a cross-domain KB extracted from Wikipedia [Auer et al. 2007, Bizer et al. 2009]
       – Describes 3.7M things through 400M facts
       – Uses an ontology of 320 classes and 1,650 properties [Lehmann et al. 2013]
     ● DBpedia Live keeps DBpedia up to date with Wikipedia changes [Hellmann et al. 2009, Morsey et al. 2012]
     ● A whole ecosystem with an active community
 18. DBpedia Extraction Framework
     ● Added new extractors to support KBT:
       – Thematic concepts
       – Topical signatures
       – Distributional Semantic Model statistics for semantic relatedness
     [with Lehmann et al. @ SWJ 2013]
 19. Outline
     ● Introduction, Motivation, Background
     ● Conceptual Model
     ● Knowledge Base: DBpedia
     ● System: DBpedia Spotlight
     ● Core Evaluations
     ● Case Studies
     ● Conclusion
 20. System: default workflow
     ● Phrase Recognition:
       – mention recognition (e.g. NER)
     ● Candidate Selection:
       – detecting possible senses for a surface form
     ● Disambiguation:
       – choosing (ranking/classifying) one sense for a mention
     ● Tagging:
       – deciding whether to annotate, to account for entities not in the KB or uninformative annotations
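To make the four stages concrete, here is a minimal, self-contained sketch (my own toy illustration, not the actual DBpedia Spotlight implementation, which is written in Java/Scala and uses richer features such as TF*ICF-weighted context vectors):

from collections import Counter

def relatedness(context_tokens, candidate_context):
    # Toy contextual relatedness: token overlap between the mention's context
    # and the candidate's context words (the thesis uses TF*ICF-weighted vectors).
    a, b = Counter(context_tokens), Counter(candidate_context)
    return sum((a & b).values())

def knowledge_base_tagging(text, lexicon, config):
    tokens = text.lower().split()
    annotations = []
    # 1. Phrase Recognition: naive unigram lexicon lookup (real system: NER, noun phrases, etc.)
    for i, tok in enumerate(tokens):
        candidates = lexicon.get(tok)          # 2. Candidate Selection: possible senses
        if not candidates:
            continue
        context = tokens[max(0, i - 5): i + 6]
        # 3. Disambiguation: pick the candidate most related to the local context
        uri, score = max(((u, relatedness(context, ctx)) for u, (ctx, support) in candidates.items()),
                         key=lambda p: p[1])
        # 4. Tagging: emit only confident, sufficiently prominent annotations
        if score >= config["confidence"] and candidates[uri][1] >= config["support"]:
            annotations.append((tok, uri, score))
    return annotations

# Toy lexicon: "apple" could mean the company or the fruit
lexicon = {"apple": {"dbpedia:Apple_Inc.": (["corps", "lennon", "mccartney", "records"], 500),
                     "dbpedia:Apple": (["fruit", "tree", "pie"], 300)}}
print(knowledge_base_tagging("lennon and mccartney announce the formation of apple corps",
                             lexicon, {"confidence": 1, "support": 100}))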
 21. A worked example
     "(…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps."
     ● Phrase Recognition spots "New York"
     ● Candidate Selection: New York (magazine), New York, Manhattan, Province of New York, New York City, New York, New York (film), New York metropolitan area, West New York, New Jersey, Roman Catholic Archdiocese of New York, Pennsylvania Station (New York City)
     ● Contextual relatedness scores for the candidates: 0.10, 0.34, 0.22, 0.23, 0.67, 0.45, 0.56, 0.01, 0.33, 0.07
     ● Disambiguation: New York City
     ● Tagging: "New York" — type: city, pos: 78, relevance: 0.67, ...
 22. A quick example
 23. Show Top-K Candidates
     (Example: LSU_Tigers, Louisiana State University)
 24. Virtuous Cycle
     ● Through the Sztakipedia toolbar:
       – DBpedia Spotlight suggests links to Wikipedia editors
       – catalyzes evolution of the knowledge source
     ● /feedback service:
       – allows users to submit judgements
       – enables system evolution with feedback
       – also on blogs, etc. with RDFaCE [Khalili]
     [with Héder @ WWW'2012]
 25. Contextual relatedness score: TF*ICF [Mendes et al. @ ISEM2011]
     ● TF*IDF (Term Frequency * Inverse Document Frequency)
       – TF: relevance of a word in the context of a DBpedia resource
       – IDF: words that are too common are less useful
     ● ICF: Inverse Candidate Frequency (entropy-inspired)
       – ICF measures the rarity of a word with respect to the possible senses
     ● Example: the surface form "Washington"
       – Washington, D.C.: W = {"capital", "USA", ...}
       – George Washington: W = {"president", "USA", ...}
       – Washington (state): W = {"Seattle", "USA", ...}
       – ICF("Washington", "USA") < ICF("Washington", "Seattle")
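The slide gives the intuition only; as a rough reconstruction (see Mendes et al., ISEM 2011 for the exact formulation), the inverse candidate frequency of a context word w for a surface form s can be written as

    ICF(w) = log ( |R_s| / |R_s(w)| )

where R_s is the set of candidate resources for s and R_s(w) is the subset of those candidates whose context contains w. A word like "USA" co-occurs with almost every sense of "Washington" (large |R_s(w)|, low ICF), while "Seattle" co-occurs with few (high ICF), so TF*ICF-weighted context vectors favor the words that actually discriminate between senses.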
 26. Outline
     ● Introduction, Motivation, Background
     ● Conceptual Model
     ● Knowledge Base
     ● System: DBpedia Spotlight
     ● Core Evaluations
     ● Case Studies
     ● Conclusion
 27. Core Evaluations
     (Same worked example as slide 21: the Lennon/McCartney sentence traced through Phrase Recognition, Candidate Selection, Disambiguation and Tagging; each step is evaluated in turn below)
 28. Phrase Recognition Results
     ● Policies: S = { s | p(s) > cutoff_S }
       – (L) Lexicon-based
       – (LNP*) Lexicon-based with at least one noun
       – (NPL) Noun phrases, lexicon lookup (Bloom filter)
       – (CW) Lexicon-based removing common words
       – (Kea) Keyphrases
       – (NER) Named entities only
       – (NER ∪ NP) N-grams within noun phrases and NEs
     ● Take home (different spotting strategies on the CSAW dataset):
       – It is not only about importance/relevance
       – Precision matters less here: errors are handled in downstream steps
       – Recall is key: a phrase missed at this stage is an overall failure
       – Simple methods work quite well
       – Combinations of techniques improve results
     At LREC'2012.
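A minimal sketch of the cutoff policy above; here p(s) is assumed to be the annotation probability of a surface form (how often s is linked when it occurs in Wikipedia), which is my reading of the slide, and cutoff_S is the tunable threshold:

def select_spots(candidate_spots, annotation_probability, cutoff_s=0.1):
    # S = { s | p(s) > cutoff_S }: keep only surface forms likely enough to be link-worthy
    return [s for s in candidate_spots if annotation_probability.get(s, 0.0) > cutoff_s]

print(select_spots(["new york", "the", "apple corps"],
                   {"new york": 0.6, "the": 0.0004, "apple corps": 0.9}))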
 29. Context-Independent Strategies
     ● NAÏVE
       – Use the surface form to build a URI: "berlin" → dbpedia:Berlin
     ● PROMINENCE
       – P(u) = n(u) / N (the 'popularity'/importance of this URI)
         ● n(u): number of times URI u occurred
         ● N: total number of occurrences
       – Intuition: URIs that have appeared a lot are more likely to appear again
     ● DEFAULT SENSE
       – P(u|s) = n(u,s) / n(s)
         ● n(u,s): number of times URI u occurred with surface form s
       – Intuition: some surface forms are strongly associated with specific URIs
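A small sketch of how the PROMINENCE and DEFAULT SENSE baselines could be estimated from a list of Wikipedia link occurrences; the variable names are mine:

from collections import Counter

def context_independent_baselines(wikilinks):
    """wikilinks: list of (surface_form, uri) pairs harvested from Wikipedia anchor text."""
    n_u = Counter(uri for _, uri in wikilinks)                        # n(u)
    n_us = Counter(wikilinks)                                         # n(u, s), keyed by (s, u)
    n_s = Counter(s for s, _ in wikilinks)                            # n(s)
    N = len(wikilinks)                                                # total occurrences
    prominence = {u: c / N for u, c in n_u.items()}                   # P(u) = n(u) / N
    default_sense = {(s, u): c / n_s[s] for (s, u), c in n_us.items()}  # P(u|s) = n(u,s) / n(s)
    return prominence, default_sense

links = [("washington", "George_Washington"),
         ("washington", "Washington,_D.C."),
         ("washington", "Washington,_D.C."),
         ("berlin", "Berlin")]
prominence, default_sense = context_independent_baselines(links)
# Default sense for "washington": the URI with highest P(u|s)
print(max((u for (s, u) in default_sense if s == "washington"),
          key=lambda u: default_sense[("washington", u)]))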
 30. Disambiguation
     ● Preliminary results:
       – With 155,000 randomly selected wikilink samples
       – Balance of common and less prominent concepts (default sense: 55.12%)
       – Highly ambiguous (random: 17.77%)
     At I-Semantics 2011.
 31. Disambiguation + NIL
     ● Named Entities Only
     ● TAC KBP 2010 dataset:
       – DefaultSense: 79.91%, Random: 62.00%, Unambiguous: 30.36%
       – NIL accuracy = 79.27%, Non-NIL accuracy = 87.88%, Overall accuracy = 82.71%
     At TAC KBP 2011.
 32. Disambiguation Difficulty
     ● Geopolitical entities KB: 830K entities
     ● 311 blog posts, 790 annotations
 33. Geolocation Disambiguation Evaluation Results
     – Validates our measure of "difficulty" (performance degrades as difficulty increases)
     – Shows that our system is more robust when disambiguating low-dominance entities
 34. Dominance Analysis
 35. Dominance Analysis (cont.)
 36. Tagging
     ● Decide which spots to annotate with links to the disambiguated resources
     ● Different use cases have different needs
       – Only annotate prominent resources?
       – Only if you're sure the disambiguation is correct?
       – Only people?
       – Only things related to Berlin?
 37. Tagging in DBpedia Spotlight
     ● Tagging needs are application/user-specific
     ● Can be configured based on:
       – Thresholds
         ● Confidence
         ● Prominence (support)
       – Whitelist or blacklist of types
         ● Hide all people, show only organizations
       – Complex definition of a "type" through a SPARQL query (see the sketch below)
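For illustration, a sketch of what such a tagging policy could look like; the parameter names and the SPARQL pattern are illustrative and not the exact DBpedia Spotlight configuration syntax:

# Hypothetical tagging policy: thresholds, a type whitelist, and a SPARQL-defined "type"
tagging_policy = {
    "confidence": 0.5,                    # minimum disambiguation confidence
    "support": 100,                       # minimum prominence (e.g. number of Wikipedia inlinks)
    "types": ["DBpedia:Organisation"],    # whitelist: show only organisations
    "sparql": """
        SELECT ?resource WHERE {          # complex "type": things headquartered in Berlin
          ?resource dbpedia-owl:headquarter dbpedia:Berlin .
        }
    """,                                  # prefixes assumed to be predefined by the endpoint
}

def keep(annotation, policy, sparql_resources):
    # sparql_resources: URIs obtained by running policy["sparql"] against the KB beforehand
    return (annotation["confidence"] >= policy["confidence"]
            and annotation["support"] >= policy["support"]
            and annotation["type"] in policy["types"]
            and annotation["uri"] in sparql_resources)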
 38. Tagging Evaluation (News)
     ● Preliminary results
       – Able to approximate best precision and best recall
       – Varying parameters allows covering a wide range of the P/R trade-off
 39. Tagging (Take home)
     ● Combines features from spotting, candidate selection and disambiguation
     ● More informed to make decisions
     ● Can avoid/fix some mistakes from previous steps
     ● Offers a chance to adapt to users' needs
 40. Outline
     ● Introduction, Motivation, Background
     ● Conceptual Model
     ● Knowledge Base
     ● System: DBpedia Spotlight
     ● Core Evaluations
     ● Case Studies
     ● Conclusion
 41. Case Study: Audio Tagging
     http://www.bbc.co.uk/programmes
 42. Example: Audio Transcript
     ● BBC Audio Archive tag suggestion ("May 1945", "German capital")
     ● Automatic transcript (recognition errors left as-is):
       "whirlpool or not the b. b. c. witnessed when the jam and capital but then fell to the advancing bad timing in maine nineteen forty five the civilians living there feared to sing and violence steve athens hughes from one woman custom finds that tying it sit beginning of may nineteen forty five but then is being squeezed between the british americans from the west in the russian army from the east but sides fighting for every inch of land and forgets to this city is being pulverized ..."
     ● Tags: Berlin, World War II, Russian Army, etc.
     Raimond & Lowis, LDOW2012.
 43. Scenario: Audio Transcript Tagging
     (Figure: an Audio Creator produces audio; a transcription system acts as the textual Content Creator; the resulting transcript has no punctuation or capitalization and high token transcription error rates; an Editor supplies editorial tags alongside the automated tags)
     ● Adapted workflow over the KB:
       1. Contextual relatedness
       2. Mention detection (dictionary-based)
       3. Entity-type preference-based reranking
 44. Tagging Audio Transcripts
     ● Traditional NER features are missing
       – sentence boundaries, POS tags, 50% token error, etc.
     ● Lexicon-based lookup is also difficult
       – "big date" → big data
     ● Our approach:
       – On-the-fly adaptation
       – Skip spotting, focus on named entities
       – Preliminary results: TopN = 0.19 – 0.21
 45. Case Study: Tweet NER
     ● NER challenges
       – informal text, faulty grammar, misspellings, short text, irregular capitalization, etc.
       – segmentation is harder than classification
     ● Our approach:
       – distant supervision from DBpedia
       – DBpedia Spotlight tagging used as features
     (Figure: tweet → Creator → KBT System (Phrase Recognition, Candidate Selection, Disambiguation, Tagging) over the KB → tags as features → retrained CRF recognizer → entity mentions)
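A sketch of the "KBT tags as features" idea: each token gets, in addition to standard word-shape features, its DBpedia Spotlight annotation as a distant-supervision feature. The feature names and toy tags below are mine; the thesis' actual feature set differs:

def token_features(tokens, i, kbt_tags):
    # Word-shape features plus the KBT tag for the token (a DBpedia ontology type, or "O")
    word = tokens[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "prefix3": word[:3],
        "kbt.tag": kbt_tags[i],
        "kbt.is_entity": kbt_tags[i] != "O",
    }

tokens = ["my", "bro", "has", "a", "microsoft", "surface"]
kbt_tags = ["O", "O", "O", "O", "DBpedia:Device", "DBpedia:Device"]
X = [[token_features(tokens, i, kbt_tags) for i in range(len(tokens))]]
y = [["O", "O", "O", "O", "B-PRODUCT", "I-PRODUCT"]]

# A linear-chain CRF can then be trained on these sequences, e.g. with sklearn-crfsuite:
#   import sklearn_crfsuite
#   crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
#   crf.fit(X, y)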
 46. Tweet NER Results
     ● KBT tags added as features to a linear-chain CRF tagger
     ● NER improves with distant supervision from KBT
 47. Educational Material
     ● Emergency Management Training
       – "tags that summarize what happened"
       – "configuration parameters allowed removing tags that were 'too general'"
 48. Case Study: Smart Filtering
     ● Microposts mentioning competitors:
       – Some User @someuser (4 Nov): "At home I have an IPad and my bro has a Microsoft Surface."
       – Another User @anotheruser (5 Nov): "The Asus Transformer Infinity is actually quite nifty."
     ● How to look for competitors?
       – Annotations: the tweets (e.g. https://twitter.com/someuser/status/123, https://twitter.com/anotheruser/status/456) mention IPad, Microsoft Surface, Asus Transformer Infinity
       – Knowledge Base: products belong to categories such as category:Wi-Fi and category:Touchscreen
       – Smart filtering: SELECT ?tweet mentions ?product belongs ?category (see the sketch below)
     [Mendes et al. WI'2010 and Triplification Challenge 2010]
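The query on the slide is shorthand; written out, a smart filter over the annotated microposts might look roughly like the following (the vocabulary is illustrative, not the actual Twarql terms):

competitor_filter = """
PREFIX ex: <http://example.org/vocab#>        # illustrative vocabulary
SELECT ?tweet ?product WHERE {
  ?tweet    ex:mentions  ?product .           # produced by the KBT annotations
  ?product  ex:belongsTo ?category .          # background knowledge from the KB
  ex:IPad   ex:belongsTo ?category .          # products sharing a category with ours...
  FILTER (?product != ex:IPad)                # ...but not our own product: competitors
}
"""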
 49. Case Study: Website Tagging
     ● Evaluation: retrieving similar sites
     (Figure: a Consumer with an Objective uses the KBT System and the KB to compute Website Similarity)
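One plausible way to score website similarity from KBT output — my sketch, not necessarily the measure used in the case study — is set overlap between the DBpedia resources tagged on each site:

def site_similarity(tags_a, tags_b):
    # Jaccard overlap between the sets of DBpedia resources tagged on two websites
    a, b = set(tags_a), set(tags_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

print(site_similarity({"dbpedia:IPad", "dbpedia:Tablet_computer"},
                      {"dbpedia:Microsoft_Surface", "dbpedia:Tablet_computer"}))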
 50. Outline
     ● Introduction, Motivation, Background
     ● Conceptual Model
     ● Knowledge Base
     ● System: DBpedia Spotlight
     ● Core Evaluations
     ● Case Studies
     ● Conclusion
 51. Conclusion
     ● Model enables cross-task evaluations
       – KE, NER, etc. can be reused for KBT but individually often do not suffice
     ● Model enables deeper evaluations (beyond "black box")
       – Prescribes modularized evaluation to identify steps that need improvement
       – Introduces and validates a measure of "difficulty to disambiguate"
     ● System adapts well to very distinct use cases
 52. Limitations
     ● What the proposed model is not:
       – A silver bullet for all problems
       – A substitute for machine learning, expert knowledge, or linguistics research
 53. Extensions to DBpedia
     ● We extended DBpedia to enable KBT
     ● Created new extractors for the necessary data / statistics
     ● Multilinguality: community process to maintain international chapters
     ● Results:
       – Data to power the computation of features necessary for adaptive KBT
       – Prominence, relevance, pertinence, types, etc.
       – All reusable by other systems that use DBpedia
 54. Demo: DBpedia Spotlight — http://spotlight.dbpedia.org/demo/
     ● Web Service: http://spotlight.dbpedia.org/rest/{component}
     ● Components are exposed as services:
       – Phrase Recognition (/spot)
       – Disambiguation (/disambiguation)
       – Top-K disambiguations (/candidates)
       – Relatedness (/related)
       – Annotation (/annotation)
     ● Source code: https://github.com/dbpedia-spotlight/dbpedia-spotlight/
     ● Apache V2 License
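At the time of the defense the annotation component could be called over plain HTTP; a sketch of such a call (the public endpoint historically exposes /rest/annotate with text, confidence and support parameters, which may differ from the /annotation name on the slide and from current deployments):

import requests

resp = requests.get(
    "http://spotlight.dbpedia.org/rest/annotate",
    params={"text": "Lennon and McCartney went to New York to announce the formation of Apple Corps.",
            "confidence": 0.4, "support": 20},
    headers={"Accept": "application/json"},
)
# Each annotated resource carries its URI, surface form, offset, support and similarity score
for resource in resp.json().get("Resources", []):
    print(resource["@surfaceForm"], "->", resource["@URI"])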
 55. My Ph.D. in retrospect
     (Timeline figure mapping this dissertation and earlier work to research threads: knowledge base tagging and cross-domain entity recognition and linking; Linked Data; real-time information exploration/filtering; knowledge-driven text exploration and query formulation; complex entity recognition and relationship extraction; genome databases. Publications shown include WebSci'10, KCAP'11, ISEM'11, WWW'12a/b, LREC'12a/b, ISEM'13, TAC'11, CIKM'12, MSM'13, SWJ'13, ISWC'12, Sieve EDBT'12, EvoDyn'12, Twarql WI'10, ISEM'10, Twitris SFSW@ESWC'10, SWC'10, Scooner ACMSE'10, Cuebee WI'08, EKAW'08, IESD@HT'13, Cuadro, Garsa, TcruziDB NAR'06, TcruziKB ICSC'08, ProtozoaDB NAR'08, Bioinformatics'05, BIBM'10.)
 56. More thanks!
     … and other mentors and collaborators (too many great people for one slide!)
 57. References
 58. Other publications
     ● Bioinformatics IE & Querying
       – 2 Nucleic Acids Research Journal
       – 1 Bioinformatics Journal
       – 1 IEEE ICSC, 1 EKAW, 1 Web Intelligence
     ● Linked Data Quality and Fusion
       – 1 LWDM 2012 @ EDBT
     ● Book chapters
       – Semantic Search on the Web, with Bizer et al.
       – The People's Web Meets NLP, with OKF OWLG
 59. Impact of my research
     ● scholar.google.com: 480+ citations, h-index = 12
     ● Best paper award at I-Semantics 2011
       – 174 citations (according to scholar.google.com)
     ● 4+2 students on Google Summer of Code 2012+2013
     ● About 6 open-sourced third-party clients
     ● Awarded first prize at:
       – Triplification Challenge 2010
       – Scripting for Semantic Web Challenge 2010
     ● 37 publications
       – 9 conferences, 5 workshops/posters, 3 magazines (bioinfo)
       – 2 book chapters
       – 3 workshop proceedings
 60. Leadership and Community Involvement
     ● Co-organizer of the Web of Linked Entities workshop series
       – ISWC 2012 and WWW 2013
     ● Founder of the DBpedia Portuguese initiative, involving volunteers from 5 Brazilian universities
     ● Maintainer of 3 open source projects
       – Twarql: streaming annotated microposts
       – Cuebee: query formulation for RDF
       – DBpedia Spotlight: adaptive semantic annotation
     ● PC member in several conferences and workshops: ISWC, ESWC, LREC, LDOW, IJSWIS, LDL'2012, JWS, SWJ, etc.
     ● EU projects
       – leading FUB's participation in PlanetData (FP7 Network of Excellence)
       – research on LOD2 (FP7 IP) and the BIG Public-Private Forum