Aleksandar Kapisoda: The semantic approach for tracking scientific publications

Boehringer Ingelheim Pharma GmbH & Co. KG
Scientific Information Center - Aleksandar Kapisoda
Semantics 2015 - Vienna, Austria
The semantic approach for tracking
scientific publications

Content
1. Intro
2. Goal
3. Overview Data & Technology
4. Workflow / Pipeline
5. Challenges
6. Data Curation
7. Conclusions
8. Outlook
9. Acknowledgement

Mundaneum
Mapping Knowledge
http://www.mundaneum.org/en
http://archives.mundaneum.org/en/history

1895
We don't live in that kind of world
1895
Paul Otlet
http://www.mondotheque.be/wiki/index.php/Here

Paul Otlet
The Father of Information Science
He is one of several people who have been considered the
father of information science, a field he called "documentation".
As a young man, the utopian Otlet started to think about a system that could represent
the multiple networked relations between objects of various formats with various
objectives.
His design explicitly mapped multiple relations between multi-media
objects (so not just books) and allowed for constant transformation and modification.
He imagined his universal information structure as 'symbolic
links' from document to document.

1895 - Universal Decimal Classification system
http://www.mondotheque.be/wiki/index.php/Here
In 1895, Paul Otlet created the
Universal Decimal Classification
(based on the Dewey Decimal Classification),
one of the most prominent examples of faceted classification.

1895 - Universal Decimal Classification system
2015 - Taxonomies, Dictionaries & Ontologies
https://blog.semantic-web.at/wp-content/uploads/2011/02/GICS_PP.jpg
In 2015, we are creating, editing and using taxonomies, dictionaries and ontologies.

1934 - Radiated Library
Paul Otlet's vision
The Book of the Books
A great network of knowledge, centered on documents and including
books, journals, radio, television…
In 1934, Paul Otlet laid out this vision in what he called the “Radiated Library”.

1934 - Radiated Library
2015 – World Wide Web / Internet
Otlet's writings have sometimes been called prescient of the current
World Wide Web and Internet.
His vision of a great network of knowledge was centered on documents
and included the notions of hyperlinks, search engines, remote access,
and social networks, although these notions were described by different names.
In 1934, Otlet laid out this vision of the computer and internet in what he called
the “Radiated Library”.

1934 - Universal Information Structure
https://s-media-cache-ak0.pinimg.com/736x/7d/71/0f/7d710ffe8ad97234ebc4867546d68a28.jpg
Paul Otlet imagined his universal information structure
by making 'symbolic links' from document to document.

1934 - Universal Information Structure
2015 – Semantic Web
Paul Otlet imagined his universal information structure
by making 'symbolic links' from document to document,
a system that looks surprisingly similar to what we now might call a 'Semantic Web'.
https://s-media-cache-ak0.pinimg.com/736x/7d/71/0f/7d710ffe8ad97234ebc4867546d68a28.jpg

Semantics @ BI - Evolution of Information Management
Comparison: Web Technology vs. BI-internal Information Management
1990 - 2000 (Web 1.0)
  Evolution of Web Technology: World Wide Web (Portals, Internet; Databases, File Servers, SQL)
  Evolution of Information Management at BI: Scientific Library (E-Journals, LinkSolver)
2000 - 2010 (Web 2.0)
  Evolution of Web Technology: Social Web (Blogs, Wikis; Keyword Search)
  Evolution of Information Management at BI: Scientific Information Center (Text Mining, BI-internal Wikis (MediaWiki))
2010 - 2020 (Web 3.0)
  Evolution of Web Technology: Semantic Web (Semantic Databases, Linked Data; Semantic Search, RDF, Text Mining)
  Evolution of Information Management at BI: Scientific Information Center (“Expert Searches” based on Text Mining Technologies; Data Analysis based on Semantic Technologies)
2020 - 2030 (Web 4.0)
  Evolution of Web Technology: Ubiquitous Web/Symbiotic Web (Interaction between humans and machines in symbiosis)

BI – Publication Tracker
Goal
Why does BI need
a Publication Tracking System?

Goals
Scientific Publication Database (State July 2015)
• Data added manually
• No Content Curation
• Primitive Visualisation
• Storage
• No Data Analysis possible
BI Scientific Publication Tracking (Going live September 2015)
• Automatic Data Import
• Content Curation
• State-of-the-Art Visualisation
• Storage in a semantic database
• Data Analysis possible

Goal – Data Analysis
Number of BI Research Publications in 2015 (Q1, Q2)
Sample Data
TA: Therapeutic Area

Goal – Data Analysis
Impact Factors 2015 (Q1 + Q2) & Published Article
Sample Data
TA: Therapeutic Area
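The two analyses named on these slides (publications per therapeutic area and quarter, and impact factors per therapeutic area) can be sketched as simple aggregations. This is a minimal sketch under assumptions: the record layout, the field names and all values are invented placeholders, not BI data, and the mean journal impact factor is only one possible summary.

```python
from collections import Counter, defaultdict
from datetime import date

# Placeholder records; therapeutic areas, dates and impact factors are invented.
publications = [
    {"ta": "TA 1", "published": date(2015, 2, 10), "impact_factor": 4.0},
    {"ta": "TA 1", "published": date(2015, 5, 3), "impact_factor": 6.0},
    {"ta": "TA 2", "published": date(2015, 4, 21), "impact_factor": 3.0},
]


def quarter(day):
    """Map a date to its calendar quarter (1 to 4)."""
    return (day.month - 1) // 3 + 1


def count_by_ta_and_quarter(pubs):
    """Count publications per (therapeutic area, quarter) pair."""
    return Counter((p["ta"], quarter(p["published"])) for p in pubs)


def mean_impact_factor_by_ta(pubs):
    """Average the journal impact factors of the publications in each area."""
    values = defaultdict(list)
    for p in pubs:
        values[p["ta"]].append(p["impact_factor"])
    return {ta: sum(v) / len(v) for ta, v in values.items()}


counts = count_by_ta_and_quarter(publications)
means = mean_impact_factor_by_ta(publications)
```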

Goal - Analysing data
Based on Impact Factor Journal Ranking
https://sciencetechblog.files.wordpress.com/2011/05/journal-impact-factors-2008_1.jpg

Why does BI use
Semantic Technology
for
Publication Tracking?

Scientific Publication
What does it look like?
http://www.ncbi.nlm.nih.gov/pubmed/26210363

Overview Data & Technology
• Data & Data Storage
  • XML files from Ovid http://www.ovid.com
  • MS Excel (sheet)
  • Virtuoso Universal Server as a triple store http://virtuoso.openlinksw.com/
• Systems
  • PoolParty (Thesaurus Server) https://www.poolparty.biz/portfolio-item/poolparty-thesaurus-server
  • PoolParty Graph Search https://www.poolparty.biz/tag/graph-search
  • SPARQL http://www.w3.org/TR/sparql11-query/#docResultDesc
  • Spring https://de.wikipedia.org/wiki/Spring_(Framework)
  • Maven http://maven.apache.org
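To illustrate how XML records might end up in a triple store, here is a minimal sketch that converts one record into N-Triples lines. The XML layout, the record id, the `http://example.org/` namespaces and the `journal` predicate are assumptions made for illustration; the slides do not show the actual Ovid export schema or the data model used at BI. Only the Dublin Core `title` and `creator` terms are taken from a real vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout; the real Ovid export schema is not shown in the slides.
SAMPLE_XML = """
<records>
  <record id="rec-1">
    <title>Example article title</title>
    <journal>Example Journal</journal>
    <author>Doe, J.</author>
    <author>Roe, A.</author>
  </record>
</records>
"""

BASE = "http://example.org/publication/"  # assumed namespace for publications
EX = "http://example.org/schema/"         # assumed namespace for custom predicates
DCT = "http://purl.org/dc/terms/"         # Dublin Core terms


def literal(text):
    """Escape a string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def record_to_triples(record):
    """Turn one <record> element into a list of N-Triples lines."""
    subject = f"<{BASE}{record.get('id')}>"
    triples = [
        f"{subject} <{DCT}title> {literal(record.findtext('title'))} .",
        f"{subject} <{EX}journal> {literal(record.findtext('journal'))} .",
    ]
    for author in record.findall("author"):
        triples.append(f"{subject} <{DCT}creator> {literal(author.text)} .")
    return triples


def xml_to_triples(xml_text):
    """Convert every record in the XML document."""
    root = ET.fromstring(xml_text)
    return [t for rec in root.findall("record") for t in record_to_triples(rec)]


triples = xml_to_triples(SAMPLE_XML)
```

The resulting lines could then be loaded into a triple store such as Virtuoso with whatever bulk-load mechanism the installation provides.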

Workflow / Pipeline
Auto-alerts from Ovid (.xml file)
Alert Profile (Search Terms)
Scheduled Alerts
Content Enrichment
Admin User Interface
SIC Crawler
Current awareness searches
Data Curation
Thesaurus Management System
Impact Factor List: “reflects the average number of citations to recent articles published in a journal”
Virtuoso Database
BI Publication Tracker User Interface
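The components on this slide can be read as stages that each record passes through. The sketch below chains three of them (content enrichment with an impact factor list, a curation check against the thesaurus, and storage). It is an illustration under assumptions: the order of the stages, the `Publication` fields, the function names and all sample values are hypothetical, and a plain list stands in for the Virtuoso database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Publication:
    title: str
    journal: str
    authors: list
    impact_factor: Optional[float] = None
    needs_curation: bool = False


def enrich(pub, impact_factors):
    """Content enrichment: attach the journal's impact factor if it is listed."""
    pub.impact_factor = impact_factors.get(pub.journal)
    return pub


def flag_for_curation(pub, known_authors):
    """Data curation: mark records whose authors are not yet in the thesaurus."""
    pub.needs_curation = any(a not in known_authors for a in pub.authors)
    return pub


def run_pipeline(records, impact_factors, known_authors):
    """Run every crawled record through enrichment and curation, then store it."""
    database = []  # stand-in for the triple store
    for pub in records:
        pub = enrich(pub, impact_factors)
        pub = flag_for_curation(pub, known_authors)
        database.append(pub)
    return database


# Placeholder inputs; journal names, authors and values are invented.
records = [
    Publication("Example A", "Journal X", ["Doe, J."]),
    Publication("Example B", "Journal Y", ["Doe, J.", "Roe, A."]),
]
stored = run_pipeline(records, {"Journal X": 4.2}, {"Doe, J."})
```

Records flagged with `needs_curation` would be the ones a curator reviews in the admin user interface.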

Data Curation & Analysis
Challenges
• Cleaning noisy data from ovid.xml
  • Authors
  • Institutions
• Adding BI internal data
  • Division
  • Therapeutic Area
  • Location
• Adding external data
  • Impact Factors
• Lightweight
• Highly scalable
• User Interface

Data Curation - Challenge
Data from .xml
• Cleaning noisy data from ovid.xml
  • Authors
  • Institutions
• Adding BI internal data
  • Division
  • Therapeutic Area
  • Location
• Building Web Application
  • Lightweight
  • Highly scalable
  • User Interface
Noisy Data
PoolParty Thesaurus Server, Admin GUI, User GUI
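Cleaning noisy author and institution strings amounts to mapping each spelling variant onto one thesaurus concept and leaving the rest as unmapped concepts for a curator. The following is a minimal sketch of that idea, not of PoolParty's API: the concept id, the normalisation rules and the function names are assumptions, and only the first institution label is taken from the title slide.

```python
import re

# Hypothetical thesaurus: concept id -> known labels for that concept.
THESAURUS = {
    "inst/boehringer-ingelheim-pharma": [
        "Boehringer Ingelheim Pharma GmbH & Co. KG",
    ],
}


def normalize(label):
    """Reduce a label to a comparison key: lower case, '&' as 'and', no punctuation."""
    key = label.lower().replace("&", " and ")
    key = re.sub(r"[\W_]+", " ", key)  # any run of non-alphanumerics becomes one space
    return key.strip()


def build_index(thesaurus):
    """Index every known label by its normalized key."""
    return {normalize(l): cid for cid, labels in thesaurus.items() for l in labels}


def map_labels(raw_labels, index):
    """Split raw strings into mapped (string -> concept) and unmapped ones."""
    mapped, unmapped = {}, []
    for raw in raw_labels:
        concept = index.get(normalize(raw))
        if concept is None:
            unmapped.append(raw)  # left for manual curation
        else:
            mapped[raw] = concept
    return mapped, unmapped


raw = [
    "Boehringer-Ingelheim Pharma GmbH and Co KG",
    "boehringer ingelheim pharma gmbh & co. kg",
    "Unknown Institute",
]
mapped, unmapped = map_labels(raw, build_index(THESAURUS))
```

Exact matching on a normalized key is deliberately conservative; anything it cannot match stays visible as an unmapped concept instead of being guessed.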

Data Curation
Ovid – Authors

Curation Thesaurus Management
Institution – Mapped Concepts

Data Curation
Ovid – Institution

Unmapped – Mapped Concepts

Institution – Mapped Concepts

Authors – Mapped

Unmapped – Mapped Concepts

Curation Thesaurus Management System
Unmapped Concept - Company

Curation Thesaurus Management System
Mapped Concept - Company
Unmapped Concept

Data Curation - Challenge
BI internal Data
• Cleaning noisy data from ovid.xml
  • Authors
  • Institutions
• Adding BI internal data
  • Division
  • Therapeutic Area
  • Location
• Building Web Application
  • Lightweight
  • Highly scalable
  • User Interface
Missing BI internal data
PoolParty Thesaurus Server, Admin GUI, User GUI
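Adding division, therapeutic area and location can be pictured as a lookup of each curated author in an internal directory, with the gaps reported as missing BI internal data. This sketch assumes a hypothetical directory keyed by author concept id; the keys, field names and values are invented and do not reflect BI's real organisational data.

```python
# Hypothetical internal directory: author concept -> division, therapeutic area, location.
INTERNAL_DIRECTORY = {
    "author/doe-j": {
        "division": "Division A",
        "therapeutic_area": "TA 1",
        "location": "Site X",
    },
}


def add_internal_data(publication, directory):
    """Return a copy of the publication with per-author internal data and the gaps."""
    enriched = dict(publication, affiliations={}, missing=[])
    for author in publication["authors"]:
        entry = directory.get(author)
        if entry is None:
            enriched["missing"].append(author)  # to be completed by a curator
        else:
            enriched["affiliations"][author] = entry
    return enriched


publication = {"title": "Example A", "authors": ["author/doe-j", "author/roe-a"]}
result = add_internal_data(publication, INTERNAL_DIRECTORY)
```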

BI Publication Tracker
Admin User Interface – Mock-up

Data Visualisation & Analysis
Challenges
• Cleaning noisy data from ovid.xml
  • Authors
  • Institutions
• Adding BI internal data
  • Division
  • Therapeutic Area
  • Location
• Adding external data
  • Impact Factors
• Lightweight
• Highly scalable
• User Interface
Visualisation & Analysis
PoolParty Thesaurus Server, Admin GUI, Visualisation & Analysis

BI Publication Tracker
GUI/User Interface

Conclusions
• Cleaning noisy data from ovid.xml
  • Authors
  • Institutions
• Adding & Linking BI internal data
  • Division
  • Therapeutic Area
  • Location
• Adding & Linking external data
  • Impact Factors
• Lightweight
• Highly scalable
• User Interface
PoolParty Thesaurus Server, Admin GUI, Visualisation & Analysis

Conclusions
• Linked Data:
Reuse of the Data (SPARQL Endpoint)
Domain Expert
• Data Network Solution
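Reuse through a SPARQL endpoint means that any client can send a query over HTTP. As a hedged illustration, the sketch below builds (but does not send) a GET request in the style of the SPARQL 1.1 Protocol, with the query passed in the `query` parameter. The endpoint URL, the `ex:` namespace and the `ex:therapeuticArea` predicate are assumptions; the slides do not describe the actual data model.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint URL

# Hypothetical vocabulary: counts publications per therapeutic area.
QUERY = """
PREFIX ex: <http://example.org/schema/>
SELECT ?ta (COUNT(?pub) AS ?publications)
WHERE { ?pub ex:therapeuticArea ?ta . }
GROUP BY ?ta
ORDER BY DESC(?publications)
"""


def build_request(endpoint, query):
    """Build a GET request that carries the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
# Passing `request` to urllib.request.urlopen would send it to a live endpoint.
```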

Outlook:
What we want to achieve in the next steps
User Perspective
GUI
• Export of search results
Technology
Optimization of data import
• Using Ovid RSS feeds for updates
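Using RSS feeds for updates would mean polling a feed and importing only items that have not been seen before. The sketch below shows that idea on a generic RSS 2.0 document; the feed content is invented, and whether Ovid's feeds carry a `guid` per item is an assumption that would need checking against a real feed.

```python
import xml.etree.ElementTree as ET

# Invented RSS 2.0 sample; a real feed would be fetched over HTTP.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Hypothetical alert feed</title>
    <item><guid>rec-1</guid><title>First example record</title></item>
    <item><guid>rec-2</guid><title>Second example record</title></item>
  </channel>
</rss>"""


def new_items(feed_xml, seen_guids):
    """Return the feed items whose guid has not been imported yet."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        guid = item.findtext("guid")
        if guid not in seen_guids:
            items.append({"guid": guid, "title": item.findtext("title")})
    return items


updates = new_items(SAMPLE_FEED, seen_guids={"rec-1"})
```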

Acknowledgement
• Boehringer Ingelheim:
Vision: Value through Innovation
• Research Leadership Team
• S.I.C. Colleagues
• Semantic Web Company

1895
We don't live in that kind of world
1895
Paul Otlet
http://www.mondotheque.be/wiki/index.php/Here

2015
We do live in that kind of world
2015
Aleksandar Kapisoda

Contact Information
Aleksandar Kapisoda
aleksandar.kapisoda@boehringer-ingelheim.com
