SlideShare a Scribd company logo
Text and Data Mining
at Springer Nature
Markus Kaindl
Senior Manager Semantic Data
Frankfurt Book Fair
October 11th 2018
1
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
1
Agenda (25 Min Talk + 5 Min Q&A)
Background
• General introduction to TDM
• Use cases in science and beyond
TDM @ SN
• Springer Nature TDM Framework
• APIs and query examples
Wrap Up
• Closing remarks
• Q&A session
22
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
Background
33
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
General
introduction
to TDM
4
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
4
Text and Data Mining (TDM) is the
(semi-)automated process of selecting
and analyzing large amounts of text or
data resources for purposes such as
• searching,
• finding patterns,
• discovering relationships,
• semantic analysis and
• learning how content relates to
ideas and needs
in a way that can provide valuable
information needed for studies,
research, etc.
What is Text and Data Mining?
5
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
5
The variety of TDM techniques, practices and end-goals makes it virtually impossible to
provide a general and exhaustive illustration of how TDM works.
By means of a necessary simplification, it appears however possible to distinguish three
common – yet not all necessary – steps in TDM processes:
1. Access to content;
2. Extraction and/or copying of content;
3. Mining of text and/or data and knowledge discovery.
Different steps in the process of TDM
Text taken from:
Dr Eleonora Rosati, Associate Professor in Intellectual Property Law at the University of Southampton
Policy Department for Citizens' Rights and Constitutional Affairs; European Parliament; PE 604.942
6
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
Step 3 - Mining of text and/or data and knowledge discovery
Image taken from:
Dr Eleonora Rosati, Associate Professor in Intellectual Property Law at the University of Southampton
Policy Department for Citizens' Rights and Constitutional Affairs; European Parliament; PE 604.942
77
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
Use cases in
science and beyond
8
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
8
• Searching and archiving
• Metadata display
• Knowledge retrieval
• Multisource library
• Online portals
• Apps, products and solutions
• Fraud detection (banking, insurance) and spam filtering
- Some build in our APIs into their local search facility to simply improve their
researchers’ discovery experience, some others do specific term or data mining.
- Some go very deep into semantic text mining, some others want to retrieve hidden
research data structures as part of given science programs.
- Some even may use TDM to investigate competitor activities and staff.
Text and Data Mining => Use Cases
9
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
• User runs search on “Clinical
Orthopaedics and Related
Research” journal
portal, e. g. hip cartilage
• Though the user is
searching the portal
our API is being called
and the latest results are
returned on the fly.
Examplatory query on an external journal portal
10
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
Research Use Cases
11
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
11
- Assist banks with traditional analyses of industries and sectors
- Serve marketing to visually track information spread and influence paths across
media segments in real-time
- In the field of criminology, it is believed that TDM can help exploring and detecting
crimes and their relationships with criminals, so to assist police forces.
- Support anthropologists in their studies of cultural phenomena by mining the ever-
growing content made available through social networks
- Please note that the same technology may perform TDM for different purposes:
For instance, IBM Watson Explorer has been used –among others– to:
- Improve productivity in the workplace
- Increase efficiency in public health management
- Improve diagnostic information
- Allow for tailored drug recommendations
- Prevent the commission of crimes, including cybercrimes and hacks
- Create new culinary recipes
TDM use cases not limited to the field of research
12
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
12
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
TDM @ SN
1313
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
Springer Nature
TDM Framework
14
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
14
Text and Data Mining => A TDM Framework for Springer Nature
https://www.springernature.com/text-and-data-mining
15
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
15
• A growing part of Springer Nature’s journal articles is published open access.
• TDM is usually allowed without restrictions for these publications since the majority
of Springer Nature open access content is licensed under CC-BY.
Text and Data Mining
16
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
16
• Springer Nature is participating in the Crossref TDM working group and is
recommending Crossref services for pan-publisher TDM.
• For further information see http://tdmsupport.crossref.org/
Text and Data Mining at CrossRef
17
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
17
• For subscribers Springer Nature offers a large variety of TDM tools such as metadata
and fulltext APIs, applicable to both open access and subscribed resources.
• See https://api.springernature.com
Text and Data Mining for Subscribers
18
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
18
• Non-subscribers are offered a variety of TDM tools for our Open Access resources,
such as our Open Access fulltext API (see https://api.springernature.com).
• TDM requirements from non-subscribers for pay-walled content are treated on a
case-by-case basis.
• Please contact tdm@springernature.com.
Text and Data Mining for Non-Subscribers
19
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
19
• At the moment Springer Nature does not offer any API for image mining.
Image Mining
20
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
20
• “Argumentation mining aims to automatically detect, classify and structure
argumentation in text (…) i.e. understanding the content of serial arguments, their
linguistic structure, the relationship between the preceding and following arguments
(…).” (Mochales, R. & Moens, MF. Artif Intell Law (2011) 19: 1.
https://doi.org/10.1007/s10506-010-9104-x)
• Argumentation mining can be considered as a subset of text mining. If you are
planning to locally store non-open-access content during an argumentation mining
project, please get in contact with tdm@springernature.com to discuss options.
Argumentation Mining
2121
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
APIs and
query examples
22
• metadata: general xml metadata using PAM standard (Prism Aggregate format); uses prism and
dc (dublincore) namespaces and elements. In use but fixed (i.e., no changes will be made).
• meta/v1: based on metadata API. Has PAM standard but additional elements added for various
business reasons. Fluid API to which changes can be easily made.
• openaccess: complete, “fulltext” (i.e., body) xml where available; when fulltext/body not
available, xml metadata is provided.
• integro: journal-level information. Based on SpringerLink journal xml.
• xmldata: JATS format for fulltext as available; else metadata.
• Throttle: recommended crawling rate of 150 hits/minute and 7500/day.
• Key: special key provided for TDM.
• All APIs have xml and json except for xmldata and integro.
• A full picture of Springer Nature’s API offerings with detailed information, examples and API key
sign-up can be accessed via https://api.springernature.com.
Springer Nature APIs
23
Industry standard XML schema. Development completed May 2017 for journals.
JATS
24
• Citations APIs (internal)
• Article/chapter count: returns the number of citations for a given DOI.
• Book-level count: returns total count of citations for all the chapters in a book.
• Journal-level count: returns total count of citations for all the articles in a journal.
• Show citations: returns actual citations for a given DOI.
• Recent citations: provides recent citation data from Crossref.
Springer Nature APIs
25
- 1 billion facts / 200 gb download size
- Licensing: CC0, CC-BY-NC & CC-BY
Metadata about:
- Journals & Articles (8M) + Abstracts
- Books & Chapters (4M)
- Grants (200k), Research Organizations,
Subjects, Conferences, Ontologies
SN SciGraph APIs: Linked Data API or
Redirect API; more details here
http://scigraph.springernature.com/explorer/api/
Linked Open Data Publishing So Far
http://scigraph.springernature.com
26
• Example Queries
• DOI: http://api.springer.com/metadata/pam?q=doi:10.1007/s10404-009-0428-
3&p=2&api_key=X
• Subject: http://api.springer.com/openaccess/jats?q=subject:Physics&api_key=X
• Keyword:
http://api.springer.com/metadata/pam?q=keyword:patients%20sort:date&api_key=X
• Year: http://api.springer.com/openaccess/jats?q=year:2011&api_key=X
• Country:
http://api.springer.com/openaccess/jats?q=country:%22New%20Zealand%22&p=1&api_key=X
• ISBN: http://api.springer.com/meta/v1/pam?q=isbn:978-0-387-79148-7&api_key=X
• ISSN: http://api.springer.com/openaccess/jats?q=issn:1432-086X&api_key=X
Springer Nature APIs => General Constraints
27
• Example Queries (cont’d)
• Volume, issue:
http://api.springer.com/openaccess/jats?q=journal%3A%22CardioVascular%20and%20Interve
ntional%20Radiology%22%20volume%3A34%20issue%3A6&p=1&api_key=X
• Date: http://api.springer.com/metadata/pam?q=date:2010-03-01&api_key=X
• Issue type (meta, metadata):
http://api.springer.com/meta/v1/pam?q=journal%3A%22CardioVascular%20and%20Interventi
onal%20Radiology%22%20volume%3A37%20issue%3A2%20issuetype%3ARegular%20&api_ke
y=X
• Journalid (meta, metadata):
http://api.springer.com/meta/v1/pam?q=journalid%3A270%20heart%20%20sort%3Adate%20
%20&api_key=X
• Open Access (meta):
http://api.springer.com/meta/v1/pam?q=journalid:259%20openaccess:true&api_key=X
Springer Nature APIs => General Constraints
28
near query
• The purpose of the “near” query is to find terms within x number of words from each other. In the
highlighted case, we use “engineering” which can have various meanings and contexts:
• The “engineering NEAR/3 locomotive” API call
http://api.springer.com/xmldata/jats?q=engineering%20NEAR/3%20locomotive&api_key=X&p=5
yields four results where “engineering” is within 3 words of “locomotive” which is what was specified in
the API call.
• The “engineering NEAR/5 metals” API call
http://api.springer.com/xmldata/jats?q=engineering%20NEAR/5%20metals&api_key=X&p=5
yields four results where “engineering” is within 5 words of “metals”.
29
metadata API
• http://api.springer.com/metadata/pam?q=doi:10.1007/s10404-009-0428-3&p=2&api_key=X
• General Elements
• identifier (doi)
• title (of article/chapter)
• creator (author)
• publicationName (book, journal title)
• issn/isbn (electronic)
• genre (ArticleType/ChapterType)
• journalID
• volume
• number (issue)
• startingPage
• openAccess
• publisher
• publicationDate
• url (resolver)
• copyright
• abstract
30
meta API
• http://api.springer.com/meta/v1/pam?q=journalid:11671%20openaccess:true&api_key=X
• General Elements
• (Same as metadata, except as below)
• Casper pdf and html urls
• issueType (Regular, Supplement)
• topicalCollection (ArticleCollection; e.g.,
topicalcollection:Topics in Flow Control)
• Can query with openaccess:True
31
xmldata/fulltext API
• http://api.springer.com/xmldata/jats?q=doi:10.1007/s11258-006-9242-0&api_key=X
• http://api.springer.com/xmldata/jats?q=journalid:270%20year:2017&api_key=X&s=1&p=3
• http://api.springer.com/xmldata/jats?q=isbn:978-1-59259-559-4&api_key=X&s=1&p=3
• Returns fulltext JATS xml as available; for internal use (except as indicated elsewhere).
• Querying constraints
• doi
• journalid
• issn
• journal
• isbn
• book
• pub (publication)
• title
• name
• subject
• date
• year
• keyword
• type (Book or Journal)
• country
• JournalOnlineFirst (true or false)
• Orgname
• empty (no constraint, general
search)
32
Open Access API
• http://api.springer.com/openaccess/jats?q=issn:1432-086X&api_key=X
• Returns fulltext JATS xml for Open Access articles and chapters
33
Citations APIs (for Journal Articles)
• Article/chapter count: returns the number of citations for a given DOI
• http://citationsapi.nkb3.org/article/citations/count?doi=10.1007/s00216-007-1222-2
• Show citations: returns actual citations for a given DOI.
• http://citationsapi.nkb3.org/article/citations?doi=10.1007/s00216-007-1222-2
• Data for citations APIs comes from
the responses to requests we make
to Crossref to find out who is citing
our data.
34
Citations APIs (for Book Chapters and Journals)
• Book-level count: returns total count of citations for all the chapters in a book.
• http://citationsapi.springerservice.com/book/citations/count?doi=10.1007/978-3-642-24040-9
• Journal-level count: returns total count of citations for all the articles in a journal.
• http://citationsapi.springerservice.com/journal/367/citations/count
• Citations Combination Count and Display API:
• http://citationsapi.springerservice.com/book/citations?doi=10.1007/978-3-540-30299-
5&s=51&p=20
35
Citations APIs
• Recent citations: live feed receiving daily Crossref data; not just about recent Springer Nature
articles; this is about ANY new article citing (/citation) a Springer Nature article (/item) from
any time.
• http://citations.springer.com/xml/recent_citations/2015-02-03?start=1&end=10
• http://citations.springer.com/xml/recent_citations/2015-02-03?maxResults=10
3636
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
Wrap Up
3737
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
Closing remarks
38
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
38
Springer Nature Publications About TDM
39
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
39
• Springer Nature also offers direct metadata delivery options in various formats, such
as ONIX or MARC records, using different protocols (ftp/ftps, sftp) including the
Open Archives Initiative for metadata harvesting (OAI-PMH).
• For direct metadata download features via the metadata downloader see
https://metadata.springernature.com.
• If you have requests for metadata delivery please contact
dds.support@springernature.com.
Metadata Delivery
40
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
40
Metadata Delivery
42
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
42
Information professionals, knowledge workers and
librarians have a long familiarity with large sets of
information. What has changed is the development of
sophisticated tools for text- and data-mining (TDM) of
disparate data sets.
In this white paper, Mary Ellen Bates will offer insights
into how “info pros” can raise their role in TDM projects
within their organizations by better appreciation for
what aspects of TDM most benefit from a data
scientist’s point of view and how “info pros” can best
leverage their professional expertise in this new field.
Whitepaper: Bringing Insight to Data:
Info Professionals’ Role in Text and Data Mining
Dr. Michele Cristovao
Project and Innovation Manager
Database Research Group
Applications of Artificial Intelligence in
Life Sciences and Nanotech research
11th of October – 3pm, right here!
44
Thank you!
Markus Kaindl
markus.kaindl@springernature.com
Text and Data Mining at Springer Nature
https://www.springernature.com/text-and-data-mining
Contact:
tdm@springernature.com
Visit us at the Springer Nature Stand (Hall 4.2 F8)
4545
Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018
Q&A session

More Related Content

What's hot

Kaggle and data science
Kaggle and data scienceKaggle and data science
Kaggle and data science
Akira Shibata
 
深層学習による非滑らかな関数の推定
深層学習による非滑らかな関数の推定深層学習による非滑らかな関数の推定
深層学習による非滑らかな関数の推定
Masaaki Imaizumi
 
The Art of Business Storytelling with Data
The Art of Business Storytelling with DataThe Art of Business Storytelling with Data
The Art of Business Storytelling with Data
Andrés Fortino, PhD
 
Build Intelligent Fraud Prevention with Machine Learning and Graphs
Build Intelligent Fraud Prevention with Machine Learning and GraphsBuild Intelligent Fraud Prevention with Machine Learning and Graphs
Build Intelligent Fraud Prevention with Machine Learning and Graphs
Neo4j
 
Fact Check Your Data - Data.Monks.pptx
Fact Check Your Data - Data.Monks.pptxFact Check Your Data - Data.Monks.pptx
Fact Check Your Data - Data.Monks.pptx
Doug Hall
 
効果のあるクリエイティブ広告の見つけ方(Contextual Bandit + TS or UCB)
効果のあるクリエイティブ広告の見つけ方(Contextual Bandit + TS or UCB)効果のあるクリエイティブ広告の見つけ方(Contextual Bandit + TS or UCB)
効果のあるクリエイティブ広告の見つけ方(Contextual Bandit + TS or UCB)
Yusuke Kaneko
 
An Introduction to the Google Analytics 360 Suite
An Introduction to the Google Analytics 360 SuiteAn Introduction to the Google Analytics 360 Suite
An Introduction to the Google Analytics 360 Suite
Search Laboratory
 
ニューラルネットワーク入門
ニューラルネットワーク入門ニューラルネットワーク入門
ニューラルネットワーク入門
naoto moriyama
 
クラシックな機械学習入門 1 導入
クラシックな機械学習入門 1 導入クラシックな機械学習入門 1 導入
クラシックな機械学習入門 1 導入
Hiroshi Nakagawa
 
Featurizing log data before XGBoost
Featurizing log data before XGBoostFeaturizing log data before XGBoost
Featurizing log data before XGBoost
DataRobot
 
Music Personalization : Real time Platforms.
Music Personalization : Real time Platforms.Music Personalization : Real time Platforms.
Music Personalization : Real time Platforms.
Esh Vckay
 
TokyoNLP#5 パーセプトロンで楽しい仲間がぽぽぽぽ~ん
TokyoNLP#5 パーセプトロンで楽しい仲間がぽぽぽぽ~んTokyoNLP#5 パーセプトロンで楽しい仲間がぽぽぽぽ~ん
TokyoNLP#5 パーセプトロンで楽しい仲間がぽぽぽぽ~ん
sleepy_yoshi
 
Is Consent Mode Working.pdf
Is Consent Mode Working.pdfIs Consent Mode Working.pdf
Is Consent Mode Working.pdf
Doug Hall
 
Ist Google Analytics noch zu retten?
Ist Google Analytics noch zu retten?Ist Google Analytics noch zu retten?
Ist Google Analytics noch zu retten?
📊 Markus Baersch
 
多目的遺伝的アルゴリズム
多目的遺伝的アルゴリズム多目的遺伝的アルゴリズム
多目的遺伝的アルゴリズム
MatsuiRyo
 
数式を使わないプライバシー保護技術
数式を使わないプライバシー保護技術数式を使わないプライバシー保護技術
数式を使わないプライバシー保護技術
Hiroshi Nakagawa
 
Visualization For Data Science
Visualization For Data ScienceVisualization For Data Science
Visualization For Data Science
Angela Zoss
 
ベイジアンネットワーク入門
ベイジアンネットワーク入門ベイジアンネットワーク入門
ベイジアンネットワーク入門Keisuke OTAKI
 
GPT and Graph Data Science to power your Knowledge Graph
GPT and Graph Data Science to power your Knowledge GraphGPT and Graph Data Science to power your Knowledge Graph
GPT and Graph Data Science to power your Knowledge Graph
Neo4j
 
車両運行管理システムのためのデータ整備と機械学習の活用
車両運行管理システムのためのデータ整備と機械学習の活用車両運行管理システムのためのデータ整備と機械学習の活用
車両運行管理システムのためのデータ整備と機械学習の活用
Eiji Sekiya
 

What's hot (20)

Kaggle and data science
Kaggle and data scienceKaggle and data science
Kaggle and data science
 
深層学習による非滑らかな関数の推定
深層学習による非滑らかな関数の推定深層学習による非滑らかな関数の推定
深層学習による非滑らかな関数の推定
 
The Art of Business Storytelling with Data
The Art of Business Storytelling with DataThe Art of Business Storytelling with Data
The Art of Business Storytelling with Data
 
Build Intelligent Fraud Prevention with Machine Learning and Graphs
Build Intelligent Fraud Prevention with Machine Learning and GraphsBuild Intelligent Fraud Prevention with Machine Learning and Graphs
Build Intelligent Fraud Prevention with Machine Learning and Graphs
 
Fact Check Your Data - Data.Monks.pptx
Fact Check Your Data - Data.Monks.pptxFact Check Your Data - Data.Monks.pptx
Fact Check Your Data - Data.Monks.pptx
 
効果のあるクリエイティブ広告の見つけ方(Contextual Bandit + TS or UCB)
効果のあるクリエイティブ広告の見つけ方(Contextual Bandit + TS or UCB)効果のあるクリエイティブ広告の見つけ方(Contextual Bandit + TS or UCB)
効果のあるクリエイティブ広告の見つけ方(Contextual Bandit + TS or UCB)
 
An Introduction to the Google Analytics 360 Suite
An Introduction to the Google Analytics 360 SuiteAn Introduction to the Google Analytics 360 Suite
An Introduction to the Google Analytics 360 Suite
 
ニューラルネットワーク入門
ニューラルネットワーク入門ニューラルネットワーク入門
ニューラルネットワーク入門
 
クラシックな機械学習入門 1 導入
クラシックな機械学習入門 1 導入クラシックな機械学習入門 1 導入
クラシックな機械学習入門 1 導入
 
Featurizing log data before XGBoost
Featurizing log data before XGBoostFeaturizing log data before XGBoost
Featurizing log data before XGBoost
 
Music Personalization : Real time Platforms.
Music Personalization : Real time Platforms.Music Personalization : Real time Platforms.
Music Personalization : Real time Platforms.
 
TokyoNLP#5 パーセプトロンで楽しい仲間がぽぽぽぽ~ん
TokyoNLP#5 パーセプトロンで楽しい仲間がぽぽぽぽ~んTokyoNLP#5 パーセプトロンで楽しい仲間がぽぽぽぽ~ん
TokyoNLP#5 パーセプトロンで楽しい仲間がぽぽぽぽ~ん
 
Is Consent Mode Working.pdf
Is Consent Mode Working.pdfIs Consent Mode Working.pdf
Is Consent Mode Working.pdf
 
Ist Google Analytics noch zu retten?
Ist Google Analytics noch zu retten?Ist Google Analytics noch zu retten?
Ist Google Analytics noch zu retten?
 
多目的遺伝的アルゴリズム
多目的遺伝的アルゴリズム多目的遺伝的アルゴリズム
多目的遺伝的アルゴリズム
 
数式を使わないプライバシー保護技術
数式を使わないプライバシー保護技術数式を使わないプライバシー保護技術
数式を使わないプライバシー保護技術
 
Visualization For Data Science
Visualization For Data ScienceVisualization For Data Science
Visualization For Data Science
 
ベイジアンネットワーク入門
ベイジアンネットワーク入門ベイジアンネットワーク入門
ベイジアンネットワーク入門
 
GPT and Graph Data Science to power your Knowledge Graph
GPT and Graph Data Science to power your Knowledge GraphGPT and Graph Data Science to power your Knowledge Graph
GPT and Graph Data Science to power your Knowledge Graph
 
車両運行管理システムのためのデータ整備と機械学習の活用
車両運行管理システムのためのデータ整備と機械学習の活用車両運行管理システムのためのデータ整備と機械学習の活用
車両運行管理システムのためのデータ整備と機械学習の活用
 

Similar to Text and Data Mining at Springer Nature

OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017
openminted_eu
 
Text Mining: the next data frontier. Beyond Open Access
Text Mining: the next data frontier. Beyond Open AccessText Mining: the next data frontier. Beyond Open Access
Text Mining: the next data frontier. Beyond Open Access
openminted_eu
 
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
OpenAIRE
 
OpenMinteD Project - building a TDM infrastructure
OpenMinteD Project - building a TDM infrastructureOpenMinteD Project - building a TDM infrastructure
OpenMinteD Project - building a TDM infrastructure
FutureTDM
 
Open government data portals: from publishing to use and impact
Open government data portals: from publishing to use and impactOpen government data portals: from publishing to use and impact
Open government data portals: from publishing to use and impact
Elena Simperl
 
A Data-driven Approach for Internet of Things Applications: Methods and Case ...
A Data-driven Approach for Internet of Things Applications: Methods and Case ...A Data-driven Approach for Internet of Things Applications: Methods and Case ...
A Data-driven Approach for Internet of Things Applications: Methods and Case ...
Suparna De
 
Linda Treude, Sabine Wolf: Features for the Future Library #bcs2015
Linda Treude, Sabine Wolf: Features for the Future Library #bcs2015Linda Treude, Sabine Wolf: Features for the Future Library #bcs2015
Linda Treude, Sabine Wolf: Features for the Future Library #bcs2015
KISK FF MU
 
QURATOR: A Flexible AI Platform for the Adaptive Analysis and Creative Genera...
QURATOR: A Flexible AI Platform for the Adaptive Analysis and Creative Genera...QURATOR: A Flexible AI Platform for the Adaptive Analysis and Creative Genera...
QURATOR: A Flexible AI Platform for the Adaptive Analysis and Creative Genera...
Georg Rehm
 
Supporting Springer Nature Editors by means of Semantic Technologies
Supporting Springer Nature Editors by means of Semantic TechnologiesSupporting Springer Nature Editors by means of Semantic Technologies
Supporting Springer Nature Editors by means of Semantic Technologies
Francesco Osborne
 
The Breakdown: What is OpenMinTeD?
The Breakdown: What is OpenMinTeD?The Breakdown: What is OpenMinTeD?
The Breakdown: What is OpenMinTeD?
openminted_eu
 
Data Mining Newspapers Metadata
Data Mining Newspapers MetadataData Mining Newspapers Metadata
Data Mining Newspapers Metadata
Jean-Philippe Moreux
 
APIdays 2018 BnF API projects
APIdays 2018 BnF API projectsAPIdays 2018 BnF API projects
APIdays 2018 BnF API projects
Isabelle REUSA
 
Introduction to OpenAIRE services and the OpenAIRE Research Graph
Introduction to OpenAIRE services and the OpenAIRE Research GraphIntroduction to OpenAIRE services and the OpenAIRE Research Graph
Introduction to OpenAIRE services and the OpenAIRE Research Graph
OpenAIRE
 
ICIC 2017: Building a Linked Data Knowledge Graph for the Scholarly Publishin...
ICIC 2017: Building a Linked Data Knowledge Graph for the Scholarly Publishin...ICIC 2017: Building a Linked Data Knowledge Graph for the Scholarly Publishin...
ICIC 2017: Building a Linked Data Knowledge Graph for the Scholarly Publishin...
Dr. Haxel Consult
 
A Corpus of Chinese Comic Books: Database, Metadata, and Visual Object Recogn...
A Corpus of Chinese Comic Books: Database, Metadata, and Visual Object Recogn...A Corpus of Chinese Comic Books: Database, Metadata, and Visual Object Recogn...
A Corpus of Chinese Comic Books: Database, Metadata, and Visual Object Recogn...
Matthias Arnold
 
Kaindl "Managing Information Overload"
Kaindl "Managing Information Overload"Kaindl "Managing Information Overload"
Kaindl "Managing Information Overload"
National Information Standards Organization (NISO)
 
OpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of DataOpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of Data
openminted_eu
 
Nextérité: Semantic Business Services
Nextérité: Semantic Business ServicesNextérité: Semantic Business Services
Nextérité: Semantic Business Services
Edith Nuss
 
Ontos NLP Stack, Sep. 2016
Ontos NLP Stack, Sep. 2016Ontos NLP Stack, Sep. 2016
Ontos NLP Stack, Sep. 2016
Martin Voigt
 
Martin Voigt | Streaming-based Text Mining using Deep Learning and Semantics
Martin Voigt | Streaming-based Text Mining using Deep Learning and SemanticsMartin Voigt | Streaming-based Text Mining using Deep Learning and Semantics
Martin Voigt | Streaming-based Text Mining using Deep Learning and Semantics
semanticsconference
 

Similar to Text and Data Mining at Springer Nature (20)

OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017
 
Text Mining: the next data frontier. Beyond Open Access
Text Mining: the next data frontier. Beyond Open AccessText Mining: the next data frontier. Beyond Open Access
Text Mining: the next data frontier. Beyond Open Access
 
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
 
OpenMinteD Project - building a TDM infrastructure
OpenMinteD Project - building a TDM infrastructureOpenMinteD Project - building a TDM infrastructure
OpenMinteD Project - building a TDM infrastructure
 
Open government data portals: from publishing to use and impact
Open government data portals: from publishing to use and impactOpen government data portals: from publishing to use and impact
Open government data portals: from publishing to use and impact
 
A Data-driven Approach for Internet of Things Applications: Methods and Case ...
A Data-driven Approach for Internet of Things Applications: Methods and Case ...A Data-driven Approach for Internet of Things Applications: Methods and Case ...
A Data-driven Approach for Internet of Things Applications: Methods and Case ...
 
Linda Treude, Sabine Wolf: Features for the Future Library #bcs2015
Linda Treude, Sabine Wolf: Features for the Future Library #bcs2015Linda Treude, Sabine Wolf: Features for the Future Library #bcs2015
Linda Treude, Sabine Wolf: Features for the Future Library #bcs2015
 
QURATOR: A Flexible AI Platform for the Adaptive Analysis and Creative Genera...
QURATOR: A Flexible AI Platform for the Adaptive Analysis and Creative Genera...QURATOR: A Flexible AI Platform for the Adaptive Analysis and Creative Genera...
QURATOR: A Flexible AI Platform for the Adaptive Analysis and Creative Genera...
 
Supporting Springer Nature Editors by means of Semantic Technologies
Supporting Springer Nature Editors by means of Semantic TechnologiesSupporting Springer Nature Editors by means of Semantic Technologies
Supporting Springer Nature Editors by means of Semantic Technologies
 
The Breakdown: What is OpenMinTeD?
The Breakdown: What is OpenMinTeD?The Breakdown: What is OpenMinTeD?
The Breakdown: What is OpenMinTeD?
 
Data Mining Newspapers Metadata
Data Mining Newspapers MetadataData Mining Newspapers Metadata
Data Mining Newspapers Metadata
 
APIdays 2018 BnF API projects
APIdays 2018 BnF API projectsAPIdays 2018 BnF API projects
APIdays 2018 BnF API projects
 
Introduction to OpenAIRE services and the OpenAIRE Research Graph
Introduction to OpenAIRE services and the OpenAIRE Research GraphIntroduction to OpenAIRE services and the OpenAIRE Research Graph
Introduction to OpenAIRE services and the OpenAIRE Research Graph
 
ICIC 2017: Building a Linked Data Knowledge Graph for the Scholarly Publishin...
ICIC 2017: Building a Linked Data Knowledge Graph for the Scholarly Publishin...ICIC 2017: Building a Linked Data Knowledge Graph for the Scholarly Publishin...
ICIC 2017: Building a Linked Data Knowledge Graph for the Scholarly Publishin...
 
A Corpus of Chinese Comic Books: Database, Metadata, and Visual Object Recogn...
A Corpus of Chinese Comic Books: Database, Metadata, and Visual Object Recogn...A Corpus of Chinese Comic Books: Database, Metadata, and Visual Object Recogn...
A Corpus of Chinese Comic Books: Database, Metadata, and Visual Object Recogn...
 
Kaindl "Managing Information Overload"
Kaindl "Managing Information Overload"Kaindl "Managing Information Overload"
Kaindl "Managing Information Overload"
 
OpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of DataOpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of Data
 
Nextérité: Semantic Business Services
Nextérité: Semantic Business ServicesNextérité: Semantic Business Services
Nextérité: Semantic Business Services
 
Ontos NLP Stack, Sep. 2016
Ontos NLP Stack, Sep. 2016Ontos NLP Stack, Sep. 2016
Ontos NLP Stack, Sep. 2016
 
Martin Voigt | Streaming-based Text Mining using Deep Learning and Semantics
Martin Voigt | Streaming-based Text Mining using Deep Learning and SemanticsMartin Voigt | Streaming-based Text Mining using Deep Learning and Semantics
Martin Voigt | Streaming-based Text Mining using Deep Learning and Semantics
 

Recently uploaded

The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 

Recently uploaded (20)

The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 

Text and Data Mining at Springer Nature

  • 1. Text and Data Mining at Springer Nature Markus Kaindl Senior Manager Semantic Data Frankfurt Book Fair October 11th 2018
  • 2. 1 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 1 Agenda (25 Min Talk + 5 Min Q&A) Background • General introduction to TDM • Use cases in science and beyond TDM @ SN • Springer Nature TDM Framework • APIs and query examples Wrap Up • Closing remarks • Q&A session
  • 3. 22 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 Background
  • 4. 33 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 General introduction to TDM
  • 5. 4 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 4 Text and Data Mining (TDM) is the (semi-)automated process of selecting and analyzing large amounts of text or data resources for purposes such as • searching, • finding patterns, • discovering relationships, • semantic analysis and • learning how content relates to ideas and needs in a way that can provide valuable information needed for studies, research, etc. What is Text and Data Mining?
  • 6. 5 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 5 The variety of TDM techniques, practices and end-goals makes it virtually impossible to provide a general and exhaustive illustration of how TDM works. By means of a necessary simplification, it appears however possible to distinguish three common – yet not all necessary – steps in TDM processes: 1. Access to content; 2. Extraction and/or copying of content; 3. Mining of text and/or data and knowledge discovery. Different steps in the process of TDM Text taken from: Dr Eleonora Rosati, Associate Professor in Intellectual Property Law at the University of Southampton Policy Department for Citizens' Rights and Constitutional Affairs; European Parliament; PE 604.942
  • 7. 6 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 Step 3 - Mining of text and/or data and knowledge discovery Image taken from: Dr Eleonora Rosati, Associate Professor in Intellectual Property Law at the University of Southampton Policy Department for Citizens' Rights and Constitutional Affairs; European Parliament; PE 604.942
  • 8. 77 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 Use cases in science and beyond
  • 9. 8 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 8 • Searching and archiving • Metadata display • Knowledge retrieval • Multisource library • Online portals • Apps, products and solutions • Fraud detection (banking, insurance) and spam filtering - Some build in our APIs into their local search facility to simply improve their researchers’ discovery experience, some others do specific term or data mining. - Some go very deep into semantic text mining, some others want to retrieve hidden research data structures as part of given science programs. - Some even may use TDM to investigate competitor activities and staff. Text and Data Mining => Use Cases
  • 10. 9 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 • User runs search on “Clinical Orthopaedics and Related Research” journal portal, e. g. hip cartilage • Though the user is searching the portal our API is being called and the latest results are returned on the fly. Examplatory query on an external journal portal
  • 11. 10 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 Research Use Cases
  • 12. 11 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 11 - Assist banks with traditional analyses of industries and sectors - Serve marketing to visually track information spread and influence paths across media segments in real-time - In the field of criminology, it is believed that TDM can help exploring and detecting crimes and their relationships with criminals, so to assist police forces. - Support anthropologists in their studies of cultural phenomena by mining the ever- growing content made available through social networks - Please note that the same technology may perform TDM for different purposes: For instance, IBM Watson Explorer has been used –among others– to: - Improve productivity in the workplace - Increase efficiency in public health management - Improve diagnostic information - Allow for tailored drug recommendations - Prevent the commission of crimes, including cybercrimes and hacks - Create new culinary recipes TDM use cases not limited to the field of research
  • 13. 12 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 12 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 TDM @ SN
  • 14. 1313 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 Springer Nature TDM Framework
  • 15. 14 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 14 Text and Data Mining => A TDM Framework for Springer Nature https://www.springernature.com/text-and-data-mining
  • 16. 15 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 15 • A growing part of Springer Nature’s journal articles is published open access. • TDM is usually allowed without restrictions for these publications since the majority of Springer Nature open access content is licensed under CC-BY. Text and Data Mining
  • 17. 16 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 16 • Springer Nature is participating in the Crossref TDM working group and is recommending Crossref services for pan-publisher TDM. • For further information see http://tdmsupport.crossref.org/ Text and Data Mining at CrossRef
  • 18. 17 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 17 • For subscribers Springer Nature offers a large variety of TDM tools such as metadata and fulltext APIs, applicable to both open access and subscribed resources. • See https://api.springernature.com Text and Data Mining for Subscribers
  • 19. 18 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 18 • Non-subscribers are offered a variety of TDM tools for our Open Access resources, such as our Open Access fulltext API (see https://api.springernature.com). • TDM requirements from non-subscribers for pay-walled content are treated on a case-by-case basis. • Please contact tdm@springernature.com. Text and Data Mining for Non-Subscribers
  • 20. 19 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 19 • At the moment Springer Nature does not offer any API for image mining. Image Mining
  • 21. 20 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 20 • “Argumentation mining aims to automatically detect, classify and structure argumentation in text (…) i.e. understanding the content of serial arguments, their linguistic structure, the relationship between the preceding and following arguments (…).” (Mochales, R. & Moens, MF. Artif Intell Law (2011) 19: 1. https://doi.org/10.1007/s10506-010-9104-x) • Argumentation mining can be considered as a subset of text mining. If you are planning to locally store non-open-access content during an argumentation mining project, please get in contact with tdm@springernature.com to discuss options. Argumentation Mining
  • 22. 2121 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 APIs and query examples
  • 23. 22 • metadata: general xml metadata using PAM standard (Prism Aggregate format); uses prism and dc (dublincore) namespaces and elements. In use but fixed (i.e., no changes will be made). • meta/v1: based on metadata API. Has PAM standard but additional elements added for various business reasons. Fluid API to which changes can be easily made. • openaccess: complete, “fulltext” (i.e., body) xml where available; when fulltext/body not available, xml metadata is provided. • integro: journal-level information. Based on SpringerLink journal xml. • xmldata: JATS format for fulltext as available; else metadata. • Throttle: recommended crawling rate of 150 hits/minute and 7500/day. • Key: special key provided for TDM. • All APIs have xml and json except for xmldata and integro. • A full picture of Springer Nature’s API offerings with detailed information, examples and API key sign-up can be accessed via https://api.springernature.com. Springer Nature APIs
  • 24. 23 Industry standard XML schema. Development completed May 2017 for journals. JATS
  • 25. 24 • Citations APIs (internal) • Article/chapter count: returns the number of citations for a given DOI. • Book-level count: returns total count of citations for all the chapters in a book. • Journal-level count: returns total count of citations for all the articles in a journal. • Show citations: returns actual citations for a given DOI. • Recent citations: provides recent citation data from Crossref. Springer Nature APIs
  • 26. 25 - 1 billion facts / 200 gb download size - Licensing: CC0, CC-BY-NC & CC-BY Metadata about: - Journals & Articles (8M) + Abstracts - Books & Chapters (4M) - Grants (200k), Research Organizations, Subjects, Conferences, Ontologies SN SciGraph APIs: Linked Data API or Redirect API; more details here http://scigraph.springernature.com/explorer/api/ Linked Open Data Publishing So Far http://scigraph.springernature.com
  • 27. 26 • Example Queries • DOI: http://api.springer.com/metadata/pam?q=doi:10.1007/s10404-009-0428- 3&p=2&api_key=X • Subject: http://api.springer.com/openaccess/jats?q=subject:Physics&api_key=X • Keyword: http://api.springer.com/metadata/pam?q=keyword:patients%20sort:date&api_key=X • Year: http://api.springer.com/openaccess/jats?q=year:2011&api_key=X • Country: http://api.springer.com/openaccess/jats?q=country:%22New%20Zealand%22&p=1&api_key=X • ISBN: http://api.springer.com/meta/v1/pam?q=isbn:978-0-387-79148-7&api_key=X • ISSN: http://api.springer.com/openaccess/jats?q=issn:1432-086X&api_key=X Springer Nature APIs => General Constraints
  • 28. 27 • Example Queries (cont’d) • Volume, issue: http://api.springer.com/openaccess/jats?q=journal%3A%22CardioVascular%20and%20Interve ntional%20Radiology%22%20volume%3A34%20issue%3A6&p=1&api_key=X • Date: http://api.springer.com/metadata/pam?q=date:2010-03-01&api_key=X • Issue type (meta, metadata): http://api.springer.com/meta/v1/pam?q=journal%3A%22CardioVascular%20and%20Interventi onal%20Radiology%22%20volume%3A37%20issue%3A2%20issuetype%3ARegular%20&api_ke y=X • Journalid (meta, metadata): http://api.springer.com/meta/v1/pam?q=journalid%3A270%20heart%20%20sort%3Adate%20 %20&api_key=X • Open Access (meta): http://api.springer.com/meta/v1/pam?q=journalid:259%20openaccess:true&api_key=X Springer Nature APIs => General Constraints
  • 29. 28 near query • The purpose of the “near” query is to find terms within x number of words from each other. In the highlighted case, we use “engineering” which can have various meanings and contexts: • The “engineering NEAR/3 locomotive” API call http://api.springer.com/xmldata/jats?q=engineering%20NEAR/3%20locomotive&api_key=X&p=5 yields four results where “engineering” is within 3 words of “locomotive” which is what was specified in the API call. • The “engineering NEAR/5 metals” API call http://api.springer.com/xmldata/jats?q=engineering%20NEAR/5%20metals&api_key=X&p=5 yields four results where “engineering” is within 5 words of “metals”.
  • 30. 29 metadata API • http://api.springer.com/metadata/pam?q=doi:10.1007/s10404-009-0428-3&p=2&api_key=X • General Elements • identifier (doi) • title (of article/chapter) • creator (author) • publicationName (book, journal title) • issn/isbn (electronic) • genre (ArticleType/ChapterType) • journalID • volume • number (issue) • startingPage • openAccess • publisher • publicationDate • url (resolver) • copyright • abstract
  • 31. 30 meta API • http://api.springer.com/meta/v1/pam?q=journalid:11671%20openaccess:true&api_key=X • General Elements • (Same as metadata, except as below) • Casper pdf and html urls • issueType (Regular, Supplement) • topicalCollection (ArticleCollection; e.g., topicalcollection:Topics in Flow Control) • Can query with openaccess:True
  • 32. 31 xmldata/fulltext API • http://api.springer.com/xmldata/jats?q=doi:10.1007/s11258-006-9242-0&api_key=X • http://api.springer.com/xmldata/jats?q=journalid:270%20year:2017&api_key=X&s=1&p=3 • http://api.springer.com/xmldata/jats?q=isbn:978-1-59259-559-4&api_key=X&s=1&p=3 • Returns fulltext JATS xml as available; for internal use (except as indicated elsewhere). • Querying constraints • doi • journalid • issn • journal • isbn • book • pub (publication) • title • name • subject • date • year • keyword • type (Book or Journal) • country • JournalOnlineFirst (true or false) • Orgname • empty (no constraint, general search)
  • 33. 32 Open Access API • http://api.springer.com/openaccess/jats?q=issn:1432-086X&api_key=X • Returns fulltext JATS xml for Open Access articles and chapters
  • 34. 33 Citations APIs (for Journal Articles) • Article/chapter count: returns the number of citations for a given DOI • http://citationsapi.nkb3.org/article/citations/count?doi=10.1007/s00216-007-1222-2 • Show citations: returns actual citations for a given DOI. • http://citationsapi.nkb3.org/article/citations?doi=10.1007/s00216-007-1222-2 • Data for citations APIs comes from the responses to requests we make to Crossref to find out who is citing our data.
  • 35. 34 Citations APIs (for Book Chapters and Journals) • Book-level count: returns total count of citations for all the chapters in a book. • http://citationsapi.springerservice.com/book/citations/count?doi=10.1007/978-3-642-24040-9 • Journal-level count: returns total count of citations for all the articles in a journal. • http://citationsapi.springerservice.com/journal/367/citations/count • Citations Combination Count and Display API: • http://citationsapi.springerservice.com/book/citations?doi=10.1007/978-3-540-30299- 5&s=51&p=20
  • 36. 35 Citations APIs • Recent citations: live feed receiving daily Crossref data; not just about recent Springer Nature articles; this is about ANY new article citing (/citation) a Springer Nature article (/item) from any time. • http://citations.springer.com/xml/recent_citations/2015-02-03?start=1&end=10 • http://citations.springer.com/xml/recent_citations/2015-02-03?maxResults=10
  • 37. 3636 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 Wrap Up
  • 38. 3737 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 Closing remarks
  • 39. 38 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 38 Springer Nature Publications About TDM
  • 40. 39 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 39 • Springer Nature also offers direct metadata delivery options in various formats, such as ONIX or MARC records, using different protocols (ftp/ftps, sftp) including the Open Archives Initiative for metadata harvesting (OAI-PMH). • For direct metadata download features via the metadata downloader see https://metadata.springernature.com. • If you have requests for metadata delivery please contact dds.support@springernature.com. Metadata Delivery
  • 41. 40 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 40 Metadata Delivery
  • 42. 42 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 42 Information professionals, knowledge workers and librarians have a long familiarity with large sets of information. What has changed is the development of sophisticated tools for text- and data-mining (TDM) of disparate data sets. In this white paper, Mary Ellen Bates will offer insights into how “info pros” can raise their role in TDM projects within their organizations by better appreciation for what aspects of TDM most benefit from a data scientist’s point of view and how “info pros” can best leverage their professional expertise in this new field. Whitepaper: Bringing Insight to Data: Info Professionals’ Role in Text and Data Mining
  • 43. Dr. Michele Cristovao Project and Innovation Manager Database Research Group Applications of Artificial Intelligence in Life Sciences and Nanotech research 11th of October – 3pm, right here!
  • 44. 44 Thank you! Markus Kaindl markus.kaindl@springernature.com Text and Data Mining at Springer Nature https://www.springernature.com/text-and-data-mining Contact: tdm@springernature.com Visit us at the Springer Nature Stand (Hall 4.2 F8)
  • 45. 4545 Text and Data Mining at Springer Nature | Markus Kaindl | Frankfurt Book Fair | October 2018 Q&A session