Text Mining - as Normal as Data Mining?
Andrew Hinton, Application Specialist
II-SDV 2016, Tuesday 19th April 2016, Nice
Agenda
Introduction to text mining
The challenge
Applications of specialised normalization solutions
− Maximising Source Normalization
− EASL (Extraction And Search Language)
− Allows programmatic access to unstructured data, similar to SQL over structured data.
− Numeric Normalization & Range search
− Capturing weights between 60 and 80kg, whether expressed in kilograms or pounds, for patient selection from EHRs.
− Gene Mutation Normalization
− Use case where gene mutations have been linked to rare disease progression.
© 2016 Linguamatics Ltd
Answers to Our Questions are in Free Text
An estimated 80% of company information is held in free text
Most of the answers to our questions are there
Ever-increasing amounts of text data to examine
[Chart: number of PubMed records, axis from 0 to 25,000,000]
− Different kinds of documents
− External literature, patents, EHRs, internal reports, blogs, presentations
− Different formats
− HTML, PDF, XML, Word, PPT, Wiki, TXT, HL7
Keyword Searching
OLED
Documents, Web Pages, Folders
All these documents contain the keyword ‘Additive’.
You must read ALL of each document to find the part relevant to you.
Linguamatics in Healthcare
[Diagram: Linguamatics in healthcare. Labels: Electronic Health Record; Enterprise Data Warehouse; pathology, radiology, initial assessment, discharge, check-up; structured data; Clinical Risk Monitor; patient characteristics; patient lists; ClinicalTrials.gov; clinical trials matching; patient narrative; semantic search tags; semantic enrichment; clinical case histories and/or genomic interpretation; scientific literature]
I2E Transforms Text into Actionable Insights
Turns text into structured data using sophisticated queries
Accurate results: retrieves only relevant results
Complete results: comprehensive and systematic
Output: structured data to drive analytics and feed the enterprise warehouse
Search vs. Text Mining
Sources: News Feeds, Literature, Patents, Internal Reports, Social Media
Search Engine
− Filter to find the most relevant documents, then read them
Text Mining
− Natural Language Processing (NLP) to understand meaning
− Use of ontologies and clustered results
− Efficient review, without reading every document
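As a rough illustration of this contrast, the Python sketch below compares the two styles on three invented documents. The single regular expression is only a stand-in, assumed here for brevity, for the NLP and ontology-based matching a text mining engine uses: keyword search hands back whole documents that still have to be read, whereas extraction hands back the facts and where they came from.

```python
import re

# Invented example collection: document id -> text.
DOCS = {
    "doc1": "Ibuprofen is widely used. Ibuprofen inhibits COX1 in vitro. Dosage varies.",
    "doc2": "A review of NSAIDs. Aspirin inhibits COX1.",
    "doc3": "A patent describing a new fuel additive.",
}


def keyword_search(keyword):
    """Search-engine style: ids of every whole document containing the keyword."""
    return [doc_id for doc_id, text in DOCS.items() if keyword.lower() in text.lower()]


def extract_inhibitions():
    """Text-mining style: (subject, 'inhibits', object, doc_id) facts, no reading required."""
    facts = []
    for doc_id, text in DOCS.items():
        # Naive sentence split on whitespace that follows a full stop.
        for sentence in re.split(r"(?<=\.)\s+", text):
            m = re.search(r"(\w+) inhibits (\w+)", sentence)
            if m:
                facts.append((m.group(1), "inhibits", m.group(2), doc_id))
    return facts


print(keyword_search("inhibits"))  # the reader must still open both documents
print(extract_inhibitions())
```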
Challenges in Unstructured Data
Different word, same meaning: cyclosporine, ciclosporin, Neoral, Sandimmune
Different expression, same meaning: Non-smoker; Does not smoke; Does not drink or smoke; Denies tobacco use
Different grammar, same meaning: 5mg/kg of cyclosporine per day; 5mg/kg per day of cyclosporine; cyclosporine 5mg/kg per day
Same word, different context: Diagnosed with diabetes; Family history of diabetes; No family history of diabetes
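The four challenges can be made concrete with a deliberately small, rule-based Python sketch. The synonym table, the negation words and the patterns are hypothetical and far narrower than a real terminology or NLP pipeline; the point is only that every variant listed above is mapped to one normalized answer.

```python
import re

# Hypothetical synonym table: each surface form maps to one preferred drug name.
DRUG_SYNONYMS = {
    "cyclosporine": "cyclosporine",
    "ciclosporin": "cyclosporine",
    "neoral": "cyclosporine",
    "sandimmune": "cyclosporine",
}

# Toy pattern covering the non-smoker phrasings above (illustrative only).
NON_SMOKER = re.compile(r"\bnon-smoker\b|\bdoes not\b.*\bsmoke\b|\bdenies tobacco\b")


def normalize_drug(term):
    """Different word, same meaning: map a drug name to its preferred form."""
    return DRUG_SYNONYMS.get(term.lower())


def is_non_smoker(text):
    """Different expression, same meaning: detect a non-smoking statement."""
    return NON_SMOKER.search(text.lower()) is not None


def parse_dose(phrase):
    """Different grammar, same meaning: (drug, amount, unit) regardless of word order."""
    s = phrase.lower()
    dose = re.search(r"(\d+(?:\.\d+)?)\s*mg/kg", s)
    drug = next((DRUG_SYNONYMS[w] for w in re.findall(r"[a-z]+", s) if w in DRUG_SYNONYMS), None)
    if dose and drug and "per day" in s:
        return (drug, float(dose.group(1)), "mg/kg/day")
    return None


def diabetes_context(sentence):
    """Same word, different context: who is the mention about, and is it negated?"""
    s = sentence.lower()
    if "diabetes" not in s:
        return None
    subject = "family" if "family history" in s else "patient"
    negated = re.search(r"\b(no|denies|without)\b", s) is not None
    return (subject, "absent" if negated else "present")
```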
NLP
Linguistic Processing Using NLP
Interprets meaning of the text
Groups words into meaningful units
Search for different forms of words
We find that p42mapk phosphorylates c-Myb on serine and threonine.
Purified recombinant p42 MAPK was found to phosphorylate Wee1.
[Diagram annotations: sentences; morphology (different forms); noun groups match entities; verb groups match actions]
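A minimal sketch of the two ideas on this slide, morphology (one pattern for several forms of an entity or verb) and grouping into agent, action and target, applied to the two example sentences. The patterns are illustrative assumptions, not how I2E performs linguistic analysis; taking the first token after the verb as the target is a crude substitute for real noun-group detection.

```python
import re

# Toy patterns: one entity written two ways, one verb in several inflected forms.
AGENT = re.compile(r"\bp42\s?mapk\b", re.IGNORECASE)
PHOSPHORYLATE = re.compile(r"\bphosphorylat(?:e|es|ed|ing)\b", re.IGNORECASE)
# Crude stand-in for a noun group: the first token after the verb.
NEXT_TOKEN = re.compile(r"\s+([\w-]+)")


def extract_phosphorylation(sentence):
    """Return ('p42 MAPK', 'phosphorylate', target) or None."""
    agent = AGENT.search(sentence)
    verb = PHOSPHORYLATE.search(sentence)
    if not (agent and verb and agent.end() <= verb.start()):
        return None
    target = NEXT_TOKEN.match(sentence, verb.end())
    return ("p42 MAPK", "phosphorylate", target.group(1)) if target else None
```

Both sentences normalize to the same action, even though the agent is spelled differently and the verb appears in two forms.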
From Words to Meaning
“Among them, nimesulide, a selective COX2 inhibitor, …”
nimesulide inhibits COX2 (Entrez Gene ID: 5743)
Identifying entities and relations
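Grounding a mention to an identifier can be sketched as a synonym lookup plus a pattern for the appositive construction in the example sentence ("X, a selective Y inhibitor"). The synonym list and the pattern are assumptions for illustration; 5743 is the Entrez Gene ID shown on the slide, and PTGS2 is the official symbol of the gene commonly called COX2.

```python
import re

# Hypothetical synonym list: all three names refer to Entrez Gene ID 5743.
GENE_IDS = {"cox2": 5743, "cox-2": 5743, "ptgs2": 5743}

# Appositive pattern: "<drug>, a (selective) <target> inhibitor"
APPOSITIVE = re.compile(r"(\w+), an? (?:selective )?([\w-]+) inhibitor", re.IGNORECASE)


def extract_inhibitor(sentence):
    """Return (drug, 'inhibits', Entrez Gene ID) for an appositive inhibitor mention."""
    m = APPOSITIVE.search(sentence)
    if not m:
        return None
    gene_id = GENE_IDS.get(m.group(2).lower())
    return (m.group(1), "inhibits", gene_id) if gene_id else None
```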
Linguistics to establish relationships
Text Mining - as Normal as Data Mining?
CHALLENGE
How can we capture information from free text as conveniently as accessing a database?
One of the essential differences is the lack of normalization of terms and concepts in free text.
SOLUTION
NLP-based text mining provides the capability to look through unstructured text, normalizing:
• Keywords to concepts
• Numerical data
• Range Search
• Gene Mutations
• Content source
BENEFIT
A set of structured facts, relationships or assertions from different data sources that can be used for decision support, providing tabular or visual analytics to fill data warehouses and support better patient care.
Sources: literature, patents, reports, clinical trials
Examples of Normalization
Content Source Normalization
I2E: A Fully Federated Text Mining Platform
[Diagram: federated architecture. Content Servers 1 to 4 each return results, which are merged into a single set of results]
Normalizing Data from Different Sources
Single query
Differently structured data sources on different servers
− Journal articles (PubMed Central) on a local Enterprise Server
− MEDLINE on a remote cloud server
Single set of results
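The federated pattern (one query, several differently structured sources, one merged result set) can be sketched with Python's standard `concurrent.futures` module. The two in-memory stores and their field names are invented stand-ins for the local PubMed Central server and the remote MEDLINE server; each backend maps its own schema onto a common record, and hits for the same PMID are merged. A real deployment would query the servers over the network.

```python
from concurrent.futures import ThreadPoolExecutor

# Invented records: the two "servers" use different field names for the same data.
LOCAL_PMC = [
    {"pmid": "1001", "article_title": "Nimesulide and COX2", "full_text": "Nimesulide inhibits COX2 in vitro."},
]
REMOTE_MEDLINE = [
    {"PMID": "1001", "TI": "Nimesulide and COX2", "AB": "Nimesulide, a selective COX2 inhibitor."},
    {"PMID": "1002", "TI": "Celecoxib review", "AB": "Celecoxib is a COX2 inhibitor."},
]


def search_pmc(term):
    """Query the local full-text store and map hits to a common record shape."""
    return [{"pmid": d["pmid"], "title": d["article_title"], "source": "PMC"}
            for d in LOCAL_PMC if term.lower() in d["full_text"].lower()]


def search_medline(term):
    """Query the (simulated) remote abstract store and map hits to the same shape."""
    return [{"pmid": d["PMID"], "title": d["TI"], "source": "MEDLINE"}
            for d in REMOTE_MEDLINE if term.lower() in d["AB"].lower()]


def federated_search(term, backends):
    """Send one query to every backend in parallel, then merge hits by PMID."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda search: search(term), backends))
    merged = {}
    for results in result_lists:
        for hit in results:
            record = merged.setdefault(
                hit["pmid"], {"pmid": hit["pmid"], "title": hit["title"], "sources": []})
            record["sources"].append(hit["source"])
    return sorted(merged.values(), key=lambda r: r["pmid"])


for record in federated_search("COX2", [search_pmc, search_medline]):
    print(record)
```

Merging on a shared identifier is one simple design choice; sources without a common key would need fuzzier record linkage.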
Using EASL
EASL: Extraction And Search Language
Representing a Query in EASL
EASL Example
query:
  document:
    - phrase:
        - class: {snid: nci.C1909, pt: Pharmacologic Substance}
        - treat
        - class: {snid: nlm.C04.588.180, pt: Breast Neoplasms}
output:
  outputSettings: {documentsPerAssertion: -1, hitsPerDocPerAssertion: 10, outputOrdering: frequency, resultType: standard}
Benefits of EASL
Automation
− Richer language for WSAPI applications
− Can build a completely new query vs. adapting smart query parameters
− Allows on-the-fly query production
Re-use
− Save, share and compare components of queries e.g.
− Save out Alternatives
− Load complex expressions in smart query parameters
Audit
− Human-readable language for documenting the text mining strategy
− Written in YAML, an open data serialization format
Conversion
− Enable scripts to convert from other query languages e.g. advanced search
Different interfaces
− Enables third-party applications to create I2E queries
− Developers can produce innovative specialized interfaces, e.g. advanced search plus terminologies
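To illustrate the automation point, here is a hedged Python sketch that builds the EASL example query programmatically. The nesting is an assumption based on the EASL example earlier in this deck, and `build_treat_query` and `class_ref` are hypothetical helpers, not part of any I2E API. The output is JSON, which is in practice valid YAML 1.2 flow syntax; whether the I2E query service accepts this style is also an assumption.

```python
import json


def class_ref(snid, pt):
    """A concept reference in the slide's form: {class: {snid: ..., pt: ...}}."""
    return {"class": {"snid": snid, "pt": pt}}


def build_treat_query(drug_class, disease_class, hits_per_doc=10):
    """Hypothetical helper: a '<drug class> treat <disease class>' phrase query."""
    return {
        "query": {
            "document": [
                {"phrase": [class_ref(*drug_class), "treat", class_ref(*disease_class)]}
            ]
        },
        "output": {
            "outputSettings": {
                "documentsPerAssertion": -1,
                "hitsPerDocPerAssertion": hits_per_doc,
                "outputOrdering": "frequency",
                "resultType": "standard",
            }
        },
    }


query = build_treat_query(("nci.C1909", "Pharmacologic Substance"),
                          ("nlm.C04.588.180", "Breast Neoplasms"))
easl_text = json.dumps(query, indent=2)  # JSON text is also parseable as YAML 1.2
print(easl_text)
```

Swapping in a different disease class then produces a new query on the fly, which is the kind of re-use the slide describes.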
EASL: Enhancing the Value of Federated Search
translate2easl
[Diagram: an Espacenet query or a PubMed query is converted (espacenet2easl, pubmed2easl) into EASL using keywords + index terms; the EASL query is then refined with terminologies, linguistics … and run across Clinical Trials, OMIM, FDA Drug Labels, Patents, NIH Grants and MEDLINE]
Range Search and Normalization
What Do We Want to Find?
Patients
− below 60 years old
− weight ≥ 80kg
− not having chemotherapy after 2010
− with a mutation C677T
Challenge: Variety Within the Text
Below 60 years old
− aged 58
− 35 years old
− 42-year-old
− 39 y/o
Weight ≥ 80kg
− 267 pounds
− 280 lbs
− 80.4kg
− 82 kilograms
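A minimal normalization sketch for the age and weight variants listed above, assuming English-language notes and only these unit spellings. Pounds are converted using the exact definition 1 lb = 0.45359237 kg, so every value can be compared against one threshold in kilograms. The regular expressions and the `matches_criteria` helper are illustrative, not a production extractor.

```python
import re

LB_TO_KG = 0.45359237  # the international avoirdupois pound, defined exactly

WEIGHT = re.compile(r"(\d+(?:\.\d+)?)\s*(kg|kilograms?|lbs?|pounds?)\b", re.IGNORECASE)
AGE = re.compile(r"aged\s+(\d+)|(\d+)(?:\s+years?\s+old|-year-old|\s*y/o)", re.IGNORECASE)


def weight_kg(text):
    """First weight mentioned in the text, converted to kilograms (None if absent)."""
    m = WEIGHT.search(text)
    if not m:
        return None
    value, unit = float(m.group(1)), m.group(2).lower()
    return value * LB_TO_KG if unit.startswith(("lb", "pound")) else value


def age_years(text):
    """First age mentioned in the text, in years (None if absent)."""
    m = AGE.search(text)
    return int(m.group(1) or m.group(2)) if m else None


def matches_criteria(age_text, weight_text):
    """Example selection rule from the slide: below 60 years old and weight >= 80 kg."""
    age, weight = age_years(age_text), weight_kg(weight_text)
    return age is not None and weight is not None and age < 60 and weight >= 80
```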
After 2010
− January 21, 2011
− October of 2012
− 08/21/11
− 2012-05-04
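Dates can be normalized in the same spirit by trying a list of formats with Python's `datetime.strptime`. This sketch assumes the `08/21/11` style is US month/day/year, and it records the precision because a month-only date such as "October of 2012" defaults to the first of the month. The format list is hypothetical and covers only the examples above.

```python
from datetime import date, datetime

# Candidate formats; "08/21/11" is assumed to be US month/day/year.
DATE_FORMATS = [
    ("%B %d, %Y", "day"),   # January 21, 2011
    ("%B of %Y", "month"),  # October of 2012
    ("%m/%d/%y", "day"),    # 08/21/11
    ("%Y-%m-%d", "day"),    # 2012-05-04
]


def normalize_date(text):
    """Return (date, precision) for the first matching format, or None."""
    for fmt, precision in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date(), precision
        except ValueError:
            continue
    return None


def after_year(text, year):
    """True if the normalized date falls after the end of `year`."""
    result = normalize_date(text)
    return result is not None and result[0] > date(year, 12, 31)
```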
Mutation C677T
− C677T
− 677C>T
− 677C/T
− 677C->T
Normalizing Gene Mutations
Different types of mutation description,
including:
− positional e.g. +869(T>C)
− rsID e.g. rs100
Transform different syntax e.g.
− 1166A/C -> A1166C
− Asn to Ser substitution at codon 127 -> N127S
− +1196C/T -> C1196T
− g.655C/A>G -> C655G, A655G
− M567V/A -> M567V, M567A
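The rewrite rules above can be sketched as a small regular-expression normalizer in Python. The patterns are inferred from the slide's examples only; in particular, the position-first slash form (677C/T) is assumed to mean reference/alternate. This is not the I2E mutation ontology, just an illustration of mapping many syntaxes to one REF-position-ALT form.

```python
import itertools
import re

# Standard three-letter to one-letter amino-acid codes.
AA3TO1 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ALLELES = r"[A-Z](?:/[A-Z])*"  # one letter, or several separated by "/"


def _expand(refs, pos, alts):
    """One REF<pos>ALT string per reference/alternate combination."""
    return [f"{r}{pos}{a}" for r, a in itertools.product(refs, alts)]


def normalize_mutation(text):
    """Return a list of normalized REF<pos>ALT strings ([] if unrecognized)."""
    t = text.strip()
    if re.fullmatch(r"rs\d+", t):  # rsIDs are already a standard identifier
        return [t]
    # Prose form: "Asn to Ser substitution at codon 127" -> N127S
    m = re.fullmatch(r"([A-Z][a-z]{2}) to ([A-Z][a-z]{2}) substitution at codon (\d+)", t)
    if m and m.group(1) in AA3TO1 and m.group(2) in AA3TO1:
        return [AA3TO1[m.group(1)] + m.group(3) + AA3TO1[m.group(2)]]
    # Drop parentheses and a leading "+", "g." or "c." prefix.
    t = re.sub(r"^(?:\+|[gc]\.)", "", t.replace("(", "").replace(")", ""))
    # Letter-first form, possibly with several alternates: C677T, M567V/A
    m = re.fullmatch(rf"([A-Z])(\d+)({ALLELES})", t)
    if m:
        return _expand([m.group(1)], m.group(2), m.group(3).split("/"))
    # Position-first arrow form: 677C>T, 677C->T, 655C/A>G, 869T>C
    m = re.fullmatch(rf"(\d+)({ALLELES})-?>({ALLELES})", t)
    if m:
        return _expand(m.group(2).split("/"), m.group(1), m.group(3).split("/"))
    # Position-first slash form, read as REF/ALT: 677C/T, 1166A/C
    m = re.fullmatch(r"(\d+)([A-Z])/([A-Z])", t)
    if m:
        return _expand([m.group(2)], m.group(1), [m.group(3)])
    return []
```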
Mutation Normalization Examples
Range Search
Allows search for values
within a range
− in fixed fields e.g. publication
date
− within free text e.g. dosages
Can directly ask for e.g.
− patients with diabetes under
60 with BMI under 30
Can find intervals within the text and match them when searching for a single number or an overlapping range
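Interval matching can be sketched by storing each value found in the text as a closed interval (a single number x becomes [x, x]) and testing it against the query. The sketch assumes query limits such as "under 60" are strict, so a record of "60 to 65" does not match; the parsing regex and the function names are illustrative assumptions.

```python
import math
import re

NUMBER = r"\d+(?:\.\d+)?"


def text_interval(text):
    """Closed interval [low, high] for the first range or single number in the text."""
    m = re.search(rf"({NUMBER})\s*(?:-|to)\s*({NUMBER})", text)
    if m:
        return float(m.group(1)), float(m.group(2))
    m = re.search(NUMBER, text)
    return (float(m.group()), float(m.group())) if m else None


def contains(interval, value):
    """Searching for a single number: does the closed interval include it?"""
    low, high = interval
    return low <= value <= high


def overlaps(interval, query_low=-math.inf, query_high=math.inf):
    """Does the closed text interval meet the open query range (query_low, query_high)?"""
    low, high = interval
    return low < query_high and high > query_low
```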
Range Search with Normalization
Range Search (Age, Date)
− Patients aged < 60yrs
− Date before 2010
Normalizing:
− Report Date, Age, Weight & BMI
Normalization Benefits
Ability to compare measurements with different units, e.g. kg vs. lbs
Ability to perform range search for numerics, measurements and dates
Standardized representations to link to structured data, e.g. mutation databases
Better clustering of results, e.g. drug lab codes
Real World Example: Mutation Normalization
Mucopolysaccharidosis II: Hunter Syndrome
Rare X-linked recessive disorder
Deficiency of the lysosomal enzyme iduronate-2-sulfatase
Leads to progressive accumulation of glycosaminoglycans throughout the body
Signs & symptoms:
− Bone deformities with joint stiffness; frequent respiratory infections; cardiomyopathy; hepatosplenomegaly; neurocognitive impairment; reduced lifespan
− Some symptoms partially improved with enzyme replacement therapy
Spectrum of clinical severity (mild to severe); the main difference is progressive development of neurodegeneration in the severe form
TEXT ANALYTICS FOR RARE DISEASES
GENOTYPE-PHENOTYPE ASSOCIATION IN HUNTER SYNDROME
CHALLENGE
• Scarcity of knowledge of the natural history of the disease
• Sparse data, requiring high recall across full-text papers
• Highly variable mutation patterns
• Structured databases lack broad phenotypic association data
SOLUTION
• Developed a workflow with Linguamatics I2E
• Abstracts identified in MEDLINE using broad vocabularies
• Full-text PDFs processed for text analytics
• I2E mutation ontology and bespoke severity vocabularies enabled extraction of genotype-phenotype associations
BENEFIT
• Extraction of patient mutations matched or bettered genetic databases
• Increased understanding of the IDS mutational spectrum for provider diagnostics and patient awareness
• Enabled a rational approach to immune response classification
Shire use case
In Summary
Better Normalization of
− Numbers, dates, drug codes, TNM cancer stage
− Subsequent range search
− Gene mutations
In combination with EASL, a human-readable open query language:
− Maximises the ease and flexibility of asking complex questions simultaneously across different content sources
Ultimately, agile NLP text mining provides:
− High-quality, structured, clustered & normalized results in the format you need
− Improved speed to insight, for faster decision making
2011-12-02 Open PHACTS at STM Innovation
 
A Reason Able View To The Web Of Pathway Data
A Reason Able View To The Web Of Pathway DataA Reason Able View To The Web Of Pathway Data
A Reason Able View To The Web Of Pathway Data
 
PhD dissertation Luis Marco Ruiz
PhD dissertation Luis Marco RuizPhD dissertation Luis Marco Ruiz
PhD dissertation Luis Marco Ruiz
 

More from Dr. Haxel Consult

AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering ManagementAI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering ManagementDr. Haxel Consult
 
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...Dr. Haxel Consult
 
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...Dr. Haxel Consult
 
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...Dr. Haxel Consult
 
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...Dr. Haxel Consult
 
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...Dr. Haxel Consult
 
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...Dr. Haxel Consult
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...Dr. Haxel Consult
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...Dr. Haxel Consult
 
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...Dr. Haxel Consult
 
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...Dr. Haxel Consult
 
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...Dr. Haxel Consult
 
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...Dr. Haxel Consult
 
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...Dr. Haxel Consult
 
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...Dr. Haxel Consult
 
AI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance CenterAI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance CenterDr. Haxel Consult
 
AI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOCAI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOCDr. Haxel Consult
 
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...Dr. Haxel Consult
 
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...Dr. Haxel Consult
 

More from Dr. Haxel Consult (20)

AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering ManagementAI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
 
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
 
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
 
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
 
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
 
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
 
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...
 
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
 
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
 
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
 
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
 
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
 
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
 
AI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance CenterAI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance Center
 
AI-SDV 2022: Lighthouse IP
AI-SDV 2022: Lighthouse IPAI-SDV 2022: Lighthouse IP
AI-SDV 2022: Lighthouse IP
 
AI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOCAI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOC
 
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
 
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
 

Recently uploaded

Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsRussian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsMonica Sydney
 
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfpdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfJOHNBEBONYAP1
 
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdfMatthew Sinclair
 
一比一原版奥兹学院毕业证如何办理
一比一原版奥兹学院毕业证如何办理一比一原版奥兹学院毕业证如何办理
一比一原版奥兹学院毕业证如何办理F
 
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge GraphsEleniIlkou
 
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 
Mira Road Housewife Call Girls 07506202331, Nalasopara Call Girls
Mira Road Housewife Call Girls 07506202331, Nalasopara Call GirlsMira Road Housewife Call Girls 07506202331, Nalasopara Call Girls
Mira Road Housewife Call Girls 07506202331, Nalasopara Call GirlsPriya Reddy
 
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查ydyuyu
 
Call girls Service in Ajman 0505086370 Ajman call girls
Call girls Service in Ajman 0505086370 Ajman call girlsCall girls Service in Ajman 0505086370 Ajman call girls
Call girls Service in Ajman 0505086370 Ajman call girlsMonica Sydney
 
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查ydyuyu
 
Abu Dhabi Escorts Service 0508644382 Escorts in Abu Dhabi
Abu Dhabi Escorts Service 0508644382 Escorts in Abu DhabiAbu Dhabi Escorts Service 0508644382 Escorts in Abu Dhabi
Abu Dhabi Escorts Service 0508644382 Escorts in Abu DhabiMonica Sydney
 
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查ydyuyu
 
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsRussian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsMonica Sydney
 
Trump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts SweatshirtTrump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts Sweatshirtrahman018755
 
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdfMatthew Sinclair
 
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制pxcywzqs
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtrahman018755
 
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdfMatthew Sinclair
 
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrStory Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrHenryBriggs2
 
Local Call Girls in Seoni 9332606886 HOT & SEXY Models beautiful and charmin...
Local Call Girls in Seoni  9332606886 HOT & SEXY Models beautiful and charmin...Local Call Girls in Seoni  9332606886 HOT & SEXY Models beautiful and charmin...
Local Call Girls in Seoni 9332606886 HOT & SEXY Models beautiful and charmin...kumargunjan9515
 

Recently uploaded (20)

Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsRussian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
 
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfpdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
 
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
 
一比一原版奥兹学院毕业证如何办理
一比一原版奥兹学院毕业证如何办理一比一原版奥兹学院毕业证如何办理
一比一原版奥兹学院毕业证如何办理
 
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
 
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
 
Mira Road Housewife Call Girls 07506202331, Nalasopara Call Girls
Mira Road Housewife Call Girls 07506202331, Nalasopara Call GirlsMira Road Housewife Call Girls 07506202331, Nalasopara Call Girls
Mira Road Housewife Call Girls 07506202331, Nalasopara Call Girls
 
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
 
Call girls Service in Ajman 0505086370 Ajman call girls
Call girls Service in Ajman 0505086370 Ajman call girlsCall girls Service in Ajman 0505086370 Ajman call girls
Call girls Service in Ajman 0505086370 Ajman call girls
 
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
 
Abu Dhabi Escorts Service 0508644382 Escorts in Abu Dhabi
Abu Dhabi Escorts Service 0508644382 Escorts in Abu DhabiAbu Dhabi Escorts Service 0508644382 Escorts in Abu Dhabi
Abu Dhabi Escorts Service 0508644382 Escorts in Abu Dhabi
 
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
 
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsRussian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
 
Trump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts SweatshirtTrump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts Sweatshirt
 
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
 
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirt
 
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
 
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrStory Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
 
Local Call Girls in Seoni 9332606886 HOT & SEXY Models beautiful and charmin...
Local Call Girls in Seoni  9332606886 HOT & SEXY Models beautiful and charmin...Local Call Girls in Seoni  9332606886 HOT & SEXY Models beautiful and charmin...
Local Call Girls in Seoni 9332606886 HOT & SEXY Models beautiful and charmin...
 

II-SDV Andrew Hinton - Text mining - as normal as data mining?

  • 1. Text Mining - as Normal as Data Mining? Andrew Hinton, Application Specialist. II-SDV 2016, Tuesday 19th April 2016, Nice
  • 2. Agenda
    − Introduction to text mining
    − The challenge
    − Applications of specialised normalization solutions:
      − Maximising source normalization
      − EASL (Extraction and Search Language): allows programmatic access to unstructured data, similar to SQL over structured data
      − Numeric normalization and range search: capturing weights between 60 and 80kg, whether expressed in kilograms or pounds, for patient selection from EHRs
      − Gene mutation normalization: a use case where gene mutations have been linked to rare disease progression
    © 2016 Linguamatics Ltd
  • 3. Answers to Our Questions are in Free Text
    − 80% of information at companies is in free text
    − Most of the answers to our questions are there
    − Ever-increasing amounts of text data to examine:
      − Different kinds of documents: external literature, patents, EHRs, internal reports, blogs, presentations
      − Different formats: HTML, PDF, XML, Word, PPT, Wiki, TXT, HL7
    [Chart: PubMed Records]
  • 4. Keyword Searching
    [Figure: OLED keyword search over documents, web pages and folders]
    All these documents contain the keyword 'Additive'. You have to read ALL of each document to find the part that is relevant to you.
  • 5. Linguamatics in Healthcare
    [Diagram: Electronic Health Record; Enterprise Data Warehouse; patient narrative (pathology, radiology, initial assessment, discharge, check-up); structured data; patient characteristics; Clinical Risk Monitor; patient lists; ClinicalTrials.gov; clinical trials matching; semantic enrichment; semantic search tags; clinical case histories and/or genomic interpretation; scientific literature]
  • 6. I2E Transforms Text into Actionable Insights
    − Turn text into structured data using sophisticated queries
    − Accurate results: retrieves only relevant results
    − Complete results: comprehensive and systematic
    − Analytics: to drive analytics and the enterprise warehouse
  • 7. Search vs. Text Mining
    Sources: news feeds, literature, patents, internal reports, social media
    − Search engine: filter to find the most relevant documents, then read them
    − Text mining: natural language processing (NLP) to understand meaning; use of ontologies and clustered results; efficient review without reading every document
  • 8. Challenges in Unstructured Data
    − Different word, same meaning: cyclosporine, ciclosporin, Neoral, Sandimmune
    − Different expression, same meaning: "Non-smoker", "Does not smoke", "Does not drink or smoke", "Denies tobacco use"
    − Different grammar, same meaning: "5mg/kg of cyclosporine per day", "5mg/kg per day of cyclosporine", "cyclosporine 5mg/kg per day"
    − Same word, different context: "Diagnosed with diabetes", "Family history of diabetes", "No family history of diabetes"
    These are the problems NLP has to resolve.
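The four challenges on slide 8 can be made concrete with a deliberately naive, rule-based sketch in Python. The synonym table, the trigger phrases and the function names (`normalize_drug`, `mention_context`, `parse_dosing`) are illustrative assumptions, not I2E functionality; a real NLP pipeline resolves negation scope and grammar far more robustly than substring rules.

```python
import re
from typing import Optional, Tuple

# Hypothetical mini-terminology: each surface form maps to one preferred term.
SYNONYMS = {
    "cyclosporine": "cyclosporine",
    "ciclosporin": "cyclosporine",
    "neoral": "cyclosporine",
    "sandimmune": "cyclosporine",
}

# Illustrative trigger phrases (assumed, far from complete).
NEGATION_TRIGGERS = ("no ", "denies ", "does not ", "non-")
FAMILY_TRIGGER = "family history of"

DOSE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*mg/kg", re.IGNORECASE)


def normalize_drug(mention: str) -> Optional[str]:
    """Different word, same meaning: map a drug mention to its preferred term."""
    return SYNONYMS.get(mention.strip().lower())


def mention_context(sentence: str) -> Tuple[str, bool]:
    """Different expression, or same word in a different context.

    Returns (who the statement is about, whether it is negated).
    """
    s = sentence.strip().lower()
    negated = any(s.startswith(t) or f" {t}" in s for t in NEGATION_TRIGGERS)
    subject = "family" if FAMILY_TRIGGER in s else "patient"
    return subject, negated


def parse_dosing(phrase: str) -> Tuple[Optional[str], Optional[float], bool]:
    """Different grammar, same meaning: extract (drug, mg/kg dose, per-day flag)."""
    words = re.findall(r"[a-z]+", phrase.lower())
    drug = next((SYNONYMS[w] for w in words if w in SYNONYMS), None)
    dose = DOSE_RE.search(phrase)
    return drug, (float(dose.group(1)) if dose else None), "per day" in phrase.lower()
```

Each function collapses one kind of surface variation onto a single representation, which is what makes the extracted facts comparable in the way database fields are.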
  • 9. Linguistic Processing Using NLP
    − Interprets the meaning of the text
    − Groups words into meaningful units
    − Searches for different forms of words
    Example: "We find that p42mapk phosphorylates c-Myb on serine and threonine." "Purified recombinant p42 MAPK was found to phosphorylate Wee1."
    Annotation layers: sentences; morphology (different forms, e.g. phosphorylates / phosphorylate); noun groups match entities; verb groups match actions.
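A toy version of the slide 9 idea, in Python: a dictionary maps spelling variants (p42mapk, p42 MAPK) to one entity, a pattern maps inflected forms (phosphorylates, phosphorylate) to one action, and the nearest entities on either side of the verb become subject and object. The dictionaries and the proximity rule are assumptions for illustration; proper noun-group and verb-group parsing is considerably more involved.

```python
import re
from typing import List, Tuple

# Hypothetical entity dictionary: regex for the spelling variants -> canonical name.
ENTITIES = {
    r"\bp42\s*mapk\b": "p42 MAPK",
    r"\bc-myb\b": "c-Myb",
    r"\bwee1\b": "Wee1",
}
# Morphology: inflected verb forms -> one lemma (the "action").
VERBS = {r"\bphosphorylat(?:es|ed|ing|e)\b": "phosphorylate"}


def find_entities(sentence: str) -> List[Tuple[int, str]]:
    """Return (character offset, canonical name) for every entity mention."""
    hits = []
    for pattern, name in ENTITIES.items():
        for m in re.finditer(pattern, sentence, re.IGNORECASE):
            hits.append((m.start(), name))
    return sorted(hits)


def extract_triples(sentence: str) -> List[Tuple[str, str, str]]:
    """Naive subject-action-object extraction around the first known verb."""
    for pattern, lemma in VERBS.items():
        verb = re.search(pattern, sentence, re.IGNORECASE)
        if verb:
            entities = find_entities(sentence)
            before = [name for pos, name in entities if pos < verb.start()]
            after = [name for pos, name in entities if pos >= verb.end()]
            if before and after:
                return [(before[-1], lemma, after[0])]
    return []
```

Both example sentences then yield the same normalized subject and action despite different spellings and verb forms.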
  • 10. From Words to Meaning
    "Among them, nimesulide, a selective COX2 inhibitor, …" is read as: nimesulide inhibits Entrez Gene ID: 5743 (COX2)
    − Identifying entities and relations
    − Linguistics to establish relationships
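The slide 10 example can be approximated with one surface pattern. This Python sketch assumes a hypothetical lookup table from gene mentions to Entrez Gene IDs (seeded only with the COX2 → 5743 mapping shown on the slide) and a regular expression for the appositive "X, a (selective) Y inhibitor"; it is a stand-in for linguistic analysis, not how I2E establishes relationships.

```python
import re
from typing import List, Tuple

# Hypothetical lookup: gene mention (lower case) -> Entrez Gene ID from the slide.
GENE_IDS = {"cox2": 5743, "cox-2": 5743}

# Appositive pattern: "<drug>, a [selective] <gene> inhibitor".
APPOSITIVE_RE = re.compile(
    r"(?P<drug>[A-Za-z][\w-]*),\s+an?\s+(?:selective\s+)?(?P<gene>[\w-]+)\s+inhibitor",
    re.IGNORECASE,
)


def inhibition_relations(text: str) -> List[Tuple[str, str, str]]:
    """Turn each matched appositive into a (drug, 'inhibits', gene ID) relation."""
    relations = []
    for m in APPOSITIVE_RE.finditer(text):
        gene_id = GENE_IDS.get(m.group("gene").lower())
        if gene_id is not None:  # only keep genes we can normalize to an ID
            relations.append((m.group("drug"), "inhibits", f"Entrez Gene ID: {gene_id}"))
    return relations
```

The point of the design is the final step: the relation is stored against a database identifier rather than the surface string "COX2".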
  • 11. Text Mining - as Normal as Data Mining?
    CHALLENGE: How can we capture information from free text as conveniently as accessing a database? One of the essential differences is the lack of normalization of terms and concepts in free text.
    SOLUTION: NLP-based text mining provides the capability to look through unstructured text, normalizing:
      − Keywords to concepts
      − Numerical data
      − Range search
      − Gene mutations
      − Content source
    BENEFIT: A set of structured facts, relationships or assertions from different data sources that can be used for decision support, providing tabular or visual analytics to fill data warehouses and support better patient care.
    Sources: literature, patents, reports, clinical trials
  • 14. I2E: A Fully Federated Text Mining Platform
    [Diagram: federated architecture; content servers 1 to 4, results merged into a single set]
  • 15. Normalizing Data from Different Sources
    − Single query
    − Differently structured data sources on different servers:
      − Journal articles (PubMed Central) on a local Enterprise Server
      − MEDLINE on a remote cloud server
    − Single set of results
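Slides 14 and 15 describe one query fanned out to several content servers whose records are shaped differently, then merged into a single result set. A minimal Python sketch of that pattern, assuming two hypothetical in-process "servers" with canned records and one adapter per source that maps each record onto a shared schema keyed by PMID; the real I2E federation mechanism is not shown on the slides.

```python
from concurrent.futures import ThreadPoolExecutor


def local_fulltext_server(query):
    """Stand-in for a full-text collection on a local server (returns canned records)."""
    return [
        {"article_id": "pmid:1", "article_title": "Paper one"},
        {"article_id": "pmid:2", "article_title": "Paper two"},
    ]


def remote_abstract_server(query):
    """Stand-in for an abstract collection on a remote server, with a different record shape."""
    return [
        {"PMID": 2, "TI": "Paper two"},
        {"PMID": 3, "TI": "Paper three"},
    ]


# One entry per source: (search function, adapter onto the shared schema).
SOURCES = {
    "local": (local_fulltext_server,
              lambda r: {"pmid": int(r["article_id"].split(":")[1]), "title": r["article_title"]}),
    "remote": (remote_abstract_server,
               lambda r: {"pmid": int(r["PMID"]), "title": r["TI"]}),
}


def federated_search(query):
    """Send one query to every source in parallel and merge into one result set."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(search, query) for name, (search, _) in SOURCES.items()}
    merged = {}
    for name, future in futures.items():
        to_common = SOURCES[name][1]
        for record in future.result():
            row = to_common(record)
            # Deduplicate on the normalized key, remembering every source that hit.
            entry = merged.setdefault(row["pmid"], {**row, "sources": []})
            entry["sources"].append(name)
    return sorted(merged.values(), key=lambda r: r["pmid"])
```

Deduplicating on a normalized key (here the PMID) lets an article found in both collections appear once, with both sources recorded.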
  • 16. Using EASL (Extraction and Search Language)
  • 17. Representing a Query in EASL
  • 18. EASL Example
```yaml
query:
  document:
    - phrase:
        - class: {snid: nci.C1909, pt: Pharmacologic Substance}
        - treat
        - class: {snid: nlm.C04.588.180, pt: Breast Neoplasms}
output:
  outputSettings: {documentsPerAssertion: -1, hitsPerDocPerAssertion: 10, outputOrdering: frequency, resultType: standard}
```
  • 19. Benefits of EASL
    − Automation
      − Richer language for WSAPI applications
      − Can build a completely new query rather than adapting smart query parameters
      − Allows on-the-fly query production
    − Re-use
      − Save, share and compare components of queries, e.g. save out alternatives, or load complex expressions into smart query parameters
    − Audit
      − Human-readable language for documenting the text mining strategy
      − Uses an open markup language (YAML)
    − Conversion
      − Enables scripts to convert from other query languages, e.g. advanced search
    − Different interfaces
      − Enables third-party applications to create I2E queries
      − Developers can produce innovative specialized interfaces, e.g. advanced search plus terminologies
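To illustrate the automation point ("on-the-fly query production"), this Python sketch generates the text of a subject-verb-object phrase query from parameters. It reproduces the layout of the slide 18 EASL example; the nesting is reconstructed from the flattened slide and may differ from the real EASL grammar, the helper names are hypothetical, and values are neither quoted nor escaped, so it only suits simple terms.

```python
from typing import Tuple

Concept = Tuple[str, str]  # (snid, preferred term), as in the slide 18 example


def flow_map(pairs) -> str:
    """Render key/value pairs as a YAML flow mapping (no quoting or escaping)."""
    return "{" + ", ".join(f"{key}: {value}" for key, value in pairs) + "}"


def phrase_query(subject: Concept, verb: str, obj: Concept,
                 hits_per_doc: int = 10, ordering: str = "frequency") -> str:
    """Build a subject-verb-object phrase query laid out like the EASL example."""
    def concept(c: Concept) -> str:
        return "        - class: " + flow_map([("snid", c[0]), ("pt", c[1])])

    settings = flow_map([
        ("documentsPerAssertion", -1),
        ("hitsPerDocPerAssertion", hits_per_doc),
        ("outputOrdering", ordering),
        ("resultType", "standard"),
    ])
    return "\n".join([
        "query:",
        "  document:",
        "    - phrase:",
        concept(subject),
        f"        - {verb}",
        concept(obj),
        "output:",
        f"  outputSettings: {settings}",
    ])


if __name__ == "__main__":
    print(phrase_query(("nci.C1909", "Pharmacologic Substance"), "treat",
                       ("nlm.C04.588.180", "Breast Neoplasms")))
```

Because the query is plain text in an open format, the same generator could be driven by a script or a third-party interface, and its output saved for audit.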
  • 20. EASL: Enhancing the Value of Federated Search
    [Diagram: federated architecture; content servers 1 to 4, results merged into a single set]
  • 21. translate2easl
    [Diagram: an Espacenet query is converted by espacenet2easl into EASL (keywords + index terms) and a PubMed query by pubmed2easl into EASL (terminologies, linguistics …); the query is then refined. Content sources: Clinical Trials, OMIM, FDA Drug Labels, Patents, NIH Grants, MEDLINE]
  • 22. Range Search and Normalization
  • 23. What Do We Want to Find?
    Patients:
    − below 60 years old
    − weight ≥ 80kg
    − not having chemotherapy after 2010
    − with a mutation C677T
  • 24. Challenge: Variety Within the Text
    − Below 60 years old: aged 58; 35 years old; 42-year-old; 39 y/o
    − Weight ≥ 80kg: 267 pounds; 280 lbs; 80.4kg; 82 kilograms
    − After 2010: January 21, 2011; October of 2012; 08/21/11; 2012-05-04
    − Mutation C677T: C677T; 677C>T; 677C/T; 677C->T
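The age and weight variants on slide 24 can be normalized with a few patterns and a unit conversion. This Python sketch assumes English-language notes and only the phrasings listed on the slide; the pound-to-kilogram factor (0.45359237) is the exact international definition, and the selection thresholds are those from slide 23 (below 60 years old, at least 80kg).

```python
import re
from typing import Optional

LB_TO_KG = 0.45359237  # exact definition of the international pound

AGE_RE = re.compile(
    r"\baged\s+(\d{1,3})\b"                       # aged 58
    r"|\b(\d{1,3})(?:\s+|-)years?(?:\s+|-)old\b"  # 35 years old, 42-year-old
    r"|\b(\d{1,3})\s*y/o\b",                      # 39 y/o
    re.IGNORECASE,
)
WEIGHT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(kg|kilograms?|lbs?|pounds?)\b", re.IGNORECASE)


def age_years(text: str) -> Optional[int]:
    """Return the first age mentioned, in years, or None."""
    m = AGE_RE.search(text)
    return int(next(g for g in m.groups() if g is not None)) if m else None


def weight_kg(text: str) -> Optional[float]:
    """Return the first weight mentioned, converted to kilograms, or None."""
    m = WEIGHT_RE.search(text)
    if not m:
        return None
    value, unit = float(m.group(1)), m.group(2).lower()
    return value * LB_TO_KG if unit.startswith(("lb", "pound")) else value


def matches_criteria(note: str) -> bool:
    """Slide 23 criteria: below 60 years old and a weight of at least 80kg."""
    age, weight = age_years(note), weight_kg(note)
    return age is not None and age < 60 and weight is not None and weight >= 80
```

Once every value is in years and kilograms, the patient filter is an ordinary numeric comparison, exactly as it would be over database columns.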
  • 25. Normalizing Gene Mutations
    − Different types of mutation description, including:
      − positional, e.g. +869(T>C)
      − rsID, e.g. rs100
    − Transform different syntax, e.g.:
      − 1166A/C -> A1166C
      − Asn to Ser substitution at codon 127 -> N127S
      − +1196C/T -> C1196T
      − g.655C/A>G -> C655G, A655G
      − M567V/A -> M567V, M567A
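A rule-based Python sketch of the transformations listed on slide 25, plus the C677T variants from slide 24. The patterns and the choice to read "677C/T" as reference/alternative are assumptions drawn from the slides' examples, not a general HGVS parser; the output for +869(T>C) (T869C) is likewise an assumption, since the slide does not show its normalized form.

```python
import re
from typing import List

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

RSID = re.compile(r"rs\d+", re.IGNORECASE)
# Letter first: C677T, M567V/A (one reference, one or more alternatives).
LETTER_FIRST = re.compile(r"([A-Z])(\d+)([A-Z](?:/[A-Z])*)")
# Position first: 677C>T, 677C->T, 677C/T, +1196C/T, g.655C/A>G, +869(T>C).
POSITION_FIRST = re.compile(r"\+?(?:[cg]\.)?(\d+)\(?([ACGT](?:/[ACGT])*)(?:-?>([ACGT]))?\)?")
CODON_PHRASE = re.compile(r"([A-Z][a-z]{2}) to ([A-Z][a-z]{2}) substitution at codon (\d+)")


def normalize_mutation(text: str) -> List[str]:
    """Return the normalized form(s) of one mutation mention, or [] if unrecognized."""
    text = text.strip()
    if RSID.fullmatch(text):
        return [text.lower()]
    m = LETTER_FIRST.fullmatch(text)
    if m:
        ref, pos, alts = m.groups()
        return [f"{ref}{pos}{alt}" for alt in alts.split("/")]
    m = POSITION_FIRST.fullmatch(text)
    if m:
        pos, refs, alt = m.groups()
        refs = refs.split("/")
        if alt is None:  # "677C/T": read as reference/alternative
            if len(refs) != 2:
                return []
            refs, alt = refs[:1], refs[1]
        return [f"{ref}{pos}{alt}" for ref in refs]
    m = CODON_PHRASE.fullmatch(text)
    if m and m.group(1) in THREE_TO_ONE and m.group(2) in THREE_TO_ONE:
        return [f"{THREE_TO_ONE[m.group(1)]}{m.group(3)}{THREE_TO_ONE[m.group(2)]}"]
    return []
```

Ambiguous forms such as g.655C/A>G expand to every normalized variant, so a search for either C655G or A655G still finds the mention.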
  • 26. Mutation Normalization Examples
  • 27. Range Search
    − Allows search for values within a range:
      − in fixed fields, e.g. publication date
      − within free text, e.g. dosages
    − Can directly ask for, e.g., patients with diabetes under 60 with BMI under 30
    − Can find intervals within the text, and return them when searching for a number or an overlapping range
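Slide 27's interval idea can be sketched by turning every dose mention into a closed interval (a single value v becomes [v, v]) and matching a query interval by overlap: two intervals [a, b] and [c, d] overlap exactly when a ≤ d and c ≤ b. This Python sketch assumes doses written in mg only, and the document texts are invented for illustration.

```python
import re
from typing import Dict, List, Tuple

Interval = Tuple[float, float]

NUM = r"(\d+(?:\.\d+)?)"
# Either a range ("50-100 mg", "250 to 500 mg") or a single value ("10 mg").
DOSE_RE = re.compile(rf"{NUM}\s*(?:-|to)\s*{NUM}\s*mg\b|{NUM}\s*mg\b", re.IGNORECASE)


def dose_intervals(text: str) -> List[Interval]:
    """Extract every dose mention as a closed interval in mg."""
    intervals = []
    for m in DOSE_RE.finditer(text):
        if m.group(1) is not None:
            lo, hi = float(m.group(1)), float(m.group(2))
        else:
            lo = hi = float(m.group(3))
        intervals.append((min(lo, hi), max(lo, hi)))
    return intervals


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def range_search(docs: Dict[str, str], query: Interval) -> List[str]:
    """Ids of documents with a dose interval overlapping the query.

    A single number q is searched as the degenerate interval (q, q).
    """
    return [doc_id for doc_id, text in docs.items()
            if any(overlaps(iv, query) for iv in dose_intervals(text))]


if __name__ == "__main__":
    docs = {"A": "Dosage 50-100 mg daily", "B": "Started on 10 mg",
            "C": "Titrated from 250 to 500 mg"}
    print(range_search(docs, (75, 75)), range_search(docs, (90, 300)))
```

Searching for 75 mg finds document A even though the number 75 never appears in it, which is the behaviour the slide describes.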
  • 28. Range Search with Normalization
    − Range search (age, date):
      − Patients aged < 60 yrs
      − Date before 2010
    − Normalizing: report date, age, weight and BMI
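For the date part of slide 28, one workable approach is to normalize every date expression to the span of days it could mean, so that "October of 2012" becomes 1 to 31 October 2012, and then apply a range test to that span. This Python sketch covers only the four formats from slide 24; reading 08/21/11 as US month/day/year in the 2000s is an assumption.

```python
import calendar
import re
from datetime import date
from typing import Optional, Tuple

MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

Span = Tuple[date, date]


def normalize_date(text: str) -> Optional[Span]:
    """Map a date expression to the (earliest, latest) calendar days it can mean."""
    t = text.strip().lower()
    m = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", t)  # ISO 8601: 2012-05-04
    if m:
        d = date(int(m[1]), int(m[2]), int(m[3]))
        return d, d
    m = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{2})", t)  # assumed US mm/dd/yy, year 20yy
    if m:
        d = date(2000 + int(m[3]), int(m[1]), int(m[2]))
        return d, d
    m = re.fullmatch(r"([a-z]+)\s+(\d{1,2}),\s*(\d{4})", t)  # January 21, 2011
    if m and m[1] in MONTHS:
        d = date(int(m[3]), MONTHS[m[1]], int(m[2]))
        return d, d
    m = re.fullmatch(r"([a-z]+)\s+(?:of\s+)?(\d{4})", t)  # October of 2012
    if m and m[1] in MONTHS:
        year, month = int(m[2]), MONTHS[m[1]]
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    return None


def after_year(text: str, year: int) -> bool:
    """True if the whole span lies after 31 December of `year`."""
    span = normalize_date(text)
    return span is not None and span[0] > date(year, 12, 31)


def before_year(text: str, year: int) -> bool:
    """True if the whole span lies before 1 January of `year`."""
    span = normalize_date(text)
    return span is not None and span[1] < date(year, 1, 1)
```

Requiring the whole span to fall on one side of the boundary is a conservative choice; a recall-oriented search could accept any overlap instead.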
  • 29. Normalization Benefits
    − Ability to compare measurements with different units, e.g. kg vs. lbs
    − Ability to perform range search for numerics, measurements and dates
    − Standardized representations to link to structured data, e.g. mutation databases
    − Better clustering of results, e.g. drug lab codes
  • 30. Real World Example: Mutation Normalization
  • 31. Mucopolysaccharidosis II: Hunter Syndrome
    − Rare X-linked recessive disorder
    − Deficiency of the lysosomal enzyme iduronate-2-sulfatase
    − Leads to progressive accumulation of glycosaminoglycans throughout the body
    − Signs & symptoms:
      − Bone deformities with joint stiffness; frequent respiratory infections; cardiomyopathy; hepatosplenomegaly; neurocognitive impairment; reduced lifespan
      − Some symptoms are partially improved with enzyme replacement therapy
    − Spectrum of clinical severity (mild to severe); the main difference is the progressive development of neurodegeneration in the severe form
  • 32. Text Analytics for Rare Diseases: Genotype-Phenotype Association in Hunter Syndrome
    CHALLENGE:
    − Scarcity of knowledge of the natural history of the disease
    − Sparse data; needs high recall across full-text papers
    − Mutation patterns very variable
    − Structured databases lack broad phenotypic association data
  • 33. Text Analytics for Rare Diseases: Genotype-Phenotype Association in Hunter Syndrome
    CHALLENGE:
    − Scarcity of knowledge of the natural history of the disease
    − Sparse data; needs high recall across full-text papers
    − Mutation patterns very variable
    − Structured databases lack broad phenotypic association data
    SOLUTION:
    − Developed a workflow with Linguamatics I2E
    − Abstracts identified in MEDLINE using broad vocabularies
    − Full-text PDFs processed for text analytics
    − I2E mutation ontology and bespoke severity vocabularies enabled extraction of genotype-phenotype associations
    BENEFIT:
    − Extraction of patient mutations matched or bettered genetic databases
    − Increased understanding of the IDS mutational spectrum for provider diagnostics and patient awareness
    − Enabled a rational approach to immune response classification
  • 34. Shire Use Case
  • 35. In Summary
    − Better normalization of:
      − Numbers, dates, drug codes and TNM cancer stage, with subsequent range search
      − Gene mutations
    − In combination with EASL, a human-readable open query language, this maximises the ease and flexibility of asking complex questions simultaneously across different content sources
    − Ultimately, agile NLP text mining:
      − Provides high-quality, structured, clustered and normalized results in the format you need
      − Improves speed to insight for faster decision making