Text Mining: It's About Time
Andrew Hinton & David Milward
II-SDV, Nice France 20th April 2015
Overview: Text Mining, It's About Time !
Introducing text mining
Why text mining has now come of age
Speed to insight
Real time data & timeliness of data
Temporal nature of data
Future challenges
© Linguamatics 2015
What is Text Mining?
"Text Mining is the discovery by computer of
new, previously unknown information, by
automatically extracting and relating
information from different written resources,
to reveal otherwise "hidden" meanings. A
key element is the linking together of the
extracted information to form new facts or
new hypotheses to be explored further by
more conventional means of
experimentation."
Marti Hearst, UC Berkeley
How is it being used in Life Sciences?
Advanced text analytics delivers value along the pipeline
Gene-disease
mapping
Target ID/selection
Mutation/expression
analysis
Toxicity analysis and
prediction
Biomarker discovery
Drug repurposing
Patent analysis
KOL identification
Opportunity scouting
Trial site selection and study design
Safety
Competitive intelligence
Pharmacovigilance
Social media
analysis
Comparative Effectiveness
Regulatory Submission QC
HEOR
SAR
Challenges
Most of the information required is in free text
Ever-increasing amounts of text data to
examine
[Chart: PubMed Records, vertical axis from 0 to 25,000,000]
− Different kinds of
documents
− External literature,
patents, EHRs, internal
reports, blogs,
presentations
− Different formats
− HTML, PDF, XML, Word,
PPT, Wiki
Challenges in Unstructured Data
Different word, same
meaning
cyclosporine
ciclosporin
Neoral
Sandimmune
Different expression, same
meaning
Non-smoker
Does not smoke
Does not drink or smoke
Denies tobacco use
Different grammar, same
meaning
5mg/kg of cyclosporine per day
5mg/kg per day of cyclosporine
cyclosporine 5mg/kg per day
Same word, different
context
Diagnosed with diabetes
Family history of diabetes
No family history of diabetes
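The first and last of these challenges can be illustrated with a toy sketch. It assumes a hand-written synonym table and two keyword rules; the names `SYNONYMS`, `normalize_drug` and `diabetes_context` are hypothetical, and real text mining relies on much larger ontologies and linguistic analysis rather than string matching.

```python
import re

# Hypothetical synonym table: surface form -> preferred concept name.
SYNONYMS = {
    "cyclosporine": "cyclosporine",
    "ciclosporin": "cyclosporine",
    "neoral": "cyclosporine",
    "sandimmune": "cyclosporine",
}


def normalize_drug(mention):
    """Map a drug mention to its preferred name, or None if it is unknown."""
    return SYNONYMS.get(mention.strip().lower())


def diabetes_context(sentence):
    """Classify a diabetes mention as 'patient', 'family' or 'family_negated'."""
    text = sentence.lower()
    if "diabetes" not in text:
        return None
    # Check the negated form first, because it contains the plain form.
    if re.search(r"\bno family history\b", text):
        return "family_negated"
    if re.search(r"\bfamily history\b", text):
        return "family"
    return "patient"


for name in ["cyclosporine", "ciclosporin", "Neoral", "Sandimmune"]:
    print(name, "->", normalize_drug(name))

for s in ["Diagnosed with diabetes",
          "Family history of diabetes",
          "No family history of diabetes"]:
    print(s, "->", diabetes_context(s))
```

All four drug names resolve to one concept, while the three diabetes sentences land in three different contexts even though they share the same keyword.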
NLP
From Words to Meaning
“Among them, nimesulide, a selective COX2 inhibitor, …”
COX2 = Entrez Gene ID: 5743
nimesulide inhibits COX2
Identifying entities and relations
Linguistics to establish relationships
Finding Indirect Relationships
Treatment has been applied in clinical trials
Thalidomide in advanced hepatocellular
carcinoma as antiangiogenic treatment
approach: a phase I/II trial.
Pinter et al.
Eur J Gastroenterol Hepatol. 2008
20(10):1012-9
Phase II study of temozolomide,
thalidomide, and celecoxib for newly
diagnosed glioblastoma in adults.
Kesari et al.
Neuro Oncol. 2008 10(3):300-8
<Thalidomide>-<Relationship>-<Gene> <Gene>-<Relationship>-<Angiogenic Process>
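The two-step pattern above amounts to a join on the shared gene. A minimal sketch, assuming the two relation sets have already been extracted as triples; the function name `indirect_links` and every value except "thalidomide" and "angiogenesis" are placeholders, not real extraction results.

```python
from collections import defaultdict


def indirect_links(drug_gene, gene_process):
    """Join (drug, relation, gene) with (gene, relation, process) on the gene."""
    by_gene = defaultdict(list)
    for gene, relation, process in gene_process:
        by_gene[gene].append((relation, process))

    links = []
    for drug, relation_1, gene in drug_gene:
        # .get() avoids creating empty entries for genes with no second step.
        for relation_2, process in by_gene.get(gene, []):
            links.append((drug, relation_1, gene, relation_2, process))
    return links


# Placeholder triples for illustration only.
drug_gene = [
    ("thalidomide", "REL_1", "GENE_A"),
    ("thalidomide", "REL_1", "GENE_B"),
]
gene_process = [
    ("GENE_A", "REL_2", "angiogenesis"),
]

for link in indirect_links(drug_gene, gene_process):
    print(" - ".join(link))
```

Only GENE_A appears in both sets, so it is the only gene that yields an indirect drug-to-process link.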
Modes of Use
Reusable
pipelines
•Decision
support
•Knowledge
capture
•Classification/
mark-up
•Capture and
re-use
strategies
•Semantic
categorization
Speed to Insight
Time = Money
Speed to insight Example I: Patents
Business Impact and value
Leveraging Text Analytics in Patents to
Empower Business Decisions
Not for
profit
Education
Research Biotech
Pharma
Medical
devices
ICT
Funders
Approvers
Government
Patient
Payers
Prescribers
Providers
Dispensers
Electronic
Health
Record
EHRs & Healthcare Challenges
The challenge is
to unlock the
value of the huge
investment being
made in EHRs
“Natural language processing (NLP)
and visualization dashboards are the technologies most
suitable to improve EHR usability. NLP can produce
readable summaries of unstructured text, helping
clinicians retrieve information needed for point-of-care
decision making”
Frost and Sullivan, 2014
CHALLENGE
Identifying disease
comorbidities for study via
patient narratives and
disease codes is often
slow and manual. To find
700 patients with HIV and
Hepatitis C took 5 medical
students 4 months.
SPEED TO INSIGHT EXAMPLE
COHORT SELECTION
MINING PATIENT RECORDS FOR DISEASE COMORBIDITIES
SOLUTION
Using text mining queries
for disease codes and
terminology took less than
half a day to identify 1100
patients.
BENEFIT
Patient groups can be
quickly identified from
both structured and
unstructured text.
Identifying new disease
cohorts is easy and can be
quickly iterated to select
new groups for study.
Real time data & timeliness of data
Patent Analytics with I2E
Comprehensive Effective Search For Patent
Landscaping
CHALLENGE
Patents are a valuable
source of novel data.
Identifying drug targets
for specific indications is
often slow and manual, as
patents are long and the
language obtuse. To find
targets for 3 therapeutic
areas took 50 FTE days.
SOLUTION
A pipeline was built that
used queries to extract
target, indication,
invention type and
organisations and feed
into a database. Recall
was 10x manual, with
good precision; plus
target relevance scores.
BENEFIT
The integrated process
drastically reduces the
FTEs required to keep the
organization up-to-date
on recent findings
published in the patent
literature.
Temporal Nature of Data
Diagnose Cancer Earlier: Pulmonary Nodule
Early diagnosis of lung cancer is limited because predictive
models rely on a combination of structured and textual data
For example:
Cancer Risk
Factor                            Low              Intermediate     High
Nodule size, diameter (mm)        <8               8 to 20          >20
Age, yr                           <45              45 to 60         >60
Prior cancer history              No prior cancer                   Prior cancer history
Tobacco use (pack/day)            Never smoked     1                >1
Smoking cessation                 Quit >7 yr ago   Quit <7 yr ago   Never quit
Chronic obstructive lung disease  No COPD                           COPD
Asbestos exposure                 No exposure                       Exposure
Nodule characteristics            Smooth           Lobulated        Spiculated
Temporal Nature of Data
What is the challenge
− Temporal attributes of an individual “event” e.g.
cancer vs. previous history of cancer (before vs. after)
− Emerging hypotheses e.g. “X may represent a novel
technique for Y”
− Temporal nature of corpora e.g. published literature,
grants, patents
Examples
− I2B2 Challenge
− Opposition based searching
− Patents-NIH-Grants-Medline
I2B2 2014 Cardiac Risk Factors
The challenge is to extract a fixed set of Cardiac Risk
factors
Risk factors include:
− medications, mentions of diabetes, hypertension,
hyperlipidaemia, obesity, glucose/LDL/A1C/BMI test results,
“cardiac events”, family history of Coronary Artery Disease,
smoking etc.
Each annotation must also be given a temporal relation
to the document i.e.
− the patient had a heart attack BEFORE the day of the report
− the patient’s LDL was tested DURING the day of the report
There might be multiple annotations if the risk factor is
ongoing
− Diabetes is probably going to be BEFORE, DURING and AFTER
Precision: 89.8%, Recall: 93.8%, F1-score: 91.7%
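As a quick check on how these three figures relate, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded percentages on the slide; the function name `f1_score` is only illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported on the slide.
f1 = f1_score(0.898, 0.938)
print(f"F1 from rounded inputs: {f1:.4f}")
```

The result is about 0.9176, which agrees with the reported 91.7% once you allow for the precision and recall inputs having been rounded to one decimal place.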
Key Insights: Temporal Data
Events tend to have a "default" time if no appropriate language or
dates are mentioned
− "Medications: Metformin, Aspirin" -> presumed to be continuing
Language to express temporal relations also depends on what you are
trying to extract
− "Patient discontinued metformin" -> the patient took the drug
before the report but is not continuing it
− "Starting a course of metformin" -> the patient will start the
course after the report but did not take it before the report
− "Avoid metformin" -> the patient will stop taking the drug but
took it before the report
− "Patient had Myocardial Infarction this morning" -> use pronouns
to establish relation to report (on the date the report was written)
− "previous A1c was 6.5" -> use temporal adjectives and the tense
of the verb for test results
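The insights above can be read as a small set of cue rules with a fallback. The following is a minimal sketch under that assumption; the cue patterns, their ordering and the name `temporal_relations` are illustrative and are not the system used in the challenge.

```python
import re

# Illustrative cue rules; the first matching rule wins.
RULES = [
    (re.compile(r"\b(discontinued|stopped)\b", re.I), frozenset({"BEFORE"})),
    (re.compile(r"\b(starting|start|begin)\b", re.I), frozenset({"AFTER"})),
    (re.compile(r"\bavoid\b", re.I), frozenset({"BEFORE"})),
    (re.compile(r"\b(previous|prior)\b", re.I), frozenset({"BEFORE"})),
    (re.compile(r"\b(this morning|today)\b", re.I), frozenset({"DURING"})),
]

# Default when no temporal language is present: presumed to be continuing.
DEFAULT = frozenset({"BEFORE", "DURING", "AFTER"})


def temporal_relations(text):
    """Return the set of relations (to the report date) implied by the text."""
    for pattern, relations in RULES:
        if pattern.search(text):
            return relations
    return DEFAULT


examples = [
    "Medications: Metformin, Aspirin",
    "Patient discontinued metformin",
    "Starting a course of metformin",
    "Avoid metformin",
    "Patient had Myocardial Infarction this morning",
    "previous A1c was 6.5",
]
for text in examples:
    print(text, "->", sorted(temporal_relations(text)))
```

The default here suits a medication list; a different kind of risk factor may call for a different default, which is why the rules depend on what is being extracted.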
Key Insights: Temporal Data
Reports are often written after the events they describe,
however, so you can't always rely on the tense of the verb
− "Her BP today was 120/90"
Extracting a date within a few words of the event often
implies the event took place in the past
− "10/12 Pt brought in after Myocardial Infarction"
− "LDL from 10/11/09 120"
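The date-proximity heuristic can be sketched as a token window around the event mention. This assumes whitespace tokenization, a simple numeric date pattern and a window of five tokens; `has_nearby_date`, the pattern and the window size are all illustrative choices.

```python
import re

# Matches dates such as 10/12 or 10/11/09; a three-digit first part
# (as in a blood pressure of 120/90) does not match.
DATE = re.compile(r"^\d{1,2}/\d{1,2}(/\d{2,4})?$")


def has_nearby_date(text, event, window=5):
    """True if a date-like token occurs within `window` tokens of the event."""
    tokens = text.split()
    event_tokens = [t.lower() for t in event.split()]
    n = len(event_tokens)
    for i in range(len(tokens) - n + 1):
        if [t.lower() for t in tokens[i:i + n]] != event_tokens:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + n + window)
        nearby = tokens[lo:i] + tokens[i + n:hi]
        if any(DATE.match(t.strip(".,;:")) for t in nearby):
            return True
    return False


print(has_nearby_date("10/12 Pt brought in after Myocardial Infarction",
                      "Myocardial Infarction"))
print(has_nearby_date("LDL from 10/11/09 120", "LDL"))
print(has_nearby_date("Her BP today was 120/90", "BP"))
```

A pattern this simple would still mistake a reading such as 90/60 for a date, so a real system needs to recognise measurements before applying the rule.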
Opposition Searching
Single search over multiple data on different
servers providing a single set of results
Information from differently structured data is
brought together and ordered by year
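In outline, this is one query fanned out to several sources, with the hits normalized to a common shape and sorted by year. A minimal sketch, assuming each source is a callable that returns hits; the `Hit` fields, the stub sources and their contents are placeholders rather than a real API or real documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str
    doc_id: str
    year: int
    snippet: str


def merged_search(query, sources):
    """Run one query against every source and return the hits ordered by year."""
    hits = []
    for search in sources:
        hits.extend(search(query))
    # Secondary keys keep the ordering stable and reproducible within a year.
    return sorted(hits, key=lambda h: (h.year, h.source, h.doc_id))


# Stub sources standing in for separate servers (placeholder data).
def patents(query):
    return [
        Hit("patents", "P-1", 2011, f"... {query} ..."),
        Hit("patents", "P-2", 2008, f"... {query} ..."),
    ]


def literature(query):
    return [Hit("literature", "L-1", 2009, f"... {query} ...")]


results = merged_search("example query", [patents, literature])
for hit in results:
    print(hit.year, hit.source, hit.doc_id)
```

Reducing each source to the same record type before sorting is what lets differently structured data appear in a single result list.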
Connected Data Technology
Single Query over
Multiple Data Sources and Network Locations
Challenges: A Big Data Future
High indexing performance
• Millions of documents, TBs of storage
• Ontologies with 100,000s of terms
• Handles large documents with ease
• Open, configurable pipeline
• Advanced table processing
VOLUME VARIETY VELOCITY VISUALIZATION
Connected data technology
• Unified heterogeneous document types across federated servers
• Connect – Normalize – Use
• Structured, semi-structured and unstructured
Distributed indexing and querying
• Multi-processor
• Multi-machine
Integrates in enterprise applications, portals, pipelines and workflows
• Open web services API
• Public query language
Strong integrated visualization
Conclusion: Text Mining, It’s About Time to Start Using It!
Use of text mining demonstrates clear value in
the Pharma & Healthcare sectors
− Time to insight
− Timeliness of data
− Real time data
− Temporal data
Technology improvements make possible real-time
text mining that is both agile and scalable
in a world of big data
Question Time
Thanks to :
David Milward, James Cormack, Jane Reed, Simon Beaulah & Phil Hastings
Linguamatics Text Mining Summit
October 12-14 2015, Newport RI
www.linguamatics.com/textminingsummit
Featuring customer use cases in the life sciences and healthcare, hands-on training,
and new healthcare hackathon

Improving health care outcomes with responsible data science #escience2018Improving health care outcomes with responsible data science #escience2018
Improving health care outcomes with responsible data science #escience2018
 

II-SDV 2015, 20 - 21 April, in Nice

  • 1. Text Mining: It's About Time Andrew Hinton & David Milward II-SDV, Nice France 20th April 2015
  • 2. Overview: Text Mining, It's About Time! Introducing text mining; why text mining has now come of age; speed to insight; real-time data & timeliness of data; temporal nature of data; future challenges. © Linguamatics 2015
  • 3. What is Text Mining? "Text Mining is the discovery by computer of new, previously unknown information, by automatically extracting and relating information from different written resources, to reveal otherwise "hidden" meanings. A key element is the linking together of the extracted information to form new facts or new hypotheses to be explored further by more conventional means of experimentation." Marti Hearst, UC Berkeley
  • 4. How is it being used in Life Sciences? Advanced text analytics delivers value along the pipeline: gene-disease mapping; target ID/selection; mutation/expression analysis; toxicity analysis and prediction; biomarker discovery; drug repurposing; patent analysis; KOL identification; opportunity scouting; trial site selection and study design; safety; competitive intelligence; pharmacovigilance; social media analysis; comparative effectiveness; regulatory submission QC; HEOR; SAR.
  • 5. Challenges. Most of the information required is in free text. Ever-increasing amounts of text data to examine (chart: PubMed Records, axis from 0 to 25,000,000). Different kinds of documents: external literature, patents, EHRs, internal reports, blogs, presentations. Different formats: HTML, PDF, XML, Word, PPT, Wiki.
  • 6. Challenges in Unstructured Data. Different word, same meaning: "cyclosporine", "ciclosporin", "Neoral", "Sandimmune". Different expression, same meaning: "Non-smoker", "Does not smoke", "Does not drink or smoke", "Denies tobacco use". Different grammar, same meaning: "5mg/kg of cyclosporine per day", "5mg/kg per day of cyclosporine", "cyclosporine 5mg/kg per day". Same word, different context: "Diagnosed with diabetes", "Family history of diabetes", "No family history of diabetes". NLP
  • 7. From Words to Meaning. “Among them, nimesulide, a selective COX2 inhibitor, …” Identifying entities and relations (diagram labels: Entrez Gene ID: 5743; inhibits). Linguistics to establish relationships.
  • 8. Finding Indirect Relationships. Treatment has been applied in clinical trials: (1) Thalidomide in advanced hepatocellular carcinoma as antiangiogenic treatment approach: a phase I/II trial. Pinter et al. Eur J Gastroenterol Hepatol. 2008 20(10):1012-9. (2) Phase II study of temozolomide, thalidomide, and celecoxib for newly diagnosed glioblastoma in adults. Kesari et al. Neuro Oncol. 2008 10(3):300-8. <Thalidomide>-<Relationship>-<Gene>; <Gene>-<Relationship>-<Angiogenic Process>
  • 9. Modes of Use: reusable pipelines; decision support; knowledge capture; classification/mark-up; capture and re-use strategies; semantic categorization.
  • 10. Speed to Insight
  • 11. Time = Money
  • 12. Speed to insight, Example I: Patents
  • 13. Business Impact and value
  • 14. Leveraging Text Analytics in Patents to Empower Business Decisions
  • 15. EHRs & Healthcare Challenges. Diagram (Electronic Health Record): not for profit, education, research, biotech, pharma, medical devices, ICT, funders, approvers, government, patient, payers, prescribers, providers, dispensers. The challenge is to unlock the value of the huge investment being made in EHRs. “Natural language processing (NLP) and visualization dashboards are the technologies most suitable to improve EHR usability. NLP can produce readable summaries of unstructured text, helping clinicians retrieve information needed for point-of-care decision making” Frost and Sullivan, 2014
  • 16. Speed to Insight Example: Cohort Selection. Mining patient records for disease comorbidities. CHALLENGE: Identifying disease comorbidities for study via patient narratives and disease codes is often slow and manual. To find 700 patients with HIV and Hepatitis C took 5 medical students 4 months. SOLUTION: Using text mining queries for disease codes and terminology took less than half a day to identify 1100 patients. BENEFIT: Patient groups can be quickly identified from both structured and unstructured text. Identifying new disease cohorts is easy and can be quickly iterated to select new groups for study.
  • 17. Real time data & timeliness of data
  • 18. Patent Analytics with I2E: Comprehensive Effective Search for Patent Landscaping. CHALLENGE: Patents are a valuable source of novel data. Identifying drug targets for specific indications is often slow and manual, as patents are long and the language obtuse. To find targets for 3 therapeutic areas took 50 FTE days. SOLUTION: A pipeline was built that used queries to extract target, indication, invention type and organisations and feed into a database. Recall was 10x manual, with good precision; plus target relevance scores. BENEFIT: The integrated process drastically reduces the FTEs required to keep the organization up-to-date on recent findings published in the patent literature.
  • 19. Temporal Nature of Data
  • 20. Diagnose Cancer Earlier: Pulmonary Nodule. Early diagnosis of lung cancer is limited because predictive models rely on a combination of structured and textual data. For example, cancer risk by factor (low / intermediate / high): nodule size, diameter (mm): <8 / 8 to 20 / >20; age, yr: <45 / 45 to 60 / >60; prior cancer history (two values only): no prior cancer / prior cancer history; tobacco use (pack/day): never smoked / 1 / >1; smoking cessation: quit >7 yr ago / quit <7 yr ago / never quit; chronic obstructive lung disease (two values only): no COPD / COPD; asbestos exposure (two values only): no exposure / exposure; nodule characteristics: smooth / lobulated / spiculated.
  • 21. Temporal Nature of Data. What is the challenge? Temporal attributes of an individual “event”, e.g. cancer vs. previous history of cancer (before vs. after); emerging hypotheses, e.g. “X may represent a novel technique for Y”; temporal nature of corpora, e.g. published literature, grants, patents. Examples: I2B2 Challenge; opposition based searching; Patents-NIH-Grants-Medline.
  • 22. I2B2 2014 Cardiac Risk Factors. The challenge is to extract a fixed set of cardiac risk factors. Risk factors include: medications, mentions of diabetes, hypertension, hyperlipidaemia, obesity, glucose/LDL/A1C/BMI test results, “cardiac events”, family history of Coronary Artery Disease, smoking etc. Each annotation must also be given a temporal relation to the document, i.e. the patient had a heart attack BEFORE the day of the report; the patient’s LDL was tested DURING the day of the report. There might be multiple annotations if the risk factor is ongoing: diabetes is probably going to be BEFORE, DURING and AFTER. Precision: 89.8%, Recall: 93.8%, F1-score: 91.7%
  • 23. Key Insights: Temporal Data. Events tend to have a "default" time if no appropriate language or dates are mentioned: "Medications: Metformin, Aspirin" -> presumed to be continuing. Language to express temporal relations also depends on what you are trying to extract: "Patient discontinued metformin" -> the patient took the drug before the report but is not continuing it; "Starting a course of metformin" -> the patient will start the course after the report but did not take it before the report; "Avoid metformin" -> the patient will stop taking the drug but took it before the report; "Patient had Myocardial Infarction this morning" -> use pronouns to establish relation to report (on the date the report was written); "previous A1c was 6.5" -> use temporal adjectives and the tense of the verb for test results.
  • 24. Key Insights: Temporal Data. Reports are often written after the event they describe, however, so you can't always rely on the tense of the verb: "Her BP today was 120/90". Extracting a date within a few words of the event often implies the event took place in the past: "10/12 Pt brought in after Myocardial Infarction"; "LDL from 10/11/09 120".
  • 25. Opposition Searching. A single search over multiple data sources on different servers provides a single set of results. Information from differently structured data is brought together and ordered by year.
  • 26. Connected Data Technology: Single Query over Multiple Data Sources and Network Locations
  • 27. Challenges: A Big Data Future (volume, variety, velocity, visualization). High indexing performance: millions of documents, TBs of storage; ontologies with 100,000s of terms; handles large documents with ease; open, configurable pipeline; advanced table processing. Connected data technology: unified heterogeneous document types across federated servers; Connect, Normalize, Use; structured, semi-structured and unstructured. Distributed indexing and querying: multi-processor; multi-machine. Integrates in enterprise applications, portals, pipelines and workflows: open web services API; public query language. Strong integrated visualization.
  • 28. Conclusion: Text Mining, It’s About Time To Start Using It! Use of text mining demonstrates clear value in the Pharma & Healthcare sectors: time to insight; timeliness of data; real-time data; temporal data. Technology improvements make possible real-time text mining that is both agile and scalable in a world of big data.
  • 29. Question Time. Thanks to: David Milward, James Cormack, Jane Reed, Simon Beaulah & Phil Hastings
  • 30. Linguamatics Text Mining Summit, October 12-14 2015, Newport RI. www.linguamatics.com/textminingsummit. Featuring customer use cases in the life sciences and healthcare, hands-on training, and new healthcare hackathon.
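
The temporal heuristics on slides 22 to 24 can be illustrated with a small rule-based sketch. This is not the method used in the I2B2 2014 entry or in I2E. The cue lists, the date pattern and the function name `temporal_relations` are all assumptions made up for this example, chosen only so that the phrases quoted on the slides come out with the relations the slides describe.

```python
import re

# Illustrative cue lists only: assumptions for this sketch, not the rules
# used by I2E or by the I2B2 2014 annotation guidelines.
PRESENT_CUES = ("today", "this morning")
FUTURE_CUES = ("starting",)
PAST_CUES = ("discontinued", "avoid", "previous")
# A short date such as 10/12 or 10/11/09 appearing in the mention.
DATE = re.compile(r"\b\d{1,2}/\d{1,2}(?:/\d{2,4})?\b")


def temporal_relations(mention):
    """Guess which of BEFORE/DURING/AFTER apply to a risk-factor mention."""
    text = mention.lower()
    # Words tying the event to the day of the report win over verb tense.
    if any(cue in text for cue in PRESENT_CUES):
        return {"DURING"}
    if any(cue in text for cue in FUTURE_CUES):
        return {"AFTER"}
    # Past-time wording, or a date near the event, suggests the past.
    if any(cue in text for cue in PAST_CUES) or DATE.search(text):
        return {"BEFORE"}
    # No temporal language or date: fall back to the "default" of ongoing.
    return {"BEFORE", "DURING", "AFTER"}


if __name__ == "__main__":
    for phrase in (
        "Medications: Metformin, Aspirin",
        "Patient discontinued metformin",
        "Starting a course of metformin",
        "Her BP today was 120/90",
        "LDL from 10/11/09 120",
    ):
        print(phrase, "->", sorted(temporal_relations(phrase)))
```

Checking the present-day cues first reflects the point on slide 24 that "Her BP today was 120/90" is written in the past tense yet refers to the day of the report. A real system would need far richer linguistic patterns than substring matching.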