Preparing your
taxonomy to
be ready for data
scientists & machine
readability: A case study
and work in progress
Mary Chitty,
Library Director &
Taxonomist, MSLS
Cambridge Healthtech,
Needham MA
mchitty@healthtech.com
SLA Annual Conference, Cleveland, Ohio, Tuesday, June 18, 2019,
Taxonomy-Ontology Conversions: Case Studies
1992
2000
2006-14
2016 2018-19
Historical Taxonomy Process
Taxonomies & Ontologies glossary & taxonomy http://www.genomicglossaries.com/content/ontologies.asp
Company founded.
Taxonomy created by CEO
with a few hundred terms.
Major products:
conferences on emerging
technologies. Focus on
preclinical drug discovery.
Acquired companies dealing
with bioinformatics, clinical
trials, energy and batteries.
Still integrating their
databases.
Met people from
OntoForce, Belgian
semantic search
engine company.
Began informal
collaboration.
Acquired companies in artificial
intelligence and Internet of
Things. Still determining how to
integrate databases. Several
data scientists hired. Signed
formal contract with OntoForce
to use Disqover search engine.
https://www.ontoforce.com/
Taxonomy stands at
1,600+ terms now.
Conferences and other
products in preclinical and
clinical biotech and
pharma, clinical trials,
energy, AI and Internet of
Things and more.
Published Genomic
Glossaries & Taxonomies
www.genomicglossaries.com
2019
Ongoing challenges
Legacy data with inconsistencies, redundancies and ambiguities.
Integrating company acquisitions’ data into in-house database.
Still cleaning up, disambiguating and documenting in-house data and database.
Scaling up difficulties often underestimated. A major pain point for us right now.
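
One way to start on the redundancies listed above is to flag terms that differ only in case, punctuation or spacing. The following is a minimal sketch, not the method used in this project: the term list is invented for illustration, and the normalization rule (drop everything except letters and digits) is an assumption that will over-merge some terms, so its output is a list of candidates for human review rather than an automatic merge.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(term: str) -> str:
    """Build a comparison key: strip accents, lowercase, drop non-alphanumerics."""
    text = unicodedata.normalize("NFKD", term)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", "", text.lower())


def find_redundant_terms(terms):
    """Group terms by normalized key and keep groups with more than one spelling."""
    groups = defaultdict(set)
    for term in terms:
        groups[normalize(term)].add(term)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Hypothetical legacy terms, invented for illustration only.
legacy_terms = [
    "Bioinformatics",
    "bio-informatics",
    "Clinical Trials",
    "clinical  trials",
    "Batteries",
]

duplicates = find_redundant_terms(legacy_terms)
for key, variants in duplicates.items():
    print(key, "->", variants)
```

Each flagged group still needs a taxonomist's decision on which spelling becomes the preferred term and which become synonyms.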
FAIR Data
Both the European Commission and NIH have allocated considerable resources to making data FAIRer.
https://www.go-fair.org/fair-principles/
Findable
• First step in
(re)using data is
to find them.
Metadata and
data should be
easy to find for
both humans
and computers.
… an essential
component of
the FAIRification
process.
Accessible
• Once the user
finds the
required data,
she/he needs to
know how they
can be
accessed.
Interoperable
• Data usually
need to be
integrated with
other data …
need to
interoperate with
applications or
workflows.
Reusable
• Ultimate goal of
FAIR is to
optimise the
reuse of data…
metadata and
data should be
well-described
so that they can
be replicated
and/or combined
in different
settings.
Taxonomies and ontologies are critical for interoperability
and reproducibility, particularly in the life sciences.
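
As an illustration of what machine readability can look like for a single taxonomy term, the sketch below writes one concept as a JSON-LD record using the W3C SKOS vocabulary (prefLabel, altLabel, broader). It is a sketch under stated assumptions: the example.org namespace, the term identifiers and the labels are hypothetical and do not come from the taxonomy described in these slides.

```python
import json

SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "https://example.org/taxonomy/"  # hypothetical namespace, an assumption


def to_skos_concept(term_id, pref_label, alt_labels=(), broader=None):
    """Describe one taxonomy term as a SKOS concept in JSON-LD form."""
    concept = {
        "@context": {"skos": SKOS},
        "@id": BASE + term_id,
        "@type": "skos:Concept",
        "skos:prefLabel": {"@value": pref_label, "@language": "en"},
    }
    if alt_labels:
        # Synonyms and legacy spellings are kept as alternative labels.
        concept["skos:altLabel"] = [
            {"@value": label, "@language": "en"} for label in alt_labels
        ]
    if broader:
        # Link to the parent term so the hierarchy is explicit.
        concept["skos:broader"] = {"@id": BASE + broader}
    return concept


# Hypothetical term, invented for illustration only.
record = to_skos_concept(
    "clinical-trials",
    "Clinical trials",
    alt_labels=["Clinical studies"],
    broader="clinical-research",
)
serialized = json.dumps(record, indent=2, sort_keys=True)
print(serialized)
```

Giving every term a stable identifier and an explicit parent link is what lets other systems find and combine the terms without reading a spreadsheet by eye.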
Life sciences data relatively
sparse, with many attributes
“highly dimensional”, leading
to complexity and sometimes
chaos. Data on longitudinal
health outcomes limited by
HIPAA & other privacy
regulations, but crucial for
validation.
Increasing attention
being paid to data
stewardship and data
curation. Support still
a tough sell.
Reproducibility crisis?
More than 70% of
researchers have tried and
failed to reproduce
experiments.
More than half have failed
to reproduce their own
experiments.
Nature 2016 survey of researchers.
https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970
Life science ontologies and taxonomies
So many to choose from!
BioPortal https://bioportal.bioontology.org/
repository of biomedical ontologies has almost
800 ontologies, and mapping from ontologies
to I2B2 http://i2b2.bioontology.org/
Interdisciplinary work holds great
promise – and needs mapping of
terms between disciplines.
Pistoia Alliance Ontologies Mapping
https://www.pistoiaalliance.org/projects/current-projects/ontologies-mapping/
Data mapping is also known as “data
wrangling” or “data munging”. Many
people are trying to automate it. Still
a work in progress.
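
A very small example of semi-automated term mapping is sketched below. It suggests, for each in-house term, the closest term in an external vocabulary by plain string similarity, using Python's standard difflib module. This is an assumption-laden illustration and not the approach of the Pistoia Alliance project or of any tool named in these slides: the term lists are invented, the 0.8 cutoff is arbitrary, and string similarity cannot detect synonyms that are spelled differently.

```python
import difflib


def suggest_mappings(source_terms, target_terms, cutoff=0.8):
    """For each source term, suggest the most similar target term, or None."""
    lowered = {term.lower(): term for term in target_terms}
    suggestions = {}
    for term in source_terms:
        matches = difflib.get_close_matches(
            term.lower(), list(lowered), n=1, cutoff=cutoff
        )
        suggestions[term] = lowered[matches[0]] if matches else None
    return suggestions


# Hypothetical vocabularies, invented for illustration only.
in_house = ["Clinical Trials", "Batteries", "Bioinformatics"]
external = ["clinical trial", "bioinformatics", "proteomics"]

mappings = suggest_mappings(in_house, external)
for source, target in mappings.items():
    print(source, "->", target)
```

Terms that come back as None, and every suggested match, still need review by someone who knows both disciplines.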
ROI Return On Investment & Cost Benefit
Cost of not having FAIR research data, PwC EU Services, 2018, European Union Publications.
https://publications.europa.eu/en/publication-detail/-/publication/d375368c-1a0a-11e9-8d04-01aa75ed71a1
Stakeholders may
balk at investing in
taxonomies or
ontologies. Software,
other IT & technology
considerations only
part of the issues.
Educating decision
makers is an
ongoing process,
even with CXOs who
value taxonomies
and ontologies.
Estimated cost of
not having FAIR
research data:
a minimum of 10.2
billion euros per
year.
Key insights
“…[T]here is a lot of work that needs doing
to prepare the data sets for these
technologies … there is a disproportionate
amount being invested in the technologies
as opposed to investing in ‘data-readiness’ …
It's just not a slam dunk to
mash up a lot of data and think it will work.”
Life Science Leader 2019 March 1, “AI In Life Sciences: Seeing past the Hype” Francois Nicolas and comment by
Christy Wilson https://www.lifescienceleader.com/doc/ai-in-life-sciences-seeing-past-the-hype-0001
“The AI solution may help accelerate some tasks, but
human expertise may be required for the broad
scope of what is needed. Currently AI in healthcare is
in the second stage of the Gartner Hype Cycle: ‘the
peak of inflated expectation.’ However, if we don’t
allow it to catch up to the hype, it may fall back into
what Gartner calls the ‘trough of disillusionment.’”
Key takeaways
Don’t try to “boil the
ocean”. Prototype early and
often. Think modular
• Pareto Principle 80/20
80% of effects come from
20% of effort.
Don’t try for 100%.
• Identify what your
stakeholders value.
Aim for quick wins.
Understand existing
workflows.
• Seek out allies and shared
buy-in for justification and
sustainability.
• Bundle stakeholders’ key
wants and items you know
they will eventually need.
Communicating ROI on
taxonomies, ontologies and
metadata is still challenging.
• Expectations and change
management are crucial
skills to cultivate.
• Report metrics, both quantitative
and qualitative.
• Recognize that some challenges are
not yet resolved by anyone.
Acknowledgments
Many people have participated in this ongoing project. I’m grateful for their work, insights and
encouragement.
Cambridge Innovation
Institute CII
& Cambridge Healthtech
• Phillips Kuhl, President
• Tonya Urquizo,
Knowledge Information
Services Analyst and IT
Liaison
• Sanaye Bartlett, Data
Analyst & Project Manager
• Kaushik Chaudhuri,
Director of Product
Marketing
CII Disqover Team
• Kaitlyn Barago,
Associate Conference
Producer
• Nancy Clarke, Data Scientist
• Mike Croft,
Software Architect
• Ben Lakin,
Director New Initiatives
• Jaime Parlee, Director
Marketing Analytics
• Craig Wohlers, Manager
Knowledge Foundation
OntoForce
• Hans Constandt, CEO &
Founder
• Filip Pattyn, Scientific Lead
• Carla Suijkerbuijk, Business
Development North America
• Niels Vanneste,
Customer Data Scientist
• Berenice Wulbrecht, Data
Science Director, Systems
Biology
Fruitful Conversations and
emails
• Ingrid Akerblom, IEA
Diversified Consulting
• Juliane Schneider, Lead
Data Curator, eagle-i,
Harvard Catalyst
• Jane Lomax,
Head Ontologist, SciBite
• Terence Russell,
Chief Technologist, iRODS
Consortium
• John Wilbanks,
Chief Commons Officer,
Sage Bionetworks

Chitty taxo cleveland 2019 june

  • 1. Preparing your taxonomy to be ready for data scientists & machine readability: A case study and work in progress. Mary Chitty, Library Director & Taxonomist, MSLS, Cambridge Healthtech, Needham MA, mchitty@healthtech.com. SLA Annual Conference, Cleveland Ohio, Tuesday, June 18, 2019, Taxonomy-Ontology Conversions: Case Studies
  • 2. 1992 2000 2006-14 2016 2018-19 Historical Taxonomy Process Taxonomies & Ontologies glossary & taxonomy http://www.genomicglossaries.com/content/ontologies.asp Company founded. Taxonomy created by CEO with a few hundred terms. Major products: conferences on emerging technologies. Focus on preclinical drug discovery. Acquired companies dealing with bioinformatics, clinical trials, energy and batteries. Still integrating their databases. Met people from OntoForce, Belgian semantic search engine company. Began informal collaboration. Acquired companies in artificial intelligence and Internet of Things. Still determining how to integrate databases. Several data scientists hired. Signed formal contract with OntoForce to use Disqover search engine. https://www.ontoforce.com/ Taxonomy stands at 1,600+ terms now. Conferences and other products in preclinical and clinical biotech and pharma, clinical trials, energy, AI and Internet of Things and more. Published Genomic Glossaries & Taxonomies www.genomicglossaries.com 2019
  • 3. Ongoing challenges Legacy data with inconsistencies, redundancies and ambiguities. Integrating company acquisitions’ data into in-house database. Still cleaning up, disambiguating and documenting in-house data and database. Scaling up difficulties often underestimated. A major pain point for us right now.
  • 4. FAIR Data Both the European Commission and NIH have allocated considerable resources to making data FAIRer. https://www.go-fair.org/fair-principles/ Findable • First step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. … an essential component of the FAIRification process. Accessible • Once the user finds the required data, she/he needs to know how they can be accessed. Interoperable • Data usually need to be integrated with other data … need to interoperate with applications or workflows. Reusable • Ultimate goal of FAIR is to optimise the reuse of data … metadata and data should be well-described so that they can be replicated and/or combined in different settings.
  • 5. Taxonomies and ontologies are critical for interoperability and reproducibility, particularly in the life sciences. Life sciences data relatively sparse, with many attributes (“highly dimensional”), leading to complexity and sometimes chaos. Data on longitudinal health outcomes limited by HIPAA & other privacy regulations, but crucial for validation. Increasing attention being paid to data stewardship and data curation. Support still a tough sell. Reproducibility crisis? More than 70% of researchers have tried and failed to reproduce experiments. More than half have failed to reproduce their own experiments. Nature 2016 survey of researchers. https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970
  • 6. Life science ontologies and taxonomies So many to choose from! BioPortal https://bioportal.bioontology.org/ repository of biomedical ontologies has almost 800 ontologies, and mapping from ontologies to I2B2 http://i2b2.bioontology.org/ Interdisciplinary work holds great promise – and needs mapping of terms between disciplines. Pistoia Alliance Ontologies Mapping https://www.pistoiaalliance.org/projects/current-projects/ontologies-mapping/ Data mapping also known as “data wrangling” or “data munging”. Many people trying to automate. Still works in progress.
  • 7. ROI Return On Investment & Cost Benefit Cost of not having FAIR research data, PwC EU Services, 2018, European Union Publications. https://publications.europa.eu/en/publication-detail/-/publication/d375368c-1a0a-11e9-8d04-01aa75ed71a1 Stakeholders may balk at investing in taxonomies or ontologies. Software, other IT & technology considerations only part of the issues. Educating decision makers is an ongoing process, even with CXOs who value taxonomies and ontologies. Estimated cost benefit analysis of not having FAIR research data: Minimum of 10.2 billion Euros per year.
  • 8. Key insights “…[T]here is a lot of work that needs doing to prepare the data sets for these technologies … there is a disproportionate amount being invested in the technologies as opposed to investing in ‘data-readiness’ … It's just not a slam dunk to mash up a lot of data and think it will work.” Life Science Leader 2019 March 1, “AI In Life Sciences: Seeing past the Hype” Francois Nicolas and comment by Christy Wilson https://www.lifescienceleader.com/doc/ai-in-life-sciences-seeing-past-the-hype-0001 “The AI solution may help accelerate some tasks, but human expertise may be required for the broad scope of what is needed. Currently AI in healthcare is in the second stage of the Gartner Hype Cycle: ‘the peak of inflated expectation.’ However, if we don’t allow it to catch up to the hype, it may fall back into what Gartner calls the ‘trough of disillusionment.’”
  • 9. Key takeaways Don’t try to “boil the ocean”. Prototype early and often. Think modular. • Pareto Principle 80/20: 80% of effects come from 20% of effort. Don’t try for 100%. • Identify what your stakeholders value. Aim for quick wins. Understand existing workflows. • Seek out allies and shared buy-in for justification and sustainability. • Bundle stakeholders’ key wants and items you know they will eventually need. Communicating ROI on taxonomies, ontologies and metadata is still challenging. • Expectations and change management are crucial skills to cultivate. • Report metrics, both quantitative and qualitative. • Recognize that some challenges are not yet resolved by anyone.
  • 10. Acknowledgments Many people have participated in this ongoing project. I’m grateful for their work, insights and encouragement. Cambridge Innovation Institute CII & Cambridge Healthtech • Phillips Kuhl, President • Tonya Urquizo, Knowledge Information Services Analyst and IT Liaison • Sanaye Bartlett, Data Analyst & Project Manager • Kaushik Chaudhuri, Director of Product Marketing CII Disqover Team • Kaitlyn Barago, Associate Conference Producer • Nancy Clarke, Data Scientist • Mike Croft, Software Architect • Ben Lakin, Director New Initiatives • Jaime Parlee, Director Marketing Analytics • Craig Wohlers, Manager Knowledge Foundation OntoForce • Hans Constandt, CEO & Founder • Filip Pattyn, Scientific Lead • Carla Suijkerbuijk, Business Development North America • Niels Vanneste, Customer Data Scientist • Berenice Wulbrecht, Data Science Director, Systems Biology Fruitful Conversations and emails • Ingrid Akerblom, IEA Diversified Consulting • Juliane Schneider, Lead Data Curator, eagle-I, Harvard Catalyst • Jane Lomax, Head Ontologist, SciBite • Terence Russell, Chief Technologist, IRODS Consortium • John Wilbanks, Chief Commons Officer, Sage Bionetworks
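
The cleanup of redundant legacy terms mentioned on slide 3 can be illustrated with a small sketch. This is not the process used at Cambridge Healthtech, which the slides do not describe; the function names and the sample terms are hypothetical, and the normalization rule (lowercase, drop everything except letters and digits) is an assumption chosen for simplicity. It only flags candidate duplicates for a human taxonomist to review.

```python
import re
from collections import defaultdict


def normalize(term: str) -> str:
    """Lowercase the term and drop everything except letters and digits.

    Hypothetical rule: it treats 'bio-informatics' and 'Bioinformatics'
    as the same key, but it can also merge terms that are truly distinct.
    """
    return re.sub(r"[^a-z0-9]+", "", term.lower())


def find_redundant_terms(terms):
    """Group terms by normalized key and keep groups with more than one variant."""
    groups = defaultdict(list)
    for term in terms:
        groups[normalize(term)].append(term)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


# Illustrative sample only; not taken from the actual taxonomy.
sample_terms = [
    "Bioinformatics",
    "bio-informatics",
    "Clinical Trials",
    "clinical trials ",
    "Batteries",
]
duplicates = find_redundant_terms(sample_terms)
for key, variants in duplicates.items():
    print(key, "->", variants)
```

A rule this aggressive will produce false positives, which fits the point on slide 8 that human expertise is still needed alongside automation.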

Editor's Notes

  1. Key motivations for taxonomy changes were company acquisitions in new disciplines, and new data science hires.
  2. No easy answers. Issues around integrating internal and external ontologies. Starting to look into issues around ambiguity. Progress often seems to be three steps forward, one or two steps back.
  3. A colleague commented “As science becomes ever more interdisciplinary, it is a huge challenge to map data on different granular levels but semantically link them across different languages, standards, and cultures.”
  4. An ontology colleague notes “Institutions either underestimate the resources needed to do this work, or they are daunted by the entire prospect and researchers have to find repositories/help outside the institution to store and curate their data, if they bother to do so. Honestly, very little data will ever be reused.”
  5. Some resources for locating life science ontologies and mappings. BioPortal has 773 ontologies as of May 2019. Graph-based ontologies; open vs. proprietary ontologies. My in-house taxonomy tends to be narrow and deep. Some external taxonomies tend to be broad and shallow.
  6. PwC publication estimates time lost per year at 4.5 billion Euros, cost of storage 5.3 billion Euros [only data from academic research, private sector data not available]; license cost 360 million [private sector data not available]. Interdisciplinary and potential economic growth impacts cannot be estimated reliably.
  7. People don’t always know what they want or will eventually need, and can have difficulty articulating their desires. Important to understand the challenges of the people whose problems you are trying to solve. If you ask them to change their workflow drastically, change will never happen. Don’t be too hard on yourself. Some of these are issues everyone else is still trying to figure out.
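
The PwC components listed in note 6 can be checked against the "minimum of 10.2 billion Euros per year" figure on slide 7. The sketch below assumes that the headline figure is the sum of the three components named in the note (time lost, storage, license costs); the variable names are illustrative.

```python
# Annual cost components from note 6, in euros.
time_lost_eur = 4_500_000_000
storage_eur = 5_300_000_000
license_eur = 360_000_000

# Assumption: the slide's headline figure is the sum of these three components.
total_eur = time_lost_eur + storage_eur + license_eur
total_billions = round(total_eur / 1e9, 1)

print(f"Estimated minimum annual cost: {total_billions} billion euros")
```

The components add up to 10.16 billion euros, which rounds to the 10.2 billion quoted on the slide. As the note says, the interdisciplinary and economic growth impacts are not included because they cannot be estimated reliably.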