S100
Martínez-Romero, M., O’Connor, M. J., Shankar, R., Panahiazar, M., Willrett, D., Egyedi, A. L., Gevaert, O., Graybeal, J., Musen, M. A.
Stanford University
Fast and Accurate Metadata Authoring Using Ontology-Based Recommendations
What is metadata?
2AMIA 2017 | amia.org
• Data that describe data
• Crucial for:
  • Finding experimental datasets online
  • Understanding how the experiments were performed
  • Reusing the data to perform new analyses
age
Age
AGE
`Age
age (after birth)
age (in years)
age (y)
age (year)
age (years)
Age (years)
Age (Years)
age (yr)
age (yr-old)
age (yrs)
Age (yrs)
age [y]
age [year]
age [years]
age in years
age of patient
Age of patient
age of subjects
age(years)
Age(years)
Age(yrs.)
Age, year
age, years
age, yrs
age.year
age_years
Poor metadata
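The spelling variants listed on this slide can be collapsed with a small normalization step. The following is a minimal sketch, not part of the original talk: the function names are hypothetical, and the rule "any field name whose first token is age refers to the same field" is an assumption made only for illustration.

```python
import re

def normalize_field_name(raw: str) -> str:
    """Lowercase the name and keep only alphanumeric tokens."""
    tokens = re.findall(r"[a-z0-9]+", raw.lower())
    return " ".join(tokens)

def canonical_field(raw: str) -> str:
    """Map a raw field name to a canonical one.

    Assumption (not from the slides): every name whose first token
    is "age" denotes the same underlying field.
    """
    norm = normalize_field_name(raw)
    if norm.split()[:1] == ["age"]:
        return "age"
    return norm

# A few of the variants shown on the slide.
variants = ["age", "AGE", "`Age", "age (yrs)", "Age(yrs.)",
            "age.year", "age_years", "age of patient"]
canonical = {canonical_field(v) for v in variants}
print(canonical)
```

A rule this crude discards the unit information carried by names such as "age (years)", which is one reason a template with a single predefined field is preferable to cleaning names after the fact.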
An analysis of metadata from NCBI’s BioSample
• 73% of “Boolean” values
  • nonsmoker, former-smoker
• 26% of “integer” values
  • JM52, UVPgt59.4, pig
• 68% of ontology terms
  • presumed normal, wild_type
Gonçalves, R. S. et al. (2017). Metadata in the BioSample Online Repository are Impaired by Numerous Anomalies. SemSci 2017 Workshop, co-located with ISWC 2017. Vienna, Austria.
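A type check of the kind this analysis implies can be sketched as follows. This assumes the values listed on the slide are entries found in fields declared as Boolean or integer that do not match the declared type. The set of accepted Boolean literals and the function names are hypothetical and are not taken from the cited study.

```python
# Assumption: which strings count as Boolean literals is a local choice.
BOOLEAN_LITERALS = {"true", "false", "yes", "no"}

def is_valid_boolean(value: str) -> bool:
    """Return True if the string is one of the accepted Boolean literals."""
    return value.strip().lower() in BOOLEAN_LITERALS

def is_valid_integer(value: str) -> bool:
    """Return True if the string parses as a base-10 integer."""
    try:
        int(value.strip())
        return True
    except ValueError:
        return False

# Example values taken from the slide.
boolean_examples = ["nonsmoker", "former-smoker"]
integer_examples = ["JM52", "UVPgt59.4", "pig"]

invalid_booleans = [v for v in boolean_examples if not is_valid_boolean(v)]
invalid_integers = [v for v in integer_examples if not is_valid_integer(v)]
print(invalid_booleans, invalid_integers)
```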
Poor metadata
Metadata authoring is hard
• A computational platform for metadata management
• Goal: Overcome the impediments to creating high-quality metadata
Metadata template
DESIGN TEMPLATE → FILL IN METADATA → SUBMIT METADATA
Template Designer: template authors (e.g., standards committees)
Metadata Editor: metadata authors (e.g., scientists)
Metadata Repository: template, metadata
LINCS
Public Databases
https://cedar.metadatacenter.org/templates/edit/https://repo.metadatacenter.org/templates/ab105771-564e-42a1-9be4-5a63891…
https://cedar.metadatacenter.org/instances/edit/https://repo.metadatacenter.org/template-instances/d4f1059e-8e27-4166-902f-…
A sample study
Acute stress disorder
Stanford University
John Doe
Longitudinal
We developed a metadata recommendation system
Metadata recommendation system
[Diagram: Metadata Editor, Metadata Repository, and Metadata Recommender, linked by three numbered steps labeled “store metadata”, “analyze existing metadata”, and “generate suggestions”]
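The store, analyze, and suggest loop in the diagram can be illustrated with a toy recommender. This is a sketch under stated assumptions, not the CEDAR implementation: the class name, its methods, and the exact-match counting rule are all hypothetical. It ranks candidate values for a target field by how often they appear in stored instances that agree with the fields already filled in.

```python
from collections import Counter

class SimpleRecommender:
    """Toy context-sensitive recommender (hypothetical, for illustration)."""

    def __init__(self):
        self.instances = []

    def store(self, instance: dict) -> None:
        """Keep a copy of a submitted metadata instance."""
        self.instances.append(dict(instance))

    def suggest(self, target_field: str, context: dict, top_k: int = 3) -> list:
        """Rank values of target_field among instances matching the context."""
        counts = Counter()
        for inst in self.instances:
            value = inst.get(target_field)
            if value is None:
                continue
            # Only count instances that agree with every filled-in field.
            if all(inst.get(f) == v for f, v in context.items()):
                counts[value] += 1
        return [value for value, _ in counts.most_common(top_k)]

recommender = SimpleRecommender()
for disease, tissue, n in [("asthma", "lung", 2),
                           ("lung cancer", "lung", 1),
                           ("hepatitis", "liver", 3)]:
    for _ in range(n):
        recommender.store({"disease": disease, "tissue": tissue})

with_context = recommender.suggest("disease", {"tissue": "lung"})
without_context = recommender.suggest("disease", {})
print(with_context, without_context)
```

With `tissue` already set to `lung`, the liver-related value drops out of the ranking even though it is the most frequent value overall, which is the effect a context-sensitive suggestion is meant to have.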
Filling in a CEDAR template
Evaluation workflow
[Workflow diagram]
Gene Expression metadata
(1) Preprocessing and Ingestion, with the CEDAR BioSample template → BioSample template instances (≈35K)
(2) Semantic annotation → Annotated BioSample template instances (≈35K)
Each set of instances: Training dataset (80%), Test dataset (20%)
(3) Training: Training datasets → CEDAR Metadata Repository, Metadata Recommender
(4) Testing & Analysis: Test datasets → Evaluation results
• For “disease”, “sex”, and “tissue”
• Top 3 suggestions
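The 80%/20% split and the per-field test cases can be sketched as below. The slides do not say how the split was drawn or how each test case was constructed, so the seeded random shuffle and the "hide one target field, keep the rest as context" scheme are assumptions, and all names are hypothetical.

```python
import random

TARGET_FIELDS = ("disease", "sex", "tissue")

def split_instances(instances, train_fraction=0.8, seed=0):
    """Shuffle with a fixed seed and split into (training, test) lists."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def make_test_cases(test_instances, target_fields=TARGET_FIELDS):
    """Build (field, context, expected value) triples.

    Assumption: each target field is hidden in turn and the remaining
    fields of the instance serve as the context for the suggestion.
    """
    cases = []
    for inst in test_instances:
        for field in target_fields:
            if field in inst:
                context = {k: v for k, v in inst.items() if k != field}
                cases.append((field, context, inst[field]))
    return cases

# Ten made-up instances stand in for the roughly 35K used in the talk.
instances = [{"disease": f"disease-{i}", "sex": "female", "tissue": "lung"}
             for i in range(10)]
train, test = split_instances(instances)
cases = make_test_cases(test)
print(len(train), len(test), len(cases))
```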
Testing & Analysis
Compared suggested vs. expected metadata
Measure: Reciprocal Rank (RR). Appropriate for judging systems that return a ranked list of suggestions when there is only one relevant result
$$\text{Reciprocal Rank (RR)} = \frac{1}{K}$$
where $K$ is the position of the expected result in the ranking of suggestions
How is the RR calculated?
| Expected | Suggested | K | Reciprocal Rank (RR) |
|---|---|---|---|
| asthma | 1) asthma 2) lung cancer 3) respiratory disease | 1 | 1/1 |
| lymphoma | 1) myeloma 2) lymphoma 3) acute myeloid leukemia | 2 | 1/2 |
| lung cancer | 1) respiratory disease 2) asthma 3) lung cancer | 3 | 1/3 |
Mean Reciprocal Rank (MRR) = (1/1 + 1/2 + 1/3) / 3 = 0.61
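The worked example above can be reproduced in a few lines. This is a minimal sketch with hypothetical function names. Returning 0 when the expected value is missing from the suggestions is a common convention and an assumption here, since the slides do not cover that case.

```python
def reciprocal_rank(expected, suggestions):
    """Return 1/K, where K is the 1-based position of the expected value."""
    for position, suggestion in enumerate(suggestions, start=1):
        if suggestion == expected:
            return 1.0 / position
    return 0.0  # assumption: a miss contributes 0

def mean_reciprocal_rank(cases):
    """Average the reciprocal rank over (expected, suggestions) pairs."""
    return sum(reciprocal_rank(e, s) for e, s in cases) / len(cases)

# The three rows of the example table.
cases = [
    ("asthma", ["asthma", "lung cancer", "respiratory disease"]),
    ("lymphoma", ["myeloma", "lymphoma", "acute myeloid leukemia"]),
    ("lung cancer", ["respiratory disease", "asthma", "lung cancer"]),
]
mrr = mean_reciprocal_rank(cases)
print(round(mrr, 2))  # 0.61
```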
Results
[Bar chart: Mean Reciprocal Rank (MRR), on a scale from 0 to 1, for the fields disease, tissue, and sex; series: Baseline and Metadata Recommender]
On average:
• Metadata Recommender = 0.77
• Baseline (majority vote) = 0.31
Better performance with respect to the baseline for:
• Fields with many different values
• Templates with many correlated fields
Summary
• We developed a metadata recommendation system as part of an end-to-end system for metadata management called CEDAR
• Generates context-sensitive suggestions in real time
• Incorporates both ontology-based and free-text suggestions
Summary
Our approach makes it easier for scientists to generate high-quality metadata for experimental datasets
• So that the datasets can be found, interpreted, and reused
• Essential to ensure scientific reproducibility
facebook.com/metadatacenter
@metadatacenter
http://cedar.metadatacenter.org
Channel: Metadata Center
github.com/metadatacenter
