The Data Integration Score Card: Context
• I recently gave a talk where I was asked what I look for when building a data integration environment in industry
• It was a good question, so I thought about it and captured my top 10 in the following slide
• This is specifically aimed at pre-clinical discovery
• There are plenty more you could add, of course, but these are my personal top 10! I’m sure a lot of people will disagree
• There are a lot of references on the last slide
• The views in these slides are entirely my own
• Visit us at http://www.scibite.com for more news on all things data & semantics!
Data Integration Score Card
1. Adds Value To Public Data
• It shouldn’t just regurgitate freely available sources of data; integrating these isn’t that hard to do these days. It should add things that you just can’t get or do elsewhere
2. Unambiguously Identifies Concepts. Eliminates Redundancy
• There should only be one “asthma” in the system
• This includes synonyms (breast cancer = breast tumor = breast tumour, etc.)
3. Maps Across Identifiers
• There should be strong connectivity across identifiers for the same concept
4. Supports Ontology-based Query
• “Give me all the inflammatory diseases…”
5. Handles Both Structured & Unstructured Data
• PubMed, clinical trials, OMIM, etc. need to be correctly indexed on concept, not synonym (see #2)
6. Integrates Data!
• Sounds crazy but is the data *really* connected or does it just look like it is?
7. Can Connect To Live Data
• A system cannot always include everything. How can live data be connected up?
8. Enterprise Resource Connectors
• Is there a proven path to incorporating enterprise resources at both a technical level AND a data level (see #9)?
9. Extensible Data Model
• Even RDF systems use a data model. How will it support future use cases, and how can the system cope with concept types it currently does not know about? Note: don’t believe the “it’s RDF so it will magically just cope with it” answer!
10. Supports Manual Curation
• Whatever the data is, on current estimates up to 50% of it is invalid. How does the system handle this, and how can users change what they see?
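Items 2 to 4 can be made concrete with a toy example. The following is only a minimal sketch, not how any particular product does it: every concept ID, source name, external identifier and ontology edge below is invented for illustration, and a real system would load these from curated vocabularies rather than hard-code them.

```python
from collections import deque

# All identifiers, source names and edges here are hypothetical.

# Item 2: one concept ID per concept, with every synonym attached to it.
SYNONYMS = {
    "C1": {"breast cancer", "breast tumor", "breast tumour"},
    "C2": {"asthma"},
    "C3": {"inflammatory disease"},
    "C4": {"rheumatoid arthritis"},
}

# Item 3: cross-references from each concept to identifiers in other sources.
XREFS = {
    "C1": {"SRC_A": "A:100", "SRC_B": "B:9"},
    "C2": {"SRC_A": "A:200", "SRC_B": "B:7"},
}

# Item 4: child -> parents ("is a") edges of a toy ontology.
PARENTS = {
    "C2": {"C3"},
    "C4": {"C3"},
}

# Reverse indexes so lookups go through the concept, never the raw string.
TERM_TO_CONCEPT = {
    term: cid for cid, terms in SYNONYMS.items() for term in terms
}
XREF_TO_CONCEPT = {
    (source, ext_id): cid
    for cid, refs in XREFS.items()
    for source, ext_id in refs.items()
}


def resolve(term):
    """Return the single concept ID for a term, or None if it is unknown."""
    return TERM_TO_CONCEPT.get(term.strip().lower())


def map_identifier(source, external_id, target):
    """Translate an identifier from one source to another via the concept."""
    cid = XREF_TO_CONCEPT.get((source, external_id))
    if cid is None:
        return None
    return XREFS[cid].get(target)


def descendants(concept_id):
    """Return every concept below concept_id in the toy ontology."""
    children = {}
    for child, parents in PARENTS.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)
    seen = set()
    queue = deque([concept_id])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

In this sketch, `resolve("Breast Tumour")` and `resolve("breast cancer")` land on the same concept, and "give me all the inflammatory diseases" becomes `descendants(resolve("inflammatory disease"))`. Indexing documents on the resolved concept ID rather than on the matched string is what item 5 asks for.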
References
TERMite: Turning unstructured text into data https://www.scibite.com/products/termite/
Drug discovery FAQs: workflows for answering multidomain drug discovery questions http://dx.doi.org/10.1016/j.drudis.2014.11.006
Scientific Lenses to Support Multiple Views over Linked Chemistry Data http://rd.springer.com/chapter/10.1007%2F978-3-319-11964-9_7
API-centric Linked Data integration: The Open PHACTS Discovery Platform case study http://dx.doi.org/10.1016/j.websem.2014.03.003
Open PHACTS: semantic interoperability for drug discovery http://dx.doi.org/10.1016/j.drudis.2012.05.016
Systems chemical biology and the Semantic Web: what they mean for the future of drug discovery research http://dx.doi.org/10.1016/j.drudis.2011.12.019
Empowering industrial research with shared biomedical vocabularies http://dx.doi.org/10.1016/j.drudis.2011.09.013
Visualizing the drug target landscape http://dx.doi.org/10.1016/j.drudis.2009.09.011
