
How to create a taxonomy for management buy-in

Value adds, reproducibility challenges, use cases, practical recommendations, success factors, key takeaways

Published in: Data & Analytics

  1. You thought taxonomies were hard? How to create a taxonomy for “management buy-in”. Mary Chitty MSLS, Cambridge Healthtech, Needham MA USA. mchitty@healthtech.com https://www.cambridgeinnovationinstitute.com/ http://www.healthtech.com/ London, Oct 15, 2019
  2. Where I’m coming from. Timeline: 1992, 2000, 2006-, 2014, 2016, 2018, 2019. The taxonomy now has 1,600+ terms and is growing. The company is now mid-size. 2018: signed a contract with OntoForce to use Disqover search software (https://www.ontoforce.com/). Acquired artificial intelligence and Internet of Things companies. Hired several data scientists. Began an informal collaboration with OntoForce, a Belgian semantic search engine company.
  3. Taxonomies & ontologies: many useful applications • E-commerce • Data visualization • Search, semantic search and search engine optimization (SEO) • Statistical analysis • Text mining
  4. Ontologies are complex; simpler is sometimes better • “Ontologies offer advantages over other knowledge systems—they enable both computational use and human understanding, they can …include rich vocabularies of labels, synonyms, and textual definitions. If these are desirable selection criteria, then an ontology should be considered.” • “Ontologies do also come with computational overheads, however, and can be complex to understand. Other resources such as a vocabulary do not offer the sorts of classification and rich computational descriptions of an ontology but are often much simpler to understand. Let your requirements guide you; ontologies are not a panacea—sometimes one isn’t needed at all.” (Malone 2016)
  5. Value adds of taxonomies/ontologies: • Interoperability • Reproducibility • Data harmonization, especially of named entities • Identification of data errors or inconsistencies • Search/data navigation improvements • Collaborative filtering/recommendation engines • Validation of correlations to examine possible causalities • Enabling previously unaskable questions/use cases
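Two of these value adds, data harmonization of named entities and identification of data errors or inconsistencies, can be illustrated together. The following is a minimal Python sketch, not a description of any tool used in this project: it assumes a small, hypothetical synonym table mapping lower-cased variants to one preferred label, and treats any value it cannot map as a candidate inconsistency for a curator to review.

```python
# Hypothetical synonym table (illustrative only): lower-cased variant -> preferred label.
SYNONYMS = {
    "tnf": "TNF",
    "tnf-alpha": "TNF",
    "tnf alpha": "TNF",
    "tumor necrosis factor": "TNF",
    "erbb2": "ERBB2",
    "her2": "ERBB2",
    "her-2": "ERBB2",
}


def normalize_key(raw: str) -> str:
    """Lower-case and collapse internal whitespace so trivial variants match."""
    return " ".join(raw.strip().lower().split())


def harmonize(values: list[str]) -> tuple[list[str], list[str]]:
    """Return (harmonized values, unmapped raw values flagged for review)."""
    harmonized: list[str] = []
    unmapped: list[str] = []
    for raw in values:
        label = SYNONYMS.get(normalize_key(raw))
        if label is None:
            unmapped.append(raw)      # possible data error, or a synonym the table lacks
            harmonized.append(raw)    # keep the original rather than guess
        else:
            harmonized.append(label)
    return harmonized, unmapped


records = ["TNF-alpha", "HER2", "Tumor  Necrosis Factor", "TNFa"]
clean, flagged = harmonize(records)
print(clean)    # ['TNF', 'ERBB2', 'TNF', 'TNFa']
print(flagged)  # ['TNFa']
```

The flagged list is where the taxonomy pays off: each entry is either a genuine data error or a variant the vocabulary does not yet cover.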
  6. Ongoing challenges: • Information overload • Complexity • Data integration • Legacy data • Ambiguous and inconsistent data • Missing or unfindable data • Scaling up data processes • Sustainable maintenance – perhaps the biggest challenge of all!
  7. Primary challenges are as much cultural as technological. Life sciences challenges include: • Relatively sparse data compared to other domains such as finance • Highly dimensional data with many variables (complex to chaotic) • Inherently noisy biological data (increasingly studied at the single-cell or gene expression level) • Data on longitudinal health outcomes, limited by HIPAA and other privacy regulations but crucial for validating evidence-based medicine. In our era of big data, the irony is that we don’t have enough readily usable life sciences data.
  8. Reproducibility challenges • More than 70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own (Baker, Nature 2016). • “replication alone will get us only so far (and) might actually make matters worse… an essential protection against flawed ideas … is the strategic use of multiple approaches to address one question. Each approach has its own unrelated assumptions, strengths and weaknesses. Results that agree across different methodologies are less likely to be artefacts.” (Wikipedia, “Replication crisis”)
  9. Cost-benefit and return on investment (ROI). Educating decision makers is an ongoing process, even with CXOs who value taxonomies/ontologies. But stakeholders are often skeptical about investing in taxonomies or ontologies. Cost of not having FAIR research data: a minimum of 10.2 billion euros per year.
  10. Semantic web people have talked about terminology challenges for decades. 2001: “The first layer of the semantic Web consists of ontologies and taxonomies ... A huge amount of this is being done very desperately in the realm of biotech, for the human genome and new drug development.” Tim Berners-Lee, August 30, 2001, keynote at Software Development East in Boston. 2005: “Semantics is fundamentally not an information technologies issue …it originates out of the need for groups of individuals to work together towards common goals … must agree upon a set of meanings around terminologies, concepts, relations and actions … a lot of confusion arises before people realise whether they are talking about the same or different things.” Eric Neumann, “Applying the semantic web to drug discovery and development”, Drug Discovery World, Fall 2005.
  11. Making the business case for taxonomies or ontologies is not a new problem. “A month in the lab can often save an hour in the library” (attributed to chemist Frank Westheimer, 1979 interview). “Institutions either underestimate the resources needed to do this work, or they are daunted by the entire prospect ... Honestly, very little data will ever be reused.” (personal communication, Juliane Schneider, eagle-i, Harvard Catalyst)
  12. Best practical advice I’ve come across: “One of my mantras is always start small. Show some win in some small domain. Don’t under any circumstance start with saying I’ll just build you this enormous ontology for the next two years …then your world will be better … Just say I’m going to build this tiny little ontology and enable this small application over here … Always making sure my small ontology is enabling the small win.” “Question: how do you encourage semantic modeling? Answer: First I compliment them, and say what you’ve done is a great starting point – because they have actually started … I try to find a couple of structured retrieval applications that they really want to do but their current markup is not allowing ….find two compelling examples … make sure that we’ve got a deliverable in a month or a short period of time where they can do the one trial thing that adds value. Kind of get them on the slippery slope so that they’re the owner and they want to do it themselves.” (Deborah McGuinness, keynote speech, 2004)
  13. Taxonomy use cases • Amazon: I spoke at Taxonomy Boot Camp 2017 in Washington DC and learned that Amazon has taxonomists on 24-hour emergency call for when people can’t find their products online. • Netflix: “[Netflix] paid people to watch films and tag them with all kinds of metadata. This process is so sophisticated and precise that taggers receive a 36-page training document that teaches them how to rate movies ... [Netflix] even offered a $1 million prize to the team that could design an algorithm that would improve the company's ability to predict how many stars users would give movies. It took years to improve the algorithm by a mere 10 percent. The prize was awarded in 2009, but Netflix never actually incorporated the new models.” (Madrigal, Atlantic 2014) Best business case for taxonomy ever?
  14. My taxonomy case studies • In-house database: map industry verticals to company-specific verticals. Question: can we use existing taxonomies such as NAICS codes and CrunchBase to automate this? How do we integrate the existing in-house database with newly acquired company databases? Work in progress. • Job title functions and seniorities for people in the database: automated by a data scientist; at least 80% assigned now. Customizable for use by various departments? Phase 1 just completed; still reviewing and fine-tuning. • Job departments for people in the database: similar to job titles. Phase 1 just started. • OntoForce internal data ingestion: a project to enable improved access to the existing in-house database. Uncovered inconsistencies in labels and tables. Identified data quality issues to address. Working on training users and documenting workflows. Need to make the existing taxonomy keyword structure easier to change. Starting to see how trend analysis may be possible. Work in progress.
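The job title case study reports automated assignment of functions and seniorities, with at least 80% of people assigned, but the slides do not say how the automation works. The sketch below is one simple rule-based way such an assignment could be done, not the method used in the project; the keyword patterns and the function/seniority labels are hypothetical. It reports coverage so the unassigned remainder can be routed to human review.

```python
import re

# Hypothetical rules (illustrative only); the first matching pattern wins.
SENIORITY_RULES = [
    ("C-level", r"\b(chief|ceo|cfo|cto|cio|coo)\b"),
    ("VP", r"\b(vp|vice president)\b"),
    ("Director", r"\bdirector\b"),
    ("Manager", r"\bmanager\b"),
]
FUNCTION_RULES = [
    ("Research", r"\b(scientist|research|r&d)\b"),
    ("Data/Informatics", r"\b(data|informatics|software)\b"),
    ("Marketing", r"\bmarketing\b"),
]


def first_match(title, rules):
    """Return the label of the first rule whose pattern occurs in the title, else None."""
    for label, pattern in rules:
        if re.search(pattern, title, flags=re.IGNORECASE):
            return label
    return None  # no rule fired: leave for human review


def classify(titles):
    """Assign (function, seniority) to each title and report the fully assigned share."""
    results = [
        (title, first_match(title, FUNCTION_RULES), first_match(title, SENIORITY_RULES))
        for title in titles
    ]
    assigned = sum(1 for _, function, seniority in results if function and seniority)
    coverage = assigned / len(results) if results else 0.0
    return results, coverage


titles = ["Director of Research", "Senior Scientist", "VP Marketing", "Chief Data Officer"]
results, coverage = classify(titles)
for row in results:
    print(row)
print(f"coverage: {coverage:.0%}")
```

“Senior Scientist” gets a function but no seniority because no rule covers “senior”; reviewing such gaps and adding rules is the fine-tuning loop the case study describes.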
  15. My recommendations for starting a project • Consider a pilot/proof of concept. • Start small, because that will be easier to evaluate and validate. Don’t try to “boil the ocean”. • Choose data of varying complexity. Think about degrees of granularity when drafting categories. • Which categories might you want to aggregate? • Which related concepts might you want to segment further? [Phase 2] • Are there assumptions or implicit biases you might be making without realizing? • Solicit feedback from diverse stakeholders as an ongoing process.
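The questions about which categories to aggregate and which to segment further are easier to discuss with a concrete structure in hand. A minimal Python sketch, assuming a small hypothetical taxonomy stored as child-to-parent links: a count recorded at a specific term rolls up to every broader term, so moving items into narrower child terms in a later phase leaves the totals reported at the parent level unchanged.

```python
from collections import Counter

# Hypothetical taxonomy (illustrative only): term -> broader term, None for a top-level term.
PARENT = {
    "Life Sciences": None,
    "Drug Discovery": "Life Sciences",
    "Genomics": "Life Sciences",
    "Single-Cell Genomics": "Genomics",  # a narrower term added in "Phase 2"
    "Technology": None,
    "Artificial Intelligence": "Technology",
}


def lineage(term):
    """Yield the term itself, then each broader term up to the top level."""
    while term is not None:
        yield term
        term = PARENT[term]  # raises KeyError for a tag missing from the taxonomy


def roll_up(tags):
    """Count every tag at its own level and at each broader level."""
    counts = Counter()
    for tag in tags:
        counts.update(lineage(tag))
    return counts


tags = ["Single-Cell Genomics", "Genomics", "Drug Discovery", "Artificial Intelligence"]
counts = roll_up(tags)
print(counts["Life Sciences"], counts["Genomics"], counts["Technology"])  # 3 2 1
```

The KeyError on an unknown tag is deliberate here: a tag that is not in the taxonomy is exactly the kind of inconsistency a pilot should surface.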
  16. Don’t be surprised while building your pilot project: terminology consensus is challenging at best. Taxonomies, ontologies, naming, tagging, tables, models: there are many ways to express these concepts. “Biologists would rather share a toothbrush than gene names” (Michael Ashburner, GeneSeer: A sage for gene names and genomic resources). “Biologists would rather share a toothbrush than data” (Carol Goble, “purposely misquoting Michael Ashburner”, keynote, EGEE 2006). Trying for consensus often gets very emotional, challenging – and confusing.
  17. FAIR principles (https://www.go-fair.org/fair-principles/) • Findable: The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers … an essential component of the FAIRification process. • Accessible: Once the user finds the required data, she/he needs to know how they can be accessed. • Interoperable: Data usually need to be integrated with other data … need to interoperate with applications or workflows. • Reusable: The ultimate goal of FAIR is to optimise the reuse of data … metadata and data should be well-described so that they can be replicated and/or combined in different settings. Keep in mind for a management presentation: FAIR data can help. The European Commission and the US National Institutes of Health have allocated considerable resources to making data FAIRer.
  18. Management may ask “Why can’t I just Google it?” • Search works best if you know that what you’re looking for exists AND what to call it. • Taxonomies are more useful if you’re not sure what you want exists, OR you don’t know what to call it, AND/OR there are multiple ways to express concept variations. • Harness the power of serendipity with taxonomies. They give people a sense of whether the “scent of information” is promising.
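The contrast between knowing what to call something and not knowing can be shown with a toy example. This minimal Python sketch assumes a hypothetical taxonomy entry and a handful of made-up document titles; it compares a literal keyword search with one that expands the query using the taxonomy’s synonyms and narrower terms. It is not a description of how any particular search engine, Disqover included, works.

```python
import re

# Hypothetical mini-taxonomy (illustrative only): preferred term -> variants and narrower terms.
TAXONOMY = {
    "machine learning": {
        "synonyms": ["ML", "statistical learning"],
        "narrower": ["deep learning", "random forest"],
    },
}

# Hypothetical document titles standing in for a search index.
DOCS = {
    1: "Applying deep learning to histopathology images",
    2: "A random forest model for toxicity prediction",
    3: "ML pipelines for assay data",
    4: "Cell culture protocols",
}


def expand(query):
    """Return the query plus any synonyms and narrower terms the taxonomy knows."""
    entry = TAXONOMY.get(query.lower(), {})
    return [query] + entry.get("synonyms", []) + entry.get("narrower", [])


def search(query, use_taxonomy=False):
    """Whole-word, case-insensitive match of any search term against each document."""
    terms = expand(query) if use_taxonomy else [query]
    return [
        doc_id
        for doc_id, text in DOCS.items()
        if any(re.search(rf"\b{re.escape(t)}\b", text, flags=re.IGNORECASE) for t in terms)
    ]


print(search("machine learning"))                     # []
print(search("machine learning", use_taxonomy=True))  # [1, 2, 3]
```

None of the relevant documents uses the literal phrase, so only the taxonomy-expanded search finds them; the expansion also lets someone who only knows the broad term discover the narrower ones.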
  19. Success factors • Business case for management: look for and document cost reductions, productivity increases, employee time savings, added competitive advantages, sustainability and risk mitigations. • Assemble sponsors, champions and/or influencers. • Write a clear 1-2 page executive summary with KPIs (key performance indicators), milestones, values and costs; use URL links if needed. • Remember: Align early. Align often. Ask for feedback – it’s a way of getting buy-in. Leave room for suggestions. Be aware of other company initiatives.
  20. But what else needs to change? • Data-readiness: “[T]here is a lot of work that needs doing to prepare the data sets for these technologies … a disproportionate amount being invested in the technologies as opposed to investing in ‘data-readiness’… It's just not a slam dunk to mash up a lot of data and think it will work. … The AI solution may help accelerate some tasks, but human expertise may be required for the broad scope of what is needed.” (Nicolas 2019) • Open science: “Is any lifetime long enough these days to learn everything needed to get a drug to market and keep it there?” • There is more need than ever to collaborate and share knowledge, especially pre-competitively.
  21. Key takeaways 1. Aim first for quick wins with low-hanging fruit. 2. Bundle what stakeholders value and want with items you expect they will eventually need. 3. Seek out allies to get shared buy-in for sustainable justification. 4. Pareto principle: roughly 80% of effects come from 20% of effort. 5. Managing expectations and change are crucial skills to cultivate. 6. Collect metrics (quantitative/qualitative) to measure progress, so you know when you’ve made some. 7. Recognize that some challenges haven’t been resolved by anyone yet.
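Takeaways 4 and 6 can be combined into a simple metric: rank taxonomy categories by how many records they cover and find the small head that accounts for most of the volume, a reasonable place to look for quick wins. A minimal Python sketch; the category names and counts are hypothetical, and the 80% threshold is a parameter rather than a law.

```python
def pareto_head(counts, threshold=0.8):
    """Return the most frequent categories that together cover at least `threshold`
    of all records, plus the share of records they actually cover."""
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    head, covered = [], 0
    for category, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= threshold * total:
            break
        head.append(category)
        covered += n
    return head, covered / total


# Hypothetical record counts per taxonomy category (illustrative only).
counts = {
    "Oncology": 60,
    "Immunology": 22,
    "Neuroscience": 8,
    "Cardiology": 5,
    "Dermatology": 3,
    "Ophthalmology": 2,
}
head, share = pareto_head(counts)
print(head, f"{share:.0%}")
```

Here two of six categories cover 82% of records. Real data rarely splits exactly 80/20, which is why the sketch reports the actual share; tracking it over time is one way to know when you have made progress.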
  22. Resources to use today: • Chitty, Mary. Ontologies & Taxonomies glossary & taxonomy, 2019, with 40+ ontology definitions and 15 taxonomy definitions. http://www.genomicglossaries.com/content/ontologies.asp • Heath, Chip and Dan. Switch: How to Change Things When Change Is Hard, 2010. https://heathbrothers.com/books/switch/ • McGuinness, Deborah. Ontology Development 101: A Guide to Creating Your First Ontology. http://www.ksl.stanford.edu/people/dlm/papers/ontology-tutorial-noy-mcguinness-abstract.html • Research Data Alliance, https://www.rd-alliance.org/, a research community organization started in 2013 by the European Commission, the US National Science Foundation, the US National Institute of Standards and Technology and the Australian Department of Innovation. Citation references: • Madrigal 2014: “How Netflix Reverse-Engineered Hollywood”, The Atlantic, 2014. https://www.theatlantic.com/technology/archive/2014/01/how-netflix-reverse-engineered-hollywood/282679/ • Life science specific: BioPortal, https://bioportal.bioontology.org/, a repository of almost 800 biomedical ontologies; mapping from ontologies to i2b2: http://i2b2.bioontology.org/ • Malone 2016: Malone, James et al. “Ten Simple Rules for Selecting a Bio-ontology”. PLoS Comput Biol 12(2), 2016: e1004743. https://doi.org/10.1371/journal.pcbi.1004743 • National Center for Biomedical Ontology (NCBO), BioPortal Ontology to i2b2 File Mappings. http://i2b2.bioontology.org/ • Pistoia Alliance. Ontologies Guidelines for Best Practice to support practical application and mapping, 2016. https://pistoiaalliance.atlassian.net/wiki/spaces/PUB/pages/43089942/Ontologies+Guidelines+for+Best+Practice • Berners-Lee 2001: http://www.sdgnews.com/sd2001es_006/sd2001es_006.htm (no longer on the web) • Neumann 2005: https://www.ddw-online.com/informatics/p148329-applying-the-semantic-web-to-drug-discovery-and-development.html • Ashburner: GeneSeer: A sage for gene names and genomic resources. BMC Genomics 2005; 6: 134 (published Sep 21, 2005). doi: 10.1186/1471-2164-6-134 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1266031/ • Goble 2006: Carol Goble, “purposely misquoting Michael Ashburner”, keynote, EGEE 2006: Interoperability With Moby 1.0 - It's Better Than Sharing Your ... • Harrow et al. 2019: “Semantic alignment and data standardization are vital to solve if we are going to harness modern technologies such as machine learning”. Ian Harrow, Rama Balakrishnan, Ernesto Jimenez-Ruiz, Simon Jupp, Jane Lomax, Jane Reed, Martin Romacker, Christian Senger, Andrea Splendiani, Jabe Wilson, Peter Woollard. Drug Discovery Today, May 2019. https://www.sciencedirect.com/science/article/pii/S1359644618304215 • Nicolas 2019: Francois Nicolas, “AI in Life Sciences: Seeing Past the Hype”, Life Science Leader, March 1, 2019, with comment by Christy Wilson. https://www.lifescienceleader.com/doc/ai-in-life-sciences-seeing-past-the-hype-0001
  23. Acknowledgments: Many people have participated in this ongoing project. I’m grateful for their work, insights and encouragement. Cambridge Innovation Institute (CII) & Cambridge Healthtech Institute (CHI): • Phillips Kuhl, President • Tonya Urquizo, Knowledge Information Services Analyst & IT Liaison • Sanaye Bartlett, Data Analyst & Project Manager • Kaushik Chaudhuri, Director of Product Marketing. CII Disqover team: • Kaitlyn Barago, Associate Conference Producer • Nancy Clarke, Data Scientist • Mike Croft, Software Architect • Ben Lakin, Director New Initiatives • Jaime Parlee, Director Marketing Analytics • Craig Wohlers, Manager Knowledge Foundation. OntoForce: • Hans Constandt, CEO & President • Filip Pattyn, Scientific Lead • Niels Vanneste, Customer Data Scientist • Berenice Wulbrecht, Data Science Director, Systems Biology. Fruitful conversations and emails: • Ingrid Akerblom, IFA Diversified Consulting • John Aubrey, Vertex • Mark Burfoot, Novartis NIBR • Jane Lomax, SciBite • Eric Neumann, Akidata LLC • Terrell Russell, iRODS Consortium • Juliane Schneider, eagle-i, Harvard Catalyst • Ted Slater, PaaS, Elsevier • John Wilbanks, Sage Bionetworks
