
O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Through Ontologies

Traditional machine-to-machine (M2M) uses the internet to replace what was previously achieved through a wire. The challenges for IT are not much different to any other implementation of a prescribed business model.

But how are we going to leverage the connectedness of devices in the consumer Internet of Things (IoT) in a world in which every individual may show a different degree of technology adoption? Not everyone has the connected Crock Pot! The challenges are manifold, and while in 2015 we are still arguing about technical standards that hinder communication of things across platforms, the looming challenges of data integration are even more significant.

Even if all devices, for example in the connected home of the future, come to speak one language, how do we generate actionable insight from the available information according to the user's needs? How do we determine whether an action is appropriate? An empty fridge might be alarming, but should we inform the user of an impending hunger crisis if the door hasn't been opened in a week, the heating system is set to low, and the car is parked at the local airport? Draw your own conclusions!
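The fridge scenario can be sketched as a small context check. This is only an illustration: the `HomeContext` fields, the seven-day threshold and the "two out of three signals" rule are assumptions made for the example, not something the webcast prescribes.

```python
from dataclasses import dataclass


@dataclass
class HomeContext:
    """Hypothetical snapshot of signals from a connected home."""
    fridge_empty: bool
    days_since_fridge_door_opened: float
    heating_setting: str   # assumed values: "low" or "normal"
    car_location: str      # assumed values: "home", "airport", ...


def user_probably_away(ctx: HomeContext) -> bool:
    """Guess absence when at least two independent signals agree (assumed rule)."""
    signals = [
        ctx.days_since_fridge_door_opened >= 7,
        ctx.heating_setting == "low",
        ctx.car_location == "airport",
    ]
    return sum(signals) >= 2


def should_alert_empty_fridge(ctx: HomeContext) -> bool:
    """An empty fridge only matters if somebody is around to be hungry."""
    return ctx.fridge_empty and not user_probably_away(ctx)


on_holiday = HomeContext(True, 8, "low", "airport")
at_home = HomeContext(True, 0.5, "normal", "home")
print(should_alert_empty_fridge(on_holiday))
print(should_alert_empty_fridge(at_home))
```

The point of the sketch is that no single device decides anything: the fridge reports a fact, and the appropriateness of acting on it comes from the other things in the home.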

Ontologies organize things and establish their relationship to each other. They can be used for knowledge inference. For example, a car is a means of transport and ultimately an indicator of absence or presence. Some scientific domains are already making extensive use of ontologies to deal with vast amounts of information. The Gene Ontology (GO) has more than 40,000 interlinked terms that describe cell and molecular biology. For every biological entity on that scale, we can ask: Where is it? What is its function? What process is it involved in? Benefitting from substantial government funding (more than $40M from the NIH since 2001), knowledge inference through GO is widely applied in academic and industry research.
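The car example can be written down as a minimal sketch of inference over "is a" links. The dictionary layout and the function names are assumptions chosen for illustration; real ontologies such as GO use richer formats and more relationship types.

```python
# Hypothetical mini-ontology: each term lists its direct "is a" parents.
IS_A = {
    "car": ["means of transport"],
    "means of transport": ["indicator of absence or presence"],
    "indicator of absence or presence": [],
}


def ancestors(term, links=IS_A):
    """Collect every term reachable from `term` by following "is a" links."""
    seen = set()
    stack = list(links.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(links.get(parent, []))
    return seen


def infer_is_a(term, candidate, links=IS_A):
    """True if `candidate` is a direct or inferred parent of `term`."""
    return candidate in ancestors(term, links)


print(infer_is_a("car", "indicator of absence or presence"))
```

Nobody stated that a car indicates presence; the statement follows from two explicit links, which is the kind of inference the paragraph describes.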

In this webcast I aim to introduce the three main branches that we use in GO (localization, function and process) and demonstrate how they're immediately applicable in the IoT; after all, a cell is just a large, interconnected system. I will further discuss the relationship types that we use in the annotation of biological entities, and propose a few that are more appropriate for the IoT. I will contrast this relatively simple system with other ontologies suggested for the IoT. It is not my aim to sell GO as a one-size-fits-all solution, but to talk about how building a large ontology has taught us a pragmatism that is quite remote from many purely academic ontology proposals.
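As a rough sketch of how the three branches might carry over to things, each device can be annotated with localization, function and process terms and then queried by branch. The class, the example devices and the term strings are hypothetical and only loosely follow the vocabulary used in the slides.

```python
from dataclasses import dataclass, field

BRANCHES = ("localization", "function", "process")


@dataclass
class ThingAnnotation:
    """Hypothetical annotation of one thing along the three GO-style branches."""
    name: str
    localization: set = field(default_factory=set)
    function: set = field(default_factory=set)
    process: set = field(default_factory=set)


THINGS = [
    ThingAnnotation("thermostat", {"home", "living room"},
                    {"measures temperature", "regulates temperature"},
                    {"regulation of temperature"}),
    ThingAnnotation("car", {"airport car park"},
                    {"means of transport"},
                    {"is proxy for presence"}),
    ThingAnnotation("fridge", {"home", "kitchen"},
                    {"measures temperature"},
                    {"is proxy for presence"}),
]


def things_with(branch, term, things=THINGS):
    """Names of all things annotated with `term` in the given branch."""
    if branch not in BRANCHES:
        raise ValueError(f"unknown branch: {branch}")
    return [t.name for t in things if term in getattr(t, branch)]


print(things_with("process", "is proxy for presence"))
print(things_with("function", "measures temperature"))
```

Querying by process rather than by device type is what lets a car and a fridge answer the same question about presence, even though they share no function.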

Published in: Technology

  1. 1. ORGANIZING THE INTERNET OF THINGS ACTIONABLE INSIGHT THROUGH ONTOLOGIES Boris Adryan badryan@gmail.com
  2. 2. • Computational biologist • Research group leader • Advisor at • 2015 Fellow of the Who is @BorisAdryan
  3. 3. • Why a biologist is interested in large, unstructured data • What is wrong with the IoT in its current state • How biologists deal with similar problems • Which academic concepts would be useful in the IoT WHAT TO EXPECT IN THE NEXT HOUR… (including questions!)
  4. 4. • Why a biologist is interested in large, unstructured data • What is wrong with the IoT in its current state • How biologists deal with similar problems • Which academic concepts would be useful in the IoT WHAT TO EXPECT IN THE NEXT 10 MINUTES
  5. 5. DNA = storage of a blueprint RNA = ‘active copy’ of DNA protein = the building blocks of cells and tissues LIFE AS WE KNOW IT transcription translation Gregor Johann Mendel, exhibited in the Library at the NIMR
  6. 6. ‣ Reading DNA information ‣ Determining “the sequence of a gene” was a PhD in the early 1980s ‣ Data processing was mainly transcribing the observation into a research paper BIOLOGY THEN AND NOW SEQUENCE INFORMATION Sanger sequencing ca. 1980 http://www.eplantscience.com
  7. 7. 189,739,230,107 base pairs on 15th April 2015 (from 159,813,411,760 base pairs in April 2014) ‣ We can sequence a human genome in half a day ‣ Sequence databases grow faster than storage capacity ‣ Data processing is the key step in scientific understanding BIOLOGY THEN AND NOW SEQUENCE INFORMATION 1990: automation kilobases a day 2007: next-gen seq megabases a day 2015: 1000s of instruments world-wide
  8. 8. BIOLOGY THEN AND NOW GENE ACTIVITY INFORMATION ‣ When are genes needed? ‣ Classical molecular biology workflow, taking days… ‣ Data is semi-quantitative; testing one gene at a time Northern blot, ca. 1995 ‣ High-throughput gene expression profiling since mid-1990s ‣ Quantitative information for every gene in an organism ‣ Key challenge is the graphical representation and interpretation of the data screenshot from FlyBase, today
  9. 9. 2 6 ATP ‣ Signal transduction and metabolic pathways ‣ Characterisation of proteins and substrates that mediate chemical reactions ‣ Nobel prize material BIOLOGY THEN AND NOW BIOCHEMISTRY
  10. 10. ‣ We know about 250k metabolites ‣ 100k protein structures ‣ on the order of 10k different chemical reactions BIOLOGY THEN AND NOW BIOCHEMISTRY “The Robot Scientist” “small molecules” (Organic & Biomolecular Chemistry Blog) protein (via the Protein Databank, www.pdb.org)
  11. 11. ‣ Everything is connected ‣ Big, noisy, often unstructured data ‣ We are learning how biological entities depend on each other DNA > RNA > proteins
  12. 12. • Why a biologist is interested in large, unstructured data • What is wrong with the IoT in its current state • How biologists deal with similar problems • Which academic concepts would be useful in the IoT WHAT TO EXPECT IN THE NEXT 5 MINUTES
  13. 13. ‣ Everything is connected ‣ Big, noisy, often unstructured data www.thingslearn.com Analytics, context integration, machine learning and predictive modelling for the IoT.
  14. 14. 0 clean shirt left + washing machine estimates 97% of your last pack of powder used + it’s Wednesday, 23:55 + the last four Thursdays had a morning business meeting + the car is parked 20 m from a shop + last retail activity: 8 sec ago Send immediate text reminder to pick up washing powder + send tweet from @BorisHouse “need identified” + “notification appropriate” Actionable insight. From everything.
  15. 15. NO ANALYTICAL FLEXIBILITY IN M2M/IOT Matt Hatton, Machina Research The BLN IoT ‘14 Internet replaces wire It’s all about the context M2M consumer IoT defined I-P-O like it’s 1975 context context context context context context context Is this hot?
  16. 16. LIFE SCIENCE STRATEGIES DON’T WORK IN THE IOT - There is no commonly accepted - ‘catalogue’ of things, - ‘ontology’ of things, - ‘data format’ of things, - ‘meta data’ for things. - Most businesses are driven by revenue, not long-term strategic vision - Service providers have no need to publish - Data can be highly personal (cheap excuse) unless they’re
  17. 17. Trojan Room coffee pot - ca. 1993 Oct. 1995 “The Internet of Things” Kevin Ashton, ca. 1999 20 YEARS OF NON-CONVERGENT EVOLUTION FIRST DATA POTENTIAL RECOGNISED TODAY’S REALITY “ignorant coexistence” ➡ Commonly accepted platforms and formats for data exchange ➡ Meta-data deposition is a must ➡ Infrastructure provides entry point for computational knowledge inference “designed to ask questions”
  18. 18. • Why a biologist is interested in large, unstructured data • What is wrong with the IoT in its current state • How biologists deal with similar problems • Which academic concepts would be useful in the IoT WHAT TO EXPECT IN THE NEXT 10 MINUTES
  19. 19. Oct. 1995 TOWARDS MIAMI STANDARD AND DATA REPOSITORIES cf. IoT Nov. 1993 MInimal Annotation for MIcroarray Info
  20. 20. META DATA, SHARING AND DATA REPOSITORIES founded in Nov. 1999 “But this is a complex and ambitious project, and is one of the biggest challenges that bioinformatics has yet faced. Major difficulties stem from the detail required to describe the conditions of an experiment, and the relative and imprecise nature of measurements of expression levels. The potentially huge volume of data only adds to these difficulties.” Nature Feb. 2000 Nov. 2000 Oct. 2002 Wide adoption as requirement for publication in scientific journals
  21. 21. META DATA, SHARING AND DATA REPOSITORIES cf. IoT 2014 since 2003 http://en.wikipedia.org/wiki/Silo
  22. 22. THE LIFE SCIENCES FIXED THEIR KNOWLEDGE REPRESENTATION PROBLEM
  23. 23. FORMALISING KNOWLEDGE
  24. 24. FORMALISING KNOWLEDGE WITH GENE ONTOLOGY
  25. 25. CURRENT GOVERNMENT INVESTMENTS INTO GENE ONTOLOGY NIH alone spent $44,616,906 on the ontology structure since 2001 (I don’t have data for UK/EU spending) ~100 full-time salaries for experts with domain-specific knowledge ~40,000 terms
  26. 26. story measurements + meta data open, public repositories human curators ontology terms community PUBLISH OR PERISH ok? journal informal exchange - no credit! funders assessment The majority of this infrastructure is paid for by governments and charities industry!
  27. 27. OUR PROBLEM IS KNOWLEDGE DATA != INSIGHT WITHOUT ORGANISING IT
  28. 28. • Why a biologist is interested in large, unstructured data • What is wrong with the IoT in its current state • How biologists deal with similar problems • Which academic concepts would be useful in the IoT WHAT TO EXPECT IN THE NEXT 10 MINUTES
  29. 29. measurements + meta data storage & provenance human curators ontology terms user PUBLISH OR YOU’RE NOT DOING IOT ok? Maybe the majority of this infrastructure should be paid for by governments? company cloud device registration “ “ privileges data added value
  30. 30. WHAT IS AN ONTOLOGY? used to establish conceptual connection between entities knowledge inference finger ontology structure - body part - limb - arm - hand - thumb - finger ontology rules ‣ controlled vocabulary ‣ clearly defined relationships is a is a connects to part of with ontological reasoning, a computer can infer that “finger is a body part”, although we haven’t explicitly defined it that way
  31. 31. ARE PEOPLE NOT ALREADY USING ONTOLOGIES IN THE IOT? Semantic Sensor Network Ontology “thermostat” The idea is not new! Cf. extension of the semantic web with the Semantic Sensor Network. ‣ catalogs ‣ conventions http://www.w3.org/2005/Incubator/ssn/ssnx/ssn
  32. 32. ONTOLOGIES HAVE TO BE PRAGMATIC COMPROMISES Gene Ontology annotation 15 years of research 47 publications 100+ authors 50+ PhDs 15 direct annotations ~150 inferred annotations
  33. 33. THE THREE BRANCHES OF Adapted from Anurag et al., Mol. BioSyst., 2012, 8, 346-352 Localization: Where is an entity acting? Function: What does the entity do? Process: When is the entity needed?
  34. 34. inferences on “is a” “part of” “regulates” “has part” from geneontology.org from Ashburner et al., Nat Genet. 2000, 25(1):25-9. GO AND CONTEXT
  35. 35. THE BRANCHES OF GO AND THE IOT Localization: inside, (my?) home, living room Function: measures temperature regulates temperature interacts with user directly interacts with user via app Process: regulation of temperature measurement of ambient temperature ‘is proxy / is avatar’ for presence fire ice age
  36. 36. A LAST WORD ON PRAGMATISM “perfect” ontology The SSN Ontology allows for inference entirely on the basis of its structure and annotation. In reality, many parameters are difficult to establish and the effort to annotate things outweighs the utility. “crude” ontology A simplified structure allows for quick annotation even by non-specialists. The lack of details can lead to clashes in the ontology => more smartness has to go into software; more coding effort. 1 billion different things 1 million use cases
  37. 37. 0 clean shirt left + washing machine estimates 97% of your last pack of powder used + it’s Wednesday, 23:55 + the last four Thursdays had a morning business meeting + the car is parked 20 m from a shop + last retail activity: 8 sec ago Send immediate text reminder to pick up washing powder + send tweet from @BorisHouse “need identified” + “notification appropriate” Actionable insight. From everything. “not home” “buying” credit card: “highly personal device” ~ alive and awake 3% left and not pressed “indicator of esteem”
  38. 38. Today’s biology is a quantitative, data-rich science. Infrastructure for ‘big data’ was driven by academics. Data is only useful if it can be turned into knowledge. Understanding of data requires ‘data about the data’. Meta-data should be in a universally understood format. Ontologies provide context. Gene Ontology (GO) is a de facto standard. Human curation is key to GO. Public funders and industry contribute significantly to GO. Should governments be involved in IoT? GO is not a ‘one fits all’, but has a few useful concepts. What does the thing do? Thing function. For what can the thing be an avatar? Thing process. Where is the thing? Thing localization. @BorisAdryan
