Data Science London - Meetup, 28/05/15

Semantic web warmed up: Ontologies for the IoT
Dr. Boris Adryan
@BorisAdryan
@thingslearn
Currently getting divorced from
logic.sysbiol.cam.ac.uk

‣ Everything is connected
‣ Big, noisy, often unstructured data
‣ We are learning how biological entities depend on each other
DNA > RNA > proteins
have been

‣ Everything is connected
‣ Big, noisy, often unstructured data
www.thingslearn.com
Analytics, context integration, machine learning and predictive modelling for the IoT.

0 clean shirts left
+
washing machine estimates 97% of your last pack of powder used
+
it’s Wednesday, 23:55
+
the last four Thursdays had a morning business meeting
+
the car is parked 20 m from a shop
+
last retail activity: 8 sec ago
Send immediate text reminder to pick up washing powder + send tweet from @BorisHouse
“need identified” AND “notification appropriate”
Actionable insight. From everything.
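The scenario above is essentially two context checks joined by AND. A minimal rule-based sketch follows; the `Context` fields, the function names and every threshold (95% powder used, "late" meaning after 20:00, 50 m to the shop, 60 s since the last purchase) are hypothetical assumptions, and 27 May 2015 simply stands in for "a Wednesday".

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Context:
    """Hypothetical snapshot of the signals on the slide (field names are illustrative)."""
    clean_shirts: int
    powder_used_fraction: float          # washing machine estimate, 0.0 to 1.0
    now: datetime
    thursday_meetings_of_last_four: int  # last four Thursdays with a morning meeting
    car_distance_to_shop_m: float
    seconds_since_retail_activity: float


def need_identified(ctx: Context) -> bool:
    # Out of shirts, (nearly) out of powder, late on a Wednesday, and Thursday
    # morning meetings are the regular pattern. 0.95 and 20:00 are assumed cut-offs.
    is_wednesday_night = ctx.now.weekday() == 2 and ctx.now.hour >= 20  # Monday == 0
    return (ctx.clean_shirts == 0
            and ctx.powder_used_fraction >= 0.95
            and is_wednesday_night
            and ctx.thursday_meetings_of_last_four == 4)


def notification_appropriate(ctx: Context) -> bool:
    # Only nudge when acting on it is easy right now: near a shop and just seen
    # buying something. 50 m and 60 s are assumed thresholds.
    return ctx.car_distance_to_shop_m <= 50 and ctx.seconds_since_retail_activity <= 60


def decide(ctx: Context) -> list[str]:
    """Return the actions to trigger; empty if either check fails."""
    if need_identified(ctx) and notification_appropriate(ctx):
        return ["text reminder: pick up washing powder", "tweet from @BorisHouse"]
    return []


# The slide's situation; 27 May 2015 is an assumed Wednesday.
ctx = Context(clean_shirts=0, powder_used_fraction=0.97,
              now=datetime(2015, 5, 27, 23, 55),
              thursday_meetings_of_last_four=4,
              car_distance_to_shop_m=20, seconds_since_retail_activity=8)
print(decide(ctx))
```

Keeping "need" and "appropriateness" as separate predicates means a need can be remembered and the notification deferred until the context makes it actionable.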

NO ANALYTICAL FLEXIBILITY IN M2M/IOT
Matt Hatton, Machina Research, The BLN IoT ’14
Internet replaces wire
It’s all about the context
M2M / consumer IoT
Defined I-P-O (input-process-output) like it’s 1975
[Figure: “Is it hot?” surrounded by repeated “context” labels]

LIFE SCIENCE STRATEGIES DON’T WORK IN THE IOT
- There are no commonly accepted
  - ‘catalogue’ of things,
  - ‘ontology’ of things,
  - ‘data format’ of things,
  - ‘meta data’ for things.
- Most businesses are driven by revenue, not long-term strategic vision
- Service providers have no need to publish
- Data can be highly personal (cheap excuse)
unless they’re

META DATA, SHARING AND DATA REPOS
Founded in Nov. 1999
“But this is a complex and ambitious project, and is one of the biggest challenges that bioinformatics has yet faced. Major difficulties stem from the detail required to describe the conditions of an experiment, and the relative and imprecise nature of measurements of expression levels. The potentially huge volume of data only adds to these difficulties.”
(Nature, Feb. 2000)
Nov. 2000, Oct. 2002
Wide adoption: as a requirement for publication in scientific journals

THE LIFE SCIENCES FIXED THEIR KNOWLEDGE REPRESENTATION PROBLEM

FORMALISING KNOWLEDGE WITH GENE ONTOLOGY

CURRENT GOVERNMENT INVESTMENTS INTO GENE ONTOLOGY
NIH alone spent $44,616,906 on the ontology structure since 2001
(I don’t have data for UK/EU spending)
~100 full-time salaries for experts with domain-specific knowledge
~40,000 terms
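To put the NIH figure in perspective, a quick back-of-the-envelope calculation. It assumes the spending ran from 2001 to 2015 (14 years) and uses the approximate count of 40,000 terms; both are assumptions, so treat the results as orders of magnitude.

```python
# Back-of-the-envelope ratios from the slide's figures. Assumes spending ran
# from 2001 to 2015 (14 years); the term count is the slide's approximation.
total_usd = 44_616_906
years = 2015 - 2001
terms = 40_000  # "~40,000 terms"

per_year = total_usd / years
per_term = total_usd / terms
print(f"~${per_year:,.0f} per year, ~${per_term:,.0f} per term")
```

This prints roughly $3.19 million per year and about $1,115 per ontology term, for NIH funding alone.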

PUBLISH OR PERISH
[Diagram: story; measurements + meta data; open, public repositories; human curators; ontology terms; community; “ok?”; journal; informal exchange, no credit!; funders; assessment; industry!]
The majority of this infrastructure is paid for by governments and charities.

PUBLISH OR YOU’RE NOT DOING IOT
[Diagram: measurements + meta data; storage & provenance; human curators; ontology terms; user; “ok?”; company; cloud; device registration; privileges; data; added value]
Maybe the majority of this infrastructure should be paid for by governments?

ARE PEOPLE NOT ALREADY USING ONTOLOGIES IN THE IOT?

ONTOLOGIES HAVE TO BE PRAGMATIC COMPROMISES
Gene Ontology annotation
15 years of research, 47 publications, 100+ authors, 50+ PhDs
15 direct annotations, ~150 inferred annotations

THE THREE BRANCHES OF GO
Adapted from Anurag et al., Mol. BioSyst., 2012, 8, 346-352
Localization: Where is an entity acting?
Function: What does the entity do?
Process: When is the entity needed?

GO AND CONTEXT
Inferences on “is a”, “part of”, “regulates”, “has part”
(from geneontology.org; Ashburner et al., Nat Genet. 2000, 25(1):25-9)
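How a handful of direct annotations turns into many inferred ones can be sketched as a graph walk: every parent reachable over a propagating relation is implied. This is a minimal sketch, assuming a made-up IoT-flavoured toy ontology (all term names are hypothetical) and propagating only over "is a" and "part of", the relations along which GO annotations are usually propagated. "regulates" and "has part" are deliberately not followed here.

```python
from collections import deque

# Hypothetical toy ontology: term -> list of (relation, parent term).
ONTOLOGY = {
    "living room": [("part_of", "home")],
    "home": [("is_a", "building")],
    "measurement of ambient temperature": [("is_a", "measurement of temperature")],
    "measurement of temperature": [("is_a", "measurement")],
    "regulation of temperature": [("is_a", "regulation"), ("regulates", "temperature")],
}

# Relations along which an annotation is assumed to carry over to the parent.
PROPAGATING = {"is_a", "part_of"}


def inferred_terms(direct, ontology=ONTOLOGY, relations=PROPAGATING):
    """Breadth-first walk up the graph; returns implied terms, excluding the direct ones."""
    seen = set(direct)
    queue = deque(direct)
    while queue:
        term = queue.popleft()
        for relation, parent in ontology.get(term, []):
            if relation in relations and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen - set(direct)


direct = {"living room", "measurement of ambient temperature", "regulation of temperature"}
print(sorted(inferred_terms(direct)))
```

Three direct annotations yield five inferred ones here; "temperature" is not inferred, because regulating something is not the same as being it.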

THE BRANCHES OF GO AND THE IOT
Localization: inside, (my?) home, living room
Function:
  measures temperature
  regulates temperature
  interacts with user directly
  interacts with user via app
Process:
  regulation of temperature
  measurement of ambient temperature
  ‘is proxy / is avatar’ for presence? fire? ice age? winter?
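A thermostat annotated along these three branches could be held in a very simple structure and queried by context. The sketch below assumes a hypothetical `ThingAnnotation` schema and a made-up second device; the speculative ‘is proxy / is avatar’ terms are left out.

```python
from dataclasses import dataclass, field


@dataclass
class ThingAnnotation:
    """One device annotated along the three GO-style branches (hypothetical schema)."""
    name: str
    localization: set[str] = field(default_factory=set)
    function: set[str] = field(default_factory=set)
    process: set[str] = field(default_factory=set)


THERMOSTAT = ThingAnnotation(
    name="thermostat",
    localization={"inside", "home", "living room"},
    function={"measures temperature", "regulates temperature",
              "interacts with user directly", "interacts with user via app"},
    process={"regulation of temperature", "measurement of ambient temperature"},
)

# A second, made-up device so the query below has something to filter out.
GARDEN_SENSOR = ThingAnnotation(
    name="garden sensor",
    localization={"outside", "garden"},
    function={"measures temperature"},
    process={"measurement of ambient temperature"},
)


def matching(things, **required):
    """Names of things whose branches contain every required term.

    Keyword names are branch names (localization, function, process);
    values are sets of terms that must all be present in that branch.
    """
    return [t.name for t in things
            if all(terms <= getattr(t, branch) for branch, terms in required.items())]


things = [THERMOSTAT, GARDEN_SENSOR]
print(matching(things, localization={"living room"}, function={"measures temperature"}))
```

Asking "what measures temperature in the living room?" returns only the thermostat; the same question without the localization constraint would also return the garden sensor.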

A LAST WORD ON PRAGMATISM
“Perfect” ontology
The SSN Ontology allows for inference entirely on the basis of its structure and annotation. In reality, many parameters are difficult to establish and the effort to annotate things outweighs the utility.
“Crude” ontology
A simplified structure allows for quick annotation even by non-specialists. The lack of detail can lead to clashes in the ontology => more smartness has to go into software; more coding effort.
1 billion different things
1 million use cases

The same scenario, annotated with context labels:
“indicator of esteem”
3% left and not pressed
“not home”
“buying”
credit card: “highly personal device”
~ alive and awake

Dr. Boris Adryan
@BorisAdryan
@thingslearn
@SoftwareSaved
Open software
Open source
Open data
Fellow of the
