SMALL SENSORS. BIG DATA.
FROM CLARITY TO INSIGHT IN THE WORLD OF THE SENSOR WEB
Barry Smyth, INSIGHT Centre for Data Analytics
@barrysmyth, barry.smyth@ucd.ie
Tuesday 1 October 13
In a typical lifetime ...
1 billion breaths
2.5 billion heart beats
100 million litres oxygen
100 trillion cells
50,000 litres water
70 tonnes food
70 million calories
3 years toilet
1 billion km
1 year traffic
50,000 kms walking
0.5 million kWh
250 tonnes coal
50 tonnes waste
12 years work
500 days sick
150,000 yawns
28 years asleep
2 years reading
9 years of TV
1 exabyte = 10^18 bytes
= 1,000,000,000,000,000,000 bytes
20,000 x all of the printed material in the
US Library of Congress.
Or all of the words spoken by humans. Ever!
But we now create this much information every 6 hours!
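As a rough check on the scale involved, the slide's figures can be turned into a data rate. This is only a back-of-the-envelope sketch: it takes the numbers on the slide at face value (1 exabyte = 10^18 bytes, created every 6 hours), and the variable names are illustrative.

```python
# Back-of-the-envelope data rate implied by "1 exabyte every 6 hours".
# The inputs are the slide's figures; the names are illustrative.
EXABYTE_BYTES = 10**18
SECONDS_PER_6_HOURS = 6 * 60 * 60

bytes_per_second = EXABYTE_BYTES / SECONDS_PER_6_HOURS
exabytes_per_day = 24 / 6

print(f"{bytes_per_second / 1e12:.1f} terabytes per second")
print(f"{exabytes_per_day:.0f} exabytes per day")
```

On these assumptions the rate works out to roughly 46 terabytes per second, or 4 exabytes per day.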
Connecting atoms & bits.
From algorithms to data.
A PARADIGM
SHIFT
[Figure labels: algorithm, data]
IT’S NOT (JUST) ABOUT
THE DATA
N = all          VS  N = small
Messy & Diverse  VS  Clean & Uniform
Reusable         VS  Disposable
Correlation      VS  Causation
computation
sensors
dev
data
THE WORLD’S FIRST
SUPERCOMPUTER
In 1964, the closet-sized CDC 6600,
brainchild of Seymour Cray, was the biggest,
baddest computer of the age.
5,500 kg, 480 kb RAM, 3M FLOPs, $60m
MOORE’S MAGICAL
LAWS
In 1965, Intel co-founder, Gordon Moore,
noted a doubling of “computing power”
every 1-2 years and predicted that this
would continue for at least 10 years...
... this became known as Moore’s Law.
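To get a feel for what a doubling every 1-2 years means over the 10-year horizon, the compounding can be sketched as below. The function name `growth_factor` and the idea of a fixed doubling period are assumptions made for illustration; Moore's observation was empirical rather than a formula.

```python
def growth_factor(years, doubling_period_years):
    """Multiplicative growth after `years`, given one doubling per period."""
    return 2 ** (years / doubling_period_years)

# Ten years of growth at the two ends of the "every 1-2 years" range.
fast = growth_factor(10, 1)   # doubling every year
slow = growth_factor(10, 2)   # doubling every two years
print(fast, slow)
```

Under these assumptions, ten years of doubling gives somewhere between a 32-fold and a 1,024-fold increase.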
A SELF-FULFILLING
PROPHECY?
CDC 6600 [1964]    0.05 FLOPs/$
IBM PC [1981]      200 FLOPs/$     (x4,000)
iPHONE 5S [2013]   8M FLOPs/$      (x40,000)
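The ratios on this slide can be sanity-checked in a few lines. The sketch below uses only the figures quoted in the talk (3M FLOPs and $60m for the CDC 6600; 200 and 8M FLOPs/$ for the later machines). The "implied doubling period" is an assumption-laden estimate that treats the growth between 1964 and 2013 as perfectly smooth.

```python
import math

# Figures as quoted on the slides.
cdc_flops_per_dollar = 3_000_000 / 60_000_000   # CDC 6600, 1964
pc_flops_per_dollar = 200                        # IBM PC, 1981
iphone_flops_per_dollar = 8_000_000              # iPhone 5S, 2013

pc_vs_cdc = pc_flops_per_dollar / cdc_flops_per_dollar
iphone_vs_pc = iphone_flops_per_dollar / pc_flops_per_dollar

# Assumption: smooth exponential growth over the whole 1964-2013 span.
doublings = math.log2(iphone_flops_per_dollar / cdc_flops_per_dollar)
implied_doubling_years = (2013 - 1964) / doublings

print(round(pc_vs_cdc), round(iphone_vs_pc))
print(f"about {implied_doubling_years:.1f} years per doubling")
```

On these numbers the implied doubling period is about 1.8 years, which sits inside the "every 1-2 years" range.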
TO PUT THIS INTO
PERSPECTIVE ...
The iPhone 5S is about 60,000 times
more powerful than Apollo 11’s
guidance computer.
IF MOORE’S LAW
APPLIED TO CARS?
“If the auto industry had moved at the same speed ...
... your car today would cruise comfortably at a
million miles an hour and probably get a half
a million miles per gallon of gasoline. But it would be
cheaper to throw your Rolls Royce away ...”
RAY
KURZWEIL
“A computer that once fit in a building,
when I was a student, now fits in my
pocket and is one thousand times more
powerful despite being a million times
less expensive.”
MEMORY, DISK SIZE,
BANDWIDTH, PIXELS,...
All subject to Moore’s Law-like
improvements over the past 30 years...
... except for battery
power / energy density.
THE RISE OF THE
SENSOR
WEB
UBIQUITOUS
COMPUTING
Mark Weiser’s 1988 vision for
a Post-PC world saw computing
evolve from a terminal-based
paradigm to one in which
computing and computation
would simply disappear into the
fabric of our world.
Tabs, Pads, Boards
Smart Dust, the Internet of Things, Wearable Computing
THE EMERGING
SENSOR WEB
Tabs
Pads
Boards
Smart Sensors
The Internet of Things
Wearable Computing
A MATERIALS SCIENCE
DETOUR
Chemistry & Physics
Novel Materials & Structures
Common Materials
Next Generation Sensors
NOKIA’S MORPH
CONCEPT DEVICE
CHALLENGES OF
PHYSICAL SENSING
Conventional sensors (thermistors, flow
meters, photoreceptors).
Biofouling & Calibration.
Robustness, Reliability, Energy &
Communications.
Cost, Cost, Cost.
SWEAT
SENSING
Microfluidic, Lab-on-a-Chip, Wearable.
pH sensitive dye &
photo-detector.
Accurate, continuous, real-time.
Athletic performance
Cystic Fibrosis
UNIVERSAL MOBILE
SENSING PLATFORM
CAMERA, MICROPHONE, SPEED, LIGHT, ORIENTATION, HUMIDITY,
TEMPERATURE, LOCATION, TOUCH, MOTION, DIRECTION, FINGERPRINTS
Connectivity: high-speed data
Mobility: location-aware
Power: always on
THE QUANTIFIED SELF
MOVEMENT
A data-rich approach to everyday living.
Gordon Bell (Microsoft) and the MyLifeBits project.
Digitizing everyday life.
SenseCam
7 YEARS, 3 MONTHS, 2 WEEKS
1 PERSON, 12M PHOTOS, 1TB
Activities: classification, summarisation
Lifestyle: behaviours, preferences
Events: segmentation, clustering
THE DISRUPTION OF
HEALTHCARE
Always-on personal sensing, 24/7/365
The Creative Disruption of Healthcare
Activity and exercise, sleep and moods,
food, blood glucose, heart rate, pulse ox,
lung function, ...
THERE’S
AN APP
FOR THAT
EXERCISE
& FITNESS
Runkeeper iPhone/Android
Running, Walking, Biking, ...
Age, gender, weight, ...
Location, pace, duration,
climb, calories, heart rate,...
TRACKING
SLEEP
Basic ‘sleep tracking’
based on motion.
Duration vs Movement
Sleep Quality (≈ time/move)
Sleep Notes / Wakeup Moods
Comparative Analytics
MOOD &
FOCUS
The Melon Headband
Uses EEG to track brain
activity to assess ‘focus’.
Tagging, location, and
activity information helps
users to better assess
what impacts their focus.
FOOD &
NUTRITION
Meal logging and nutritional
analysis.
Manual vs Semi-Automatic.
Calorie goals and
diet plans.
Integrated weight
tracking.
HEART RATE SENSING
Using smartphone camera with your
finger. No external sensor required.
Detecting colour changes due to
capillary blood-flow.
Tagging, comparative analytics etc
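The idea of reading a pulse from colour changes can be illustrated with a small sketch. It is not the method of any particular app: it assumes the camera frames have already been reduced to one average brightness value per frame, generates a synthetic 72 bpm signal for the demonstration, and picks the heart rate whose frequency carries the most signal power. The function name `estimate_bpm` and the 40-200 bpm search range are hypothetical choices.

```python
import cmath
import math

def estimate_bpm(brightness, fps, min_bpm=40, max_bpm=200):
    """Return the candidate heart rate (bpm) with the strongest periodic signal."""
    mean = sum(brightness) / len(brightness)
    centred = [b - mean for b in brightness]      # remove the constant light level
    best_bpm, best_power = min_bpm, -1.0
    for bpm in range(min_bpm, max_bpm + 1):
        freq_hz = bpm / 60.0
        # Project the signal onto a complex sinusoid at this frequency.
        total = sum(x * cmath.exp(-2j * math.pi * freq_hz * i / fps)
                    for i, x in enumerate(centred))
        power = abs(total) ** 2
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm

# Synthetic example: 10 seconds at 30 frames per second, pulsing at 1.2 Hz (72 bpm).
fps = 30
samples = [100 + 2 * math.sin(2 * math.pi * 1.2 * i / fps) for i in range(10 * fps)]
print(estimate_bpm(samples, fps))
```

Real fingertip video is much noisier than this synthetic signal, so a practical version would likely need filtering and motion handling on top of this basic frequency search.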
BLOOD
GLUCOSE
External blood glucose sensor
automatically syncs
readings with app.
Readings tagged with
mealtime, exercise etc.
Analysis and visualization
of trends, logs, stats.
MOBILE
SPIROMETRY
Using a mobile phone
microphone to evaluate
lung function.
FVC, FEV, PEF measures.
Audio features + machine learning.
Mean 5.1% error with respect to clinical spirometry;
suitable for home-based monitoring.
SENSORS
& SPORTS
Profs Brian Caulfield & Niall Moyna (@ CLARITY)
Player Health vs Performance Analysis
Rugby, Athletics, Cycling, Equestrian,
Archery, Boxing, GAA, ...
AUTOMATIC TACKLE
CLASSIFICATION
GPS +
Accelerometer
CONSUMER-DRIVEN
HEALTHCARE?
Towards preventative, sensor-based,
data-driven healthcare.
From sparse checkups to 24/7/365 sensing.
The data is ours to share ...
Apps vs Prescriptions?
ALWAYS ON
MOBILE
SENSING
SCALING
vertical scaling
horizontal scaling
PARTICIPATORY SENSING
ASTHMOPOLIS SMART
INHALER
PARTICIPATORY
SENSING
HACKING YOUR
COMMUTE
GPS & Navigation Assistants
Map Apps Rule the World
TomTom, Garmin, Google, Apple,
Nokia, ...
CROWDSOURCED
MAPPING (WAZE)
Free smartphone app.
Real-time sensing of users’
location, time, speed etc.
x millions of users
= social mapping +
traffic flow, alerts, hazards, ...
CITIZEN SENSING
PUBLIC TRANSPORT
Roadify (iPhone App)
Status updates for public
transport experiences.
Train, bus, subway, ferry,
parking, ...
Opinions Alerts, Recommendations,
Delays, ...
TURNING PEOPLE INTO
SENSORS
Participatory/Citizen Sensing
Big, messy data, real-time insights.
The smartphone as a mobile sensor
platform...
... and the willingness of people to
contribute data to causes that matter.
FROM REAL
TO VIRTUAL
SENSORS
MINING THE
DATA EXHAUST
From Real to Virtual Sensors
Page Views, Read Times, Mouse Movements,
Search Queries, Result Clicks, Social
Connections, Share, Comments, Likes, Posts,
Emails, IMs, ...
THE ORIGINAL BIG DATA
COMPANY
Mining relevance & reputation from links.
Search logs as sensor data.
PAGERANK GOOGLE’S
BIG IDEA
The importance of a page as a ranking signal.
Estimating importance from in-links ...
... and PageRank was a clever way to
count in-links to accurately estimate
importance.
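The idea of estimating importance from in-links can be sketched with a simplified power iteration. This is a textbook-style illustration, not Google's production algorithm: the function name `pagerank`, the four-page link graph, and the iteration count are made up for the example, and the damping factor of 0.85 is simply the commonly quoted value.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: `links` maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}           # start with equal importance
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its importance equally among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Pages with no out-links spread their importance evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: three pages link to "c", and "c" links to "a".
graph = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, "c" ends up ranked highest because it has the most in-links, and "a" comes second because its single in-link is from the most important page.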
GOOGLE’S
BIGGER IDEA
$40 billion
Google’s real Bigger Idea was that its
search engine could sense our
intentions through our queries and
clicks ...
... and that it could match this demand
with real-time supply through its search
adverts.
SEARCH LOGS AS
SENSOR DATA
“... Web search ... can be likened
to a large-scale distributed network
of sensors for identifying potential
side effects of drugs. There is a
potential public health benefit
in listening to such signals,
and integrating them with
other sources of information.”
“Web-Scale Pharmacovigilance: Listening to Signals
from the Crowd” J Am Med Inform Assoc. (2013)
SENSING DRUG SIDE-
EFFECTS
82M queries
6M users
SENSING
FLU TRENDS
Identified trigger terms correlated with known
past outbreaks. Tracked real-time occurrence
of these terms, location by location, to deliver
accurate* regional outbreak maps that correlated
well with verified CDC data.
TURNING BROWSERS
INTO BUYERS
Understanding user preferences.
Making personalized suggestions.
[Figure: ratings matrix, users by items]
Correlations between the ratings
patterns of users denote user
similarity ...
People like you have also liked ...
Conversely, correlations between
the ratings patterns of items denote
item similarity ...
If you liked X then you might like Y...
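Both kinds of similarity can be illustrated with one correlation function applied to the ratings matrix from two directions. The sketch below is a minimal example under stated assumptions: the ratings, user names, and item labels are invented, and Pearson correlation is just one common way to compare ratings patterns.

```python
import math

def pearson(a, b):
    """Pearson correlation over the keys rated in both dicts."""
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        return 0.0
    mean_a = sum(a[k] for k in shared) / len(shared)
    mean_b = sum(b[k] for k in shared) / len(shared)
    cov = sum((a[k] - mean_a) * (b[k] - mean_b) for k in shared)
    var_a = sum((a[k] - mean_a) ** 2 for k in shared)
    var_b = sum((b[k] - mean_b) ** 2 for k in shared)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def transpose(ratings):
    """Turn user -> item -> score into item -> user -> score."""
    by_item = {}
    for user, items in ratings.items():
        for item, score in items.items():
            by_item.setdefault(item, {})[user] = score
    return by_item

# Invented ratings on a 1-5 scale.
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "C": 2},
    "cat": {"A": 1, "B": 2, "C": 5},
}
by_item = transpose(ratings)

similar_users = pearson(ratings["ann"], ratings["bob"])    # user similarity
opposite_users = pearson(ratings["ann"], ratings["cat"])
similar_items = pearson(by_item["A"], by_item["B"])        # item similarity
print(round(similar_users, 2), round(opposite_users, 2), round(similar_items, 2))
```

Here ann and bob rate alike, so bob's favourites are candidates for "people like you have also liked", while items A and B attract the same fans, which supports "if you liked A then you might like B".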
MINING USER-
GENERATED REVIEWS
USER-GENERATED
REVIEWS
Chicago Hotels
+’ves: staff, location, bed, service, breakfast
-’ves: noise, elevators, carpet, health club, public transport
OPINION
AMPLIFICATION
Twitter, Facebook as a source of
real-time opinions.
Raw Text -> Sentiment -> Opinion
These days Twitter data has been
used to predict election outcomes,
box office success, and musical talent ...
TOWARDS COLLECTIVE
INTELLIGENCE
Participatory sensing as collective
intelligence
Human Intelligence + Brute-Force
Computation
DEALING WITH EMAIL
SPAM
Back in 2000 Yahoo had a
problem ...
Bots registering free email
accounts for the purpose of
bulk spam.
How to tell real people from the
spambots?
Luis von Ahn
Manuel Blum
Yahoo! Mail CAPTCHA
250m CAPTCHAS PER DAY
150k PERSON-HOURS PER DAY
7m PERSON-HOURS
45 CAPTCHA DAYS!
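The figures on this slide hang together arithmetically, as the short sketch below shows. It assumes that the 7m person-hours figure is being compared against the daily CAPTCHA effort to arrive at the "45 CAPTCHA days"; that reading is an interpretation of the slide, and the variable names are illustrative.

```python
# Figures as shown on the slide.
captchas_per_day = 250_000_000
person_hours_per_day = 150_000
project_person_hours = 7_000_000   # assumed to be the effort being compared

# Average time spent on each CAPTCHA, in seconds.
seconds_per_captcha = person_hours_per_day * 3600 / captchas_per_day

# Days of worldwide CAPTCHA solving needed to match the 7m person-hours.
captcha_days = project_person_hours / person_hours_per_day

print(f"{seconds_per_captcha:.2f} seconds per CAPTCHA")
print(f"{captcha_days:.1f} days of CAPTCHA time")
```

On these numbers each CAPTCHA takes a little over 2 seconds, and 7m person-hours corresponds to about 47 days of CAPTCHA solving, which the slide rounds to 45.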
What if we could
do something more with all of this
‘CAPTCHA time’?
99.1% word-level accuracy
1.2bn CAPTCHAS in year 1
440m words
17,600 books
GAMES WITH A
PURPOSE
In 2003 there were 9bn hours of solitaire
played on PCs...
... and these days there are around 70m
hours of FarmVille played every week!
It only took about 20m hours of human
effort to build the Panama Canal!
FOLD.IT - MOLECULAR
GAME PLAY
HOW WELL DOES IT
ALL WORK?
In 2011, players of Foldit helped to decipher
the crystal structure of the Mason-Pfizer
monkey virus (M-PMV) retroviral protease,
an AIDS-causing monkey virus.
Players “produced” an accurate 3D model of
the enzyme in just 10 days! This structure
had eluded scientists for some 15 years.
Khatib, F. et al. (2011). "Crystal structure of a monomeric retroviral protease solved by
protein folding game players". Nature Structural & Molecular Biology 18 (10): 1175
BIG DATA OR
BIG BROTHER?
THE END OF THE AGE
OF PRIVACY?
“Technology is neither good nor bad, nor is
it neutral”
Public by Default.
The Price of Free?
Ownership of Personal Data?
THE END OF
ANONYMITY
The Case of AOL Searcher No. 4417749.
20M anonymized queries, 600k users as
research data (AOL, 2006).
User No. 4417749 = 62-year-old Thelma Arnold of
Lilburn, Ga.
THE PANOPTICON
STATE?
Zamyatin’s dystopian glass-walled future
of government surveillance.
NSA PRISM programme.
CONTROLLING
PERSONAL DATA
A shift in the data ownership model: a new asset
class for personal data?
Owned by the individual, shared with services.
Cloud storage (e.g. Dropbox) as a shareable
repository of personal data...
THE BIG DATA WORLD
OF THE SENSOR WEB
N = ALL
MESSY
CORRELATION
THE OPTION-VALUE OF BIG DATA
DATA-DRIVEN EVERYTHING
POWER TO THE PEOPLE
THE OPTION-VALUE OF
BIG DATA
Reuse & Recycle
From Primary to Secondary Uses of Data
The Unintended Consequences of Data
DATA-DRIVEN
EVERYTHING
Social Science, Linguistics, Anthropology, Cultural
Studies, Journalism, Political Science,
Humanities ...
All impacted by Big Data Thinking...
GOOGLE’S N-GRAM
VIEWER
Acerbi A, Lampos V, Garnett P, Bentley RA (2013) The Expression
of Emotions in 20th Century Books. PLoS ONE 8(3)
DATA-DRIVEN
EVERYTHING
Michel J-P, Shen YK, Aiden AP, Veres A, Gray MK, et al. (2011)
Quantitative analysis of culture using millions of digitized books. Science
331: 176–182
Lieberman E, Michel J-P, Jackson J, Tang T, Nowak MA (2007)
Quantifying the evolutionary dynamics of language. Nature 449: 713–716
Richards, Daniel Rex. "The content of historical books as an indicator of
past interest in environmental issues." Biodiversity and Conservation
(2013): 1-9.
Lampos, Vasileios, et al. "Analysing Mood Patterns in the United
Kingdom through Twitter Content." arXiv preprint arXiv:1304.5507 (2013).
POWER TO THE PEOPLE
Personal Data & Personal Analytics
People as Sensors in Participatory
Sensing
Human Computation & Collective
Intelligence
Creating a Data-Driven Society

Small sensors-big-data-barry-smyth-ria-2013

  • 1.
    SMALL SENSORS. BIGDATA.FROM CLARITY TO INSIGHT IN THE WORLD OF THE SENSOR WEB Barry Smyth, INSIGHT Centre for Data Analytics @barrysmyth, barry.smyth@ucd.ie Tuesday 1 October 13
  • 2.
    In a typicallifetime ... Tuesday 1 October 13
  • 3.
    1 billion breaths2.5 billion heart beats 100 million litres oxygen 100 trillion cells 50,000 litres water 70 tonnes food 70 million calories 3 years toilet 1 billion km 1 year traffic 50,000 kms walking 0.5 million kWh 250 tonnes coal 50 tonnes waste 12 years work 500 days sick 150,000 yawns 28 years asleep 2 years reading 9 years of TVTuesday 1 October 13
  • 4.
    1 billion breaths2.5 billion heart beats 100 million litres oxygen 100 trillion cells 50,000 litres water 70 tonnes food 70 million calories 3 years toilet 1 billion km 1 year traffic 50,000 kms walking 0.5 million kWh 250 tonnes coal 50 tonnes waste 12 years work 500 days sick 150,000 yawns 28 years asleep 2 years reading 9 years of TVTuesday 1 October 13
  • 5.
    1 billion breaths2.5 billion heart beats 100 million litres oxygen 100 trillion cells 50,000 litres water 70 tonnes food 70 million calories 3 years toilet 1 billion km 1 year traffic 50,000 kms walking 0.5 million kWh 250 tonnes coal 50 tonnes waste 12 years work 500 days sick 150,000 yawns 28 years asleep 2 years reading 9 years of TVTuesday 1 October 13
  • 6.
    1 billion breaths2.5 billion heart beats 100 million litres oxygen 100 trillion cells 50,000 litres water 70 tonnes food 70 million calories 3 years toilet 1 billion km 1 year traffic 50,000 kms walking 0.5 million kWh 250 tonnes coal 50 tonnes waste 12 years work 500 days sick 150,000 yawns 28 years asleep 2 years reading 9 years of TVTuesday 1 October 13
  • 7.
    1 billion breaths2.5 billion heart beats 100 million litres oxygen 100 trillion cells 50,000 litres water 70 tonnes food 70 million calories 3 years toilet 1 billion km 1 year traffic 50,000 kms walking 0.5 million kWh 250 tonnes coal 50 tonnes waste 12 years work 500 days sick 150,000 yawns 28 years asleep 2 years reading 9 years of TVTuesday 1 October 13
  • 8.
    1 billion breaths2.5 billion heart beats 100 million litres oxygen 100 trillion cells 50,000 litres water 70 tonnes food 70 million calories 3 years toilet 1 billion km 1 year traffic 50,000 kms walking 0.5 million kWh 250 tonnes coal 50 tonnes waste 12 years work 500 days sick 150,000 yawns 28 years asleep 2 years reading 9 years of TVTuesday 1 October 13
  • 9.
    1 billion breaths2.5 billion heart beats 100 million litres oxygen 100 trillion cells 50,000 litres water 70 tonnes food 70 million calories 3 years toilet 1 billion km 1 year traffic 50,000 kms walking 0.5 million kWh 250 tonnes coal 50 tonnes waste 12 years work 500 days sick 150,000 yawns 28 years asleep 2 years reading 9 years of TVTuesday 1 October 13
  • 10.
    1 billion breaths2.5 billion heart beats 100 million litres oxygen 100 trillion cells 50,000 litres water 70 tonnes food 70 million calories 3 years toilet 1 billion km 1 year traffic 50,000 kms walking 0.5 million kWh 250 tonnes coal 50 tonnes waste 12 years work 500 days sick 150,000 yawns 28 years asleep 2 years reading 9 years of TVTuesday 1 October 13
  • 11.
    1 billion breaths2.5 billion heart beats 100 million litres oxygen 100 trillion cells 50,000 litres water 70 tonnes food 70 million calories 3 years toilet 1 billion km 1 year traffic 50,000 kms walking 0.5 million kWh 250 tonnes coal 50 tonnes waste 12 years work 500 days sick 150,000 yawns 28 years asleep 2 years reading 9 years of TVTuesday 1 October 13
  • 12.
    1 billion breaths2.5 billion heart beats 100 million litres oxygen 100 trillion cells 50,000 litres water 70 tonnes food 70 million calories 3 years toilet 1 billion km 1 year traffic 50,000 kms walking 0.5 million kWh 250 tonnes coal 50 tonnes waste 12 years work 500 days sick 150,000 yawns 28 years asleep 2 years reading 9 years of TVTuesday 1 October 13
  • 13.
    1 billion breaths2.5 billion heart beats 100 million litres oxygen 100 trillion cells 50,000 litres water 70 tonnes food 70 million calories 3 years toilet 1 billion km 1 year traffic 50,000 kms walking 0.5 million kWh 250 tonnes coal 50 tonnes waste 12 years work 500 days sick 150,000 yawns 28 years asleep 2 years reading 9 years of TVTuesday 1 October 13
  • 14.
    1 billion breaths2.5 billion heart beats 100 million litres oxygen 100 trillion cells 50,000 litres water 70 tonnes food 70 million calories 3 years toilet 1 billion km 1 year traffic 50,000 kms walking 0.5 million kWh 250 tonnes coal 50 tonnes waste 12 years work 500 days sick 150,000 yawns 28 years asleep 2 years reading 9 years of TVTuesday 1 October 13
  • 15.
    1 billion breaths2.5 billion heart beats 100 million litres oxygen 100 trillion cells 50,000 litres water 70 tonnes food 70 million calories 3 years toilet 1 billion km 1 year traffic 50,000 kms walking 0.5 million kWh 250 tonnes coal 50 tonnes waste 12 years work 500 days sick 150,000 yawns 28 years asleep 2 years reading 9 years of TVTuesday 1 October 13
  • 16.
    1 billion breaths2.5 billion heart beats 100 million litres oxygen 100 trillion cells 50,000 litres water 70 tonnes food 70 million calories 3 years toilet 1 billion km 1 year traffic 50,000 kms walking 0.5 million kWh 250 tonnes coal 50 tonnes waste 12 years work 500 days sick 150,000 yawns 28 years asleep 2 years reading 9 years of TVTuesday 1 October 13
  • 17.
    1 billion breaths2.5 billion heart beats 100 million litres oxygen 100 trillion cells 50,000 litres water 70 tonnes food 70 million calories 3 years toilet 1 billion km 1 year traffic 50,000 kms walking 0.5 million kWh 250 tonnes coal 50 tonnes waste 12 years work 500 days sick 150,000 yawns 28 years asleep 2 years reading 9 years of TVTuesday 1 October 13
  • 18.
    1 billion breaths2.5 billion heart beats 100 million litres oxygen 100 trillion cells 50,000 litres water 70 tonnes food 70 million calories 3 years toilet 1 billion km 1 year traffic 50,000 kms walking 0.5 million kWh 250 tonnes coal 50 tonnes waste 12 years work 500 days sick 150,000 yawns 28 years asleep 2 years reading 9 years of TVTuesday 1 October 13
  • 19.
    1 billion breaths2.5 billion heart beats 100 million litres oxygen 100 trillion cells 50,000 litres water 70 tonnes food 70 million calories 3 years toilet 1 billion km 1 year traffic 50,000 kms walking 0.5 million kWh 250 tonnes coal 50 tonnes waste 12 years work 500 days sick 150,000 yawns 28 years asleep 2 years reading 9 years of TVTuesday 1 October 13
  • 20.
    1 billion breaths2.5 billion heart beats 100 million litres oxygen 100 trillion cells 50,000 litres water 70 tonnes food 70 million calories 3 years toilet 1 billion km 1 year traffic 50,000 kms walking 0.5 million kWh 250 tonnes coal 50 tonnes waste 12 years work 500 days sick 150,000 yawns 28 years asleep 2 years reading 9 years of TVTuesday 1 October 13
  • 21.
    1 billion breaths2.5 billion heart beats 100 million litres oxygen 100 trillion cells 50,000 litres water 70 tonnes food 70 million calories 3 years toilet 1 billion km 1 year traffic 50,000 kms walking 0.5 million kWh 250 tonnes coal 50 tonnes waste 12 years work 500 days sick 150,000 yawns 28 years asleep 2 years reading 9 years of TVTuesday 1 October 13
  • 22.
    1 billion breaths2.5 billion heart beats 100 million litres oxygen 100 trillion cells 50,000 litres water 70 tonnes food 70 million calories 3 years toilet 1 billion km 1 year traffic 50,000 kms walking 0.5 million kWh 250 tonnes coal 50 tonnes waste 12 years work 500 days sick 150,000 yawns 28 years asleep 2 years reading 9 years of TVTuesday 1 October 13
  • 23.
    1 billion breaths2.5 billion heart beats 100 million litres oxygen 100 trillion cells 50,000 litres water 70 tonnes food 70 million calories 3 years toilet 1 billion km 1 year traffic 50,000 kms walking 0.5 million kWh 250 tonnes coal 50 tonnes waste 12 years work 500 days sick 150,000 yawns 28 years asleep 2 years reading 9 years of TVTuesday 1 October 13
  • 24.
    1 billion breaths2.5 billion heart beats 100 million litres oxygen 100 trillion cells 50,000 litres water 70 tonnes food 70 million calories 3 years toilet 1 billion km 1 year traffic 50,000 kms walking 0.5 million kWh 250 tonnes coal 50 tonnes waste 12 years work 500 days sick 150,000 yawns 28 years asleep 2 years reading 9 years of TVTuesday 1 October 13
  • 25.
  • 26.
  • 27.
    1 = 10bytes 18 exabyte 20,000x all of the printed material in the US Library of Congress. Or all of the words spoken by humans. Ever! Tuesday 1 October 13
  • 28.
    1 = 10bytes 18 exabyte 6! hours But, we now create this much information every Tuesday 1 October 13
  • 29.
    Connecting atoms &bits. From algorithms to data. Tuesday 1 October 13
  • 30.
  • 31.
    VS IT’S NOT (JUST)ABOUT THE DATA N = all Messy & Diverse Reusable Correlation N = small Clean & Uniform Disposable Causation Tuesday 1 October 13
  • 32.
  • 33.
    THE WORLD’S FIRST SUPERCOMPUTER Brainchildof Seymour Cray, in 1964 the closet-sized CDC 6600 was the biggest, baddest computer of the age. 5,500 kgs, 480 kb RAM, 3M FLOPs, $60m Tuesday 1 October 13
  • 34.
    MOORE’S MAGICAL LAWS In 1965,Intel co-founder, Gordon Moore, noted a doubling of “computing power” every 1-2 years and predicted that this would continue for at least 10 years... ... this became known as Moore’s Law. Tuesday 1 October 13
  • 35.
  • 36.
  • 37.
    TO PUT THISINTO PERSPECTIVE ... The iPhone 5S is about 60,000 times more powerful that the Apollo 11’s guidance computer. Tuesday 1 October 13
  • 38.
    IF MOORE’S LAW APPLIEDTO CARS? “If the auto industry had moved at the same speed ... ...your car today would cruise comfortably at a million miles an hour and probably get a half a million miles per gallon of gasoline. But it would be cheaper to throw your Rolls Royce away Tuesday 1 October 13
  • 39.
    RAY KURZWEIL “A computer thatonce fit in a building, when I was a student, now fits in my pocket and is one thousand times more powerful despite being a million times less expensive.” Tuesday 1 October 13
  • 40.
    MEMORY, DISK SIZE, BANDWIDTH, PIXELS, ... All subject to Moore’s Law-like improvements over the past 30 years... ... except for battery power / energy density.
  • 41.
  • 42.
    THE RISE OF THE SENSOR WEB
  • 43.
    UBIQUITOUS COMPUTING Mark Weiser’s 1988 vision for a Post-PC world saw computing evolve from a terminal-based paradigm to one in which computing and computation would simply disappear into the fabric of our world. Tabs, Pads, Boards, Smart Dust, the Internet of Things, Wearable Computing
  • 44.
    THE EMERGING SENSOR WEB Tabs, Pads, Boards, Smart Sensors, The Internet of Things, Wearable Computing
  • 45.
    A MATERIALS SCIENCE DETOUR Chemistry & Physics, Novel Materials & Structures, Common Materials, Next Generation Sensors
  • 46.
  • 47.
    CHALLENGES OF PHYSICAL SENSING Conventional sensors (thermistors, flow meters, photoreceptors). Biofouling & Calibration. Robustness, Reliability, Energy & Communications. Cost, Cost, Cost.
  • 48.
    SWEAT SENSING Microfluidic, Lab-on-a-Chip, Wearable. pH-sensitive dye & photo-detector. Accurate, continuous, real-time. Athletic performance, Cystic Fibrosis
  • 49.
  • 50.
    UNIVERSAL MOBILE SENSING PLATFORM CAMERA, MICROPHONE, SPEED, LIGHT, ORIENTATION, HUMIDITY, TEMPERATURE, LOCATION, TOUCH, MOTION, DIRECTION, FINGERPRINTS
  • 51.
  • 52.
    THE QUANTIFIED SELF MOVEMENT A data-rich approach to everyday living. Gordon Bell (Microsoft) and the MyLifeBits project. Digitizing everyday life. SenseCam
  • 53.
    7 YEARS 3 MONTHS 2 WEEKS, 1 PERSON, 12M PHOTOS, 1TB
  • 54.
  • 55.
  • 56.
    THE DISRUPTION OF HEALTHCARE Always-on personal sensing, 24/7/365. The Creative Disruption of Healthcare. Activity and exercise, sleep and moods, food, blood glucose, heart rate, pulse ox, lung function, ...
  • 57.
  • 58.
    EXERCISE & FITNESS Runkeeper iPhone/Android. Running, Walking, Biking, ... Age, gender, weight, ... Location, pace, duration, climb, calories, heart rate, ...
  • 59.
    TRACKING SLEEP Basic ‘sleep tracking’ based on motion. Duration vs Movement. Sleep Quality (≈ time/move). Sleep Notes / Wakeup Moods. Comparative Analytics
  • 60.
    MOOD & FOCUS The Melon Headband uses EEG to track brain activity to assess ‘focus’. Tagging, location, and activity information helps users to better assess what impacts their focus.
  • 61.
    FOOD & NUTRITION Meal logging and nutritional analysis. Manual vs Semi-Automatic. Calorie goals and diet plans. Integrated weight tracking.
  • 62.
    HEART RATE SENSING Using the smartphone camera with your finger. No external sensor required. Detecting colour changes due to capillary blood-flow. Tagging, comparative analytics etc.
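The idea of reading a pulse from colour changes can be sketched in a few lines. This is only an illustration, assuming the camera frames have already been reduced to one average brightness value per frame and that the signal is clean; real apps would need filtering and motion handling, and the names `estimate_bpm`, `FPS` and `frames` are hypothetical.

```python
import math


def estimate_bpm(brightness, fps):
    """Estimate beats per minute from per-frame brightness samples.

    Counts upward crossings of the mean level, one per heartbeat, and
    assumes a clean, roughly periodic signal.
    """
    if len(brightness) < 2 or fps <= 0:
        return None
    mean = sum(brightness) / len(brightness)
    centred = [b - mean for b in brightness]
    # Indices where the signal rises through its mean level.
    crossings = [i for i in range(1, len(centred))
                 if centred[i - 1] < 0 <= centred[i]]
    if len(crossings) < 2:
        return None
    beats = len(crossings) - 1
    seconds = (crossings[-1] - crossings[0]) / fps
    return 60.0 * beats / seconds


# Synthetic 10-second clip at 30 frames/s with a 1.2 Hz (72 bpm) pulse.
FPS = 30
frames = [128 + 5 * math.sin(2 * math.pi * 1.2 * i / FPS + 0.3)
          for i in range(10 * FPS)]
print(round(estimate_bpm(frames, FPS)))
```

Counting mean crossings is the simplest possible detector; a frequency-domain estimate would be more robust to noise but needs more machinery than a sketch warrants.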
  • 63.
    BLOOD GLUCOSE External blood glucose sensor automatically syncs readings with app. Readings tagged with mealtime, exercise etc. Analysis and visualization of trends, logs, stats.
  • 64.
    MOBILE SPIROMETRY Using a mobile phone microphone to evaluate lung function. FVC, FEV, PEF measures. Audio features and machine learning. Mean 5.1% error wrt clinical spirometry; suitable for home-based monitoring.
  • 65.
  • 66.
    SENSORS & SPORTS Profs Brian Caulfield & Niall Moyna (@ CLARITY). Player Health vs Performance Analysis. Rugby, Athletics, Cycling, Equestrian, Archery, Boxing, GAA, ...
  • 67.
  • 68.
    CONSUMER-DRIVEN HEALTHCARE? Towards preventative, sensor-based, data-driven healthcare. From sparse checkups to 24/7/365 sensing. The data is ours to share ... Apps vs Prescriptions?
  • 69.
  • 70.
  • 71.
  • 72.
  • 73.
  • 74.
  • 75.
  • 76.
  • 77.
  • 78.
    HACKING YOUR COMMUTE GPS & Navigation Assistants. Map Apps Rule the World. TomTom, Garmin, Google, Apple, Nokia, ...
  • 79.
    CROWDSOURCED MAPPING (WAZE) Free smartphone app. Real-time sensing of users’ location, time, speed etc. x millions of users = social mapping + traffic flow, alerts, hazards, ...
  • 80.
  • 81.
  • 82.
    CITIZEN SENSING PUBLIC TRANSPORT Roadify (iPhone App). Status updates for public transport experiences. Train, bus, subway, ferry, parking, ... Opinions, Alerts, Recommendations, Delays, ...
  • 83.
    TURNING PEOPLE INTO SENSORS Participatory/Citizen Sensing. From big, messy data to real-time insights. The smartphone as a mobile sensor platform... ... and the willingness of people to contribute data to causes that matter
  • 84.
  • 85.
    MINING THE DATA EXHAUST From Real to Virtual Sensors. Page Views, Read Times, Mouse Movements, Search Queries, Result Clicks, Social Connections, Shares, Comments, Likes, Posts, Emails, IMs, ...
  • 86.
    THE ORIGINAL BIG DATA COMPANY Mining relevance & reputation from links. Search logs as sensor data.
  • 87.
    PAGERANK GOOGLE’S BIG IDEA The importance of a page as a ranking signal. Estimating importance from in-links ... ... and PageRank was a clever way to count in-links to accurately estimate importance.
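The way in-links translate into importance can be sketched with the textbook power-iteration form of PageRank. This is a simplified illustration rather than Google's production algorithm; the damping factor of 0.85, the iteration count, and the toy `links` graph are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of importance.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its importance equally among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-links spreads its importance evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Toy web of four pages: A, B and D all link to C; C links back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page C comes out on top because it has the most in-links, and A ranks second because its single in-link comes from an important page, which is the point of weighting links rather than merely counting them.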
  • 88.
    GOOGLE’S BIGGER IDEA $40 billion. Google’s real Bigger Idea was that its search engine could sense our intentions through our queries and clicks ... ... and that it could match this demand with real-time supply through its search adverts.
  • 89.
    SEARCH LOGS AS SENSOR DATA “... Web search ... can be likened to a large-scale distributed network of sensors for identifying potential side effects of drugs. There is a potential public health benefit in listening to such signals, and integrating them with other sources of information.” “Web-Scale Pharmacovigilance: Listening to Signals from the Crowd” J Am Med Inform Assoc. (2013)
  • 90.
  • 91.
    SENSING FLU TRENDS Identified trigger terms correlated with known past outbreaks. Tracked real-time occurrence of these terms, location by location, to deliver accurate* regional outbreak maps that correlated well with verified CDC data.
  • 92.
    TURNING BROWSERS INTO BUYERS Understanding user preferences. Making personalized suggestions.
  • 93.
  • 94.
    (users x items ratings matrix) Correlations between the ratings patterns of users denote user similarity ... People like you have also liked ...
  • 95.
    (users x items ratings matrix) Conversely, correlations between the ratings patterns of items denote item similarity ... If you liked X then you might like Y...
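The user-similarity idea from these two slides can be sketched as a tiny user-based collaborative filter. This is a minimal illustration, assuming Pearson correlation over co-rated items as the similarity measure and a single nearest neighbour; the `ratings` data, user names, and the `min_rating` threshold are invented for the example.

```python
import math


def user_similarity(a, b):
    """Pearson correlation over items both users rated; 0.0 if undefined."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    xs = [a[item] for item in common]
    ys = [b[item] for item in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def recommend(target, ratings, min_rating=4):
    """Items the most similar other user rated highly and the target has not rated."""
    others = [user for user in ratings if user != target]
    if not others:
        return []
    neighbour = max(others,
                    key=lambda user: user_similarity(ratings[target], ratings[user]))
    return sorted(item for item, score in ratings[neighbour].items()
                  if score >= min_rating and item not in ratings[target])


# Hypothetical 1-5 star ratings: rows are users, columns are items.
ratings = {
    "ann": {"A": 5, "B": 1, "C": 4},
    "bob": {"A": 5, "B": 2, "C": 4, "D": 5},
    "cat": {"A": 1, "B": 5, "C": 2, "E": 5},
}
print(recommend("ann", ratings))
```

Swapping the roles of rows and columns, so that similarity is computed between item rating patterns instead, gives the item-based variant described on the second slide.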
  • 96.
  • 97.
  • 98.
    OPINION AMPLIFICATION Twitter and Facebook as a source of real-time opinions. From raw text to sentiment to opinion. Twitter data has been used to predict election outcomes, box office success, and musical talent ...
  • 99.
    TOWARDS COLLECTIVE INTELLIGENCE Participatory sensing as collective intelligence. Human Intelligence + Brute-Force Computation
  • 100.
    DEALING WITH EMAIL SPAM Back in 2000 Yahoo had a problem ... Bots registering free email accounts for the purpose of bulk spam. How to distinguish real people from spambots? Luis von Ahn, Manuel Blum
  • 101.
  • 102.
  • 103.
    What if we could do something more with all of this ‘CAPTCHA time’?
  • 104.
  • 105.
  • 106.
    GAMES WITH A PURPOSE In 2003 there were 9bn hours of solitaire played on PCs... ... and these days there are around 70m hours of FarmVille played every week! It only took about 20m hours of human effort to build the Panama Canal!
  • 107.
    FOLD.IT - MOLECULAR GAMEPLAY
  • 108.
    HOW WELL DOES IT ALL WORK? In 2011, players of Foldit helped to decipher the crystal structure of the Mason-Pfizer monkey virus (M-PMV) retroviral protease, an AIDS-causing monkey virus. Players “produced” an accurate 3D model of the enzyme in just 10 days! This structure had eluded scientists for some 15 years. Khatib, F. et al. (2011). "Crystal structure of a monomeric retroviral protease solved by protein folding game players". Nature Structural & Molecular Biology 18 (10): 1175
  • 109.
    BIG DATA OR BIG BROTHER?
  • 110.
    THE END OF THE AGE OF PRIVACY? “Technology is neither good nor bad, nor is it neutral” Public by Default. The Price of Free? Ownership of Personal Data?
  • 111.
    THE END OF ANONYMITY The Case of AOL Searcher No. 4417749. 20M anonymized queries, 600k users as research data (AOL, 2006). User No. 4417749 = 62-year-old Thelma Arnold of Lilburn, Ga.
  • 112.
    THE PANOPTICON STATE? Zamyatin’s dystopian glass-walled future of government surveillance. NSA Prism programme.
  • 113.
    CONTROLLING PERSONAL DATA A shift in the data ownership model: a new asset class for personal data? Owned by the individual, shared with services. Cloud storage (e.g. DropBox) as a shareable repository of personal data...
  • 114.
    THE BIG DATA WORLD OF THE SENSOR WEB
  • 115.
  • 116.
    THE OPTION-VALUE OF BIG DATA. DATA-DRIVEN EVERYTHING. POWER TO THE PEOPLE.
  • 117.
    THE OPTION-VALUE OF BIG DATA Reuse & Recycle. From Primary to Secondary Uses of Data. The Unintended Consequences of Data
  • 118.
    DATA-DRIVEN EVERYTHING Social Science, Linguistics, Anthropology, Cultural Studies, Journalism, Political Science, Humanities ... All impacted by Big Data Thinking...
  • 119.
    GOOGLE’S N-GRAM VIEWER Acerbi A, Lampos V, Garnett P, Bentley RA (2013) The Expression of Emotions in 20th Century Books. PLoS ONE 8(3)
  • 120.
    DATA-DRIVEN EVERYTHING Michel J-P, Shen YK, Aiden AP, Veres A, Gray MK, et al. (2011) Quantitative analysis of culture using millions of digitized books. Science 331: 176–182. Lieberman E, Michel J-P, Jackson J, Tang T, Nowak MA (2007) Quantifying the evolutionary dynamics of language. Nature 449: 713–716. Richards, Daniel Rex. "The content of historical books as an indicator of past interest in environmental issues." Biodiversity and Conservation (2013): 1-9. Lampos, Vasileios, et al. "Analysing Mood Patterns in the United Kingdom through Twitter Content." arXiv preprint arXiv:1304.5507 (2013).
  • 121.
    POWER TO THE PEOPLE Personal Data & Personal Analytics. People as Sensors in Participatory Sensing. Human Computation & Collective Intelligence
  • 122.
    Creating a Data-Driven Society