The Internet of Things is the biggest challenge Big Data has ever faced. For the first time in history, inexpensive connected devices with sensors are available that generate vast amounts of data in seconds. What does it mean for data science that my cat generates gigabytes of data every few hours? Clearly the emergence of IoT technologies will change Big Data as we know it, quite possibly beyond recognition. I will outline three ways in which the IoT explosion will change how we work, three assumptions about data that it has irrevocably challenged, and three ways we can thrive, not merely cope, within this unprecedented expansion of data volume, velocity and variety (and cats).
BDW16 London - Chris von Csefalvay, Helioserv - Cats and What They Tell us About Big Data and IoT
2. The Internet of Cats
(and what we can learn from other oddities in
the world of IoT)
Chris von Csefalvay MA BCL FRSA
Chief Technology Officer, Helioserv Ltd
3. The Speaker
• Chris von Csefalvay (vohn-CHAY-full-why)
• CTO, Helioserv
• Previously: corporate big data roles with global companies
(FTSE100, DAX)
• Before that: geospatial intelligence analysis and modeling
• Even more before that: commercial legal practice
• Before the Holocene: Oxford (MA 2010, BCL 2011)
4. The Cat
• River Constanza von Csefalvay (con-STAYN-zuh)
• Chief Feline Officer, us, home
• Previously: kitten, some other household
• Interests: food, napping, being the most networked feline on record
5. FATCAT
Feline Assessment Topology for Care and Automated Treatment
• >600 distinct sensors, entirely COTS
• Sensors over MQTT backbone running on COTS computing
hardware
• Outputs to:
• Dashboard
• AWS S3 (90 days' storage), then Glacier (frozen archive)
• EMR
• TOTAL COST < $500 (half of a kidney infection vet bill)
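To make the "sensors over MQTT" idea concrete, here is a minimal sketch of how a single reading might be turned into a topic and payload before it is handed to an MQTT client. The topic layout `fatcat/<sensor_id>/<metric>`, the `Reading` fields and the metric name `water_ml` are assumptions made for illustration; the deck does not describe FATCAT's actual message format. No broker connection is shown.

```python
import json
from dataclasses import dataclass, asdict
from typing import Tuple


@dataclass(frozen=True)
class Reading:
    sensor_id: str    # e.g. "sens106"
    metric: str       # hypothetical metric name, e.g. "water_ml"
    value: float
    recorded_at: str  # ISO 8601 timestamp, UTC


def to_mqtt_message(reading: Reading) -> Tuple[str, bytes]:
    """Return (topic, payload) for a reading.

    The topic layout is an assumption of this sketch, not FATCAT's own.
    """
    topic = f"fatcat/{reading.sensor_id}/{reading.metric}"
    payload = json.dumps(asdict(reading), sort_keys=True).encode("utf-8")
    return topic, payload


def from_mqtt_payload(payload: bytes) -> Reading:
    """Decode a payload produced by to_mqtt_message back into a Reading."""
    return Reading(**json.loads(payload.decode("utf-8")))
```

Keeping the sensor identity in both the topic and the payload means a subscriber can filter by topic while an archived message still records where it came from.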
6. FATCAT Topology
• Mosquitto on fatcat.mqbs2.local
• Cat house (living rm): sens102.river.sensors.local
• Cat bed: sens104.river.sensors.local
• Scratching post: sens105.river.sensors.local
• Food & water area: sens106.river.sensors.local
• Litter chem telemetry: sens109.river.sensors.local
• DTU under bed: sens120.river.sensors.local
• DTU downstairs: sens121.river.sensors.local
• DTU in Katie’s studio: sens122.river.sensors.local
• Scanner living room: sens131.river.sensors.local
• Scanner stairs: sens132.river.sensors.local
• Scanner corridor: sens133.river.sensors.local
• Bridge: fatcat.cb02.local
• Dashboard: fatcat.d.local
• Archival task: everything older than 90 days
• O/B router: rt01.ob.rts.local
• AWS VPC: eu-west-1
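The archival task in the topology handles everything older than 90 days. The rule itself can be sketched in a few lines. The function names and the `(key, recorded_at)` record shape are hypothetical, and the deck does not say how the task is implemented; in practice a storage-side lifecycle rule could do the same job without any custom code.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)  # the 90-day window from the slides


def should_archive(recorded_at: datetime, now: datetime) -> bool:
    """True if the record is strictly older than the retention window."""
    return now - recorded_at > RETENTION


def split_for_archival(records, now):
    """Split (key, recorded_at) pairs into (keep, archive) lists of keys."""
    keep, archive = [], []
    for key, recorded_at in records:
        target = archive if should_archive(recorded_at, now) else keep
        target.append(key)
    return keep, archive
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the rule easy to test.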
7. FATCAT
Miscellaneous capabilities
• Real-time location
• Food and water intake monitoring
• Urea levels monitoring
• Temperature monitoring
• Behavioural monitoring
• Sleep cycle monitoring
11. There are wider points here. Mostly non-cat-related.
• Complex IoT systems, rivalling or often even exceeding industrial equivalents (here: PTx telemetry), are now cheap and easy to build from COTS parts.
• IoT systems will, within the foreseeable future, become incredibly enmeshed with our lives.
• The potential is almost as big as the risks:
- Security
- Privacy
- Legal
- Societal
- Economic
- Reputational
12. What does this mean for us as data
scientists, data lovers, data nerds, data
users, data exploiters, data
commercialisers, data visualisers,…?
15. 1) We need to understand the source of our
data better.
Old approach: the ‘statistical idea’
We as data scientists are agnostic to the source, origin, nature and provenance
of our data.
New approach: the ‘contextual idea’
Data lives in a wide context, and its origins directly influence its predictive,
descriptive and analytical capabilities.
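A small sketch of why the source matters: the same kind of number can mean different things depending on which sensor produced it. The readings below are invented for illustration only (they are not measurements from the talk), and the source labels are hypothetical. Pooling them as if origin did not matter gives a figure that describes neither location.

```python
from collections import defaultdict
from statistics import mean

# Invented temperature readings in degrees C, for illustration only.
readings = [
    ("cat_bed", 31.0),
    ("cat_bed", 32.0),
    ("food_water_area", 19.0),
    ("food_water_area", 20.0),
]


def pooled_mean(rows):
    """Source-agnostic view: average every value together."""
    return mean(value for _, value in rows)


def mean_by_source(rows):
    """Contextual view: average values separately per source."""
    grouped = defaultdict(list)
    for source, value in rows:
        grouped[source].append(value)
    return {source: mean(values) for source, values in grouped.items()}
```

With these made-up numbers the pooled mean is 25.5, a temperature that neither sensor ever reported, while the per-source means are 31.5 and 19.5.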
17. 2) We need to see more in IoT than a
ubiquitous source of data (and have a
healthy dose of criticism).
18. “We can do this.”
“Let’s do this!”
Common
sense
19. “Let’s put an IP stack on it!”
is not a product or business strategy.
20. Legal risk
Social risk
(reputation for being creepy)
Cost/price-point risk
RF Emissions avoidance
Risk of just being plain ridiculous
Annoyed cat risk
Regulatory risk
21. Don’t be the excuse.
Do we need this data?
Do we need it from all users?
Can we get it from a volunteer subsample?
Do we have an opt-out in place?
Does this seem like something we’d want
our kids to buy?
22. 3) We need to be creative about security.
Even if security is generally seen from a more conservative perspective, threats
will not come from where we think they will.
24. Fear: hacked IoT devices will kill us by
shutting down our connected cars, blowing
up our connected fridges and ruining our
clothes.
Reality: Probably not. But they might shut
down half the internet.
29. The lesson from Mirai…
Hackers know most of the IoT is badly defended at best. Your fridge might not
kill you, but it sure can kill one of the world’s largest DNS providers.
IoT security is going to be an exercise in creative thinking.
Proactive, creative approaches over reactive, conservative approaches!
30. Tentative bottom lines
Most of the future of the IoT is an unknown unknown.
This includes almost all of its security risks.
The statistical approach won’t work anymore.
We’ve got to understand every element of the pathway, from cat through
sensor to message bus, to be able to effectively use IoT data.
The IoT is not a panacea. Treating it like that will only attract ridicule.
We can connect every litter box to the internet, we just… shouldn’t.