This document provides an introduction to data lakes and discusses key aspects of creating a successful data lake. It defines different stages of data lake maturity from data puddles to data ponds to data lakes to data oceans. It identifies three key prerequisites for a successful data lake: having the right platform (such as Hadoop) that can handle large volumes and varieties of data inexpensively, obtaining the right data such as raw operational data from across the organization, and providing the right interfaces for business users to access and analyze data without IT assistance.
This document provides an overview of data journalism and instructions for an assignment involving extracting data from spreadsheets, converting the files to tab-delimited text format, uploading the data to ManyEyes to create visualizations, and then exploring the visualizations and uploading the files to Google Docs. Key aspects of data journalism discussed include the emergence of openly available data and tools for publishing and visualizing data to tell stories. Students are guided through a workflow of getting data from Google Docs, preprocessing it in Excel and a text editor, analyzing and visualizing it in ManyEyes, and then exploring it further in Google Docs.
Anna Queralt, of the BSC, talks about sharing, a concept that joins volume, variety, and velocity. She also presents the product the BSC is currently working with, DataClay, a system for adding, reusing, and sharing data.
This presentation took place at TSIUC'14, held at the Universitat Autònoma de Barcelona on 2 December 2014 under the title "Reptes en Big Data a la universitat i la Recerca" (Challenges in Big Data at the university and in research).
Readying the public sector for web-scale data challenges – Alex Coley
The journey to now and the challenges being met in building flexible, connected data ecosystems.
Slides from a presentation at the Government ICT 2.0 Conference, County Hall, London, 26th September 2017.
Big data refers to large, complex datasets that cannot be processed by traditional methods. The volume, velocity, and variety of big data are increasing rapidly due to sources like social media and mobile devices. Hadoop is an open-source framework that allows storing and processing big data in a distributed, parallel fashion across clusters of commodity hardware. It uses HDFS for storage and MapReduce for processing. HDFS divides files into blocks and stores replicas across nodes for reliability. MapReduce breaks jobs into map and reduce tasks to process data in parallel.
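To make the map and reduce phases described above concrete, here is a minimal, framework-free Python sketch that mimics how a word count flows through MapReduce; it is purely illustrative and does not use the actual Hadoop APIs.

```python
from collections import defaultdict

def map_phase(block):
    # Map: emit a (word, 1) pair for every word in an input block.
    for word in block.lower().split():
        yield word, 1

def shuffle(pairs):
    # Shuffle: group intermediate values by key, as the framework does
    # between the map and reduce stages.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reduce: aggregate all values emitted for a key.
    return key, sum(values)

if __name__ == "__main__":
    blocks = ["big data needs parallel processing",
              "hadoop processes big data in parallel"]
    intermediate = [pair for block in blocks for pair in map_phase(block)]
    counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
    print(counts)  # e.g. {'big': 2, 'data': 2, ...}
```

In a real Hadoop job the blocks would be HDFS file splits and the map and reduce tasks would run in parallel across the cluster; the data flow is the same.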
This presentation provides a top-level introduction to semantics and Web 3.0. It discusses key concepts like semantic architectures, knowledge representations, and semantic applications. Semantic technologies add meaning to data so machines can better support users by doing more of the work. While early adoption was in enterprises, semantic applications are now emerging on the public web as part of the vision of Web 3.0 as a read-write-execute web.
Web Browser Controls in Adlib: The Hidden Diamond in the Adlib Treasure Chest – Axiell ALM
Stephen McConnachie, Head of Data Collections & Information, British Film Institute
Adlib Designer lets users implement web browser controls within the Windows client. These act as embedded web browser displays in which data from the record can be presented in any form and used to interact with web resources such as Google Maps and Wikipedia, offering infinite potential for augmenting and exploiting the collections data and enhancing the cataloguers’ experience. This presentation will explain the functionality and offer a whistle-stop tour of some use cases from the BFI’s system.
Chattanooga Hadoop Meetup - Hadoop 101 - November 2014 – Josh Patterson
Josh Patterson is a principal solution architect who has worked with Hadoop at Cloudera and the Tennessee Valley Authority. Hadoop is an open-source software framework for distributed storage and processing of large datasets across clusters of commodity servers. It allows for consolidating mixed data types at low cost while keeping raw data always available. Hadoop uses commodity hardware and scales to petabytes without changes. Its distributed file system provides fault tolerance and replication, while its processing engine handles all data types and scales processing.
The Grid is the infrastructure for the advanced Web: for computing, collaboration, and communication.
The goal is to create the illusion of a simple yet large and powerful self-managing virtual computer out of a large collection of connected heterogeneous systems sharing various combinations of resources.
“Grid” computing has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, a high-performance orientation.
We present the Grid concept by analogy with the electrical power grid, along with the Grid vision.
- The document discusses Microsoft Office Groove 2007 and how it enables secure collaboration across organizations and networks through a shared team workspace that allows working online or offline.
- It provides benefits for joint military use such as extending communication when infrastructure is disrupted, more rapid coordination during events, and improved information sharing and situational awareness.
- Integration with Microsoft Office and Back Office systems is discussed along with examples of how Groove has been used for collaboration in military exercises, disaster response, and within the US Army.
I will discuss the growth of big data and the evolution of traditional enterprise models, with the addition of critical building blocks to handle the intense growth of data in the enterprise. According to IDC estimates, the size of the digital universe in 2011 will be 1.8 zettabytes. With data growth outpacing Moore’s Law, the average enterprise will need to manage 50 times more information by the year 2020, while its IT team grows by only 1.5 percent. With this challenge in mind, integrating big data models into existing enterprise infrastructures is a critical element when adding new big data building blocks while keeping efficiency in mind.
This document provides an overview of big data and Hadoop. It defines big data as large volumes of unstructured data that are too costly and time-consuming to load into traditional databases. It notes that big data comes from various sources like web data, social networks, and sensor data. The challenges of big data include slow disk speeds and the need for parallel processing. Hadoop is introduced as an open-source framework that uses HDFS for storage across clusters and MapReduce for parallel processing of large datasets. Key aspects of HDFS and MapReduce are summarized.
This deck gives a basic overview of NoSQL technologies, implementation vendors/products, case studies, and some of the core implementation algorithms. The presentation also gives a quick overview of emerging trends such as "Polyglot Persistence" and "NewSQL".
The deck is targeted at beginners who want to get an overview of NoSQL databases.
What is NoSQL? How did it come into the picture? What are the types of NoSQL? Some basics of the different NoSQL types. Differences between RDBMS and NoSQL. Pros and cons of NoSQL.
What is MongoDB? What are the features of MongoDB? The Nexus architecture of MongoDB. The data model and query model of MongoDB. Various MongoDB data management techniques. Indexing in MongoDB. A working example using the MongoDB Java driver on Mac OS X.
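The abstract's working example uses the MongoDB Java driver; as a rough equivalent, here is a hedged sketch in Python with pymongo (the database name, collection name, and fields are invented for illustration, and a MongoDB instance on localhost:27017 is assumed).

```python
from pymongo import MongoClient, ASCENDING

# Connect to a local MongoDB instance (assumed to be running on the default port).
client = MongoClient("mongodb://localhost:27017")
db = client["demo_db"]    # hypothetical database name
talks = db["talks"]       # hypothetical collection name

# Insert a document; MongoDB's flexible data model needs no fixed schema.
talks.insert_one({"title": "Intro to MongoDB", "year": 2016, "tags": ["nosql", "document"]})

# Create a secondary index to speed up queries on the 'year' field.
talks.create_index([("year", ASCENDING)])

# Query with a filter and a projection.
for doc in talks.find({"year": {"$gte": 2015}}, {"_id": 0, "title": 1}):
    print(doc)
```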
Record linkage is used to identify records from different data sources that represent the same real-world entity. It involves preprocessing data, reducing the search space using blocking methods, computing similarity functions to compare records, and applying decision models to classify record pairs. A common blocking method is the sorted neighborhood method, which sorts records by a blocking key and compares nearby records within a fixed window. The effectiveness of record linkage depends on selecting good blocking keys and similarity functions.
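To make the sorted neighborhood method concrete, here is a small illustrative Python sketch (the field names, blocking key, and similarity function are invented for the example): records are sorted by a blocking key, and only pairs that fall within a sliding window are compared.

```python
from difflib import SequenceMatcher

def blocking_key(record):
    # Hypothetical blocking key: first three letters of the surname plus zip code.
    return (record["surname"][:3].lower(), record["zip"])

def similarity(a, b):
    # Simple string similarity on the full name; real systems combine several field-level measures.
    return SequenceMatcher(None, a["surname"] + a["first"], b["surname"] + b["first"]).ratio()

def sorted_neighborhood(records, window=3, threshold=0.85):
    ordered = sorted(records, key=blocking_key)
    matches = []
    for i, rec in enumerate(ordered):
        # Compare each record only with its neighbors inside the fixed window.
        for other in ordered[i + 1:i + window]:
            if similarity(rec, other) >= threshold:
                matches.append((rec["id"], other["id"]))
    return matches

records = [
    {"id": 1, "surname": "Smith", "first": "Anna", "zip": "37402"},
    {"id": 2, "surname": "Smyth", "first": "Anna", "zip": "37402"},
    {"id": 3, "surname": "Jones", "first": "Bob",  "zip": "37403"},
]
print(sorted_neighborhood(records))  # likely [(1, 2)]
```

The choice of blocking key determines which true matches can ever end up in the same window, which is why key selection matters as much as the similarity function.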
Eliminating the Problems of Exponential Data Growth, Forever – spectralogic
The document discusses the challenges of managing exponential data growth. Key points include:
- Customers must manage both data and infrastructure as data becomes more dispersed across locations.
- Rapid growth of unstructured data from mobility, social media, big data, and cloud adoption is driving needs for flexible infrastructure and optimization.
- Factors like data growth, virtualization, and aggressive recovery objectives are increasing use of disk storage and replication technologies.
This document introduces grid computing by discussing its applications to problems requiring large-scale data analysis, such as high energy physics experiments. It defines a grid as an infrastructure involving integrated and collaborative use of computers, networks, databases, and instruments across multiple organizations. Grids allow for computational, data, and network sharing and aim to provide a cost-effective, scalable platform for data-intensive problems. Virtual organizations are dynamically formed groups that define rules for sharing resources to solve specific problems. The document outlines grid architecture and operations, including resource discovery, scheduling jobs, and accounting. Benefits of grids include exploiting underutilized resources and parallel processing capacity.
Grid computing allows for the sharing of distributed computing resources over a network. It provides users with access to high-end computing facilities in a dependable, consistent, and inexpensive manner. A grid aggregates distributed computing power to solve large-scale problems. It enables virtual organizations through coordinated sharing of resources across locations, organizations, and hardware/software boundaries. Grid computing provides computational utility to consumers by managing resource identification, allocation, and consolidation through middleware software. It allows under-utilized resources to be dynamically distributed in an equitable manner.
Part 2 of a 2 part presentation that I did in 2009, this presentation covers more about unstructured data, and operational data vault components. YES, even then I was commenting on how this market will evolve. IF you want to use these slides, please let me know, and add: "(C) Dan Linstedt, all rights reserved, http://LearnDataVault.com" in a VISIBLE fashion on your slides.
The Proliferation And Advances Of Computer Networks – Jessica Deakin
The document discusses selecting a new database management system for an organization. Key considerations include ensuring the vendor offers auditing, reporting and data management tools to provide application level security and interface with existing corporate access procedures. The selected solution should be able to automate report production on topics like database compliance, certification, control of activities, and risk assessment to adhere to organizational policies. Application security gateways can provide additional protection by examining network traffic to the database server.
The document provides an overview of data mesh principles and hands-on examples for implementing a data mesh. It discusses key concepts of a data mesh including data ownership by domain, treating data as a product, making data available everywhere through self-service, and federated governance of data wherever it resides. Hands-on examples are provided for creating a data mesh topology with Apache Kafka as the underlying infrastructure, developing data products within domains, and exploring consumption of real-time and historical data from the mesh.
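As a rough illustration of publishing and consuming a domain's data product over Kafka, here is a minimal sketch using the kafka-python client; the topic name, event shape, and broker address are assumptions for the example, not details taken from the presentation.

```python
import json
from kafka import KafkaProducer, KafkaConsumer

BROKER = "localhost:9092"          # assumed broker address
TOPIC = "orders.order-placed.v1"   # hypothetical data-product topic owned by the Orders domain

# The owning domain publishes well-defined events as its data product.
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"order_id": 42, "customer_id": 7, "total": 19.99})
producer.flush()

# Any other domain can consume the product in a self-service fashion.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKER,
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    consumer_timeout_ms=5000,  # stop iterating if no new messages arrive
)
for message in consumer:
    print(message.value)
```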
Big Data or Data Warehousing? How to Leverage Both in the Enterprise – Dean Hallman
The document discusses the differences between data warehousing and big data, how Data Vault 2.0 provides a common foundation for both, and how to model data using the Data Vault approach with hubs, links, and satellites. It also covers challenges like loading satellites chronologically and different data ingestion methods like ETL, ELT, and SerDe.
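To illustrate the hub/link/satellite modelling mentioned above, here is a small Python sketch that derives Data Vault style hash keys and a satellite hashdiff from business keys; the column names and the use of MD5 are illustrative assumptions rather than the presenter's exact implementation.

```python
import hashlib
from datetime import datetime, timezone

def hash_key(*business_keys):
    # Hubs and links are keyed by a hash of the (concatenated, normalized) business keys.
    normalized = "||".join(str(k).strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

now = datetime.now(timezone.utc).isoformat()

# Hub rows: one per unique business key.
hub_customer = {"hub_customer_hk": hash_key("CUST-001"), "customer_id": "CUST-001",
                "load_date": now, "record_source": "crm"}
hub_product = {"hub_product_hk": hash_key("SKU-9"), "product_sku": "SKU-9",
               "load_date": now, "record_source": "erp"}

# Link row: relates the two hubs via a hash of the combined business keys.
link_purchase = {
    "link_purchase_hk": hash_key("CUST-001", "SKU-9"),
    "hub_customer_hk": hub_customer["hub_customer_hk"],
    "hub_product_hk": hub_product["hub_product_hk"],
    "load_date": now,
    "record_source": "orders",
}

# Satellite row: descriptive attributes plus a hashdiff used to detect changes over time.
attrs = {"name": "Ada Lovelace", "segment": "gold"}
sat_customer = {
    "hub_customer_hk": hub_customer["hub_customer_hk"],
    "load_date": now,
    "hashdiff": hash_key(*attrs.values()),
    **attrs,
}
print(link_purchase["link_purchase_hk"], sat_customer["hashdiff"])
```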
The Last Frontier - Virtualization, Hybrid Management and the Cloud – Kellyn Pot'Vin-Gorman
This document discusses virtualization, hybrid management, and cloud computing. It begins with an introduction to virtualization and discusses trends showing increasing adoption of public cloud infrastructure and platforms. The document then explores how companies are migrating applications and data to the cloud using various approaches like backups, data migration tools, and virtualization. It argues that data virtualization provides benefits over traditional migration methods by reducing costs, network usage, and storage requirements when moving workloads to the cloud.
Moving to cloud computing step by step – David Linthicum
The document discusses cloud computing and its relationship to service-oriented architecture (SOA). It defines the three layers of cloud computing: infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS). It also discusses considerations for moving applications and services to public, private or hybrid clouds.
Data Virtualization: Introduction and Business Value (UK) – Denodo
This document provides an overview of a webinar on data virtualization and the Denodo platform. The webinar agenda includes an introduction to adaptive data architectures and data virtualization, benefits of data virtualization, a demo of the Denodo platform, and a question and answer session. Key takeaways are that traditional data integration technologies do not support today's complex, distributed data environments, while data virtualization provides a way to access and integrate data across multiple sources.
Data Vault 2.0 is a data modeling methodology designed for developing enterprise data warehouses. It was developed by Dan Linstedt in response to the shortcomings of previous data modeling methodologies, such as the Kimball methodology and Inmon methodology, for managing large volumes of data from disparate sources.
Alluxio Data Orchestration Platform for the Cloud – Shubham Tagra
Alluxio originated as an open source project at UC Berkeley to orchestrate data for cloud applications by providing a unified namespace and intelligent data caching across multiple data sources. It provides consistent high performance for analytics and AI workloads running on object stores by caching frequently accessed data in memory and tiering data to flash/disk based on policies. Alluxio can also enable hybrid cloud environments by allowing on-premises workloads to burst to public clouds without data movement through "zero-copy" access to remote data.
Modern Data Management for Federal Modernization – Denodo
Watch full webinar here: https://bit.ly/2QaVfE7
Faster, more agile data management is at the heart of government modernization. However, traditional data delivery systems are limited in their ability to realize a modernized and future-proof data architecture.
This webinar will address how data virtualization can modernize existing systems and enable new data strategies. Join this session to learn how government agencies can use data virtualization to:
- Enable governed, inter-agency data sharing
- Simplify data acquisition, search and tagging
- Streamline data delivery for transition to cloud, data science initiatives, and more
How to Get Cloud Architecture and Design Right the First Time – David Linthicum
The document discusses best practices for designing cloud architecture and getting cloud implementation right the first time. It covers proper ways to leverage, design, and build cloud-based systems and infrastructure, going beyond hype to advice from those with real-world experience making cloud computing work. The document provides guidance on common mistakes to avoid and emerging architectural patterns to follow.
The document provides a summary of modern web development topics:
Modern Web Development topics covered include the infrastructure of the internet, client-server communication models, the need for server-side programs, web architecture patterns, JavaScript's central role, front-end frameworks, cloud computing models, microservices architecture, and containers. Web development has become more complex with client-side logic, front-end frameworks, and the rise of cloud, microservices, and containers, which allow for more modular and scalable application development. Future trends discussed include progressive web apps, microservices architecture, and containers as a lightweight deployment mechanism for microservices.
Current trends and future directions in cloud computing were discussed. Key points included:
- Cloud computing provides on-demand access to computing resources and pay-per-use model.
- Major cloud platforms offer Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS).
- Big data and NoSQL databases are enabling organizations to analyze large and diverse datasets.
- Future directions may include newSQL databases, software defined datacenters, and harnessing big data for intelligence.
Building a Logical Data Fabric using Data Virtualization (ASEAN) – Denodo
Watch full webinar here: https://bit.ly/3FF1ubd
In the recent Building the Unified Data Warehouse and Data Lake report by leading industry analysts TDWI, 64% of organizations stated that the objective of a unified data warehouse and data lake is to get more business value, and 84% of organizations polled felt that a unified approach to data warehouses and data lakes was either extremely or moderately important.
In this session, you will learn how your organization can apply a logical data fabric, and how the associated technologies of machine learning, artificial intelligence, and data virtualization can reduce time to value, increasing the overall business value of your data assets.
KEY TAKEAWAYS:
- How a Logical Data Fabric is the right approach to assist organizations in unifying their data.
- The advanced features of a Logical Data Fabric that assist with the democratization of data, providing an agile and governed approach to business analytics and data science.
- How a Logical Data Fabric with Data Virtualization enhances your legacy data integration landscape to simplify data access and encourage self-service.
Capacity Management in a Cloud Computing World – David Linthicum
David Linthicum is an expert in cloud computing. He has written books and blogs on the topic and hosts a popular podcast. He presented on myths around capacity management in cloud computing. Key points included that capacity planning is still needed in cloud to optimize costs, clouds are not always elastic, and architecture and planning are still important when using cloud. Emerging trends like big data and new cloud service models were also discussed.
The document provides an overview of the Databricks platform, which offers a unified environment for data engineering, analytics, and AI. It describes how Databricks addresses the complexity of managing data across siloed systems by providing a single "data lakehouse" platform where all data and analytics workloads can be run. Key features highlighted include Delta Lake for ACID transactions on data lakes, auto loader for streaming data ingestion, notebooks for interactive coding, and governance tools to securely share and catalog data and models.
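As a hedged illustration of the Delta Lake feature mentioned above, here is a minimal PySpark sketch that writes and reads a Delta table; it assumes a Spark environment with the delta-spark package installed, and the table path and schema are made up for the example.

```python
from pyspark.sql import SparkSession

# Assumes the delta-spark package is available (e.g. pip install delta-spark).
spark = (
    SparkSession.builder.appName("delta-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

path = "/tmp/delta/events"  # hypothetical table location

# Write a small DataFrame as a Delta table; appends are ACID transactions.
events = spark.createDataFrame([(1, "click"), (2, "purchase")], ["id", "type"])
events.write.format("delta").mode("append").save(path)

# Read the table back; schema enforcement and time travel come with the format.
spark.read.format("delta").load(path).show()
```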
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC) – Denodo
Watch full webinar here: https://bit.ly/3dudL6u
It's not if you move to the cloud, but when. Most organisations are well underway with migrating applications and data to the cloud. In fact, most organisations - whether they realise it or not - have a multi-cloud strategy. Single, hybrid, or multi-cloud…the potential benefits are huge - flexibility, agility, cost savings, scaling on-demand, etc. However, the challenges can be just as large and daunting. A poorly managed migration to the cloud can leave users frustrated at their inability to get to the data that they need and IT scrambling to cobble together a solution.
In this session, we will look at the challenges facing data management teams as they migrate to cloud and multi-cloud architectures. We will show how the Denodo Platform can:
- Reduce the risk and minimise the disruption of migrating to the cloud.
- Make it easier and quicker for users to find the data that they need - wherever it is located.
- Provide a uniform security layer that spans hybrid and multi-cloud environments.
Streaming IBM i to Kafka for Next-Gen Use Cases – Precisely
Your team is always under pressure to accelerate the adoption of the most modern and powerful technologies. Simultaneously, your existing investments, such as IBM i, your organization’s most critical data asset, remain in a silo. The only practical path forward is to connect the new and existing with a streaming technology like Apache Kafka to feed real-time applications that power use cases ranging from marketing and order replenishment to fraud detection.
Join this Precisely webinar to learn how to unlock the potential of your IBM i data by creating data pipelines that integrate, transform, and deliver it to users when and where they need it. Additionally, hear how Stark Denmark uses Precisely Connect CDC to provide data to their organization in real time.
Join this webinar to:
- Understand the benefits and challenges of building data pipelines that access and integrate data from IBM i systems to modern data platforms
- Learn how Precisely can help you build real-time data pipelines
- Hear from Stark Denmark on how they are using Connect CDC from Precisely and the benefits they are getting
Similar to Why Should You Trust My Data? (code4lib 2016)
DevOps and Testing slides at DASA Connect – Kari Kakkonen
Slides by Rik Marselis and me from the DASA Connect conference on 30 May 2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We finished with a lovely workshop in which the participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... – James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed in releasing software to market, combined with traditionally slow and manual security checks, has created gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface of their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
A tale of scale & speed: How the US Navy is enabling software delivery from l... – sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Full-RAG: A modern architecture for hyper-personalization – Zilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Securing your Kubernetes cluster: a step-by-step guide to success! – KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe – Paige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring and observability to the purview of ops, infra, and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0! – SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
UiPath Test Automation using UiPath Test Suite series, part 5 – DianaGray10
Welcome to the UiPath Test Automation using UiPath Test Suite series, part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Communications Mining Series - Zero to Hero - Session 1 – DianaGray10
This session provides an introduction to UiPath Communication Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Enhancing adoption of Open Source Libraries. A case study on Albumentations.AI – Vladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Climate Impact of Software Testing at Nordic Testing Days – Kari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing is discussed in the talk. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at a smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 – Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
GridMate - End to end testing is a critical piece to ensure quality and avoid... – ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
Epistemic Interaction - tuning interfaces to provide information for AI support – Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Why Should You Trust My Data? – code4lib 2016
1. Why Should You Trust My Data?
building data infrastructure that accommodates networks of trust
Matt Zumwalt
datjawn.com | databindery.com
@flyingzumwalt
code{4}lib 2016
27. By 2019 the data created by IoE devices alone will be 49 times higher than all the traffic that moved through datacenters in 2014.
it won’t scale.
Reference: Cisco Global Cloud Index
28. Worldwide Storage Capacity in 2012: 2.5 zettabytes
Total Data Center Traffic in 2016: 10.4 zettabytes per year
Anticipated data created by Internet of Everything (IoE) devices in 2019: 507.5 zettabytes per year
References: NetApp, Cisco Global Cloud Index, gigaom, Washington Post
29. distributed data web
“You can’t propose that something be a universal space and at the same time keep control of it.”
- Tim Berners-Lee
41. we’ve got this
Organisms have been solving these problems for eons
Humans for millennia
Librarians for centuries
Software developers for decades
42. ‘git for (tabular) data’
transparency & reproducibility
http://datjawn.com
builds from the work of http://dat-data.com
Tabular: rows & columns (ie. Spreadsheets, CSV, SQL DBs)
59. Stop building server-side applications.
Assume that data are anywhere and/or everywhere.
Assume that your software will be run in many places.
Erase your distinctions between server and client.
Let data grow branches - build trees (ie. Merkle DAGs)
Stop thinking of data as singular.
Stop thinking of datasets as monolithic.
Embrace redundancy & replication.
Understand that trustworthiness and authority are dynamic.
Broaden your sense of “now”.
Appreciate provenance.
there are no servers
there is only the web
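The slide's advice to "let data grow branches - build trees (ie. Merkle DAGs)" can be illustrated with a tiny Python sketch of content addressing: every node is identified by the hash of its own content plus the hashes of the nodes it links to, so any change produces a new branch without destroying the old one. This is a conceptual toy, not the dat or IPFS implementation.

```python
import hashlib, json

store = {}  # content-addressed store: hash -> node

def put(content, links=()):
    # A node's identity is the hash of its content plus the hashes it links to.
    node = {"content": content, "links": list(links)}
    digest = hashlib.sha256(json.dumps(node, sort_keys=True).encode()).hexdigest()
    store[digest] = node
    return digest

# Build a small DAG: two leaves and a root that links to them.
row_a = put("id,name\n1,Ada")
row_b = put("2,Grace")
root_v1 = put("dataset v1", links=[row_a, row_b])

# "Changing" a row creates a new leaf and a new root; the old version stays addressable.
row_b2 = put("2,Grace Hopper")
root_v2 = put("dataset v2", links=[row_a, row_b2])

print(root_v1 != root_v2)                   # True: different content, different identity
print(store[root_v2]["links"][0] == row_a)  # unchanged data is shared, not copied
```

Because identity follows content, replicas anywhere on the web can verify what they hold, and provenance becomes a property of the data itself rather than of the server it came from.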
60. Meet the dat jawn team on Wednesday…
Matt Zumwalt
datjawn.com | databindery.com
@flyingzumwalt
code{4}lib 2016