This talk explores models and technical architectures for a new generation of data marketplaces for Personal Data from Things (PDT).
In this model, brokered data exchanges that occur on the network are tracked without the need for a trusted authority, namely by using (a) blockchain technology to record data exchange transactions, and (b) smart contracts that run on the blockchain to enforce agreements between data producers and consumers.
In this vision, smart contracts can also be used for dispute resolution, in combination with reputation management mechanisms, to provide incentives for fair behaviour by participants.
Simple data tracking of brokered exchanges is a natural precursor to recording the full provenance of data further down the value chain, i.e., through value-adding aggregators and services. As provenance arguably adds value to information, the project will also study mechanisms for observing and recording complex transformations over data streams.
P. Missier 2017, Systems Research Challenges
IoT ∩ People: Personal Data from Things (PDT)
IoT vision: personal devices will make our lives better
They often produce data that is also personal, as defined by the Data Protection Act 1998
• Are people aware of the trade-offs between privacy and benefits?
1. Ownership:
• What is “my” data? Who else has access to it? To what extent?
2. Awareness of third party use of personal data:
• Who has been doing what with my data?
3. Control:
• How much control can I have on the data that devices produce on my behalf?
Ownership + awareness + control → Trust
Moving forward: brokered Personal Data exchanges
Working assumption: Personal data are assets with a value
fitness / health monitoring, energy metering, …
PDT: Control trading
[Figure: Primary producers (wearables…) → Topics (minimal data semantics) → Value Added Services / aggregators; sensor data streams, batched into windows]
What would an infrastructure for a PDT marketplace look like?
Baseline (personal) data marketplace scenario
1-hop contract model between Primary Producers (PP) and Primary Consumers (PC):
Each topic has an associated unit value: $T \mapsto val(T)$
For each batch of $N$ messages from $PP_i$ to $PC_j$ about $T_k$:
$(PP_i, PC_j, T_k, N) \Rightarrow PC_j$ owes $N \cdot val(T_k)$ coins to $PP_i$
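The 1-hop rule above can be sketched as a small settlement function. This is a minimal illustration, not the project's implementation. The topic names, their unit values, and the function name `settle` are all hypothetical. Each batch record `(PP_i, PC_j, T_k, N)` adds $N \cdot val(T_k)$ to what $PC_j$ owes $PP_i$.

```python
from collections import defaultdict

# Hypothetical unit values val(T_k), in coins per message.
TOPIC_VALUE = {"heart_rate": 2, "energy": 1}

def settle(batches, topic_value):
    """Apply the 1-hop rule to (producer, consumer, topic, n) batch records.

    Each record means PC_j received n messages from PP_i about T_k,
    so PC_j owes n * val(T_k) coins to PP_i. Returns a dict mapping
    (consumer, producer) pairs to the total coins owed.
    """
    owed = defaultdict(int)
    for producer, consumer, topic, n in batches:
        owed[(consumer, producer)] += n * topic_value[topic]
    return dict(owed)

batches = [
    ("PP1", "PC1", "heart_rate", 10),
    ("PP1", "PC1", "energy", 5),
    ("PP2", "PC1", "energy", 3),
]
print(settle(batches, TOPIC_VALUE))  # {('PC1', 'PP1'): 25, ('PC1', 'PP2'): 3}
```

Aggregating per (consumer, producer) pair mirrors the contract's granularity. A real deployment would also need the batch records themselves to be trustworthy, which is the problem the following slides address.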
What does the authority do?
1. Control the Tracker DB
2. Prevent fraud:
• Producers have an incentive to over-claim data production
• VAS have an incentive to deny receiving some of the data
• (Data ownership / data theft / replay attack)
• A third party has an interest in claiming ownership of messages sent by others
• For instance, by copying data (possibly encrypted) and replaying it on the channel, publishing it as its own
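One simple way a tracker DB might flag the copy-and-replay case is to record which party first claimed each payload. The sketch below is an assumption for illustration, not a mechanism proposed in the talk. The class `ClaimRegistry` and its labels are hypothetical. Payloads are identified by their SHA-256 digest.

```python
import hashlib

class ClaimRegistry:
    """Hypothetical tracker-DB table mapping payload digests to their first claimant."""

    def __init__(self):
        self._first_claimant = {}  # SHA-256 hex digest -> party that claimed it first

    def claim(self, party: str, payload: bytes) -> str:
        """Record that `party` publishes `payload` as its own and classify the claim."""
        digest = hashlib.sha256(payload).hexdigest()
        if digest not in self._first_claimant:
            self._first_claimant[digest] = party
            return "new"
        if self._first_claimant[digest] == party:
            return "duplicate"  # same producer re-sends identical data: possible over-claim
        return "conflict"       # another party claimed it first: possible copy-and-replay

registry = ClaimRegistry()
print(registry.claim("PP1", b"hr=72;t=1000"))  # new
print(registry.claim("PP1", b"hr=72;t=1000"))  # duplicate
print(registry.claim("X", b"hr=72;t=1000"))    # conflict
```

The sketch has two clear limits. It only catches byte-identical copies. It also assumes the first claimant is the genuine owner, so an attacker who publishes first would win.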
Using smart contracts for unilateral reporting verification
Given N PPs, M PCs, and R topics T1, T2, …, TR:
Each PPi and each PCj reports its unilateral count cube for each window w.
All parties agree on the time interval that defines w (magic, to be dealt with later).
Scenarios:
1) No trouble:
   1. All PPi and all PCj report independently and accurately
   2. Some do not report, but the reports that are submitted are accurate
2) Trouble:
   1. The reports from PPi and PCj do not “add up”
   2. The reports do not sync on time / windows
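The unilateral count cubes can be built by bucketing each party's own message log into windows. This sketch assumes fixed-length windows of a hypothetical `WINDOW_LEN`; how parties actually agree on w is left open above. `producer_cube` yields $SM_w(i,k)$ and `consumer_cube` yields $RM_w(j,i,k)$. Both function names are hypothetical.

```python
from collections import Counter

WINDOW_LEN = 60.0  # hypothetical window length in seconds; agreement on it is left open

def window_of(ts: float) -> int:
    """Index w of the fixed-length window that contains timestamp ts."""
    return int(ts // WINDOW_LEN)

def producer_cube(sent, w):
    """PP-side unilateral counts SM_w(i, k) from (producer, topic, timestamp) records."""
    return Counter((i, k) for i, k, ts in sent if window_of(ts) == w)

def consumer_cube(received, w):
    """PC-side counts RM_w(j, i, k) from (consumer, producer, topic, timestamp) records."""
    return Counter((j, i, k) for j, i, k, ts in received if window_of(ts) == w)

sent = [("PP1", "T1", 5.0), ("PP1", "T1", 59.9), ("PP2", "T2", 61.0)]
received = [("PC1", "PP1", "T1", 5.2), ("PC1", "PP1", "T1", 60.1)]
print(producer_cube(sent, 0))      # Counter({('PP1', 'T1'): 2})
print(consumer_cube(received, 0))  # Counter({('PC1', 'PP1', 'T1'): 1})
```

The example deliberately shows the "do not sync on windows" trouble. The message sent at 59.9 s is received at 60.1 s, so it lands in window 0 for the producer but window 1 for the consumer.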
Publishers' transactions
Each of these fragments is sent to the Reconciliation Smart Contract as an Ethereum blockchain transaction:
- The contract receives N messages associated with w
- For each PCj, the contract has access to the set of topics it subscribes to, st(PCj)
Credit: PCj → PPi
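The reconciliation step can be modelled off-chain to make the flow of credit concrete. This is a Python sketch, not Solidity and not the contract used in the project. It assumes broker fan-out: every subscriber to $T_k$ receives all $n$ messages on it. The class `ReconciliationContract` and its methods are hypothetical.

```python
from collections import defaultdict

class ReconciliationContract:
    """Off-chain Python model of a reconciliation contract (illustrative, not Solidity).

    Assumes broker fan-out: every consumer subscribed to T_k receives all n
    messages published on T_k in the window, so each subscriber owes n * val(T_k).
    """

    def __init__(self, subscriptions, topic_value):
        self.st = subscriptions          # st(PC_j): consumer -> set of subscribed topics
        self.val = topic_value           # val(T_k): topic -> coins per message
        self.balance = defaultdict(int)  # net coins per participant (negative = owes)
        self.seen = set()                # (producer, window) fragments already processed

    def on_publisher_fragment(self, producer, window, counts):
        """Process PP_i's fragment for window w; counts maps topic T_k -> n messages."""
        if (producer, window) in self.seen:
            raise ValueError(f"duplicate fragment from {producer} for window {window}")
        self.seen.add((producer, window))
        for topic, n in counts.items():
            for consumer, topics in self.st.items():
                if topic in topics:
                    amount = n * self.val[topic]
                    self.balance[consumer] -= amount  # credit flows from PC_j ...
                    self.balance[producer] += amount  # ... to PP_i

contract = ReconciliationContract(
    subscriptions={"PC1": {"T1", "T2"}, "PC2": {"T1"}},
    topic_value={"T1": 2, "T2": 5},
)
contract.on_publisher_fragment("PP1", window=0, counts={"T1": 10, "T2": 1})
print(dict(contract.balance))  # {'PC1': -25, 'PP1': 45, 'PC2': -20}
```

Balances always sum to zero, since every transfer debits a consumer and credits a producer. The `seen` set stops a publisher from submitting the same window twice.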
Report propagation
Settlements are straightforward when reports are partial but accurate.
The matrices SM and RM are two views of the same data exchanges:
C1. For each PPi and topic Tk:
$SM_w(i,k) = RM_w(j,i,k)$ for each $j \in 1..M$ such that $T_k \in st(PC_j)$
C2. For $i = 1..N$:
$RM_w(j,i,k) = SM_w(i,k)$
Q. Which subsets of reports are sufficient to complete the matrices?
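Under C1, one accurate report on a pair $(i,k)$ fixes $SM_w(i,k)$. The same report also fixes $RM_w(j,i,k)$ for every subscriber $j$ of $T_k$. The sketch below uses that fact to propagate partial reports, and it lists the pairs nobody reported on. It is one illustrative reading of the question, not its answer. The function `complete` and the dict-based matrix encoding are assumptions.

```python
def complete(sm, rm, subscriptions, producers, topics):
    """Complete SM_w and RM_w from partial, accurate reports using C1/C2 (illustrative).

    sm: {(i, k): count} from producers that reported.
    rm: {(j, i, k): count} from consumers that reported.
    subscriptions: {j: st(PC_j)}.
    Returns completed copies of sm and rm and the (i, k) pairs left undetermined.
    """
    sm, rm = dict(sm), dict(rm)
    unknown = []
    for i in producers:
        for k in topics:
            subscribers = [j for j, st in subscriptions.items() if k in st]
            # Under C1, one report on (i, k) from PP_i or any subscriber fixes the value.
            known = [sm[(i, k)]] if (i, k) in sm else []
            known += [rm[(j, i, k)] for j in subscribers if (j, i, k) in rm]
            if not known:
                unknown.append((i, k))
                continue
            if len(set(known)) > 1:
                raise ValueError(f"reports on {(i, k)} do not add up: {known}")
            sm[(i, k)] = known[0]
            for j in subscribers:
                rm[(j, i, k)] = known[0]
    return sm, rm, unknown

subs = {"PC1": {"T1"}, "PC2": {"T1", "T2"}}
sm, rm, unknown = complete(
    sm={("PP1", "T1"): 4},
    rm={("PC2", "PP2", "T2"): 7},
    subscriptions=subs,
    producers=["PP1", "PP2"],
    topics=["T1", "T2"],
)
print(unknown)  # [('PP1', 'T2'), ('PP2', 'T1')]
```

In this model, a set of reports is sufficient exactly when every $(i,k)$ pair is covered by at least one of them. Conflicting values raise an error, which corresponds to the "do not add up" trouble scenario.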
Fraud detection
Incentives to behave unfairly:
• Publishers: over-report
• Subscribers: under-report
Questions:
• Can fraudulent reporting always be detected?
• Can responsibility for fraudulent reporting be ascribed to one or more specific participants?
1. Detection:
$SM_w(i,k) > RM_w(j,i,k)$ for some $j$   (1)
2. Ascribing responsibility:
Case 1: PC fraud
Case 2: PP fraud
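Condition (1) can be checked mechanically once both sides have reported. This minimal sketch flags every $(i, k, j)$ for which the publisher's count exceeds a subscriber's count. It uses the same hypothetical dict encoding of $SM_w$ and $RM_w$ as a plain mapping from index tuples to counts. The function name `detect` is also hypothetical.

```python
def detect(sm, rm, subscriptions):
    """Return the (i, k, j) triples for which condition (1), SM_w(i,k) > RM_w(j,i,k), holds.

    Only pairs where both PP_i and a subscriber PC_j of T_k have reported are checked.
    """
    flagged = []
    for (i, k), sent in sm.items():
        for j, topics in subscriptions.items():
            if k in topics and (j, i, k) in rm and sent > rm[(j, i, k)]:
                flagged.append((i, k, j))
    return flagged

sm = {("PP1", "T1"): 10}
rm = {("PC1", "PP1", "T1"): 10, ("PC2", "PP1", "T1"): 7}
print(detect(sm, rm, {"PC1": {"T1"}, "PC2": {"T1"}}))  # [('PP1', 'T1', 'PC2')]
```

Detection alone does not say who cheated. In the example, either PC2 under-reported or PP1 over-reported.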
Fraud detection – initial thoughts on responsibilities
Case 1: PC fraud
It follows from C1 and C2 (above) that:
if $T_k \in st(PC_j) \cap st(PC_{j'})$ then $RM_w(j,i,k) = RM_w(j',i,k)$ for $i = 1..N$.
Let $j'$ be such that $T_k \in st(PC_j) \cap st(PC_{j'})$, and suppose $SM_w(i,k) = RM_w(j',i,k)$.
This suggests that (1) may be due to PCj under-reporting on Tk, with PPi reporting correctly.
- The more topics the PCs share, the stronger the evidence...
Case 2: PP fraud
Suppose (1) holds, i.e., $SM_w(i,k) > RM_w(j,i,k)$, and $RM_w(j,i,k) = RM_w(j',i,k)$ for all $T_k$.
This suggests that PPi has over-reported.
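The two cases can be turned into a simple per-topic heuristic. If some subscriber agrees with $PP_i$, the subscribers reporting less are suspected (Case 1). If all subscribers agree with each other but report less than $PP_i$, the producer is suspected (Case 2). This is a simplification for illustration. The slide states Case 2 across all topics, while this sketch looks at a single $(i,k)$. The function `ascribe` and its return labels are hypothetical.

```python
def ascribe(i, k, sm, rm, subscriptions):
    """Heuristic blame for a discrepancy on (PP_i, T_k); returns (case, suspects).

    Per-topic simplification of the two cases: compares the subscribers' reports
    on (i, k) with SM_w(i, k) and with each other.
    """
    sent = sm[(i, k)]
    reports = {j: rm[(j, i, k)] for j, topics in subscriptions.items()
               if k in topics and (j, i, k) in rm}
    low = [j for j, n in reports.items() if n < sent]
    if not low:
        return ("no discrepancy", [])
    if any(n == sent for n in reports.values()):
        return ("PC fraud", low)          # Case 1: some PC_j' agrees with PP_i
    if len(set(reports.values())) == 1:
        return ("PP fraud", [i])          # Case 2: all PCs agree, PP_i claims more
    return ("undetermined", low + [i])    # PCs disagree and none matches PP_i

sm = {("PP1", "T1"): 10}
subs = {"PC1": {"T1"}, "PC2": {"T1"}}
print(ascribe("PP1", "T1", sm, {("PC1", "PP1", "T1"): 10, ("PC2", "PP1", "T1"): 7}, subs))
# ('PC fraud', ['PC2'])
print(ascribe("PP1", "T1", sm, {("PC1", "PP1", "T1"): 7, ("PC2", "PP1", "T1"): 7}, subs))
# ('PP fraud', ['PP1'])
```

With a single reporting subscriber, Case 2 fires on very weak evidence. This matches the observation that the evidence strengthens as PCs share more topics. A fuller version would accumulate agreement across topics and windows before assigning blame.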
The last slide
Some novel uses for blockchain (amongst many others)
https://solarcoin.org
http://www.electricchain.org
http://www.mediachain.io/
https://www.coalaip.org
…
Personal data in the IoT space:
Mashhadi, Afra, Fahim Kawsar, and Utku Gunay Acer. “Human Data Interaction in IoT: The Ownership Aspect.” In 2014 IEEE World Forum on Internet of Things (WF-IoT), 159–162, 2014.
Vescovi, Michele, Corrado Moiso, Fabrizio Antonelli, Mattia Pasolli, and Christos Perentis. “Building an Eco-System of Trusted Services through User Transparency, Control and Awareness on Personal Data Privacy.” In Proc. W3C Workshop on Privacy and User-Centric Controls, Berlin, Germany, 2014.
Multi-hop contracts and transitive credit management:
Missier, Paolo. “Data Trajectories: Tracking Reuse of Published Data for Transitive Credit Attribution.” International Journal of Digital Curation 11, no. 1 (2016): 1–16. doi:10.2218/ijdc.v11i1.425.
Editor's Notes
(*) EPSRC funding, £25k staff + travel
What happened previously: ideas that were still in a plasma state at last year's workshop have matured into a technical collaboration with the DE Catapult Centre in London.
My motivation there is to explore opportunities for early-impact systems research in the IoT space.
This has generated a short-term visiting research position at the Catapult.
How much of the data used in a certain computation is my data?
What has its contribution been to the analytics?
Note: message semantics are currently described only by broker topics
(*) Work being developed in collaboration with the DE Catapult Researcher in Residence programme
Therefore, the authority must be able to verify that the data observed on the channels originates from its claimed owners.
Therefore, the authority must be able to recognize when data is not genuine. However, this cannot be determined directly: the data can only be verified as it appears on the channel, so we cannot know how it was actually produced. One idea is to leverage traces and past marketplace transactions for trust assessment of each party, for instance based on its past record of certified data handling.
There is an incentive to replace the content of otherwise genuine messages with a payload that is expected to have a higher value to consumers than the original one.
Ethereum (*) extends the ideas behind the Bitcoin protocol in several ways
This scenario effectively amounts to collecting the senders matrix SMw piecemeal, one row from each PPi, and then reconstructing the count cubes for w from it.