Talk Abstract
Accumulo requires its users to trust each Accumulo installation with their data—a malicious server or user could easily compromise critical data or learn secrets they are not authorized to access. One particular threat is a malicious Accumulo server tampering with query results by returning forged, modified, or incomplete results to a user. We have implemented a lightweight client-side cryptographic tool to protect Accumulo users from this kind of threat.
Our solution is able to handle a spectrum of different threats. At one end of the spectrum, we use end-to-end signatures to guarantee data integrity: Accumulo clients can sign the data they write to Accumulo and verify that the Accumulo instance did not modify it. At the other end of the spectrum, we store metadata about all the entries written to Accumulo, allowing querying clients to guarantee not just the integrity of the elements contained in the query, but that nothing was omitted from the query itself. As an intermediate solution, we propose an extension to the signature scheme that would speed up the signing and verification of entries with symmetric key cryptography, as well as allowing periodic auditing of the database.
This work is sponsored by the Department of Defense under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.
Speaker
Cassandra Sparks
Associate Technical Staff, Lincoln Laboratory, MIT
Cassandra Sparks is a researcher at MIT Lincoln Laboratory. She graduated from Indiana University in 2014 with an MS in computer science, focusing on programming languages and formal methods. Lately, she has been working on cryptographic enforcement of data integrity in Accumulo.
Accumulo Summit 2015: Verifiable Responses to Accumulo Queries [Security]
1. Verifiable Responses to Accumulo Queries
Cassandra Sparks
Robert K. Cunningham, Ariel Hamlin, Emily Shen,
Mayank Varia, David A. Wilson, Arkady Yerukhimovich
April 29, 2015
This work is sponsored by the Department of Defense under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, recommendations and conclusions are those of the authors and are not necessarily endorsed by the United States Government.
2. Verifiable Queries - 2
CS 04/29/15
Introduction to MIT Lincoln Laboratory
Established 1951
Lincoln Laboratory is a Department of Defense FFRDC operated by MIT
FFRDC: Federally Funded Research and Development Center
3. Verifiable Queries - 3
Technology in Support of National Security
• Core Work Areas
– Sensors
– Information Extraction
– Communications
– Integrated Sensing and Decision Support (Secure – Countermeasure Resistant)
• Current Mission Areas
– Space Control
– Intelligence, Surveillance, and Reconnaissance Systems and Technology
– Tactical Systems
– Air and Missile Defense Technology
– Homeland Protection
– Air Traffic Control
– Communication Systems
– Advanced Technology
– Cyber Security and Information Sciences
– Engineering
4. Verifiable Queries - 4
Common Big Data Architecture
[Diagram labels: Users (Commanders, Operators, Analysts); Data (Maritime, Ground, Space, C2, Cyber, OSINT, Air, HUMINT, Weather); Analytics; Computing; Web; Files; Scheduler; Ingest & Enrichment]
This talk: cryptographically securing Accumulo
5. Verifiable Queries - 5
Threats to Accumulo
• Outsourced "cloud" server
– Learn content of data/queries
– Misattribute data to inserting clients
• Malicious insider (likely a sysadmin)
– Learn/change data or queries
– Misinform honest users
• Malicious clients
– Make unauthorized queries
– Learn stored data
– Learn other clients’ queries
• External attacker
– Insert malware, hack, etc.
– We won’t detect these, but our crypto provides resiliency
Our focus: security against the server
6. Verifiable Queries - 6
Secure Accumulo Overview
[Diagram: inserting clients and querying clients reach Accumulo over the network; Accumulo runs alongside Zookeeper on the Hadoop Distributed Filesystem. Other labels: system administrator; data at rest encryption; TLS encryption; "Accumulo provides no safeguards!"]
• End-to-end signatures
• Attribute-based access control
• Cell-level encryption
• Verifiable query results
We improve the security of Accumulo with cryptography
8. Verifiable Queries - 8
Inserts in Accumulo
[Diagram: an inserting client and a querying client (marked "?") connected to Accumulo, whose tablets are hosted on three tablet servers]
Row        Column Family  Column Qualifier  Visibility  Timestamp  Value
Patient A  Hospital 1     Diagnoses         Doctor      12349857   …
9. Verifiable Queries - 9
Digital Signatures
• A signature algorithm has three phases:
– Key generation
– Signing a message
– Verification, which accepts the signed message and rejects a wrong message
A signature scheme is secure if an adversary cannot forge a signature for a new message without having the signing key
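The three phases can be sketched with a Lamport one-time signature, chosen here only because it needs nothing beyond the Python standard library. It is not one of the schemes the talk's tool supports (those are RSA and ECDSA), each key pair may sign only one message, and the function names `keygen`, `sign`, and `verify` are illustrative.

```python
import hashlib
import secrets

def keygen():
    # Private key: 256 pairs of random secrets. Public key: their hashes.
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(hashlib.sha256(a).digest(), hashlib.sha256(b).digest()) for a, b in sk]
    return sk, pk

def _bits(message: bytes):
    # The 256 bits of SHA-256(message), most significant bit first.
    digest = hashlib.sha256(message).digest()
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]

def sign(sk, message: bytes):
    # Reveal one secret from each pair, selected by the message digest bits.
    return [sk[i][bit] for i, bit in enumerate(_bits(message))]

def verify(pk, message: bytes, signature) -> bool:
    # Each revealed secret must hash to the matching public-key entry.
    if len(signature) != 256:
        return False
    return all(
        hashlib.sha256(s).digest() == pk[i][bit]
        for i, (s, bit) in enumerate(zip(signature, _bits(message)))
    )

sk, pk = keygen()
sig = sign(sk, b"Patient A: flu shot")
accepted = verify(pk, b"Patient A: flu shot", sig)   # signed message
rejected = verify(pk, b"Patient A: no record", sig)  # wrong message
```

Verification needs only the public key, which is what lets a querying client check data written by a different inserting client.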
10. Verifiable Queries - 10
Digital Signatures in Accumulo
[Diagram: the inserting client signs ("Sign") an entry before writing it to Accumulo's tablet servers; the querying client verifies ("Verify") the entry it reads back]
Row        Column Family  Column Qualifier  Visibility Field  Timestamp  Value
Patient A  Hospital 1     Diagnoses         Doctor            12349857   …
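Before an entry can be signed, its fields have to be turned into a single byte string in an unambiguous way. The sketch below shows one way to do that with length prefixes. Which fields the actual tool signs is not stated on the slide, so the field list and the name `encode_cell` are assumptions made for illustration.

```python
import struct

# Assumed field order; the slide does not say which fields are covered.
FIELDS = ("row", "column_family", "column_qualifier",
          "visibility", "timestamp", "value")

def encode_cell(cell: dict) -> bytes:
    """Length-prefix every field so field boundaries cannot be shifted."""
    out = bytearray()
    for name in FIELDS:
        field = str(cell[name]).encode("utf-8")
        out += struct.pack(">I", len(field))  # 4-byte big-endian length
        out += field
    return bytes(out)

cell = {
    "row": "Patient A",
    "column_family": "Hospital 1",
    "column_qualifier": "Diagnoses",
    "visibility": "Doctor",
    "timestamp": 12349857,
    "value": "...",
}
message_to_sign = encode_cell(cell)
```

Plain concatenation would encode row "ab" with family "c" the same as row "a" with family "bc", so a server could move bytes between fields without invalidating the signature. The length prefixes rule that out.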
11. Verifiable Queries - 11
Signature Code
• Implemented in Python as a client-side wrapper
– Uses the pyaccumulo library
– No server-side modifications needed
• Currently in the process of being open-sourced
– Contact pace-contact@ll.mit.edu for updates
• Several interesting design choices:
– Where to store the signature metadata?
– There are many signature algorithms—which one to use?
12. Verifiable Queries - 12
Storing Signature Metadata
• How do we store the signature of each cell?
• Option 1: Separate table
– Pro: original table is unmodified
– Con: twice as many reads & writes
Patient Records
Row        Value          Visibility
Patient 1  Flu shot       Doctor
Patient 2  Broken knee    Admin
Patient 3  Chicken pox    Admin
Patient Records Signatures
Row        Value          Visibility
Patient 1  <signature 1>  Doctor
Patient 2  <signature 2>  Admin
Patient 3  <signature 3>  Admin
• Option 2: Value field
– Pro: value field is good at storing unstructured data
– Con: interferes with iterators
Patient Records
Row        Value                      Visibility
Patient 1  <signature 1>|Flu shot     Doctor
Patient 2  <signature 2>|Broken knee  Admin
Patient 3  <signature 3>|Chicken pox  Admin
• Option 3: Visibility field
– Pro: all Accumulo functionality still works
– Con: interferes with visibility label evaluation optimizations
Patient Records
Row        Value        Visibility
Patient 1  Flu shot     Doctor|“<signature 1>”
Patient 2  Broken knee  Admin|“<signature 2>”
Patient 3  Chicken pox  Admin|“<signature 3>”
We support all three options
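Options 2 and 3 can be sketched as a pair of pack/unpack helpers. This is an illustration under assumptions, not the tool's code: the helper names are hypothetical, the signature is base64-encoded so that it cannot contain the `|` separator or a quote, and the original visibility expression is always parenthesized so that compound expressions such as `Admin&Audit` stay well-formed. The exact visibility syntax accepted should be checked against the Accumulo version in use.

```python
import base64

def pack_value(signature: bytes, value: bytes) -> bytes:
    # Option 2: "<signature>|<value>" stored in the value field.
    return base64.b64encode(signature) + b"|" + value

def unpack_value(stored: bytes):
    # Base64 never contains "|", so the first separator ends the signature.
    sig_b64, value = stored.split(b"|", 1)
    return base64.b64decode(sig_b64), value

def pack_visibility(signature: bytes, visibility: str) -> str:
    # Option 3: OR the original expression with a quoted signature term.
    sig_b64 = base64.b64encode(signature).decode("ascii")
    return '(%s)|"%s"' % (visibility, sig_b64)

def unpack_visibility(stored: str):
    # Split at the last '|"', then drop the closing quote and parentheses.
    expr, _, quoted = stored.rpartition('|"')
    return base64.b64decode(quoted[:-1]), expr[1:-1]

stored_value = pack_value(b"\x01\x02", b"Flu shot")
stored_vis = pack_visibility(b"\x01\x02", "Doctor")
```

Option 1 needs no packing: the signature is written to a second table under the same row, at the cost of a second read or write per cell.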
13. Verifiable Queries - 13
Signature Algorithm Options
We support RSA and ECDSA signatures, and are investigating how to safely use MACs
• Option 1: RSA Signatures
– Fast signature verification
– Large signature & key size
• Option 2: Elliptic Curve Signatures (ECDSA)
– Fast signature creation
– Relatively small signature & key sizes
• Option 3: Message Authentication Codes
– Symmetric key: uses the same key for signing & verification
– Much faster than RSA and ECDSA
– Con: one malicious client has more power to interfere with integrity
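The MAC option can be illustrated with HMAC-SHA256 from the Python standard library. This is a sketch of the primitive only, not of how the tool would integrate it, and the function names `mac_sign` and `mac_verify` are made up for the example.

```python
import hashlib
import hmac
import secrets

def mac_sign(key: bytes, message: bytes) -> bytes:
    # The tag is computed with the shared secret key.
    return hmac.new(key, message, hashlib.sha256).digest()

def mac_verify(key: bytes, message: bytes, tag: bytes) -> bool:
    # Recompute the tag and compare in constant time.
    return hmac.compare_digest(mac_sign(key, message), tag)

shared_key = secrets.token_bytes(32)  # held by every client
tag = mac_sign(shared_key, b"Patient 1|Flu shot")
ok = mac_verify(shared_key, b"Patient 1|Flu shot", tag)
tampered = mac_verify(shared_key, b"Patient 1|Broken knee", tag)
```

Because verification uses the same key as signing, any client able to verify an entry can also produce a valid tag for a different one, which is the drawback listed above.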
14. Verifiable Queries - 14
Performance
[Performance chart not preserved in this transcript; curve secp256r1]
Benchmarked on a virtualized single-node Accumulo 1.7.0 instance
15. Verifiable Queries - 15
Security Summary: Signatures
• Signatures allow clients to verify data integrity
– Malicious server cannot modify or fabricate results
• Signatures cannot verify data completeness
– Server could omit both data & signature to avoid detection
Signatures can detect: modification and insertion, but not omission
17. Verifiable Queries - 17
Authenticated Data Structures
• Data structures that allow provably correct queries
– Correctness defined relative to a trusted, well-known source
– Need to support range queries
• The digest is a small value (constant size) that represents the entire dataset
[Diagram: inserting client, Accumulo server holding the ADS, and querying client; the server's answer to a query ("?") carries a VO, and the digest is shared with the querying client]
ADS: Authenticated Data Structure
VO: Verification Object
18. Verifiable Queries - 18
Merkle Hash Trees
Leaves:       2     4     6     8     10     12     14     16
Leaf hashes:  h(2)  h(4)  h(6)  h(8)  h(10)  h(12)  h(14)  h(16)
Level 1:      a = h(h(2), h(4))   b = h(h(6), h(8))   c = h(h(10), h(12))   d = h(h(14), h(16))
Level 2:      e = h(a, b)   f = h(c, d)
Root:         root = h(e, f)
Digest is the root node’s hash value
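The tree on this slide can be computed bottom-up in a few lines. The sketch assumes SHA-256 for `h`, encodes each leaf value as its decimal string, and requires a power-of-two number of leaves. Like the slide, it applies no separate tagging of leaf and internal hashes.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(values) -> bytes:
    """Root hash of a complete binary Merkle tree over `values`."""
    level = [h(str(v).encode()) for v in values]  # h(2), h(4), ...
    while len(level) > 1:
        # Hash each adjacent pair to form the next level up.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

digest = merkle_root([2, 4, 6, 8, 10, 12, 14, 16])
```

The digest stays 32 bytes however many leaves there are, and changing any single leaf changes it.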
19. Verifiable Queries - 19
Merkle Hash Trees
Query: range(5, 9)
Leaves:       2     4     6     8     10     12     14     16
Leaf hashes:  h(2)  h(4)  h(6)  h(8)  h(10)  h(12)  h(14)  h(16)
Level 1:      a = h(h(2), h(4))   b = h(h(6), h(8))   c = h(h(10), h(12))   d = h(h(14), h(16))
Level 2:      e = h(a, b)   f = h(c, d)
Root:         root = h(e, f)
Legend (color coding not preserved in this transcript): part of the range returned; part of the verification object; computed based on returned information
Naïve solution allows a malicious server to omit elements at the ends of ranges
20. Verifiable Queries - 20
Naïve Merkle Tree Security
Detects omission of:  internal query results  boundary query results
Signatures:           no                      no
Naïve MHTs:           yes                     no
Solution: return boundaries of the range
21. Verifiable Queries - 21
Merkle Hash Trees, Revisited
Query: range(5, 9)
Leaves:       2     4     6     8     10     12     14     16
Leaf hashes:  h(2)  h(4)  h(6)  h(8)  h(10)  h(12)  h(14)  h(16)
Level 1:      a = h(h(2), h(4))   b = h(h(6), h(8))   c = h(h(10), h(12))   d = h(h(14), h(16))
Level 2:      e = h(a, b)   f = h(c, d)
Root:         root = h(e, f)
Legend (color coding not preserved in this transcript): part of the range returned; part of the verification object; computed based on returned information
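A sketch of the boundary idea, under stated assumptions: leaves are sorted, the leaf count `n` is a power of two and is trusted along with the digest, and the server reports the leaf position `lo` of the first returned element. For range(5, 9) the server would return 4, 6, 8, 10 (the matches plus one neighbor on each side), and the verification object works out to h(2), h(12), and d. All function names are illustrative.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf(value) -> bytes:
    return h(str(value).encode())

def subtree(hashes, start, end):
    """Hash of the subtree covering leaf positions [start, end)."""
    if end - start == 1:
        return hashes[start]
    mid = (start + end) // 2
    return h(subtree(hashes, start, mid) + subtree(hashes, mid, end))

def prove(values, lo, hi):
    """Server side: hashes of the maximal subtrees outside [lo, hi)."""
    hashes = [leaf(v) for v in values]
    vo = []
    def walk(start, end):
        if end <= lo or hi <= start:
            vo.append(subtree(hashes, start, end))
        elif end - start > 1:
            mid = (start + end) // 2
            walk(start, mid)
            walk(mid, end)
    walk(0, len(values))
    return vo

def recompute_root(n, lo, returned, vo):
    """Client side: rebuild the root from returned leaves and the VO."""
    hi = lo + len(returned)
    remaining = iter(vo)
    def walk(start, end):
        if end <= lo or hi <= start:
            return next(remaining)        # taken from the VO
        if end - start == 1:
            return leaf(returned[start - lo])
        mid = (start + end) // 2
        left = walk(start, mid)
        right = walk(mid, end)
        return h(left + right)
    return walk(0, n)

def verify_range(digest, n, lo, returned, vo, query_lo, query_hi):
    if not returned or lo < 0 or lo + len(returned) > n:
        return False
    try:
        if recompute_root(n, lo, returned, vo) != digest:
            return False
    except StopIteration:                 # VO too short
        return False
    # Boundary check: a neighbor outside the range on each side
    # (or the end of the dataset) shows that nothing was cut off.
    left_ok = lo == 0 or returned[0] < query_lo
    right_ok = lo + len(returned) == n or returned[-1] > query_hi
    return left_ok and right_ok
```

Without the boundary check, a server could return only 6 with a VO of h(8) and f for the rest: the root still recomputes correctly, so the naive verifier accepts even though 8 was omitted. With the check, that response is rejected because 6 is not below the lower query bound.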
23. Verifiable Queries - 23
Merkle Hash Tree Disadvantages
• Mostly used for static data
• How to insert elements into MHTs?
– Approach 1: Unbalanced Insert. Linear time operations!
– Approach 2: Balanced Insert. Linear time insert!
24. Verifiable Queries - 24
Authenticated Skip Lists
         MHT        Skip List
Lookup   O(log(n))  O(log(n)) (expected)
Insert   O(n)       O(log(n)) (expected)
Verify   O(log(n))  O(log(n)) (expected)
Randomized skip lists have empirically better performance than other tree-like data structures
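For readers unfamiliar with the underlying structure, here is a minimal sketch of a plain randomized skip list with expected O(log(n)) insert and lookup. It is not authenticated: an authenticated skip list additionally stores hashes along the search paths, which this example leaves out. The class and method names are illustrative.

```python
import random

class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # next node at each level

class SkipList:
    MAX_LEVEL = 16

    def __init__(self, seed=None):
        self.head = Node(None, self.MAX_LEVEL)
        self.level = 1
        self.rng = random.Random(seed)

    def _random_level(self):
        # Each extra level is kept with probability 1/2.
        level = 1
        while level < self.MAX_LEVEL and self.rng.random() < 0.5:
            level += 1
        return level

    def _predecessors(self, key):
        # Last node with a smaller key at every level, found top-down.
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in reversed(range(self.level)):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def insert(self, key):
        update = self._predecessors(key)
        nxt = update[0].forward[0]
        if nxt is not None and nxt.key == key:
            return  # already present
        level = self._random_level()
        self.level = max(self.level, level)
        new = Node(key, level)
        for i in range(level):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def contains(self, key):
        nxt = self._predecessors(key)[0].forward[0]
        return nxt is not None and nxt.key == key

    def range(self, lo, hi):
        # Walk the bottom level from the first key >= lo up to hi.
        node = self._predecessors(lo)[0].forward[0]
        out = []
        while node is not None and node.key <= hi:
            out.append(node.key)
            node = node.forward[0]
        return out

sl = SkipList(seed=1)
for k in [16, 2, 8, 12, 4, 10, 6, 14]:
    sl.insert(k)
in_range = sl.range(5, 9)
```

An insert touches only the predecessors on one search path, so no rebalancing or rebuilding of the whole structure is needed, unlike the linear-time Merkle tree insert.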
26. Verifiable Queries - 26
Additional Work
• Confidentiality to hide data from the server & unauthorized users
– Per-cell encryption allows flexible encryption for different use cases
– Cryptographically enforcing Accumulo’s visibility labels with key management
• Using HMACs for better performance without sacrificing security
• Key management and distribution for all cryptographic components
27. Verifiable Queries - 27
Conclusion
• Signatures for data tampering detection
– Currently implemented in Python
– Client-side library
– Contact pace-contact@ll.mit.edu to be notified when the code is open-sourced
• Authenticated Data Structures for full query correctness checks
– Working on embedding in Accumulo for greater efficiency
Questions?