Verifiable Responses to
Accumulo Queries
Cassandra Sparks
Robert K. Cunningham, Ariel Hamlin, Emily Shen,
Mayank Varia, David A. Wilson, Arkady Yerukhimovich
April 29, 2015
This work is sponsored by the Department of Defense under Air Force Contract FA8721-05-C-002. Opinions, interpretations, recommendations
and conclusions are those of the authors and are not necessarily endorsed by the United States Government.
Verifiable Queries - 2
CS 04/29/15
Introduction to MIT Lincoln Laboratory
Established 1951
Lincoln Laboratory is a Department of Defense FFRDC operated by MIT
FFRDC: Federally Funded Research and Development Center
Technology in Support of National Security
(Figure: MIT Lincoln Laboratory purpose, core work areas, and current mission areas)
• Core Work Areas: Sensors; Information Extraction; Communications; Integrated Sensing and Decision Support (Secure – Countermeasure Resistant)
• Current Mission Areas
– Space Control
– Intelligence, Surveillance, and Reconnaissance Systems and Technology
– Tactical Systems
– Air and Missile Defense Technology
– Homeland Protection
– Air Traffic Control
– Communication Systems
– Advanced Technology
– Cyber Security and Information Sciences
– Engineering
Common Big Data Architecture
(Figure: layered architecture diagram)
• Users: Commanders, Operators, Analysts
• Data: Maritime, Ground, Space, C2, Cyber, OSINT, Air, HUMINT, Weather
• Analytics: A, B, C, D, E
• Computing: Web, Files, Scheduler, Ingest & Enrichment
This talk: cryptographically
securing Accumulo
Threats to Accumulo
• Outsourced "cloud" server
– Learn content of data/queries
– Misattribute data to inserting clients
• Malicious insider (likely a sysadmin)
– Learn/change data or queries
– Misinform honest users
• Malicious clients
– Make unauthorized queries
– Learn stored data
– Learn other clients’ queries
• External attacker
– Insert malware, hack, etc.
– We won’t detect these, but our crypto provides resiliency
Our focus: security against the server
Secure Accumulo Overview
(Figure: Inserting Clients and Querying Clients reach Accumulo over the Network; Accumulo runs on Zookeeper and the Hadoop Distributed Filesystem, with a System administrator.)
• Infrastructure-level protections: Data at rest encryption, TLS encryption
• Accumulo provides no safeguards!
• Cryptographic protections: End-to-end signatures, Attribute-based access control, Cell-level encryption, Verifiable query results
We improve the security of Accumulo with cryptography
Outline
• Introduction
• End-to-End Signatures
– Digital Signatures
– Design Overview
– Implementation Details
• Verifiable Query Results
• Conclusion
Inserts in Accumulo
(Figure: an Inserting Client and a Querying Client ("?") connected to Accumulo, which consists of Tablet Servers, each hosting a Tablet.)
Example cell:
Row        Column Family  Column Qualifier  Visibility  Timestamp  Value
Patient A  Hospital 1     Diagnoses         Doctor      12349857   …
Digital Signatures
• A signature algorithm has three phases:
– Key Generation
– Signing: produces a signature for a message
– Verification: accepts the signed message and rejects a wrong message
A signature scheme is secure if an adversary cannot forge a signature for a new message without having the signing key
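The three phases can be sketched with textbook RSA using only the Python standard library. This is a toy under stated assumptions, not the code described in this talk: the primes, the helper names (`keygen`, `sign`, `verify`), and the unpadded hash-then-exponentiate construction are illustrative only and far too weak for real use.

```python
import hashlib

# Toy parameters (assumption): two well-known Mersenne primes.
# Real RSA keys use random primes of 1024 bits or more, plus padding.
P = 2**127 - 1
Q = 2**89 - 1
E = 65537  # common public exponent


def keygen():
    """Phase 1, key generation: return (public_key, private_key)."""
    n = P * Q
    phi = (P - 1) * (Q - 1)
    d = pow(E, -1, phi)  # modular inverse (Python 3.8+)
    return (n, E), (n, d)


def _digest(message: bytes, n: int) -> int:
    """Hash the message and map it into the range [0, n)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(private_key, message: bytes) -> int:
    """Phase 2, signing: only the holder of d can compute this value."""
    n, d = private_key
    return pow(_digest(message, n), d, n)


def verify(public_key, message: bytes, signature: int) -> bool:
    """Phase 3, verification: anyone with the public key can check."""
    n, e = public_key
    return pow(signature, e, n) == _digest(message, n)


public_key, private_key = keygen()
sig = sign(private_key, b"Patient A: flu shot")
ok_original = verify(public_key, b"Patient A: flu shot", sig)
ok_wrong = verify(public_key, b"Patient A: broken knee", sig)
```

Here `ok_original` is true and `ok_wrong` is false: the signature only verifies against the message it was created for.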
Digital Signatures in Accumulo
(Figure: the Inserting Client signs ("Sign") a cell before writing it to Accumulo's Tablet Servers; the Querying Client verifies ("Verif") the cell it reads.)
Example cell:
Row        Column Family  Column Qualifier  Visibility Field  Timestamp  Value
Patient A  Hospital 1     Diagnoses         Doctor            12349857   …
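Before a cell can be signed, its fields have to be turned into one unambiguous byte string. The sketch below is one possible encoding, assuming the signature covers all six fields; the `Cell` type, the function names, and the length-prefixed layout are assumptions for illustration, not the wrapper's actual format.

```python
import struct
from typing import NamedTuple


class Cell(NamedTuple):
    """Hypothetical in-memory view of one Accumulo cell."""
    row: bytes
    column_family: bytes
    column_qualifier: bytes
    visibility: bytes
    timestamp: int
    value: bytes


def naive_concat(cell: Cell) -> bytes:
    """Plain concatenation: field boundaries are ambiguous."""
    return (cell.row + cell.column_family + cell.column_qualifier
            + cell.visibility + str(cell.timestamp).encode() + cell.value)


def encode_for_signing(cell: Cell) -> bytes:
    """Length-prefix every byte field so bytes cannot shift between fields."""
    parts = []
    for field in (cell.row, cell.column_family, cell.column_qualifier,
                  cell.visibility, cell.value):
        parts.append(struct.pack(">I", len(field)) + field)
    parts.append(struct.pack(">q", cell.timestamp))  # 8-byte signed integer
    return b"".join(parts)


# Two different cells whose naive concatenations collide.
# b"<value>" is a placeholder; the slide elides the actual value.
cell_a = Cell(b"Patient A", b"Hospital 1", b"Diagnoses", b"Doctor",
              12349857, b"<value>")
cell_b = Cell(b"Patient AH", b"ospital 1", b"Diagnoses", b"Doctor",
              12349857, b"<value>")
```

With plain concatenation a signature on `cell_a` would also verify for `cell_b`; the length-prefixed encoding keeps the two distinct.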
Signature Code
• Implemented in Python as a client-side wrapper
– Uses the pyaccumulo library
– No server-side modifications needed
• Currently in the process of being open-sourced
– Contact pace-contact@ll.mit.edu for updates
• Several interesting design choices:
– Where to store the signature metadata?
– There are many signature algorithms—which one to use?
Storing Signature Metadata
• How do we store the signature of each cell?
• Option 1: Separate table
– Pro: original table is unmodified
– Con: twice as many reads & writes
• Option 2: Value field
– Pro: value field is good at storing unstructured data
– Con: interferes with iterators
• Option 3: Visibility field
– Pro: all Accumulo functionality still works
– Con: interferes with visibility label evaluation optimizations
Original table (Patient Records):
Row        Value        Visibility
Patient 1  Flu shot     Doctor
Patient 2  Broken knee  Admin
Patient 3  Chicken pox  Admin
Option 1, additional table (Patient Records Signatures):
Row        Value          Visibility
Patient 1  <signature 1>  Doctor
Patient 2  <signature 2>  Admin
Patient 3  <signature 3>  Admin
Option 2 (Patient Records):
Row        Value                      Visibility
Patient 1  <signature 1>|Flu shot     Doctor
Patient 2  <signature 2>|Broken knee  Admin
Patient 3  <signature 3>|Chicken pox  Admin
Option 3 (Patient Records):
Row        Value        Visibility
Patient 1  Flu shot     Doctor|“<signature 1>”
Patient 2  Broken knee  Admin|“<signature 2>”
Patient 3  Chicken pox  Admin|“<signature 3>”
We support all three options
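Options 2 and 3 both come down to packing the signature next to an existing field and splitting it off again on read. A minimal sketch follows, assuming the signature is base64-encoded so that it can never contain the `|` separator or a quote character; the function names and the exact layout are hypothetical, not the wrapper's actual format.

```python
import base64


def pack_value(signature: bytes, value: bytes) -> bytes:
    """Option 2: store '<signature>|<value>' in the value field."""
    return base64.b64encode(signature) + b"|" + value


def unpack_value(stored: bytes):
    """Split at the first '|'; base64 text never contains that byte."""
    encoded, _, value = stored.partition(b"|")
    return base64.b64decode(encoded), value


def pack_visibility(signature: bytes, visibility: str) -> str:
    """Option 3: append the signature as a quoted term, as on the slide."""
    encoded = base64.b64encode(signature).decode("ascii")
    return f'{visibility}|"{encoded}"'


def unpack_visibility(stored: str):
    """Split at the last '|"' and drop the closing quote."""
    visibility, _, quoted = stored.rpartition('|"')
    return base64.b64decode(quoted[:-1]), visibility


signature = bytes(range(64))  # stand-in for a real signature
stored_value = pack_value(signature, b"Flu shot")
stored_visibility = pack_visibility(signature, "Doctor")
```

Option 1 needs no packing at all: the signature is written under the same row key in a second table, at the cost of a second read and write per cell.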
Signature Algorithm Options
• Option 1: RSA Signatures
– Fast signature verification
– Large signature & key size
• Option 2: Elliptic Curve Signatures (ECDSA)
– Fast signature creation
– Relatively small signature & key sizes
• Option 3: Message Authentication Codes
– Symmetric key: uses the same key for signing & verification
– Much faster than RSA and ECDSA
– Con: one malicious client has more power to interfere with integrity
We support RSA and ECDSA signatures, and are investigating how to safely use MACs
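The drawback of Option 3 can be seen in a few lines with Python's standard `hmac` module. This is a sketch, assuming a single key shared by every client; the key and the record strings are made up for illustration.

```python
import hashlib
import hmac


def mac_tag(key: bytes, message: bytes) -> bytes:
    """Create an HMAC-SHA256 tag; the same key is used to verify."""
    return hmac.new(key, message, hashlib.sha256).digest()


def mac_verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(mac_tag(key, message), tag)


shared_key = b"demo key held by every client"  # hypothetical shared key

# An honest inserting client tags a record.
honest_tag = mac_tag(shared_key, b"Patient 1|Flu shot")

# Any other key holder can tag a different record just as validly,
# because verifying and signing need the same secret.
forged_tag = mac_tag(shared_key, b"Patient 1|Broken knee")

honest_ok = mac_verify(shared_key, b"Patient 1|Flu shot", honest_tag)
forged_ok = mac_verify(shared_key, b"Patient 1|Broken knee", forged_tag)
```

Both checks pass, so a querying client cannot tell which key holder produced a tag. With RSA or ECDSA, verifiers hold only the public key and cannot create signatures.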
Performance
(Figure: signature performance chart; curve secp256r1)
Benchmarked on a virtualized single-node Accumulo 1.7.0 instance
Security Summary: Signatures
• Signatures allow clients to verify data integrity
– Malicious server cannot modify or fabricate results
• Signatures cannot verify data completeness
– Server could omit both data & signature to avoid detection
Signatures can detect: modification and insertion, but not omission
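The gap is easy to reproduce. In the sketch below HMAC stands in for a real signature scheme purely to keep the example within the standard library; the key, the helper names, and the extra "Patient 4" record are assumptions for illustration.

```python
import hashlib
import hmac

KEY = b"stand-in signing key"  # hypothetical; HMAC replaces a real signature


def sign(record: bytes) -> bytes:
    return hmac.new(KEY, record, hashlib.sha256).digest()


def all_valid(results) -> bool:
    """Client check: every returned (record, signature) pair must verify."""
    return all(hmac.compare_digest(sign(rec), sig) for rec, sig in results)


records = (b"Patient 1|Flu shot",
           b"Patient 2|Broken knee",
           b"Patient 3|Chicken pox")
table = [(rec, sign(rec)) for rec in records]

# Three ways a malicious server might tamper with a full-table query.
modified = [(b"Patient 1|No shot", table[0][1])] + table[1:]
inserted = table + [(b"Patient 4|Measles", b"\x00" * 32)]
omitted = [table[0], table[2]]  # drops the record and its signature

detects_modification = not all_valid(modified)
detects_insertion = not all_valid(inserted)
detects_omission = not all_valid(omitted)
```

The first two flags are true and the third is false: every cell that does come back verifies, so per-cell signatures alone say nothing about what is missing.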
Outline
• Introduction
• End-to-End Signatures
• Verifiable Query Results
– Merkle Hash Trees
– Authenticated Skip Lists
• Conclusion
Authenticated Data Structures
• Data structures that allow provably correct queries
– Correctness defined relative to a trusted, well-known source
– Need to support range queries
• The digest is a small value (constant size) that represents the entire dataset
(Figure: the Inserting Client, the Accumulo Server holding the ADS, and the Querying Client ("?"), exchanging the digest and VOs.)
ADS: Authenticated Data Structure
VO: Verification Object
Merkle Hash Trees
Data:    2 4 6 8 10 12 14 16
Leaves:  h(2) h(4) h(6) h(8) h(10) h(12) h(14) h(16)
Level 1: a = h(h(2), h(4))   b = h(h(6), h(8))   c = h(h(10), h(12))   d = h(h(14), h(16))
Level 2: e = h(a, b)   f = h(c, d)
Root:    root = h(e, f)
Digest is the root node’s hash value
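The tree above can be computed bottom-up in a few lines. This is a minimal sketch, assuming SHA-256 over concatenated child hashes as the function h and a power-of-two number of leaves; the encoding of leaf values as decimal strings is an arbitrary choice for the example.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    """Stand-in for the slide's h: SHA-256 over the concatenated parts."""
    return hashlib.sha256(b"".join(parts)).digest()


def leaf(value: int) -> bytes:
    return h(str(value).encode())


def merkle_root(values) -> bytes:
    """Hash pairs level by level until one node remains.

    Assumes len(values) is a power of two, as in the slide's example.
    """
    level = [leaf(v) for v in values]
    while len(level) > 1:
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


data = [2, 4, 6, 8, 10, 12, 14, 16]
digest = merkle_root(data)

# The same root, spelled out with the slide's node names.
a = h(leaf(2), leaf(4))
b = h(leaf(6), leaf(8))
c = h(leaf(10), leaf(12))
d = h(leaf(14), leaf(16))
e = h(a, b)
f = h(c, d)
root = h(e, f)
```

The digest stays 32 bytes regardless of how many values the tree covers, and changing any single value changes it.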
Merkle Hash Trees
Query: range(5, 9)
Data:    2 4 6 8 10 12 14 16
Leaves:  h(2) h(4) h(6) h(8) h(10) h(12) h(14) h(16)
Level 1: a = h(h(2), h(4))   b = h(h(6), h(8))   c = h(h(10), h(12))   d = h(h(14), h(16))
Level 2: e = h(a, b)   f = h(c, d)
Root:    root = h(e, f)
Legend: part of the range returned; part of the verification object; computed based on returned information
Naïve solution allows a malicious server to omit elements at the ends of ranges
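The weakness can be reproduced with a small prover and verifier. This is a sketch under assumptions, not the design presented here: the server returns a contiguous run of leaves together with the hashes of every subtree outside that run, and the naïve client only checks that the returned values lie in the query range and hash up to the digest. All function names are hypothetical.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()


def leaf(value: int) -> bytes:
    return h(str(value).encode())


def subtree_root(hashes, lo, hi):
    """Hash of the node covering leaf positions [lo, hi)."""
    if hi - lo == 1:
        return hashes[lo]
    mid = (lo + hi) // 2
    return h(subtree_root(hashes, lo, mid), subtree_root(hashes, mid, hi))


def prove(values, first, last):
    """Server: VO holding the hash of every subtree outside [first, last)."""
    hashes = [leaf(v) for v in values]
    vo = []

    def walk(lo, hi):
        if hi <= first or lo >= last:
            vo.append(subtree_root(hashes, lo, hi))
        elif hi - lo > 1:
            mid = (lo + hi) // 2
            walk(lo, mid)
            walk(mid, hi)

    walk(0, len(values))
    return vo


def recompute_root(returned, first, size, vo):
    """Client: rebuild the root from returned values plus VO hashes."""
    last = first + len(returned)
    hashes = iter(vo)

    def build(lo, hi):
        if hi <= first or lo >= last:
            return next(hashes)
        if hi - lo == 1:
            return leaf(returned[lo - first])
        mid = (lo + hi) // 2
        return h(build(lo, mid), build(mid, hi))

    return build(0, size)


def naive_verify(digest, query, returned, first, size, vo):
    lo, hi = query
    in_range = all(lo <= v <= hi for v in returned)
    return in_range and recompute_root(returned, first, size, vo) == digest


DATA = [2, 4, 6, 8, 10, 12, 14, 16]
DIGEST = subtree_root([leaf(v) for v in DATA], 0, len(DATA))

# Honest answer to range(5, 9): values 6 and 8, with VO = [a, f].
honest = naive_verify(DIGEST, (5, 9), [6, 8], 2, 8, prove(DATA, 2, 4))
# Malicious answer: only 6, with h(8) moved into the VO.
truncated = naive_verify(DIGEST, (5, 9), [6], 2, 8, prove(DATA, 2, 3))
```

Both `honest` and `truncated` come out true: the truncated answer is a perfectly valid membership proof for 6, so the naïve client has no way to notice that 8 was dropped.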
Naïve Merkle Tree Security
             Omitting internal query results   Omitting boundary query results
Signatures:  not detected                      not detected
Naïve MHTs:  detected                          not detected
Solution: return boundaries of the range
Merkle Hash Trees, Revisited
Query: range(5, 9)
Data:    2 4 6 8 10 12 14 16
Leaves:  h(2) h(4) h(6) h(8) h(10) h(12) h(14) h(16)
Level 1: a = h(h(2), h(4))   b = h(h(6), h(8))   c = h(h(10), h(12))   d = h(h(14), h(16))
Level 2: e = h(a, b)   f = h(c, d)
Root:    root = h(e, f)
Legend: part of the range returned; part of the verification object; computed based on returned information
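Returning the boundaries means the server also sends the nearest element on each side of the range (4 and 10 for range(5, 9)), and the client checks that they really fall outside it. The sketch below is one way to do that, assuming the client knows the digest and the number of leaves and that the tree was built over sorted data; the function names are hypothetical.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()


def leaf(value: int) -> bytes:
    return h(str(value).encode())


def subtree_root(hashes, lo, hi):
    if hi - lo == 1:
        return hashes[lo]
    mid = (lo + hi) // 2
    return h(subtree_root(hashes, lo, mid), subtree_root(hashes, mid, hi))


def prove(values, first, last):
    """Server: hashes of all subtrees outside leaf positions [first, last)."""
    hashes = [leaf(v) for v in values]
    vo = []

    def walk(lo, hi):
        if hi <= first or lo >= last:
            vo.append(subtree_root(hashes, lo, hi))
        elif hi - lo > 1:
            mid = (lo + hi) // 2
            walk(lo, mid)
            walk(mid, hi)

    walk(0, len(values))
    return vo


def recompute_root(returned, first, size, vo):
    last = first + len(returned)
    hashes = iter(vo)

    def build(lo, hi):
        if hi <= first or lo >= last:
            return next(hashes)
        if hi - lo == 1:
            return leaf(returned[lo - first])
        mid = (lo + hi) // 2
        return h(build(lo, mid), build(mid, hi))

    return build(0, size)


def verify_range(digest, query, returned, first, size, vo):
    """Return the in-range values, or None if the answer is rejected."""
    lo, hi = query
    if not returned or recompute_root(returned, first, size, vo) != digest:
        return None
    # A boundary is required on each side unless the run touches that end.
    left_ok = first == 0 or returned[0] < lo
    right_ok = first + len(returned) == size or returned[-1] > hi
    if not (left_ok and right_ok):
        return None
    return [v for v in returned if lo <= v <= hi]


DATA = [2, 4, 6, 8, 10, 12, 14, 16]
DIGEST = subtree_root([leaf(v) for v in DATA], 0, len(DATA))

# Honest: positions 1..4 hold 4, 6, 8, 10; the VO is [h(2), h(12), d].
honest = verify_range(DIGEST, (5, 9), [4, 6, 8, 10], 1, 8, prove(DATA, 1, 5))
# Malicious: drops 8 and the right boundary.
truncated = verify_range(DIGEST, (5, 9), [4, 6], 1, 8, prove(DATA, 1, 3))
```

`honest` is `[6, 8]` and `truncated` is `None`. Because the returned leaves are contiguous in the tree and extend past both ends of the range, nothing in between can be missing.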
Security Summary: ADSs
             Omitting internal query results   Omitting boundary query results
Signatures:  not detected                      not detected
Naïve MHTs:  detected                          not detected
MHTs:        detected                          detected
Merkle Hash Tree Disadvantages
• Mostly used for static data
• How to insert elements into MHTs?
– Approach 1: Unbalanced Insert. Linear time operations!
– Approach 2: Balanced Insert. Linear time insert!
Authenticated Skip Lists
         MHT         Skip List
Lookup   O(log(n))   O(log(n)) (expected)
Insert   O(n)        O(log(n)) (expected)
Verify   O(log(n))   O(log(n)) (expected)
Randomized skip lists have empirically better performance than other tree-like data structures
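For readers unfamiliar with the underlying structure, here is a plain randomized skip list with expected logarithmic insert and lookup. It is only the unauthenticated base: an authenticated skip list additionally maintains hashes over the nodes, which this sketch leaves out entirely. The class and method names are hypothetical.

```python
import random


class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height  # one forward pointer per level


class SkipList:
    MAX_HEIGHT = 16

    def __init__(self, seed=None):
        self.head = Node(None, self.MAX_HEIGHT)  # sentinel before all keys
        self.rng = random.Random(seed)

    def _random_height(self):
        """Flip coins: each extra level is kept with probability 1/2."""
        height = 1
        while height < self.MAX_HEIGHT and self.rng.random() < 0.5:
            height += 1
        return height

    def _predecessors(self, key):
        """For each level, the last node whose key is smaller than key."""
        preds = [self.head] * self.MAX_HEIGHT
        node = self.head
        for level in reversed(range(self.MAX_HEIGHT)):
            while node.next[level] is not None and node.next[level].key < key:
                node = node.next[level]
            preds[level] = node
        return preds

    def insert(self, key):
        preds = self._predecessors(key)
        nxt = preds[0].next[0]
        if nxt is not None and nxt.key == key:
            return  # already present
        node = Node(key, self._random_height())
        for level in range(len(node.next)):
            node.next[level] = preds[level].next[level]
            preds[level].next[level] = node

    def contains(self, key):
        nxt = self._predecessors(key)[0].next[0]
        return nxt is not None and nxt.key == key

    def range(self, lo, hi):
        """All keys k with lo <= k <= hi, in sorted order."""
        node = self._predecessors(lo)[0].next[0]
        found = []
        while node is not None and node.key <= hi:
            found.append(node.key)
            node = node.next[0]
        return found


skip_list = SkipList(seed=0)
for key in [10, 2, 16, 6, 14, 4, 12, 8]:
    skip_list.insert(key)
in_range = skip_list.range(5, 9)
```

An insert touches only the new node's predecessors at each level, which is why it avoids the linear-time rebuild that a static Merkle tree needs.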
Outline
• Introduction
• End-to-End Signatures
• Verifiable Query Results
• Conclusion
Additional Work
• Confidentiality to hide data from the server & unauthorized users
– Per-cell encryption allows flexible encryption for different use cases
– Cryptographically enforcing Accumulo’s visibility labels with key management
• Using HMACs for better performance without sacrificing security
• Key management and distribution for all cryptographic components
Conclusion
• Signatures for data tampering detection
– Currently implemented in Python
– Client-side library
– Contact pace-contact@ll.mit.edu to be notified when the code is open-sourced
• Authenticated Data Structures for full query correctness checks
– Working on embedding in Accumulo for greater efficiency
Questions?

Accumulo Summit 2015: Verifiable Responses to Accumulo Queries [Security]
