Verifiable Responses to
Accumulo Queries
Cassandra Sparks
Robert K. Cunningham, Ariel Hamlin, Emily Shen,
Mayank Varia, David A. Wilson, Arkady Yerukhimovich
April 29, 2015
This work is sponsored by the Department of Defense under Air Force Contract FA8721-05-C-002. Opinions, interpretations, recommendations
and conclusions are those of the authors and are not necessarily endorsed by the United States Government.
Verifiable Queries - 2
CS 04/29/15
Introduction to MIT Lincoln Laboratory
Established 1951
Lincoln Laboratory is a Department of Defense FFRDC operated by MIT
FFRDC: Federally Funded Research and Development Center
Technology in Support of National Security
Purpose: Integrated Sensing and Decision Support (Secure – Countermeasure Resistant)
Core Work Areas: Sensors, Information Extraction, Communications
Current Mission Areas:
• Space Control
• Intelligence, Surveillance, and Reconnaissance Systems and Technology
• Tactical Systems
• Air and Missile Defense Technology
• Homeland Protection
• Air Traffic Control
• Communication Systems
• Advanced Technology
• Cyber Security and Information Sciences
• Engineering
MIT Lincoln Laboratory
Common Big Data Architecture
[Figure: layered architecture diagram with the following labels]
Users: Commanders, Operators, Analysts
Data: Maritime, Ground, Space, C2, Cyber, OSINT, Air, HUMINT, Weather
Analytics
Computing: Web, Files, Scheduler
Ingest & Enrichment
This talk: cryptographically
securing Accumulo
Threats to Accumulo
• Outsourced "cloud" server
– Learn content of data/queries
– Misattribute data to inserting clients
• Malicious insider (likely a sysadmin)
– Learn/change data or queries
– Misinform honest users
• Malicious clients
– Make unauthorized queries
– Learn stored data
– Learn other clients’ queries
• External attacker
– Insert malware, hack, etc.
– We won’t detect these, but our crypto provides resiliency
Our focus: security against the server
Secure Accumulo Overview
[Figure: Inserting Clients and Querying Clients connect over the Network to Accumulo, which sits on Zookeeper and the Hadoop Distributed Filesystem.]
Protections labeled in the figure:
• End-to-end signatures
• Attribute-based access control
• Cell-level encryption
• Verifiable query results
• Data at rest encryption
• TLS encryption
System administrator: Accumulo provides no safeguards!
We improve the security of Accumulo with cryptography
Outline
• Introduction
• End-to-End Signatures
– Digital Signatures
– Design Overview
– Implementation Details
• Verifiable Query Results
• Conclusion
Inserts in Accumulo
[Figure: an Inserting Client and a Querying Client (marked "?") connected to Accumulo, which holds Tablets on three Tablet Servers.]
Row        Column Family   Column Qualifier   Visibility   Timestamp   Value
Patient A  Hospital 1      Diagnoses          Doctor       12349857    …
Digital Signatures
• A signature algorithm has three phases:
– Key Generation
– Signing
– Verification
[Figure: a Message is signed and verified; verification fails for a Wrong Message.]
A signature scheme is secure if an adversary cannot forge a signature for a new message without having the signing key
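The three phases can be sketched with a Lamport one-time signature, chosen here only because it needs nothing beyond the Python standard library. It is a swapped-in teaching scheme, not the RSA or ECDSA signatures the talk goes on to use, and each key pair may sign only one message. The function names and the sample message are illustrative assumptions.

```python
import hashlib
import secrets

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def _bits(message: bytes):
    """The 256 bits of the message digest, most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]

def keygen():
    """Key generation: 256 pairs of random secrets; the public key is their hashes."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(_h(zero), _h(one)) for zero, one in sk]
    return sk, pk

def sign(sk, message: bytes):
    """Signing: reveal one secret per digest bit (so a key must never be reused)."""
    return [sk[i][bit] for i, bit in enumerate(_bits(message))]

def verify(pk, message: bytes, signature) -> bool:
    """Verification: every revealed secret must hash to the matching public value."""
    if len(signature) != 256:
        return False
    return all(_h(secret) == pk[i][bit]
               for i, (secret, bit) in enumerate(zip(signature, _bits(message))))

sk, pk = keygen()
sig = sign(sk, b"Patient A: Diagnoses")
```

Here `verify(pk, b"Patient A: Diagnoses", sig)` accepts, while the same signature presented with any other message is rejected, which is the "Wrong Message" case on the slide.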
Digital Signatures in Accumulo
[Figure: the Inserting Client applies Sign before writing a cell to Accumulo (Tablets on three Tablet Servers); the Querying Client applies Verif to the cells it reads.]
Row        Column Family   Column Qualifier   Visibility Field   Timestamp   Value
Patient A  Hospital 1      Diagnoses          Doctor             12349857    …
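Before a cell can be signed it has to be turned into a single byte string. The slides do not say which fields the authors' wrapper covers or how it encodes them, so the following is only a sketch under the assumption that every field is included and length-prefixed. The function name `encode_cell` and the sample value are hypothetical (the value on the slide is elided).

```python
import hashlib
import struct

def encode_cell(row: bytes, family: bytes, qualifier: bytes,
                visibility: bytes, timestamp: int, value: bytes) -> bytes:
    """Length-prefix each field so bytes cannot be shifted between fields unnoticed."""
    out = b""
    for field in (row, family, qualifier, visibility, value):
        out += struct.pack(">I", len(field)) + field
    # Timestamp as a signed 64-bit big-endian integer.
    return out + struct.pack(">q", timestamp)

# Hypothetical value; the slide shows only an ellipsis in the Value column.
cell_bytes = encode_cell(b"Patient A", b"Hospital 1", b"Diagnoses",
                         b"Doctor", 12349857, b"example value")
# These are the bytes a client would pass to its signing function.
cell_digest = hashlib.sha256(cell_bytes).hexdigest()
```

Plain concatenation without length prefixes would let a server move bytes between adjacent fields, for example from the row into the column family, without changing the signed string.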
Signature Code
• Implemented in Python as a client-side wrapper
– Uses the pyaccumulo library
– No server-side modifications needed
• Currently in the process of being open-sourced
– Contact pace-contact@ll.mit.edu for updates
• Several interesting design choices:
– Where to store the signature metadata?
– There are many signature algorithms—which one to use?
Storing Signature Metadata
• How do we store the signature of each cell?
Original table (Patient Records), shown with each cell's visibility label:
Patient 1   Flu shot      Doctor
Patient 2   Broken knee   Admin
Patient 3   Chicken pox   Admin
Option 1: Separate table
Patient Records Signatures:
Patient 1   <signature 1>   Doctor
Patient 2   <signature 2>   Admin
Patient 3   <signature 3>   Admin
Pro: original table is unmodified
Con: twice as many reads & writes
Option 2: Value field
Patient Records:
Patient 1   <signature 1>|Flu shot      Doctor
Patient 2   <signature 2>|Broken knee   Admin
Patient 3   <signature 3>|Chicken pox   Admin
Pro: value field is good at storing unstructured data
Con: interferes with iterators
Option 3: Visibility Field
Patient Records:
Patient 1   Flu shot      Doctor|“<signature 1>”
Patient 2   Broken knee   Admin|“<signature 2>”
Patient 3   Chicken pox   Admin|“<signature 3>”
Pro: all Accumulo functionality still works
Con: interferes with visibility label evaluation optimizations
We support all three options
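Options 2 and 3 both come down to packing the signature into an existing field and splitting it out again on read. The sketch below assumes the signature is base64-encoded so that it can never contain the `|` separator or a quote character. The helper names are hypothetical and the authors' actual encoding is not given in the slides.

```python
import base64

def pack_value(signature: bytes, value: bytes) -> bytes:
    """Option 2: store '<signature>|<value>' in the value field."""
    return base64.b64encode(signature) + b"|" + value

def unpack_value(stored: bytes):
    """Split on the first '|'; base64 never contains '|', so the value may."""
    sig_b64, _, value = stored.partition(b"|")
    return base64.b64decode(sig_b64), value

def pack_visibility(label: bytes, signature: bytes) -> bytes:
    """Option 3: OR the label with a quoted signature term, e.g. Doctor|"<sig>"."""
    return label + b'|"' + base64.b64encode(signature) + b'"'

def unpack_visibility(stored: bytes):
    """Split on the last '|"' and drop the closing quote."""
    label, _, quoted = stored.rpartition(b'|"')
    return label, base64.b64decode(quoted[:-1])

demo_sig = bytes(range(64))  # stand-in for a real signature
stored_value = pack_value(demo_sig, b"Flu shot")
stored_vis = pack_visibility(b"Doctor", demo_sig)
```

With Option 2 every server-side iterator that reads the value sees the signature prefix, which is the interference the slide lists as the con. With Option 3 the value is untouched but every cell gets a distinct visibility expression.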
Signature Algorithm Options
Option 1: RSA Signatures
• Fast signature verification
• Large signature & key size
Option 2: Elliptic Curve Signatures (ECDSA)
• Fast signature creation
• Relatively small signature & key sizes
Option 3: Message Authentication Codes
• Symmetric key: uses the same key for signing & verification
• Much faster than RSA and ECDSA
• Con: one malicious client has more power to interfere with integrity
We support RSA and ECDSA signatures, and are investigating how to safely use MACs
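Option 3 can be illustrated with HMAC-SHA256 from the Python standard library. This is a minimal sketch, not the authors' implementation. The shared key, helper names and sample message are assumptions.

```python
import hashlib
import hmac
import secrets

# One symmetric key shared by every client that writes or verifies.
shared_key = secrets.token_bytes(32)

def mac(key: bytes, message: bytes) -> bytes:
    """Compute the authentication tag (plays the role of 'signing')."""
    return hmac.new(key, message, hashlib.sha256).digest()

def check(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time ('verification')."""
    return hmac.compare_digest(mac(key, message), tag)

tag = mac(shared_key, b"Patient 1|Flu shot")
```

Because `check` needs the same key as `mac`, any client able to verify can also produce valid tags for data it never wrote, which is the extra power a malicious client gains under this option.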
Performance
[Figure: signature performance chart; ECDSA results use curve secp256r1]
Benchmarked on a virtualized single-node Accumulo 1.7.0 instance
Security Summary: Signatures
• Signatures allow clients to verify data integrity
– Malicious server cannot modify or fabricate results
• Signatures cannot verify data completeness
– Server could omit both data & signature to avoid detection
Signatures can detect: modification and insertion, but not omission
Outline
• Introduction
• End-to-End Signatures
• Verifiable Query Results
– Merkle Hash Trees
– Authenticated Skip Lists
• Conclusion
Authenticated Data Structures
• Data structures that allow provably correct queries
– Correctness defined relative to a trusted, well-known source
– Need to support range queries
• The digest is a small value (constant size) that represents the entire dataset
[Figure: Inserting Client, Accumulo Server holding the ADS, and Querying Client; labels: digest, VO, ?]
ADS: Authenticated Data Structure
VO: Verification Object
Merkle Hash Trees
Leaves: 2, 4, 6, 8, 10, 12, 14, 16
Leaf hashes: h(2), h(4), h(6), h(8), h(10), h(12), h(14), h(16)
a = h(h(2), h(4)), b = h(h(6), h(8)), c = h(h(10), h(12)), d = h(h(14), h(16))
e = h(a, b), f = h(c, d)
root = h(e, f)
Digest is the root node’s hash value
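The tree on this slide can be reproduced in a few lines. The sketch assumes SHA-256 for h, concatenation for combining two children, and the decimal string of each element as the leaf input. None of these choices is stated in the slides.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    """h(x, y, ...): SHA-256 over the concatenated parts (assumed choice of hash)."""
    return hashlib.sha256(b"".join(parts)).digest()

def leaf(value: int) -> bytes:
    """Leaf hash h(v), hashing the element's decimal string (assumed encoding)."""
    return h(str(value).encode())

def merkle_root(values):
    """Hash pairs level by level until one node remains.

    Assumes len(values) is a power of two, as in the eight-leaf slide example.
    """
    level = [leaf(v) for v in values]
    while len(level) > 1:
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

values = [2, 4, 6, 8, 10, 12, 14, 16]
digest = merkle_root(values)  # equals h(e, f) in the slide's notation
```

Changing any single leaf changes the digest, which is why a constant-size root hash can stand in for the entire dataset.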
Merkle Hash Trees
Query: range(5, 9)
Leaves: 2, 4, 6, 8, 10, 12, 14, 16
Leaf hashes: h(2), h(4), h(6), h(8), h(10), h(12), h(14), h(16)
a = h(h(2), h(4)), b = h(h(6), h(8)), c = h(h(10), h(12)), d = h(h(14), h(16))
e = h(a, b), f = h(c, d)
root = h(e, f)
[Figure legend (node coloring lost in extraction): part of the range returned; part of the verification object; computed based on returned information]
Naïve solution allows a malicious server to omit elements at the ends of ranges
Naïve Merkle Tree Security
              Omitting internal query results   Omitting boundary query results
Signatures:   not detected                      not detected
Naïve MHTs:   detected                          not detected
Solution: return boundaries of the range
Merkle Hash Trees, Revisited
Query: range(5, 9)
Leaves: 2, 4, 6, 8, 10, 12, 14, 16
Leaf hashes: h(2), h(4), h(6), h(8), h(10), h(12), h(14), h(16)
a = h(h(2), h(4)), b = h(h(6), h(8)), c = h(h(10), h(12)), d = h(h(14), h(16))
e = h(a, b), f = h(c, d)
root = h(e, f)
[Figure legend (node coloring lost in extraction): part of the range returned; part of the verification object; computed based on returned information]
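One way to realize the boundary idea is sketched below. The server returns the in-range elements plus one neighbor on each side, together with the hashes of all subtrees outside that run; the client rebuilds the root and checks that the neighbors really lie outside the query range. This is an illustrative reconstruction, not the authors' code. It assumes a sorted dataset whose size is a power of two and is known to the client, a non-empty result, and the same h as in the slide's notation (SHA-256 here).

```python
import hashlib

def h(*parts):
    return hashlib.sha256(b"".join(parts)).digest()

def leaf(value):
    return h(str(value).encode())

def _subtree(lo, hi, start, end, leaf_hash, outside):
    """Hash of the subtree over leaf indices [lo, hi); [start, end) is the returned run."""
    if hi <= start or lo >= end:          # entirely outside the run
        return outside(lo, hi)
    if hi - lo == 1:                      # a returned leaf
        return leaf_hash(lo)
    mid = (lo + hi) // 2
    return h(_subtree(lo, mid, start, end, leaf_hash, outside),
             _subtree(mid, hi, start, end, leaf_hash, outside))

def root_of(values):
    n = len(values)
    return _subtree(0, n, 0, n, lambda i: leaf(values[i]), None)

def answer(values, lo_key, hi_key):
    """Server: return the run (results plus boundary neighbors) and the VO."""
    inside = [i for i, v in enumerate(values) if lo_key <= v <= hi_key]
    start, end = max(inside[0] - 1, 0), min(inside[-1] + 2, len(values))
    vo = []
    def outside(a, b):
        vo.append(root_of(values[a:b]))   # hash of a subtree not returned
        return vo[-1]
    _subtree(0, len(values), start, end, lambda i: leaf(values[i]), outside)
    return start, values[start:end], vo

def verify(digest, n, lo_key, hi_key, start, run, vo):
    """Client: return the verified results, or None if the answer is rejected."""
    end = start + len(run)
    if not run or start < 0 or end > n:
        return None
    proof = iter(vo)
    root = _subtree(0, n, start, end,
                    lambda i: leaf(run[i - start]),
                    lambda a, b: next(proof, b""))
    if root != digest:
        return None
    left_ok = run[0] < lo_key or start == 0    # boundary below the range
    right_ok = run[-1] > hi_key or end == n    # boundary above the range
    if not (left_ok and right_ok):
        return None
    return [v for v in run if lo_key <= v <= hi_key]

values = [2, 4, 6, 8, 10, 12, 14, 16]
digest = root_of(values)
start, run, vo = answer(values, 5, 9)
result = verify(digest, len(values), 5, 9, start, run, vo)
```

For range(5, 9) the run is 4, 6, 8, 10 and the VO holds h(2), h(12) and d. A server that drops 8 from the end of the answer can no longer present a right boundary greater than 9, so `verify` returns None.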
Security Summary: ADSs
              Omitting internal query results   Omitting boundary query results
Signatures:   not detected                      not detected
Naïve MHTs:   detected                          not detected
MHTs:         detected                          detected
Merkle Hash Tree Disadvantages
• Mostly used for static data
• How to insert elements into MHTs?
Approach 1: Unbalanced Insert
– Linear time operations!
Approach 2: Balanced Insert
– Linear time insert!
Authenticated Skip Lists
          MHT          Skip List
Lookup    O(log(n))    O(log(n)) (expected)
Insert    O(n)         O(log(n)) (expected)
Verify    O(log(n))    O(log(n)) (expected)
Randomized skip lists have empirically better performance than other tree-like data structures
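The expected O(log(n)) insert comes from the skip list's randomized node heights. The sketch below is a plain randomized skip list only; it leaves out the hashing that makes the structure authenticated, and the class and method names are assumptions rather than anything from the authors' code.

```python
import random

class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height   # one forward pointer per level

class SkipList:
    MAX_HEIGHT = 16

    def __init__(self, seed=None):
        self.head = Node(None, self.MAX_HEIGHT)
        self.rng = random.Random(seed)

    def _random_height(self):
        """Flip coins: each extra level is kept with probability 1/2."""
        height = 1
        while height < self.MAX_HEIGHT and self.rng.random() < 0.5:
            height += 1
        return height

    def _predecessors(self, key):
        """For each level, the last node whose key is smaller than `key`."""
        preds = [self.head] * self.MAX_HEIGHT
        node = self.head
        for level in reversed(range(self.MAX_HEIGHT)):
            while node.next[level] is not None and node.next[level].key < key:
                node = node.next[level]
            preds[level] = node
        return preds

    def insert(self, key):
        """Splice a new node in at every level up to its random height."""
        preds = self._predecessors(key)
        new = Node(key, self._random_height())
        for level in range(len(new.next)):
            new.next[level] = preds[level].next[level]
            preds[level].next[level] = new

    def __contains__(self, key):
        candidate = self._predecessors(key)[0].next[0]
        return candidate is not None and candidate.key == key

    def range(self, lo, hi):
        """All keys k with lo <= k <= hi, walking the bottom level."""
        node = self._predecessors(lo)[0].next[0]
        out = []
        while node is not None and node.key <= hi:
            out.append(node.key)
            node = node.next[0]
        return out

sl = SkipList(seed=1)
for k in [10, 2, 16, 6, 12, 4, 14, 8]:
    sl.insert(k)
```

Unlike a balanced Merkle tree, an insert here touches only the predecessors of the new node, so no global rebuild is needed. An authenticated variant would additionally store a hash in each node and recompute the hashes along the search path.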
Outline
• Introduction
• End-to-End Signatures
• Verifiable Query Results
• Conclusion
Additional Work
• Confidentiality to hide data from the server & unauthorized users
– Per-cell encryption allows flexible encryption for different use cases
– Cryptographically enforcing Accumulo’s visibility labels with key
management
• Using HMACs for better performance without sacrificing security
• Key management and distribution for all cryptographic components
Conclusion
• Signatures for data tampering detection
– Currently implemented in Python
– Client-side library
– Contact pace-contact@ll.mit.edu to be notified when the code is open-sourced
• Authenticated Data Structures for full query correctness checks
– Working on embedding in Accumulo for greater efficiency
Questions?

Accumulo Summit 2015: Verifiable Responses to Accumulo Queries [Security]
