Hashing
THEN AND NOW
MIKE SMORUL – ADAPT PROJECT
Commodity Storage Performance
  • 2003: JetStor III IDE-FC, 62MB/s large block
  • 2013: 218MB/s workstation SSD; Perc 6/MD1000, 400MB/s+
Chip Speed
  • 2003: Pentium 4, 3.2GHz
  • 2013: Core i7 Extreme, 3.5GHz
Hashing Performance
  • SHA-256 Hashing: Java 85MB/s; Crypto++ 111-134MB/s
  • Real World Penalty: Java, 20-40% penalty on slow seek disk
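The figures on this slide are throughput measurements. A minimal sketch of how such a number might be obtained, assuming Python's standard `hashlib` rather than the Java or Crypto++ implementations measured in the talk. The function name, buffer size, and data volume are illustrative choices, and results will differ by machine:

```python
import hashlib
import time


def sha256_throughput_mb_s(total_mb=64, block_size=1024 * 1024):
    """Hash total_mb MiB of in-memory data; return (MiB per second, hex digest).

    The data comes from memory, so this measures only the digest algorithm,
    not the read/digest pattern behind the real-world penalty.
    """
    block = bytes(block_size)  # zero-filled buffer, reused for every update
    blocks = (total_mb * 1024 * 1024) // block_size
    digest = hashlib.sha256()
    start = time.perf_counter()
    for _ in range(blocks):
        digest.update(block)
    elapsed = time.perf_counter() - start
    return total_mb / elapsed, digest.hexdigest()


if __name__ == "__main__":
    rate, _ = sha256_throughput_mb_s()
    print(f"SHA-256: {rate:.0f} MB/s")
```

Reading the same volume from disk inside the timed loop would give the read/digest figure instead.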
Implications
  • Flipped bottlenecks
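The flip can be checked with simple arithmetic on the rates quoted in the slides. The sketch below is illustrative only: the helper names and the 1 TB collection size are assumptions, not figures from the talk.

```python
# Rates in MB/s as quoted in the slides (2013 figures).
STORAGE_2013 = {"workstation SSD": 218, "Perc 6/MD1000": 400}
SHA256_2013 = {"Java": 85, "Crypto++ (low)": 111, "Crypto++ (high)": 134}


def pipeline_rate(storage_mb_s, hash_mb_s):
    """Upper bound for a read/digest pipeline: the slower stage sets the pace."""
    return min(storage_mb_s, hash_mb_s)


def bottleneck(storage_mb_s, hash_mb_s):
    """Name the stage that limits the pipeline."""
    return "hashing" if hash_mb_s < storage_mb_s else "storage"


def hours_to_audit(size_tb, rate_mb_s):
    """Hours needed to digest size_tb terabytes (decimal units) at rate_mb_s."""
    return size_tb * 1_000_000 / rate_mb_s / 3600


if __name__ == "__main__":
    for store, s_rate in STORAGE_2013.items():
        for impl, h_rate in SHA256_2013.items():
            rate = pipeline_rate(s_rate, h_rate)
            # 1 TB is an assumed collection size, chosen only for illustration.
            print(f"{store} + {impl}: limited by {bottleneck(s_rate, h_rate)}, "
                  f"{hours_to_audit(1, rate):.1f} h per TB")
```

With these numbers every 2013 combination is limited by the digest rate rather than by storage.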
Parallelize Digesting
  • Independent IO and digest threads
  • Always have work for the digest algorithm.
  • Large files saw over 95% of algorithm potential.
  • Small files unchanged.
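A rough sketch of the independent IO and digest threads, assuming Python's `threading`, `queue`, and `hashlib` modules. It is not the ACE implementation: the function name and parameters are hypothetical, and where the talk describes a set of buffers passed between the threads, this version simply hands freshly read blocks through a bounded queue.

```python
import hashlib
import queue
import threading


def threaded_sha256(path, block_size=1024 * 1024, max_buffers=8):
    """Digest a file with one reader thread and one digest thread."""
    # Bounded queue: the reader stays ahead of the digester, but only by
    # max_buffers blocks, so memory use is capped.
    buffers = queue.Queue(maxsize=max_buffers)
    digest = hashlib.sha256()
    errors = []

    def reader():
        try:
            with open(path, "rb") as f:
                while True:
                    block = f.read(block_size)
                    if not block:
                        break
                    buffers.put(block)
        except OSError as exc:
            errors.append(exc)
        finally:
            buffers.put(None)  # sentinel: no more data

    def digester():
        while True:
            block = buffers.get()
            if block is None:
                break
            digest.update(block)

    threads = [threading.Thread(target=reader), threading.Thread(target=digester)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if errors:
        raise errors[0]
    return digest.hexdigest()
```

In CPython, `hashlib` releases the interpreter lock while hashing larger buffers, so the read and the digest can overlap. A small file fits in a single block and gains nothing, which is consistent with the last bullet.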
Securing Data in Motion?
Integrity across the network
  • Internal Auditing: Prove your hardware
  • Peer-Auditing: Prove your friends
  • Digital Signatures: Prove identity
  • Token Based: Prove time
Chronopolis Integrity
  • Current: producer supplied authoritative manifest; peers locally monitor integrity; manually trace back to point of ingest
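Local monitoring against a manifest can be pictured as below. This is a sketch under stated assumptions, not the Chronopolis or ACE code: the manifest layout (one `<hex digest>  <relative path>` pair per line), the choice of SHA-256, and all function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha256(path, block_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def parse_manifest(text):
    """Parse lines of '<hex digest>  <relative path>' into {path: digest}."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, path = line.split(maxsplit=1)
        entries[path.strip()] = digest.lower()
    return entries


def audit(root, manifest):
    """Return {path: status}; status is 'ok', 'corrupt', or 'missing'."""
    results = {}
    for rel_path, expected in manifest.items():
        target = Path(root) / rel_path
        if not target.is_file():
            results[rel_path] = "missing"
        elif file_sha256(target) == expected:
            results[rel_path] = "ok"
        else:
            results[rel_path] = "corrupt"
    return results
```

A check like this shows only that the local copy still matches the manifest; tracing back to the point of ingest needs more than that.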
Chronopolis Integrity
  • In-progress: single integrity token back to ingest
  • Ideal: tokens issued prior to arrival; ‘Prove’ the state of data to point before Chronopolis
Manifests 2.0
  • Token manifests
  • Portable, embeddable: Python, etc
Integrity supporting Provenance
  • Digests in a cloud validate transfer only
  • HTTP headers can pass extended integrity information
  • End-user verification
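One way to picture integrity information travelling in HTTP headers is sketched below. The header names and the `token` argument are hypothetical, since each object store defines its own metadata prefix and the ACE token fields are not given in the slides. The check shown is the digest comparison only, which validates the transfer and nothing more; validating a token would be a separate step that is not shown.

```python
import hashlib

# Hypothetical header prefix, chosen for illustration only.
PREFIX = "X-Integrity-"


def integrity_headers(payload, token=None):
    """Build metadata headers to store alongside an uploaded object."""
    headers = {
        PREFIX + "Digest-Algorithm": "SHA-256",
        PREFIX + "Digest": hashlib.sha256(payload).hexdigest(),
    }
    if token is not None:
        # Opaque to this sketch: a real token would be checked elsewhere.
        headers[PREFIX + "Token"] = token
    return headers


def verify_download(payload, headers):
    """End-user check: recompute the digest and compare it to the header."""
    if headers.get(PREFIX + "Digest-Algorithm") != "SHA-256":
        raise ValueError("unsupported or missing digest algorithm")
    expected = headers.get(PREFIX + "Digest")
    return expected is not None and hashlib.sha256(payload).hexdigest() == expected
```

The same dictionary could be attached to an upload request and read back from the response to a later download.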
Closing
  • Why are you hashing?
  • What do you want to prove?
  • Hashing Cost/performance
Contact
Mike Smorul
msmorul@sesync.org

http://adapt.umiacs.umd.edu/ace
Pasig - Hashing presentation-2013

Presentation to the 2013 Pasig meeting in Washington DC

Published in: Technology
Notes
  • Cover two topics: observations on storage and hashing performance, and hashing in Chronopolis and larger systems.
  • ACNC RAID: 20MB/s small I/O, 62MB/s large block.
  • The Java implementation is the default implementation; Crypto++ is the faster alternative. But real-world performance, where you have a read/digest pattern, carries a penalty.
  • Over time, the performance bottleneck flipped from storage and its subsystems to the hashing algorithm. Can we hash enough in time?
  • Two ways to parallelize: digest multiple files simultaneously, or thread a single file digest. The implementation was two threads and a set of buffers that were passed between the threads.
  • This is possible today. Question: what happens in the future for recovery?
  • There is no one-size-fits-all approach to integrity checking; each has its strengths and weaknesses. Internal: vulnerable to malice, deletions, etc. Peer: requires an existing relationship and data at both sides. Digital signatures: require trusting that the signature hasn’t been compromised; if it has, then nothing can be trusted, and revocation doesn’t really work. Token: this is ACE; a small piece of information next to a file can prove the file hasn’t been tampered with. It proves date, but not necessarily identity. What should you use? All, whatever is appropriate.
  • Chronopolis uses ACE internally. Manifests are producer supplied; we create our own token due to weak manifests from a producer (md5, etc). To trace back, we need tokens from the ingestion node.
  • Single token back to ingest: the token is issued inline with manifest validation, so nodes become transparent. Ideal: tokens issued at the producer (explain how tokens can be issued before producer).
  • There is an ACE token format which packs file identifiers (paths) and ACE tokens. It is designed to be embedded in process.
  • We can use ACE tokens and extended integrity information to prove provenance. In a cloud, digests ONLY validate non-corrupt transfer; they do not protect against tampering. Most/all cloud systems support extended metadata; use it for advanced integrity information. Tokens are 5-6 extra headers, which allows for end-user validation of data.
  • Reasons for hashing: operational, malice, provenance. Hashing costs: currently flat, however SHA3 may change that (BLAKE alg 15Gbps+).
Pasig - Hashing presentation-2013

    1. Hashing THEN AND NOW MIKE SMORUL – ADAPT PROJECT
    2. Commodity Storage Performance • 2003: JetStor III IDE-FC, 62MB/s large block • 2013: 218MB/s workstation SSD; Perc 6/MD1000, 400MB/s+
    3. Chip Speed • 2003: Pentium 4, 3.2GHz • 2013: Core i7 Extreme, 3.5GHz
    4. Hashing Performance • SHA-256 Hashing: Java 85MB/s; Crypto++ 111-134MB/s • Real World Penalty: Java, 20-40% penalty on slow seek disk
    5. Implications • Flipped bottlenecks
    6. Parallelize Digesting • Independent IO and digest threads • Always have work for the digest algorithm. • Large files saw over 95% of algorithm potential. • Small files unchanged.
    7. Securing Data in Motion?
    8. Integrity across the network • Internal Auditing: Prove your hardware • Peer-Auditing: Prove your friends • Digital Signatures: Prove identity • Token Based: Prove time
    9. Chronopolis Integrity • Current: producer supplied authoritative manifest; peers locally monitor integrity; manually trace back to point of ingest
    10. Chronopolis Integrity • In-progress: single integrity token back to ingest • Ideal: tokens issued prior to arrival; ‘Prove’ the state of data to point before Chronopolis
    11. Manifests 2.0 • Token manifests • Portable, embeddable: Python, etc
    12. Integrity supporting Provenance • Digests in a cloud validate transfer only • HTTP headers can pass extended integrity information • End-user verification
    13. Closing • Why are you hashing? • What do you want to prove? • Hashing Cost/performance
    14. Contact Mike Smorul msmorul@sesync.org http://adapt.umiacs.umd.edu/ace