Pasig - Hashing presentation-2013

Presentation to the 2013 Pasig meeting in Washington DC

  • Cover two topics: observations on storage and hashing performance, and hashing in Chronopolis and larger systems.
  • ACNC RAID: 20 MB/s small I/O, 62 MB/s large block.
  • Java is the default implementation; Crypto++ is faster. But real-world performance suffers when you have a read/digest pattern.
  • Over time the performance bottleneck flipped: storage and I/O subsystems used to be the bottleneck, and now the hashing algorithm is. Can we hash enough data in the time available?
  • There are two ways to parallelize: digest multiple files simultaneously, or thread the digest of a single file. The implementation used two threads and a set of buffers passed between them.
  • This is possible today. Question: what happens in the future for recovery?
  • There is no one-size-fits-all approach to integrity checking; each approach has its strengths and weaknesses.
    Internal: vulnerable to malice, deletions, etc.
    Peer: requires an existing relationship and data on both sides.
    Digital signatures: you must trust that the signature hasn't been compromised. If it has, nothing can be trusted, and revocation doesn't really work.
    Token: this is ACE. A small piece of information stored next to a file can prove the file hasn't been tampered with; it proves the date, but not necessarily identity.
    What should you use? All of them, whichever is appropriate.
  • Chronopolis uses ACE internally. Manifests are supplied by the producer; we create our own tokens because producer manifests can be weak (MD5, etc.). To trace back, we need tokens from the ingestion node.
  • In progress: a single token back to ingest, issued inline with manifest validation, so intermediate nodes become transparent. Ideal: tokens issued at the producer; explain how tokens can be issued prior to arrival.
  • There is an ACE token format that packs file identifiers (paths) together with ACE tokens. It is designed to be embedded in other processes.
  • We can use ACE tokens and extended integrity information to prove provenance. In a cloud, digests ONLY validate that the transfer was not corrupted; they do not protect against tampering. Most, if not all, cloud systems support extended metadata, so use it for advanced integrity information. Tokens take 5-6 extra headers and allow end users to validate the data.
  • Reasons for hashing: operational, malice, provenance. Hashing costs are currently flat; however, SHA-3 may change that (BLAKE, a SHA-3 finalist: 15 Gbps+).
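
The two-thread design in the note on parallelizing (one thread reading, one digesting, with a set of buffers handed back and forth) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the ADAPT/ACE implementation: the function name `pipelined_digest`, the buffer size and the buffer count are hypothetical choices, and the input is assumed to be a binary file-like object that supports `readinto()`. Threads can help here because CPython's `hashlib` releases the GIL while hashing large buffers, so reading the next buffer can overlap with digesting the current one.

```python
import hashlib
import queue
import threading
from typing import BinaryIO


def pipelined_digest(f: BinaryIO, algorithm: str = "sha256",
                     buf_size: int = 1 << 20, n_bufs: int = 4) -> str:
    """Digest a binary stream with a reader thread and a digest loop.

    Hypothetical helper. Empty buffers wait in `free`; filled buffers travel
    through `full` as (buffer, bytes_read) pairs. bytes_read == 0 marks EOF.
    """
    free = queue.Queue()
    full = queue.Queue()
    for _ in range(n_bufs):
        free.put(bytearray(buf_size))

    def reader() -> None:
        try:
            while True:
                buf = free.get()              # wait for a buffer the digester released
                n = f.readinto(buf) or 0      # fill it from the stream
                full.put((buf, n))
                if n == 0:                    # EOF marker has been queued
                    return
        except Exception as exc:              # forward I/O errors to the digest loop
            full.put((exc, 0))

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    h = hashlib.new(algorithm)
    while True:
        item, n = full.get()
        if isinstance(item, Exception):
            raise item
        if n == 0:
            break
        h.update(memoryview(item)[:n])        # hash only the bytes actually read
        free.put(item)                        # hand the buffer back to the reader
    thread.join()
    return h.hexdigest()
```

Usage: `pipelined_digest(open(path, "rb"))` returns the same hex digest as a plain read-and-update loop. A file that fits in a single buffer leaves nothing to overlap, which is consistent with the slide's observation that small files were unchanged.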
Transcript

    • 1. Hashing Then and Now. Mike Smorul, ADAPT Project
    • 2. Commodity Storage Performance. 2003: JetStor III IDE-FC, 62 MB/s large block. 2013: workstation SSD, 218 MB/s; PERC 6/MD1000, 400 MB/s+.
    • 3. Chip Speed. 2003: Pentium 4, 3.2 GHz. 2013: Core i7 Extreme, 3.5 GHz.
    • 4. Hashing Performance. SHA-256 hashing: Java, 85 MB/s; Crypto++, 111-134 MB/s. Real-world penalty: Java, 20-40% on slow-seek disk.
    • 5. Implications: flipped bottlenecks.
    • 6. Parallelize Digesting. Independent I/O and digest threads, so the digest algorithm always has work. Large files reached over 95% of the algorithm's potential throughput. Small files were unchanged.
    • 7. Securing Data in Motion?
    • 8. Integrity across the network. Internal auditing: prove your hardware. Peer auditing: prove your friends. Digital signatures: prove identity. Token based: prove time.
    • 9. Chronopolis Integrity. Current: producer-supplied authoritative manifest; peers locally monitor integrity; manually trace back to the point of ingest.
    • 10. Chronopolis Integrity. In progress: a single integrity token back to ingest. Ideal: tokens issued prior to arrival, to 'prove' the state of data back to a point before Chronopolis.
    • 11. Manifests 2.0. Token manifests: portable, embeddable (Python, etc.).
    • 12. Integrity Supporting Provenance. Digests in a cloud validate transfer only. HTTP headers can pass extended integrity information, enabling end-user verification.
    • 13. Closing. Why are you hashing? What do you want to prove? Hashing cost/performance.
    • 14. Contact: Mike Smorul, msmorul@sesync.org, http://adapt.umiacs.umd.edu/ace
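
Slide 4's throughput figures were measured with Java and Crypto++. A rough way to take the same kind of measurement on current hardware is sketched below in Python with `hashlib`. The function names are hypothetical, "MB" here means 2^20 bytes, and Python's numbers are not directly comparable to the slide's Java or Crypto++ figures. The first function hashes in-memory data to measure the algorithm alone; the second reads and digests a real file, which is where a seek-bound "real world penalty" would show up.

```python
import hashlib
import time


def memory_digest_mb_s(algorithm: str = "sha256", total_mb: int = 256,
                       chunk_kb: int = 1024) -> float:
    """Hypothetical helper: digest throughput with no I/O, in MB/s (MB = 2**20 bytes)."""
    chunk = bytes(chunk_kb * 1024)
    n_chunks = max(1, (total_mb * 1024) // chunk_kb)
    h = hashlib.new(algorithm)
    start = time.perf_counter()
    for _ in range(n_chunks):
        h.update(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return n_chunks * len(chunk) / 2**20 / elapsed


def file_digest_mb_s(path: str, algorithm: str = "sha256",
                     chunk_kb: int = 1024) -> float:
    """Hypothetical helper: throughput of a read-then-digest loop over a file, in MB/s."""
    h = hashlib.new(algorithm)
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_kb * 1024)
            if not chunk:
                break
            h.update(chunk)
            total += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total / 2**20 / elapsed
```

Comparing the two on the same data gives a rough sense of whether I/O or the digest is the bottleneck. Use a file larger than RAM, or drop the OS page cache first, or the file measurement will mostly reflect cached reads.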

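
The notes describe Chronopolis validating a producer-supplied manifest and then creating its own integrity information because producer manifests may use weak digests such as MD5. The sketch below shows only the digest half of that idea: it checks each file against an MD5 manifest and emits a SHA-256 manifest, computing both digests in a single read pass per file. It is not the ACE token format; the `hexdigest  path` line layout (as written by `md5sum` and used by BagIt manifests) and all function names are assumptions for illustration.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def read_manifest(text: str) -> dict[str, str]:
    """Parse 'hexdigest  relative/path' lines into {path: digest} (assumed layout)."""
    entries = {}
    for line in text.splitlines():
        if line.strip():
            digest, rel = line.split(maxsplit=1)
            entries[rel.strip()] = digest.lower()
    return entries


def file_digests(path: Path, algorithms: tuple[str, ...]) -> dict[str, str]:
    """Compute several digests of one file in a single read pass."""
    hashes = {alg: hashlib.new(alg) for alg in algorithms}
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            for h in hashes.values():
                h.update(chunk)
    return {alg: h.hexdigest() for alg, h in hashes.items()}


def revalidate(root: Path, producer_manifest: str, weak: str = "md5",
               strong: str = "sha256") -> tuple[str, list[str]]:
    """Check files against the producer's manifest and re-digest them.

    Returns (strong_manifest_text, failed_paths). Files that are missing or
    do not match the producer's digest are reported, not re-digested.
    """
    lines, failures = [], []
    for rel, expected in sorted(read_manifest(producer_manifest).items()):
        path = root / rel
        if not path.is_file():
            failures.append(rel)
            continue
        digests = file_digests(path, (weak, strong))
        if digests[weak] != expected:
            failures.append(rel)
            continue
        lines.append(f"{digests[strong]}  {rel}")
    return "".join(line + "\n" for line in lines), failures
```

Reading each file once for both algorithms keeps the validation step from doubling the I/O, which matters when, as the slides argue, hashing and reading compete for the same time budget.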
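
Slide 12 and its note make two points: a digest travelling with an object in the cloud only proves that the transfer was not corrupted, and extended metadata (HTTP headers) can carry richer integrity information for end-user verification. The Python sketch below illustrates that distinction. The `X-Integrity-*` header names and fields are hypothetical, not ACE's actual token headers; `trusted_digest` stands in for a value the end user obtains independently (for example from a token or manifest), not from the same response.

```python
import hashlib
import hmac
from typing import Dict, Optional, Tuple

PREFIX = "X-Integrity-"  # hypothetical header prefix, not an ACE or cloud-vendor name


def integrity_headers(body: bytes, token_id: str, issued: str) -> Dict[str, str]:
    """Extended integrity metadata to store alongside an object (hypothetical fields)."""
    return {
        PREFIX + "Algorithm": "SHA-256",
        PREFIX + "Digest": hashlib.sha256(body).hexdigest(),
        PREFIX + "Token-Id": token_id,
        PREFIX + "Issued": issued,
    }


def verify(body: bytes, headers: Dict[str, str],
           trusted_digest: Optional[str] = None) -> Tuple[bool, bool]:
    """Return (transfer_ok, provenance_ok) for a downloaded object.

    transfer_ok: the body matches the digest that came with it, so the
    transfer was not corrupted. It says nothing about tampering, because
    whoever can replace the body can also replace the header.
    provenance_ok: the body also matches an independently obtained digest.
    """
    lowered = {k.lower(): v for k, v in headers.items()}  # HTTP header names are case-insensitive
    claimed = lowered.get((PREFIX + "Digest").lower(), "").lower()
    actual = hashlib.sha256(body).hexdigest()
    transfer_ok = hmac.compare_digest(claimed, actual)
    provenance_ok = (transfer_ok and trusted_digest is not None
                     and hmac.compare_digest(trusted_digest.lower(), actual))
    return transfer_ok, provenance_ok
```

If someone replaces both the object and its headers, `transfer_ok` is still true; only the comparison against the independently obtained digest catches the substitution, which is the gap the slides propose closing with tokens.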