Using timed-release cryptography to mitigate the preservation risk of embargo periods
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

Using timed-release cryptography to mitigate the preservation risk of embargo periods

on

  • 1,623 views

Slides for: ...

Slides for:

Rabia Haq, Michael L. Nelson: Using timed-release cryptography to mitigate the preservation risk of embargo periods. 2009 ACM/IEEE Joint Conference on Digital Libraries (JCDL), pp. 183-192.

Statistics

Views

Total Views
1,623
Views on SlideShare
1,622
Embed Views
1

Actions

Likes
0
Downloads
1
Comments
0

1 Embed 1

http://www.slideshare.net 1

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Using timed-release cryptography to mitigate the preservation risk of embargo periods Presentation Transcript

  • 1. Using Timed-Release Cryptography to Mitigate The Preservation Risk of Embargo Periods Rabia Haq, Michael L. Nelson Old Dominion University Norfolk VA www.cs.odu.edu/~{rhaq,mln} 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 1
  • 2. Overview • Embargo Periods – associated preservation risk interval • Time-Locked Puzzle / Time Release Cryptography • System Evaluation using mod_oai (resource harvesting using OAI-PMH) – Optimization Using Chunked Encryption • Future Considerations • Conclusion 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 2
  • 3. Journal Access Models – Romeo Colors* • Red: Traditional subscription-based Access – Purchase-own model • Yellow: Embargoed Access – Hybrid of traditional and open access • Green: Self-authored Open Access – e.g., arVix.org, institutional repositories • Gold: Free and Open Access Journals – e.g., PLoS Journals, www.doaj.org * “Old” Romeo Colors, now green/blue/yellow/white; see: http://www.sherpa.ac.uk/romeoinfo.html 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 3
  • 4. Embargoed Access • Paid access (red) for some time interval, then the content becomes open (gold) – current issue(s) cost $ – previous issues are free • We’ll assume: gold >= green > yellow > red • Note: inverse of typical online newspaper model of: current is free, archived content costs $. 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 4
  • 5. Who Uses Embargoes? • 24% of PubMed Central (PMC) titles embargoed • The New England Journal of Medicine – embargoed for 6 months • EBMO Journal – embargoed for 12 months 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 5
  • 6. Preservation Risk Interval: (A Hypothetical, Non-Topical Example) • Journal of UT Football Non-Conference Scheduling is embargoed for 6 months – sample article “Why scheduling Florida Atlantic, UTEP, Rice & Arkansas is not a national championship schedule” – previous volumes (e.g., 2008, 2007) are freely available – issues 1--6 of current volume are currently for subscribers only – 6 month “sliding window”: when issue 7 comes out on July1, issue 1 becomes freely available • Now imagine Mack Brown issues a cease & desist order to JUTFNS on June 30 – what happens to volume 2009, issues 1-6? Will they ever be available to non-subscribers? 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 6
  • 7. Current Solutions • LOCKSS (CLOCKSS): www.lockss.org – local, cooperating caches between subscribers (i.e., libraries) – http://www.clockss.org/clockss/Triggered_Content • Portico: www.portico.org – trusted third party archive (i.e., neither library nor publisher) – http://www.portico.org/news/trigger.html 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 7
  • 8. Can We Use Lazy Preservation? • We’ve already shown by using IA, search engine caches, etc. we can reconstruct public web sites after they’ve been lost (McCown, 2007) • For embargoed content, we could expose encrypted content that is embargoed – but how can we prevent bad guys™ from using zombie farms to break the encryption before the embargo period is up? 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 8
  • 9. Timed-Release Cryptology • Time-Lock Puzzle (TLP) Creation – Data decryption non-parallelizable – Serial computation required to break puzzle – Data locked for predetermined time-period • not self-unlocking -- still requires computation to unlock • Used in MIT/LCS35 Time Lock Puzzle – http://people.csail.mit.edu/rivest/lcs35-puzzle-description.txt – idea: you could have started in 1999 (with your 1999 computer) and worked for 35 years… OR you can wait until 2033, buy a new computer and work for 1 year rewards procrastination! 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 9
  • 10. “Regular” RSA Picking values: “Brute Force” Attacks: • n=p*q • need to factor n (which is easier • φ(n)=(p-1)(q-1), then than trying all values of d) “throw away” p & q • simple soln: try all primes from 1 .. √n • pick e coprime to φ(n) • pick d s.t. d*e≡1 mod(φ(n)) helping the attacker: - adding k computers reduces the time public key = (n,e) to break by 1/k private key = (n,d) - they might get lucky and get it on their first shot! Encryption c = me (mod n) more info: http://www.cl.cam.ac.uk/users/rnc1/brute.html Decryption http://axion.physics.ubc.ca/pgp-attack.html m = cd (mod n) 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 10
  • 11. Time Lock Puzzle Picking values: Attacking: • n=p*q • repeated squarings of a is faster than • φ(n)=(p-1)(q-1), then factoring n -- also not known to be “throw away” p & q parallelizable! (Rivest, 1996) • t=TS • demo: n=253 (i.e., p=11,q=23), t=10, • pick some random key k, w= cm = RC5(k,m) 1 2(2 ) = 22 = 4 (mod 253) • pick random a, 1<a<n 2 2(2 ) = 42 = 16 (mod 253) 3 2(2 ) = 162 = 3 (mod 253) • w= a2t (mod n) 4 2(2 ) = 32 = 9 (mod 253) • ck = k ⊕ w 5 2(2 ) = 92 = 81 (mod 253) 6 puzzle = (n,a,t,ck,cm) 2(2 ) = 812 = 236 (mod 253) 7 2(2 ) = 2362 = 36 (mod 253) 8 actually, in our version we skip step 4 2(2 ) = 362 = 31 (mod 253) 9 and define step 7 as: z = m ⊕ w 2(2 ) = 312 = 202 (mod 253) puzzle = (n,t,z) 10 2(2 ) = 2022 = 71 (mod 253) 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 11
  • 12. Implementation: mod_oai, CRATE • Both based on (Smith, 2008) • mod_oai – an Apache module providing OAI-PMH functionality for an entire web site not just, for example, records in an institutional repository • CRATE – a model for encoding resource + associated metadata – implemented using MPEG-21 DIDL complex object format 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 12
  • 13. mod_oai mechanics Integrate OAI-PMH functionality into the web server itself… 1. Use mod_oai • an Apache 2.0 module • automatically answers OAI-PMH requests for an http server • written in C • respects values in .htaccess, httpd.conf 2. Install mod_oai on http://www.foo.edu/ 3. Define baseURL: http://www.foo.edu/modoai → Result: web harvesting with OAI-PMH semantics (e.g., from, until, sets) http://www.foo.edu/modoai?verb=ListRecords&metdataPrefix=oai_didl&from=2004-09-15&set=mime:video:mpeg From site foo, dating from 9/15/2004 through today Give me all resources Using OAI-PMH And their preservation metadata that are MIME type video-MPEG 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 13
  • 14. OAI-PMH Data Model 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 14
  • 15. MPEG-21 DIDL Resource Structure 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 15
  • 16. An Active Repository 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 16
  • 17. A Dying Repository Records e1, f2, g3 are recoverable; record h is lost. 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 17
  • 18. Dynamic Time-Locked Record Embargo within mod_oai • Identification – Calculation of remaining record embargo period • Encryption – Calculating record time-lock puzzle complexity – Time-Lock Puzzle creation • Encapsulation – exploiting flexibility of MPEG-21 DIDL format to encapsulate encrypted resources and related information 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 18
  • 19. Identification update OAI-PMH datestamp as time lock becomes weaker 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 19
  • 20. Encryption • Modification of LCS35 Time Capsule Crypto-Puzzle to use time lock on entire resource (not just the key) – as per code provided at: http://people.csail.mit.edu/rivest/lcs35- puzzle-description.txt • Input: timeUnit (controls puzzle complexity) • Compute: u = 2t mod((p-1)(q-1)) w = (2u) mod(n) z = resource ⊕ w • Output: n, t, z 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 20
  • 21. Encapsulation This version of the record is 7 of 12 separate encryptions, each of which is successively easier to break. It will take approximately 3650 hours of computation to break this time-lock. The next update will be available on 2008-01-16T20:56:15Z. Crypto-Puzzle for LCS35 Time Capsule. Puzzle parameters (all in decimal): n = 398399 t = 264600000. z = 313239174518025552773909388461801735302388... 893375562056859914777144518879488573607906... 742437030171894184996228671834511813009803... (many lines deleted for space) To solve the puzzle, first compute w = 2*2*t (mod n). Then exclusive-or the result with z. (Right-justify the two strings first). The result is the secret message (8 bits per character). 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 21
  • 22. Selecting Appropriate Values of t • Time required to break puzzle dependent on processor speed • Given our projected short embargo period (6-24 months), we made a simplifying assumption that Moore’s law increases linearly (not exponentially) – idea: in the next few months, you’re more likely to see something like: 2Ghz→2.2Ghz, not 2Ghz→4Ghz • recall: t = number of squarings – t=T*S – S=3000 squarings/second, T=1800 seconds *tU, – tU = f(machine speed) * embargolength – t=3000*1800*tU 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 22
  • 23. Effect of Computation Speed on embargolength • We broke time lock puzzles on four class of machines (in GHz): – 1.8 (5 nodes) – 1.6 (26 nodes) – 1 (1 node) – 0.75 (1 node) 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 23
  • 24. Picking t With Empirical Data using 1Ghz machine as baseline, projecting for a 2.5 Ghz machine, and locking for 2 years (63115200 seconds): • tU = 63115200 * 2.5 / (1727.61) = 9133 • t = 3000 * 1800 * 9133 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 24
  • 25. Experimental Evaluation • Embargolength = 365 days • Embargodecrement = 12 • Test website – 525 files – 17.3 MB data – 63% text files – Average file size = 33KB 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 25
  • 26. Harvesting Time: Locked & Unlocked 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 26
  • 27. O(n2) time to create time-lock puzzle 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 27
  • 28. Solution: Break Files Into “Chunks” • Size of file exponentially increases lock-time • Idea: break file into series of small chunks – still O(n2), but with a much more favorable constant • Lock-time on a 1.8 GHz machine time_to_lock(200 KB) = 13 sec time_to_lock(100 KB) = 3 sec 200 KB = 100 KB + 100 KB = 3 sec + 3 sec = 6 sec 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 28
  • 29. 10 KB Chunked Encryption in mod_oai 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 29
  • 30. MPEG-21 DIDL document With Chunks This record has been split into 10000-byte chunks for faster processing. This is part 1 of 7 chunks, with unlocked chunks to be reassembled in the specified order. 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 30
  • 31. Future Considerations • Chunk size performance dependency • Other optimization methods: – Parallel time-locking of resources – Data pre-locking – only time-lock encryption key, use other encryption methods on the original resource (as per original Rivest (1996), not as per http://people.csail.mit.edu/rivest/lcs35-puzzle- description.txt) 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 31
  • 32. Conclusions • Suggest the use of time lock puzzles for dissemination of embargoed records – complement to other methods such as LOCKSS, Poritco, etc. • Implemented and evaluated time lock puzzles in the mod_oai & CRATE environment • Full paper: – http://doi.acm.org/10.1145/1555400.1555430 – http://www.cs.odu.edu/~mln/pubs/jcdl09/jcdl09-time-lock.pdf 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19 32