
Using timed-release cryptography to mitigate the preservation risk of embargo periods

Slides for:

Rabia Haq, Michael L. Nelson: Using timed-release cryptography to mitigate the preservation risk of embargo periods. 2009 ACM/IEEE Joint Conference on Digital Libraries (JCDL), pp. 183-192.

  1. Using Timed-Release Cryptography to Mitigate the Preservation Risk of Embargo Periods. Rabia Haq, Michael L. Nelson, Old Dominion University, Norfolk VA, www.cs.odu.edu/~{rhaq,mln}. 2009 ACM/IEEE Joint Conference on Digital Libraries, Austin TX, June 15-19
  2. Overview • Embargo Periods – associated preservation risk interval • Time-Locked Puzzle / Time Release Cryptography • System Evaluation using mod_oai (resource harvesting using OAI-PMH) – Optimization Using Chunked Encryption • Future Considerations • Conclusion
  3. Journal Access Models – Romeo Colors* • Red: Traditional subscription-based Access – Purchase-own model • Yellow: Embargoed Access – Hybrid of traditional and open access • Green: Self-authored Open Access – e.g., arXiv.org, institutional repositories • Gold: Free and Open Access Journals – e.g., PLoS Journals, www.doaj.org * “Old” Romeo Colors, now green/blue/yellow/white; see: http://www.sherpa.ac.uk/romeoinfo.html
  4. Embargoed Access • Paid access (red) for some time interval, then the content becomes open (gold) – current issue(s) cost $ – previous issues are free • We’ll assume: gold >= green > yellow > red • Note: inverse of typical online newspaper model of: current is free, archived content costs $.
  5. Who Uses Embargoes? • 24% of PubMed Central (PMC) titles embargoed • The New England Journal of Medicine – embargoed for 6 months • EMBO Journal – embargoed for 12 months
  6. Preservation Risk Interval: (A Hypothetical, Non-Topical Example) • Journal of UT Football Non-Conference Scheduling is embargoed for 6 months – sample article “Why scheduling Florida Atlantic, UTEP, Rice & Arkansas is not a national championship schedule” – previous volumes (e.g., 2008, 2007) are freely available – issues 1-6 of the current volume are currently for subscribers only – 6 month “sliding window”: when issue 7 comes out on July 1, issue 1 becomes freely available • Now imagine Mack Brown issues a cease & desist order to JUTFNS on June 30 – what happens to volume 2009, issues 1-6? Will they ever be available to non-subscribers?
  7. Current Solutions • LOCKSS (CLOCKSS): www.lockss.org – local, cooperating caches between subscribers (i.e., libraries) – http://www.clockss.org/clockss/Triggered_Content • Portico: www.portico.org – trusted third party archive (i.e., neither library nor publisher) – http://www.portico.org/news/trigger.html
  8. Can We Use Lazy Preservation? • We’ve already shown by using IA, search engine caches, etc. we can reconstruct public web sites after they’ve been lost (McCown, 2007) • For embargoed content, we could expose encrypted content that is embargoed – but how can we prevent bad guys™ from using zombie farms to break the encryption before the embargo period is up?
  9. Timed-Release Cryptology • Time-Lock Puzzle (TLP) Creation – Data decryption non-parallelizable – Serial computation required to break puzzle – Data locked for predetermined time-period • not self-unlocking -- still requires computation to unlock • Used in MIT/LCS35 Time Lock Puzzle – http://people.csail.mit.edu/rivest/lcs35-puzzle-description.txt – idea: you could have started in 1999 (with your 1999 computer) and worked for 35 years… OR you can wait until 2033, buy a new computer, and work for 1 year: the puzzle rewards procrastination!
  10. “Regular” RSA
      Picking values:
        • n = p*q
        • φ(n) = (p-1)(q-1), then “throw away” p & q
        • pick e coprime to φ(n)
        • pick d s.t. d*e ≡ 1 (mod φ(n))
        public key = (n,e)
        private key = (n,d)
      Encryption: c = m^e (mod n)
      Decryption: m = c^d (mod n)
      “Brute Force” Attacks:
        • need to factor n (which is easier than trying all values of d)
        • simple soln: try all primes from 2 .. √n
      helping the attacker:
        - adding k computers cuts the time to break by a factor of k
        - they might get lucky and get it on their first shot!
      more info: http://www.cl.cam.ac.uk/users/rnc1/brute.html http://axion.physics.ubc.ca/pgp-attack.html
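The RSA steps on this slide can be sketched in a few lines of Python. This is a toy illustration only, not code from the talk: the function names are hypothetical, and the primes p=11, q=23 are borrowed from the demo on the next slide (real keys use primes hundreds of digits long). The trial-division attack shows why the security rests on factoring n.

```python
from math import gcd


def make_keys(p, q, e):
    """Build a toy RSA key pair from primes p, q and public exponent e."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to phi(n)")
    d = pow(e, -1, phi)  # modular inverse: d*e = 1 (mod phi); needs Python 3.8+
    return (n, e), (n, d)


def encrypt(m, public_key):
    n, e = public_key
    return pow(m, e, n)  # c = m^e (mod n)


def decrypt(c, private_key):
    n, d = private_key
    return pow(c, d, n)  # m = c^d (mod n)


def factor_by_trial_division(n):
    """Brute-force attack: try candidate divisors from 2 up to sqrt(n)."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 1
    raise ValueError("n has no nontrivial factor")


public, private = make_keys(11, 23, 3)
ciphertext = encrypt(42, public)
recovered = decrypt(ciphertext, private)
```

The candidate range in `factor_by_trial_division` can be split across k machines, which is exactly the parallel speed-up that the time-lock puzzle is designed to deny.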
  11. Time Lock Puzzle
      Picking values:
        1. n = p*q
        2. φ(n) = (p-1)(q-1), then “throw away” p & q
        3. t = T*S
        4. pick some random key k, cm = RC5(k,m)
        5. pick random a, 1 < a < n
        6. w = a^(2^t) (mod n)
        7. ck = k ⊕ w
        puzzle = (n,a,t,ck,cm)
      actually, in our version we skip step 4 and define step 7 as: z = m ⊕ w; puzzle = (n,t,z)
      Attacking:
        • repeated squaring of a is faster than factoring n -- also not known to be parallelizable! (Rivest, 1996)
        • demo: n=253 (i.e., p=11, q=23), t=10, w =
            2^(2^1)  = 2^2   = 4   (mod 253)
            2^(2^2)  = 4^2   = 16  (mod 253)
            2^(2^3)  = 16^2  = 3   (mod 253)
            2^(2^4)  = 3^2   = 9   (mod 253)
            2^(2^5)  = 9^2   = 81  (mod 253)
            2^(2^6)  = 81^2  = 236 (mod 253)
            2^(2^7)  = 236^2 = 36  (mod 253)
            2^(2^8)  = 36^2  = 31  (mod 253)
            2^(2^9)  = 31^2  = 202 (mod 253)
            2^(2^10) = 202^2 = 71  (mod 253)
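The demo on this slide can be reproduced with a short loop. A minimal sketch, assuming base a=2 as in the demo rows; the function name is hypothetical. Each squaring needs the result of the previous one, which is why extra machines do not help the solver.

```python
def solve_by_squaring(n, a, t):
    """Compute w = a^(2^t) mod n the slow way: t sequential squarings."""
    w = a % n
    trace = []
    for _ in range(t):
        w = (w * w) % n  # each step depends on the previous value
        trace.append(w)
    return w, trace


w, trace = solve_by_squaring(n=253, a=2, t=10)
```

With n=253 and t=10 the intermediate values in `trace` match the ten rows of the demo, ending at w = 71.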
  12. Implementation: mod_oai, CRATE • Both based on (Smith, 2008) • mod_oai – an Apache module providing OAI-PMH functionality for an entire web site, not just, for example, records in an institutional repository • CRATE – a model for encoding resource + associated metadata – implemented using MPEG-21 DIDL complex object format
  13. mod_oai mechanics
      Integrate OAI-PMH functionality into the web server itself…
        1. Use mod_oai • an Apache 2.0 module • automatically answers OAI-PMH requests for an http server • written in C • respects values in .htaccess, httpd.conf
        2. Install mod_oai on http://www.foo.edu/
        3. Define baseURL: http://www.foo.edu/modoai
      → Result: web harvesting with OAI-PMH semantics (e.g., from, until, sets)
      http://www.foo.edu/modoai?verb=ListRecords&metadataPrefix=oai_didl&from=2004-09-15&set=mime:video:mpeg
      Read as: using OAI-PMH, from site foo, give me all resources and their preservation metadata, dating from 9/15/2004 through today, that are MIME type video-MPEG
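The harvest request on this slide is an ordinary OAI-PMH ListRecords URL, so it can be assembled with the Python standard library. A minimal sketch: the helper name and its parameters are hypothetical, while `verb`, `metadataPrefix`, `from`, `until`, and `set` are the standard OAI-PMH request arguments. No request is sent; the code only builds the URL.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, from_date=None,
                     until=None, set_spec=None):
    """Build an OAI-PMH ListRecords URL with optional selective-harvest arguments."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if until:
        params["until"] = until
    if set_spec:
        params["set"] = set_spec
    # Keep ":" unescaped so the set name reads as on the slide.
    return base_url + "?" + urlencode(params, safe=":")


url = list_records_url(
    "http://www.foo.edu/modoai",
    metadata_prefix="oai_didl",
    from_date="2004-09-15",
    set_spec="mime:video:mpeg",
)
```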
  14. OAI-PMH Data Model
  15. MPEG-21 DIDL Resource Structure
  16. An Active Repository
  17. A Dying Repository Records e1, f2, g3 are recoverable; record h is lost.
  18. Dynamic Time-Locked Record Embargo within mod_oai • Identification – Calculation of remaining record embargo period • Encryption – Calculating record time-lock puzzle complexity – Time-Lock Puzzle creation • Encapsulation – exploiting flexibility of MPEG-21 DIDL format to encapsulate encrypted resources and related information
  19. Identification update OAI-PMH datestamp as time lock becomes weaker
  20. Encryption • Modification of LCS35 Time Capsule Crypto-Puzzle to use time lock on entire resource (not just the key) – as per code provided at: http://people.csail.mit.edu/rivest/lcs35-puzzle-description.txt • Input: timeUnit (controls puzzle complexity) • Compute: u = 2^t mod ((p-1)(q-1)); w = 2^u mod n; z = resource ⊕ w • Output: n, t, z
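The asymmetry between locking and unlocking can be sketched as follows. This is an illustration under stated assumptions, not the mod_oai implementation: the function names are hypothetical, the resource is treated as one big-endian integer before the XOR, and the toy primes from the earlier demo are far too small to hide anything. The creator knows p and q and reduces the exponent modulo φ(n); the solver only has (n, t, z) and must square t times.

```python
def lock(resource: bytes, p: int, q: int, t: int):
    """Creator side: knows p and q, so w costs two modular exponentiations."""
    n = p * q
    phi = (p - 1) * (q - 1)
    u = pow(2, t, phi)          # u = 2^t mod phi(n)
    w = pow(2, u, n)            # w = 2^u mod n, equal to 2^(2^t) mod n
    z = int.from_bytes(resource, "big") ^ w
    return n, t, z


def unlock(n: int, t: int, z: int, length: int) -> bytes:
    """Solver side: no p or q, so w takes t sequential squarings."""
    w = 2 % n
    for _ in range(t):
        w = (w * w) % n
    return (z ^ w).to_bytes(length, "big")


n, t, z = lock(b"hi", p=11, q=23, t=10)
plain = unlock(n, t, z, length=2)
```

The `length` argument is an assumption of this sketch; it is only needed to turn the recovered integer back into bytes.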
  21. Encapsulation
      This version of the record is 7 of 12 separate encryptions, each of which is successively easier to break. It will take approximately 3650 hours of computation to break this time-lock. The next update will be available on 2008-01-16T20:56:15Z.
      Crypto-Puzzle for LCS35 Time Capsule.
      Puzzle parameters (all in decimal):
      n = 398399
      t = 264600000
      z = 313239174518025552773909388461801735302388...
          893375562056859914777144518879488573607906...
          742437030171894184996228671834511813009803...
          (many lines deleted for space)
      To solve the puzzle, first compute w = 2^(2^t) (mod n). Then exclusive-or the result with z. (Right-justify the two strings first). The result is the secret message (8 bits per character).
  22. Selecting Appropriate Values of t • Time required to break the puzzle depends on processor speed • Given our projected short embargo period (6-24 months), we made the simplifying assumption that processor speed increases linearly (not exponentially, as Moore’s law would predict) – idea: in the next few months, you’re more likely to see something like 2 GHz→2.2 GHz, not 2 GHz→4 GHz • recall: t = number of squarings – t = T*S – S = 3000 squarings/second, T = 1800 seconds * tU – tU = f(machine speed) * embargolength – t = 3000*1800*tU
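The relation t = S*T with T = 1800 seconds * tU is simple enough to check by hand or in code. A minimal sketch; the constant and function names are hypothetical, and the values 3000 and 1800 are the ones given on this slide.

```python
SQUARINGS_PER_SECOND = 3000   # S, from the slide
SECONDS_PER_TIME_UNIT = 1800  # one tU corresponds to 1800 seconds of squaring


def squarings_for(time_units: int) -> int:
    """t = S * T, where T = 1800 seconds * tU."""
    return SQUARINGS_PER_SECOND * SECONDS_PER_TIME_UNIT * time_units


t = squarings_for(49)
```

As a sanity check, tU = 49 gives t = 264600000, the same value of t that appears in the sample encapsulated record.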
  23. Effect of Computation Speed on embargolength • We broke time lock puzzles on four classes of machines (in GHz): – 1.8 (5 nodes) – 1.6 (26 nodes) – 1 (1 node) – 0.75 (1 node)
  24. Picking t With Empirical Data using a 1 GHz machine as baseline, projecting for a 2.5 GHz machine, and locking for 2 years (63115200 seconds): • tU = 63115200 * 2.5 / (1727.61) = 9133 • t = 3000 * 1800 * 9133
  25. Experimental Evaluation • Embargolength = 365 days • Embargodecrement = 12 • Test website – 525 files – 17.3 MB data – 63% text files – Average file size = 33KB
  26. Harvesting Time: Locked & Unlocked
  27. O(n^2) time to create time-lock puzzle
  28. Solution: Break Files Into “Chunks” • Lock time grows quadratically, O(n^2), with file size • Idea: break file into series of small chunks – still O(n^2), but with a much more favorable constant • Lock-time on a 1.8 GHz machine: time_to_lock(200 KB) = 13 sec; time_to_lock(100 KB) = 3 sec; 200 KB locked as two 100 KB chunks = 3 sec + 3 sec = 6 sec
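The saving from chunking follows from the quadratic cost model. A rough sketch, assuming lock time is exactly c * size^2 with c fitted to the 100 KB measurement on this slide; the function names are hypothetical and the model ignores any fixed per-chunk overhead.

```python
def fit_constant(size_kb: float, seconds: float) -> float:
    """Fit c in time = c * size^2 from one measurement."""
    return seconds / size_kb ** 2


def whole_file_time(size_kb: float, c: float) -> float:
    """Predicted time to lock the file as a single puzzle."""
    return c * size_kb ** 2


def chunked_time(size_kb: int, chunk_kb: int, c: float) -> float:
    """Predicted time to lock the file as independent chunks."""
    full_chunks, remainder = divmod(size_kb, chunk_kb)
    return full_chunks * c * chunk_kb ** 2 + c * remainder ** 2


c = fit_constant(100, 3)                # 100 KB took 3 sec
single = whole_file_time(200, c)        # model predicts about 12 sec
two_chunks = chunked_time(200, 100, c)  # about 6 sec
```

The model predicts about 12 seconds for the unchunked 200 KB file, close to the 13 seconds measured, and 6 seconds for two 100 KB chunks, matching the slide's arithmetic.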
  29. 10 KB Chunked Encryption in mod_oai
  30. MPEG-21 DIDL document With Chunks This record has been split into 10000-byte chunks for faster processing. This is part 1 of 7 chunks, with unlocked chunks to be reassembled in the specified order.
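Splitting a resource into numbered 10000-byte chunks and reassembling them in order can be sketched as below. This is only an illustration of the bookkeeping: the function names are hypothetical, and the actual DIDL encoding and the per-chunk time-lock are omitted.

```python
def split_into_chunks(data: bytes, chunk_size: int = 10_000):
    """Return (part_number, chunk) pairs, numbered from 1."""
    return [
        (index + 1, data[offset:offset + chunk_size])
        for index, offset in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks) -> bytes:
    """Join chunks by part number, whatever order they arrive in."""
    return b"".join(part for _, part in sorted(chunks, key=lambda c: c[0]))


resource = bytes(range(256)) * 254          # 65024 bytes of sample data
parts = split_into_chunks(resource)
restored = reassemble(list(reversed(parts)))
```

A resource of 65024 bytes yields 7 parts, the same count as in the sample record text above; each part would then be locked as its own puzzle.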
  31. Future Considerations • Chunk size performance dependency • Other optimization methods: – Parallel time-locking of resources – Data pre-locking – only time-lock encryption key, use other encryption methods on the original resource (as per original Rivest (1996), not as per http://people.csail.mit.edu/rivest/lcs35-puzzle-description.txt)
  32. Conclusions • Suggest the use of time lock puzzles for dissemination of embargoed records – complement to other methods such as LOCKSS, Portico, etc. • Implemented and evaluated time lock puzzles in the mod_oai & CRATE environment • Full paper: – http://doi.acm.org/10.1145/1555400.1555430 – http://www.cs.odu.edu/~mln/pubs/jcdl09/jcdl09-time-lock.pdf
