See conference video - http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011
Description of Archive-It, the Internet Archive's subscription, self-serve web archiving service, focused on the full-text search system. With nearly 200 partners and over 2000 collections the custom Lucene-based system handles 3+ million index updates per day across an index that totals over 1.3 billion documents. This session will give a detailed description of the architecture and implementation of the Archive-It search system; highlighting many of the challenges due to the scale as well as complex use cases.
Clipping is a handy way to collect important slides you want to go back to later.