See conference video - http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011
Solr features a little-known internal document processing pipeline called the UpdateRequestProcessorChain, or simply the UpdateChain.
In this talk we'll discuss the importance of document processing, where the UpdateChain works well, and what its limitations are. We'll then propose a range of possible improvements.
Examples of use with demo
How to write your own UpdateProcessor, best practices
Example: Tika as an UpdateProcessor
A vision for future improvements
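
To give a feel for the pattern behind the UpdateChain, here is a minimal, self-contained sketch of a chain of processors in plain Java. It is a simplified model and not Solr's actual API: the class names `Processor`, `TrimProcessor`, `TitleLengthProcessor`, `CollectingProcessor` and `UpdateChainSketch` are all hypothetical, and a document is modelled as a plain `Map`. The one idea it borrows from Solr is that each processor holds a reference to the next one and passes the document along after doing its own work.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for one step in an update chain (hypothetical, not Solr's class).
abstract class Processor {
    private final Processor next; // null marks the end of the chain

    Processor(Processor next) {
        this.next = next;
    }

    // Default behaviour: hand the document to the next processor, if there is one.
    void processAdd(Map<String, Object> doc) {
        if (next != null) {
            next.processAdd(doc);
        }
    }
}

// Trims leading and trailing whitespace from every String field value.
class TrimProcessor extends Processor {
    TrimProcessor(Processor next) {
        super(next);
    }

    @Override
    void processAdd(Map<String, Object> doc) {
        doc.replaceAll((field, value) ->
                value instanceof String ? ((String) value).trim() : value);
        super.processAdd(doc);
    }
}

// Adds a derived field holding the character length of "title", when present.
class TitleLengthProcessor extends Processor {
    TitleLengthProcessor(Processor next) {
        super(next);
    }

    @Override
    void processAdd(Map<String, Object> doc) {
        Object title = doc.get("title");
        if (title instanceof String) {
            doc.put("title_length", ((String) title).length());
        }
        super.processAdd(doc);
    }
}

// Terminal step: collects documents in a list as a stand-in for indexing.
class CollectingProcessor extends Processor {
    final List<Map<String, Object>> indexed = new ArrayList<>();

    CollectingProcessor() {
        super(null);
    }

    @Override
    void processAdd(Map<String, Object> doc) {
        indexed.add(doc);
    }
}

public class UpdateChainSketch {
    // Builds the chain trim -> title length -> collect and runs one document through it.
    static Map<String, Object> run(String title) {
        CollectingProcessor sink = new CollectingProcessor();
        Processor chain = new TrimProcessor(new TitleLengthProcessor(sink));

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("title", title);
        chain.processAdd(doc);
        return sink.indexed.get(0);
    }

    public static void main(String[] args) {
        // prints {id=1, title=Hello Solr, title_length=10}
        System.out.println(run("  Hello Solr  "));
    }
}
```

Order matters in a chain like this: the trim step runs before the length step, so the derived field reflects the cleaned-up title rather than the raw input.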