Mime Magic With Apache Tika
by Jukka Zitting on Nov 06, 2009
- 7,234 views
Apache Tika aims to make it easier to extract metadata and structured text content from all kinds of files. Tika is a subproject of Apache Lucene, and leverages libraries like Apache POI and Apache ...
Apache Tika aims to make it easier to extract metadata and structured text content from all kinds of files. Tika is a subproject of Apache Lucene, and leverages libraries like Apache POI and Apache PDFBox to provide a powerful yet simple interface for parsing dozens of document formats. This makes Tika an ideal companion for Apache Lucene, or for any search engine that needs to be able to index metadata and content from many different types of files. This presentation introduces Apache Tika and shows how it's being used in projects like Apache Solr and Apache Jackrabbit. You will learn how to integrate Tika with your application and how to configure and extend Tika to best suit your needs.
Accessibility
Categories
Upload Details
Uploaded via SlideShare as Microsoft PowerPoint
Usage Rights
© All Rights Reserved
Statistics
- Likes
- 0
- Downloads
- 68
- Comments
- 0
- Embed Views
- Views on SlideShare
- 7,146
- Total Views
- 7,234