Presented by Julien Nioche, Director, DigitalPebble
This session will give an overview of Apache Nutch. I will describe its main components and how it fits with other Apache projects such as Hadoop, SOLR, Tika and HBase. The second part of the presentation will focus on the latest developments in Nutch, the differences between the 1.x and 2.x branches, and what we can expect to see in Nutch in the future. This session will cover many practical aspects and should be a good starting point for crawling on a large scale with Apache Nutch and SOLR.