An Introduction to NLP4L (Scala by the Bay / Big Data Scala 2015), Koji Sekiguchi
The document discusses NLP4L, a natural language processing tool for Apache Lucene. It aims to improve search experiences by using Lucene's index as a corpus database and providing preprocessing, algorithms, and applications like transliteration. The presentation covers what NLP4L is, how NLP can improve search, evaluating search results, and structuring unstructured documents.
The document introduces NLP4L, a natural language processing tool for Apache Lucene. NLP4L aims to improve search experiences by utilizing NLP/ML technologies to generate models, dictionaries and indexes from text corpora. It provides a framework that allows plug-ins for various NLP tasks like transliteration, named entity extraction and document classification. The framework also includes interfaces for users to examine and modify generated outputs.
The document discusses updates and new features in Lucene/Solr 3.2-3.4 including adding documents with IndexWriter, using TieredMergePolicy for indexing, and the deprecated update.processor chain. It also covers search features like grouping, term queries, explanations from Carrot2, and caching as well as schema changes like new analyzers and omitting positions. The document lists admin tools for merging indexes, unloading with deletion, core creation with properties, and upgrading indexes. It notes some removed contrib modules and provides a link for job opportunities.
This document summarizes new features and changes in Lucene/Solr 3.1, including updated analyzers and tokenizers, spatial search enhancements, a new fast vector highlighter, other improvements like an N-gram field type and range facets, and mentions of related open source projects like lucene-gosen and rondhuit-uima.