Presented by Kai Chan | UCLA - See complete conference videos - http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012
UCLA Communication Studies Archive hosts a collection of over 100,000 hours of digital television news, updated daily. Its search engine provides closed captioning search and online streaming of videos. The search engine allows researchers and students in various fields to study television news, images and language usage, in ways that were not possible before. In this presentation, we will show the setup of our Lucene/Solr-powered search engine, as well as how it is being used. We will discuss our work on custom result formats, such as linking search result text to the video at particular timestamps, counting occurrences of words, phrases or patterns, grouping the result by fields such as month or show, and creating interactive charts. We will also discuss our work on extending Lucene’s proximity searches, and creating custom query types, such as segment-enclosed (two or more words, phrases or patterns occurring within a story-based text segment), time-enclosed (two or more words, phrases or patterns occurring within a certain time), and multi-word regular expression queries. Future goals will also be discussed, such as supporting multiple languages, multiple sources (speech-to-text along side closed-captioning text), searching user-contributed and generated metadata (programs that identify story segments, objects in video, etc.), and syntactic tags (such as parts of speech).