July 27, 2011 Bay Area Search Presentation
Brian Johnson, Engineering Director, Query Services @ eBay
Query expansion is an important part of search recall for all search engines. In this talk I'll discuss some of the general trends driving Hadoop adoption within the Search Query Services team at eBay, and the types of algorithms and techniques we've moved to Hadoop. Over time we've moved from smaller, editorial data sets to large, machine-generated data sets mined from behavior log data, items/listings, catalogs, etc. One common workflow is to mine large data sets of candidate rewrites/expansions from multiple data sources, use crowdsourced human judgment to classify a subset of the candidates (true positive, false positive), use machine learning techniques to discard false positives, run automated validation on the final data set, and automatically push to production.
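The workflow described above (mine candidates, label a subset, learn a filter, validate, push) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not eBay's pipeline: the `Candidate` record, the single `support` feature, and the one-feature threshold standing in for a real machine learning classifier are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    query: str    # original query term
    rewrite: str  # proposed rewrite/expansion
    support: int  # hypothetical feature: how often the pair was observed


def fit_threshold(labeled):
    """Pick the support cutoff that best separates the human-labeled pairs.

    `labeled` is a list of (Candidate, is_true_positive) tuples. A one-feature
    threshold is a stand-in for whatever classifier is actually used.
    """
    best_cutoff, best_correct = 0, -1
    for cutoff in sorted({c.support for c, _ in labeled}):
        correct = sum((c.support >= cutoff) == label for c, label in labeled)
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return best_cutoff


def validate(rewrites):
    """Automated sanity check: no empty or self-referential rewrites."""
    return all(c.query and c.rewrite and c.query != c.rewrite for c in rewrites)


def build_rewrite_set(candidates, labeled):
    """Filter mined candidates with the learned cutoff, then validate."""
    cutoff = fit_threshold(labeled)
    kept = [c for c in candidates if c.support >= cutoff]
    if not validate(kept):
        raise ValueError("validation failed; not pushing to production")
    return kept
```

In this sketch only a small labeled subset is needed to fit the filter, which is then applied to every mined candidate, mirroring the "classify a subset, then discard false positives" step.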
Ravi Jammalakadaka, Senior Applied Researcher, Query Services @ eBay
Ravi is a real engineer, not a pointy-haired manager like the previous speaker. Expect some real engineering :-) He'll be doing a literature review for acronym mining and discussing a real-world implementation.
Title: Mining Acronyms From Raw Text
Abstract: A significant number of eBay products are known by their acronyms. The eBay query expansion service expands user queries with their acronym equivalents to increase recall. The challenge is to mine acronyms from either seller data (e.g., item descriptions, titles) or buyer data (e.g., queries).
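To make the expansion step concrete, here is a minimal sketch of expanding a query with acronym equivalents in both directions. The function name `expand_query` and the example dictionary are hypothetical; this does not describe the actual eBay service.

```python
def expand_query(query, acronym_to_long):
    """Return the normalized query plus variants with acronyms swapped.

    `acronym_to_long` maps a lowercase acronym to its lowercase long form.
    Matching is done on whole, space-delimited tokens only.
    """
    normalized = " ".join(query.lower().split())
    variants = [normalized]
    padded = f" {normalized} "
    for short, long_form in acronym_to_long.items():
        # Try acronym -> long form, then long form -> acronym.
        for src, dst in ((short, long_form), (long_form, short)):
            if f" {src} " in padded:
                variant = padded.replace(f" {src} ", f" {dst} ").strip()
                if variant not in variants:
                    variants.append(variant)
    return variants
```

Searching with all returned variants is what raises recall: a listing titled with the long form can match a query typed with the acronym, and vice versa.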
Ravi will present state-of-the-art algorithms from recent conferences that mine acronyms from raw text, and discuss their limitations. He will then present a new acronym mining algorithm that seeks to address those limitations, along with a machine learning classifier that seeks to remove the false positives generated by the acronym mining algorithm.
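For readers new to the problem, the sketch below shows one simple, well-known style of acronym mining: look for a parenthesized short form and check whether the initials of the words just before it spell it out. It is a deliberately naive heuristic for illustration, not the algorithm presented in the talk; `mine_acronyms` and the sample titles are assumptions.

```python
import re
from collections import Counter

# A parenthesized token of 2 to 6 letters/digits, e.g. "(DSLR)".
PAREN = re.compile(r"\(([A-Za-z0-9]{2,6})\)")


def mine_acronyms(texts):
    """Count (acronym, long form) pairs found via the 'long form (SHORT)' pattern."""
    counts = Counter()
    for text in texts:
        for match in PAREN.finditer(text):
            short = match.group(1)
            # Words preceding the parenthesis, in order.
            words = re.findall(r"[A-Za-z0-9]+", text[:match.start()])
            window = words[-len(short):]
            # Accept only if each preceding word starts with the matching letter.
            if len(window) == len(short) and all(
                w[0].lower() == ch.lower() for w, ch in zip(window, short)
            ):
                counts[(short.lower(), " ".join(window).lower())] += 1
    return counts


titles = [
    "New digital single lens reflex (DSLR) camera body",
    "Sealed compact disc (CD) in case",
    "Vintage lamp (works) great",
]
pairs = mine_acronyms(titles)
```

A heuristic like this misses acronyms that skip words or never appear next to their long form, and it can still emit spurious pairs, which is why a downstream classifier to remove false positives is useful.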