Information Retrieval with Deep Learning



A talk given at SGI on information retrieval with deep learning, using deeplearning4j.



  1. Information Retrieval with Deep Autoencoders. Adam Gibson, Zipfian Academy, Big Data Science, 5-17-14
  2. Overview ● Overview of IR and question answering systems ● Overview of deep learning ● Tying the two together
  3. Information Retrieval (IR) Overview
  4. Components of an IR QA System ● Data Ingestion and Indexing ● Types of Answers ● Relevance Measure ● Search Queries ● Types of Information Retrieval ● Answer Disambiguation
  5. Data Ingestion and Indexing ● Solr/Elasticsearch/Lucene ● Focus on text: typically a custom process for inputting data ● Indexing can be done on a per-token (per-word) basis ● Typical preprocessing: named-entity recognition, relation extraction, and augmenting the index with certain kinds of information
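As a concrete illustration of per-token indexing, here is a minimal inverted index in Python. It is a toy stand-in for what Lucene-based engines such as Solr or Elasticsearch do internally; the tokenizer and sample documents are hypothetical.

```python
from collections import defaultdict

def tokenize(text):
    # Naive lowercase/whitespace tokenizer; real engines use full analyzers.
    return text.lower().split()

def build_inverted_index(docs):
    # Map each token to the set of document ids containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

docs = {0: "deep learning for information retrieval",
        1: "retrieval with deep autoencoders"}
index = build_inverted_index(docs)
print(index["retrieval"])  # {0, 1}
```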
  6. Relevance Measure ● Cosine similarity measures how “close” a document is to a query: the query is vectorized via bag-of-words and compared to each document, also vectorized. Both vectors are typically weighted with TF-IDF before comparison.
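A minimal numpy sketch of this relevance measure: cos(q, d) = (q · d) / (‖q‖ ‖d‖) over TF-IDF-weighted bag-of-words vectors. The toy term-count matrix is illustrative.

```python
import numpy as np

def tf_idf(counts):
    # counts: (n_docs, n_terms) raw term counts.
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    df = (counts > 0).sum(axis=0)                  # document frequency per term
    idf = np.log(counts.shape[0] / np.maximum(df, 1))
    return tf * idf

def cosine_similarity(q, d):
    # cos(q, d) = (q . d) / (|q| |d|)
    denom = np.linalg.norm(q) * np.linalg.norm(d)
    return float(q @ d / denom) if denom else 0.0

# Toy corpus: rows are documents, columns are vocabulary terms.
counts = np.array([[2, 0, 1],
                   [0, 3, 1],
                   [1, 1, 0]], dtype=float)
weighted = tf_idf(counts)
query = weighted[0]  # pretend the query was vectorized the same way
scores = [cosine_similarity(query, doc) for doc in weighted]
```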
  7. Different Kinds of QA ● Definition (several relevant paragraphs concatenated into one) ● Direct answer (Watson) ● Interactive (Siri, voice assistants)
  8. Question Answering Systems ● Are a step beyond search engines… ● Have several data sources from which a question can be answered ● A classifier for the question type is used on the query ● Several data sources can answer different kinds of questions ● This is used to compile a list of question-answering candidates, or “documents most likely to succeed”
  9. Question Answering Systems (cont.) ● Take the input text and classify it, to determine the type of question ● Use different answer sources (search engines, triple stores/graph databases, and computational engines such as Wolfram Alpha) ● Compile a list of answer candidates from each source ● Rank each answer candidate by relevance
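A sketch of this candidate-compilation loop. The question-type classifier and the answer source below are hypothetical placeholders, standing in for trained models and real backends (a search engine, a triple store, Wolfram Alpha).

```python
def classify_question(question):
    # Placeholder question-type classifier; a real system would run
    # a trained model over the query text.
    if question.lower().startswith(("who", "when", "where")):
        return "factoid"
    return "definition"

def answer_candidates(question, sources):
    # Query every source that handles this question type, then rank
    # the pooled candidates by their relevance score.
    qtype = classify_question(question)
    candidates = []
    for source in sources:
        if qtype in source["handles"]:
            candidates.extend(source["search"](question))
    return sorted(candidates, key=lambda c: c["score"], reverse=True)

# Hypothetical source: a function returning scored candidate documents.
toy_source = {"handles": {"factoid", "definition"},
              "search": lambda q: [{"text": "some passage", "score": 0.7}]}
ranked = answer_candidates("Who built Watson?", [toy_source])
```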
  10. Deep Learning with DBNs – Use Cases: ● Any NLP (Collobert and Weston 2011) ● Sound with phonetics (Mohamed, Dahl, Hinton) ● Computer vision (Lee, Grosse, Ng) ● Watson (DeepQA) ● Image search via object recognition (Google) ● Recommendation engines (Netflix)
  11. Restricted Boltzmann Machines ● Units – Binary, Gaussian, Rectified Linear, Softmax, Multinomial ● Hidden/visible units – visible units learn the data; hidden units model the partition function ● Contrastive divergence is used for learning the weights ● Positive phase: learn the inputs (visible); negative phase: balance out with respect to the partition function (hidden) ● Energy functions for real-valued and binary inputs were shown on the slide
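A minimal CD-1 update for a binary-binary RBM in numpy, matching the positive and negative phases above; the layer sizes and learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_vis, b_hid, lr=0.1):
    # Positive phase: drive hidden units from the data.
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # Negative phase: one Gibbs step back to a reconstruction.
    v1_prob = sigmoid(h0 @ W.T + b_vis)
    h1_prob = sigmoid(v1_prob @ W + b_hid)
    # Update: data correlations minus reconstruction correlations.
    W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / len(v0)
    b_vis += lr * (v0 - v1_prob).mean(axis=0)
    b_hid += lr * (h0_prob - h1_prob).mean(axis=0)

n_visible, n_hidden = 6, 4
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b_vis, b_hid = np.zeros(n_visible), np.zeros(n_hidden)
batch = (rng.random((8, n_visible)) < 0.5).astype(float)
cd1_step(batch, W, b_vis, b_hid)
```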
  12. Results – Feature Learning
  13. More Results – Feature Learning
  14. General Architecture ● Stacked restricted Boltzmann machines – compose to learn higher-level correlations in the data – creates feature extractors ● Use any sort of output layer with different objective functions to do different tasks: ● Logistic/softmax regression – negative log-likelihood (classification) ● Mean squared error – regression ● Cross entropy – reconstruction
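A sketch of the stack used as a feature extractor with a softmax classification head and a negative log-likelihood loss. The weights here are random stand-ins for greedily pretrained RBM parameters; all sizes are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward(x, rbm_weights):
    # Each trained RBM becomes a deterministic feature extractor:
    # propagate mean hidden activations layer by layer.
    for W, b in rbm_weights:
        x = sigmoid(x @ W + b)
    return x

def nll(probs, labels):
    # Negative log-likelihood for the softmax classification head.
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

# Illustrative stack: 20 -> 12 -> 6 features, then a 3-way softmax.
rng = np.random.default_rng(0)
sizes = [(20, 12), (12, 6)]
stack = [(0.1 * rng.standard_normal(s), np.zeros(s[1])) for s in sizes]
W_out = 0.1 * rng.standard_normal((6, 3))

x = rng.random((5, 20))
features = forward(x, stack)
probs = softmax(features @ W_out)
loss = nll(probs, np.array([0, 1, 2, 0, 1]))
```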
  15. Deep Learning and QA Systems ● Part of the problem with answer-candidate searches is speed: they’re slow ● Each question to be answered is computationally intensive ● Deep learning allows for fast lookup of various kinds of answer candidates by encoding them ● Deep autoencoders allow for the encoding and decoding of images as well as text
  16. Deep Autoencoders ● Deep autoencoders are two deep-belief networks: ● The first is a series of RBMs that encode the input into a very tiny set of numbers, also called the codes ● The codes are what’s indexed and stored in search ● The second DBN is the decoder, which reconstructs the results from the codes
  17. Architecture Visualization
  18. The Encoding Layer: A How-To ● Take the input, and make the first hidden layer slightly bigger than that input; that allows for more information representation on the first layer ● Progressively decrease the hidden-layer sizes at each layer until you get to the final, coding, layer, which is a small number of units (10–30) ● Make the final hidden layer’s output linear (i.e., real numbers); linear is pass-through
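The sizing recipe as a sketch, assuming illustrative layer sizes: a 1000-term input widened slightly, then shrunk to a 30-value linear code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_encoder(layer_sizes, rng):
    # One (W, b) pair per layer transition.
    return [(0.01 * rng.standard_normal((m, n)), np.zeros(n))
            for m, n in zip(layer_sizes, layer_sizes[1:])]

def encode(x, encoder):
    # Sigmoid at every hidden layer; the final code layer is linear.
    *hidden, (W_code, b_code) = encoder
    for W, b in hidden:
        x = sigmoid(x @ W + b)
    return x @ W_code + b_code

rng = np.random.default_rng(0)
# First layer slightly bigger than the input, then shrinking to the code.
sizes = [1000, 1200, 500, 100, 30]
encoder = make_encoder(sizes, rng)
codes = encode(rng.random((4, 1000)), encoder)  # shape (4, 30)
```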
  19. Decoding Layer ● Transpose the matrices of the encoder and reverse the order of its layers ● Each parameter, after training the encoder, is used to create the decoding net ● The decoder’s hidden layers mirror the encoder’s in reverse ● The output layer of the decoder is then trained to reconstruct the input
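Continuing the encoder sketch above: the decoder is built by reversing the encoder’s layers and transposing each weight matrix. Decoder biases start at zero here and would be learned during reconstruction training.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_decoder(encoder):
    # Reverse the encoder's layer order and transpose each weight
    # matrix; each decoder layer gets its own trainable bias.
    return [(W.T, np.zeros(W.shape[0])) for W, _ in reversed(encoder)]

def decode(codes, decoder):
    # Push the codes back up through the mirrored layers.
    x = codes
    for W, b in decoder:
        x = sigmoid(x @ W + b)
    return x

# Continuing from the previous sketch's `encoder` and `codes`:
# decoder = make_decoder(encoder)
# reconstruction = decode(codes, decoder)   # shape (4, 1000)
```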
  20. Connecting the Dots ● Deep autoencoders can assist in creating answer candidates for information-retrieval systems ● This works for image or text search ● This technique is called semantic hashing
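Semantic hashing in miniature: binarize the learned codes into bit vectors and retrieve documents whose codes fall within a small Hamming radius of the query’s code. The threshold, radius, and random codes here are illustrative.

```python
import numpy as np

def binarize(codes, threshold=0.0):
    # Turn real-valued codes into bit vectors for hashing.
    return (codes > threshold).astype(np.uint8)

def hamming_neighbors(query_bits, index_bits, radius=2):
    # Documents whose codes differ from the query in <= radius bits.
    dists = (index_bits != query_bits).sum(axis=1)
    return np.where(dists <= radius)[0]

rng = np.random.default_rng(0)
doc_codes = rng.standard_normal((100, 30))   # stand-in for encoder output
index_bits = binarize(doc_codes)
query_bits = binarize(doc_codes[7])          # a query encoded the same way
hits = hamming_neighbors(query_bits, index_bits)  # includes doc 7
```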
  21. PCA Results
  22. Some Results