Information Retrieval With Deep Autoencoders
Adam Gibson
Blix.io
Zipfian Academy
Big Data Science 5-17-14
Overview
● Overview of IR and Question Answering Systems
● Overview of Deep Learning
● Tying the two together
Information Retrieval (IR) Overview
Components of an IR QA System
● Data Ingestion and Indexing
● Types of Answers
● Relevance Measure
● Search Queries
● Types of Information Retrieval
● Answer Disambiguation
Data Ingestion and Indexing
● Solr/Elasticsearch/Lucene
● Focus on text: typically a custom process for inputting data
● Indexing can be done on a per-token (per-word) basis (toy sketch below)
● Typical preprocessing:
  ● Named-entity recognition
  ● Relation extraction
  ● Augment the index with certain kinds of information
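As a toy illustration of per-token indexing (my own sketch, not how Lucene actually stores things; real systems layer NER and relation extraction on top):

```python
# A toy per-token inverted index: map each token to the IDs of the
# documents that contain it.
from collections import defaultdict

def build_inverted_index(docs):
    """Build a token -> set-of-document-ids mapping."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

docs = ["Deep learning for search", "Search engines index tokens"]
index = build_inverted_index(docs)
print(index["search"])  # {0, 1}
```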
Relevance Measure
Cosine similarity measures how “close” a document is to a query.
The query is vectorized via bag-of-words and compared to each
document, also vectorized. Both vectors are typically weighted by TF-IDF.
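A minimal sketch of this measure using scikit-learn (the corpus and query here are made up for illustration):

```python
# TF-IDF bag-of-words vectors for documents and query, ranked by
# cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the cat sat on the mat", "dogs chase cats", "stock markets fell"]
query = ["cat on a mat"]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)   # vectorize the documents
query_vector = vectorizer.transform(query)     # vectorize the query
scores = cosine_similarity(query_vector, doc_vectors)[0]
print(scores.argmax())  # index of the most relevant document (0 here)
```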
Different Kinds of QA
● Definition (several relevant paragraphs concatenated into one)
● Direct answer (Watson)
● Interactive (Siri, voice assistants)
Question Answering Systems
● Are a step beyond search engines…
● Have several data sources from which a question can be answered
● A classifier for the question type is used on the query
● Several data sources can answer different kinds of questions
● This is used to compile a list of question-answering candidates, or “documents most likely to succeed”.
Question Answering Systems (cont.)
● Take the input text and classify it to determine the type of question
● Use different answer sources (search engines, triple stores/graph databases, and computational engines such as Wolfram Alpha)
● Compile a list of answer candidates from each source
● Rank each answer candidate by relevance (a minimal pipeline sketch follows)
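A hypothetical sketch of that pipeline; `classifier`, `sources`, and `rank_by_relevance` are stand-ins for whatever question classifier, answer backends, and relevance measure a real system plugs in:

```python
# Hypothetical end-to-end QA pipeline: classify the question, query the
# sources that handle that question type, pool the candidates, rank them.
def answer(question, classifier, sources, rank_by_relevance):
    qtype = classifier(question)                     # 1. classify the question type
    candidates = []
    for source in sources.get(qtype, []):            # 2. pick sources for that type
        candidates.extend(source.search(question))   # 3. compile answer candidates
    return rank_by_relevance(question, candidates)   # 4. rank by relevance
```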
Deep Learning with DBNs
Use Cases:
● Any NLP (Collobert and Weston, 2011)
● Sound with phonetics (Mohamed, Dahl, and Hinton)
● Computer Vision (Lee, Grosse, Ng)
● Watson (DeepQA)
● Image search via object recognition (Google)
● Recommendation Engines (Netflix)
Restricted Boltzmann Machines
● Units: Binary, Gaussian, Rectified Linear, Softmax, Multinomial
● Hidden/Visible Units: the visible layer learns the data; the hidden layer balances the partition function
● Contrastive Divergence is used for learning the weights
● Positive phase: learn the inputs (visible). Negative phase: balance out w.r.t. the partition function (hidden)
Energy functions for real-valued and binary inputs:
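The slide's equations appeared as images; these are the usual Bernoulli and Gaussian-Bernoulli formulations:

```latex
% Binary (Bernoulli) visible units:
E(\mathbf{v},\mathbf{h}) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i w_{ij} h_j

% Real-valued (Gaussian) visible units:
E(\mathbf{v},\mathbf{h}) = \sum_i \frac{(v_i - a_i)^2}{2\sigma_i^2} - \sum_j b_j h_j - \sum_{i,j} \frac{v_i}{\sigma_i} w_{ij} h_j
```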
Results – Feature Learning
More Results – Feature Learning
General Architecture
● Stacked Restricted Boltzmann Machines: composed to learn higher-level correlations in the data
● Creates feature extractors
Use any sort of output layer with different objective functions to do different tasks (a sketch follows):
● Logistic/Softmax Regression: negative log-likelihood, for classification
● Mean Squared Error: regression
● Cross Entropy: reconstructions
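A sketch of this pattern in PyTorch (my simplification: plain feedforward layers stand in for the stacked RBMs, and the layer sizes are hypothetical):

```python
# One shared feature-extractor stack, with interchangeable output
# layers and losses per task.
import torch.nn as nn

features = nn.Sequential(                # stacked feature extractor
    nn.Linear(784, 500), nn.Sigmoid(),
    nn.Linear(500, 250), nn.Sigmoid(),
)

classifier = nn.Sequential(features, nn.Linear(250, 10))
clf_loss = nn.CrossEntropyLoss()         # softmax regression / negative log-likelihood

regressor = nn.Sequential(features, nn.Linear(250, 1))
reg_loss = nn.MSELoss()                  # mean squared error for regression

reconstructor = nn.Sequential(features, nn.Linear(250, 784), nn.Sigmoid())
rec_loss = nn.BCELoss()                  # cross-entropy for reconstructions
```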
Deep Learning and QA Systems
● Part of the problem with answer-candidate searches is speed: they’re slow.
● Each question to be answered is computationally intensive.
● Deep learning allows for fast lookup of various kinds of answer candidates by encoding them.
● Deep autoencoders allow for the encoding and decoding of images as well as text.
Deep Autoencoders
● Deep autoencoders are two deep-belief networks:
● The first is a series of RBMs that encode the input into a tiny set of numbers, also called the codes.
● The codes are what is indexed and stored in search.
● The second DBN is the decoder, which reconstructs the results from the codes.
Architecture Visualization
The Encoding Layer: A How-To
● Take the input and make the first hidden layer slightly larger than the input. That allows the first layer to represent more information.
● Progressively decrease the hidden-layer sizes at each layer until you reach the final coding layer, which is small (10-30 numbers).
● Make the final hidden layer’s output linear (i.e. real numbers). Linear is pass-through. A minimal sketch follows.
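A minimal PyTorch sketch of such an encoder (RBM pretraining omitted; the 784-1000-500-250-30 sizes follow the classic Hinton and Salakhutdinov setup and are otherwise an assumption):

```python
# Encoder stack: first hidden layer slightly larger than the input,
# then progressively smaller, ending in a small linear code.
import torch.nn as nn

input_dim = 784  # e.g. 28x28 images; an assumption for illustration

encoder = nn.Sequential(
    nn.Linear(input_dim, 1000), nn.Sigmoid(),  # slightly bigger than the input
    nn.Linear(1000, 500), nn.Sigmoid(),        # progressively smaller
    nn.Linear(500, 250), nn.Sigmoid(),
    nn.Linear(250, 30),                        # final coding layer: linear, ~30 numbers
)
```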
Decoding Layer
● Transpose the matrices of the encoder and reverse the order of its layers (see the sketch below).
● Each parameter, after training the encoder, is used to create the decoding net.
● The decoder’s hidden layers are the exact mirror image of the encoder’s.
● The output layer of the decoder is then trained to reconstruct the input.
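A sketch of decoding with the transposed encoder weights, continuing the encoder above (tied weights; a real decoder would also carry trainable biases):

```python
# Build the decoder from the trained encoder: walk its Linear layers in
# reverse, applying the transpose of each weight matrix.
import torch
import torch.nn.functional as F

def decode(code, encoder):
    """Run a code back through the transposed encoder layers."""
    h = code
    linears = [m for m in encoder if isinstance(m, torch.nn.Linear)]
    for i, layer in enumerate(reversed(linears)):
        h = F.linear(h, layer.weight.t())  # transpose of the encoder weight
        if i < len(linears) - 1:
            h = torch.sigmoid(h)           # hidden layers stay nonlinear
    return torch.sigmoid(h)                # final nonlinearity (assumes binary/normalized inputs)
```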
Connecting the Dots
● Deep autoencoders can assist in creating answer candidates for information-retrieval systems.
● This works for image or text search.
● This technique is called semantic hashing (sketched below).
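A minimal sketch of semantic hashing (my own illustration): binarize each code, key a hash table on the bits, and probe nearby buckets by flipping a few bits:

```python
# Semantic hashing: binarized codes become hash-table keys, and lookup
# probes all buckets within a small Hamming distance of the query code.
import numpy as np
from collections import defaultdict
from itertools import combinations

def to_bits(code):
    """Threshold a real-valued code into a tuple of bits."""
    return tuple((np.asarray(code) > 0).astype(int))

def build_index(codes):
    index = defaultdict(list)
    for doc_id, code in enumerate(codes):
        index[to_bits(code)].append(doc_id)
    return index

def lookup(index, query_code, radius=1):
    """Return candidate doc ids whose codes are within `radius` bit flips."""
    bits = to_bits(query_code)
    candidates = list(index.get(bits, []))
    for r in range(1, radius + 1):
        for flip in combinations(range(len(bits)), r):
            near = list(bits)
            for i in flip:
                near[i] ^= 1
            candidates.extend(index.get(tuple(near), []))
    return candidates
```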
PCA Results
Some Results