24. Sentence similarity
Documents are converted to vectors using a tf-idf vectorizer.
A similarity model is built from these tf-idf vectors and is used
to calculate a similarity score for a new document.
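
The tf-idf and similarity steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the vectorizer or similarity model of any particular library: the whitespace tokenizer, the smoothed idf formula, the use of cosine similarity as the score, and all function names and sample sentences are assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def fit_idf(docs):
    # Document frequency: in how many documents each term appears.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    # Assumed smoothed idf variant: log((1 + n) / (1 + df)) + 1.
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(text, idf):
    # Sparse vector as a dict; terms unseen when fitting are ignored.
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample corpus.
corpus = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "stock prices fell sharply today",
]
idf = fit_idf(corpus)
corpus_vectors = [tfidf_vector(doc, idf) for doc in corpus]


def similarity_scores(new_doc):
    # Score a new document against every document in the corpus.
    query = tfidf_vector(new_doc, idf)
    return [cosine(query, vec) for vec in corpus_vectors]


scores = similarity_scores("a cat on a mat")
best = max(range(len(corpus)), key=scores.__getitem__)
print(best, scores)
```

The new document shares words only with the first corpus sentence, so that sentence gets the only non-zero score. Note that "cat" and "cats" are treated as different terms here because no stemming is applied.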
25. Topic Modelling
LDA – Latent Dirichlet Allocation
-Unsupervised learning; the number of topics must be chosen
ahead of time
-Each document is represented as a distribution over topics
-Each topic is represented as a distribution over words
The algorithm converts each document into a vector whose dimension equals
the number of topics; each value in the vector indicates how
strongly the document is associated with that topic.
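
To make the document-to-topic vector concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling, using only the Python standard library. Gibbs sampling is one of several ways to fit LDA and is chosen here for brevity; library implementations often use variational inference instead. The function name lda_gibbs, the hyperparameter values (alpha, beta, iterations) and the toy documents are assumptions for illustration.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA on tokenized docs; return (theta, phi, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables used by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start by assigning every word occurrence to a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Weight of each topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # theta: one distribution over topics per document.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    # phi: one distribution over words per topic.
    phi = [
        [(c + beta) / (topic_total[k] + vocab_size * beta) for c in topic_word[k]]
        for k in range(num_topics)
    ]
    return theta, phi, vocab


# Hypothetical toy corpus, already tokenized.
docs = [
    ["cat", "dog", "pet", "cat", "dog"],
    ["dog", "pet", "vet", "cat"],
    ["stock", "market", "price", "stock"],
    ["market", "price", "trade", "stock"],
]
theta, phi, vocab = lda_gibbs(docs, num_topics=2)
for row in theta:
    print([round(p, 3) for p in row])
```

Each row of theta is the vector described above: it has one entry per topic, the entries sum to 1, and a larger entry means the document is more strongly associated with that topic. Each row of phi is the corresponding distribution over words for one topic.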