No more bad news!
News recommendation with ML and NLP.
Samia Khalid and Simon Lia-Jonassen
NTNU Cogito
March 7th, 2019
Contents
00 Introduction
01 Recommender architecture
02 Natural language processing
03 Recommendation model training
04 Demo and further work
Introduction to News Recommendation
Understand the Content of news I read.
Learn my Interests over time.
Recommend news that interests me.
News recommender in a nutshell
https://github.com/s-j/goodnews
Implements three parts:
• Frontend and backend controllers.
• Feed provider and logging.
• NLP, ML and exploration workflow.
Content and feedback signals
Natural Language Processing
Natural Language Processing and Exploration
1. Text Processing
2. Clustering
3. Topic Extraction
1. Text Processing using spaCy
spaCy is a leading open-source library for advanced NLP.
1. Text Processing using spaCy
a. Tokenization
b. Part-of-Speech Tagging
c. Lemmatization
d. Stop words
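A minimal sketch of these four steps with spaCy (assuming the small English model en_core_web_sm has been downloaded):

```python
# Minimal sketch of tokenization, POS tagging, lemmatization and stop-word
# flags with spaCy (assumes `python -m spacy download en_core_web_sm` was run).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Who is the AI research director?")

for token in doc:
    # token.text   -> the token itself (a. tokenization)
    # token.pos_   -> its part-of-speech tag (b.)
    # token.lemma_ -> its base form (c.)
    # token.is_stop -> whether it is a stop word (d.)
    print(token.text, token.pos_, token.lemma_, token.is_stop)
```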
1. Text Processing using spaCy
Dependency Parsing
1. Recognizes a sentence and assigns a syntactic structure to it.
• “Who is the AI research director?”
2. spaCy provides a built-in visualizer (displaCy), sketched below.
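A sketch of the dependency parse and the built-in displaCy visualizer, using the example sentence from the slide (displacy.serve starts a local web server; use displacy.render inside a notebook):

```python
# Sketch: dependency parsing and visualization with spaCy's built-in displaCy.
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Who is the AI research director?")

# Print each token with its dependency label and syntactic head.
for token in doc:
    print(f"{token.text:10s} {token.dep_:10s} <- {token.head.text}")

# Serves an interactive parse tree at http://localhost:5000
# (use displacy.render(doc, style="dep") in a notebook instead).
displacy.serve(doc, style="dep")
```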
1. Text Processing using spaCy
Entity Recognition
1. Locates and classifies named entities in text into pre-defined categories.
2. Can help to answer questions like:
• “Which people, companies and products is the user interested in?”
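A small sketch of entity extraction; the example sentence and the entities it yields are illustrative:

```python
# Sketch: named-entity recognition with spaCy (labels come from the pretrained model).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion, says Tim Cook.")

for ent in doc.ents:
    print(ent.text, ent.label_)
# Expected along the lines of: Apple ORG, U.K. GPE, $1 billion MONEY, Tim Cook PERSON
```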
1. Text Processing using spaCy
Distribution of POS Tags
1. Part-of-Speech Tagging:
• assigns a part of speech, such as noun, verb or adjective, to each token.
2. spaCy uses a statistical model to predict which tag or label most likely applies in the given context.
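One way to compute the POS-tag distribution over a corpus; the `articles` list here is a hypothetical stand-in for the crawled feed:

```python
# Sketch: counting POS tags over many articles, using nlp.pipe for batching.
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")
articles = ["Google releases a new phone.", "Stocks fall as markets react."]

pos_counts = Counter()
for doc in nlp.pipe(articles):
    pos_counts.update(token.pos_ for token in doc)

print(pos_counts.most_common())  # e.g. [('NOUN', ...), ('VERB', ...), ...]
```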
1. Text Processing using spaCy
Word Probabilities: finding the most improbable words (noisy data)
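A hedged sketch of this check: token.prob is a smoothed log-probability that is populated in the larger models such as en_core_web_lg in spaCy 2.x (newer spaCy versions may need the extra lookups data installed):

```python
# Sketch: surface the least probable tokens as likely noise.
import spacy

nlp = spacy.load("en_core_web_lg")
doc = nlp("Breaking: GlxqZt stock plummets after earnings miss.")

# Sort tokens by log-probability; the rarest (most improbable) bubble up first.
rarest = sorted(doc, key=lambda t: t.prob)[:5]
print([(t.text, round(t.prob, 2)) for t in rarest])
```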
1. Text Processing using spaCy
Analyzing top unigrams in clicked articles vs all articles (considering only PROPN and NOUNS)
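A sketch of that comparison; `clicked` and `all_articles` are hypothetical lists of article texts:

```python
# Sketch: top unigrams (PROPN and NOUN only) in clicked vs all articles.
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")

def top_unigrams(texts, n=10):
    counts = Counter()
    for doc in nlp.pipe(texts):
        counts.update(t.lemma_.lower() for t in doc
                      if t.pos_ in ("PROPN", "NOUN") and not t.is_stop)
    return counts.most_common(n)

all_articles = ["Google unveils a new AI chip.", "Oil prices rise in Asia."]
clicked = ["Google unveils a new AI chip."]
print(top_unigrams(clicked))
print(top_unigrams(all_articles))
```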
2. K-Means Clustering
1. Word Vectors as input:
• 300-dimensional vectors to represent words in numerical form.
2. K-Means needs the number of clusters as a parameter:
• Try out different values until satisfied.
• Can use silhouette score and distortion as metrics.
3. PCA for visualizing the results in 2-D (see the sketch below).
2. K-Means Clustering
Note: clusters for the full set of articles.
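A hedged sketch of this clustering workflow with scikit-learn; `doc_vectors` stands in for the real per-article vectors (e.g. averaged spaCy word vectors):

```python
# Sketch: K-Means over article vectors, scored with silhouette and distortion,
# then projected to 2-D with PCA for plotting.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

doc_vectors = np.random.rand(200, 300)  # stand-in for real 300-d article vectors

for k in (3, 5, 8):
    kmeans = KMeans(n_clusters=k, random_state=0).fit(doc_vectors)
    score = silhouette_score(doc_vectors, kmeans.labels_)
    print(k, round(score, 3), round(kmeans.inertia_, 1))  # inertia_ = distortion

points_2d = PCA(n_components=2).fit_transform(doc_vectors)  # for a 2-D scatter plot
```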
3. Topic Modeling: LDA
1. LDA considers two things:
• Each document in a corpus is a weighted combination of several topics, e.g.,
  doc1 -> 0.1 * finance + 0.2 * science + 0.5 * technology, …
• Each topic has its own collection of representative keywords, e.g.,
  technology -> ['computer', 'microsoft', 'google', ...]
3. Topic Modeling: LDA
2. The two probability distributions that the algorithm tries to approximate, starting from a random initialization until convergence:
• For a given document, what is the distribution of topics that describe it?
• For a given topic, what is the distribution of its words, i.e., how important (probable) is each word in defining the topic's nature?
3. Topic Modeling: LDA
Interactive topic visualization with pyLDAvis.
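A hedged sketch with gensim and pyLDAvis (in pyLDAvis 3.x the module is pyLDAvis.gensim_models); `tokenized_docs` is a hypothetical list of token lists from the preprocessing above:

```python
# Sketch: LDA topic model with gensim, visualized interactively with pyLDAvis.
from gensim import corpora, models
import pyLDAvis
import pyLDAvis.gensim  # pyLDAvis.gensim_models in newer pyLDAvis releases

tokenized_docs = [["google", "phone", "launch"], ["stocks", "market", "fall"]]

dictionary = corpora.Dictionary(tokenized_docs)
corpus = [dictionary.doc2bow(doc) for doc in tokenized_docs]

lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
print(lda.print_topics())

vis = pyLDAvis.gensim.prepare(lda, corpus, dictionary)
pyLDAvis.save_html(vis, "lda_topics.html")  # open in a browser to explore topics
```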
Recommendation Model Training
Model training
Preprocessing
1. Join request and feedback logs.
• Alternative: use a third-party dataset.
2. Use #clicks > 0 as a positive label.
• Alternative 1: use #clicks / #views.
• Alternative 2: use click order.
• Alternative 3: get explicit feedback.
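A minimal pandas sketch of this step; the file and column names (request_log.json, feedback_log.json, article_id) are hypothetical, not the repository's actual schema:

```python
# Sketch: join request and feedback logs, derive a binary click label.
import pandas as pd

requests = pd.read_json("request_log.json", lines=True)   # one row per shown article
feedback = pd.read_json("feedback_log.json", lines=True)  # one row per click event

clicks = feedback.groupby("article_id").size().reset_index(name="clicks")
data = requests.merge(clicks, on="article_id", how="left").fillna({"clicks": 0})

data["label"] = (data["clicks"] > 0).astype(int)  # positive label: clicked at least once
```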
Model training
NLP features and Train/Test split
1. Use title and description to get:
• A bag of named entities, such as person or org (using spaCy).
• A bag of key terms from the semantic network (using Textacy).
• A normalized sum over key-term embedding vectors found in the GoogleNews word2vec dataset.
2. Hold out 20% of items for testing.
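A hedged sketch of the three feature views plus the 20% hold-out. The textacy key-term API differs across versions, so a simple POS-based stand-in is used here; the GoogleNews vectors load via gensim, and `data` is the joined log frame from the previous sketch:

```python
# Sketch: three NLP feature views per article plus a 20% test hold-out.
import numpy as np
import spacy
from gensim.models import KeyedVectors
from sklearn.model_selection import train_test_split

nlp = spacy.load("en_core_web_sm")
w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def nlp_features(text):
    doc = nlp(text)
    entities = [e.text.lower() for e in doc.ents if e.label_ in ("PERSON", "ORG")]
    key_terms = [t.lemma_.lower() for t in doc if t.pos_ in ("PROPN", "NOUN")]
    vectors = [w2v[t] for t in key_terms if t in w2v]
    emb = np.sum(vectors, axis=0) if vectors else np.zeros(300)
    norm = np.linalg.norm(emb)
    return entities, key_terms, emb / norm if norm else emb

train, test = train_test_split(data, test_size=0.2, random_state=42)
```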
Model training
Pipeline based on entities
1. One-hot-encode entities to get a sparse vector.
2. Compensate popularity skew using Inverse Document Frequency (IDF).
3. Train a classifier using Gradient Boosting Decision Trees (GBDT).
Note that we have a small, very skewed and noisy dataset, so we are not expecting good classification performance.
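A hedged scikit-learn sketch of this pipeline; `entities_text` is a hypothetical column holding each article's entities joined into one string:

```python
# Sketch: one-hot bag of entities with IDF re-weighting, then a GBDT classifier.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import GradientBoostingClassifier

entity_pipeline = Pipeline([
    # binary=True gives one-hot counts; IDF then downweights very popular entities.
    ("tfidf", TfidfVectorizer(binary=True, norm=None)),
    ("gbdt", GradientBoostingClassifier(n_estimators=100)),
])

entity_pipeline.fit(train["entities_text"], train["label"])
print(entity_pipeline.score(test["entities_text"], test["label"]))
```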
Model training
Pipeline based on semantic key terms
1. Hash-merge features into 100 buckets.
2. Train a GBDT classifier.
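One possible reading of "hash-merge", sketched with scikit-learn's HashingVectorizer; `key_terms_text` is a hypothetical column of space-joined key terms:

```python
# Sketch: hash key terms into 100 feature buckets, then train a GBDT classifier.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.ensemble import GradientBoostingClassifier

keyterm_pipeline = Pipeline([
    ("hash", HashingVectorizer(n_features=100, alternate_sign=False)),
    ("gbdt", GradientBoostingClassifier(n_estimators=100)),
])

keyterm_pipeline.fit(train["key_terms_text"], train["label"])
```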
Model training
Pipeline based on embedding vectors
Just use logistic regression right away.
• This gives us a more relaxed prediction with a much higher number of true positives, but also more false positives.
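A sketch of the embedding pipeline, assuming each article's normalized 300-d vector is stored in a hypothetical `embedding` column:

```python
# Sketch: logistic regression on dense embedding features.
import numpy as np
from sklearn.linear_model import LogisticRegression

X_train = np.vstack(train["embedding"].to_numpy())
X_test = np.vstack(test["embedding"].to_numpy())

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, train["label"])
print(clf.score(X_test, test["label"]))
```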
Model training
Stacking and beyond
It is possible to combine features and models...
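One simple way to combine them, sketched here as stacking the three pipelines' click probabilities into a small meta-model (column and variable names follow the earlier hypothetical sketches):

```python
# Sketch: stack the per-pipeline click probabilities and fit a meta-classifier.
# In practice, use out-of-fold predictions here to avoid leaking training labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

stacked = np.column_stack([
    entity_pipeline.predict_proba(train["entities_text"])[:, 1],
    keyterm_pipeline.predict_proba(train["key_terms_text"])[:, 1],
    clf.predict_proba(X_train)[:, 1],
])
meta = LogisticRegression().fit(stacked, train["label"])
```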
Model application
Using a trained model
1. Get NLP features for a ranking candidate.
• Equivalent to the preprocessing step in training.
2. Get "click probability" from the loaded pipeline and use this value for ranking.
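A sketch of scoring and ranking fresh candidates; `candidate_entities` reuses the same entity representation assumed during training, and the titles are illustrative:

```python
# Sketch: rank candidate articles by the pipeline's predicted click probability.
candidate_entities = ["nvidia gpu", "council budget"]  # preprocessed like the training data
titles = ["New GPU announced by Nvidia.", "Local council debates budget."]

scores = entity_pipeline.predict_proba(candidate_entities)[:, 1]
for prob, title in sorted(zip(scores, titles), reverse=True):
    print(f"{prob:.3f}  {title}")
```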
Demo time!
Further work
• More data and NLP/ML advancements
• Personalized recommendation
• Incremental and online learning
• Social signals and behaviors
© Copyright Microsoft Corporation. All rights reserved.
Thank you!

Editor's Notes

  • #7 10 min
  • #10 Can skip this slide
  • #12 Analyzes the grammatical structure of a sentence, establishing relationships between "head" words and the words which modify those heads. Dependency parsers can read various forms of plain text input and can output various analysis formats, including part-of-speech tagged text, phrase structure trees, and a grammatical relations (typed dependency) format. Dependency parsing can be used to solve various complex NLP problems like Named Entity Recognition, Relation Extraction, and translation.
  • #13 Locates and classifies named entities in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.
  • #15 To discard noisy data
  • #16 Say something about the «chars» outlier – shows we have data to clean
  • #19 To describe and summarize the documents in a corpus
  • #20 To describe and summarize the documents in a corpus
  • #21 30 min
  • #22 30 min
  • #30 45 min
  • #32 1 hour