RECOMMENDATIONS FOR POST POPULARITY
PREDICTION IN SOCIAL MEDIA
Iliana Pappi - MSc Information Studies: Data Science track, University of Amsterdam (UvA)
MSc Thesis Supervisor: Dr. Masoud Mazloom
MSc Thesis Duration: 1/4/2017 – 30/6/2017 (3 months)
PROBLEM STATEMENT
• A huge amount of content is posted on social media every minute:
• Text
• Images
• Videos
• Audio
• Some posts receive thousands of likes, positive comments, etc., while others are completely ignored
• What could make a post popular on the web?
WHY DOES IT MATTER?
• Advertising – Marketing
• Political Campaigns
... and more:
• Understanding user behavior
• Modifying popularity
• Make recommendations
• Video summarization
• etc.
A CHALLENGING PROBLEM TO SOLVE
• Machine learning → Popularity prediction
• Data from social media → Various features
• Feature extraction from text and image
• Multimodal framework: image in the post, image caption or user’s comments, hashtags
• Which features are useful for predicting post popularity?
• It has proven to be a challenging problem to solve.
• How is popularity expressed in social media?
RESEARCH QUESTIONS
• RQ1: How can we determine which features affect post popularity in social media, in order to make recommendations to users?
• RQ2: What is the role of low- and high-level visual features in popularity prediction, e.g. for content like action, scene, people, pets, brand?
• RQ3: How can visual centrality in a user’s post be combined with textual and numerical data for post popularity prediction?
• RQ4: Can we outline a multimodal model that exploits different features and make recommendations for popularity prediction?
OUR PROPOSED METHODOLOGY –
MULTIMODAL POST POPULARITY PREDICTION
FOR CONTENT CATEGORIES
POST POPULARITY PREDICTION
• K features: both visual and textual
• Each post: K feature vectors
• Construct sample-by-feature matrices for each subset of the data and each of the K extracted feature types
• Define the y-vector as the log-normalized number of likes of each post
• Prediction for y
• Average/max pooling between different features
• Evaluation:
• Spearman’s rank correlation coefficient: measures the monotonic relationship between the prediction and the ground truth (1: perfect correlation)
Regression Model
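The target and the evaluation metric described above can be sketched in a few lines of plain Python. This is only an illustration: the slides do not give the exact normalization, so `log(1 + likes)` is an assumption, and the helper names `log_normalize`, `average_ranks` and `spearman` are hypothetical.

```python
import math

def log_normalize(likes):
    """Map raw like counts to log(1 + likes); the exact formula is an assumption."""
    return [math.log1p(n) for n in likes]

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(pred, truth):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rp, rt = average_ranks(pred), average_ranks(truth)
    n = len(rp)
    mp, mt = sum(rp) / n, sum(rt) / n
    cov = sum((a - mp) * (b - mt) for a, b in zip(rp, rt))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_t = sum((b - mt) ** 2 for b in rt)
    return cov / math.sqrt(var_p * var_t)

likes = [3, 120, 45, 9800, 0]   # toy like counts, not thesis data
y = log_normalize(likes)
```

Because the logarithm is monotonic, `spearman(y, likes)` is 1 here: the metric rewards getting the ordering of posts right rather than the exact like count.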
THE DATASET
• ~40k Instagram posts
• 13k – Human actions: #kiss, #dance, #horseriding
• 15k – Places/Sceneries: #forest, #urban, #kitchen
• 2k – People/Pets: #selfie, #pets
• 9k – Brand-related posts: #Wendys
• The data were crawled by hashtag from the Instagram API
• Dataset preprocessing:
• Python Pandas Data Analysis Library
• Python Natural Language Toolkit (NLTK)
• Remove duplicate posts, bad image files, and posts without textual content
• Remove non-ASCII words, strip out symbols, keep only English words
• Pool the textual content of each post coming from the user
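The text-cleaning steps listed above might look roughly like the following. The slides used Pandas and NLTK; this sketch is deliberately dependency-free, and the tiny `ENGLISH_VOCAB` set, the `id`/`caption` field names and the choice to detect duplicates by post id are all assumptions made for illustration.

```python
import re

# Hypothetical stand-in for an English word list (the slides used NLTK).
ENGLISH_VOCAB = {"sunset", "at", "the", "beach", "love", "my", "dog"}

def clean_text(text, vocab=ENGLISH_VOCAB):
    """Drop non-ASCII words, strip symbols, and keep only known English words."""
    kept = []
    for word in text.split():
        if not word.isascii():
            continue  # skip emoji and words in other scripts
        word = re.sub(r"[^a-z]", "", word.lower())  # strip symbols and digits
        if word in vocab:
            kept.append(word)
    return " ".join(kept)

def preprocess(posts):
    """Remove duplicate posts and posts left without textual content."""
    seen, result = set(), []
    for post in posts:
        if post["id"] in seen:
            continue  # duplicate post
        seen.add(post["id"])
        text = clean_text(post["caption"])
        if text:
            result.append({"id": post["id"], "text": text})
    return result

posts = [
    {"id": 1, "caption": "Sunset at the #beach!!"},
    {"id": 1, "caption": "Sunset at the #beach!!"},   # duplicate
    {"id": 2, "caption": "\u00e9t\u00e9"},            # no usable text
    {"id": 3, "caption": "Love my dog \u2764"},
]
cleaned = preprocess(posts)
```

On this toy input only posts 1 and 3 survive, with the hashtag symbol, punctuation and the heart character stripped.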
VISUAL FEATURE EXTRACTION
• High-level/low-level visual features:
• Keras/TensorFlow deep learning Python library
• GoogLeNet Inception V3 deep network – trained on the 1000 ImageNet concepts
• High-level: 1x1000 feature vectors – probability of appearance of each ImageNet concept in the image
• Low-level: 1x2048 feature vectors – Convolutional Pool Layer 8x8 (Max Pooling)
• Visual sentiment features:
• SentiBank detectors on the Visual Sentiment Ontology (VSO) – MATLAB
• Visual sentiment: 1x1200 feature vectors – probability of appearance of each adjective-noun pair (ANP) in the visual sentiment ontology, e.g. ‘clean_pool’, ‘happy_mother’, etc.
TEXTUAL FEATURE EXTRACTION
• Word2Vec (W2V) textual features:
• Python Gensim library – W2V model trained on a part of the Google News dataset (100 billion words)
• Extraction of 1x300 feature vectors for each word → average pooling over all the words in the textual content of each post
• Bag-of-Words (BoW) textual features:
• CountVectorizer – Python Scikit-Learn – prebuilt vocabulary: the sorted list of all unique words in the dataset
• Extraction of 1x19166 feature vectors for each post → sparse frequency representation of the words in the post
• Textual sentiment features:
• TextBlob Python library – Naïve Bayes Analyzer based on NLTK
• Extraction of 1x2 feature vectors – scores for positive and negative sentiment in the textual content of each post
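As a rough illustration of the two pooling ideas above, the sketch below averages word vectors per post and builds a count vector over a sorted vocabulary. It does not use Gensim or Scikit-Learn: `TOY_EMBEDDINGS` is a made-up 3-dimensional table standing in for the 300-dimensional Google News vectors, and all function names are hypothetical.

```python
from collections import Counter

# Toy 3-dimensional embeddings standing in for the 300-d Google News vectors.
TOY_EMBEDDINGS = {
    "forest": [1.0, 0.0, 2.0],
    "walk":   [3.0, 2.0, 0.0],
}

def average_pool(words, embeddings, dim):
    """Average the vectors of all in-vocabulary words; zeros if none match."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def build_vocabulary(texts):
    """Sorted list of the unique words across all posts."""
    return sorted({w for t in texts for w in t.split()})

def bag_of_words(text, vocabulary):
    """Word-count vector aligned with the vocabulary order."""
    counts = Counter(text.split())
    return [counts[w] for w in vocabulary]

texts = ["forest walk", "walk home walk"]
vocab = build_vocabulary(texts)
w2v_feature = average_pool(texts[0].split(), TOY_EMBEDDINGS, dim=3)
bow_feature = bag_of_words(texts[1], vocab)
```

With a real vocabulary the count vector would have one column per unique word (19166 in the slides) and would be mostly zeros, which is why a sparse representation is used.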
EXPERIMENT 1: CATEGORY-MIX ANALYSIS
• Support Vector Regression (SVR) – Radial Basis Function (RBF) kernel – Scikit-Learn
• Tuning over C=[0.01,0.1,1,10,100,1000] in a 5-fold cross-validation, l1-normalization
• Compared with linear SVR (l2-normalization), Random Forest (RF) regression (100 estimators), and Multi-Layer Perceptron (MLP) regression (default settings) – Scikit-Learn
• Results for every subset category:
• Action, Scene, People-Pets, Brand
• And every feature type:
• 3 visual features – 3 textual features
• Evaluation: Spearman’s Rank Correlation Coefficient (SRCC)
• Store the linear SVR model weights for high-level visual features, visual sentiment features and bag-of-words textual features
• Rank the weights to make semantic recommendations about the top-10 concepts, ANPs or words that most affect popularity prediction for each subset category
• Late fusion over visual features or textual features for each model and category
• Late fusion over all features for each model and category
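The late-fusion and weight-ranking steps above can be sketched as follows. This is a minimal illustration, not the thesis code: it assumes fusion means combining the per-feature predictions post by post (average or max), and that "ranking the weights" means sorting by the largest signed weight. The names `late_fusion` and `top_k` and the toy numbers are hypothetical.

```python
def late_fusion(predictions, mode="average"):
    """Combine per-feature prediction lists post by post."""
    per_post = zip(*predictions)
    if mode == "max":
        return [max(p) for p in per_post]
    return [sum(p) / len(p) for p in per_post]

def top_k(weights, names, k=10):
    """Names of the k features with the largest linear-model weights."""
    ranked = sorted(zip(weights, names), reverse=True)
    return [name for _, name in ranked[:k]]

# Toy predictions from two single-feature models for two posts.
visual_pred = [1.0, 3.0]
textual_pred = [2.0, 5.0]
fused_avg = late_fusion([visual_pred, textual_pred])
fused_max = late_fusion([visual_pred, textual_pred], mode="max")

# Toy linear weights over three hypothetical concepts.
best = top_k([0.2, -0.5, 0.9], ["dog", "car", "beach"], k=2)
```

Ranking by absolute value instead would also surface concepts that push popularity down, which may be equally useful for recommendations.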
EXPERIMENT 1 – RESULTS
Tb1: Post popularity prediction with visual features
Tb2: Post popularity prediction with textual features
Top-10 ANPs: (a) action, (b) scene, (c) people-pets, (d) brand
Top-10 words – BoW: (a) action, (b) scene, (c) people-pets, (d) brand
Top-10 ImageNet concepts per category
Tb3: Multimodal Fusion
RQ2: What is the role of low and high-level visual features? E.g. content like action, scene, people, pets, brand.
• Visual features, especially low-level, are more correlated with action
RQ3: What is the role of textual features when combined with visual features?
• Textual features, especially BoW, are more correlated with scene
• Textual features are necessary to increase the predictability of the model along with visual features
EXPERIMENT 2 – CATEGORY-SPECIFIC (WITH VISUAL FEATURES) – HEATMAP
• Subsets categorized by hashtag
• SVR-RBF
• Report on SRCC
EXPERIMENT 2 – CATEGORY-SPECIFIC (WITH TEXTUAL FEATURES) – HEATMAP
• Subsets categorized by hashtag
• SVR-RBF
• Report on SRCC
EXPERIMENT 3: CONCEPT SPECIFIC
• Label the 1000 ImageNet concepts
• Run SVR-RBF for each category-mix subset for all different feature categories of concepts
• Report SRCC
• 50 concepts – action
• 151 concepts – scene
• 8 concepts – people
• 406 concepts – animals
• 525 concepts – objects (general)
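One way to read the concept-specific setup is that each regression run sees only the columns of the 1x1000 concept vector that belong to one concept group. The sketch below shows that column selection under stated assumptions: the group sizes listed above add up to 1140, more than 1000, so it lets a concept carry more than one label, and the labels, the 4-concept toy vectors and the name `select_concept_columns` are all hypothetical.

```python
def select_concept_columns(features, concept_groups, group):
    """Keep only the feature columns whose concept belongs to `group`."""
    keep = [i for i, groups in enumerate(concept_groups) if group in groups]
    return [[row[i] for i in keep] for row in features]

# Hypothetical labels for a 4-concept toy vector (the slides label 1000 concepts).
concept_groups = [{"animals"}, {"scene"}, {"objects"}, {"scene", "objects"}]
features = [
    [0.7, 0.1, 0.1, 0.1],
    [0.0, 0.5, 0.2, 0.3],
]
scene_features = select_concept_columns(features, concept_groups, "scene")
```

The reduced matrix can then be passed to the same regression and evaluation steps as the full feature matrix.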
CONCLUSIONS
• A number of features were tested → they proved adequate to lead to recommendations (RQ1).
• Hashtag subsets indicated that joyful human activities and fun places could make a post popular.
• Visual features, especially low-level → action content.
• Textual features, especially bag-of-words → scene content.
• High-level concepts related to scene → highest correlation with popularity prediction in scenery datasets.
• In general, taking more concepts into account is better.
• Visual-textual complementarity → multimodal framework (RQ4) → ImageNet concepts, ANPs, BoW → Bridge the semantic gap! → recommendations for users.
• Future work / reflection: select the best features for each category and fuse them in a multimodal framework, try early fusion, explore the semantics in hashtag concept-specific analysis.
Thank you for your attention!
