SlideShare a Scribd company logo
Computing Social Score of
Web Artifacts
IRE Major Project
IIIT Hyderabad
Simran Kedia
Jaspreet Kaur
Soumyajit Ganguly
J N Venkatesh
Outline
● Introduction
● Motivation
● Challenges
● Problem Statement
● Approach
● Final Score
● Experiments
● Future Work
● Thank You!
Introduction
● Web artifacts are items like articles, blog posts, videos,
or any other URL.
● Various users post and share these artifacts on different
social media sites like Facebook, Twitter, etc.
● Each of the social media sites have their own metrics for
calculating the popularity score of artifacts posted.
● In this project, we propose an approach which computes
a single aggregate score of an artifact that reflects the
popularity across different social media sites and not
just limited to any particular site.
Motivation
● In todays world, opinion in Social Media matters!! Everyone of us
want to know how popular our web contents are!!
● The Popularity score calculator can be useful in the following cases:
o The Publicity Team of a company may want to know the
popularity score of its advertisement video across the social
media sites and see how people are reacting to it.
o Any content creator company which creates and shares its
content on different social media sites, will get to know the
popularity score and get feedback about the content.
Challenges
There are two main challenges to be tackled, in computing the
score:
● An artifact could have been shared by multiple people and on
multiple social sites. So, we have to aggregate all such shares
for calculating the score.
● There will be many posts and tweets where people discuss/talk
about an artifact, but they do not always quote the artifacts.
Ideally, these tweets/posts should also contribute to the
popularity of the artifact. But, matching these tweets/posts to
an artifact is quite a bit of challenge.
Problem Statement
Given datasets of Facebook and Twitter crawled data, we
would compute the popularity score of all the artifacts
present in the datasets.
We will compute this score by aggregating not only the
tweets/posts, which have quoted the artifacts but also the
tweets/posts which talk about the artifact, and assign a
combined score obtained for each of the artifacts.
Approach
Approach
● Gathering Data
o Used Twitter4j and Python Facebook sdk for crawling
twitter and facebook posts respectively.
● Parsing
o Used html2text tool for extracting the text from artifacts
● PreProcessing
o Performed Tokenization, Stop Word Removal, Stemming
● Algorithms - kNN, Indexing, kMeans
● Results
Approach - kNN
● Transformed the pre-processed data into vectors using tf-idf
normalization.
● Built a k Nearest Neighbour classifier considering all artifact
vectors as training data.
● For every non-artifact vectors, the top 10 closest artifacts are
computed with similarity score.
● Now, for every artifact in the results of kNN (for each non-artifact),
we update likes(favorites), shares(retweets), and comments count
(of artifact) as follows:
NewCount = OldCount + NormalizedSimilarityScore ∗
CountOfNonArtifact
The count refers to the like/share/comment count of an artifact. OldCount
refers to the original like/share/comment count of the artifacts
Approach - Indexing
● The extracted text from the artifacts is written to individual files
and placed in a folder.
● Used Lucene to create an index on this data.
● Treated all non-artifacts as a query and retrieved the top 10
matches for artifact texts along with ranking score.
● Now, for every artifact in the results(for each non-artifact query),
we update the likes(favorites), shares(retweets), and comments
count (of artifact) as follows:
NewCount = OldCount + NormalizedSimilarityScore ∗ CountOfNonArtifact
The count refers to the like/share/comment count of an artifact. OldCount
refers to the original like/share/comment count of the artifacts.
Final Score
● fb score
x * no_of_likes + y * no_of_comments + z * no_of_shares
● twitter score
x * no_of_favorite + y * no_of_comments + z * no_of_retweets
x, y, z are constants which act as weights for each of the associated
count. In our experiments we kept x=1, y=2 and z=5, thus giving
more weightage to shares.
Experiments
● For preparing dataset, we crawled all the Facebook and
Twitter pages related to the ICC Cricket World Cup.
● We got around 8770 unique artifacts from the crawled
data of Facebook and Twitter.
● We used Scikit-learn, a Python library for implementing
the kNN algorithm.
● We used Lucene for implementing our indexing
approach.
● We used Scikit-learn, a Python library for implementing
the k-means algorithm.
Experiments
● Choosing a good value of ’k’ was a difficult task.
● We tried with k as 100 and went up till k is 2500 and
recorded the silhouette score which measures the inter-
class and intra-class distances.
● So, we decided not to go ahead with this approach and
thus do not show its results in the future sections.
● More details in the report.
Result
● The output format will be a file containing (<artifact>, <social
score>) for each of the artifacts for both the methods.
● In both our methods, the artifact (shared by official ICC) about
Australia winning the world cup came out to be the top scorer.
● For both the approaches, top results that differed from each other
were within short distances (in ranks) from one another.
● A post about Srilankan cricketers retiring also got a high score from
both and was within top 5 artifacts.
● The famous “Mauka Mauka” ad from Star India could not come up
in Top 20, from either of the methods, may be due to its popularity
only in the Indian Sub-Continent.
Future Work
● We can extend the current approach and make it more
dynamic, as every second, hundreds of new tweets and posts
come up on Twitter and Facebook respectively.
● Better techniques for data crawling could be used. Presently,
there are some links, from which we could not crawl the
content, because of cookie issues.
● A decay factor can be used while calculating the social score
to ensure that the most relevant information influences the
computed popularity to an appropriate degree.
● A graph can be plotted which will depict, how popularity is
changing over the time.
Thank You!
● Demo of this project - http://goo.gl/p4ylrq
● Code+Report of this project - http://goo.gl/sWuZjm
● Any feedback/suggestions are welcome.
● You can contact us via eMail:
Venkatesh J N - jn.venkatesh@research.iiit.ac.in
Simran Kedia - simran.kedia@students.iiit.ac.in
Soumyajit Ganguly - soumyajit.ganguly@research.iiit.ac.in
Japsreet Kaur - jaspreet.kaur@students.iiit.ac.in

More Related Content

Similar to Computing Social Score of Web Aritfacts

Twitter Sentiment Analysis.pdf
Twitter Sentiment Analysis.pdfTwitter Sentiment Analysis.pdf
Twitter Sentiment Analysis.pdf
Rachanasamal3
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
Sumit Raj
 
Mozilla Foundation Metrics - presentation to engineers
Mozilla Foundation Metrics - presentation to engineersMozilla Foundation Metrics - presentation to engineers
Mozilla Foundation Metrics - presentation to engineers
John Schneider
 
YouTube Title Prediction Using Sentiment Analysis
YouTube Title Prediction Using Sentiment AnalysisYouTube Title Prediction Using Sentiment Analysis
YouTube Title Prediction Using Sentiment Analysis
IRJET Journal
 
Making Reddit Search Relevant and Scalable - Anupama Joshi & Jerry Bao, Reddit
Making Reddit Search Relevant and Scalable - Anupama Joshi & Jerry Bao, RedditMaking Reddit Search Relevant and Scalable - Anupama Joshi & Jerry Bao, Reddit
Making Reddit Search Relevant and Scalable - Anupama Joshi & Jerry Bao, Reddit
Lucidworks
 
IRJET- Sentimental Analysis of Twitter for Stock Market Investment
IRJET- Sentimental Analysis of Twitter for Stock Market InvestmentIRJET- Sentimental Analysis of Twitter for Stock Market Investment
IRJET- Sentimental Analysis of Twitter for Stock Market Investment
IRJET Journal
 
IRJET - Twitter Sentimental Analysis
IRJET -  	  Twitter Sentimental AnalysisIRJET -  	  Twitter Sentimental Analysis
IRJET - Twitter Sentimental Analysis
IRJET Journal
 
What Are the Basics of Product Manager Interviews by Google PM
What Are the Basics of Product Manager Interviews by Google PMWhat Are the Basics of Product Manager Interviews by Google PM
What Are the Basics of Product Manager Interviews by Google PM
Product School
 
Twitter Sentiment Prediction.pptx
Twitter Sentiment Prediction.pptxTwitter Sentiment Prediction.pptx
Twitter Sentiment Prediction.pptxKrishnesh Pujari
 
IRJET- Review Analyser with Bot
IRJET- Review Analyser with BotIRJET- Review Analyser with Bot
IRJET- Review Analyser with Bot
IRJET Journal
 
OSINT using Twitter & Python
OSINT using Twitter & PythonOSINT using Twitter & Python
OSINT using Twitter & Python37point2
 
social network analysis project twitter sentimental analysis
social network analysis project twitter sentimental analysissocial network analysis project twitter sentimental analysis
social network analysis project twitter sentimental analysis
Ashish Mundra
 
MOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSIS
MOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSISMOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSIS
MOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSIS
Editor Jacotech
 
Analysis and Prediction of Sentiments for Cricket Tweets using Hadoop
Analysis and Prediction of Sentiments for Cricket Tweets using HadoopAnalysis and Prediction of Sentiments for Cricket Tweets using Hadoop
Analysis and Prediction of Sentiments for Cricket Tweets using Hadoop
IRJET Journal
 
Learning Analytics Design in Game-based Learning
Learning Analytics Design in Game-based LearningLearning Analytics Design in Game-based Learning
Learning Analytics Design in Game-based Learning
MIT
 
EXTRA: Integrating External Knowledge into Multimodal Hashtag Recommendation ...
EXTRA: Integrating External Knowledge into Multimodal Hashtag Recommendation ...EXTRA: Integrating External Knowledge into Multimodal Hashtag Recommendation ...
EXTRA: Integrating External Knowledge into Multimodal Hashtag Recommendation ...
ssuser610732
 
How to Create Fun User Experience by Shutterstock Dir of Product
How to Create Fun User Experience by Shutterstock Dir of ProductHow to Create Fun User Experience by Shutterstock Dir of Product
How to Create Fun User Experience by Shutterstock Dir of Product
Product School
 
Webinar Structured Data
Webinar Structured DataWebinar Structured Data
Webinar Structured Data
Botify
 
Dances with unicorns
Dances with unicornsDances with unicorns
Dances with unicorns
EspritAgile
 

Similar to Computing Social Score of Web Aritfacts (20)

Twitter Sentiment Analysis.pdf
Twitter Sentiment Analysis.pdfTwitter Sentiment Analysis.pdf
Twitter Sentiment Analysis.pdf
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
 
Mozilla Foundation Metrics - presentation to engineers
Mozilla Foundation Metrics - presentation to engineersMozilla Foundation Metrics - presentation to engineers
Mozilla Foundation Metrics - presentation to engineers
 
YouTube Title Prediction Using Sentiment Analysis
YouTube Title Prediction Using Sentiment AnalysisYouTube Title Prediction Using Sentiment Analysis
YouTube Title Prediction Using Sentiment Analysis
 
Making Reddit Search Relevant and Scalable - Anupama Joshi & Jerry Bao, Reddit
Making Reddit Search Relevant and Scalable - Anupama Joshi & Jerry Bao, RedditMaking Reddit Search Relevant and Scalable - Anupama Joshi & Jerry Bao, Reddit
Making Reddit Search Relevant and Scalable - Anupama Joshi & Jerry Bao, Reddit
 
IRJET- Sentimental Analysis of Twitter for Stock Market Investment
IRJET- Sentimental Analysis of Twitter for Stock Market InvestmentIRJET- Sentimental Analysis of Twitter for Stock Market Investment
IRJET- Sentimental Analysis of Twitter for Stock Market Investment
 
IRJET - Twitter Sentimental Analysis
IRJET -  	  Twitter Sentimental AnalysisIRJET -  	  Twitter Sentimental Analysis
IRJET - Twitter Sentimental Analysis
 
What Are the Basics of Product Manager Interviews by Google PM
What Are the Basics of Product Manager Interviews by Google PMWhat Are the Basics of Product Manager Interviews by Google PM
What Are the Basics of Product Manager Interviews by Google PM
 
Twitter Sentiment Prediction.pptx
Twitter Sentiment Prediction.pptxTwitter Sentiment Prediction.pptx
Twitter Sentiment Prediction.pptx
 
IRJET- Review Analyser with Bot
IRJET- Review Analyser with BotIRJET- Review Analyser with Bot
IRJET- Review Analyser with Bot
 
Pres1
Pres1Pres1
Pres1
 
OSINT using Twitter & Python
OSINT using Twitter & PythonOSINT using Twitter & Python
OSINT using Twitter & Python
 
social network analysis project twitter sentimental analysis
social network analysis project twitter sentimental analysissocial network analysis project twitter sentimental analysis
social network analysis project twitter sentimental analysis
 
MOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSIS
MOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSISMOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSIS
MOVIE RATING PREDICTION BASED ON TWITTER SENTIMENT ANALYSIS
 
Analysis and Prediction of Sentiments for Cricket Tweets using Hadoop
Analysis and Prediction of Sentiments for Cricket Tweets using HadoopAnalysis and Prediction of Sentiments for Cricket Tweets using Hadoop
Analysis and Prediction of Sentiments for Cricket Tweets using Hadoop
 
Learning Analytics Design in Game-based Learning
Learning Analytics Design in Game-based LearningLearning Analytics Design in Game-based Learning
Learning Analytics Design in Game-based Learning
 
EXTRA: Integrating External Knowledge into Multimodal Hashtag Recommendation ...
EXTRA: Integrating External Knowledge into Multimodal Hashtag Recommendation ...EXTRA: Integrating External Knowledge into Multimodal Hashtag Recommendation ...
EXTRA: Integrating External Knowledge into Multimodal Hashtag Recommendation ...
 
How to Create Fun User Experience by Shutterstock Dir of Product
How to Create Fun User Experience by Shutterstock Dir of ProductHow to Create Fun User Experience by Shutterstock Dir of Product
How to Create Fun User Experience by Shutterstock Dir of Product
 
Webinar Structured Data
Webinar Structured DataWebinar Structured Data
Webinar Structured Data
 
Dances with unicorns
Dances with unicornsDances with unicorns
Dances with unicorns
 

Recently uploaded

Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
ViralQR
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 

Recently uploaded (20)

Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 

Computing Social Score of Web Aritfacts

  • 1. Computing Social Score of Web Artifacts IRE Major Project IIIT Hyderabad Simran Kedia Jaspreet Kaur Soumyajit Ganguly J N Venkatesh
  • 2. Outline ● Introduction ● Motivation ● Challenges ● Problem Statement ● Approach ● Final Score ● Experiments ● Future Work ● Thank You!
  • 3. Introduction ● Web artifacts are items like articles, blog posts, videos, or any other URL. ● Various users post and share these artifacts on different social media sites like Facebook, Twitter, etc. ● Each of the social media sites have their own metrics for calculating the popularity score of artifacts posted. ● In this project, we propose an approach which computes a single aggregate score of an artifact that reflects the popularity across different social media sites and not just limited to any particular site.
  • 4. Motivation ● In todays world, opinion in Social Media matters!! Everyone of us want to know how popular our web contents are!! ● The Popularity score calculator can be useful in the following cases: o The Publicity Team of a company may want to know the popularity score of its advertisement video across the social media sites and see how people are reacting to it. o Any content creator company which creates and shares its content on different social media sites, will get to know the popularity score and get feedback about the content.
  • 5. Challenges There are two main challenges to be tackled, in computing the score: ● An artifact could have been shared by multiple people and on multiple social sites. So, we have to aggregate all such shares for calculating the score. ● There will be many posts and tweets where people discuss/talk about an artifact, but they do not always quote the artifacts. Ideally, these tweets/posts should also contribute to the popularity of the artifact. But, matching these tweets/posts to an artifact is quite a bit of challenge.
  • 6. Problem Statement Given datasets of Facebook and Twitter crawled data, we would compute the popularity score of all the artifacts present in the datasets. We will compute this score by aggregating not only the tweets/posts, which have quoted the artifacts but also the tweets/posts which talk about the artifact, and assign a combined score obtained for each of the artifacts.
  • 8. Approach ● Gathering Data o Used Twitter4j and Python Facebook sdk for crawling twitter and facebook posts respectively. ● Parsing o Used html2text tool for extracting the text from artifacts ● PreProcessing o Performed Tokenization, Stop Word Removal, Stemming ● Algorithms - kNN, Indexing, kMeans ● Results
  • 9. Approach - kNN ● Transformed the pre-processed data into vectors using tf-idf normalization. ● Built a k Nearest Neighbour classifier considering all artifact vectors as training data. ● For every non-artifact vectors, the top 10 closest artifacts are computed with similarity score. ● Now, for every artifact in the results of kNN (for each non-artifact), we update likes(favorites), shares(retweets), and comments count (of artifact) as follows: NewCount = OldCount + NormalizedSimilarityScore ∗ CountOfNonArtifact The count refers to the like/share/comment count of an artifact. OldCount refers to the original like/share/comment count of the artifacts
  • 10. Approach - Indexing ● The extracted text from the artifacts is written to individual files and placed in a folder. ● Used Lucene to create an index on this data. ● Treated all non-artifacts as a query and retrieved the top 10 matches for artifact texts along with ranking score. ● Now, for every artifact in the results(for each non-artifact query), we update the likes(favorites), shares(retweets), and comments count (of artifact) as follows: NewCount = OldCount + NormalizedSimilarityScore ∗ CountOfNonArtifact The count refers to the like/share/comment count of an artifact. OldCount refers to the original like/share/comment count of the artifacts.
  • 11. Final Score ● fb score x * no_of_likes + y * no_of_comments + z * no_of_shares ● twitter score x * no_of_favorite + y * no_of_comments + z * no_of_retweets x, y, z are constants which act as weights for each of the associated count. In our experiments we kept x=1, y=2 and z=5, thus giving more weightage to shares.
  • 12. Experiments ● For preparing dataset, we crawled all the Facebook and Twitter pages related to the ICC Cricket World Cup. ● We got around 8770 unique artifacts from the crawled data of Facebook and Twitter. ● We used Scikit-learn, a Python library for implementing the kNN algorithm. ● We used Lucene for implementing our indexing approach. ● We used Scikit-learn, a Python library for implementing the k-means algorithm.
  • 13. Experiments ● Choosing a good value of ’k’ was a difficult task. ● We tried with k as 100 and went up till k is 2500 and recorded the silhouette score which measures the inter- class and intra-class distances. ● So, we decided not to go ahead with this approach and thus do not show its results in the future sections. ● More details in the report.
  • 14. Result ● The output format will be a file containing (<artifact>, <social score>) for each of the artifacts for both the methods. ● In both our methods, the artifact (shared by official ICC) about Australia winning the world cup came out to be the top scorer. ● For both the approaches, top results that differed from each other were within short distances (in ranks) from one another. ● A post about Srilankan cricketers retiring also got a high score from both and was within top 5 artifacts. ● The famous “Mauka Mauka” ad from Star India could not come up in Top 20, from either of the methods, may be due to its popularity only in the Indian Sub-Continent.
  • 15. Future Work ● We can extend the current approach and make it more dynamic, as every second, hundreds of new tweets and posts come up on Twitter and Facebook respectively. ● Better techniques for data crawling could be used. Presently, there are some links, from which we could not crawl the content, because of cookie issues. ● A decay factor can be used while calculating the social score to ensure that the most relevant information influences the computed popularity to an appropriate degree. ● A graph can be plotted which will depict, how popularity is changing over the time.
  • 16. Thank You! ● Demo of this project - http://goo.gl/p4ylrq ● Code+Report of this project - http://goo.gl/sWuZjm ● Any feedback/suggestions are welcome. ● You can contact us via eMail: Venkatesh J N - jn.venkatesh@research.iiit.ac.in Simran Kedia - simran.kedia@students.iiit.ac.in Soumyajit Ganguly - soumyajit.ganguly@research.iiit.ac.in Japsreet Kaur - jaspreet.kaur@students.iiit.ac.in

Editor's Notes

  1. needs to be changed