www.moving-project.eu
TraininG towards a society of data-saVvy inforMation prOfessionals
to enable open leadership INnovation
Chifumi Nishioka and Ansgar Scherp
Profiling vs. Time vs. Content: What does
Matter for Top-k Publication
Recommendation based on Twitter Profiles?
Kiel University and Leibniz Information Centre for Economics (ZBW Kiel), Germany
Motivation
• Information overload: too many papers in DLs
• Recommender systems: facilitate researchers by
suggesting papers that may interest them
• Collaborative filtering: cold start problem
• Content-based: based on their papers
• Social media in academia
• Researchers’ ideas on Twitter
[Letierce et al. 10]
• Ongoing research interests
Chifumi Nishioka (chni@informatik.uni-kiel.de)
Recommender System for scientific
papers based on Twitter
Three Factors
(I) Profiling method
• Extract features from social media items and
documents
• How to model user profiles and document profiles
(II) Temporal decay function
• Model the assumption that older
items are less important
• Examine which temporal decay function performs well
(III) Document content
• Investigate whether it is possible to make reasonable
recommendations using only titles of documents
Recommendation Procedure
• Build a user profile (user profiling)
• Build a document profile for each document
• Compute similarity scores between the user profile and each of the document profiles
• Recommend documents that have high similarity scores
Related Factors
[Figure: the recommendation procedure annotated with the related factors: user profiling, document profiles built from the document corpus, similarity scores between the user profile and each document profile, and recommendation of the documents with high similarity scores]
Three Factors and Choices
• Configurations
Factor | Design choices
Profiling method | CF-IDF, HCF-IDF, LDA
Temporal decay function | Sliding window, Exponential decay
Document content | All (title + full-text), Title
3 × 2 × 2 = 12 strategies are evaluated
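The twelve strategies are the Cartesian product of the three factors. A minimal sketch of that enumeration follows; the constant and function names are illustrative and not taken from the talk.

```python
from itertools import product

# Design choices per factor, as listed on the slide.
PROFILING_METHODS = ["CF-IDF", "HCF-IDF", "LDA"]
DECAY_FUNCTIONS = ["Sliding window", "Exponential decay"]
DOCUMENT_CONTENTS = ["All", "Title"]


def enumerate_strategies():
    """Return every (profiling method, decay function, document content) combination."""
    return list(product(PROFILING_METHODS, DECAY_FUNCTIONS, DOCUMENT_CONTENTS))


strategies = enumerate_strategies()
for method, decay, content in strategies:
    print(f"{method} x {decay} x {content}")
```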
Profiles
• Profiles: represented as a vector, where each
element is a weight of a concept (i.e., feature)
• User profile $p_u$ for user $u$
• Based on a user’s social media items (tweets) $i \in I_u$
• Document profile $p_d$ for document $d$
• User profile and document profile are made by
the same method
$$ p_u = \{ weight(c, I_u) \mid \forall c \in C \} $$
$$ weight(c, I_u) = \sum_{i \in I_u} weight(c, i) $$
$$ p_d = \{ weight(c, d) \mid \forall c \in C \} $$
A concept $c \in C$ (i.e., feature) is a subject term or topic
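As a rough illustration of the user-profile definition, the sketch below sums per-tweet concept weights into one vector over a fixed concept list. The function name, the example weights, and the concept "inflation" are hypothetical; only the summation over the items in I_u comes from the slide.

```python
from collections import defaultdict


def build_user_profile(item_weights, concepts):
    """Sum per-item weights: weight(c, I_u) = sum of weight(c, i) over all items i.

    item_weights: one dict per social media item, mapping concept -> weight(c, i).
    concepts: the ordered concept set C; the profile has one element per concept.
    """
    totals = defaultdict(float)
    for weights in item_weights:
        for concept, weight in weights.items():
            totals[concept] += weight
    return [totals[c] for c in concepts]


# Hypothetical example data: two tweets and three concepts.
concepts = ["financial crisis", "interest rate", "inflation"]
tweets = [
    {"financial crisis": 1.0, "interest rate": 0.5},
    {"interest rate": 1.5},
]
profile = build_user_profile(tweets, concepts)
print(profile)
```

Concepts that never occur in the user's items simply get weight 0, so user and document profiles share the same dimensions.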
Factor I: Profiling Methods
• CF-IDF [Goossen et al. 11]
• Extension of TF-IDF replacing words with concepts
• Concept: a subject term coming from a taxonomy
• e.g., financial crisis, interest rate (economics)
• Use a domain specific taxonomy to extract only concepts
that are relevant to the target domain
$$ weight'_{cfidf}(c, d) = cf(c, d) \cdot \log \frac{|D|}{|\{d \in D : c \in d\}|} $$
How to extract features and model users and documents:
1. CF-IDF: KB (taxonomy) based method
2. HCF-IDF: KB (taxonomy) based method
3. LDA: topic modeling
Taxonomies are freely available in many domains!
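The CF-IDF weight can be sketched as follows. This is a minimal illustration under several assumptions: concepts have already been extracted from each document, cf(c, d) is taken as a raw count, the logarithm is natural, and the corpus (including the concept "inflation") is made up for the example.

```python
import math
from collections import Counter


def cf_idf(corpus):
    """Compute cf(c, d) * log(|D| / |{d in D : c in d}|) for every document.

    corpus: list of documents, each given as a list of extracted concept labels.
    Returns one dict per document, mapping concept -> CF-IDF weight.
    """
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))  # count each concept once per document

    weights = []
    for doc in corpus:
        counts = Counter(doc)  # assumption: cf is the raw concept count
        weights.append(
            {c: cf * math.log(n_docs / doc_freq[c]) for c, cf in counts.items()}
        )
    return weights


# Hypothetical corpus of four documents.
corpus = [
    ["financial crisis", "interest rate", "financial crisis"],
    ["interest rate"],
    ["financial crisis"],
    ["inflation"],
]
weights = cf_idf(corpus)
print(weights[0])
```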
Factor I: Profiling Methods
[Figure: example taxonomy with the concepts World Wide Web, Web Searching, Web Mining, Social Recommendation, Social Tagging, Site Wrapping, and Web Log Analysis]
• HCF-IDF (Hierarchical CF-IDF) [Nishioka et al. 15]
• Extension of CF-IDF using spreading activation
• Combines the strength of the semantics with the statistical strength
• $BellLog(c, d)$: best spreading activation function
$$ weight'_{hcfidf}(c, d) = BellLog(c, d) \cdot \log \frac{|D|}{|\{d \in D : c \in d\}|} $$
Spreading activation extracts concepts that are not mentioned directly!
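To illustrate the idea of spreading activation, the sketch below propagates concept counts from mentioned concepts up to their ancestors in a taxonomy. It is not the BellLog function, whose definition is not given on the slides; a constant decay factor of 0.5 per level is swapped in purely for illustration. The parent relations are an assumed reading of the example figure.

```python
# Assumed hierarchy (child -> parent); the exact edges are a guess from the figure.
PARENT = {
    "Web Searching": "World Wide Web",
    "Web Mining": "World Wide Web",
    "Social Recommendation": "Web Searching",
    "Social Tagging": "Web Searching",
    "Site Wrapping": "Web Mining",
    "Web Log Analysis": "Web Mining",
}


def spread_activation(concept_counts, parent, decay=0.5):
    """Propagate each concept's count to its ancestors, damped by `decay` per level.

    This is a generic stand-in, not BellLog.
    """
    activation = dict(concept_counts)
    for concept, count in concept_counts.items():
        boost = count * decay
        node = parent.get(concept)
        while node is not None:
            activation[node] = activation.get(node, 0.0) + boost
            boost *= decay
            node = parent.get(node)
    return activation


# A document mentions only two leaf concepts, yet their ancestors receive weight.
activated = spread_activation({"Social Tagging": 2, "Web Log Analysis": 1}, PARENT)
print(activated)
```

The ancestors "Web Searching", "Web Mining", and "World Wide Web" end up with non-zero activation although the document never mentions them, which is the effect the slide highlights.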
Factor I: Profiling Methods
• Latent Dirichlet Allocation (LDA) [Blei et al. 03]
• Unsupervised topic modeling method
• Topic model
• Document: A probability distribution over topics
• Topic: A probability distribution over words
• Treat a topic as a concept
• Procedure
• Construct a topic model over document corpus
• Infer the topic distribution of social media items using the trained topic model
$$ weight'_{lda}(c, d) = p(c \mid d) $$
Factor II: Temporal Decay Function
• Final weight is given after applying decay
$$ weight(c, i) = f(t) \cdot weight'(c, i) $$
• Sliding window
• Give weights only to concepts that appear after $thresh$
$$ f_{sw}(t) = \begin{cases} 1 & \text{for } t \geq thresh \\ 0 & \text{for } t < thresh \end{cases} $$
• Parameter setting
• $thresh = 250$ days for social media items
• $thresh = 9.04$ years for scientific papers
• Exponential decay
$$ f_{exp}(t) = e^{-(t_{current} - t)/\tau} $$
• Parameter setting
• $\tau = 360$ days for social media items
• $\tau = 13.05$ years for scientific publications
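The two decay functions can be sketched as below. The sketch assumes that timestamps are plain numbers measured in days and that thresh is a point in time (here, 250 days before the current time); the variable and function names are illustrative.

```python
import math


def sliding_window(t, thresh):
    """f_sw(t): keep items at or after the threshold time, drop older ones."""
    return 1.0 if t >= thresh else 0.0


def exponential_decay(t, t_current, tau):
    """f_exp(t) = exp(-(t_current - t) / tau)."""
    return math.exp(-(t_current - t) / tau)


def decayed_weight(raw_weight, decay_value):
    """weight(c, i) = f(t) * weight'(c, i)."""
    return decay_value * raw_weight


# Assumed setup: times in days, settings for social media items.
t_current = 1000.0
thresh = t_current - 250.0  # assumption: window covers the last 250 days
tau = 360.0

tweet_time = 900.0  # a tweet posted 100 days ago
print(decayed_weight(2.0, sliding_window(tweet_time, thresh)))
print(decayed_weight(2.0, exponential_decay(tweet_time, t_current, tau)))
```

The sliding window is a hard cut-off, whereas the exponential decay lowers the weight smoothly; an item that is exactly tau old keeps a factor of 1/e.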
Factor III: Document Content
• Title
• Always freely available
• All (title + full-text)
• Full-text: usually not available due to legal issues
• Extract full-text from PDF files
How close is the recommendation performance using only titles to that using both title and full-text?
Computing Recommendations
• Temporal Cosine Similarity for CF-IDF & HCF-IDF
• $f(t_d)$ to give higher weights to newer documents
$$ sim_{tcos}(p_u, p_d) = f(t_d) \cdot \frac{p_u \cdot p_d}{\|p_u\| \cdot \|p_d\|} $$
• Dot Product for LDA
• Better performance than cosine similarity and Kullback-Leibler divergence [Hazen 10]
$$ sim_{dp}(p_u, p_d) = p_u \cdot p_d $$
Recommend documents that have higher similarity scores with the user profile
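A small sketch of the scoring and top-k selection follows. It assumes profiles are plain lists of equal length and that the decay value f(t_d) has already been computed per document; the function names, document ids, and vectors are made up for the example.

```python
import math


def dot_product(p_u, p_d):
    """sim_dp: plain dot product, used with LDA profiles."""
    return sum(a * b for a, b in zip(p_u, p_d))


def temporal_cosine(p_u, p_d, decay_value):
    """sim_tcos: cosine similarity scaled by the document's decay value f(t_d)."""
    norm = math.sqrt(dot_product(p_u, p_u)) * math.sqrt(dot_product(p_d, p_d))
    if norm == 0.0:
        return 0.0  # an all-zero profile matches nothing
    return decay_value * dot_product(p_u, p_d) / norm


def top_k(p_u, documents, k=5):
    """documents: list of (doc_id, profile, decay_value); return the k best ids."""
    scored = [(temporal_cosine(p_u, p, f), doc_id) for doc_id, p, f in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical data: document "a" matches perfectly but is old (decay 0.5).
user = [1.0, 0.0]
docs = [("a", [1.0, 0.0], 0.5), ("b", [1.0, 1.0], 1.0), ("c", [0.0, 1.0], 1.0)]
print(top_k(user, docs, k=2))
```

In this example the newer, partially matching document "b" outranks the older exact match "a", which shows how the decay factor shifts the ranking toward recent documents.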
Experiment Setup (1/3)
• Procedure
• Each participant inputs his/her public Twitter handle
• The number of recommendations per strategy is $k = 5$ [Chen et al. 10]
• Ask participants to assess whether a recommended
paper is interesting or not
• Participants took 517.54 seconds on average to complete the experiment
Scenario: Recommend scientific papers in the field of
economics based on users’ tweets
Experiment Setup (2/3)
• Web application
• Metadata (i.e., author, title, year) is shown
• Participants can open PDF files by clicking metadata
• Order of strategies is randomized
• Order of recommended papers is randomized
Experiment Setup (3/3)
• Dataset
• Scientific Papers
• 279,381 open access papers from EconBiz
• EconBiz: a portal for scientific papers in economics managed
by ZBW, the German National Library of Economics
• Hierarchical taxonomy
• STW, a thesaurus specialized for economics
• 6,335 semantic concepts and 37,733 labels
• 123 participants from the field of economics
• 21 bachelor / 58 master / 32 PhD / 12 professor
• Metric: rankscore [Breese et al. 98]
• Assumption: higher ranked items are more likely to be
viewed
http://www.econbiz.de/
Result: Recommendation Strategy
• The strategy CF-IDF × Sliding window × All performs best, with a rankscore of 0.59
• However, the statistical test shows no significant difference between the best strategy and the strategies using HCF-IDF
Result: Influence of Three Factors
• Three-way repeated-measure ANOVA to analyze
the performance with respect to each factor
• Profiling method has the largest impact on the
recommendation performance
• Best profiling method: HCF-IDF
Insights from the Results
• Profiling method has the largest impact
• Best profiling method: HCF-IDF
• Spreading activation mitigates sparseness
• Works for users who have fewer tweets
• Usually, full-texts are unavailable for text and data mining (TDM)
• Easy to employ HCF-IDF in many different fields
• MeSH for medicine, ACM CCS for computer science
Advantage: HCF-IDF can make good
recommendations based on only titles
Conclusion
• User experiment with three factors
• Profiling method: CF-IDF / HCF-IDF / LDA
• Decay function: sliding window / exponential decay
• Document content: Title / All (title + full-text)
• Result
• Profiling method has the largest impact
• Best profiling method: HCF-IDF
• Advantage of HCF-IDF: It performs well even if only
titles are available
Recommender system for scientific papers based on social
media items
Special thanks to
SIGIR Student Travel Grant
Project consortium and funding agency
MOVING is funded by the EU Horizon 2020 Programme under the project number INSO-4-2015: 693092
Our demo is online!
http://amygdala.informatik.uni-kiel.de/Demo/TwitterAccount
Appendix
Reference
• [Blei and Lafferty 06] D. M. Blei and J. D. Lafferty. Dynamic topic models. ICML, 2006.
• [Blei et al. 03] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. JMLR,
2003.
• [Breese et al. 98] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of
predictive algorithms for collaborative filtering. UAI, 1998.
• [Goossen et al. 11] F. Goossen, W. IJntema, F. Frasincar, F. Hogenboom, and U. Kaymak.
News personalization using the CF-IDF semantic recommender. WIMS, 2011.
• [Griffiths and Steyvers 04] T. L. Griffiths and M. Steyvers. Finding scientific topics. PNAS, 2004.
• [Hazen 10] T. J. Hazen. Direct and latent modeling techniques for computing spoken
document similarity. Spoken Language Technology, 2010.
• [Kapanipathi et al. 14] P. Kapanipathi, P. Jain, C. Venkataramani, and A. Sheth. User
interests identification on Twitter using a hierarchical knowledge base. ESWC, 2014.
• [Letierce et al. 10] J. Letierce, A. Passant, J. Breslin, and S. Decker. Understanding how
Twitter is used to spread scientific messages. WebSci, 2010.
• [Nascimento et al. 11] C. Nascimento, A. H. F. Laender, A. S. da Silva, and M. A.
Gonçalves. A source independent framework for research paper recommendation. JCDL,
2011.
• [Nishioka et al. 15] C. Nishioka, G. Große-Bölting, A. Scherp. Influence of time on user
profiling and recommending researchers in social media. i-KNOW, 2015.
• [Shen et al. 13] W. Shen, J. Wang, P. Luo, and M. Wang. Linking named entities in tweets
with knowledge base via user interest modeling. KDD, 2013.
• [Sugiyama and Kan 10] K. Sugiyama and M.-Y. Kan. Scholarly paper recommendation via
user’s recent research interests. JCDL, 2010.
Evaluation Metrics
• rankscore [Breese et al. 98]
• Posit that each successive item in a list is less likely to be viewed, with an exponential decay
• Set the parameter $\theta = 5$, following [Breese et al. 98]
$$ rankscore = \sum_{d \in hits} \frac{1}{2^{(rank(d) - 1)/(\theta - 1)}} $$
• Other metrics
• Precision
• Mean Average Precision (MAP)
• Mean Reciprocal Rank (MRR)
• normalized Discounted Cumulative Gain (nDCG)
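The rankscore formula on this slide can be sketched as below, with ranks counted from 1 and hits being the recommendations a participant marked as interesting. The normalized variant, which divides by the score of a list where all k items are hits, is an assumption added here to obtain values between 0 and 1; the slide itself shows only the sum.

```python
def rankscore(hit_ranks, theta=5.0):
    """Sum 1 / 2^((rank - 1) / (theta - 1)) over the ranks of all hits (1-based)."""
    return sum(1.0 / 2 ** ((rank - 1) / (theta - 1)) for rank in hit_ranks)


def normalized_rankscore(hit_ranks, k=5, theta=5.0):
    """Assumed normalization: divide by the best achievable score for a list of k items."""
    best = rankscore(range(1, k + 1), theta)
    return rankscore(hit_ranks, theta) / best


# Hypothetical judgement: the items at ranks 1 and 5 were marked as interesting.
print(rankscore([1, 5]))
print(normalized_rankscore([1, 5]))
```

With θ = 5, a hit at rank 5 counts half as much as a hit at rank 1, which reflects the assumption that lower-ranked items are less likely to be viewed.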
Participants
• Collecting participants
• Mailing lists, tweets, and word-of-mouth
• 134 started the experiment and 123 completed
• Demographics of participants
• 96 male / 27 female
• 32.83 years old on average (SD: 7.34)
• 21 bachelor / 58 master / 32 PhD / 12 professor
• 83 working in academia / 40 working in industry
• Incentive for the participation
• Get to know his / her most similar economist among
26 famous economists
• Chance to get one of two Amazon vouchers (50 EUR)
Tweets of Participants
• Extract user’s tweets via Twitter API
• The API allows extracting at most 3,200 tweets per user
• Collect only tweets in English
• The number of English tweets per participant
• Average: 1096.82 English tweets (SD: 1048.46)
• Max: 3192
• Min: 2
• Criteria for the participation
• Participants who had no tweet in the last 250 days
could not participate in the experiment
• Five were rejected for this reason
Dataset: Scientific Publications
• Result of collaboration with EconBiz
• 279,381 papers in English
• EconBiz: a portal for scientific publications on economics
• Procedure
• Seed list: 1 million URLs of open access papers
• Successfully download 413,098 papers in PDF
• Convert PDFs into texts using Apache PDFBox
• Detect languages of 413,098 papers
• Finally, get 279,381 papers in English
Dataset: Taxonomy
• STW (ver. 8.12) enriched by DBpedia redirects
• Maintained by ZBW
• Specialized for economics
• 6,335 semantic concepts and 11,679 labels in English
• Enrichment process
• Goal: get more synonymous labels
• Use the official mapping that connects STW concepts
with DBpedia concepts
• Get redirects from DBpedia concepts
• e.g., “Telecommunications Operator” and “Telephone
companies”
• Finally, 6,335 semantic concepts and 37,733 labels
Latent Dirichlet Allocation (LDA)
[Figure: illustration of LDA. Source: D. M. Blei. Probabilistic topic models, CACM, 2012.]
Latent Dirichlet Allocation (LDA)
• Implementation: JGibbLDA
• Preprocessing
• Lemmatization and stop words removal
• Remove words that appear in fewer than 25 scientific publications, following [Blei and Lafferty 06]
• Parameters
• Hyper-parameters: $\alpha = 0.5$ and $\beta = 0.1$
• Suggested by [Griffiths and Steyvers 04]
• The number of topics: $K = 100$
• Optimized by the log likelihood
• Experimented with $K = 20, 50, 100, 200, 500, 1000, 5000$
• The number of iterations: 500
Statistical Test for Strategies (1/2)
• Mauchly’s sphericity test
• Verify whether the variances of the differences between the rankscores of the twelve strategies are equal
• Reveals a violation of sphericity in the strategies ($\chi^2(65) = 435.90$, $p = .00$), which leads to positively biased F-statistics and increases false positives
• One-way repeated-measures ANOVA with a Greenhouse-Geisser correction of $\epsilon = .61$
• Reveals a significant difference ($F(6.60, 805.33) = 21.98$, $p = .00$)
Investigate the difference in recommendation performance among the twelve strategies
Statistical Test for Strategies (2/2)
• Shaffer’s modified sequentially rejective
Bonferroni procedure (Shaffer’s MSRB
procedure)
Statistical Test for Three Factors
• Mendoza’s sphericity test
• Applicable to multi-way repeated-measures ANOVA
• Again, reveals a violation of sphericity
• Three-way repeated-measures ANOVA with Greenhouse-Geisser corrections
Investigate the difference in recommendation performance among the twelve strategies
Effect | Sphericity test result
Global | $\chi^2(65) = 435.90$, $p = .00$
Profiling Method | $\chi^2(2) = 12.21$, $p = .00$
Profiling Method × Decay Function | $\chi^2(2) = 20.02$, $p = .00$
Profiling Method × Document Content | $\chi^2(2) = 8.61$, $p = .01$
Result: Precision
• Values are similar to ones in rankscore
• The order of the strategies is identical
Result: nDCG
• Values are similar to ones in rankscore
• The order of the strategies is identical
Result: Mean Average Precision
• Values are higher than ones of rankscore
• The order of the strategies is almost the same
Result: Mean Reciprocal Rank
• Values are higher than ones of rankscore
• The order of the strategies is almost the same
Result: Post-hoc Analysis
Result: Influence of Three Factors
• Profiling Method
• This factor has the biggest impact
• HCF-IDF is the best profiling method, followed by CF-IDF and LDA
• Document Content
• All (both title and full-text) performs significantly better than Title, except when using HCF-IDF
• Profiling Method × Document Content
• All (both title and full-text) is the better choice for CF-IDF
• Document Content makes no difference for HCF-IDF
Result: Demographic Factors (1/3)
• Gender
• Female participants are more likely to evaluate
recommendations as interesting
• However, there is no difference in how each strategy performs compared to the others, since the interaction gender × strategy is not significant
• Age
• No influence on recommendation performance
Investigate whether each demographic factor has an
influence on recommendation performance
Result: Demographic Factors (2/3)
• Highest academic degree
• Participants whose highest degree is a bachelor’s are more likely to evaluate recommendations as interesting than lecturers/professors
• However, there is no difference in how each strategy performs compared to the others
• Major
• Manually classify participants into two groups
• Participants majoring in economics ($n = 92$) and others
• No influence on recommendation performance
Result: Demographic Factors (3/3)
• Years of profession
• On average, participants have worked for 7.85 years (SD: 6.85)
• Divide participants into three groups
• Participants working for up to 5 years ($n = 44$), for 5 to 10 years ($n = 34$), and for more than 10 years
• No influence on recommendation performance
• Employment type
• Participants working in academia ($n = 83$) and in industry
• No influence on recommendation performance
Result: Click Rates (1/2)
Result: Click Rates (2/2)
List of Taxonomies
• Maintained by W3C
• Source: https://www.w3.org/2001/sw/wiki/SKOS/Datasets
Taxonomy | Domain
Thesaurus for the Social Sciences | Social science
NASA Taxonomy | Technology areas
Linked Life Data | Biomedicine
Medical Subject Headings (MeSH) | Biomedicine
Australian education vocabularies | Education
UNESCO Thesaurus | Education, culture, natural sciences, and social and human sciences
ACM Computing Classification System | Computer science
Insights from the Results
• Profiling method has the largest impact
• Best profiling method: HCF-IDF
• Only HCF-IDF makes reasonable recommendations possible based on titles alone
• CF-IDF requires full-texts
• LDA performs poorly even with full-texts
• Possible reason: topic distributions cannot be inferred reliably from social media items, which are short and sparse
• Usually, full-texts are not available due to legal issues
• Easy to employ HCF-IDF in many different fields
• e.g., MeSH for medicine, ACM CCS for computer science

More Related Content

What's hot

NIST Big Data Public Working Group NBD-PWG
NIST Big Data Public Working Group NBD-PWGNIST Big Data Public Working Group NBD-PWG
NIST Big Data Public Working Group NBD-PWGGeoffrey Fox
 
Data Science Curriculum at Indiana University
Data Science Curriculum at Indiana UniversityData Science Curriculum at Indiana University
Data Science Curriculum at Indiana UniversityGeoffrey Fox
 
e-Consultation Platforms: Generating or just Recycling Ideas?
e-Consultation Platforms: Generating or just Recycling Ideas?e-Consultation Platforms: Generating or just Recycling Ideas?
e-Consultation Platforms: Generating or just Recycling Ideas?Efthimios Tambouris
 
Lessons from Data Science Program at Indiana University: Curriculum, Students...
Lessons from Data Science Program at Indiana University: Curriculum, Students...Lessons from Data Science Program at Indiana University: Curriculum, Students...
Lessons from Data Science Program at Indiana University: Curriculum, Students...Geoffrey Fox
 
Invited Talk: Early Detection of Research Topics
Invited Talk: Early Detection of Research Topics Invited Talk: Early Detection of Research Topics
Invited Talk: Early Detection of Research Topics Angelo Salatino
 
Internet working With TCP/IP
Internet working With TCP/IPInternet working With TCP/IP
Internet working With TCP/IPchee wai wong
 
Data Science and Online Education
Data Science and Online EducationData Science and Online Education
Data Science and Online EducationGeoffrey Fox
 
resume-tina-tingchu-lin
resume-tina-tingchu-linresume-tina-tingchu-lin
resume-tina-tingchu-linTing-Chu Lin
 
[DOLAP2019] Augmented Business Intelligence
[DOLAP2019] Augmented Business Intelligence[DOLAP2019] Augmented Business Intelligence
[DOLAP2019] Augmented Business IntelligenceUniversity of Bologna
 
Topics of interest for IWPT'01.doc
Topics of interest for IWPT'01.docTopics of interest for IWPT'01.doc
Topics of interest for IWPT'01.docbutest
 

What's hot (20)

Week2: Programming for Data Analysis
Week2: Programming for Data AnalysisWeek2: Programming for Data Analysis
Week2: Programming for Data Analysis
 
Data Wrangling Week 4
Data Wrangling Week 4Data Wrangling Week 4
Data Wrangling Week 4
 
Data wrangling week3
Data wrangling week3Data wrangling week3
Data wrangling week3
 
Data Wrangling Week 7
Data Wrangling Week 7Data Wrangling Week 7
Data Wrangling Week 7
 
Data wrangling week 5
Data wrangling week 5Data wrangling week 5
Data wrangling week 5
 
Data wrangling week 9
Data wrangling week 9Data wrangling week 9
Data wrangling week 9
 
bonino
boninobonino
bonino
 
NIST Big Data Public Working Group NBD-PWG
NIST Big Data Public Working Group NBD-PWGNIST Big Data Public Working Group NBD-PWG
NIST Big Data Public Working Group NBD-PWG
 
Week 11: Programming for Data Analysis
Week 11: Programming for Data AnalysisWeek 11: Programming for Data Analysis
Week 11: Programming for Data Analysis
 
Data Science Curriculum at Indiana University
Data Science Curriculum at Indiana UniversityData Science Curriculum at Indiana University
Data Science Curriculum at Indiana University
 
e-Consultation Platforms: Generating or just Recycling Ideas?
e-Consultation Platforms: Generating or just Recycling Ideas?e-Consultation Platforms: Generating or just Recycling Ideas?
e-Consultation Platforms: Generating or just Recycling Ideas?
 
Hadoop in Alibaba Cloud
Hadoop in Alibaba CloudHadoop in Alibaba Cloud
Hadoop in Alibaba Cloud
 
Lessons from Data Science Program at Indiana University: Curriculum, Students...
Lessons from Data Science Program at Indiana University: Curriculum, Students...Lessons from Data Science Program at Indiana University: Curriculum, Students...
Lessons from Data Science Program at Indiana University: Curriculum, Students...
 
ICSE12 SEE.ppt
ICSE12 SEE.pptICSE12 SEE.ppt
ICSE12 SEE.ppt
 
Invited Talk: Early Detection of Research Topics
Invited Talk: Early Detection of Research Topics Invited Talk: Early Detection of Research Topics
Invited Talk: Early Detection of Research Topics
 
Internet working With TCP/IP
Internet working With TCP/IPInternet working With TCP/IP
Internet working With TCP/IP
 
Data Science and Online Education
Data Science and Online EducationData Science and Online Education
Data Science and Online Education
 
resume-tina-tingchu-lin
resume-tina-tingchu-linresume-tina-tingchu-lin
resume-tina-tingchu-lin
 
[DOLAP2019] Augmented Business Intelligence
[DOLAP2019] Augmented Business Intelligence[DOLAP2019] Augmented Business Intelligence
[DOLAP2019] Augmented Business Intelligence
 
Topics of interest for IWPT'01.doc
Topics of interest for IWPT'01.docTopics of interest for IWPT'01.doc
Topics of interest for IWPT'01.doc
 

Viewers also liked

TRECVID 2016 Ad-hoc Video Search task, CERTH-ITI
TRECVID 2016 Ad-hoc Video Search task, CERTH-ITITRECVID 2016 Ad-hoc Video Search task, CERTH-ITI
TRECVID 2016 Ad-hoc Video Search task, CERTH-ITIMOVING Project
 
Including financial criteria in the strategic planning of knowledge repositor...
Including financial criteria in the strategic planning of knowledge repositor...Including financial criteria in the strategic planning of knowledge repositor...
Including financial criteria in the strategic planning of knowledge repositor...MOVING Project
 
TRECVID 2016 POSTER CERTH-ITI
TRECVID 2016 POSTER CERTH-ITITRECVID 2016 POSTER CERTH-ITI
TRECVID 2016 POSTER CERTH-ITIMOVING Project
 
VIDEO AESTHETIC QUALITY ASSESSMENT USING KERNEL SUPPORT VECTOR MACHINE WITH I...
VIDEO AESTHETIC QUALITY ASSESSMENT USING KERNEL SUPPORT VECTOR MACHINE WITH I...VIDEO AESTHETIC QUALITY ASSESSMENT USING KERNEL SUPPORT VECTOR MACHINE WITH I...
VIDEO AESTHETIC QUALITY ASSESSMENT USING KERNEL SUPPORT VECTOR MACHINE WITH I...MOVING Project
 
Mining and Managing Large-scale Linked Open Data
Mining and Managing Large-scale Linked Open DataMining and Managing Large-scale Linked Open Data
Mining and Managing Large-scale Linked Open DataMOVING Project
 
Jak prowadzić konto firmowe na Twitterze?
Jak prowadzić konto firmowe na Twitterze?Jak prowadzić konto firmowe na Twitterze?
Jak prowadzić konto firmowe na Twitterze?Artur Jabłoński
 

Viewers also liked (6)

TRECVID 2016 Ad-hoc Video Search task, CERTH-ITI
TRECVID 2016 Ad-hoc Video Search task, CERTH-ITITRECVID 2016 Ad-hoc Video Search task, CERTH-ITI
TRECVID 2016 Ad-hoc Video Search task, CERTH-ITI
 
Including financial criteria in the strategic planning of knowledge repositor...
Including financial criteria in the strategic planning of knowledge repositor...Including financial criteria in the strategic planning of knowledge repositor...
Including financial criteria in the strategic planning of knowledge repositor...
 
TRECVID 2016 POSTER CERTH-ITI
TRECVID 2016 POSTER CERTH-ITITRECVID 2016 POSTER CERTH-ITI
TRECVID 2016 POSTER CERTH-ITI
 
VIDEO AESTHETIC QUALITY ASSESSMENT USING KERNEL SUPPORT VECTOR MACHINE WITH I...
VIDEO AESTHETIC QUALITY ASSESSMENT USING KERNEL SUPPORT VECTOR MACHINE WITH I...VIDEO AESTHETIC QUALITY ASSESSMENT USING KERNEL SUPPORT VECTOR MACHINE WITH I...
VIDEO AESTHETIC QUALITY ASSESSMENT USING KERNEL SUPPORT VECTOR MACHINE WITH I...
 
Mining and Managing Large-scale Linked Open Data
Mining and Managing Large-scale Linked Open DataMining and Managing Large-scale Linked Open Data
Mining and Managing Large-scale Linked Open Data
 
Jak prowadzić konto firmowe na Twitterze?
Jak prowadzić konto firmowe na Twitterze?Jak prowadzić konto firmowe na Twitterze?
Jak prowadzić konto firmowe na Twitterze?
 

Similar to Twitter Profile Recommendations

MOVING presentation at JSI
MOVING presentation at JSIMOVING presentation at JSI
MOVING presentation at JSIMOVING Project
 
Open Data Initiatives – Empowering Students to Make More Informed Choices? - ...
Open Data Initiatives – Empowering Students to Make More Informed Choices? - ...Open Data Initiatives – Empowering Students to Make More Informed Choices? - ...
Open Data Initiatives – Empowering Students to Make More Informed Choices? - ...Terminalfour
 
ISEC'18 Tutorial: Research Methodology on Pursuing Impact-Driven Research
ISEC'18 Tutorial: Research Methodology on Pursuing Impact-Driven ResearchISEC'18 Tutorial: Research Methodology on Pursuing Impact-Driven Research
ISEC'18 Tutorial: Research Methodology on Pursuing Impact-Driven ResearchTao Xie
 
Planning and Executing Practice-Impactful Research
Planning and Executing Practice-Impactful ResearchPlanning and Executing Practice-Impactful Research
Planning and Executing Practice-Impactful ResearchTao Xie
 
Presentation of the DURAARK project at Ex Libris conference, Berlin, Germany.
Presentation of the DURAARK project at Ex Libris conference, Berlin, Germany.Presentation of the DURAARK project at Ex Libris conference, Berlin, Germany.
Presentation of the DURAARK project at Ex Libris conference, Berlin, Germany.Lena Lindbäck
 
User Required? On the Value of User Research in the Digital Humanities
User Required? On the Value of User Research in the Digital HumanitiesUser Required? On the Value of User Research in the Digital Humanities
User Required? On the Value of User Research in the Digital HumanitiesMaxKemman
 
Knowledge Discovery in Social Media and Scientific Digital Libraries
Knowledge Discovery in Social Media and Scientific Digital LibrariesKnowledge Discovery in Social Media and Scientific Digital Libraries
Knowledge Discovery in Social Media and Scientific Digital LibrariesAnsgar Scherp
 
Synthesising JISC Institutional Innovation
Synthesising JISC Institutional InnovationSynthesising JISC Institutional Innovation
Synthesising JISC Institutional InnovationGeorge Roberts
 
Embedded Human Computation for Knowledge Extraction and Evaluation
Embedded Human Computation for Knowledge Extraction and EvaluationEmbedded Human Computation for Knowledge Extraction and Evaluation
Embedded Human Computation for Knowledge Extraction and EvaluationwebLyzard technology
 
Learning Analytics and Sensemaking in Digital Learning Ecosystems - Examples ...

Twitter Profile Recommendations

  • 1. Profiling vs. Time vs. Content: What does Matter for Top-k Publication Recommendation based on Twitter Profiles?
      Chifumi Nishioka and Ansgar Scherp
      Kiel University and Leibniz Information Centre for Economics (ZBW Kiel), Germany
      MOVING project (www.moving-project.eu): TraininG towards a society of data-saVvy inforMation prOfessionals to enable open leadership INnovation
  • 2. Motivation
      • Information overload: too many papers in digital libraries (DLs)
      • Recommender systems: support researchers by suggesting papers that may interest them
          • Collaborative filtering: suffers from the cold-start problem
          • Content-based: relies on the researchers' own papers
      • Social media in academia
          • Researchers share their ideas on Twitter [Letierce et al. 10]
          • Tweets reflect ongoing research interests
      • Goal: a recommender system for scientific papers based on Twitter
      Chifumi Nishioka (chni@informatik.uni-kiel.de)
  • 3. Three Factors
      (I) Profiling method
          • Extract features from social media items and documents
          • How to model user profiles and document profiles
      (II) Temporal decay function
          • Model the assumption that older items are less important
          • Examine which temporal decay function performs well
      (III) Document content
          • Investigate whether reasonable recommendations are possible using only the titles of documents
  • 4. Recommendation Procedure
      [Figure: recommendation pipeline]
      • Build the user profile (user profiling)
      • Build a document profile for each document
      • Compute similarity scores between the user profile and each document profile
      • Recommend the documents that have high similarity scores
  • 5. Related Factors
      [Figure: recommendation pipeline annotated with the related factors]
      • Build the user profile (user profiling)
      • Build a document profile for each document in the document corpus
      • Compute similarity scores between the user profile and each document profile
      • Recommend the documents that have high similarity scores
  • 6. Three Factors and Choices
      • Configurations
          Factor                    | Design choices
          Profiling method          | CF-IDF, HCF-IDF, LDA
          Temporal decay function   | Sliding window, Exponential decay
          Document content          | All (title + full-text), Title
      • 3 × 2 × 2 = 12 strategies are evaluated in the experiment
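The 12 strategies are the Cartesian product of the three factors. A minimal sketch of that enumeration follows; the list names and the function `enumerate_strategies` are illustrative and do not come from the slides.

```python
from itertools import product

# Design choices per factor, as listed on the slide.
PROFILING_METHODS = ["CF-IDF", "HCF-IDF", "LDA"]
DECAY_FUNCTIONS = ["sliding window", "exponential decay"]
DOCUMENT_CONTENTS = ["all (title + full-text)", "title"]


def enumerate_strategies():
    """Return every (profiling, decay, content) combination."""
    return list(product(PROFILING_METHODS, DECAY_FUNCTIONS, DOCUMENT_CONTENTS))


strategies = enumerate_strategies()
print(len(strategies))  # 3 * 2 * 2 combinations
```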
  • 7. Profiles
      • Profiles are represented as vectors, where each element is the weight of a concept (i.e., feature)
      • A concept $c \in C$ (i.e., feature) is a subject term or a topic
      • User profile $p_u$ for user $u$
          • Based on the user's social media items (tweets) $i \in I_u$
          • $p_u = \{\, \mathrm{weight}(c, I_u) \mid \forall c \in C \,\}$
          • $\mathrm{weight}(c, I_u) = \sum_{i \in I_u} \mathrm{weight}(c, i)$
      • Document profile $p_d$ for document $d$
          • $p_d = \{\, \mathrm{weight}(c, d) \mid \forall c \in C \,\}$
      • User profiles and document profiles are built by the same method
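The aggregation weight(c, I_u) = Σ weight(c, i) can be sketched as a sum over per-tweet concept weights. This is only an illustration: representing a profile as a dictionary, the function name `user_profile`, and the example weights are assumptions, not part of the slides.

```python
from collections import defaultdict


def user_profile(item_weights):
    """Sum per-item concept weights: weight(c, I_u) = sum over i of weight(c, i).

    item_weights: list of dicts, one per social media item,
    mapping concept -> weight(c, i). (Hypothetical representation.)
    """
    profile = defaultdict(float)
    for weights in item_weights:
        for concept, weight in weights.items():
            profile[concept] += weight
    return dict(profile)


# Made-up weights for two tweets of one user.
tweets = [
    {"financial crisis": 0.5, "interest rate": 0.25},
    {"financial crisis": 0.25},
]
profile = user_profile(tweets)
```

Concepts that never occur in a user's tweets are simply absent from the dictionary, which corresponds to a weight of zero in the vector view.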
  • 8. Factor I: Profiling Methods
      How to extract features and model users and documents
      • Overview
          1. CF-IDF (knowledge base (taxonomy) based method)
          2. HCF-IDF (knowledge base (taxonomy) based method)
          3. LDA (topic modeling)
      • CF-IDF [Goossen et al. 11]
          • Extension of TF-IDF that replaces words with concepts
          • Concept: a subject term coming from a taxonomy
              • e.g., financial crisis, interest rate (economics)
          • Uses a domain-specific taxonomy to extract only concepts that are relevant to the target domain
          • Taxonomies are freely available in many domains
          • $\mathrm{weight}'_{cfidf}(c, d) = cf(c, d) \cdot \log \dfrac{|D|}{|\{ d \in D : c \in d \}|}$
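The CF-IDF weight above can be sketched as follows. The sketch assumes that concepts have already been extracted from each document with a taxonomy, so every document is given as a list of concept labels, and it assumes the natural logarithm, since the slide does not state the base. The function name `cf_idf` is illustrative.

```python
import math
from collections import Counter


def cf_idf(docs):
    """Compute CF-IDF weights for each document.

    docs: list of lists of concept labels, one list per document
    (concept extraction via a taxonomy is assumed to have happened already).
    Returns one dict per document mapping concept -> cf(c, d) * log(|D| / df(c)).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for concepts in docs:
        doc_freq.update(set(concepts))  # count each concept once per document

    profiles = []
    for concepts in docs:
        concept_freq = Counter(concepts)
        profiles.append({
            c: concept_freq[c] * math.log(n_docs / doc_freq[c])
            for c in concept_freq
        })
    return profiles


docs = [
    ["financial crisis", "financial crisis", "interest rate"],
    ["interest rate"],
]
profiles = cf_idf(docs)
```

In this toy corpus "interest rate" occurs in every document, so its weight is zero, while "financial crisis" keeps a positive weight.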
  • 9. Factor I: Profiling Methods
      • HCF-IDF (Hierarchical CF-IDF) [Nishioka et al. 15]
          • Extension of CF-IDF using spreading activation
          • Combines the strength of the semantics with the statistical strength
          • Spreading activation extracts concepts that are not mentioned directly
          • $BellLog(c, d)$: the best spreading activation function
          • $\mathrm{weight}'_{hcfidf}(c, d) = BellLog(c, d) \cdot \log \dfrac{|D|}{|\{ d \in D : c \in d \}|}$
      [Figure: example taxonomy excerpt with the concepts World Wide Web, Web Searching, Web Mining, Social Recommendation, Social Tagging, Site Wrapping, Web Log Analysis]
  • 10. Factor I: Profiling Methods
      • Latent Dirichlet Allocation (LDA) [Blei et al. 03]
          • Unsupervised topic modeling method
          • Topic model
              • Document: a probability distribution over topics
              • Topic: a probability distribution over words
          • A topic is treated as a concept
          • Procedure
              • Construct a topic model over the document corpus
              • Infer a topic distribution for the social media items using the trained topic model
          • $\mathrm{weight}'_{lda}(c, d) = p(c \mid d)$
  • 11. Factor II: Temporal Decay Function
      • The final weight is obtained after applying the decay function $f$:
          $\mathrm{weight}(c, i) = f(t) \cdot \mathrm{weight}'(c, i)$
      • Sliding window
          • Gives weight only to concepts that appear after $thresh$
          • $f_{sw}(t) = \begin{cases} 1 & \text{for } t \ge thresh \\ 0 & \text{for } t < thresh \end{cases}$
          • Parameter setting
              • $thresh = 250$ days for social media items
              • $thresh = 9.04$ years for scientific papers
      • Exponential decay
          • $f_{exp}(t) = e^{-(t_{current} - t)/\tau}$
          • Parameter setting
              • $\tau = 360$ days for social media items
              • $\tau = 13.05$ years for scientific publications
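The two decay functions can be sketched in terms of an item's age, t_current − t. Note the assumption: the slide states f_sw on timestamps, while the sketch reads the 250-day threshold as a maximum age, so an item counts when it is at most that old. All function names are illustrative.

```python
import math


def sliding_window(age_days, window_days):
    """Return 1.0 if the item is at most window_days old, else 0.0.

    Assumption: the slide's threshold is interpreted as a maximum age.
    """
    return 1.0 if age_days <= window_days else 0.0


def exponential_decay(age_days, tau_days):
    """f_exp = exp(-(t_current - t) / tau), with age = t_current - t."""
    return math.exp(-age_days / tau_days)


def decayed_weight(raw_weight, age_days, decay, param_days):
    """weight(c, i) = f(t) * weight'(c, i)."""
    return decay(age_days, param_days) * raw_weight


# A 300-day-old tweet with raw concept weight 2.0 (made-up numbers).
print(decayed_weight(2.0, 300, sliding_window, 250))     # outside the window
print(decayed_weight(2.0, 300, exponential_decay, 360))  # down-weighted, not dropped
```

The contrast is the point of the factor: the sliding window discards older items entirely, whereas exponential decay keeps them with a smoothly shrinking weight.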
  • 12. Factor III: Document Content
      Question: how close is the recommendation performance using only titles to the performance using both title and full-text?
      • Title
          • Always freely available
      • All (title + full-text)
          • Full-text: usually not available due to legal issues
          • Full-text is extracted from PDF files
  • 13. Computing Recommendations
      Recommend the documents that have the highest similarity scores with the user profile
      • Temporal cosine similarity for CF-IDF and HCF-IDF
          • $f(t_d)$ gives higher weights to newer documents
          • $sim_{tcos}(p_u, p_d) = f(t_d) \cdot \dfrac{p_u \cdot p_d}{\|p_u\| \cdot \|p_d\|}$
      • Dot product for LDA
          • Better performance than cosine similarity and Kullback-Leibler divergence [Hazen 10]
          • $sim_{dp}(p_u, p_d) = p_u \cdot p_d$
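A minimal sketch of the temporal cosine similarity and the resulting top-k selection is given below. Profiles are assumed to be sparse dictionaries mapping concept to weight, and the decay factor f(t_d) is assumed to be precomputed per document; the names `dot`, `temporal_cosine`, and `top_k` are illustrative.

```python
import math


def dot(p, q):
    """Dot product of two sparse profiles (dict: concept -> weight)."""
    return sum(w * q.get(c, 0.0) for c, w in p.items())


def temporal_cosine(p_u, p_d, doc_decay):
    """sim_tcos = f(t_d) * (p_u . p_d) / (||p_u|| * ||p_d||)."""
    norm = math.sqrt(dot(p_u, p_u)) * math.sqrt(dot(p_d, p_d))
    if norm == 0.0:
        return 0.0  # an empty profile matches nothing
    return doc_decay * dot(p_u, p_d) / norm


def top_k(p_u, docs, k=5):
    """Rank documents by temporal cosine similarity and keep the best k.

    docs: dict mapping doc_id -> (document profile, decay factor f(t_d)).
    Ties are broken by doc_id so the result is deterministic.
    """
    scored = [(temporal_cosine(p_u, p_d, f_td), doc_id)
              for doc_id, (p_d, f_td) in docs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


user = {"a": 1.0}
corpus = {
    "d1": ({"a": 1.0}, 1.0),  # same concept, recent
    "d2": ({"a": 1.0}, 0.5),  # same concept, older
    "d3": ({"b": 1.0}, 1.0),  # unrelated concept
}
print(top_k(user, corpus, k=2))
```

For the LDA variant, `dot(p_u, p_d)` alone corresponds to the dot-product similarity on the slide.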
  • 14. Experiment Setup (1/3)
      Scenario: recommend scientific papers in the field of economics based on users' tweets
      • Procedure
          • Each participant enters his or her public Twitter handle
          • The number of recommendations per strategy is $k = 5$ [Chen et al. 10]
          • Participants are asked to assess whether each recommended paper is interesting or not
          • Participants took 517.54 seconds on average to complete the experiment
  • 15. Experiment Setup (2/3)
      • Web application
          • Metadata (i.e., author, title, year) is shown
          • Participants can open the PDF files by clicking on the metadata
          • The order of strategies is randomized
          • The order of recommended papers is randomized
  • 16. Experiment Setup (3/3)
      • Dataset
          • Scientific papers
              • 279,381 open access papers from EconBiz (http://www.econbiz.de/)
              • EconBiz: a portal for scientific papers in economics managed by ZBW, the German National Library of Economics
          • Hierarchical taxonomy
              • STW, a thesaurus specialized for economics
              • 3,335 semantic concepts and 37,733 labels
      • 123 participants from the field of economics
          • 21 bachelor / 58 master / 32 PhD / 12 professor
      • Metric: rankscore [Breese et al. 98]
          • Assumption: higher ranked items are more likely to be viewed
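The slides name the rankscore metric but do not spell it out. The sketch below follows the half-life utility idea from [Breese et al. 98] under stated assumptions: judgments are binary (interesting or not), the half-life parameter `alpha` is a hypothetical choice, and the score is normalized by the best achievable score, i.e., a list in which all k items are judged interesting. The exact variant used in the experiment may differ.

```python
def rankscore(judgments, alpha=5.0):
    """Normalized rankscore for one ranked recommendation list.

    judgments: list of booleans in rank order (True = judged interesting).
    alpha: assumed half-life, i.e., the rank at which an item's
           contribution drops to one half. Must be greater than 1.
    """
    if not judgments:
        return 0.0

    def discount(rank):
        # Contribution of an item at 1-based position `rank`.
        return 1.0 / 2 ** ((rank - 1) / (alpha - 1))

    achieved = sum(discount(r) for r, hit in enumerate(judgments, start=1) if hit)
    best = sum(discount(r) for r in range(1, len(judgments) + 1))
    return achieved / best


# Hits at ranks 1 and 3 out of three recommendations, half-life of 2.
print(rankscore([True, False, True], alpha=2.0))
```

Because the discount shrinks with rank, a hit at rank 1 contributes more than a hit at rank 5, which encodes the assumption that higher ranked items are more likely to be viewed.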
  • 17. Result: Recommendation Strategy
      • The strategy CF-IDF × Sliding window × All performs best, with a rankscore of 0.59
      • However, the statistical test shows no significant difference between the best strategy and the strategies using HCF-IDF
  • 18. Result: Influence of Three Factors
      • Three-way repeated-measures ANOVA to analyze the performance with respect to each factor
      • The profiling method has the largest impact on the recommendation performance
          • Best profiling method: HCF-IDF
  • 19. Insights from the Results • Profiling method has the largest impact • Best profiling method: HCF-IDF • Spreading activation mitigates sparseness • Works for users who have fewer tweets • Advantage: HCF-IDF can make good recommendations based only on titles • Usually, full-texts are unavailable for text and data mining (TDM) • Easy to employ HCF-IDF in many different fields • e.g., MeSH for medicine, ACM CCS for computer science
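The remark that spreading activation mitigates sparseness can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the three-level taxonomy, the decay factor of 0.5, and the function name are hypothetical, and this is not the spreading activation function actually used by HCF-IDF. The point is only that a few concept mentions in short tweets also activate broader concepts, so the profile has more non-zero entries to match against document profiles.

```python
from collections import defaultdict

# Hypothetical child -> parent links in a tiny taxonomy (not taken from STW).
PARENT = {
    "monetary policy": "economic policy",
    "fiscal policy": "economic policy",
    "economic policy": "economics",
}

def spread_activation(concept_counts, parent=PARENT, decay=0.5):
    """Propagate each concept's count upwards, damped by `decay` per level.

    `decay` is an assumed value chosen only for this illustration.
    """
    activation = defaultdict(float)
    for concept, count in concept_counts.items():
        weight = float(count)
        node = concept
        while node is not None:
            activation[node] += weight
            node = parent.get(node)  # None once the root is passed
            weight *= decay
    return dict(activation)

# Two concepts mentioned in tweets end up activating four concepts.
profile = spread_activation({"monetary policy": 2, "fiscal policy": 1})
```

In this toy run the user mentioned only two leaf concepts, yet the resulting profile also carries weight for "economic policy" and "economics".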
  • 20. Conclusion • Recommender system for scientific papers based on social media items • User experiment with three factors • Profiling method: CF-IDF / HCF-IDF / LDA • Decay function: sliding window / exponential decay • Document content: Title / All (title + full-text) • Result • Profiling method has the largest impact • Best profiling method: HCF-IDF • Advantage of HCF-IDF: It performs well even if only titles are available
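The two decay functions named on the conclusion slide can be sketched as weighting functions over the age of a tweet. This is a minimal sketch under stated assumptions: the window length and the half-life of 30 days are placeholder values, and defining the sliding window by age in days (rather than, say, by a number of recent tweets) is one possible reading, not something the slides confirm.

```python
def sliding_window_weight(age_days, window_days=30):
    """Weight 1 for items inside the window, 0 otherwise.

    window_days is an assumed placeholder value.
    """
    return 1.0 if 0 <= age_days <= window_days else 0.0

def exponential_decay_weight(age_days, half_life_days=30):
    """Weight that halves every `half_life_days` (assumed placeholder value)."""
    return 0.5 ** (age_days / half_life_days)

# Weights of a tweet posted 60 days ago under both functions.
old_tweet_window = sliding_window_weight(60)
old_tweet_decay = exponential_decay_weight(60)
```

The sliding window discards older items entirely, whereas exponential decay keeps them with a smoothly shrinking weight.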
  • 21. Special thanks to SIGIR Student Travel Grant • Project consortium and funding agency • MOVING is funded by the EU Horizon 2020 Programme under the project number INSO-4-2015: 693092 • Our demo is online! http://amygdala.informatik.uni-kiel.de/Demo/TwitterAccount
  • 22. Appendix
  • 23. References • [Blei and Lafferty 06] D. M. Blei and J. D. Lafferty. Dynamic topic models. ICML, 2006. • [Blei et al. 03] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. JMLR, 2003. • [Breese et al. 98] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. UAI, 1998. • [Goossen et al. 11] F. Goossen, W. IJntema, F. Frasincar, F. Hogenboom, and U. Kaymak. News personalization using the CF-IDF semantic recommender. WIMS, 2011. • [Griffiths and Steyvers 04] T. L. Griffiths and M. Steyvers. Finding scientific topics. PNAS, 2004. • [Hazen 10] T. J. Hazen. Direct and latent modeling techniques for computing spoken document similarity. Spoken Language Technology, 2010. • [Kapanipathi et al. 14] P. Kapanipathi, P. Jain, C. Venkataramani, and A. Sheth. User interests identification on Twitter using a hierarchical knowledge base. ESWC, 2014. • [Letierce et al. 10] J. Letierce, A. Passant, J. Breslin, and S. Decker. Understanding how Twitter is used to spread scientific messages. WebSci, 2010. • [Nascimento et al. 11] C. Nascimento, A. H. F. Laender, A. S. da Silva, and M. A. Gonçalves. A source independent framework for research paper recommendation. JCDL, 2011.
  • 24. References • [Nishioka et al. 15] C. Nishioka, G. Große-Bölting, and A. Scherp. Influence of time on user profiling and recommending researchers in social media. i-KNOW, 2015. • [Shen et al. 13] W. Shen, J. Wang, P. Luo, and M. Wang. Linking named entities in tweets with knowledge base via user interest modeling. KDD, 2013. • [Sugiyama and Kan 10] K. Sugiyama and M.-Y. Kan. Scholarly paper recommendation via user’s recent research interests. JCDL, 2010.
  • 25. Evaluation Metrics • rankscore [Breese et al. 98] • Posits that each successive item in a list is less likely to be viewed, with an exponential decay • $\mathit{rankscore} = \sum_{d \in \mathit{hits}} \frac{1}{2^{(\mathit{rank}(d) - 1)/(\theta - 1)}}$ • Parameter $\theta = 5$, following [Breese et al. 98] • 𝑤𝑒𝑖𝑔ℎ𝑡(𝑐, 𝑖) • Other metrics • Precision • Mean Average Precision (MAP) • Mean Reciprocal Rank (MRR) • normalized Discounted Cumulative Gain (nDCG)
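The rankscore on this slide can be sketched in a few lines. The function `rankscore` follows the formula with 𝜃 = 5; the second function, which divides by the best achievable score for a list of 𝑘 = 5 items, is an assumption added here so that values fall between 0 and 1, and the slide itself does not state whether such a normalization was applied. Function and parameter names are illustrative.

```python
def rankscore(hit_ranks, theta=5.0):
    """Sum of 1 / 2^((rank - 1) / (theta - 1)) over the ranks of the hits.

    hit_ranks: 1-based ranks of the recommended items judged interesting.
    """
    return sum(1.0 / 2 ** ((rank - 1) / (theta - 1)) for rank in hit_ranks)

def normalized_rankscore(hit_ranks, k=5, theta=5.0):
    """Assumed normalization: divide by the score of a list where all k items hit."""
    best = rankscore(range(1, k + 1), theta)
    return rankscore(hit_ranks, theta) / best

# A participant marks the items at ranks 1 and 5 as interesting.
score = rankscore([1, 5])
```

With 𝜃 = 5, a hit at rank 1 contributes 1 and a hit at rank 5 contributes 0.5, which is the "half-life" interpretation of 𝜃: the item at rank 𝜃 is assumed to be viewed with half the likelihood of the top item.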
  • 26. Participants • Recruiting participants • Mailing lists, tweets, and word-of-mouth • 134 started the experiment and 123 completed it • Demographics of participants • 96 male / 27 female • 32.83 years old on average (SD: 7.34) • 21 bachelor / 58 master / 32 PhD / 12 professor • 83 working in academia / 40 working in industry • Incentive for participation • Get to know his/her most similar economist among 26 famous economists • Chance to win one of two Amazon vouchers (50 EUR)
  • 27. Tweets of Participants • Extract users’ tweets via the Twitter API • At most 3,200 tweets can be extracted per user • Collect only tweets in English • The number of English tweets per participant • Average: 1096.82 English tweets (SD: 1048.46) • Max: 3192 • Min: 2 • Criterion for participation • Users who had not tweeted in the last 250 days could not participate in the experiment • Five were rejected for this reason
  • 28. Dataset: Scientific Publications • Result of a collaboration with EconBiz • 279,381 papers in English • EconBiz: a portal for scientific publications in economics • Procedure • Seed list: 1 million URLs of open access papers • Successfully downloaded 413,098 papers in PDF • Converted PDFs into text using Apache PDFBox • Detected the languages of the 413,098 papers • Finally, obtained 279,381 papers in English
  • 29. Dataset: Taxonomy • STW (ver. 8.12) enriched by DBpedia redirects • Maintained by ZBW • Specialized for economics • 6,335 semantic concepts and 11,679 labels in English • Enrichment process • Goal: get more synonymous labels • Use the official mapping that connects STW concepts with DBpedia concepts • Get redirects from DBpedia concepts • e.g., “Telecommunications Operator” and “Telephone companies” • Finally, 6,335 semantic concepts and 37,733 labels
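The enrichment process on this slide can be sketched as a merge of three lookups. This is a minimal sketch under stated assumptions: the identifiers (`stw:telecom_operator`, `dbpedia:Telephone_company`), the dictionary-based data layout, and the function name are hypothetical and do not reflect the actual STW or DBpedia data formats; only the two example labels come from the slide.

```python
def enrich_labels(concept_labels, concept_to_dbpedia, dbpedia_redirects):
    """Add DBpedia redirect titles as extra synonymous labels per concept.

    concept_labels:     concept id -> set of existing labels
    concept_to_dbpedia: concept id -> mapped DBpedia resource (the mapping step)
    dbpedia_redirects:  DBpedia resource -> list of redirect titles
    """
    enriched = {}
    for concept, labels in concept_labels.items():
        merged = set(labels)
        resource = concept_to_dbpedia.get(concept)
        if resource is not None:
            merged.update(dbpedia_redirects.get(resource, []))
        enriched[concept] = merged
    return enriched

# Toy data with hypothetical identifiers.
labels = {"stw:telecom_operator": {"Telecommunications Operator"}}
mapping = {"stw:telecom_operator": "dbpedia:Telephone_company"}
redirects = {"dbpedia:Telephone_company": ["Telephone companies"]}
enriched = enrich_labels(labels, mapping, redirects)
```

The number of concepts stays the same while the number of labels grows, which matches the slide's counts (6,335 concepts before and after, 11,679 versus 37,733 labels).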
  • 30. Latent Dirichlet Allocation (LDA) • source: D. M. Blei. Probabilistic topic models, CACM, 2012.
  • 31. Latent Dirichlet Allocation (LDA) • Implementation: JGibbLDA • Preprocessing • Lemmatization and stop word removal • Remove words that appear in fewer than 25 scientific publications, following [Blei and Lafferty 06] • Parameters • Hyper-parameters: 𝛼 = 0.5 and 𝛽 = 0.1 • suggested by [Griffiths and Steyvers 04] • The number of topics: 𝐾 = 100 • Optimized by the log likelihood • Tested 𝐾 = 20, 50, 100, 200, 500, 1000, 5000 • The number of iterations: 500
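The vocabulary pruning step in the preprocessing above (dropping words that occur in fewer than 25 publications) can be sketched with the standard library. This is an illustrative sketch, not the preprocessing code used with JGibbLDA; it assumes documents are already lemmatized, stop-word-filtered token lists, and the function name is hypothetical.

```python
from collections import Counter

def filter_rare_words(documents, min_docs=25):
    """Drop tokens that occur in fewer than `min_docs` distinct documents.

    documents: list of token lists (assumed to be lemmatized already).
    """
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each word once per document
    keep = {word for word, freq in doc_freq.items() if freq >= min_docs}
    return [[word for word in tokens if word in keep] for tokens in documents]

# Toy corpus: "tax" occurs in 25 documents, "rare" in only one.
corpus = [["tax", "tax", "rare"]] + [["tax"] for _ in range(24)]
filtered = filter_rare_words(corpus)
```

Counting document frequency rather than raw term frequency means a word repeated many times in a single paper is still removed.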
  • 32. Statistical Test for Strategies (1/2) • Goal: investigate the difference in recommendation performance among the twelve strategies • Mauchly’s sphericity test • Verifies whether the variances of the differences between the rankscores of the twelve strategies are equal • Reveals a violation of sphericity ($\chi^2(65) = 435.90$, $p = .00$), which leads to positively biased F-statistics and increases false positives • One-way repeated-measures ANOVA with a Greenhouse-Geisser correction of $\epsilon = .61$ • Reveals a significant difference ($F(6.60, 805.33) = 21.98$, $p = .00$)
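The degrees of freedom reported on this slide can be related to the study design with a short calculation. The sketch below assumes 12 strategies and the 123 participants who completed the experiment, and uses the textbook formulas for Mauchly's test and the Greenhouse-Geisser correction; the function names are illustrative. It back-computes the correction factor implied by the reported numerator degrees of freedom, which comes out at about 0.60, close to the rounded value of .61 given on the slide (the source does not explain the small gap).

```python
def mauchly_df(k):
    """Degrees of freedom of Mauchly's chi-square test for k repeated conditions."""
    return k * (k - 1) // 2 - 1

def greenhouse_geisser_df(k, n, epsilon):
    """Corrected (numerator, denominator) df for a one-way repeated-measures ANOVA
    with k conditions and n participants."""
    return epsilon * (k - 1), epsilon * (k - 1) * (n - 1)

k, n = 12, 123                      # strategies, participants who completed
implied_epsilon = 6.60 / (k - 1)    # from the reported numerator df of 6.60
df1, df2 = greenhouse_geisser_df(k, n, implied_epsilon)
```

Here `mauchly_df(12)` gives the 65 degrees of freedom of the reported chi-square statistic, and `df2` lands within rounding distance of the reported 805.33.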
  • 33. Statistical Test for Strategies (2/2) • Shaffer’s modified sequentially rejective Bonferroni procedure (Shaffer’s MSRB procedure)
  • 34. Statistical Test for Three Factors • Goal: investigate the difference of the recommendation performance among the twelve strategies • Mendoza’s sphericity test • Adapted to multi-way repeated-measures ANOVA • Again, reveals a violation of sphericity • Global: $\chi^2(65) = 435.90$, $p = .00$ • Profiling Method: $\chi^2(2) = 12.21$, $p = .00$ • Profiling Method × Decay Function: $\chi^2(2) = 20.02$, $p = .00$ • Profiling Method × Document Content: $\chi^2(2) = 8.61$, $p = .01$ • Three-way repeated-measures ANOVA with Greenhouse-Geisser corrections
  • 35. Result: Precision • Values are similar to those of rankscore • The order of the strategies is identical
  • 36. Result: nDCG • Values are similar to those of rankscore • The order of the strategies is identical
  • 37. Result: Mean Average Precision • Values are higher than those of rankscore • The order of the strategies is almost the same
  • 38. Result: Mean Reciprocal Rank • Values are higher than those of rankscore • The order of the strategies is almost the same
  • 39. Result: Post-hoc Analysis
  • 40. Result: Influence of Three Factors • Profiling Method • This factor has the biggest impact • HCF-IDF is the best profiling method, followed by CF-IDF and LDA • Document Content • All (both title and full-text) performs significantly better than Title, except when using HCF-IDF • Profiling Method × Document Content • All (both title and full-text) is the better choice for CF-IDF • Document Content makes no difference for HCF-IDF
  • 41. Result: Demographic Factors (1/3) • Goal: investigate whether each demographic factor has an influence on recommendation performance • Gender • Female participants are more likely to evaluate recommendations as interesting • However, the relative performance of the strategies does not differ by gender, since the interaction gender × strategy is not significant • Age • No influence on recommendation performance
  • 42. Result: Demographic Factors (2/3) • Highest academic degree • Participants with a bachelor’s degree are more likely to evaluate recommendations as interesting than lecturers/professors • However, the relative performance of the strategies does not differ • Major • Manually classify participants into two groups • Participants majoring in economics (𝑛 = 92) and others • No influence on recommendation performance
  • 43. Result: Demographic Factors (3/3) • Years in profession • On average, participants have been working for 7.85 years (SD: 6.85) • Divide participants into three groups • Participants working for up to 5 years (𝑛 = 44), for 5 to 10 years (𝑛 = 34), and for more than 10 years • No influence on recommendation performance • Employment type • Participants working in academia (𝑛 = 83) and in industry • No influence on recommendation performance
  • 44. Result: Click Rates (1/2)
  • 45. Result: Click Rates (2/2)
  • 46. List of Taxonomies • Maintained by W3C • source: https://www.w3.org/2001/sw/wiki/SKOS/Datasets • Taxonomy (Domain) • Thesaurus for the Social Sciences (Social science) • NASA Taxonomy (Technology areas) • Linked Life Data (Biomedicine) • Medical Subject Headings (MeSH) (Biomedicine) • Australian education vocabularies (Education) • UNESCO Thesaurus (Education, culture, natural sciences, and social and human sciences) • ACM Computing Classification System (Computer science)
  • 47. Insights from the Results • Profiling method has the largest impact • Best profiling method: HCF-IDF • Only HCF-IDF makes reasonable recommendations based on titles alone • CF-IDF requires full-texts • LDA performs poorly even with full-texts • Possible reason: it is impossible to infer a topic distribution from social media items, which are short and sparse • Usually, full-texts are not available due to legal issues • Easy to employ HCF-IDF in many different fields • e.g., MeSH for medicine, ACM CCS for computer science