LITM
jins0618
Some links I once collected for the incoming freshmen on our team.
The author published "Topic-Sensitive PageRank" at WWW 2002 and then "Topic-sensitive PageRank: a context-sensitive ranking algorithm for Web search" in TKDE 2003. "Topic-Sensitive PageRank" corresponds to the random-walk-based PersonalRank algorithm (Recommender System Practice, 推荐系统实践, p. 74). To allow the query to influence the link-based score, we compute offline a set of PageRank vectors, each biased with a different topic, to create for each page a set of importance scores with respect to particular topics. Pages considered important in some subject domains may not be considered important in others, regardless of what keywords may appear either in the page or in anchor text referring to the page. Then, instead of using a single global ranking vector, we take the linear combination of the topic-sensitive vectors, weighted using the similarities of the query (and any available context) to the topics. By using a set of rank vectors, we are able to determine more accurately which pages are truly the most important with respect to a particular query or query-context.
Recommender System Practice, pp. 73-74, discusses how to measure the relevance between two vertices of a bipartite graph, lists the factors that affect it, and gives an algorithm based on those factors; see the 2007 TKDE paper "Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation".
Normalize the weights to values in [0, 1] with some scheme, then minimize the sum of squared errors.
The influence of each user and of each movie can be obtained with a PageRank-based method; influence is asymmetric.
Explicit and implicit feedback lead to the notion of positive and negative samples; see Recommender System Practice, p. 67, and the formula on p. 68. For how to generate negative samples, see the paper "One-Class Collaborative Filtering".

"Bipartite Graph for Topic Extraction"
Goal: "Our aim is to investigate and develop techniques that combine the expressive network representation made possible by the complex networks theory with data streaming techniques for dealing with problem of topic extraction."
The Introduction contains a description of the bipartite graph and of how it maps onto the real data, which can be reused (this paper uses a weighted bipartite graph).
Adding new nodes: the end of the Introduction says "The bipartite graph structure can be easily adjusted with the insertion of new vertices. Moreover, our propagation method can be parallelizable and work in a dynamic context of stream of documents." This can be reused to argue for parallelizability and stream processing; the end of the paper also covers it.
A passage in the Introduction, "Our proposed method is based on exploration of an effective unsupervised learning ...... However, unlike traditional label propagation techniques, ....", states what the method builds on and argues for its novelty.
"Dimensionality reduction can be considered a subtype of clustering; these include a well-known technique LSA based on SVD." Besides a subtype of clustering, it can also be viewed as a subtype of topic mining.
The paper mentions the drawbacks of matrix methods (SVD, LSA, and so on), "such as expensive storage requirements and computing time", then moves to pLSA, uses pLSA's shortcomings to introduce LDA, and then states LDA's drawbacks (reusable directly): "LDA based models have a rigorous mathematical treatment of decomposed operations that discover the latent groups (topics). From the practitioner's perspective, creating a new model and deriving it to an effective and implementable inference algorithm are hard and tiresome tasks [Rajesh et al., 2014]. Moreover, the mathematical rigour hampers a rapid exploration of new assumptions, heuristics, or adaptations that could be useful in many real scenarios." From there it argues for the benefits of using a bipartite graph.
NBI (network-based inference)

"How does label propagation algorithm work in bipartite networks"
Mainly used for community detection in networks; in the algorithm's output, nodes sharing the same label form one community.
Bipartite network description: "A bipartite network is a special and important class of networks, where nodes can be divided into two disjoint sets, such that no two nodes within the same set are adjacent (Fig. 1). Examples of such networks are actor-movie networks, author-paper networks, etc."
Description of the update rule: "In synchronous updating [1], node x at the t-th iteration updates its label based on the labels of its neighbors at the (t-1)-th iteration."
Only one side of the bipartite graph needs initial labels: "In light of this, at the start, rather than assigning labels to every node as what the standard LPA does, we only need to assign each red (blue) node with a unique label and keep blue (red) nodes unlabeled."
Parallelization is mentioned in two places. One is "we propagate labels from red (blue) nodes to blue (red) nodes, the relabeling process for different blue (red) nodes is independent of each other"; the other is "Parallelism is very important if the network is extremely large, or time is demanding, such as the case of real time community detection for some online social networks."
The paper mentions several bipartite datasets; those clearly usable for topics are "A network representing the authorship relations between authors and papers on the condensed matter archive at" and the actor-movie network ()[8], "with each edge describing player X plays in movie Y".

"Bipartite network projection and personal recommendation"
Two kinds of networks: "Two kinds of bipartite networks are important because of their particular significance in social, economic, and information systems": the "collaboration network" and the "opinion network".
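The label propagation scheme summarized above for bipartite networks (only the red side gets initial labels; updates alternate red to blue, then blue to red) can be sketched in Python as follows. This is a minimal sketch, not the paper's exact algorithm: the function names and the tie-breaking rule (keep the current label if it is among the most frequent, otherwise take the smallest) are assumptions. Because blue nodes read only red labels and vice versa, every update within a half-step uses labels from the previous half-step and is independent of the others, which is the property the paper cites for parallelization.

```python
from collections import Counter, defaultdict


def most_frequent(neighbor_labels, current=None):
    """Most frequent label; on a tie keep `current` if it is tied, else take the smallest."""
    if not neighbor_labels:
        return current
    counts = Counter(neighbor_labels)
    best = max(counts.values())
    tied = sorted(lab for lab, c in counts.items() if c == best)
    return current if current in tied else tied[0]


def bipartite_label_propagation(edges, red_nodes, max_iter=50):
    """edges: (red_node, blue_node) pairs. Returns {node: label}; equal labels = one community."""
    adj = defaultdict(set)
    for r, b in edges:
        adj[r].add(b)
        adj[b].add(r)
    blue_nodes = {b for _, b in edges}
    # Only the red side is initialized, each node with its own unique label
    label = {r: r for r in red_nodes}
    for _ in range(max_iter):
        old = dict(label)
        # Red -> blue: each blue node reads only red labels, so these updates are independent
        for b in blue_nodes:
            label[b] = most_frequent([label[r] for r in adj[b]], old.get(b))
        # Blue -> red: each red node reads only the blue labels just computed
        for r in red_nodes:
            label[r] = most_frequent([label[b] for b in adj[r]], old[r])
        if label == old:  # no label changed during a full round
            break
    return label
```

On two disconnected blocks, for example red {a, b} linked to blue {x, y} and red {c, d} linked to blue {z, w}, each block ends up with a single shared label.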
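Two-step paths: "w_ij sums the contribution from all two-step paths between x_i and x_j" (the sentence right after Eq. (6)). The paper gives an asymmetric measure of the dependence between nodes, with one flaw: the denominator of Eq. (7) counts only the adjacent nodes, i.e. the node's degree, and ignores the fact that different neighbors differ. It also gives a measure of a single node's independence (Eq. (8)), but does not explain why it uses a sum of squares rather than, say, a plain sum.
The average recommendation position and the hitting rate are used as the comparison metrics in the experiments.
Terminology for a link: "for each incident entry a->b".
The complexity analysis in the Conclusion and Discussion section can be borrowed.

"Inductive Model Generation for Text Classification Using a Bipartite Heterogeneous Network"
There are two versions: ICDM (2012) and JCST (2014).
Relation among texts, documents, and terms: "texts are represented by a document-term matrix."
Definition of a heterogeneous network.
Description of the weight vectors: "creating a weight vector for each network object, in which each position of the vector corresponds to a category of the data set. The weight vector for objects in which there is no information is calculated during the learning process by propagating information from labeled to unlabeled vertices. The weight of the edges could also be considered to improve the learning."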
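The two-step-path weight from "Bipartite network projection and personal recommendation" can be sketched as follows. The sketch assumes the resource-allocation form w_ij = (1/k_j) Σ_l a_il a_jl / k_l, where i and j lie on the projected side, l runs over the other side, and k denotes degree. The function names and the toy matrix are hypothetical. The 1/k_j factor is what makes w_ij asymmetric, and every column of W sums to 1 for any node j that has at least one link.

```python
def projection_weights(adj):
    """adj[i][l] = 1 if projected-side node i links to other-side node l.

    Returns W with W[i][j] = (1/k_j) * sum_l adj[i][l] * adj[j][l] / k_l,
    i.e. the share of node j's resource that reaches node i over two-step paths.
    """
    n, m = len(adj), len(adj[0])
    k_row = [sum(row) for row in adj]                              # degrees on the projected side
    k_col = [sum(adj[i][l] for i in range(n)) for l in range(m)]  # degrees on the other side
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if k_row[j] == 0:
                continue  # an isolated node has no resource to distribute
            s = sum(adj[i][l] * adj[j][l] / k_col[l] for l in range(m) if k_col[l])
            W[i][j] = s / k_row[j]
    return W


def recommend_scores(W, f):
    """f: a user's 0/1 vector over projected-side nodes; returns f' = W f."""
    return [sum(W[i][j] * f[j] for j in range(len(f))) for i in range(len(W))]
```

Multiplying W by a user's 0/1 vector of collected items gives the scores used to rank that user's uncollected items. With three items and two users, `adj = [[1, 1], [1, 0], [0, 1]]`, W[0][1] is 0.5 while W[1][0] is 0.25, which shows the asymmetry.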
Stopping criterion of the algorithm: "Step 2 and Step 3 are repeated for every document until a stopping criterion is reached. We adopted as stopping criterion the maximum number of epochs and a minimum mean squared error, i.e., when the mean squared error of an epoch is less than a small given value."
New (unseen) documents: "The algorithm induces weights to objects that represent terms of the collection, which indicate the influence of these terms in the definition of the classes of the documents. After obtaining the weights of the terms for each class, this information is used as a model to classify unseen documents."

The section "LFM and Matrix Factorization" of Recommender System Practice (推荐系统实践, pp. 65-66) explains the drawbacks of having people classify items, and why one should instead start from the data and find the classes automatically. Most classifications are based on a book's content rather than on its readership, so they cannot represent the views of different users. The granularity of the classification is hard to control. It is hard to put one item into several classes, although some books belong to many. It is hard to classify along several dimensions, for example by author, translator, or publisher. It is hard to decide the weight of an item within a class.

Xiang Liang's (项亮) "Temporal Recommendation", Section 2.3.3: Hofmann proposed the latent class model in "Latent class models for collaborative filtering". The model links users and items through latent classes. It assumes that a user is not directly interested in items but in a few categories, and that items belong to different categories, so it learns these categories, and each user's interest in them, from user behavior data. Building on the latent class model, many researchers later proposed matrix factorization models, also called latent factor models ("Modeling relationships at multiple scales to improve accuracy of large recommender systems"). There are many kinds of matrix-factorization-based models.

"Modeling relationships at multiple scales to improve accuracy of large recommender systems" proposes an LFM that works without imputing the missing data: "propose a method that avoids the need for a gauge set or for imputation, by working directly on the sparse set of known ratings."
Notation of the LFM formula: "Here, p_u is the u-th row of P, which corresponds to user u. Likewise, q_i is the i-th row of Q, which corresponds to item i. Similar to Roweis' method, we could alternate between fixing Q and P, thereby obtaining a series of efficiently solvable least squares problems without requiring impu.... Each update of Q or P decreases Err(P,Q), so the process must converge."
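The alternating scheme can be written out as follows, assuming Err(P,Q) is the squared error summed only over the set K of known ratings. This matches "working directly on the sparse set of known ratings", but the paper's own notation may differ. With Q fixed, each p_u is an independent least-squares problem over the items that u has rated; the update for q_i with P fixed is symmetric:

```latex
\begin{aligned}
\operatorname{Err}(P,Q) &= \sum_{(u,i)\in\mathcal{K}} \bigl(r_{ui} - p_u^{\top} q_i\bigr)^2 \\
p_u &\leftarrow \Bigl(\sum_{i:(u,i)\in\mathcal{K}} q_i q_i^{\top}\Bigr)^{-1} \sum_{i:(u,i)\in\mathcal{K}} r_{ui}\, q_i \\
q_i &\leftarrow \Bigl(\sum_{u:(u,i)\in\mathcal{K}} p_u p_u^{\top}\Bigr)^{-1} \sum_{u:(u,i)\in\mathcal{K}} r_{ui}\, p_u
\end{aligned}
```

Each step minimizes Err exactly over one block of variables. Err therefore never increases, and because it is bounded below by 0, this is the reasoning behind the quoted convergence claim. The inverse exists only if u's rated items have factor vectors that span all f dimensions, which requires at least f ratings. This is one practical reason to add shrinkage.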
The paper discusses the trade-off in the number of latent factors. A larger number is more flexible but risks overfitting; a smaller number leaves the objective function large. To address this, it proposes an algorithm that computes the f-th factor from the first f-1 factors. The paper admits the tension but accepts it: "This is an undesirable situation, as we want to benefit by increasing the number of factors, thereby explaining more latent aspects of the data. However, we find that we can still treat only the known entries, but accompany the process with shrinkage to alleviate the overfitting problem."
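Below is a minimal pure-Python sketch of alternating least squares that uses only the known entries. It makes one stated substitution: the paper's shrinkage and its factor-by-factor computation are replaced by a plain ridge (L2) penalty `lam`, so this is ordinary regularized ALS, not the paper's exact method. All names (`als`, `ridge_solve`, `predict`, `solve`) are hypothetical. The stopping rule (a maximum number of epochs, or training MSE below `tol`) mirrors the criterion quoted earlier from the text-classification paper. Prediction is the inner product p_u^T q_i of two length-f vectors.

```python
import random


def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ridge_solve(pairs, lam, f):
    """argmin_x sum (r - x.v)^2 + lam * |x|^2 over (r, v) pairs, via the normal equations."""
    A = [[lam if a == b else 0.0 for b in range(f)] for a in range(f)]
    rhs = [0.0] * f
    for r, v in pairs:
        for a in range(f):
            rhs[a] += r * v[a]
            for b in range(f):
                A[a][b] += v[a] * v[b]
    return solve(A, rhs)  # lam > 0 keeps A positive definite


def predict(P, Q, u, i):
    """Online step: r_ui is estimated as the inner product p_u^T q_i."""
    return sum(a * b for a, b in zip(P[u], Q[i]))


def als(ratings, f=2, lam=0.1, epochs=50, tol=1e-6, seed=0):
    """ratings: {(user, item): value}; only these known entries are ever used."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for (u, i), r in ratings.items():
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    P = {u: [rng.gauss(0, 0.1) for _ in range(f)] for u in by_user}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(f)] for i in by_item}
    for _ in range(epochs):
        for u, rated in by_user.items():   # fix Q, solve each p_u
            P[u] = ridge_solve([(r, Q[i]) for i, r in rated], lam, f)
        for i, raters in by_item.items():  # fix P, solve each q_i
            Q[i] = ridge_solve([(r, P[u]) for u, r in raters], lam, f)
        mse = sum((r - predict(P, Q, u, i)) ** 2 for (u, i), r in ratings.items()) / len(ratings)
        if mse < tol:
            break
    return P, Q
```

All of the cost sits in the offline `als` loop. `predict` is a single length-f dot product, which is the offline/online split the paper highlights.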
Explanation of the inner-product formula and of the advantages of the model's offline/online split: "each rating r_ui is estimated as the inner product of the f factors that we learned for u and i, that is p_u^T q_i. A major advantage of such a regional, factorization-based approach is its computational efficiency. The computational burden lies in an offline, preprocessing step where all factors are computed. The actual, online rating prediction is done instantaneously by taking the inner product of two length-f vectors. Moreover, since the factors are computed by an iterative algorithm, it is easy to adapt them to changes in the data such as addition of new ratings, users, or items."

"Towards Explaining Latent Factors with Topic Models in Collaborative Recommender Systems"
The drawback of LFM is its lack of interpretability (abstract; used in 《 》): "Latent factor models have been proved to be the state of the art for the Collaborative Filtering approach in a Recommender System. However, latent factors obtained with mathematical methods applied to the user-item matrix can be hardly interpreted by humans."
(Introduction): "Furthermore, it is hard to explain users how a specific recommendation has been derived and why it matches to their presumed preferences when following a white-box explanations strategy, i.e. their interpretability is rather low as opposed to explicit knowledge-representation formalisms such as constraints or logic." The paper addresses two aspects. The first is portability: the approach "can be applied to new users (i.e. users without or with only few known ratings)". The second is interpretability: it "can be exploited for explaining their recommendations".
Role of topic models: "The main purpose of these algorithms is the analysis of words in natural language texts in order to discover themes represented by sorted lists of words."
The paper then explains the basic idea of LDA.
Role of recommender systems: "The general idea behind those systems is to exploit information about users, items and relationships between them such as rating or purchasing actions in order to identify additional serendipitous matches, i.e. pointing users to items they would otherwise not have found."
Basic idea of latent factor models: "these models factorize the user-item matrix containing the ratings into two smaller matrices, which summarize information in a lower dimensional space. They are based on the assumption that few dimensions capture most of the signal in the data and mostly noise, such as erratic or inconsistent rating behavior, is filtered."
One-Class Collaborative Filtering
A sentence pattern for claiming the method's generality: "Different application scenarios in the field of business intelligence and analytics would benefit from research progress into this direction, however this paper focuses on the domain of movie recommendation as a first step."
Data cleaning: "After removing users and movies with no single remaining rating value the dataset consisted of U_M = 6038 users and 3086 movies. The choice of the optimal number of topics in this range was guided by the consideration that a high number of topics could bother the user in the evaluation phase. For this reason we set the number of topics equal to T = 30, which seemed to be a good compromise between topics' granularity and the cognitive effort of users in order to select the appropriate topics."
Probability distributions and the probability matrix: "The LDA algorithm provides for every movie its probability distribution over topics. This set of probability values can be represented as a vector θ_i ∈ R^T. This vector is strictly non-negative, 0 ≤ θ_ij for j = 1, ..., T, and sums to 1, Σ_{j=1}^{T} θ_ij = 1. All the probability distributions of movies have been organized to form the matrix D ∈ R^{M×T}, which by construction is a stochastic matrix."
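The construction of D can be sketched as follows. The sketch assumes the per-movie topic counts come from a collapsed Gibbs sampler; this input is hypothetical, since the notes do not say how θ_i was estimated. It uses the common smoothed estimate θ_ij = (n_ij + α)/(n_i + Tα). Every row is then non-negative and sums to 1, so D is row-stochastic by construction. The function names and α are illustrative.

```python
def theta_matrix(counts, alpha=0.1):
    """counts[i][j]: tokens of movie i assigned to topic j (hypothetical sampler output).

    Returns D (M x T) whose row i is theta_i = (n_ij + alpha) / (n_i + T * alpha).
    """
    T = len(counts[0])
    D = []
    for row in counts:
        total = sum(row) + T * alpha
        D.append([(n + alpha) / total for n in row])
    return D


def is_row_stochastic(D, tol=1e-9):
    """True if every entry is >= 0 and every row sums to 1 (within tol)."""
    return all(all(x >= 0 for x in row) and abs(sum(row) - 1.0) <= tol for row in D)
```

The smoothing term α keeps every θ_ij strictly positive even when a movie has no tokens assigned to a topic.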
"A Taxonomy for Generating Explanations in Recommender Systems"
Definition of explanations in recommender systems, characterized "by two properties. First, they are information about recommendations, where a recommendation is typically a ranked list of items" (i.e. a ranking). "Second, explanations support objectives defined by the recommender system designer" (i.e. they serve the designer's goals).
Why explain: "the intention behind disclosing the reasoning process of the system could be to increase the user's confidence in making the right decision or to provide additional information such that the user can validate the rationality of the proposed purchase."

"A Probabilistic Model of Local Sequence Alignment That Simplifies Statistical Significance Estimation"
"Optimal alignment scores are less powerful than probabilistic scores that integrate over alignment uncertainty." The benefit of a probabilistic formulation: "When parameters are probabilities rather than arbitrary scores, they are more readily optimized by objective mathematical criteria. This enables building more complex, biologically realistic models with large numbers of parameters."

"Transparent User Models for Personalization"
Argues that explanations are needed and visualizes them with word clouds: "we must provide users with meaningful and interpretable answers when they ask, "why did I get badge X?" A convenient way to visualize badge definitions is via word clouds, with the size of an action proportional to its weight in the badge. Figure 4 shows six examples of badges learned from running our model on the Twitter data set described above."

"Probabilistic Matrix Factorization"
"A variety of probabilistic factor-based models has been proposed recently. All these models can be viewed as graphical models in which hidden factor variables have directed connections to variables that represent user ratings."
The major drawback of such models is that [fill in the starting point of our paper: the lack of interpretability.]
Why prior parameters are introduced: Given sufficiently many factors, a PMF model can approximate any given matrix arbitrarily well.... The simplest way to control the capacity of a PMF model is by changing the dimensionality of feature vectors. However, when the dataset is unbalanced, i.e. the number of observations differs significantly among different rows or columns, this approach fails, since any single number of feature dimensions will be too high for some feature vectors and too low for others. Regularization parameters such as λU and λV defined above provide a more flexible approach to regularization...... The complexity of the model is controlled by the hyperparameters.....
Normalization (rating values can be normalized): passed through the logistic function g(x) = 1/(1 + exp(-x)), which bounds the range of predictions. In addition: we map the ratings 1, ..., K to the interval [0, 1] using the function t(x) = (x - 1)/(K - 1), so that the range of valid rating values matches the range of predictions our model makes.
Complexity: scales well to large datasets.
Justification for introducing the parameters of our model: Second, most of the existing algorithms have trouble making accurate predictions for users who have very few ratings...... remove all users with fewer than some minimal number of ratings.
Local optimum: a local minimum of the objective function given by Eq. 4 can be found by performing gradient descent in U and V.
《User interest and topic detection for personalized recommendation》 Topics are extracted from web contents: propose a novel graphical model to extract hidden topics from web contents, cluster web contents, and detect users' interests on each cluster. Sentence pattern for comparing experiments in an abstract: Experiment results on a public dataset demonstrated the limitation of a traditional content-boosted approach, and also showed the validity of our proposed techniques.
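The PMF notes above combine four ideas: the regularized squared-error objective with λU and λV, the logistic bound g, the rating map t, and gradient descent in U and V. This is a minimal sketch of how they fit together. It assumes full-batch gradient descent on 0.5 Σ (t(R_ij) − g(U_i·V_j))² + λU/2 ‖U‖² + λV/2 ‖V‖² over observed ratings only. The learning rate, number of factors, epochs, initialization scale, and toy ratings are hypothetical choices, not values from the PMF paper.

```python
import math
import random


def t(x, K):
    """Map a rating x in 1..K to [0, 1]: t(x) = (x - 1) / (K - 1)."""
    return (x - 1) / (K - 1)


def g(x):
    """Logistic function g(x) = 1 / (1 + exp(-x)); bounds predictions to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def objective(ratings, U, V, K, lam_u, lam_v):
    """0.5 * sum (t(R_ij) - g(U_i . V_j))^2 + lam_u/2 ||U||^2 + lam_v/2 ||V||^2."""
    err = sum((t(r, K) - g(dot(U[i], V[j]))) ** 2 for i, j, r in ratings)
    reg_u = sum(x * x for row in U for x in row)
    reg_v = sum(x * x for row in V for x in row)
    return 0.5 * err + 0.5 * lam_u * reg_u + 0.5 * lam_v * reg_v


def fit_pmf(ratings, n_users, n_items, n_factors=2, K=5,
            lam_u=0.01, lam_v=0.01, lr=0.5, epochs=200, seed=0):
    """Full-batch gradient descent on `objective`; ratings are (user, item, rating) triples."""
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        # Start from the gradients of the regularization terms.
        gU = [[lam_u * x for x in row] for row in U]
        gV = [[lam_v * x for x in row] for row in V]
        for i, j, r in ratings:
            p = g(dot(U[i], V[j]))
            # d/ds of 0.5 * (target - g(s))^2 is -(target - g(s)) * g(s) * (1 - g(s)).
            coef = -(t(r, K) - p) * p * (1.0 - p)
            for d in range(n_factors):
                gU[i][d] += coef * V[j][d]
                gV[j][d] += coef * U[i][d]
        # Update U and V together, after all gradients have been accumulated.
        for mat, grads in ((U, gU), (V, gV)):
            for row, grow in zip(mat, grads):
                for d in range(n_factors):
                    row[d] -= lr * grow[d]
    return U, V


def predict(U, V, i, j, K=5):
    """Map the bounded prediction back to the 1..K rating scale by inverting t."""
    return 1 + (K - 1) * g(dot(U[i], V[j]))


# Hypothetical (user, item, rating) triples on a 1..5 scale.
ratings = [(0, 0, 5), (0, 1, 1), (1, 0, 4), (1, 2, 2), (2, 1, 5), (2, 2, 3)]
U, V = fit_pmf(ratings, n_users=3, n_items=3)
print(round(predict(U, V, 0, 0), 2))
```

Because g is applied before the error is measured, every prediction stays inside the valid rating range. This is the point of the normalization note above.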
Breadth of application scenarios: it can be widely applied to many other scenarios such as ... The paper describes collaborative filtering, content-based methods, and their hybrids. Drawback of content-based methods: existing techniques suffer from serious sparsity problems, because messages in online social media are usually short and
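sparse. In addition, web users use language creatively and generate rarely used and unknown vocabularies. These combined factors cause a great difficulty ... to make the most of its advantages. Bipartite graph model: one common approach in collaborative filtering is based on a bipartite graph model. A bipartite graph represents the relationships between users and threads. We first introduce two baseline methods. (LFM and PersonalRank could be introduced here.) An LDA-like model cannot be applied directly to discover users' interest since it lacks the clustering property.
《Multidimensional mining of large-scale search logs: a topic-concept cube approach》 Mainly two figures: one of a bipartite graph and one of the graphical representation. A term worth noting: topic-concept model.
《Private traits and attributes are predictable from digital records of human behavior》 A user–Like matrix whose entries are 0 or 1 is decomposed with SVD, and each user's low-dimensional vector is then used for regression.
GRG method (Generalized Reduced Gradient Method): extends Wolfe's reduced gradient method to problems with nonlinear equality constraints.
Reduced gradient method (Reduced Gradient Method, 1963): extends the simplex method of linear programming to problems with nonlinear objective functions. The basic idea is to partition the variables into basic and nonbasic variables and to express the basic variables in terms of the nonbasic ones. The basic variables are then eliminated from the objective function. This yields a reduced objective function of the nonbasic variables alone, whose negative gradient is used to construct a feasible descent direction. The gradient of the reduced objective function with respect to the nonbasic variables is called the reduced gradient of the objective.
Frank-Wolfe method (proposed in 1956): like the reduced gradient method, it solves linearly constrained problems, but the underlying idea differs. In each iteration, Frank-Wolfe linearizes the objective f(x) and solves a linear program to obtain a feasible descent direction. It then performs a one-dimensional line search along this direction within the feasible region.
Using the reduced gradient method to find the minimum of a function.
《推荐系统实践》 (Recommender System Practice), p. 72: cannot provide explanations for recommendations.
《Latent Dirichlet Allocation》 Idea (from the abstract): in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. Principle of pLSI: pLSI models each word in a document as a sample from a mixture model, where the mixture components are multinomial random variables that can be viewed as representations of "topics". xxx's work (pLSI) is a useful step toward probabilistic modeling of text. Why a mixture distribution: A classic representation theorem due to xxx establishes that any collection of exchangeable random variables has a representation as a mixture distribution---in general an infinite mixture.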
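The Frank-Wolfe description above has three steps: linearize f, solve a linear program for a feasible direction, and line-search along it. This is a minimal sketch of those steps under stated assumptions. The problem itself is hypothetical: minimize f(x) = ‖x − c‖² over the probability simplex, the same feasible set as the θ_i rows earlier. The linear program over the simplex is solved by picking the vertex with the smallest gradient component. The line search is exact because f is quadratic.

```python
def frank_wolfe_simplex(c, iters=100, tol=1e-12):
    """Minimize f(x) = ||x - c||^2 over the probability simplex with Frank-Wolfe."""
    n = len(c)
    x = [1.0] + [0.0] * (n - 1)  # start at a vertex, which is feasible
    for _ in range(iters):
        # Linearize f at x: its gradient is 2 (x - c).
        grad = [2.0 * (xi - ci) for xi, ci in zip(x, c)]
        # Linear program min_s grad . s over the simplex: attained at vertex e_k.
        k = min(range(n), key=lambda j: grad[j])
        s = [1.0 if j == k else 0.0 for j in range(n)]
        d = [sj - xj for sj, xj in zip(s, x)]  # feasible descent direction
        gap = -sum(gj * dj for gj, dj in zip(grad, d))  # Frank-Wolfe duality gap, >= 0
        if gap <= tol:
            break
        # Exact one-dimensional line search along d, clipped to [0, 1] to stay feasible.
        num = -sum((xi - ci) * di for xi, ci, di in zip(x, c, d))
        den = sum(di * di for di in d)
        gamma = min(1.0, max(0.0, num / den))
        x = [xi + gamma * di for xi, di in zip(x, d)]
    return x


# Hypothetical target point outside the simplex; its projection is [0.75, 0.25, 0.0].
print(frank_wolfe_simplex([1.0, 0.5, -0.5]))  # [0.75, 0.25, 0.0]
```

Every iterate is a convex combination of simplex vertices, so feasibility never needs a separate projection step. This is what distinguishes Frank-Wolfe from projected-gradient methods.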
Convenience of the Dirichlet: The Dirichlet is a convenient distribution......; these properties will facilitate the development of inference and parameter estimation algorithms. 《Probabilistic latent semantic indexing》 The rationale is that documents which share frequently co-occurring terms will have a similar representation in the latent space, even if they have no terms in common. LSA thus performs some sort of noise reduction (noise reduction, which LFM also provides) and has the potential benefit to detect synonyms (synonym detection) as well as words that refer to the same topic. Mathematical foundation of pLSA: since it is based on the likelihood principle and defines a proper generative model of the data.
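The pLSA note says the model "defines a proper generative model of the data." This is a minimal sketch of that generative process in its usual formulation: pick a document d with P(d), pick a latent class z with P(z|d), then emit a word w with P(w|z). It also computes the implied mixture P(w|d) = Σ_z P(w|z) P(z|d). All distributions below are hypothetical toy values in a movie setting.

```python
import random


def sample_plsi(n, p_d, p_z_given_d, p_w_given_z, seed=0):
    """Draw n observed (document, word) pairs from the pLSI generative model."""
    rng = random.Random(seed)
    docs = list(p_d)
    pairs = []
    for _ in range(n):
        d = rng.choices(docs, weights=[p_d[x] for x in docs])[0]
        topics = list(p_z_given_d[d])
        z = rng.choices(topics, weights=[p_z_given_d[d][k] for k in topics])[0]
        words = list(p_w_given_z[z])
        w = rng.choices(words, weights=[p_w_given_z[z][v] for v in words])[0]
        pairs.append((d, w))  # z stays latent: only (d, w) is observed
    return pairs


def p_word_given_doc(d, p_z_given_d, p_w_given_z):
    """P(w | d) = sum_z P(w | z) P(z | d), i.e. a mixture of topic multinomials."""
    out = {}
    for z, pz in p_z_given_d[d].items():
        for w, pw in p_w_given_z[z].items():
            out[w] = out.get(w, 0.0) + pz * pw
    return out


# Hypothetical distributions: two "documents" (movies) and two latent topics.
p_d = {"doc1": 0.5, "doc2": 0.5}
p_z_given_d = {"doc1": {"action": 0.9, "romance": 0.1},
               "doc2": {"action": 0.2, "romance": 0.8}}
p_w_given_z = {"action": {"fight": 0.6, "chase": 0.4},
               "romance": {"love": 0.7, "kiss": 0.3}}

print(sample_plsi(5, p_d, p_z_given_d, p_w_given_z))
print(p_word_given_doc("doc1", p_z_given_d, p_w_given_z))
```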
Reference: steps for constructing the probabilistic pLSI generative model: In terms of a generative model it can be defined in the following way: xxxxx
《Diversity Maximization Under Matroid Constraints》 We study this problem from an algorithmic perspective as well as experimentally using simulations and a user study.
《Joint latent topic models for text and citations》 (Background) Proliferation of large electronic document collections such as the web, news articles, blogs and scientific literature in the recent past has posed several new, interesting challenges to researchers in the data mining community. In particular, there is an increasing need for automatic techniques to visualize, analyze and mine these document collections. In the recent past, latent topic modeling has become very popular as a completely unsupervised technique for topic discovery in large document collections.
《A topic modeling approach and its integration into the random walk framework for academic search》 Section 2 introduces LDA. Drawback of LDA: it cannot directly model both users and movies; it can only model movies.
《Bipartite Networks of Wikipediaʼs Articles and Authors》 Investigations on ...... are of interest to both network and quantitative analysis studies, as well as to the social sciences. (The preceding sentence is detached from the concrete setting.) Connecting this to the network of editors and articles ...
《A Neural Probabilistic Language Model》 The method must be adapted: Here we must deal with data of variable length, like sentences, so the above approach must be adapted.
《Multi-label learning by exploiting label dependency》 From the Bayesian point of view, this problem can be reduced to model the conditional joint distribution ... we can first eliminate the influences of x in all labels, and then discover the conditional independencies among yk (conditioned on x) by analyzing the errors.
http://stanford.edu/~rezab/dao/notes/lec14.pd Notice that this objective is non-convex (because of the $x_u^{\top} y_i$ term); in fact it's NP-hard to optimize.
Gradient descent can be used as an approximate approach here, however it turns out to be slow and costs lots of iterations. Note however, that if we fix the set of variables X and treat them as constants, then the objective is a convex function of Y and vice versa. Our approach will therefore be to fix Y and optimize X, then fix X and optimize Y, and repeat until convergence. ftp://ftp.cc.gatech.edu/pub/tech_reports/cse/2007/GT-CSE-07-01.pdf
《使用LFM(Latent factor model)隐语义模型开展Top-N推荐》 (Top-N recommendation with the LFM latent factor model) Formulas for parameter training, and the parameters of LFM (《推荐系统实战》, p. 69).
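The alternating scheme quoted above (fix Y and optimize X, then fix X and optimize Y) turns each half-step into a convex ridge regression with a closed-form solution. This is a minimal sketch of alternating least squares under these assumptions:

- The objective is Σ over observed (u, i) of (r_ui − x_u·y_i)² + λ(Σ‖x_u‖² + Σ‖y_i‖²).
- Each k×k linear system is solved by plain Gaussian elimination, to stay stdlib-only.
- λ, k, the iteration count, and the toy ratings are hypothetical.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small k x k systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ridge_update(neighbors, fixed, k, lam):
    """Closed-form argmin_v of sum_(j, r) (r - v . fixed[j])^2 + lam * ||v||^2."""
    A = [[lam if a == b else 0.0 for b in range(k)] for a in range(k)]
    rhs = [0.0] * k
    for j, r in neighbors:
        f = fixed[j]
        for a in range(k):
            rhs[a] += r * f[a]
            for b in range(k):
                A[a][b] += f[a] * f[b]
    return solve(A, rhs)


def loss(ratings, X, Y, lam):
    err = sum((r - sum(a * b for a, b in zip(X[u], Y[i]))) ** 2 for u, i, r in ratings)
    reg = sum(v * v for row in X + Y for v in row)
    return err + lam * reg


def als(ratings, n_users, n_items, k=2, lam=0.1, iters=10, seed=0):
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    Y = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    by_user = {u: [] for u in range(n_users)}
    by_item = {i: [] for i in range(n_items)}
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    history = [loss(ratings, X, Y, lam)]
    for _ in range(iters):
        X = [ridge_update(by_user[u], Y, k, lam) for u in range(n_users)]  # fix Y, solve X
        Y = [ridge_update(by_item[i], X, k, lam) for i in range(n_items)]  # fix X, solve Y
        history.append(loss(ratings, X, Y, lam))
    return X, Y, history


# Hypothetical (user, item, rating) triples.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
X, Y, history = als(ratings, n_users=3, n_items=3)
print([round(h, 3) for h in history])
```

Each half-step minimizes the objective exactly in one block of variables. The recorded loss therefore never increases, even though the joint problem is non-convex.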
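Parenthetical citations: check the spacing they occupy.
Logic: every sentence must have a reason to exist, and that reason must be stated explicitly rather than left implicit.
In Methodology, describe the model in the movie-user scenario so that item topics and user interests are easy to understand.
In Methodology, mention that the latent variables automatically capture certain information, echoing the corresponding part of the Introduction.
Point out that the experiments use the extensional model of LITM.
Add latent-topic literature to Related Work and tidy up its organization.
Supplement the principles behind Equations 2, 9, and 10 (e.g., the maximum-likelihood principle: "Maximizing the log-posterior with respect to U and V is equivalent to minimizing the sum-of-squared-errors objective function") and their derivations (e.g., how Equation 10 is obtained).
Tense.
The number of keywords, and the maximum number of words per keyword.
The key contributions are the model and the optimization method; both are indispensable and must be mentioned in the Abstract and the Introduction.
For the components of a vector, decide whether to call them "elements" or "entries".
When "topic" and "interest" first appear in the Introduction, illustrate them with a movie-watching example.
The Introduction mentions LFM's low interpretability (a citation must be added) and, when giving the reason, says "hard to explain to users how a specific recommendation has been derived". Since the proposed LITM improves on LFM, state how LITM derives its recommendations.
When citing a monograph, give the page range.
For multi-line pseudocode, should the reference be "Lines No. 1 to No. 2" with a plural verb, and for a single line "Line No." with a singular verb?
Setting of parameter K.
The reference entry for 《Algorithms for non-negative matrix factorization》 has a superfluous "(NIPS)".
《Analyzing Entities and Topics in News Articles》 《Joint latent topic models for text and citations》
A tale about LDA2vec: when LDA meets word2vec http://www.datasciencecentral.com/profiles/blogs/a-tale-about-lda2vec-when-lda-meets-word2vec Although the topics look dirty enough, it is possible to label some of them with real topic names.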
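The note on Equations 2, 9, and 10 quotes the argument that maximizing the log-posterior over U and V is equivalent to minimizing a sum-of-squared-errors objective. This worked derivation follows the Gaussian assumptions of the PMF paper. Its symbols (N users, M movies, indicator I_ij for observed ratings, noise variance σ², prior variances σ_U² and σ_V²) are PMF's. They may not match the notation of this paper's own Equations 2, 9, and 10, so treat it as a template rather than those equations.

```latex
\begin{aligned}
p(R \mid U, V, \sigma^2) &= \prod_{i=1}^{N}\prod_{j=1}^{M}
  \Big[\mathcal{N}\big(R_{ij} \mid U_i^{\top}V_j,\ \sigma^2\big)\Big]^{I_{ij}}, \\
p(U \mid \sigma_U^2) &= \prod_{i=1}^{N}\mathcal{N}\big(U_i \mid 0,\ \sigma_U^2 I\big), \qquad
p(V \mid \sigma_V^2) = \prod_{j=1}^{M}\mathcal{N}\big(V_j \mid 0,\ \sigma_V^2 I\big), \\
\ln p(U, V \mid R, \sigma^2, \sigma_U^2, \sigma_V^2)
 &= -\frac{1}{2\sigma^2}\sum_{i=1}^{N}\sum_{j=1}^{M} I_{ij}\big(R_{ij} - U_i^{\top}V_j\big)^2
    -\frac{1}{2\sigma_U^2}\sum_{i=1}^{N} U_i^{\top}U_i
    -\frac{1}{2\sigma_V^2}\sum_{j=1}^{M} V_j^{\top}V_j + C, \\
&\text{where } C \text{ does not depend on } U, V. \text{ Multiplying by } -\sigma^2 \text{ and dropping } C: \\
E &= \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{M} I_{ij}\big(R_{ij} - U_i^{\top}V_j\big)^2
   + \frac{\lambda_U}{2}\sum_{i=1}^{N}\lVert U_i\rVert^2
   + \frac{\lambda_V}{2}\sum_{j=1}^{M}\lVert V_j\rVert^2, \\
\lambda_U &= \frac{\sigma^2}{\sigma_U^2}, \qquad \lambda_V = \frac{\sigma^2}{\sigma_V^2}.
\end{aligned}
```

Because −σ² is a negative constant, the (U, V) that maximizes the log-posterior is exactly the one that minimizes E. The regularization weights are ratios of the noise variance to the prior variances.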
《Sparse Forward-Backward Using Minimum Divergence Beams for Fast Training Of Conditional Random Fields》 Learning curves for CRF training on synthetic data.
《Modeling relationships at multiple scales to improve accuracy of large recommender systems》 RMSEs of regional, factorization-based methods on Probe data, plotted against varying number of factors.
《Learning Large-Scale Conditional Random Fields》 The y-axis is the objective value.
《Learning sparse CRFs for feature selection and classification of hyperspectral imagery》 The figure's main title is Convergence of the training method.
《Using Maximum Entropy for Text Classification》 Accuracy over iterations of improved iterative scaling on the Industry Sector dataset with the full vocabulary, where it does best on this dataset. For