SlideShare a Scribd company logo
Confucius & “Its” Intelligent Disciples
                  Search + Social



                    Edward Chang
            Director, Google Research, Beijing



2010-9-17         Ed Chang@NCCU Invited Talk     1
Google中国:我们在哪里?




 2010-9-17   Ed Chang@NCCU Invited Talk   2
Web 1.0


                                  .htm


                           .htm               .jpg

            .jpg                                            .msg

                   .htm   .htm
                                                     .htm
                                   .doc




                                            .htm




2010-9-17                 Ed Chang@NCCU Invited Talk               3
Web 2.0 --- Web with People
                          .msg

                   .xls

                                        .htm                     .htm


                                                                        .jpg


            .jpg

                                                          .htm           .msg




                                                   .htm

                                 .doc


2010-9-17                        Ed Chang@NCCU Invited Talk                     4
2010-9-17   Ed Chang@NCCU Invited Talk   5
2010-9-17   Ed Chang@NCCU Invited Talk   6
Outline
     •   Search + Social Synergy
     •   Search  Social
     •   Social  Search
     •   Scalability and Elasticity




2010-9-17             Ed Chang@NCCU Invited Talk   7
Related Papers
• AdHeat (Social Ads):
    – AdHeat: An Influence-based Diffusion Model for Propagating Hints to Match Ads,
      H.J. Bao and E. Y. Chang, WWW 2010 (best paper candidate), April 2010.
    – Parallel Spectral Clustering in Distributed Systems,
      Wen-Yen Chen, Yangqiu Song, Hongjie Bai, Chih-Jen Lin, and E. Y. Chang,
      IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2010.
• UserRank:
    – Confucius and its Intelligent Disciples, X. Si, E. Y. Chang, Z. Gyongyi, VLDB,
      September 2010 .
    – Topic-dependent User Rank, Xiance Si, Z. Gyongyi, E. Y. Chang, and M.S. Sun,
      Google Technical Report.
• Large-scale Collaborative Filtering:
    – PLDA*: Parallel Latent Dirichlet Allocation for Large-Scale Applications, ACM
      Transactions on Internet Technology, 2010.
    – Collaborative Filtering for Orkut Communities: Discovery of User Latent Behavior,
      W.-Y. Chen, J. Chu, E. Y. Chang, WWW 2009: 681-690.
    – Combinational Collaborative Filtering for Personalized Community Recommendation,
      W.-Y. Chen, E. Y. Chang, KDD 2008: 115-123.
    – Parallel SVMs, E. Y. Chang, et al., NIPS 2007.
 2010-9-17                    Ed Chang@NCCU Invited Talk                               8
Query: What are must-see attractions at Yellowstone




 2010-9-17         Ed Chang@NCCU Invited Talk         9
Query: What are must-see attractions at Yellowstone




 2010-9-17         Ed Chang@NCCU Invited Talk         10
Query: What are must-see attractions at Yosemite




 2010-9-17         Ed Chang@NCCU Invited Talk      11
Query: What are must-see attractions at Beijing


                Hotel ads




   2010-9-17         Ed Chang@NCCU Invited Talk   12
2010-9-17   Ed Chang@NCCU Invited Talk   13
2010-9-17   Ed Chang@NCCU Invited Talk   14
2010-9-17   Ed Chang@NCCU Invited Talk   15
2010-9-17   Ed Chang@NCCU Invited Talk   16
Confucius: Google Q&A
   Providing High-Quality Answers in a Timely Fashion
                               Differentiated Content


                                       DC
        U
                 S                                         C
              Search                                                U
                                                        Community

                                    CCS/Q&A

                                  Discussion/Question


       Trigger a discussion/question session during search
       Provide labels to a post (semi-automatically)
       Given a post, find similar posts (automatically)
       Evaluate quality of a post, relevance and originality
       Evaluate user credentials in a topic sensitive way
       Route questions to experts
       Provide most relevant, high-quality content for Search to index
       Generate answers using NLP

2010-9-17                 Ed Chang@NCCU Invited Talk                      17
Confucius: Google Q&A
   Providing High-Quality Answers in a Timely Fashion
                               Differentiated Content


                                       DC
        U
                 S                                         C
              Search                                                U
                                                        Community

                                    CCS/Q&A

                                  Discussion/Question


       Trigger a discussion/question session during search
       Provide labels to a post (semi-automatically)
       Given a post, find similar posts (automatically)
       Evaluate quality of a post, relevance and originality
       Evaluate user credentials in a topic sensitive way
       Route questions to experts
       Provide most relevant, high-quality content for Search to index
       Generate answers using NLP

2010-9-17                 Ed Chang@NCCU Invited Talk                      18
Q&A Uses Machine Learning



                             Label suggestion
                             using LDA algorithm.




                             • Real Time topic-to-
                               topic (T2T)
                               recommendation
                               using LDA algorithm.

                             • Gives out related high
                               quality links to
                               previous questions
                               before human answer
                               appear.



2010-9-17         Ed Chang@NCCU Invited Talk            19
Collaborative Filtering
                                                             Labels/Qs
                                                     1 1 1
                                                 1       1 1        1        1       1
Based on membership so far,                                     1        1           1
 and memberships of others                       1       1      1 1




                                 Questions
                                                     1
                                                                    1 1
Predict further membership                               1                   1
                                             1 1
                                                 1                               1
                                             1                                       1
                                                 1 1 1 1 1

 2010-9-17               Ed Chang@NCCU Invited Talk                                  20
FIM-based Recommendation




2010-9-17          Ed Chang@NCCU Invited Talk   21
FIM Preliminaries
• Observation 1: If an item A is not frequent, any pattern contains
   A won’t be frequent [R. Agrawal]
   use a threshold to eliminate infrequent items
  {A}  {A,B}
• Observation 2: Patterns containing A are subsets of (or found
   from) transactions containing A [J. Han]
   divide-and-conquer: select transactions containing A to form a
   conditional database (CDB), and find patterns containing A from
   that conditional database
  {A, B}, {A, C}, {A}  CDB A
  {A, B}, {B, C}  CDB B




 2010-9-17             Ed Chang@NCCU Invited Talk              22
Preprocessing
                                          • According to
                                            Observation 1, we
                                            count the support of
                                            each item by
                                            scanning the
                                            database, and
                                            eliminate those
                                            infrequent items
                                            from the
                                            transactions.
                                          • According to
                                            Observation 3, we
                                            sort items in each
                                            transaction by the
                                            order of descending
                                            support value.
2010-9-17    Ed Chang@NCCU Invited Talk                     23
Example of Projection




            Example of Projection of a database into CDBs.
            Left: sorted transactions in order of f, c, a, b, m, p
            Right: conditional databases of frequent items


2010-9-17               Ed Chang@NCCU Invited Talk                   25
Example of Projection




            Example of Projection of a database into CDBs.
            Left: sorted transactions;
            Right: conditional databases of frequent items


2010-9-17              Ed Chang@NCCU Invited Talk            26
Example of Projection




            Example of Projection of a database into CDBs.
            Left: sorted transactions;
            Right: conditional databases of frequent items


2010-9-17              Ed Chang@NCCU Invited Talk            27
Recursive Projections [H. Li, et al. ACM RS 08]
                                       • Recursive projection form
                                         a search tree
                                       • Each node is a CDB
                                       • Using the order of items to
                                         prevent duplicated CDBs.
                                       • Each level of breath-first
                                         search of the tree can be
                                         done by a MapReduce
                                         iteration.
                                       • Once a CDB is small
                                         enough to fit in memory,
                                         we can invoke FP-growth
                                         to mine this CDB, and no
                                         more growth of the sub-
                                         tree.
2010-9-17            Ed Chang@NCCU Invited Talk                  28
Projection using MapReduce


                            p:{fcam/fcam/cb} p:3, pc:3




2010-9-17          Ed Chang@NCCU Invited Talk       29
Collaborative Filtering
                                                     Labels/Related Qs
                                                     1 1 1
                                                 1       1 1       1       1       1
Based on membership so far,                                    1       1           1




                                 Questions
 and memberships of others                       1       1     1 1
                                                     1
                                                                   1 1
Predict further membership                               1                 1
                                             1 1
                                                 1                             1
                                             1                                     1
                                                 1 1 1 1 1

 2010-9-17               Ed Chang@NCCU Invited Talk                                30
Distributed Latent Dirichlet Allocation (LDA)
•     Search                                                                                      Users/Music/Ads/Question

       – Construct a latent layer for better
          for semantic matching




                                                                  Users/Music/Ads/Answers
•     Example:
       – iPhone crack
       – Apple pie

               1 recipe pastry for a 9
Documents      inch double crust         How to install apps on
               9 apples, 2/1 cup,        Apple mobile phones?
               brown sugar


Topic
Distribution



                                                                                            •   Other Collaborative Filtering Apps
                                                                                                 –   Recommend Users  Users
Topic
Distribution
                                                                                                 –   Recommend Music  Users
                                                                                                 –   Recommend Ads  Users
                                                                                                 –   Recommend Answers  Q
                 iPhone crack             Apple pie
User quries                                                  • Predict the ? In the light-blue cells
2010-9-17                                Ed Chang@NCCU Invited Talk                                31
Latent Dirichlet Allocation [D. Blei, M. Jordan 04]
• α: uniform Dirichlet φ prior
                                                α
  for per document d topic
  distribution (corpus level
  parameter)
                                                θ
• β: uniform Dirichlet φ prior
  for per topic z word
  distribution (corpus level
  parameter)                                    z
• θd is the topic distribution of
  document d (document level)
• zdj the topic if the jth word in              w            φ            β
  d, wdj the specific word (word                    Nm           K
  level)
                                                         M
2010-9-17                Ed Chang@NCCU Invited Talk                  32
Combinational Collaborative Filtering Model (CCF)
                         【KDD2008】

Communities   Communities                        Communities




   users      descriptions               users                 descriptions




2010-9-17
33                 Ed Chang@NCCU Invited Talk
LDA Gibbs Sampling: Inputs & Outputs
Inputs:
1. training data: documents as
   bags of words
                                                    words    topics
2. parameter: the number of
   topics
                                      docs                        docs
Outputs:
1. model parameters: a co-
   occurrence matrix of topics and
   words.                             topics
2. by-product: a co-occurrence                       words

   matrix of topics and documents.


 2010-9-17             Ed Chang@NCCU Invited Talk                 34
Parallel Gibbs Sampling
Inputs:
1. training data: documents as
   bags of words
                                                    words    topics
2. parameter: the number of
   topics
                                      docs                        docs
Outputs:
1. model parameters: a co-
   occurrence matrix of topics and
   words.                             topics
2. by-product: a co-occurrence                       words

   matrix of topics and documents.


 2010-9-17             Ed Chang@NCCU Invited Talk                 35
PLDA -- enhanced parallel LDA
                  [ACM Transactions on IT]

• PLDA is restricted by memory: Topic-word matrix has
  to fit into memory
• Restricted by Amdahl’s Law: communication costs
  too high




2010-9-17            Ed Chang@NCCU Invited Talk
36
PLDA -- enhanced parallel LDA
• Take advantage of bag of words modeling: each Pw
  machine processes vocabulary in a word order
• Pipelining: fetching the updated topic distribution
  matrix while doing Gibbs sampling




2010-9-17           Ed Chang@NCCU Invited Talk
37
Speedup
            1,500x using 2,000 machines




2010-9-17          Ed Chang@NCCU Invited Talk   38
2010-9-17   Ed Chang@NCCU Invited Talk   39
Confucius: Google Q&A
                           Differentiated Content


                                        DC
        U
             S                                               C
              Search                                                     U
                                                             Community

                                        CCS/Q&A

                                       Discussion/Question


       Trigger a discussion/question session during search
       Provide labels to a post (semi-automatically)
       Given a post, find similar posts (automatically)
       Evaluate quality of a post, relevance and originality
       Evaluate user credentials in a topic sensitive way
       Route questions to experts
       Provide most relevant, high-quality content for Search to index
       NLQA

2010-9-17                  Ed Chang@NCCU Invited Talk                        40
UserRank
              • Rank users by quantity (number of links)
                and quality (weights on links) of
                contributions
              • Quality include:
                 – Relevance. Is an answer relevant to the
                   Q? Measured by KL divergence
                   between latent-topic vectors of A and Q
                 – Coverage. Compared among different
                   answers
                 – Originality. Detect potential plagiarism
                   and spam
                 – Promptness. Time between Q and A
                   posting time



2010-9-17   Ed Chang@NCCU Invited Talk                    41
Outline
     •   Search + Social Synergy
     •   Search  Social
     •   Social  Search
     •   Scalability and Elasticity




2010-9-17             Ed Chang@NCCU Invited Talk   42
Outline
     •   Search + Social Synergy
     •   Search  Social
     •   Social  Search
     •   Scalability and Elasticity




2010-9-17             Ed Chang@NCCU Invited Talk   43
Social + Data Networks




2010-9-17       Ed Chang@NCCU Invited Talk   44
可擴展性和彈性
            Scalability & Elasticity

  •   計算的規模與極端
  •   雲計算平台設計挑戰
  •   谷歌云計算基礎技術
  •   谷歌云計算新技術




2010-9-17      Ed Chang@NCCU Invited Talk   45
計算的規模

            千
                                               3.8*10^5km
            百万
            十億
            万億
            千万億
            百兆
            十万兆
            億兆
2010-9-17   京,亥   Ed Chang@NCCU Invited Talk         46
計算的規模

            千
                                               3.8*10^5km
            百万
            十億
                                                 1.2*10^10km
            万億
            千万億
            百兆
            十万兆
            億兆
2010-9-17   京,亥   Ed Chang@NCCU Invited Talk         47
計算的規模

            千
                                               3.8*10^5km
            百万
            十億
                                                 1.2*10^10km
            万億
            千万億
            百兆                                 1.5*10^16km

            十万兆
            億兆
2010-9-17   京,亥   Ed Chang@NCCU Invited Talk         48
高性能計算的極端
   应用       用户数 精确度                     可靠度       数据量

   科學計算     少           極高              低 -- 中等   Tera


   股市交易     大量          高               極高        Gega


   基因排序     少           高               高         Tera --
                                                  Peta

   搜索引擎     大量          中等 -- 高         中等        Peta




2010-9-17        Ed Chang@NCCU Invited Talk                 49
可擴展性和彈性
            Scalability & Elasticity

  •   計算的規模與極端
  •   雲計算平台設計挑戰
  •   雲計算基礎技術
  •   雲計算新技術




2010-9-17      Ed Chang@NCCU Invited Talk   50
Google文件系统 (GFS)
                                   GFS Master




                       Replicas
         Master                                         Client
                                   GFS Master
                                                        Client


C0     C1                         C1      C0    C3
C3     C4         C3              C5            C4

• Master manages metadata
• Data transfers happen directly between
  clients/chunkservers
• Files broken into chunks (typically 64 MB)
• Chunks triplicated across three machines for safety
• See SOSP^03 paper at Ed Chang@NCCU Invited Talk
   2010-9-17                                                     51
  http://labs.google.com/papers/gfs.html
大表格(Big Table)
• 内存管理
• <Row, Column, Timestamp> triple for key -
  lookup, insert, and delete API




• Example
     – Row: a Web Page
     – Columns: various aspects of the page
     – Time stamps: time when a column was fetched
2010-9-17            Ed Chang@NCCU Invited Talk      52
MapReduce

                                   Data Block 1
                                             map
Data              Data Block 2                                          Results
datadatada                                                Reduce        datadata
tadatadata                                                              datadata
datadatada                   map                                        datadata
tadatadata

             Data Block 3                                Data Block 5
                                                         map
                             map

                                             map
                                      Data Block 4


2010-9-17                   Ed Chang@NCCU Invited Talk                   53
計算機平台(Platform)




2010-9-17      Ed Chang@NCCU Invited Talk   54
樣品平台
• 計算機
     – 16GB DRAM; 160 GB SSD; 5 x 1TB disks
• 計算機機架 (Rack)
     – 40 计算机
     – 48 port Gigabit Ethernet switch
• 計算機中心 (Center)
     – 10,000計算機(250 racks)
     – 2K port Gigabit Ethernet switch

2010-9-17           Ed Chang@NCCU Invited Talk   55
儲存 ---計算機單機




2010-9-17     Ed Chang@NCCU Invited Talk   56
儲存 ---計算機機架




2010-9-17     Ed Chang@NCCU Invited Talk   57
儲存 ---計算機中心




2010-9-17     Ed Chang@NCCU Invited Talk   58
設計挑戰
  • 新程模型, 技術
       – File system
       – Flash (SSD); GPUs?
  • 能源節約
       – Hardware, software, warehouse
       – Encode/compress/transmit data well
  • 故障恢復
       – Hardware/software faults
       – Deal with stragglers

2010-9-17           Ed Chang@NCCU Invited Talk   59
新文件系統技術: Colossus
• Latency Reduction
     – Metadata distribution
     – Encoding and Replication
     – Data migration
• Throughput Improvement
     – Adaptive cluster-level file system
• Fault Tolerant
     – Reed-Solomon
     – Avoid correlated faults

2010-9-17             Ed Chang@NCCU Invited Talk   60
Colossus [Sigmod 09]




2010-9-17       Ed Chang@NCCU Invited Talk   61
新parellel算法
                                      MapReduce             Pregel   MPI
GFS/IO and task rescheduling          Yes                   No       No
overhead between iterations                                 +1       +1

Flexibility of computation model      AllReduce only        ?        Flexible
                                      +0.5                  +1       +1
Efficient AllReduce                   Yes                   Yes      Yes
                                      +1                    +1       +1
Recover from faults between           Yes                   Yes      Apps
iterations                            +1                    +1
Recover from faults within each       Yes                   Yes      Apps
iteration                             +1                    +1
Final Score for scalable machine      3.5                   5        4
learning

2010-9-17                      Ed Chang@NCCU Invited Talk                   62
設計挑戰
  • 新程式設計模型
       – Parallel
       – Flash (SSD); GPUs?
  • 能源節約
       – Hardware, software, warehouse
       – Encode/compress/transmit data well
  • 故障恢復
       – Hardware/software faults
       – Deal with stragglers

2010-9-17           Ed Chang@NCCU Invited Talk   63
能源消耗全球增長趨勢




2010-9-17     Ed Chang@NCCU Invited Talk   64
能源消耗地區增長趨勢




2010-9-17     Ed Chang@NCCU Invited Talk   65
電源使用效率




2010-9-17   Ed Chang@NCCU Invited Talk   66
電源使用分布




2010-9-17   Ed Chang@NCCU Invited Talk   67
節能措施




2010-9-17   Ed Chang@NCCU Invited Talk   68
能源消耗 --- 油當量




2010-9-17     Ed Chang@NCCU Invited Talk   69
設計挑戰
  • 新程式設計模型
       – Parallel
       – Flash (SSD); GPUs?
  • 能源節約
       – Hardware, software, warehouse
       – Encode/compress/transmit data well
  • 故障恢復
       – Hardware/software faults
       – Deal with stragglers

2010-9-17           Ed Chang@NCCU Invited Talk   70
故障率
• 99.9% 正常運行時間, 一年故障?小时
   – 9 小时故障/年
• 99.9%正常運行/天, 10,000 計算機, 不down機機率?
   – 4.5173346 × 10-5
• 10,000 計算機 (中心级)
   –   0.25 次斷電
   –   3 次路由器故障
   –   1,000 計算機故障
   –   1,000s 硬盤故障
   –   etc., etc., etc.

2010-9-17            Ed Chang@NCCU Invited Talk   71
故障後快速修復
     • 複製
     • 如果可能的話
            – 鬆散的一致性 (loosely consist)
            – 近似的答案
            – 不完整的答案




2010-9-17           Ed Chang@NCCU Invited Talk   72
數據規模算法精度




2010-9-17     Ed Chang@NCCU Invited Talk   73
數據規模算法精度




2010-9-17     Ed Chang@NCCU Invited Talk   74
數據規模算法精度




2010-9-17     Ed Chang@NCCU Invited Talk   75
數據規模算法精度




2010-9-17     Ed Chang@NCCU Invited Talk   76
赢在规模的例证
    • 谷歌翻譯
    • 谷歌語音識別
    • 趨勢預測




2010-9-17    Ed Chang@NCCU Invited Talk   77
流感趨勢圖




2010-9-17   Ed Chang@NCCU Invited Talk   78
Outline
     •   Search + Social Synergy
     •   Search  Social
     •   Social  Search
     •   Scalability and Elasticity




2010-9-17             Ed Chang@NCCU Invited Talk   79
Confucius and Disciples




2010-9-17          Ed Chang@NCCU Invited Talk   80
Concluding Remarks
• Search + Social
• Increasing quantity and complexity of data demands scalable
  solutions
• Have parallelized key subroutines for mining massive data sets
   –   Spectral Clustering [ECML 08]
   –   Frequent Itemset Mining [ACM RS 08]
   –   PLSA [KDD 08]
   –   LDA [WWW 09, AAIM 09]
   –   UserRank [Google TR 09]
   –   Support Vector Machines [NIPS 07]
• Relevant papers
   – http://infolab.stanford.edu/~echang/
• Open Source PSVM, PLDA
   – http://code.google.com/p/psvm/
   – http://code.google.com/p/plda/
 2010-9-17                  Ed Chang@NCCU Invited Talk         81
Collaborators
   •   Prof. Chih-Jen Lin (NTU)
   •   Hongjie Bai (Google)
   •   Hongji Bao (Google)
   •   Wen-Yen Chen (UCSB)
   •   Jon Chu (MIT)
   •   Haoyuan Li (PKU)
   •   Zhiyuan Liu (Tsinghua)
   •   Xiance Si (Tsinghua)
   •   Yangqiu Song (Tsinghua)
   •   Matt Stanton (CMU)
   •   Yi Wang (Google)
   •   Dong Zhang (Google)
   •   Kaihua Zhu (Google)

2010-9-17               Ed Chang@NCCU Invited Talk   82
References
[1] Alexa internet. http://www.alexa.com/.
[2] D. M. Blei and M. I. Jordan. Variational methods for the dirichlet process. In Proc. of the 21st international
      conference on Machine learning, pages 373-380, 2004.
[3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. Journal of Machine Learning Research, 3:993-
      1022, 2003.
[4] D. Cohn and H. Chang. Learning to probabilistically identify authoritative documents. In Proc. of the Seventeenth
      International Conference on Machine Learning, pages 167-174, 2000.
[5] D. Cohn and T. Hofmann. The missing link - a probabilistic model of document content and hypertext connectivity. In
      Advances in Neural Information Processing Systems 13, pages 430-436, 2001.
[6] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic
      analysis. Journal of the American Society of Information Science, 41(6):391-407, 1990.
[7] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the em algorithm.
      Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1-38, 1977.
[8] S. Geman and D. Geman. Stochastic relaxation, gibbs distributions, and the bayesian restoration of images. IEEE
      Transactions on Pattern recognition and Machine Intelligence, 6:721-741, 1984.
[9] T. Hofmann. Probabilistic latent semantic indexing. In Proc. of Uncertainty in Articial Intelligence, pages 289-296,
      1999.
[10] T. Hofmann. Latent semantic models for collaborative filtering. ACM Transactions on Information System, 22(1):89-
      115, 2004.
[11] A. McCallum, A. Corrada-Emmanuel, and X. Wang. The author-recipient-topic model for topic and role discovery in
      social networks: Experiments with enron and academic email. Technical report, Computer Science, University of
      Massachusetts Amherst, 2004.
[12] D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed inference for latent dirichlet allocation. In
      Advances in Neural Information Processing Systems 20, 2007.
[13] M. Ramoni, P. Sebastiani, and P. Cohen. Bayesian clustering by dynamics. Machine Learning, 47(1):91-121, 2002.



2010-9-17                                 Ed Chang@NCCU Invited Talk                                                 83
References (cont.)
[14] R. Salakhutdinov, A. Mnih, and G. Hinton. Restricted boltzmann machines for collaborative ltering. In Proc. Of the
      24th international conference on Machine learning, pages 791-798, 2007.
[15] E. Spertus, M. Sahami, and O. Buyukkokten. Evaluating similarity measures: a large-scale study in the orkut social
      network. In Proc. of the 11th ACM SIGKDD international conference on Knowledge discovery in data mining,
      pages 678-684, 2005.
[16] M. Steyvers, P. Smyth, M. Rosen-Zvi, and T. Griths. Probabilistic author-topic models for information discovery. In
      Proc. of the 10th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 306-
      315, 2004.
[17] A. Strehl and J. Ghosh. Cluster ensembles - a knowledge reuse framework for combining multiple partitions.
      Journal on Machine Learning Research (JMLR), 3:583-617, 2002.
[18] T. Zhang and V. S. Iyengar. Recommender systems using linear classiers. Journal of Machine Learning Research,
      2:313-334, 2002.
[19] S. Zhong and J. Ghosh. Generative model-based clustering of documents: a comparative study. Knowledge and
      Information Systems (KAIS), 8:374-384, 2005.
[20] L. Admic and E. Adar. How to search a social network. 2004
[21] T.L. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, pages
      5228-5235, 2004.
[22] H. Kautz, B. Selman, and M. Shah. Referral Web: Combining social networks and collaborative filtering.
      Communitcations of the ACM, 3:63-65, 1997.
[23] R. Agrawal, T. Imielnski, A. Swami. Mining association rules between sets of items in large databses. SIGMOD
      Rec., 22:207-116, 1993.
[24] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In
      Proceedings of the Fourteenth Conference on Uncertainty in Artifical Intelligence, 1998.
[25] M.Deshpande and G. Karypis. Item-based top-n recommendation algorithms. ACM Trans. Inf. Syst., 22(1):143-
      177, 2004.



2010-9-17                                 Ed Chang@NCCU Invited Talk                                                   84
References (cont.)
[26] B.M. Sarwar, G. Karypis, J.A. Konstan, and J. Reidl. Item-based collaborative filtering recommendation algorithms.
      In Proceedings of the 10th International World Wide Web Conference, pages 285-295, 2001.
[27] M.Deshpande and G. Karypis. Item-based top-n recommendation algorithms. ACM Trans. Inf. Syst., 22(1):143-
      177, 2004.
[28] B.M. Sarwar, G. Karypis, J.A. Konstan, and J. Reidl. Item-based collaborative filtering recommendation algorithms.
      In Proceedings of the 10th International World Wide Web Conference, pages 285-295, 2001.
[29] M. Brand. Fast online svd revisions for lightweight recommender systems. In Proceedings of the 3rd SIAM
      International Conference on Data Mining, 2003.
[30] D. Goldbberg, D. Nichols, B. Oki and D. Terry. Using collaborative filtering to weave an information tapestry.
      Communication of ACM 35, 12:61-70, 1992.
[31] P. Resnik, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. Grouplens: An open architecture for aollaborative
      filtering of netnews. In Proceedings of the ACM, Conference on Computer Supported Cooperative Work. Pages
      175-186, 1994.
[32] J. Konstan, et al. Grouplens: Applying collaborative filtering to usenet news. Communication ofACM 40, 3:77-87,
      1997.
[33] U. Shardanand and P. Maes. Social information filtering: Algorithms for automating “word of mouth”. In
      Proceedings of ACM CHI, 1:210-217, 1995.
[34] G. Kinden, B. Smith and J. York. Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet
      Computing, 7:76-80, 2003.
[35] T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning Journal 42, 1:177-
      196, 2001.
[36] T. Hofmann and J. Puzicha. Latent class models for collaborative filtering. In Proceedings of International Joint
      Conference in Artificial Intelligence, 1999.
[37] http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/collaborativefiltering.html
[38] E. Y. Chang, et. al., Parallelizing Support Vector Machines on Distributed Machines, NIPS, 2007.
[39] Wen-Yen Chen, Dong Zhang, and E. Y. Chang, Combinational Collaborative Filtering for personalized community
      recommendation, ACM KDD 2008.
[40] Y. Sun, W.-Y. Chen, H. Bai, C.-j. Lin, and E. Y. Chang, Parallel Spectral Clustering, ECML 2008.
[41] Wen-Yen Chen, Jon Chu, Junyi Luan, Hongjie Bai, Yi Wang, and Edward Y. Chang , Collaborative Filtering for
      Orkut Communities: Discovery of User Latent Behavior, International World Wide Web Conference (WWW)
      Madrid, Spain, April 2009 .
[42] Yi Wang, Hongjie Bai, Matt Stanton, Wen-Yen Chen, and Edward Y. Chang, PLDA: Parallel Latent Dirichlet
      Allocation, International Conference on Algorithmic Aspects in Information and Management, June 2009.

2010-9-17                               Ed Chang@NCCU Invited Talk                                                 85

More Related Content

Viewers also liked

2010 09-24-闕志克老師-cloud computing where do we go
2010 09-24-闕志克老師-cloud computing where do we go2010 09-24-闕志克老師-cloud computing where do we go
2010 09-24-闕志克老師-cloud computing where do we go
nccuscience
 
CUEB Presentation
CUEB PresentationCUEB Presentation
CUEB Presentation
brianlynch
 
Science talk 100315-蔡介立
Science talk 100315-蔡介立Science talk 100315-蔡介立
Science talk 100315-蔡介立
nccuscience
 
Science Talk-091221-劉昭麟
Science Talk-091221-劉昭麟Science Talk-091221-劉昭麟
Science Talk-091221-劉昭麟
nccuscience
 
Layers of Custom Window Fashions
Layers of Custom Window FashionsLayers of Custom Window Fashions
Layers of Custom Window Fashions
Tami Smight Interiors
 
Km Pwrpnt
Km PwrpntKm Pwrpnt
Km Pwrpnt
guest26fc51a
 
2010 07-26-interdisciplinary research and learning
2010 07-26-interdisciplinary research and learning2010 07-26-interdisciplinary research and learning
2010 07-26-interdisciplinary research and learning
nccuscience
 
Science Talk-091012-楊健銘
Science Talk-091012-楊健銘Science Talk-091012-楊健銘
Science Talk-091012-楊健銘
nccuscience
 

Viewers also liked (8)

2010 09-24-闕志克老師-cloud computing where do we go
2010 09-24-闕志克老師-cloud computing where do we go2010 09-24-闕志克老師-cloud computing where do we go
2010 09-24-闕志克老師-cloud computing where do we go
 
CUEB Presentation
CUEB PresentationCUEB Presentation
CUEB Presentation
 
Science talk 100315-蔡介立
Science talk 100315-蔡介立Science talk 100315-蔡介立
Science talk 100315-蔡介立
 
Science Talk-091221-劉昭麟
Science Talk-091221-劉昭麟Science Talk-091221-劉昭麟
Science Talk-091221-劉昭麟
 
Layers of Custom Window Fashions
Layers of Custom Window FashionsLayers of Custom Window Fashions
Layers of Custom Window Fashions
 
Km Pwrpnt
Km PwrpntKm Pwrpnt
Km Pwrpnt
 
2010 07-26-interdisciplinary research and learning
2010 07-26-interdisciplinary research and learning2010 07-26-interdisciplinary research and learning
2010 07-26-interdisciplinary research and learning
 
Science Talk-091012-楊健銘
Science Talk-091012-楊健銘Science Talk-091012-楊健銘
Science Talk-091012-楊健銘
 

Similar to 2010 09-17-張志威老師-confucius & “its” intelligent disciples

CHI2015 - Citizen Science || Zooniverse
CHI2015 - Citizen Science || ZooniverseCHI2015 - Citizen Science || Zooniverse
CHI2015 - Citizen Science || Zooniverse
Ramine Tinati
 
EdChang - Parallel Algorithms For Mining Large Scale Data
EdChang - Parallel Algorithms For Mining Large Scale DataEdChang - Parallel Algorithms For Mining Large Scale Data
EdChang - Parallel Algorithms For Mining Large Scale Data
gu wendong
 
Towards a Social Learning Analytics for Online Communities of Practice for Ed...
Towards a Social Learning Analytics for Online Communities of Practice for Ed...Towards a Social Learning Analytics for Online Communities of Practice for Ed...
Towards a Social Learning Analytics for Online Communities of Practice for Ed...
dcambrid
 
Netnography - Theory & How-To's
Netnography - Theory & How-To'sNetnography - Theory & How-To's
Netnography - Theory & How-To's
Tony Yu
 
Joining the docs: Back to the Future with Taxonomy and the New Semantics
Joining the docs: Back to the Future with Taxonomy and the New SemanticsJoining the docs: Back to the Future with Taxonomy and the New Semantics
Joining the docs: Back to the Future with Taxonomy and the New Semantics
Russell Blakeborough
 
Internet mediatedresearch edirisingha_draft
Internet mediatedresearch edirisingha_draftInternet mediatedresearch edirisingha_draft
Internet mediatedresearch edirisingha_draft
Palitha Edirisingha
 
推薦甄試備審資料─NTHU 學習科學研究所 (張小均)
推薦甄試備審資料─NTHU 學習科學研究所 (張小均)推薦甄試備審資料─NTHU 學習科學研究所 (張小均)
推薦甄試備審資料─NTHU 學習科學研究所 (張小均)
小均 張
 
Open Innovation and Semantic Web
Open Innovation and Semantic WebOpen Innovation and Semantic Web
Open Innovation and Semantic Web
Milan Stankovic
 
De liddo & Buckingham Shum jurix2012
De liddo & Buckingham Shum jurix2012De liddo & Buckingham Shum jurix2012
De liddo & Buckingham Shum jurix2012
Anna De Liddo
 
Conversations in Context: A Twitter Case for Social Media Systems Design
Conversations in Context: A Twitter Case for Social Media Systems DesignConversations in Context: A Twitter Case for Social Media Systems Design
Conversations in Context: A Twitter Case for Social Media Systems Design
CommunitySense
 
Global lodlam_communities and open cultural data
Global lodlam_communities and open cultural dataGlobal lodlam_communities and open cultural data
Global lodlam_communities and open cultural data
Minerva Lin
 
Transdisciplinary Research: A short introduction
Transdisciplinary Research: A short introductionTransdisciplinary Research: A short introduction
Transdisciplinary Research: A short introduction
tyndallcentreuea
 
Keynote reusability measurement and social community analysis from mooc con...
Keynote   reusability measurement and social community analysis from mooc con...Keynote   reusability measurement and social community analysis from mooc con...
Keynote reusability measurement and social community analysis from mooc con...
HannibalHsieh
 
Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented ...
Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented ...Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented ...
Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented ...
Ed Chi
 
ISWC2023-McGuinnessTWC16x9FinalShort.pdf
ISWC2023-McGuinnessTWC16x9FinalShort.pdfISWC2023-McGuinnessTWC16x9FinalShort.pdf
ISWC2023-McGuinnessTWC16x9FinalShort.pdf
Deborah McGuinness
 
[Wikisym2013] serp revised_apa_notice
[Wikisym2013] serp revised_apa_notice[Wikisym2013] serp revised_apa_notice
[Wikisym2013] serp revised_apa_notice
Hanteng Liao
 
A Tale of Two Platforms: Emerging communicative patterns in two scientific bl...
A Tale of Two Platforms: Emerging communicative patterns in two scientific bl...A Tale of Two Platforms: Emerging communicative patterns in two scientific bl...
A Tale of Two Platforms: Emerging communicative patterns in two scientific bl...
Cornelius Puschmann
 
Transitive credit
Transitive creditTransitive credit
Transitive credit
Daniel S. Katz
 
Global Redirective Practices
Global Redirective PracticesGlobal Redirective Practices
Global Redirective Practices
adjwilli
 
JISC Institutional Innovation Support and Synthesis
JISC Institutional Innovation Support and SynthesisJISC Institutional Innovation Support and Synthesis
JISC Institutional Innovation Support and Synthesis
George Roberts
 

Similar to 2010 09-17-張志威老師-confucius & “its” intelligent disciples (20)

CHI2015 - Citizen Science || Zooniverse
CHI2015 - Citizen Science || ZooniverseCHI2015 - Citizen Science || Zooniverse
CHI2015 - Citizen Science || Zooniverse
 
EdChang - Parallel Algorithms For Mining Large Scale Data
EdChang - Parallel Algorithms For Mining Large Scale DataEdChang - Parallel Algorithms For Mining Large Scale Data
EdChang - Parallel Algorithms For Mining Large Scale Data
 
Towards a Social Learning Analytics for Online Communities of Practice for Ed...
Towards a Social Learning Analytics for Online Communities of Practice for Ed...Towards a Social Learning Analytics for Online Communities of Practice for Ed...
Towards a Social Learning Analytics for Online Communities of Practice for Ed...
 
Netnography - Theory & How-To's
Netnography - Theory & How-To'sNetnography - Theory & How-To's
Netnography - Theory & How-To's
 
Joining the docs: Back to the Future with Taxonomy and the New Semantics
Joining the docs: Back to the Future with Taxonomy and the New SemanticsJoining the docs: Back to the Future with Taxonomy and the New Semantics
Joining the docs: Back to the Future with Taxonomy and the New Semantics
 
Internet mediatedresearch edirisingha_draft
Internet mediatedresearch edirisingha_draftInternet mediatedresearch edirisingha_draft
Internet mediatedresearch edirisingha_draft
 
推薦甄試備審資料─NTHU 學習科學研究所 (張小均)
推薦甄試備審資料─NTHU 學習科學研究所 (張小均)推薦甄試備審資料─NTHU 學習科學研究所 (張小均)
推薦甄試備審資料─NTHU 學習科學研究所 (張小均)
 
Open Innovation and Semantic Web
Open Innovation and Semantic WebOpen Innovation and Semantic Web
Open Innovation and Semantic Web
 
De liddo & Buckingham Shum jurix2012
De liddo & Buckingham Shum jurix2012De liddo & Buckingham Shum jurix2012
De liddo & Buckingham Shum jurix2012
 
Conversations in Context: A Twitter Case for Social Media Systems Design
Conversations in Context: A Twitter Case for Social Media Systems DesignConversations in Context: A Twitter Case for Social Media Systems Design
Conversations in Context: A Twitter Case for Social Media Systems Design
 
Global lodlam_communities and open cultural data
Global lodlam_communities and open cultural dataGlobal lodlam_communities and open cultural data
Global lodlam_communities and open cultural data
 
Transdisciplinary Research: A short introduction
Transdisciplinary Research: A short introductionTransdisciplinary Research: A short introduction
Transdisciplinary Research: A short introduction
 
Keynote reusability measurement and social community analysis from mooc con...
Keynote   reusability measurement and social community analysis from mooc con...Keynote   reusability measurement and social community analysis from mooc con...
Keynote reusability measurement and social community analysis from mooc con...
 
Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented ...
Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented ...Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented ...
Large Scale Social Analytics on Wikipedia, Delicious, and Twitter (presented ...
 
ISWC2023-McGuinnessTWC16x9FinalShort.pdf
ISWC2023-McGuinnessTWC16x9FinalShort.pdfISWC2023-McGuinnessTWC16x9FinalShort.pdf
ISWC2023-McGuinnessTWC16x9FinalShort.pdf
 
[Wikisym2013] serp revised_apa_notice
[Wikisym2013] serp revised_apa_notice[Wikisym2013] serp revised_apa_notice
[Wikisym2013] serp revised_apa_notice
 
A Tale of Two Platforms: Emerging communicative patterns in two scientific bl...
A Tale of Two Platforms: Emerging communicative patterns in two scientific bl...A Tale of Two Platforms: Emerging communicative patterns in two scientific bl...
A Tale of Two Platforms: Emerging communicative patterns in two scientific bl...
 
Transitive credit
Transitive creditTransitive credit
Transitive credit
 
Global Redirective Practices
Global Redirective PracticesGlobal Redirective Practices
Global Redirective Practices
 
JISC Institutional Innovation Support and Synthesis
JISC Institutional Innovation Support and SynthesisJISC Institutional Innovation Support and Synthesis
JISC Institutional Innovation Support and Synthesis
 

Recently uploaded

How Barcodes Can Be Leveraged Within Odoo 17
How Barcodes Can Be Leveraged Within Odoo 17How Barcodes Can Be Leveraged Within Odoo 17
How Barcodes Can Be Leveraged Within Odoo 17
Celine George
 
Standardized tool for Intelligence test.
Standardized tool for Intelligence test.Standardized tool for Intelligence test.
Standardized tool for Intelligence test.
deepaannamalai16
 
Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...
Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...
Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...
ImMuslim
 
How to Manage Reception Report in Odoo 17
How to Manage Reception Report in Odoo 17How to Manage Reception Report in Odoo 17
How to Manage Reception Report in Odoo 17
Celine George
 
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptxPrésentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
siemaillard
 
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptxCapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
CapitolTechU
 
CIS 4200-02 Group 1 Final Project Report (1).pdf
CIS 4200-02 Group 1 Final Project Report (1).pdfCIS 4200-02 Group 1 Final Project Report (1).pdf
CIS 4200-02 Group 1 Final Project Report (1).pdf
blueshagoo1
 
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdfمصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
سمير بسيوني
 
Skimbleshanks-The-Railway-Cat by T S Eliot
Skimbleshanks-The-Railway-Cat by T S EliotSkimbleshanks-The-Railway-Cat by T S Eliot
Skimbleshanks-The-Railway-Cat by T S Eliot
nitinpv4ai
 
How to Predict Vendor Bill Product in Odoo 17
How to Predict Vendor Bill Product in Odoo 17How to Predict Vendor Bill Product in Odoo 17
How to Predict Vendor Bill Product in Odoo 17
Celine George
 
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptxNEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
iammrhaywood
 
REASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdf
REASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdfREASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdf
REASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdf
giancarloi8888
 
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 8 - CẢ NĂM - FRIENDS PLUS - NĂM HỌC 2023-2024 (B...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 8 - CẢ NĂM - FRIENDS PLUS - NĂM HỌC 2023-2024 (B...BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 8 - CẢ NĂM - FRIENDS PLUS - NĂM HỌC 2023-2024 (B...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 8 - CẢ NĂM - FRIENDS PLUS - NĂM HỌC 2023-2024 (B...
Nguyen Thanh Tu Collection
 
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.pptLevel 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
Henry Hollis
 
Data Structure using C by Dr. K Adisesha .ppsx
Data Structure using C by Dr. K Adisesha .ppsxData Structure using C by Dr. K Adisesha .ppsx
Data Structure using C by Dr. K Adisesha .ppsx
Prof. Dr. K. Adisesha
 
Juneteenth Freedom Day 2024 David Douglas School District
Juneteenth Freedom Day 2024 David Douglas School DistrictJuneteenth Freedom Day 2024 David Douglas School District
Juneteenth Freedom Day 2024 David Douglas School District
David Douglas School District
 
Oliver Asks for More by Charles Dickens (9)
Oliver Asks for More by Charles Dickens (9)Oliver Asks for More by Charles Dickens (9)
Oliver Asks for More by Charles Dickens (9)
nitinpv4ai
 
Temple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation resultsTemple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation results
Krassimira Luka
 
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem studentsRHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
Himanshu Rai
 
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
TechSoup
 

Recently uploaded (20)

How Barcodes Can Be Leveraged Within Odoo 17
How Barcodes Can Be Leveraged Within Odoo 17How Barcodes Can Be Leveraged Within Odoo 17
How Barcodes Can Be Leveraged Within Odoo 17
 
Standardized tool for Intelligence test.
Standardized tool for Intelligence test.Standardized tool for Intelligence test.
Standardized tool for Intelligence test.
 
Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...
Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...
Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...
 
How to Manage Reception Report in Odoo 17
How to Manage Reception Report in Odoo 17How to Manage Reception Report in Odoo 17
How to Manage Reception Report in Odoo 17
 
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptxPrésentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
 
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptxCapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
 
CIS 4200-02 Group 1 Final Project Report (1).pdf
CIS 4200-02 Group 1 Final Project Report (1).pdfCIS 4200-02 Group 1 Final Project Report (1).pdf
CIS 4200-02 Group 1 Final Project Report (1).pdf
 
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdfمصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
 
Skimbleshanks-The-Railway-Cat by T S Eliot
Skimbleshanks-The-Railway-Cat by T S EliotSkimbleshanks-The-Railway-Cat by T S Eliot
Skimbleshanks-The-Railway-Cat by T S Eliot
 
How to Predict Vendor Bill Product in Odoo 17
How to Predict Vendor Bill Product in Odoo 17How to Predict Vendor Bill Product in Odoo 17
How to Predict Vendor Bill Product in Odoo 17
 
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptxNEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
 
REASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdf
REASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdfREASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdf
REASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdf
 
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 8 - CẢ NĂM - FRIENDS PLUS - NĂM HỌC 2023-2024 (B...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 8 - CẢ NĂM - FRIENDS PLUS - NĂM HỌC 2023-2024 (B...BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 8 - CẢ NĂM - FRIENDS PLUS - NĂM HỌC 2023-2024 (B...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 8 - CẢ NĂM - FRIENDS PLUS - NĂM HỌC 2023-2024 (B...
 
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.pptLevel 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
 
Data Structure using C by Dr. K Adisesha .ppsx
Data Structure using C by Dr. K Adisesha .ppsxData Structure using C by Dr. K Adisesha .ppsx
Data Structure using C by Dr. K Adisesha .ppsx
 
Juneteenth Freedom Day 2024 David Douglas School District
Juneteenth Freedom Day 2024 David Douglas School DistrictJuneteenth Freedom Day 2024 David Douglas School District
Juneteenth Freedom Day 2024 David Douglas School District
 
Oliver Asks for More by Charles Dickens (9)
Oliver Asks for More by Charles Dickens (9)Oliver Asks for More by Charles Dickens (9)
Oliver Asks for More by Charles Dickens (9)
 
Temple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation resultsTemple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation results
 
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem studentsRHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
 
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
 

2010 09-17-張志威老師-confucius & “its” intelligent disciples

  • 1. Confucius & “Its” Intelligent Disciples Search + Social Edward Chang Director, Google Research, Beijing 2010-9-17 Ed Chang@NCCU Invited Talk 1
  • 2. Google中国:我们在哪里? 2010-9-17 Ed Chang@NCCU Invited Talk 2
  • 3. Web 1.0 .htm .htm .jpg .jpg .msg .htm .htm .htm .doc .htm 2010-9-17 Ed Chang@NCCU Invited Talk 3
  • 4. Web 2.0 --- Web with People .msg .xls .htm .htm .jpg .jpg .htm .msg .htm .doc 2010-9-17 Ed Chang@NCCU Invited Talk 4
  • 5. 2010-9-17 Ed Chang@NCCU Invited Talk 5
  • 6. 2010-9-17 Ed Chang@NCCU Invited Talk 6
  • 7. Outline • Search + Social Synergy • Search  Social • Social  Search • Scalability and Elasticity 2010-9-17 Ed Chang@NCCU Invited Talk 7
  • 8. Related Papers • AdHeat (Social Ads): – AdHeat: An Influence-based Diffusion Model for Propagating Hints to Match Ads, H.J. Bao and E. Y. Chang, WWW 2010 (best paper candidate), April 2010. – Parallel Spectral Clustering in Distributed Systems, Wen-Yen Chen, Yangqiu Song, Hongjie Bai, Chih-Jen Lin, and E. Y. Chang, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2010. • UserRank: – Confucius and its Intelligent Disciples, X. Si, E. Y. Chang, Z. Gyongyi, VLDB, September 2010 . – Topic-dependent User Rank, Xiance Si, Z. Gyongyi, E. Y. Chang, and M.S. Sun, Google Technical Report. • Large-scale Collaborative Filtering: – PLDA*: Parallel Latent Dirichlet Allocation for Large-Scale Applications, ACM Transactions on Internet Technology, 2010. – Collaborative Filtering for Orkut Communities: Discovery of User Latent Behavior, W.-Y. Chen, J. Chu, E. Y. Chang, WWW 2009: 681-690. – Combinational Collaborative Filtering for Personalized Community Recommendation, W.-Y. Chen, E. Y. Chang, KDD 2008: 115-123. – Parallel SVMs, E. Y. Chang, et al., NIPS 2007. 2010-9-17 Ed Chang@NCCU Invited Talk 8
  • 9. Query: What are must-see attractions at Yellowstone 2010-9-17 Ed Chang@NCCU Invited Talk 9
  • 10. Query: What are must-see attractions at Yellowstone 2010-9-17 Ed Chang@NCCU Invited Talk 10
  • 11. Query: What are must-see attractions at Yosemite 2010-9-17 Ed Chang@NCCU Invited Talk 11
  • 12. Query: What are must-see attractions at Beijing Hotel ads 2010-9-17 Ed Chang@NCCU Invited Talk 12
  • 13. 2010-9-17 Ed Chang@NCCU Invited Talk 13
  • 14. 2010-9-17 Ed Chang@NCCU Invited Talk 14
  • 15. 2010-9-17 Ed Chang@NCCU Invited Talk 15
  • 16. 2010-9-17 Ed Chang@NCCU Invited Talk 16
  • 17. Confucius: Google Q&A Providing High-Quality Answers in a Timely Fashion Differentiated Content DC U S C Search U Community CCS/Q&A Discussion/Question  Trigger a discussion/question session during search  Provide labels to a post (semi-automatically)  Given a post, find similar posts (automatically)  Evaluate quality of a post, relevance and originality  Evaluate user credentials in a topic sensitive way  Route questions to experts  Provide most relevant, high-quality content for Search to index  Generate answers using NLP 2010-9-17 Ed Chang@NCCU Invited Talk 17
  • 18. Confucius: Google Q&A Providing High-Quality Answers in a Timely Fashion Differentiated Content DC U S C Search U Community CCS/Q&A Discussion/Question  Trigger a discussion/question session during search  Provide labels to a post (semi-automatically)  Given a post, find similar posts (automatically)  Evaluate quality of a post, relevance and originality  Evaluate user credentials in a topic sensitive way  Route questions to experts  Provide most relevant, high-quality content for Search to index  Generate answers using NLP 2010-9-17 Ed Chang@NCCU Invited Talk 18
  • 19. Q&A Uses Machine Learning Label suggestion using LDA algorithm. • Real Time topic-to- topic (T2T) recommendation using LDA algorithm. • Gives out related high quality links to previous questions before human answer appear. 2010-9-17 Ed Chang@NCCU Invited Talk 19
  • 20. Collaborative Filtering Labels/Qs 1 1 1 1 1 1 1 1 1 Based on membership so far, 1 1 1 and memberships of others 1 1 1 1 Questions 1 1 1 Predict further membership 1 1 1 1 1 1 1 1 1 1 1 1 1 2010-9-17 Ed Chang@NCCU Invited Talk 20
  • 21. FIM-based Recommendation 2010-9-17 Ed Chang@NCCU Invited Talk 21
  • 22. FIM Preliminaries • Observation 1: If an item A is not frequent, any pattern contains A won’t be frequent [R. Agrawal]  use a threshold to eliminate infrequent items {A}  {A,B} • Observation 2: Patterns containing A are subsets of (or found from) transactions containing A [J. Han]  divide-and-conquer: select transactions containing A to form a conditional database (CDB), and find patterns containing A from that conditional database {A, B}, {A, C}, {A}  CDB A {A, B}, {B, C}  CDB B 2010-9-17 Ed Chang@NCCU Invited Talk 22
  • 23. Preprocessing • According to Observation 1, we count the support of each item by scanning the database, and eliminate those infrequent items from the transactions. • According to Observation 3, we sort items in each transaction by the order of descending support value. 2010-9-17 Ed Chang@NCCU Invited Talk 23
  • 24. Example of Projection Example of Projection of a database into CDBs. Left: sorted transactions in order of f, c, a, b, m, p Right: conditional databases of frequent items 2010-9-17 Ed Chang@NCCU Invited Talk 25
  • 25. Example of Projection Example of Projection of a database into CDBs. Left: sorted transactions; Right: conditional databases of frequent items 2010-9-17 Ed Chang@NCCU Invited Talk 26
  • 26. Example of Projection Example of Projection of a database into CDBs. Left: sorted transactions; Right: conditional databases of frequent items 2010-9-17 Ed Chang@NCCU Invited Talk 27
  • 27. Recursive Projections [H. Li, et al. ACM RS 08] • Recursive projection form a search tree • Each node is a CDB • Using the order of items to prevent duplicated CDBs. • Each level of breath-first search of the tree can be done by a MapReduce iteration. • Once a CDB is small enough to fit in memory, we can invoke FP-growth to mine this CDB, and no more growth of the sub- tree. 2010-9-17 Ed Chang@NCCU Invited Talk 28
  • 28. Projection using MapReduce p:{fcam/fcam/cb} p:3, pc:3 2010-9-17 Ed Chang@NCCU Invited Talk 29
  • 29. Collaborative Filtering Labels/Related Qs 1 1 1 1 1 1 1 1 1 Based on membership so far, 1 1 1 Questions and memberships of others 1 1 1 1 1 1 1 Predict further membership 1 1 1 1 1 1 1 1 1 1 1 1 1 2010-9-17 Ed Chang@NCCU Invited Talk 30
  • 30. Distributed Latent Dirichlet Allocation (LDA) • Search Users/Music/Ads/Question – Construct a latent layer for better for semantic matching Users/Music/Ads/Answers • Example: – iPhone crack – Apple pie 1 recipe pastry for a 9 Documents inch double crust How to install apps on 9 apples, 2/1 cup, Apple mobile phones? brown sugar Topic Distribution • Other Collaborative Filtering Apps – Recommend Users  Users Topic Distribution – Recommend Music  Users – Recommend Ads  Users – Recommend Answers  Q iPhone crack Apple pie User quries • Predict the ? In the light-blue cells 2010-9-17 Ed Chang@NCCU Invited Talk 31
  • 31. Latent Dirichlet Allocation [D. Blei, M. Jordan 04] • α: uniform Dirichlet φ prior α for per document d topic distribution (corpus level parameter) θ • β: uniform Dirichlet φ prior for per topic z word distribution (corpus level parameter) z • θd is the topic distribution of document d (document level) • zdj the topic if the jth word in w φ β d, wdj the specific word (word Nm K level) M 2010-9-17 Ed Chang@NCCU Invited Talk 32
  • 32. Combinational Collaborative Filtering Model (CCF) 【KDD2008】 Communities Communities Communities users descriptions users descriptions 2010-9-17 33 Ed Chang@NCCU Invited Talk
  • 33. LDA Gibbs Sampling: Inputs & Outputs Inputs: 1. training data: documents as bags of words words topics 2. parameter: the number of topics docs docs Outputs: 1. model parameters: a co- occurrence matrix of topics and words. topics 2. by-product: a co-occurrence words matrix of topics and documents. 2010-9-17 Ed Chang@NCCU Invited Talk 34
  • 34. Parallel Gibbs Sampling Inputs: 1. training data: documents as bags of words words topics 2. parameter: the number of topics docs docs Outputs: 1. model parameters: a co- occurrence matrix of topics and words. topics 2. by-product: a co-occurrence words matrix of topics and documents. 2010-9-17 Ed Chang@NCCU Invited Talk 35
  • 35. PLDA -- enhanced parallel LDA [ACM Transactions on IT] • PLDA is restricted by memory: Topic-word matrix has to fit into memory • Restricted by Amdahl’s Law: communication costs too high 2010-9-17 Ed Chang@NCCU Invited Talk 36
  • 36. PLDA -- enhanced parallel LDA • Take advantage of bag of words modeling: each Pw machine processes vocabulary in a word order • Pipelining: fetching the updated topic distribution matrix while doing Gibbs sampling 2010-9-17 Ed Chang@NCCU Invited Talk 37
  • 37. Speedup 1,500x using 2,000 machines 2010-9-17 Ed Chang@NCCU Invited Talk 38
  • 38. 2010-9-17 Ed Chang@NCCU Invited Talk 39
  • 39. Confucius: Google Q&A Differentiated Content DC U S C Search U Community CCS/Q&A Discussion/Question  Trigger a discussion/question session during search  Provide labels to a post (semi-automatically)  Given a post, find similar posts (automatically)  Evaluate quality of a post, relevance and originality  Evaluate user credentials in a topic sensitive way  Route questions to experts  Provide most relevant, high-quality content for Search to index  NLQA 2010-9-17 Ed Chang@NCCU Invited Talk 40
  • 40. UserRank • Rank users by quantity (number of links) and quality (weights on links) of contributions • Quality include: – Relevance. Is an answer relevant to the Q? Measured by KL divergence between latent-topic vectors of A and Q – Coverage. Compared among different answers – Originality. Detect potential plagiarism and spam – Promptness. Time between Q and A posting time 2010-9-17 Ed Chang@NCCU Invited Talk 41
  • 41. Outline • Search + Social Synergy • Search  Social • Social  Search • Scalability and Elasticity 2010-9-17 Ed Chang@NCCU Invited Talk 42
  • 42. Outline • Search + Social Synergy • Search  Social • Social  Search • Scalability and Elasticity 2010-9-17 Ed Chang@NCCU Invited Talk 43
  • 43. Social + Data Networks 2010-9-17 Ed Chang@NCCU Invited Talk 44
  • 44. 可擴展性和彈性 Scalability & Elasticity • 計算的規模與極端 • 雲計算平台設計挑戰 • 谷歌云計算基礎技術 • 谷歌云計算新技術 2010-9-17 Ed Chang@NCCU Invited Talk 45
  • 45. 計算的規模 千 3.8*10^5km 百万 十億 万億 千万億 百兆 十万兆 億兆 2010-9-17 京,亥 Ed Chang@NCCU Invited Talk 46
  • 46. 計算的規模 千 3.8*10^5km 百万 十億 1.2*10^10km 万億 千万億 百兆 十万兆 億兆 2010-9-17 京,亥 Ed Chang@NCCU Invited Talk 47
  • 47. 計算的規模 千 3.8*10^5km 百万 十億 1.2*10^10km 万億 千万億 百兆 1.5*10^16km 十万兆 億兆 2010-9-17 京,亥 Ed Chang@NCCU Invited Talk 48
  • 48. 高性能計算的極端 应用 用户数 精确度 可靠度 数据量 科學計算 少 極高 低 -- 中等 Tera 股市交易 大量 高 極高 Gega 基因排序 少 高 高 Tera -- Peta 搜索引擎 大量 中等 -- 高 中等 Peta 2010-9-17 Ed Chang@NCCU Invited Talk 49
  • 49. 可擴展性和彈性 Scalability & Elasticity • 計算的規模與極端 • 雲計算平台設計挑戰 • 雲計算基礎技術 • 雲計算新技術 2010-9-17 Ed Chang@NCCU Invited Talk 50
  • 50. Google文件系统 (GFS) GFS Master Replicas Master Client GFS Master Client C0 C1 C1 C0 C3 C3 C4 C3 C5 C4 • Master manages metadata • Data transfers happen directly between clients/chunkservers • Files broken into chunks (typically 64 MB) • Chunks triplicated across three machines for safety • See SOSP^03 paper at Ed Chang@NCCU Invited Talk 2010-9-17 51 http://labs.google.com/papers/gfs.html
  • 51. 大表格(Big Table) • 内存管理 • <Row, Column, Timestamp> triple for key - lookup, insert, and delete API • Example – Row: a Web Page – Columns: various aspects of the page – Time stamps: time when a column was fetched 2010-9-17 Ed Chang@NCCU Invited Talk 52
  • 52. MapReduce Data Block 1 map Data Data Block 2 Results datadatada Reduce datadata tadatadata datadata datadatada map datadata tadatadata Data Block 3 Data Block 5 map map map Data Block 4 2010-9-17 Ed Chang@NCCU Invited Talk 53
  • 53. 計算機平台(Platform) 2010-9-17 Ed Chang@NCCU Invited Talk 54
  • 54. 樣品平台 • 計算機 – 16GB DRAM; 160 GB SSD; 5 x 1TB disks • 計算機機架 (Rack) – 40 计算机 – 48 port Gigabit Ethernet switch • 計算機中心 (Center) – 10,000計算機(250 racks) – 2K port Gigabit Ethernet switch 2010-9-17 Ed Chang@NCCU Invited Talk 55
  • 55. 儲存 ---計算機單機 2010-9-17 Ed Chang@NCCU Invited Talk 56
  • 56. 儲存 ---計算機機架 2010-9-17 Ed Chang@NCCU Invited Talk 57
  • 57. 儲存 ---計算機中心 2010-9-17 Ed Chang@NCCU Invited Talk 58
  • 58. 設計挑戰 • 新程模型, 技術 – File system – Flash (SSD); GPUs? • 能源節約 – Hardware, software, warehouse – Encode/compress/transmit data well • 故障恢復 – Hardware/software faults – Deal with stragglers 2010-9-17 Ed Chang@NCCU Invited Talk 59
  • 59. 新文件系統技術: Colossus • Latency Reduction – Metadata distribution – Encoding and Replication – Data migration • Throughput Improvement – Adaptive cluster-level file system • Fault Tolerant – Reed-Solomon – Avoid correlated faults 2010-9-17 Ed Chang@NCCU Invited Talk 60
  • 60. Colossus [Sigmod 09] 2010-9-17 Ed Chang@NCCU Invited Talk 61
  • 61. 新parellel算法 MapReduce Pregel MPI GFS/IO and task rescheduling Yes No No overhead between iterations +1 +1 Flexibility of computation model AllReduce only ? Flexible +0.5 +1 +1 Efficient AllReduce Yes Yes Yes +1 +1 +1 Recover from faults between Yes Yes Apps iterations +1 +1 Recover from faults within each Yes Yes Apps iteration +1 +1 Final Score for scalable machine 3.5 5 4 learning 2010-9-17 Ed Chang@NCCU Invited Talk 62
  • 62. 設計挑戰 • 新程式設計模型 – Parallel – Flash (SSD); GPUs? • 能源節約 – Hardware, software, warehouse – Encode/compress/transmit data well • 故障恢復 – Hardware/software faults – Deal with stragglers 2010-9-17 Ed Chang@NCCU Invited Talk 63
  • 63. 能源消耗全球增長趨勢 2010-9-17 Ed Chang@NCCU Invited Talk 64
  • 64. 能源消耗地區增長趨勢 2010-9-17 Ed Chang@NCCU Invited Talk 65
  • 65. 電源使用效率 2010-9-17 Ed Chang@NCCU Invited Talk 66
  • 66. 電源使用分布 2010-9-17 Ed Chang@NCCU Invited Talk 67
  • 67. 節能措施 2010-9-17 Ed Chang@NCCU Invited Talk 68
  • 68. 能源消耗 --- 油當量 2010-9-17 Ed Chang@NCCU Invited Talk 69
  • 69. 設計挑戰 • 新程式設計模型 – Parallel – Flash (SSD); GPUs? • 能源節約 – Hardware, software, warehouse – Encode/compress/transmit data well • 故障恢復 – Hardware/software faults – Deal with stragglers 2010-9-17 Ed Chang@NCCU Invited Talk 70
  • 70. 故障率 • 99.9% 正常運行時間, 一年故障?小时 – 9 小时故障/年 • 99.9%正常運行/天, 10,000 計算機, 不down機機率? – 4.5173346 × 10-5 • 10,000 計算機 (中心级) – 0.25 次斷電 – 3 次路由器故障 – 1,000 計算機故障 – 1,000s 硬盤故障 – etc., etc., etc. 2010-9-17 Ed Chang@NCCU Invited Talk 71
  • 71. 故障後快速修復 • 複製 • 如果可能的話 – 鬆散的一致性 (loosely consist) – 近似的答案 – 不完整的答案 2010-9-17 Ed Chang@NCCU Invited Talk 72
  • 72. 數據規模算法精度 2010-9-17 Ed Chang@NCCU Invited Talk 73
  • 73. 數據規模算法精度 2010-9-17 Ed Chang@NCCU Invited Talk 74
  • 74. 數據規模算法精度 2010-9-17 Ed Chang@NCCU Invited Talk 75
  • 75. 數據規模算法精度 2010-9-17 Ed Chang@NCCU Invited Talk 76
  • 76. 赢在规模的例证 • 谷歌翻譯 • 谷歌語音識別 • 趨勢預測 2010-9-17 Ed Chang@NCCU Invited Talk 77
  • 77. 流感趨勢圖 2010-9-17 Ed Chang@NCCU Invited Talk 78
  • 78. Outline • Search + Social Synergy • Search  Social • Social  Search • Scalability and Elasticity 2010-9-17 Ed Chang@NCCU Invited Talk 79
  • 79. Confucius and Disciples 2010-9-17 Ed Chang@NCCU Invited Talk 80
  • 80. Concluding Remarks • Search + Social • Increasing quantity and complexity of data demands scalable solutions • Have parallelized key subroutines for mining massive data sets – Spectral Clustering [ECML 08] – Frequent Itemset Mining [ACM RS 08] – PLSA [KDD 08] – LDA [WWW 09, AAIM 09] – UserRank [Google TR 09] – Support Vector Machines [NIPS 07] • Relevant papers – http://infolab.stanford.edu/~echang/ • Open Source PSVM, PLDA – http://code.google.com/p/psvm/ – http://code.google.com/p/plda/ 2010-9-17 Ed Chang@NCCU Invited Talk 81
  • 81. Collaborators • Prof. Chih-Jen Lin (NTU) • Hongjie Bai (Google) • Hongji Bao (Google) • Wen-Yen Chen (UCSB) • Jon Chu (MIT) • Haoyuan Li (PKU) • Zhiyuan Liu (Tsinghua) • Xiance Si (Tsinghua) • Yangqiu Song (Tsinghua) • Matt Stanton (CMU) • Yi Wang (Google) • Dong Zhang (Google) • Kaihua Zhu (Google) 2010-9-17 Ed Chang@NCCU Invited Talk 82
  • 82. References [1] Alexa internet. http://www.alexa.com/. [2] D. M. Blei and M. I. Jordan. Variational methods for the dirichlet process. In Proc. of the 21st international conference on Machine learning, pages 373-380, 2004. [3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. Journal of Machine Learning Research, 3:993- 1022, 2003. [4] D. Cohn and H. Chang. Learning to probabilistically identify authoritative documents. In Proc. of the Seventeenth International Conference on Machine Learning, pages 167-174, 2000. [5] D. Cohn and T. Hofmann. The missing link - a probabilistic model of document content and hypertext connectivity. In Advances in Neural Information Processing Systems 13, pages 430-436, 2001. [6] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the American Society of Information Science, 41(6):391-407, 1990. [7] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the em algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1-38, 1977. [8] S. Geman and D. Geman. Stochastic relaxation, gibbs distributions, and the bayesian restoration of images. IEEE Transactions on Pattern recognition and Machine Intelligence, 6:721-741, 1984. [9] T. Hofmann. Probabilistic latent semantic indexing. In Proc. of Uncertainty in Articial Intelligence, pages 289-296, 1999. [10] T. Hofmann. Latent semantic models for collaborative filtering. ACM Transactions on Information System, 22(1):89- 115, 2004. [11] A. McCallum, A. Corrada-Emmanuel, and X. Wang. The author-recipient-topic model for topic and role discovery in social networks: Experiments with enron and academic email. Technical report, Computer Science, University of Massachusetts Amherst, 2004. [12] D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed inference for latent dirichlet allocation. In Advances in Neural Information Processing Systems 20, 2007. [13] M. Ramoni, P. Sebastiani, and P. Cohen. Bayesian clustering by dynamics. Machine Learning, 47(1):91-121, 2002. 2010-9-17 Ed Chang@NCCU Invited Talk 83
  • 83. References (cont.) [14] R. Salakhutdinov, A. Mnih, and G. Hinton. Restricted boltzmann machines for collaborative ltering. In Proc. Of the 24th international conference on Machine learning, pages 791-798, 2007. [15] E. Spertus, M. Sahami, and O. Buyukkokten. Evaluating similarity measures: a large-scale study in the orkut social network. In Proc. of the 11th ACM SIGKDD international conference on Knowledge discovery in data mining, pages 678-684, 2005. [16] M. Steyvers, P. Smyth, M. Rosen-Zvi, and T. Griths. Probabilistic author-topic models for information discovery. In Proc. of the 10th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 306- 315, 2004. [17] A. Strehl and J. Ghosh. Cluster ensembles - a knowledge reuse framework for combining multiple partitions. Journal on Machine Learning Research (JMLR), 3:583-617, 2002. [18] T. Zhang and V. S. Iyengar. Recommender systems using linear classiers. Journal of Machine Learning Research, 2:313-334, 2002. [19] S. Zhong and J. Ghosh. Generative model-based clustering of documents: a comparative study. Knowledge and Information Systems (KAIS), 8:374-384, 2005. [20] L. Admic and E. Adar. How to search a social network. 2004 [21] T.L. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, pages 5228-5235, 2004. [22] H. Kautz, B. Selman, and M. Shah. Referral Web: Combining social networks and collaborative filtering. Communitcations of the ACM, 3:63-65, 1997. [23] R. Agrawal, T. Imielnski, A. Swami. Mining association rules between sets of items in large databses. SIGMOD Rec., 22:207-116, 1993. [24] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth Conference on Uncertainty in Artifical Intelligence, 1998. [25] M.Deshpande and G. Karypis. Item-based top-n recommendation algorithms. ACM Trans. Inf. Syst., 22(1):143- 177, 2004. 2010-9-17 Ed Chang@NCCU Invited Talk 84
  • 84. References (cont.) [26] B.M. Sarwar, G. Karypis, J.A. Konstan, and J. Reidl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International World Wide Web Conference, pages 285-295, 2001. [27] M.Deshpande and G. Karypis. Item-based top-n recommendation algorithms. ACM Trans. Inf. Syst., 22(1):143- 177, 2004. [28] B.M. Sarwar, G. Karypis, J.A. Konstan, and J. Reidl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International World Wide Web Conference, pages 285-295, 2001. [29] M. Brand. Fast online svd revisions for lightweight recommender systems. In Proceedings of the 3rd SIAM International Conference on Data Mining, 2003. [30] D. Goldbberg, D. Nichols, B. Oki and D. Terry. Using collaborative filtering to weave an information tapestry. Communication of ACM 35, 12:61-70, 1992. [31] P. Resnik, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. Grouplens: An open architecture for aollaborative filtering of netnews. In Proceedings of the ACM, Conference on Computer Supported Cooperative Work. Pages 175-186, 1994. [32] J. Konstan, et al. Grouplens: Applying collaborative filtering to usenet news. Communication ofACM 40, 3:77-87, 1997. [33] U. Shardanand and P. Maes. Social information filtering: Algorithms for automating “word of mouth”. In Proceedings of ACM CHI, 1:210-217, 1995. [34] G. Kinden, B. Smith and J. York. Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Computing, 7:76-80, 2003. [35] T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning Journal 42, 1:177- 196, 2001. [36] T. Hofmann and J. Puzicha. Latent class models for collaborative filtering. In Proceedings of International Joint Conference in Artificial Intelligence, 1999. [37] http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/collaborativefiltering.html [38] E. Y. Chang, et. al., Parallelizing Support Vector Machines on Distributed Machines, NIPS, 2007. [39] Wen-Yen Chen, Dong Zhang, and E. Y. Chang, Combinational Collaborative Filtering for personalized community recommendation, ACM KDD 2008. [40] Y. Sun, W.-Y. Chen, H. Bai, C.-j. Lin, and E. Y. Chang, Parallel Spectral Clustering, ECML 2008. [41] Wen-Yen Chen, Jon Chu, Junyi Luan, Hongjie Bai, Yi Wang, and Edward Y. Chang , Collaborative Filtering for Orkut Communities: Discovery of User Latent Behavior, International World Wide Web Conference (WWW) Madrid, Spain, April 2009 . [42] Yi Wang, Hongjie Bai, Matt Stanton, Wen-Yen Chen, and Edward Y. Chang, PLDA: Parallel Latent Dirichlet Allocation, International Conference on Algorithmic Aspects in Information and Management, June 2009. 2010-9-17 Ed Chang@NCCU Invited Talk 85