          Summary of the Dragon Star Program
          (Renmin University, Beijing, May 21-27, 2012)



                         Yueshen Xu
                       xuyueshen@163.com
                      Zhejiang University




05/28/12                                                 ZJU
Overview

 Narration
   What they addressed
      Program Profile
      Knowledge and Expertise
 Argumentation
   What I think
      Research and Research Mode
      Potpourri
 Discussion

 Callout: No dazzle




Organizer and Lecturer

 Organizer
   CuiPing Li (an amiable lady)
   Jun He
 Lecturer
   Prof. Qiang Yang, HKUST
      • Classification
      • Transfer Learning
   Prof. Jiawei Han, UIUC
      • Network Model
      • Relationship Mining over DBLP
   Prof. Liu Huan, ASU (guest appearance)
      • Online Group Behavior over Social Network
   Prof. Jian Pei, SFU
      • Mining on Uncertain Data
Curriculum

 Contents
   Mainly about data mining
   A little about machine learning and databases
 Base + Advanced
   Base: what everyone should know
   Advanced: what only a few know
 Syllabus
   Tight and tiring
 Participation
   On time, in time and full time

 Callouts: 6:30; Prof. Liu



Attention

 No comments, no guesses, just what was said
 No topics, no transformation and no speculation
 No detail, just a summary
 Further study resource repository
      http://www.cse.ust.hk/~qyang/2012DStar/
      http://www.cs.uiuc.edu/~hanj/dragon12/info12.htm
      Ask me
      Asking me anything is OK

 Callouts
   • No qualification
   • What you research depends on what you meet.
   • What they told me is a summary
   • Digest it, not too much
   • Learn what you need




Prof. Yang
    Classification & Transfer Learning
 Classification
   Decision Trees
   Neural Networks
      Replaced by SVM
   Bayesian Classifiers
      Conditional Independence
      Naïve Bayesian Network
   Support Vector Machines
      Little about why, mainly about what
   Ensemble Classifiers
      Bagging and Boosting (AdaBoost)
      Random Forest
   Collaborative Filtering
      A little

 Callouts
   "Prof. Yang, can you speak a little faster?"
   Just a summary, little detail
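
A minimal sketch of the bagging idea listed under Ensemble Classifiers: train several weak models on bootstrap resamples of the data and let them vote. This is not taken from Prof. Yang's slides; the names `train_stump` and `bagging`, the 1-D decision stump used as the base learner, and the toy data are all assumptions chosen only for illustration.

```python
import random
from collections import Counter


def train_stump(sample):
    """Fit a 1-D decision stump on (x, label) pairs with labels 0/1.

    Tries every observed x as a threshold, in both orientations, and
    keeps the first combination with the fewest training errors.
    """
    best = None
    for t in sorted({x for x, _ in sample}):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def bagging(data, n_models=25, seed=0):
    """Train n_models stumps on bootstrap samples; predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(data) points with replacement.
        sample = [rng.choice(data) for _ in data]
        models.append(train_stump(sample))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


if __name__ == "__main__":
    # Hypothetical toy data: small x values are class 0, large ones class 1.
    toy = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 1)]
    predict = bagging(toy)
    print(predict(-1.0), predict(6.0))
```

Random Forest follows the same resample-and-vote pattern with decision trees as base learners and an extra random choice of features at each split; boosting differs in that models are trained in sequence and reweighted rather than trained independently.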

Prof. Yang
    Classification & Transfer Learning
 Transfer Learning
   What he and his students are good at, and maybe the only thing they are good at




Prof. Yang
    Classification & Transfer Learning
 I don’t know, but I can bamboozle you
   Transfer Learning
    The ability of a system to recognize and apply knowledge and
    skills learned in previous tasks to novel tasks or new domains

   Easy to say, hard to do




Prof. Yang
    Classification & Transfer Learning
 What they focus on
      Heterogeneous Transfer Learning
      Source-free selection transfer learning
      Multi-task transfer learning
      Transfer Learning for Link Prediction
      EigenTransfer: A Unified Framework for Transfer Learning




Prof. Han
  Information Network Model & Relationship Mining over DBLP

 An amiable and rigorous senior scholar
   He is involved in the whole process of each paper, because he knows
    the details well
   He is willing to answer every question
   Never acts superior
 Information Network Model:
    Great powers of conception
   Fundamental theory of network analysis
   Not just about social networks. A glance at Prof. Han’s contents:
     ─ Network Science
     ─ Measures and Metrics of Networks
     ─ Models of Network Formation

Prof. Han
  Information Network Model & Relationship Mining over DBLP

 Network Science
   Social network
   Social network example
   Friendship networks vs. blogosphere
 Other Network
   Communication Network
   Biological Network
 Models of Network Formation
   Explain how social networks should be organized
   Model the graph generation process of social networks
   Probabilistic Distribution
   Power Law, Long tail law
   The Erdős-Rényi (ER) Model
   The Watts and Strogatz Model
 Network models and their representation (callout: Plentiful)
    Too many, just list some:
    • PageRank, Bipartite Networks
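
As a small illustration of what a "model of network formation" means, here is a sketch of the Erdős-Rényi G(n, p) model, in which every pair of nodes is linked independently with probability p. The function names `erdos_renyi` and `degrees` are assumptions for this note, not anything from the lecture material.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """Sample an undirected G(n, p) graph as a list of edges (u, v), u < v.

    Each of the n*(n-1)/2 possible edges is kept independently
    with probability p.
    """
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


def degrees(n, edges):
    """Return the degree of every node 0..n-1."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


if __name__ == "__main__":
    edges = erdos_renyi(n=1000, p=0.01, seed=42)
    deg = degrees(1000, edges)
    # The expected degree is p * (n - 1), here about 10.
    print(len(edges), sum(deg) / len(deg))
```

Degrees in G(n, p) follow a binomial distribution concentrated around p(n - 1), so the model does not produce the long-tailed, power-law degree distributions seen in many real networks; that gap is one reason other formation models are studied.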

Prof. Han
  Information Network Model & Relationship Mining over DBLP

 All based on DBLP
   Why? Because it is a heterogeneous network
   Clustering, Ranking in information networks
 Problems: what they mine




Prof. Han
  Information Network Model & Relationship Mining over DBLP

 Classification of information networks
   Is VLDB a conference belonging to DB or DM?
 Similarity Search in information networks
   DBLP
    Who are the most similar to “Christos Faloutsos”?
   IMDB
    Which movies are the most similar to “Little Miss Sunshine”?
   E-Commerce
    Which products are the most similar to “Kindle”?

       Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu, “PathSim: Meta Path-Based
       Top-K Similarity Search in Heterogeneous Information Networks”, VLDB'11
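
The PathSim measure cited above scores two objects x and y under a symmetric meta-path as 2·M[x][y] / (M[x][x] + M[y][y]), where M counts the path instances between objects. A rough sketch, assuming an author-by-venue matrix of paper counts as input (so M is that matrix times its transpose); the function names and the toy counts are hypothetical.

```python
def commuting_matrix(w):
    """Return M = W * W^T for a matrix W of author-by-venue paper counts.

    M[i][j] is the number of author-venue-author path instances
    between author i and author j.
    """
    n = len(w)
    return [
        [sum(a * b for a, b in zip(w[i], w[j])) for j in range(n)]
        for i in range(n)
    ]


def pathsim(m, x, y):
    """PathSim score: 2 * M[x][y] / (M[x][x] + M[y][y])."""
    denom = m[x][x] + m[y][y]
    return 2 * m[x][y] / denom if denom else 0.0


def top_k_similar(m, x, k):
    """Return the k objects most similar to x, excluding x itself."""
    others = [y for y in range(len(m)) if y != x]
    return sorted(others, key=lambda y: pathsim(m, x, y), reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical counts: rows are authors, columns are two venues.
    w = [
        [2, 1],    # author 0: a few papers
        [50, 20],  # author 1: same venues, far more papers
        [2, 1],    # author 2: same profile as author 0
    ]
    m = commuting_matrix(w)
    print(pathsim(m, 0, 1), pathsim(m, 0, 2), top_k_similar(m, 0, 1))
```

The normalization by self-path counts is what makes author 0 closer to author 2 (same visibility) than to author 1 (same venues but far more papers), which is the "peer" notion of similarity the paper argues for.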


Prof. Han
  Information Network Model & Relationship Mining over DBLP

 What do they take advantage of?
   The network schema, through meta-paths; for example:
   (figure not preserved)




Prof. Han
  Information Network Model & Relationship Mining over DBLP

 Relationship Prediction in Information Networks
   Whom should I collaborate with?
   Which paper should I cite for this topic?
   Whom else should I follow on Twitter?
       Y. Sun, R. Barber, M. Gupta, C. Aggarwal and J. Han, “Co-author Relationship
       Prediction in Heterogeneous Bibliographic Networks”, ASONAM’11, July 2011
 Role Discovery: Extracting Semantic Information from
  Links
       Ref. C. Wang, J. Han, et al., “Mining Advisor-Advisee Relationships from
       Research Publication Networks”, SIGKDD 2010
   Data Cleaning and Trust Analysis by InfoNet Analysis
       Xiaoxin Yin, Jiawei Han, Philip S. Yu, “Truth Discovery with Multiple Conflicting
       Information Providers on the Web”, TKDE’08


Prof. Han
  Information Network Model & Relationship Mining over DBLP

 Automatic discovery of Entity Pages
   (T. Weninger, Jiawei Han et al. WWW’11)
   Given a reference page, can we find entity pages of the same
    type?
 14 pages of references




Prof. Pei
  Uncertain Data Mining
 Mining uncertain data (callout: Probability is vital)
      Models and Representation of uncertain data
      Mining Frequent Patterns
      Classification
      Clustering
      Outlier Detection
 Topic-Oriented
      Nothing to do with databases, that is, nothing to do with queries
      Learn it yourself
      Outlier Detection on uncertain data is a challenge
      This is what I care most about from the point of view of knowledge
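
To make "probability is vital" concrete for frequent pattern mining, here is a sketch of one common formulation from the uncertain-data literature, not necessarily the one Prof. Pei presented: each item appears in a transaction with some probability, items are assumed independent, and the expected support of an itemset is the sum over transactions of the product of its items' probabilities. The data layout (a list of item-to-probability dicts) and the function names are assumptions.

```python
from math import prod


def expected_support(db, itemset):
    """Expected number of transactions containing every item in itemset.

    db is a list of dicts mapping item -> existence probability.
    A missing item has probability 0. Items are assumed independent.
    """
    return sum(prod(t.get(item, 0.0) for item in itemset) for t in db)


def frequent_items(db, min_esup):
    """Return single items whose expected support reaches min_esup."""
    items = {item for t in db for item in t}
    return sorted(i for i in items if expected_support(db, [i]) >= min_esup)


if __name__ == "__main__":
    # Hypothetical uncertain transaction database.
    db = [
        {"a": 0.9, "b": 0.5},
        {"a": 0.4},
        {"a": 1.0, "b": 1.0},
    ]
    print(expected_support(db, ["a"]))       # 0.9 + 0.4 + 1.0
    print(expected_support(db, ["a", "b"]))  # 0.45 + 0.0 + 1.0
    print(frequent_items(db, min_esup=2.0))
```

When every probability is 0 or 1 this reduces to the ordinary support count, which is why methods for certain data can often be adapted to this setting.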


Our Thoughts

 As for pure research, there is no speculation
   What’s the proper mode for research?
      Method-Oriented: Prof. Yang
           All about transfer learning
           All I have to do is solve practical problems with transfer learning,
           e.g., link prediction.
      Application-Oriented: Prof. Han
           Find fun in DBLP, all about relationship mining
           No single part of Prof. Han’s method is new, but driven by the problem,
           the whole framework is innovative
      Topic-Oriented: Prof. Pei
           Clustering and outlier detection on uncertain data
           He and his team rely on solid accumulation

Our Thoughts

 How do they do research?
 Accumulation -> Real-world problem -> Valuable research problem ->
    Discuss and test to find a suitable method -> Experiment -> Paper
    Callouts
       Is the problem valuable? Can it be solved by us?
       Revise many times
       Test again and again. Accumulation, experience, judgment….
 Accumulated by means of experience, imitation and hard work
 Do not just scan the slides; redo the experiments others have done
 Solve problems others have solved
 Different field, different mode
 Application-Oriented: flexible
 Method-Oriented: mathematics
 Topic-Oriented: accumulation
 Work as a Team


Our Thoughts

 Prof. Pei: Small data
   Can you learn a model with just a little data?
   Data collection is very costly
   If you can learn what you want from 1 GB, why use
    1 TB with so many machines?
   Prof. Pei: do we really need experiments? No, provided that what
    you have done is really convincing. / Yes, because our work is not
    convincing enough.
 Read every helpful paper
 Research should be labeled by researchers, their teams
  and their labs. Everyone has his own pan, not that all
  guys just have one.

Our Thoughts

 20/80 Law
 I have fallen behind others
 I have been lost in the clouds of research for a year. I
  hope I can find my way.




Discussion

