Finding Nobel prize window by PageRank
FUJITA Yuji, Turnstone Research Inst., Nihon Univ.
  • Historically, graph theory came first, but network science itself was already being practiced by sociologists, in the form of social network analysis, before computers appeared. Only after the Internet became widespread, with the growth in available information and processing power, did statistical methods become meaningful and statistical mechanics come to be applied.
  • Article titles:
      – The Protein Data Bank
      – Effects of an angiotensin-converting-enzyme inhibitor, ramipril, on cardiovascular events in high-risk patients
      – The genome sequence of Drosophila melanogaster
      – String theory and noncommutative geometry
      – The complete atomic structure of the large ribosomal subunit at 2.4 angstrom resolution
      – Smac, a mitochondrial protein that promotes cytochrome c-dependent caspase activation by eliminating IAP inhibition
      – Identification of DIABLO, a mammalian protein that promotes apoptosis by binding to and antagonizing IAP proteins
      – The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000
      – Class switch recombination and hypermutation require activation-induced cytidine deaminase (AID), a potential RNA editing enzyme
      – Cytotoxic T lymphocyte-associated antigen 4 plays an essential role in the function of CD25(+)CD4(+) regulatory cells that control intestinal inflammation
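  • Any graph can be embedded in three dimensions. Besides geometric representations (some graphs can be embedded in a genus-0 surface, i.e. the plane, and some cannot), a graph can also be represented as a matrix. The matrix form is the one we rely on here.
  • If the scores of the preceding nodes are not fixed, the score of the node we are looking at cannot be fixed either; but until that score is fixed, the scores of the preceding nodes cannot be fixed. So what should we do?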

    1. Finding Nobel prize window by PageRank
       FUJITA Yuji, Turnstone Research Inst., Nihon Univ.
    2. Graph and Network
       ● Graph theory
         – Part of mathematics
       ● Network science
         – Interdisciplinary study of
           ● Graph theory
           ● Physics
           ● Social science
           ● Informatics
           ● Particular topics from finance, biology, ...
    3. Graph theory: dates back to the 1730s
       ● Objectives
         – Lower-dimensional topological structure
         – Combinatorial and topological studies
       ● Topics
         – Four colour theorem
         – Invariants
       (Figure from Wikipedia)
    4. Network science
       ● Objectives
         – Statistics and dynamics
         – Social, financial, technological themes
       ● Topics
         – Six degrees of separation
         – Scale-free networks
         – PageRank
    5. Bibliometrics
       ● Quantitative evaluation of (academic) documents
       ● Conventional approach: number of citations
       ● Citation network
         – Node: paper; edge: citation
         – Directed graph
       ● A truer metric: PageRank
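To make the conventional approach on slide 5 concrete, here is a minimal sketch that counts citations as the in-degree of each paper in a directed citation network. The paper labels and the edge list are hypothetical toy data, not taken from the talk.

```python
from collections import Counter

# Hypothetical toy data: (citing paper, cited paper) pairs.
citations = [("B", "A"), ("C", "A"), ("D", "A"), ("D", "C"), ("E", "C")]


def citation_counts(edges):
    """Return the in-degree (number of citations received) of each cited paper."""
    return Counter(cited for _citing, cited in edges)


counts = citation_counts(citations)
# Papers sorted from most cited to least cited.
print(counts.most_common())
```

In this count every citation weighs the same, whoever makes it. That is the property PageRank relaxes.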
    6. Citation vs PageRank
       ● The most-cited papers do not have the best PageRank scores
    7. Top articles
       ● Clinical Medicine
         – Effects of an angiotensin-converting-enzyme inhibitor, ramipril, on cardiovascular events in high-risk patients
         – Vitamin E supplementation and cardiovascular events in high-risk patients
       ● Immunology
         – Cytotoxic T lymphocyte-associated antigen 4 plays an essential role in the function of CD25(+)CD4(+) regulatory cells that control intestinal inflammation
         – Immunologic self-tolerance maintained by CD25(+)CD4(+) regulatory T cells constitutively expressing cytotoxic T lymphocyte-associated antigen 4
       ● Physics
         – String theory and noncommutative geometry
         – Large-N limit of non-commutative gauge theories
       ● Molecular Biology & Genetics
         – Smac, a mitochondrial protein that promotes cytochrome c-dependent caspase activation by eliminating IAP inhibition
         – Identification of DIABLO, a mammalian protein that promotes apoptosis by binding to and antagonizing IAP proteins
       ● Molecular …
         – Systematic variation in gene expression patterns in human …
    8. Graph expression
       ● Embedding: drawing on a sphere or in space
       ● Matrix
    9. PageRank overview
       ● A link from a great node is more important (in contrast to using the degree as the score)
       ● But how can it be computed? Each node's score depends on the scores of the nodes linking to it, so the process can get lost in a loop.
       (Figure from "The PageRank Citation Ranking: Bringing Order to the Web")
    10. Finite state Markov chain
        ● Node: state; transition matrix: moving along the edges
          – Row: linked (cited) vector
          – Column: link (citing) vector
        ● The probability vector is refreshed by multiplying it by the transition matrix
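The row/column convention on slide 10 can be sketched as follows, assuming a column-stochastic matrix in which entry `M[i][j]` is the probability of moving from citing paper `j` to cited paper `i`. The function names and the three-paper graph are hypothetical; the toy graph is chosen so that every paper cites at least one other, because a paper with no outgoing citations would leave an all-zero column that this sketch does not handle.

```python
def transition_matrix(n, edges):
    """Column-stochastic matrix: M[i][j] = 1/outdeg(j) if paper j cites paper i."""
    out_degree = [0] * n
    for citing, _cited in edges:
        out_degree[citing] += 1
    M = [[0.0] * n for _ in range(n)]
    for citing, cited in edges:
        M[cited][citing] += 1.0 / out_degree[citing]
    return M


def step(M, x):
    """Refresh the probability vector once: x_new = M x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


# Hypothetical toy graph of (citing, cited) pairs.
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
M = transition_matrix(3, edges)
x0 = [1 / 3] * 3   # start from the uniform distribution
x1 = step(M, x0)   # distribution after one move along the edges
print(x1)
```

Because each column of `M` sums to 1, the entries of the refreshed vector still sum to 1 after every step.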
    11. Steady state gives PageRank
        ● Some Markov chains have a unique steady state
        ● The steady state is given by an eigenvector
          – A vector x such that Mx = ax
        ● The eigenvector is obtained by linear algebra
          – Widely known how to compute
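Written out, and assuming M is column-stochastic (each column sums to 1), the steady state of slide 11 is the eigenvector for eigenvalue a = 1, normalised as a probability vector. Repeated multiplication by M is one standard way to approach it when the steady state is unique; the iteration index k is notation introduced here.

```latex
M x = a x, \qquad a = 1, \qquad x_i \ge 0, \qquad \sum_i x_i = 1

x^{(k+1)} = M\, x^{(k)}, \qquad x^{(k)} \to x \quad (k \to \infty)
```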
    12. Why does PageRank work?
        ● Not all citations are equally significant
        ● Fewer citations can be a signal of even greater work
          – Fundamental work is often not cited directly
        ● Academic cascade
    13. Meanings of citation
        ● Brainchild
        ● History
        ● Respect
        ● Identity
        Something more than <a>tag</a>
    14. To reach the top
        ● Many great children
          – Each child gives birth to many works = great scientific achievement
    15. Limitations
        ● Prof. Yamanaka's work (Cell, 2006) has a poor PageRank score, which is a shame, to say the least.
        ● Spam issues: not as serious as with naive citation counts
    16. Putting it into practice
        ● Get citation data
          – Commercial product or scraping
        ● Transition matrix
          – Random surfer model
        ● Iterate the matrix-vector product
          – Sparse matrix operations
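The three steps on slide 16 can be put together in a short sketch. This is not the author's implementation (the talk used libcsparse from Ruby); it is a plain-Python illustration under stated assumptions: a damping factor of 0.85, uniform random jumps, and the score of papers that cite nothing being spread uniformly. The function name `pagerank`, its parameters, and the toy edge list are all hypothetical.

```python
def pagerank(n, edges, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration on a citation graph given as (citing, cited) index pairs."""
    # Sparse storage: for each paper, the list of papers it cites.
    out_links = [[] for _ in range(n)]
    for citing, cited in edges:
        out_links[citing].append(cited)

    x = [1.0 / n] * n
    for _ in range(max_iter):
        # Score held by papers that cite nothing is spread uniformly (assumption).
        dangling = sum(x[j] for j in range(n) if not out_links[j])
        base = (1.0 - damping) / n + damping * dangling / n
        new_x = [base] * n
        # Each citing paper passes an equal share of its score to every paper it cites.
        for j, targets in enumerate(out_links):
            if targets:
                share = damping * x[j] / len(targets)
                for i in targets:
                    new_x[i] += share
        # Stop when the vector no longer changes (L1 distance below tol).
        if sum(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    return x


# Hypothetical toy data: paper 0 is cited by 1, 2 and 3; paper 2 by 3 and 4.
edges = [(1, 0), (2, 0), (3, 0), (3, 2), (4, 2)]
scores = pagerank(5, edges)
print(scores)
```

Only the non-zero links are ever visited, so the cost of one iteration grows with the number of citations, not with the square of the number of papers.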
    17. Data
        ● Thomson Reuters, Elsevier, …
        ● Scrape the web (arXiv, ...)
        ● A common SQL server will hold the data
        ● NLP required
    18. Transition matrix
        ● Not every transition matrix has a unique steady-state eigenvector
        ● Random surfer model: make the graph connected so the walk can get out of loops
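One common way to write the random surfer model of slide 18 is shown below. The symbols are assumptions introduced here, not taken from the slides: M is a column-stochastic transition matrix over n papers (with any all-zero column replaced by the uniform column), d is the probability of following a citation, and 1 is the all-ones vector.

```latex
G = d\,M + \frac{1-d}{n}\,\mathbf{1}\mathbf{1}^{\mathsf{T}}, \qquad 0 < d < 1
```

Every entry of G is strictly positive and every column sums to 1, so by the Perron-Frobenius theorem G has a unique steady-state vector, which is taken as the PageRank score.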
    19. Adaptation to papers
        ● An old paper cannot cite a newer one
          – Non-uniform random surfing
        ● Adjust the decay rate
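The slide does not say how the random surfing is made non-uniform. One possible reading, in the spirit of the Walker et al. paper in the reference list, is to let the surfer jump to recent papers more often, with a weight that decays exponentially with the paper's age. The sketch below only builds such a jump distribution; the function name, the decay constant `tau`, and the sample years are hypothetical.

```python
import math


def age_weighted_teleport(years, current_year, tau=5.0):
    """Jump probabilities proportional to exp(-age / tau), normalised to sum to 1.

    tau is a hypothetical decay constant in years; a smaller tau favours
    recent papers more strongly.
    """
    weights = [math.exp(-(current_year - y) / tau) for y in years]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical publication years for three papers.
v = age_weighted_teleport([1990, 2000, 2010], current_year=2010)
print(v)
```

Such a vector would replace the uniform jump term of the random surfer model; tuning `tau` is then one way to "adjust the decay rate".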
    20. Sparse matrix
        ● Most of the elements are zero
        ● A compressed form reduces space and time
        ● libcsparse
          – Made by UFL people and others, distributed under the LGPL
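To illustrate what a compressed form looks like, here is a minimal sketch of compressed sparse row (CSR) storage and the matrix-vector product it supports. It is not libcsparse's API: CSparse itself stores matrices in compressed-column form, and the row variant is used here only because it reads more directly as a matrix-vector product. The function names and the sample matrix are hypothetical.

```python
def to_csr(n_rows, triples):
    """Build CSR arrays (values, column indices, row pointers) from (row, col, value) triples."""
    values, col_idx = [], []
    row_ptr = [0] * (n_rows + 1)
    for r, c, v in sorted(triples):
        values.append(v)
        col_idx.append(c)
        row_ptr[r + 1] += 1          # count the entries in each row
    for r in range(n_rows):
        row_ptr[r + 1] += row_ptr[r]  # prefix sums give each row's start offset
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x, touching only the stored non-zero entries."""
    y = []
    for r in range(len(row_ptr) - 1):
        start, end = row_ptr[r], row_ptr[r + 1]
        y.append(sum(values[k] * x[col_idx[k]] for k in range(start, end)))
    return y


# Hypothetical 3x3 transition matrix with four non-zero entries.
triples = [(1, 0, 0.5), (2, 0, 0.5), (2, 1, 1.0), (0, 2, 1.0)]
values, col_idx, row_ptr = to_csr(3, triples)
y = csr_matvec(values, col_idx, row_ptr, [0.2, 0.3, 0.5])
print(row_ptr, y)
```

Storage is proportional to the number of non-zero entries plus the number of rows, instead of the full n × n array.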
    21. Reference
        ● L. Page, S. Brin, R. Motwani, T. Winograd: The PageRank citation ranking: bringing order to the web.
        ● Dylan Walker, Huafeng Xie, Koon-Kiu Yan, Sergei Maslov: Ranking Scientific Publications Using a Simple Model of Network Traffic
        ● P. Chen, H. Xie, S. Maslov, S. Redner: Finding Scientific Gems with Google
        ● Hajime BABA: Google の秘密 - PageRank 徹底解説 (The Secret of Google: PageRank Explained in Depth)
    22. Acknowledgment
        ● Mr. Kazuhisa Takei for the Ruby interface to libcsparse via ffi
        ● Dr. Mari Jibu for citation data handling
        ● Dr. Wataru Souma for network-science suggestions and comments
        ● Dr. Yoshi Fujiwara for choosing this topic and for the invitation
        ● Free software developers
    23. About me
        ● 2010- Turnstone Research Inst.
        ● 2011- Nihon Univ. researcher
        ● 2009-2010 Finance sector
        ● 2007-2009 Network analysis at NiCT
        ● 2001-2007 Venture firm CEO
        ● 1994-2002 Discrete math graduate student
        ● Ski, climbing, bicycle, art
