The Structure of the Computer Science Knowledge Network


    1. The Structure of the Computer Science Knowledge Network
       Manh Cuong Pham, Ralf Klamma
       Information Systems and Database Technology, RWTH Aachen, Germany
       ASONAM 2010, Odense, Denmark, August 09, 2010
    2. Agenda
         • Introduction
         • SNA as a knowledge discovery method
         • Data sets: DBLP and CiteSeerX
         • Network visualization
         • Venue ranking
         • Conclusions and Outlook
    3. Introduction
         • Digital libraries (in computer science)
             • DBLP, ACM DL, IEEE Xplore, CiteSeerX, etc.
             • Digital media for scientific knowledge conservation
                 • Publications
                 • Venues
             • Development of research communities and research areas
             • Knowledge discovery: citation analysis, usage analysis, etc.
             • Digital libraries in Web 2.0: Mendeley, ResearchGate, etc.
         • Problems
             • Structure of computer science knowledge
             • Existing research fields
             • The interconnection between fields
       Figures: VLDB community in 2006 (DBLP); VLDB community in 1990 (DBLP)
    4. Motivations
         • Scientometrics
             • Unit of analysis: journals
             • Knowledge mapping: building, visualizing and analyzing the knowledge network
             • Methods:
                 • Citation analysis [Boyack 2005]
                 • Content analysis
                 • Log-data (usage data) analysis [Bollen 2009]
             • Data sets:
                 • Journal Citation Reports (JCR)
                 • Science Citation Index (SCI)
                 • Social Sciences Citation Index (SSCI), etc.
         • Problem
             • Computer science conferences
    5. Our Approach
         • Combination of large-scale digital libraries
             • DBLP
             • CiteSeerX
         • Citation analysis
             • Bibliographic coupling at venue level (conferences, journals)
             • Similarity measures
         • SNA as a knowledge discovery method
             • Visual analytics
             • Cluster analysis
             • SNA measures: PageRank, betweenness, hub and authority scores, etc.
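A minimal sketch of bibliographic coupling aggregated at venue level, assuming each paper's venue and reference set are already known. The function name `venue_coupling_counts` and the toy records are hypothetical and are not taken from DBLP or CiteSeerX. The slides do not state the exact aggregation rule, so summing shared references over cross-venue paper pairs is an assumption.

```python
from collections import defaultdict
from itertools import combinations


def venue_coupling_counts(paper_venue, paper_refs):
    """Sum paper-level bibliographic coupling strengths per venue pair.

    paper_venue: dict mapping paper id -> venue name
    paper_refs:  dict mapping paper id -> set of cited paper ids
    Two papers are coupled when they cite the same reference; the
    coupling strength is the number of references they share.
    """
    counts = defaultdict(int)
    for a, b in combinations(sorted(paper_refs), 2):
        va, vb = paper_venue[a], paper_venue[b]
        if va == vb:
            continue  # only cross-venue pairs contribute to an edge
        shared = len(paper_refs[a] & paper_refs[b])
        if shared:
            counts[tuple(sorted((va, vb)))] += shared
    return dict(counts)


# Toy data (hypothetical).
paper_venue = {"p1": "VLDB", "p2": "SIGMOD", "p3": "SIGMOD", "p4": "CHI"}
paper_refs = {
    "p1": {"r1", "r2", "r3"},
    "p2": {"r2", "r3"},
    "p3": {"r3", "r4"},
    "p4": {"r5"},
}
coupling = venue_coupling_counts(paper_venue, paper_refs)
print(coupling)
```

The pairwise loop is quadratic in the number of papers, which is fine for illustration; at the scale of the data sets in the talk, an inverted index from reference to citing papers would likely be needed.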
    6. Data Sets
         • DBLP (http://www.informatik.uni-trier.de/~ley/db/)
             • 788,259 author names
             • 1,226,412 publications
             • 3,490 venues (conferences, workshops, journals)
         • CiteSeerX (http://citeseerx.ist.psu.edu/)
             • 7,385,652 publications (including publications in reference lists)
             • 22,735,240 citations
             • Over 4 million author names
         • Combination
             • Canopy clustering [McCallum 2000]
             • Result: 864,097 matched pairs
             • On average, a venue cites 2,306 times and is cited 2,037 times
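A simplified sketch of canopy clustering for grouping candidate title matches between two libraries. The cheap similarity used here (Jaccard overlap of lower-cased title tokens), the threshold values, and the function names are assumptions for illustration; the slides only name the technique and do not describe the authors' matching pipeline. Thresholds are expressed as similarities, so the loose threshold is the smaller one.

```python
def tokens(title):
    """Cheap representation of a title: its set of lower-cased words."""
    return frozenset(title.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets (two empty sets count as equal)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def canopies(titles, loose=0.3, tight=0.5):
    """Group title indices into possibly overlapping canopies.

    A record joins the canopy of the current center when its similarity is
    at least `loose`; it stops being a future center when its similarity is
    at least `tight`.
    """
    token_sets = [tokens(t) for t in titles]
    remaining = list(range(len(titles)))
    result = []
    while remaining:
        center = remaining[0]
        sims = [jaccard(token_sets[center], ts) for ts in token_sets]
        result.append([i for i, s in enumerate(sims) if s >= loose])
        remaining = [i for i in remaining if sims[i] < tight]
    return result


# Toy titles (hypothetical records standing in for both libraries).
titles = [
    "The Structure of the Computer Science Knowledge Network",
    "Structure of the computer science knowledge network",
    "Efficient clustering of high-dimensional data sets",
    "Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching",
    "Graph drawing by force-directed placement",
]
groups = canopies(titles)
print(groups)
```

Only records that share a canopy would then be compared with a more expensive, more accurate measure, which is what keeps the matching tractable on millions of publications.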
    7. Network Creation and Pre-processing
         • Knowledge network
             • Aggregate bibliographic coupling counts at venue level
             • Undirected graph G(V, E), where V: venues, E: edges weighted by cosine similarity
             • Threshold:
             • Clustering: density-based algorithm [Newman 2004, Clauset 2004]
             • Network visualization: force-directed paradigm [Fruchterman 1991]
         • Knowledge flow network
             • Aggregate bibliographic coupling counts at venue level
             • Threshold: citation counts >= 50
             • Domains from Microsoft Academic Search (http://academic.research.microsoft.com/)
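A minimal sketch of building the undirected graph G(V, E) with cosine-weighted edges. It assumes each venue is represented as a sparse vector mapping a cited reference to the number of the venue's papers citing it. The threshold value 0.1 is a placeholder, since the value used in the talk is not preserved in the transcript; the function names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_edges(venue_vectors, threshold):
    """Return {(venue_a, venue_b): similarity} for pairs at or above threshold."""
    names = sorted(venue_vectors)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sim = cosine(venue_vectors[a], venue_vectors[b])
            if sim >= threshold:
                edges[(a, b)] = sim
    return edges


# Toy vectors (hypothetical): reference id -> number of citing papers.
venue_vectors = {
    "VLDB": {"r1": 1, "r2": 1, "r3": 1},
    "SIGMOD": {"r2": 1, "r3": 2, "r4": 1},
    "CHI": {"r5": 1},
}
edges = build_edges(venue_vectors, threshold=0.1)  # placeholder threshold
print(edges)
```

In this toy input, CHI shares no references with the other venues, so it stays isolated and only the SIGMOD and VLDB pair receives an edge.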
    8. Knowledge Network: the Visualization
    9. Knowledge Network: Clustering
    10. Interdisciplinary Venues: Top Betweenness Centrality
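A sketch of how betweenness centrality can surface venues that bridge research areas, using Brandes' algorithm on an unweighted, undirected graph. Edge weights are ignored here for brevity, which is a simplification. The venue labels are real names, but the edges between them are invented for illustration and do not come from the talk's data.

```python
from collections import deque


def betweenness(adj):
    """Brandes' betweenness centrality for an undirected, unweighted graph.

    adj: dict mapping node -> iterable of neighbours (must be symmetric).
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies, farthest first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected path was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


def to_adjacency(edge_list):
    """Build a symmetric adjacency dict from (a, b) pairs."""
    adj = {}
    for a, b in edge_list:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


# Hypothetical edges: a database cluster and a retrieval cluster joined by one venue.
edge_list = [
    ("VLDB", "SIGMOD"), ("VLDB", "ICDE"), ("SIGMOD", "ICDE"),
    ("VLDB", "CIKM"), ("SIGMOD", "CIKM"),
    ("CIKM", "SIGIR"), ("CIKM", "ECIR"), ("SIGIR", "ECIR"),
]
scores = betweenness(to_adjacency(edge_list))
bridge = max(scores, key=scores.get)
print(bridge, scores[bridge])
```

In this toy graph every shortest path between the two clusters passes through the single connecting venue, so it receives the highest score.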
    11. High Prestige Series: Top PageRank
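A minimal power-iteration sketch of PageRank on a directed citation graph between venues. The damping factor 0.85 and the fixed iteration count are conventional defaults assumed here, not values reported in the talk, and the node labels A, B, C are hypothetical. Edge weights are ignored for brevity.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Unweighted PageRank by power iteration.

    out_links: dict mapping node -> list of nodes it cites.
    Rank held by nodes without outgoing links is spread evenly over all nodes.
    """
    nodes = sorted(set(out_links) | {t for ts in out_links.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = out_links.get(v, [])
            for t in targets:
                new[t] += damping * rank[v] / len(targets)
        rank = new
    return rank


# Toy citation graph (hypothetical): A and B cite C, and C cites A.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
top = max(ranks, key=ranks.get)
print(top, ranks)
```

The venue cited by both others ends up with the highest rank, and the venue it cites inherits most of that prestige, which is the intuition behind ranking series by PageRank.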
    12. Conclusions and Future Research
         • SNA helps to gain insight into computer science knowledge
         • Knowledge network in computer science
             • Highly clustered; large clusters form the core of computer science research
             • Research fields are interconnected
             • Interdisciplinary venues
         • Outlook
             • More digital libraries should be integrated: ACM, IEEE, CEUR-WS.org, etc.
             • Usage analysis
             • Dynamic analysis of the knowledge network
    13. Questions? http://bosch.informatik.rwth-aachen.de:5080/AERCS/
