2010 CRC PhD Student Conference

              Supporting the Exploration of Research Spaces

Chwhynny Overbeeke
c.overbeeke@open.ac.uk

Supervisors: Enrico Motta, Tom Heath, Paul Mulholland
Department: Knowledge Media Institute
Status: Full-time
Probation viva: Before
Starting date: December 2009

1    Introduction
It is often hard to make sense of what exactly is going on in the research community. What topics or researchers are new and emerging, gaining popularity, or disappearing? How does this happen and why? What are the key publications or events in a particular area? How can we understand whether geographical shifts are occurring in a research area?

There are several tools available that allow users to explore different elements of a research area. However, making sense of the dynamics of a research area is still a very challenging task. This leads to my research question: How can we improve the level of support for people to explore the dynamics of a research community?

2    Framework and Background
To answer this question we first need to identify the different elements, relations, and dimensions that define a research area and put them into a framework. We then need to find existing tools that address these elements, and categorize them according to our framework in order to identify gaps in the current level of support.

The elements we have identified so far are: people, institutions and organizations, events, activity, popularity, publications, citations, time, geography, keywords, studentships, funding, impact, and technologies.

The people element is about the researchers who are or were present in the research community, whilst the institutions and organizations element refers to the research groups, institutions, and organizations that are active within an area of research, and the affiliations that people within the community have with them.

Events can be workshops, conferences, seminars, competitions, or any other kind of research-related happening. EventSeer (http://www.eventseer.net) is a service that aggregates the calls for papers and event announcements that float around the web into one common, searchable tool. It keeps track of events, people, topics, and organizations, and lists the most popular people, topics, and organizations per week.

The activity element refers to how active the researchers, institutions, and organizations are within the field, for instance in terms of event attendance or organization, or the number and frequency of publications and events. A tool that can be used to explore this is Faceted DBLP (http://dblp.l3s.de/), a server interface for the DBLP server (http://dblp.uni-trier.de/), which provides bibliographic information on major computer science journals and proceedings [Ley 2002]. Faceted DBLP starts from a keyword and shows the result set along with a set of facets, e.g. distinguishing publication years, authors, venues, and publication types. The user can characterize the result set in terms of main research topics and filter it according to certain subtopics. GrowBag graphs are available for keywords (number of hits/coverage).

Popularity is about the interest that is displayed in a person, institution or organization, publication, topic, technology, or event. WikiCFP (http://www.wikicfp.com/) is a service that helps organize and share academic information. Users can browse and add calls for papers per subject category, and add calls for papers to their own personal user list. Each call for papers has information on the event name, date, location, and deadline. WikiCFP also provides hourly updated lists of the most popular categories, calls for papers, and user lists.

One indicator of topic popularity is the number of publications on a topic. There are many tools that show the number of publications per topic per year. PubSearch is a fully automatic web mining approach for the identification of research trends that searches and downloads scientific publications from web sites that typically include academic web pages [Tho et al. 2003]. It extracts citations, which are stored in the tool's Web Citation Database, which in turn is used to generate temporal document clusters and journal clusters. These clusters are then mined to find their interrelationships, which are used to detect trends and emerging trends for a specified research area.

Another indicator of popularity is how often a publication or researcher is cited. Citations can also help identify relations between researchers through analysis of who is citing whom and when, and what their affiliations are. Publish Or Perish is a piece of software that retrieves and analyzes academic citations [Harzing and Van der Wal 2008]. It uses Google Scholar (http://scholar.google.com/) to obtain raw citations and then analyzes them. It presents a wide range of citation metrics such as the total number of papers and citations, the average number of citations per paper and per author, the average number of papers per author and per year, an analysis of the number of authors per paper, et cetera.

Topics, interests, and people evolve over time, and the makeup of the research community changes when people and organizations enter or leave certain research areas or change their direction.

Some topics appear to be more established or densely represented in certain geographical areas, for instance because a prolific institution is located there and has attracted several experts on a particular topic, or because many events on a topic are held in that area. AuthorMapper (http://www.authormapper.com/) is an online tool for visualizing scientific research. It searches journal articles from SpringerLink (http://www.springerlink.com/) and allows users to explore the database by plotting the location of authors, research topics, and institutions on a world map. It also allows users to identify research trends through timeline graphs, statistics, and regions.

Keywords are an important indicator of a research area because they are the labels that have been put on publications or events by the people and organizations within that research area. Google Scholar is a subset of the Google search index consisting of full-text journal articles, technical reports, preprints, theses, books, and web sites that are deemed 'scholarly' [Noruzi 2005, Harzing and Van der Wal 2008]. Google Scholar has crawling and indexing agreements with several publishers. The system is based on keyword search only, and its results are organized by a closely guarded relevance algorithm. The 'cited-by-x' feature allows users to see by whom a publication was cited, and where.

The availability of new studentships indicates that a research area is trying to attract new people. This may mean that the area is hoping to expand, change direction, or become more established. The availability of funding within a research area or topic is an indicator of the interest that is displayed in it, or the level of importance it is deemed to have at a particular time. The Postgraduate Studentships web site (http://www.postgraduatestudentships.co.uk/) offers a search engine as well as a browsable list of study or funding opportunities organized by subjects, masters, PhD/doctoral, and professional doctorates, and a browsable list of general funders, funding universities, and featured departments. The site also lists open days and fairs.

The level of impact of the research carried out by a research group, institution, organization, or individual researcher leads to their establishment in the research community, which in turn could lead to more citations and event attendance. The technologies element refers to the technologies that are developed within an area of research, and their impact, popularity, and establishment. Research impact is supported on a small scale by Scopus (http://www.scopus.com/), currently a preview-only tool that, amongst other things, identifies and matches an organization with all its research output, tracks how primary research is practically applied in patents, and tracks the influence of peer-reviewed research on web literature. It covers nearly 18,000 titles from over 5,000 publishers, 40,000,000 records, scientific web pages, and articles-in-press.

A tool that ranks publications is DBPubs, a system for analyzing and exploring the content of database publications by combining keyword search with OLAP-style aggregations, navigation, and reporting [Baid et al. 2008]. It performs keyword search over the content of publications. The metadata (title, author, venue, year, et cetera) provide static OLAP dimensions, which are combined with dynamic dimensions discovered from the content of the publications in the search result, such as frequent phrases, relevant phrases, and topics. Publication ranks are computed from the link structure between documents (i.e. citations) and are then aggregated to find seminal papers, discover trends, and rank authors.

Finally, we would like to discuss a more generic tool, DBLife (http://dblife.cs.wisc.edu/) [DeRose et al. 2007, Goldberg and Andrzejewski 2007, Doan et al. 2006], which is a prototype of a dynamic portal of current information for the database research community. It automatically discovers and revisits web pages and resources for the community, extracts information from them, and integrates it to present a unified view of people, organizations, papers, talks, et cetera. For example, it provides a chronological summary, has a browsable list of organizations and conferences, and summarizes interesting new facts for the day such as new publications, events, or projects. It also provides community statistics including top cited people, top h-indexed people, and top cited publications. DBLife is currently unfinished and does not have full functionality, but from the prototype alone one can conclude that it will most likely address quite a few elements from our framework.

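Several of the tools above report citation metrics, such as the totals and averages presented by Publish Or Perish and the h-index behind DBLife's "top h-indexed people". As a minimal sketch of what these numbers mean, the Python below computes them from a list of per-paper citation counts using the standard definitions. The function names and the sample counts are hypothetical and purely illustrative; this is not a description of how either tool is implemented.

```python
from statistics import mean


def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def citation_summary(citations):
    """Basic metrics for one author, given per-paper citation counts."""
    return {
        "papers": len(citations),
        "citations": sum(citations),
        "cites_per_paper": mean(citations) if citations else 0.0,
        "h_index": h_index(citations),
    }


# Hypothetical citation counts for one author's papers (not real data).
sample = [25, 8, 5, 3, 3, 1, 0]
summary = citation_summary(sample)
print(summary)
```

For the sample counts the h-index is 3: three papers have at least three citations each, but there are not four papers with at least four citations.
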
3    Methodology
To find out what key problems people encounter when trying to make sense of the dynamics of a research area, we will carry out an empirical study consisting of a task and a short questionnaire. The 30 to 40 minute task is to be carried out by around 10 to 12 subjects, who will be asked to investigate a research area that is fairly new to them and write a short report on their findings. The subjects' actions will be recorded using screen capture software, and the subjects themselves will be videoed for the duration of the task, so that the entire exploration process is documented. The screen capture will show the actions the subjects take and the tools they use to reach their goal. The video data will show any reactions the subjects may display during their exploration process, for example confusion or frustration with a tool they are trying to use.

The questionnaire will be filled out by as many subjects as possible, who will be asked to identify the key elements of a research area that they would take into account when planning PhD research. In the questionnaire people will be made aware of the framework we created, but we will allow for open answers and additions to the existing framework.

The technical study will consist of an overview, comparison, critical review, and gap analysis of existing tools that support the exploration of the research community. It will link those tools to our framework in order to find out to what extent the individual elements are covered by the existing tools.

At this stage we will have highlighted the key elements that define a research area, identified gaps in the existing support for the exploration of the research community, and gathered evidence to support this by mapping existing tools to our framework, carrying out a practical task, and sending out a questionnaire.

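The mapping of tools to framework elements described above could be recorded as a simple coverage table. The following Python is a minimal sketch under stated assumptions: the element names are those of the framework in Section 2, but the tool-to-element assignments are rough readings of the tool descriptions in this paper rather than results of the planned study, only a subset of the tools is listed, and the function name is hypothetical.

```python
FRAMEWORK = [
    "people", "institutions and organizations", "events", "activity",
    "popularity", "publications", "citations", "time", "geography",
    "keywords", "studentships", "funding", "impact", "technologies",
]

# Illustrative assignments only (an assumption, not study results).
TOOL_COVERAGE = {
    "EventSeer": {"events", "people", "institutions and organizations",
                  "popularity"},
    "Faceted DBLP": {"activity", "publications", "keywords"},
    "WikiCFP": {"events", "popularity"},
    "Publish Or Perish": {"citations", "publications"},
    "AuthorMapper": {"geography", "time", "institutions and organizations"},
    "Postgraduate Studentships": {"studentships", "funding"},
}


def coverage_gaps(framework, tool_coverage):
    """Return (element -> sorted list of tools, list of uncovered elements)."""
    covered_by = {element: [] for element in framework}
    for tool, elements in tool_coverage.items():
        for element in elements:
            if element not in covered_by:
                raise ValueError(f"{tool!r} maps to unknown element {element!r}")
            covered_by[element].append(tool)
    for tools in covered_by.values():
        tools.sort()
    gaps = [element for element in framework if not covered_by[element]]
    return covered_by, gaps


covered_by, gaps = coverage_gaps(FRAMEWORK, TOOL_COVERAGE)
for element in FRAMEWORK:
    print(f"{element:32s} {', '.join(covered_by[element]) or '-'}")
print("Uncovered in this sample:", gaps)
```

Because the sample deliberately omits some of the tools discussed in Section 2, any gaps it reports are an artifact of the sample data; the point is only the shape of the analysis, in which an empty row marks an element that no listed tool addresses.
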
We will then aim to improve support for people to explore the dynamics of the research community by implementing novel tools, addressing the gaps that have emerged from these studies. Our hypothesis is that at least some of these gaps are due to the lack of integration between different types of data covering different elements of a research area.

References
Baid, A., Balmin, A., Hwang, H., Nijkamp, E., Rao, J., Reinwald, B., Simitsis, A., Sismanis, Y., and Van Ham, F. (2008). DBPubs: Multidimensional Exploration of Database Publications. Proceedings of the VLDB Endowment, 1(2):1456–1459.

DeRose, P., Shen, W., Chen, F., Lee, Y., Burdick, D., Doan, A., and Ramakrishnan, R. (2007). DBLife: A Community Information Management Platform for the Database Research Community. In Weikum, G., Hellerstein, J., and Stonebraker, M., editors, Proceedings of the 3rd Biennial Conference on Innovative Data Systems Research (CIDR 2007), Asilomar, California, USA.

Diederich, J. and Balke, W. (2008). FacetedDBLP - Navigational Access for Digital Libraries. Bulletin of the IEEE Technical Committee on Digital Libraries (TCDL), 4(1).

Diederich, J., Balke, W., and Thaden, U. (2007). Demonstrating the Semantic GrowBag: Automatically Creating Topic Facets for FacetedDBLP. In Proceedings of the ACM IEEE Joint Conference on Digital Libraries (JCDL 2007), Vancouver, British Columbia, Canada.

Doan, A., Ramakrishnan, R., Chen, F., DeRose, P., Lee, Y., McCann, R., Sayyadian, M., and Shen, W. (2006). Community Information Management. IEEE Data Engineering Bulletin, Special Issue on Probabilistic Databases, 29.

Goldberg, A. and Andrzejewski, D. (2007). Automatic Research Summaries in DBLife. CS 764: Topics in Database Management Systems.

Harzing, A. and Van der Wal, R. (2008). Google Scholar as a New Source for Citation Analysis. Ethics in Science and Environmental Politics, 8:61–73.

Ley, M. (2002). The DBLP Computer Science Bibliography: Evolution, Research Issues, Perspectives. In Proceedings of the 9th International Symposium (SPIRE 2002), pages 481–486, Lisbon, Portugal.

Noruzi, A. (2005). Google Scholar: The New Generation of Citation Indexes. Libri, 55:170–180.

Tho, Q., Hui, S., and Fong, A. (2003). Web Mining for Identifying Research Trends. In Sembok, T., Badioze Zaman, H., Chen, H., Urs, S., and Myaeng, S., editors, Proceedings of the 6th International Conference on Asian Digital Libraries (ICADL 2003), pages 290–301, Kuala Lumpur, Malaysia. Springer.