Data for Research (DfR) service


Published in: Education, Technology

  1. JSTOR Advanced Technology Research. Denver, 25th January 2008. John Burns, Clare Llewellyn
  2. Today we will introduce a public beta of our Data for Research service and show you some of the other services that JSTOR’s advanced technology group is working on. Mission: Working with other researchers on large-scale text and data mining initiatives with an eye toward beneficial applications for scholars and students.
  3. What is Data Mining? “Data mining is the process of extracting hidden patterns from data” (Lyman and Varian 2003). “As data sets and the information extracted from them have grown in size and complexity, direct hands-on data analysis has increasingly been supplemented and augmented with indirect, automatic data processing using more complex and sophisticated tools, methods and models” (Kantardzic 2002). Example: using consumer purchasing patterns to predict which products are bought together (gas and flights).
  4. What is Text Mining? “In text mining the patterns are extracted from natural language text rather than from structured databases of facts” (Marti Hearst 2003). “Text mining attempts to discover new, previously unknown information by applying techniques from information retrieval, natural language processing and data mining” (National Text Mining Center, UK). Example: looking at which words co-occur in articles in order to predict interactions (magnesium and migraines).
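The co-occurrence example on the slide above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by JSTOR or by any cited author: the function name `cooccurrence_counts`, the toy abstracts, and the three-term vocabulary are all assumptions made up for the example.

```python
from collections import Counter
from itertools import combinations
import re


def cooccurrence_counts(documents, vocabulary):
    """Count how many documents mention each pair of vocabulary terms together."""
    vocab = {term.lower() for term in vocabulary}
    pair_counts = Counter()
    for text in documents:
        # Lowercase and keep alphabetic tokens only (a deliberately crude tokenizer).
        words = set(re.findall(r"[a-z]+", text.lower()))
        present = sorted(words & vocab)
        # Each unordered pair of terms found in the same document counts once.
        for pair in combinations(present, 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical toy abstracts, invented for illustration only.
abstracts = [
    "Magnesium deficiency was observed in patients reporting migraines.",
    "Dietary magnesium and the frequency of migraines in adults.",
    "Calcium intake and bone density in older adults.",
]
counts = cooccurrence_counts(abstracts, ["magnesium", "migraines", "calcium"])
print(counts.most_common())
```

Pairs that turn up together far more often than chance would suggest are candidates for a closer look; a real study would also need stemming, a larger vocabulary, and a significance test.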
  5. Advanced Technology at JSTOR •  Why we are here •  Who we are •  What we are doing
  6. Why are we releasing our system here? Librarians are the point from which innovation is spread throughout the academy. “New roles and functions for librarians include: •  information consultants and producers •  information gatekeepers and intermediators •  end-user educators •  managers and leaders •  data analysts in data administration centers •  preservers of knowledge •  information equalizers” (Park 1987). A Data Support Role: “Helping students get their hands dirty with the data” (Robin Rice 2008, 2nd DCC / RIN Research Data Management Forum)
  7. Who we are - Advanced Technology Research •  A formal commitment by JSTOR to a pro-active role in technology innovation to face new challenges and opportunities •  Our MO is to collaborate with and aid the scholarly community •  We are a team of world-class scientists and technologists with a proven track record of innovation. Mission Statement: “The Advanced Technology Research Group is dedicated to creating, discovering and using relevant technologies in support of JSTOR and the broader scholarly community.”
  8. ATR - Collaborations with the academic community. For other researchers we provide: •  Access to large well-curated data sets •  An exposure channel on JSTOR for research results •  Facilities on JSTOR to expose tools and techniques to users •  Collaboration opportunities. For JSTOR, we: •  Evaluate novel techniques •  Present rapid prototypes to users •  Develop peer relationships with research institutions •  Bring new forms of traffic to the JSTOR data •  Reuse JSTOR data in new and exciting ways
  9. What we are doing - Projects and Partners •  University of Washington – Citation Network Analysis •  University of Princeton – Topic Analysis •  UIUC – Software Environment for the Advancement of Scholarly Research (SEASR) •  University of Michigan – Linguistic tools •  Tufts – Classics Studies •  University of Liverpool – OAI-ORE, Text Mining, Data Analysis •  University of Queensland – Annotations •  Los Alamos National Labs – Annotation Management •  DFKI (German Artificial Intelligence Centre) – Document capture and reconstruction / remastering •  XRCE (EuroPARC, France) – Scanned Document Analysis •  …
  10. Advanced Technology Research - Showcase. Showcase provides a preview of interesting and useful technologies. It allows our research partners to demonstrate their tools and gain feedback, and it allows JSTOR to assess candidate technologies before committing them to the product roadmap.
  11. Advanced Technology Research - Showcase. A place to expose JSTOR data and tools and to encourage new research •  Access to JSTOR datasets •  A facility to expose and use tools created by researchers from JSTOR and elsewhere •  Explanation of ongoing research •  A forum to facilitate connections between groups working with JSTOR data URL:
  12. Data for Research •  DfR is a set of web tools designed to allow for the visual exploration of large-scale data sets and the download of word frequencies in JSTOR articles •  Beta version launched 01/23/09 •  URL:
  13. Why Word Frequencies? Chart: data requested from JSTOR users in 2008, by type (OCR Data, Citation Data, Usage Data, Word Frequency)
  14. What can you do with word counts? Real life requests: “I would like to request time and word distribution frequencies in linguistics (specific movement removed). These sorts of frequencies could potentially allow me to better understand and delimit the formation of groups, and the underlying impetus behind these groups as expressed in linguistic form.” “I would like to create subject headings for material, using word frequency as a guide to selecting the appropriate terms for the headings.”
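The second request on the slide above, using word frequency as a guide to heading terms, can be sketched as follows. This is an illustrative assumption about how such a workflow might look, not a description of the DfR tools or their download format: the stop-word list, the function names `word_frequencies` and `candidate_terms`, and the sample sentence are all invented for the example.

```python
from collections import Counter
import re

# A tiny, hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is", "for", "on", "with"}


def word_frequencies(text):
    """Return a Counter of lowercase words, ignoring stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def candidate_terms(text, n=3):
    """Return the n most frequent non-stop words as candidate heading terms."""
    return [word for word, _ in word_frequencies(text).most_common(n)]


# Invented sample text, for illustration only.
sample = (
    "The grammar of the dialect and the grammar of the standard language "
    "differ in syntax; the dialect grammar is older."
)
terms = candidate_terms(sample, n=2)
print(terms)
```

The most frequent terms are only a starting point: a cataloguer would still choose the final headings, as the request itself says ("as a guide").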
  15. DfR – DEMO!
  16. DfR – Front Page
  17. Thefe
  18. Hath – pre-1900
  19. Hath – post-1900
  20. Chymistry
  21. Download Page
  22. Files Downloaded
  23. Chart to show the use of the word Chymistry (x-axis: years from 1666 to 2005)
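A chart like the one on the slide above could be derived from per-article word counts roughly as follows. This is a sketch under stated assumptions: the record layout (a year paired with a word-count dictionary per article), the function name `yearly_rate`, and all the numbers are hypothetical and are not taken from the slides or from the actual DfR files.

```python
from collections import defaultdict


def yearly_rate(records, word):
    """Share of all counted words per year that are `word`.

    `records` is an iterable of (year, {word: count}) pairs, one per article.
    """
    hits = defaultdict(int)    # occurrences of `word` per year
    totals = defaultdict(int)  # all counted words per year
    for year, counts in records:
        hits[year] += counts.get(word, 0)
        totals[year] += sum(counts.values())
    # Skip years with no counted words to avoid dividing by zero.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Hypothetical per-article counts, invented for illustration only.
records = [
    (1666, {"chymistry": 2, "salt": 6}),
    (1666, {"chymistry": 1, "air": 1}),
    (1801, {"chemistry": 5, "chymistry": 0, "salt": 5}),
]
rates = yearly_rate(records, "chymistry")
print(rates)
```

Dividing by the yearly total keeps years with many articles from dominating the chart; plotting raw counts instead would be an equally valid but different view.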
  24.
  25. 3 Journals from 1957: The Annals of Mathematics, American Journal of Nursing, Agricultural History
  26. Any questions / feedback? Please take a look at the site and tell us what you think. Email: Contact details Email: Phone: 609-986-2282