Trends Influencing Future Scholarship

SSP 2012 presentation on trends that we need to be mindful of when thinking about scholarship, publishing and use.

Published in: Education, Technology

Speaker notes:
  • Mellon funded
  • Harley, Diane, et al. 2010. “Assessing the Future Landscape of Scholarly Communication.” CSHE paper.
  • Comments: Nature 464, 466 (25 March 2010) http://www.nature.com/nature/journal/v464/n7288/full/464466a.html
  • A sample of what a very simple text analysis API can look like: plot the occurrence of ‘Malvinas’ over time. The subtlety is the linear decrease in interest after the post-war spike.
  • Trying to make it very clear that datasets are a different and more central element of all scholarly research (with the possible exception of maths, philosophy and religion). Data both inspires and confirms ideas; text is mostly informative, rarely inspirational. Highlights are the discussion points. Note ii: In the physical sciences the half-life of content access frequency is ~6-8 years. Grey text is a PQ (ProQuest) business value, not a user value.
  • Whilst content can be obfuscated or reduced, there are thorny issues with usage data. Early policy decisions need to be taken with respect to exposing usage data, even indirectly (triangulation is always possible). Note 1: Oratio has shifted from ‘speech’ to ‘prayer’ and back again in the Latin literature. See Greg Crane et al.
  • Note that the number of articles is small anyway, so the data could simply be random variation. This is way too simple a tool for serious analysis.
  • Figures on faculty demographics from http://nces.ed.gov/programs/digest/d09 (sources in earlier paper on datasets).
Transcript of “Trends Influencing Future Scholarship”

    1. Trends Influencing Future Scholarship
       Tim Babbitt, May 31, 2012
    2. CHANGES FROM TECHNOLOGY
    3. Growth of Mobile data usage
    4. Technology is changing how we structure work (then -> now)
       • Notepads, notebooks -> Evernote
       • Books -> eBooks
       • CDs, DVDs, iTunes -> Spotify, YouTube (streamed)
       • Focus groups -> Affectiva
       • Classrooms/lectures -> Khan Academy
    5. CHANGES IN ACADEMIA
    6. Towards reproducible research
       Reproducible research means:
       • context, quality, trust
       • easy access to the sources
    7. Changing Policy
       • Emerging trend of journals and publishers linking to open-access data repositories
       • Journals and funding agencies setting policy to preserve and associate data supporting research results
       • Open Access
    8. Changing Global Research Patterns
       The center of gravity of the world system of scholarship is moving from west to east.
       NOTES: Asia-10 includes China, Japan, India, Indonesia, Malaysia, Philippines, Singapore, South Korea, Taiwan, and Thailand.
       SOURCE: National Science Board, Science and Engineering Indicators 2010
    9. Citations of U.S. research articles in non-U.S. literature, by region/country: 1998–2010
       Asia-8 = India, Indonesia, Malaysia, Philippines, Singapore, South Korea, Taiwan, Thailand; EU = European Union
    10. Share of citations to international literature: 2000–10
       Asia-8 = India, Indonesia, Malaysia, Philippines, Singapore, South Korea, Taiwan, Thailand; EU = European Union
    11. Citations from Asia-10 Articles
       NOTES: Asia-10 includes China, Japan, India, Indonesia, Malaysia, Philippines, Singapore, South Korea, Taiwan, and Thailand. Asia-8 excludes China and Japan.
       SOURCE: National Science Board, Science and Engineering Indicators 2010
    12. US Academic Expenditures on Research by Area, FY2008 (millions of current dollars)
       SOURCE: National Science Foundation/Division of Science Resources Statistics, Survey of Research and Development Expenditures at Universities and Colleges: FY 2008.
    13. Master’s degrees conferred (share by field)
       Education 29%; Business 25%; Other 12%; Health professions 9%; Psychology 5%; Public Administration 5%; Engineering 5%; Computer science 3%; Social Sciences 3%; Biological sciences 2%; Visual/Performing Arts 2%; Modern Languages 1%
    14. Doctorate Placement
       [Chart: placement sector (Academia, Business, Government, Other) by field: Life sciences, Physical sciences, Social sciences, Engineering, Education, Humanities, Other fields; vertical axis 0 to 4,000]
       SOURCE: Survey of Earned Doctorates.
    15. Average Age of References
    16. Citation Format by Discipline: Secondary Source Citations
       [Chart: share of citations to Journals, Books/Monographs, Reports, Conference Proceedings, and Other, for Education, Chemistry, Civil Engineering, Economics, History, and Educational Psychology]
    17. Academic work is social
       • 2006 Univ. of Minn. study: 68% of faculty work collaboratively; 52% collaborate with colleagues at other institutions; 46% find distance from colleagues an obstacle to collaboration
       • “One group of experts can’t do everything”
       SOURCE: Newman M E J PNAS 2001;98:404-409
    18. Sharing
       • Low threshold (at ‘good enough’): Astrophysicists (arXiv); Political scientists (SSRN); Economists (SSRN, NBER)
       • High threshold (competitive): Historians; Molecular and cell biologists; Archaeologists; Biochemists; Performers/Composers
       Junior faculty in all fields are especially cautious for fear of theft and/or misinterpretation.
    19. CHANGES AT THE LIBRARY
    20. Changing Environment of Research
       [Diagram, Past / Present / Future: Book Aggregation (books on shelves) / Book & Journal Aggregation (databases) / Electronic Information Aggregation; each stage shows the LIBRARY alongside “Other sources”]
       The “center” of research is shifting from libraries to other sources.
    21. Resource Usage Trends (items, 2002–2010)
       • Circulation: 188,601,008 (2002); 200,203,943 (2004); 187,236,440 (2006); 138,102,762 (2008); 136,003,396 (2010)
       • Interlibrary loans: 7,843,649; 8,545,417; 10,265,385; 10,695,342; 10,157,182
       SOURCES: National Center for Education Statistics. Academic Libraries. http://nces.ed.gov/pubsearch/getpubcats.asp?sid=041
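The circulation figures on slide 21 can be turned into rates of change with a few lines of Python. This is a minimal sketch that uses only the circulation values as printed on the slide; the function names are illustrative, and the interlibrary loan series is left out for brevity.

```python
from __future__ import annotations

# Academic library circulation (items) as printed on slide 21 (NCES data).
CIRCULATION = {
    2002: 188_601_008,
    2004: 200_203_943,
    2006: 187_236_440,
    2008: 138_102_762,
    2010: 136_003_396,
}


def percent_change(start: float, end: float) -> float:
    """Relative change from start to end, expressed in percent."""
    return (end - start) / start * 100


def largest_decline(series: dict[int, int]) -> tuple[int, int, float]:
    """Return (from_year, to_year, percent) for the steepest drop between consecutive years."""
    years = sorted(series)
    steps = [
        (a, b, percent_change(series[a], series[b]))
        for a, b in zip(years, years[1:])
    ]
    # The most negative percentage is the steepest decline.
    return min(steps, key=lambda step: step[2])


if __name__ == "__main__":
    overall = percent_change(CIRCULATION[2002], CIRCULATION[2010])
    print(f"2002-2010: {overall:.1f}%")  # 2002-2010: -27.9%
    print(largest_decline(CIRCULATION))
```

On these figures circulation fell by roughly 28% over the period, with the steepest step between 2006 and 2008 (about -26%).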
    22. Changing Resource Expenditures
       [Charts, 2002–2010: spending on “Books, serial backfiles and other materials” and on “Current Serial Subscriptions” by format (Electronic, Print, Audiovisual); overall % by content category; overall % of purchased and licensed content, Electronic vs. Print]
       SOURCE: NCES 2010, 2008, 2006, 2004, 2002 Supplemental tables for Academic Libraries
    23. Changing Discovery Methods
       Data from Evans (2008)
    24. Context
       • Activities are the context in which our content is used in research
       • Research mash-ups: “whole is greater than the parts”
       • Critical for the ecosystem of research
    25. Research Network Connections
       Era of connections:
       • Social networks
       • Professional networks
       • Connected information
       • Connected concepts
       • Connected meaning
    26. The Interconnected Article
       Blogs; Related articles; Notebooks; Wikis; Comments & reviews; Models; Codes; Presentations; Algorithms; Preprints; Podcasts; Methods; Video; Plans; Data; Intermediate results; Ontologies
       Has a content edge over print: more of it and more timely.
    27. Network of Research Areas
    28. Network of Ideas (citations)
    29. Network of datasets
    30. Evolution of search
       • Catalog/index -> Database/Search Engine -> XPath/XQuery (“We are here”)
       • Search technology: subject indexing of objects; full text search; A&I search; semantic and machine search; structural and network search
       • Information structure: hand-crafted metadata; content text; content XML
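Slide 30 places current search at the XPath/XQuery stage, where a query addresses the structure of XML content rather than only its words. Python's standard `xml.etree.ElementTree` supports a limited subset of XPath, enough for a small sketch of a structural query. The article markup and element names (`section`, `title`, `p`) below are hypothetical, not any real publisher schema.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical article markup; element names are illustrative only.
ARTICLE = """<article>
  <title>A hypothetical article</title>
  <section><title>Introduction</title><p>Background text.</p></section>
  <section><title>Methods</title><p>We counted terms.</p><p>We compared years.</p></section>
</article>"""


def paragraphs_in_section(xml_text: str, section_title: str) -> list[str]:
    """Return the text of every <p> inside sections whose <title> equals section_title.

    ElementTree implements only a subset of XPath; the [child='text'] predicate
    used here is part of that subset. A title containing a single quote would
    break this simple path string.
    """
    root = ET.fromstring(xml_text)
    path = f".//section[title='{section_title}']/p"
    return [p.text or "" for p in root.findall(path)]


if __name__ == "__main__":
    print(paragraphs_in_section(ARTICLE, "Methods"))
```

A plain full-text search for "terms" would match anywhere in the article; the structural query restricts matches to the Methods section.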
    31. Search features
       • Going from metadata about objects and text search to ideas, context and mining: semantic search
       • Greater granularity of discovery: structural analysis of content; precision search; semantic search
       • Internationalization of search: translated search
    32. Semantic Search
       • Semantic search uses robust data structures such as ontologies to apply domain knowledge to otherwise two-dimensional terms.
       • Applying word context gives semantic search a dynamic aspect, letting the user’s real-time intent guide results.
       • Contrast with static thesauri and controlled vocabularies, which miss nuances of context and intent.
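The contrast on slide 32 between context-sensitive semantic search and a static thesaurus can be illustrated with a toy example. This sketch assumes a hand-built, hypothetical ontology (`ONTOLOGY`) in which each concept lists the surface terms that can name it plus a few context cue words; it is not how any particular search product works. The word "consumption", which slide 38 cites as a term whose sense has changed (it once meant TB), resolves to a different concept depending on the surrounding words, whereas a static thesaurus would expand it the same way every time.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical mini-ontology: labels name the concept, cues signal that sense.
ONTOLOGY = {
    "tuberculosis": {
        "labels": {"tuberculosis", "tb", "consumption", "white plague"},
        "cues": {"lung", "cough", "bacillus", "sanatorium"},
    },
    "consumption (economics)": {
        "labels": {"consumption"},
        "cues": {"household", "spending", "goods", "income"},
    },
}


def interpret(term: str, context: list[str]) -> Optional[str]:
    """Pick the concept a term most likely names, given the surrounding words."""
    term = term.lower()
    ctx = {word.lower() for word in context}
    candidates = [
        (concept, len(info["cues"] & ctx))
        for concept, info in ONTOLOGY.items()
        if term in info["labels"]
    ]
    if not candidates:
        return None
    # max() keeps the first candidate on ties, i.e. ontology order.
    return max(candidates, key=lambda pair: pair[1])[0]


def expand_query(term: str, context: list[str]) -> set[str]:
    """Expand a query term to every label of the concept chosen for this context."""
    concept = interpret(term, context)
    return set() if concept is None else set(ONTOLOGY[concept]["labels"])


if __name__ == "__main__":
    print(interpret("consumption", ["chronic", "cough", "sanatorium"]))
    print(interpret("consumption", ["household", "spending"]))
```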
    33. Automated processing of library content
       • PubMed contains about 17.8 million (17,787,763) articles to date
       • Manually searching is tedious and frustrating
       • It can be hard to find links between data and articles
       • Conclusion? Machines will be reading the library.
       • Using myExperiment workflows, researcher Paul Fisher found a link between cholesterol, patient trauma and parasite resistance in cattle.
    34. Arrowsmith LBD (literature-based discovery): the ABC Model
       • A: Raynaud’s syndrome; B: blood viscosity, etc.; C: dietary fish oil
       • Articles about an AB relationship and articles about a BC relationship are complementary but disjoint: together they can reveal an implicit relationship between A and C in the absence of any explicit one.
       • The researcher assesses titles in the B literature identified by the system for fit or contribution to the problem.
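The ABC model on slide 34 can be sketched in a few lines, assuming each article has already been reduced to a set of concept terms (for instance, extracted from its title). Terms that co-occur with A are the B terms; terms that appear with some B term in articles that never mention A are candidate C terms, ranked by how many distinct B terms link them. This is an illustration of the model, not Arrowsmith's actual implementation, and the function name is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def abc_candidates(docs: list[set[str]], a: str) -> list[tuple[str, set[str]]]:
    """Rank candidate C terms linked to A only through intermediate B terms.

    docs: each article reduced to a set of concept terms.
    Returns (c_term, linking_b_terms) pairs, most-linked first.
    """
    # B terms: everything that co-occurs with A (the "AB literature").
    b_terms = set().union(*(d for d in docs if a in d)) - {a}
    links: dict[str, set[str]] = defaultdict(set)
    for doc in docs:
        if a in doc:
            continue  # keep only articles that never mention A (the BC literature)
        for b in doc & b_terms:
            # Every term co-occurring with A is in b_terms, so these never co-occur with A.
            for c in doc - b_terms - {a}:
                links[c].add(b)
    return sorted(links.items(), key=lambda item: (-len(item[1]), item[0]))


if __name__ == "__main__":
    # Hypothetical toy corpus, loosely modeled on the slide's example.
    toy_docs = [
        {"raynaud", "blood viscosity"},
        {"raynaud", "platelet aggregation"},
        {"fish oil", "blood viscosity"},
        {"fish oil", "platelet aggregation"},
        {"aspirin", "platelet aggregation"},
    ]
    for c_term, via in abc_candidates(toy_docs, "raynaud"):
        print(c_term, sorted(via))
```

As on the slide, the output is a ranked list for a researcher to assess, not a conclusion.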
    35. Content Presented as Data
       [Chart: incidence of “Malvinas” over time]
       Something happened!
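Slide 35 plots the incidence of a single term over time. A minimal sketch of the counting step follows, assuming documents are available as hypothetical `(year, text)` pairs. It divides by the number of documents per year so that a year with more articles does not look like a spike by itself; as the speaker notes warn, with few articles the result may still be random variation.

```python
from __future__ import annotations

import re
from collections import Counter


def incidence_by_year(docs: list[tuple[int, str]], term: str) -> dict[int, float]:
    """Share of each year's documents that mention `term` (whole word, any case)."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    totals: Counter = Counter()
    hits: Counter = Counter()
    for year, text in docs:
        totals[year] += 1
        if pattern.search(text):
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


if __name__ == "__main__":
    # Hypothetical documents; real input would come from a full-text corpus.
    sample = [
        (1980, "Trade report"),
        (1982, "The Malvinas conflict"),
        (1982, "Falklands news"),
        (1984, "malvinas anniversary"),
    ]
    print(incidence_by_year(sample, "Malvinas"))
```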
    36. Purpose of Content Analytics
       • Consumptive use: informs; delivers ideas.
       • Analytical use: inspires or proves ideas. The true center of research.
       They occupy different points in the scholarly information lifecycle.
    37. What does Content Analytics do?
       • Collaborative man-machine exploration: highlights trends, clues or anomalies [visualization leverages cognitive skills].
       • On-demand analysis: identifies and quantifies trends, relationships, concepts and correlations (tools: SEASR, NLTK, Autonomy, … ).
       • Continuous analytics: generates new ‘facets’ or annotations for discovery [augments content].
       • Preserves value: older content is read less (note ii) but remains important for trend analysis and statistical significance [value shifts].
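"Quantify trends" can start as simply as an ordinary least-squares slope over a yearly series, such as the post-war decline in interest mentioned in the speaker notes. The sketch below is a plain-Python illustration with a hypothetical function name; a slope alone says nothing about significance, which matters when the underlying article counts are small.

```python
def linear_trend(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept; returns (slope, intercept)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Hypothetical yearly incidence values after a spike.
    slope, intercept = linear_trend([1983, 1984, 1985], [0.5, 0.4, 0.3])
    print(round(slope, 3))  # -0.1
```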
    38. Examples of text as data
       • Changes in word sense (e.g. consumption (TB), moot, oratio (note 1)) and spelling (e.g. 18th-century ſ to s, *re -> *er)
       • Bibliometrics and other usage analyses
       • Citation patterns
       • Institution vs. discipline
       • Author demographics
       • Pharma: drug/symptom correlation
       • Biology: species/date/location observations
       • Social sciences: work/life habits of undergraduates based on access patterns at different institutions [usage-data based]
       • …
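Before counting terms across centuries, spelling variants such as the long s (ſ) and *re/*er endings from slide 38 are usually normalized so that old and new forms are counted together. This is a minimal sketch; the `VARIANTS` table is a hypothetical example, and an explicit lookup is used because a blanket *re -> *er rule would wrongly change words like "more".

```python
import re

# Hypothetical variant table: a real project would use a curated list.
VARIANTS = {"centre": "center", "theatre": "theater", "sceptre": "scepter"}


def normalize_spelling(text: str) -> str:
    """Map long s to s and swap listed -re spellings for their -er forms."""
    text = text.replace("\u017f", "s")  # U+017F LATIN SMALL LETTER LONG S

    def swap(match):
        word = match.group(0)
        replacement = VARIANTS.get(word.lower())
        if replacement is None:
            return word
        # Keep an initial capital so sentence-initial words stay readable.
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", swap, text)


if __name__ == "__main__":
    print(normalize_spelling("The ſame Centre of the theatre"))
    # The same Center of the theater
```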
    39. Text Mining: unstructured text to queryable data structures
       Why?
       • Too much text to analyze by hand.
       • Improved discovery (better ‘metadata’)
       • Business intelligence, e.g. content stats -> content acquisitions
       • Saleable datasets, e.g. distribution of authors vs. disciplines vs. grants
       • End-user research agendas: high-end (custom, user-specified mining as a service) or simple (visualization of results: frequency / co-occurrence …)
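The "simple" option on slide 39, frequency and co-occurrence, turns free text into counts that can be queried or plotted. A minimal sketch follows, assuming a hypothetical list of single-word vocabulary terms; it counts document frequency, so a document contributes at most once per term or term pair.

```python
from __future__ import annotations

import re
from collections import Counter
from itertools import combinations


def term_stats(docs: list[str], vocabulary: list[str]) -> tuple[Counter, Counter]:
    """Document frequency of each vocabulary term and of each co-occurring pair.

    Only single-word terms are handled; pairs are stored in alphabetical order.
    """
    vocab = {term.lower() for term in vocabulary}
    freq: Counter = Counter()
    cooc: Counter = Counter()
    for text in docs:
        present = set(re.findall(r"[a-z]+", text.lower())) & vocab
        freq.update(present)
        cooc.update(combinations(sorted(present), 2))
    return freq, cooc


if __name__ == "__main__":
    freq, cooc = term_stats(
        ["TB and antibiotics", "Heroin, TB", "Antibiotics resistance"],
        ["TB", "antibiotics", "heroin"],
    )
    print(freq.most_common())
    print(cooc[("antibiotics", "tb")])  # 1
```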
    40. Options for use of text mining
       Many, many options; it is about capability.
       • Generic ‘improvements’: disambiguation of people, places and events; concept labeling (different terms, same ‘thing’)
       • Corpus specific: e.g. extract institutions from theses (demographics)
       • Discipline specific: e.g. taxon labeling in biology
       • Discovery tools, e.g. topic labeling (‘natural’ disciplines); reading level (grade, UG, PG …); structural analysis (important parts of a document); boilerplate vs. ‘meat’
    41. E.g. “White Plague”
       [Chart panels: TUBERCULOSIS; TB and Antibiotics; Heroin, TB and Aurantimonadaceae]
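Slide 40 lists reading level among discovery tools without naming a method. One common choice is the Flesch–Kincaid grade level, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The sketch below uses that formula with a crude vowel-group syllable counter; both the formula choice and the heuristic are assumptions, not anything the slide specifies, and mapping scores to grade/UG/PG bands would need thresholds the slide does not give.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: count groups of consecutive vowels, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level; higher values suggest harder text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


if __name__ == "__main__":
    print(round(flesch_kincaid_grade("The cat sat. The dog ran."), 2))  # -2.62
```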
    42. Datasets: Factoids & point data
       • ca. 1.4M faculty (50% full-time) in US HE; ~75M people enrolled in US HE
       • ca. 100k faculty in UK HE
       • 44% of researchers use online (other people’s) datasets for their research
       • 48% of researchers use datasets > 1 GB
       • 10.8% store their data outside their institution (50% store it in their “lab”)
       • 1-5% of datasets are formally moved into the curation process
       • 66% of faculty have requested other people’s data (and 49% of those got it)
       • 26.5% have the expertise to analyze their own data
       • 80.3% do not have sufficient expertise to manage their own data
       • Institutional storage costs ~$600/TB/year
       • 58% is the annual increase in the amount of data being generated
       • 20-40% is the annual growth in the amount of storage deployed (est.)
       • < 1% of ecological data is accessible after publication
       • > 85% of all information is in text form
       • 2.7 times more citations accrue to papers with accessible data
       • 3 to 6 times more papers emerge if the data is accessible
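Three of the factoids on slide 42 combine into a simple projection: data generated grows about 58% a year, deployed storage 20-40% a year, and institutional storage costs about $600/TB/year. The sketch below assumes those rates stay constant and, as a simplification, that the per-TB price stays flat; the slide says nothing about how the price changes. Function names are hypothetical.

```python
def growth_gap(years: int, data_growth: float = 0.58, storage_growth: float = 0.40) -> float:
    """Factor by which data volume outgrows deployed storage after `years` of constant rates."""
    return ((1 + data_growth) / (1 + storage_growth)) ** years


def annual_storage_cost(tb_now: float, years_ahead: int,
                        data_growth: float = 0.58, cost_per_tb: float = 600.0) -> float:
    """Yearly storage bill if holdings grow at data_growth and the per-TB price stays flat."""
    return tb_now * (1 + data_growth) ** years_ahead * cost_per_tb


if __name__ == "__main__":
    for storage_growth in (0.20, 0.40):
        print(storage_growth, round(growth_gap(5, storage_growth=storage_growth), 2))
    print(annual_storage_cost(10, 0))  # 6000.0
```

Under these assumptions the gap factor after five years is about 1.8 with 40% storage growth and about 4.0 with 20%.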
    43. Drivers of change
       • Nearly ubiquitous high-speed wireless globally
       • Inexpensive devices/apps/services
       • Global technology innovation
       • Policy shifts in academia
       • Internationalization of scholarship
       • Growth in primary source datasets
       • Fearless and connected entrepreneurs
       • Fearless and connected researchers
    44. Thank you! Questions?
       Tim Babbitt, timothy.babbitt@proquest.com, (734) 997-4593
