4. Technology is changing how we structure work
Then | Now
Notepads, notebooks | Evernote
Books | eBooks
CD, DVD, iTunes | Spotify, YouTube (streamed)
Focus groups | Affectiva
Classrooms/lectures | Khan Academy
7. Changing Policy
Emerging trend of journals and publishers linking to open-access data repositories
Journals and funding agencies setting policy to preserve and associate data supporting research results
Open Access
8. Changing Global Research Patterns
The center of gravity of the world system of scholarship is moving from west to east.
NOTES: Asia-10 includes China, Japan, India, Indonesia, Malaysia, Philippines, Singapore, South Korea, Taiwan, and Thailand.
SOURCE: National Science Board, Science and Engineering Indicators 2010
9. Citations of U.S. research articles in non-U.S. literature, by region/country: 1998–2010
Asia-8 = India, Indonesia, Malaysia, Philippines, Singapore, South Korea, Taiwan, Thailand; EU = European Union
10. Share of citations to international literature: 2000–10
Asia-8 = India, Indonesia, Malaysia, Philippines, Singapore, South Korea, Taiwan, Thailand; EU = European Union
11. Citations from Asia-10 Articles
NOTES: Asia-10 includes China, Japan, India, Indonesia, Malaysia, Philippines, Singapore, South Korea, Taiwan, and Thailand. Asia-8 excludes China and Japan.
SOURCE: National Science Board, Science and Engineering Indicators 2010
12. US Academic Expenditures on Research by Area, FY2008
(Millions of current dollars)
SOURCE: National Science Foundation/Division of Science Resources Statistics, Survey of Research and Development Expenditures at Universities and Colleges: FY 2008.
13. Master’s degrees conferred
Modern Languages: 1%
Other: 12%
Biological sciences: 2%
Visual/Performing Arts: 2%
Computer science: 3%
Education: 29%
Social Sciences: 3%
Psychology: 5%
Public Administration: 5%
Engineering: 5%
Health profession: 9%
Business: 25%
17. Academic work is social
2006 Univ. of Minn. Study
68% - Faculty work collaboratively
52% - Collaborate with colleagues at other institutions
46% - Find the distance from colleagues is a collaboration obstacle
“One group of experts can’t do everything”
SOURCE: Newman M E J PNAS 2001;98:404-409
18. Sharing
Low threshold (at good enough)
Astrophysicists: arXiv
Political scientists: SSRN
Economists: SSRN, NBER
High threshold (competitive)
Historians
Molecular and cell biologists
Archaeologists
Biochemists
Performers/Composers
Junior faculty in all fields are especially cautious for fear of theft and/or misinterpretation.
20. Changing Environment of Research
The “center” of research is shifting from libraries to other sources
[Figure: three panels (Past, Present, Future) labeled Book Aggregation, Book & Journal Aggregation, and Electronic Information Aggregation respectively; each panel shows the LIBRARY and “Other sources”. Other labels in the figure: Books on shelves, Databases.]
24. Context
Activities are the context for when our content is used in research
Research mash-ups: “whole is greater than the parts”
Critical for ecosystem of research
25. Research Network Connections
Era of connections
Social networks
Professional networks
Connected information
Connected concepts
Connected meaning
30. Evolution of search
[Figure: evolution of search along two axes, with a “We are here” marker.]
Search Technology: Catalog/index; Database/Search Engine; Xpath/XQuery
Information Structure: Hand-crafted metadata; Content text; Content XML
Search types shown: Subject indexing of objects; Full text search; A&I search; Semantic and machine search; Structural and network search
31. Search features
Going from metadata about objects and text search to ideas, context and mining
Semantic Search
Greater granularity of discovery
Structural analysis of content
Precision search
Semantic search
Internationalization of search
Translated search
32. Semantic Search
Semantic Search utilizes robust data structures like ontologies to apply domain knowledge to otherwise two-dimensional terms.
The application of word context provides a dynamic aspect to semantic search, allowing the user’s real-time intent to guide results.
Contrast with static thesauri and controlled vocabularies which miss nuances of context and intent.
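As a rough illustration of the idea on this slide, the sketch below expands a query term through a tiny ontology keyed by the user's context, so the same word retrieves different material depending on intent. The ontology contents, the context labels, the sample documents and the function names are all hypothetical and chosen only for this example; a real semantic search system would use a far richer ontology and matching strategy.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy ontology: (term, context) -> related terms.
# The entries are illustrative assumptions, not taken from the slides.
ONTOLOGY: Dict[Tuple[str, str], Set[str]] = {
    ("consumption", "medicine"): {"tuberculosis", "phthisis"},
    ("consumption", "economics"): {"consumer spending"},
}


def expand_query(term: str, context: str) -> Set[str]:
    """Return the term plus whatever the ontology links to it in this context."""
    key = (term.strip().lower(), context.strip().lower())
    return {key[0]} | ONTOLOGY.get(key, set())


def search(term: str, context: str, documents: List[str]) -> List[int]:
    """Return indexes of documents containing any expanded term (simple substring match)."""
    terms = expand_query(term, context)
    return [
        i for i, doc in enumerate(documents)
        if any(t in doc.lower() for t in terms)
    ]


documents = [
    "Phthisis mortality in nineteenth-century London",
    "Household consumer spending after the recession",
    "Tuberculosis treatment outcomes",
]

print(search("consumption", "medicine", documents))
print(search("consumption", "economics", documents))
```

A static thesaurus would return the same expansion for both calls; here the context argument stands in for the user's real-time intent.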
33. Automated processing of library content
PubMed contains ~17,787,763 articles to date
Manually searching is tedious and frustrating
Can be hard finding links between data and articles
Conclusion? Machines will be reading the library.
Using MyExperiment Workflows, researcher Paul Fisher found a link between cholesterol, patient trauma and parasite resistance in cattle.
34. Arrowsmith LBD: the ABC Model
A: Raynaud’s syndrome
B: blood viscosity, etc.
C: dietary fish oil
AB: Articles about an AB relationship
BC: Articles about a BC relationship
AB and BC are complementary but disjoint: they can reveal an implicit relationship between A and C in the absence of any explicit relation.
The researcher assesses titles in the B literature identified by the system for fit or contribution to problem.
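The ABC model can be sketched in a few lines: take titles from the A literature and the C literature, and list the terms the two share as candidate B links for a researcher to assess. This is a minimal sketch under stated assumptions, not the Arrowsmith implementation: the titles are invented for illustration, the stopword list is arbitrary, and the function names are hypothetical.

```python
import re
from typing import Iterable, List, Set

# Small illustrative stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "with", "for", "to"}


def title_terms(titles: Iterable[str]) -> Set[str]:
    """Collect lowercase words of three or more letters from titles, minus stopwords."""
    terms: Set[str] = set()
    for title in titles:
        for word in re.findall(r"[a-z]+", title.lower()):
            if len(word) > 2 and word not in STOPWORDS:
                terms.add(word)
    return terms


def shared_b_terms(a_titles: Iterable[str], c_titles: Iterable[str],
                   a_term: str, c_term: str) -> List[str]:
    """Return terms that occur in both literatures, excluding the A and C terms themselves."""
    exclude = set(re.findall(r"[a-z]+", f"{a_term} {c_term}".lower()))
    shared = title_terms(a_titles) & title_terms(c_titles)
    return sorted(shared - exclude)


# Invented titles, for illustration only.
a_titles = [
    "Blood viscosity in Raynaud's syndrome",
    "Platelet aggregation and Raynaud's syndrome",
]
c_titles = [
    "Dietary fish oil reduces blood viscosity",
    "Fish oil and platelet aggregation",
]

print(shared_b_terms(a_titles, c_titles, "Raynaud's syndrome", "dietary fish oil"))
```

The output is only a candidate list; as the slide notes, the researcher still has to judge each B term for fit.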
36. Purpose of Content Analytics
Consumptive use: informs, delivers ideas.
Analytical use: inspires or proves ideas. The true center of research.
They occupy different points in the scholarly information lifecycle.
37. What does Content Analytics do?
Collaborative man-machine exploration: highlights trends, clues or anomalies [visualization – leverages cognitive skills].
On-demand analysis: identify and quantify trends, relationships, concepts and correlations (tools: SEASR, nltk, autonomy, …).
Continuous analytics: generate new ‘facets’ or annotations for discovery [augments content].
Preserves value: older content is read less (see note ii), but remains important for trend analysis and statistical significance [value shifts].
38. Examples of text as data
Changes in word sense (e.g. consumption (TB), moot, oratio [see note 1]) and spelling (e.g. 18th C. ſ to s, *re *er)
Bibliometrics and other usage analyses
Citation patterns
Institution vs. discipline
Author demographics
Pharma: Drug / Symptom correlation.
Biology: Species / date / location observations.
Social Sci: Work/life habits of undergrads based on access patterns at different institutions [usage data based]
…
39. Text Mining
Unstructured text to queryable data structures
WHY?
TOO MUCH TEXT TO HAND ANALYZE.
Improved discovery ( better ‘metadata’ )
Business Intelligence
e.g. content stats -> content acquisitions
Saleable datasets
E.g. Distribution of authors vs. disciplines vs. grants
End User research agendas
High-End : Custom (user specified) mining as a service
Simple : Visualization of results ( frequency / co-occurrence … )
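For the "simple" end of the spectrum mentioned on this slide (frequency / co-occurrence), a minimal sketch follows. It counts how often pairs of vocabulary terms appear in the same document. The sample documents, the vocabulary and the function name are hypothetical; the drug and symptom words are placeholders, not real findings.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Iterable, Set


def cooccurrence(documents: Iterable[str], vocabulary: Set[str]) -> Counter:
    """Count, per unordered pair of vocabulary terms, the documents containing both."""
    counts: Counter = Counter()
    for doc in documents:
        words = set(re.findall(r"[a-z]+", doc.lower()))
        present = sorted(words & vocabulary)  # sorted so each pair has one canonical order
        counts.update(combinations(present, 2))
    return counts


# Invented sample data, for illustration only.
documents = [
    "Aspirin reduced headache and fever",
    "Fever after aspirin",
    "Headache diary",
]
vocabulary = {"aspirin", "headache", "fever"}

for pair, n in cooccurrence(documents, vocabulary).most_common():
    print(pair, n)
```

The resulting pair counts are the kind of queryable structure that can feed a visualization or a discovery facet.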
40. Options for use of text mining
Many, many options – it is about capability
Generic ‘improvements’
disambiguation of people, places and events
concept labeling ( different terms, same ‘thing’)
Corpus specific
E.g. extract institutions from theses (demographics).
Discipline specific
E.g. taxon labeling in biology.
Discovery tools , e.g.
topic labeling (`natural` disciplines)
reading level (grade, UG, PG … )
Structural analysis ( important parts of doc )
Boilerplate vs. ‘meat’
42. Datasets: Factoids & point data
ca. 1.4M Faculty ( 50% full-time ) in US HE, ~75M people enrolled in US HE
ca. 100k Faculty in UK HE
44% of Researchers use online (other people’s) datasets for their research
48% of Researchers use datasets > 1GB
10.8% store their data outside their institution ( 50% store it in their “lab”)
1 - 5% of datasets are formally moved into the curation process.
66% of faculty have requested other people’s data ( and 49% of those got it).
26.5% have the expertise to analyze their own data.
80.3% do not have sufficient expertise to manage their own data
Institutional storage costs ~ $600 / TB / year
58% is the annual increase in the amount of data being generated
20-40% is annual growth in the amount of storage deployed (est.)
< 1% of ecological data is accessible after publication.
> 85% of all information is in text form
2.7 times more citations accrue to papers with accessible data
3 to 6 times more papers emerge if the data is accessible.
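Two of the figures above, 58% annual growth in data generated and 20-40% annual growth in storage deployed, imply a widening gap when compounded. The sketch below works through that arithmetic. The 100 TB starting point, the five-year horizon, the assumption of constant growth rates and the function name are illustrative assumptions, not figures from the slides; only the growth rates and the ~$600 / TB / year cost come from the list above.

```python
from typing import List, Tuple

COST_PER_TB_YEAR = 600.0  # institutional storage cost quoted on the slide (USD)


def project(start_tb: float, data_growth: float, storage_growth: float,
            years: int) -> List[Tuple[int, float, float]]:
    """Compound data generated and storage deployed from the same starting volume."""
    rows = []
    data = storage = start_tb
    for year in range(1, years + 1):
        data *= 1 + data_growth
        storage *= 1 + storage_growth
        rows.append((year, data, storage))
    return rows


# Assumed scenario: 100 TB today, upper-bound storage growth of 40% per year.
for year, data, storage in project(100.0, 0.58, 0.40, 5):
    shortfall = data - storage
    print(f"year {year}: data {data:8.1f} TB, storage {storage:8.1f} TB, "
          f"shortfall {shortfall:8.1f} TB, "
          f"storage cost ${storage * COST_PER_TB_YEAR:,.0f}/yr")
```

Even at the upper end of the storage growth estimate, data generated outpaces storage deployed every year under these assumptions, and the gap grows.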
43. Drivers of change
Nearly ubiquitous high-speed wireless globally
Inexpensive devices/apps/services
Global technology innovation
Policy shifts in Academia
Internationalization of scholarship
Growth in primary source datasets
Fearless and connected entrepreneurs
Fearless and connected researchers
2010 CSHE paper “Assessing the future landscape of scholarly communications”, Harley, Diane et al.
Comments: Nature 464, 466 (25 March 2010) http://www.nature.com/nature/journal/v464/n7288/full/464466a.html
A sample of what a very simple text analysis API can look like – plot the occurrence of ‘Malvinas’ over time. The subtlety is the linear decrease in interest after the post-war spike.
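A term-over-time count of the kind this note describes can be sketched as follows. This is not the API shown on the slide: the function names, the tiny corpus and the search term are hypothetical, and the text "plot" is a stand-in for a real chart.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple


def occurrences_by_year(articles: Iterable[Tuple[int, str]], term: str) -> Dict[int, int]:
    """Count whole-word, case-insensitive occurrences of a term per publication year."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    counts: Counter = Counter()
    for year, text in articles:
        counts[year] += len(pattern.findall(text))
    return dict(sorted(counts.items()))


def text_plot(series: Dict[int, int]) -> str:
    """Render the yearly counts as a simple horizontal bar chart."""
    return "\n".join(f"{year} {'#' * n} ({n})" for year, n in series.items())


# Invented corpus of (year, text) pairs, for illustration only.
articles = [
    (1981, "No news"),
    (1982, "War declared. War ends."),
    (1983, "After the war"),
]

print(text_plot(occurrences_by_year(articles, "war")))
```

With counts this small, year-to-year differences may be noise rather than a trend, so a sketch like this is a starting point for exploration rather than an analysis.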
Trying to make it very clear that datasets are a different and more central element of all scholarly research (with the possible exception of maths, philosophy and religion). Data both inspires and confirms ideas – text is mostly informative, rarely inspirational.
Highlights are the discussion points.
ii. In the physical sciences the half-life of content access frequency is ~ 6-8 years.
Grey text is a PQ business value, not a user value.
Whilst content can be obfuscated or reduced, there are thorny issues with usage data. Early policy decisions need to be taken with respect to exposing usage data, even indirectly (triangulation is always possible).
1. Oratio has shifted from ‘speech’ to ‘prayer’ and back again in the Latin literature. See Greg Crane et al.
Note that the number of articles is small anyway, so the data could simply be random variation. This is way too simple a tool for serious analysis.
Figures on faculty demographics from http://nces.ed.gov/programs/digest/d09
Sources in earlier paper on datasets.