1. Jisc OA services and importance of text mining
capabilities
Balviar Notay
2. Jisc investment in text mining activity
»The value and benefits of text mining (Report -
2012) informed the Hargreaves review
»Hargreaves TDM exception for the UK - 2014
(ahead of the game)
15/06/16 Title of presentation (Go to ‘View’ menu > ‘Header and Footer…’ to edit the footers on this slide) 2
3. Benefits of TDM
» Benefits include:
› Increased researcher efficiency
› Unlocking hidden information and developing new knowledge
› Exploring new horizons
› Improved research and evidence base and improving the research process and quality
› Broader economic and social benefits include cost savings and productivity gains
› Innovative new service development
› New business models
How do we practically make this work? How do we best utilise text mining capability?
4. »Established the National Centre for Text Mining
(NaCTeM)
»Funded various search/aggregation projects which
resulted in the CORE service (Jisc and Open University)
6. Jisc Subscription Management Services
[Diagram: Jisc subscription management services]
»Content types: Journals / Books / Databases / Archives / Multimedia / Geospatial
»Subscription management functions: Consortia negotiation, Subscription & purchasing, Licence management, Entitlement tracking, Usage tracking, Perpetual access
»Jisc services shown: Jisc Collections, Jisc Collections website, SHERPA Romeo, JUSP, KB+, SUNCAT data, Keepers Registry, E-books tracking and decision support
»Cross-cutting layers: Data and interoperability; Analytics and business intelligence
7. Jisc Bibliographic Data Services
[Diagram: Jisc bibliographic data services]
»Stages: Acquisition, Discovery, Delivery, Collection Management
»Bibliographic management functions: Select Book, Check Availability, Specific Title, Unknown Title, Select Best Copy, Link to Best Copy, Document Delivery, Interlibrary Loan, Reading Lists, Circulation Data, Title Usage, Management of Stock, Manage Metadata, Collection Benchmarking
»Jisc services shown: CCM Tools, SUNCAT, Jisc Historical Texts, CORE, NBK, Copac, Zetoc, Archives Hub, KB+, JUSP, Jisc Collections, E-books Pilot
»Across all stages: Advice, guidance, technical support, quality assessment and new service development
8. IR to IR sharing for REF
»CORE and Publications Router
› Support for REF compliance for co-authors
› Text mining the co-authors’ affiliations from full text to support
deposit of articles in co-authors’ repositories.
› Beta service by Dec 2016 (estimate)
15/06/16 Copac Collection Management Tools 8
9. CORE and Jisc Journal Archives
»CORE:
› exposes content of 168 UK repositories, 25m records
(approx), 2m full text items (approx)
»Journal Archives
› delivers over 600 journal titles from the archives of major
commercial publishers, such as ProQuest and Brill, that Jisc
has purchased in perpetuity
› 86 UK HE subscribing institutions.
10. CORE and Jisc Journal Archives
»The combined data sets offer a large body of material to exploit
»Assess the feasibility and value of developing useful user-facing text
mining services
»Gather evidence with regard to technical and legal feasibility,
gauge user interest, assess the scale of the opportunity and market
size, identify a delivery and business model, as well as produce
some coding prototypes.
»Outputs: a report and technical prototypes
11. Standards and identifiers
»UK ORCID consortium
»RIOXX: 54 repositories using RIOXX fields.
»RIOXX and OpenAIRE mapping (CORE testing passing records to
OpenAIRE)
»OpenAIRE/Horizon 2020 guidelines (Jisc feeding into the guidelines)
› July 2016: possible release of literature guidelines.
12. Standards and identifiers
»COAR Vocabularies: Resource Types and mapping to a
number of languages. Version 1.1 will be released soon.
»CASRAI UK chapter
»Research data metadata profile
»OrgID work – building on previous reports that were
produced as part of the Jisc CASRAI pilot.
»ResourceSync
The TDM report informed the Hargreaves review of IP and copyright (2011).
There was a commercial interest, and we wanted to make sure that the research interest was fully represented. The report was also used as an evidence base for lobbying.
The TDM report identified a market failure: we were not getting a fair share of the return on public investment, and this was inhibiting the wider social and economic gains that text mining could generate.
The exception permits any published or unpublished in-copyright work to be copied for the purposes of text mining for non-commercial research, as long as the researcher has lawful access to the content (a subscription through the library, or permission from the researcher).
The copyright directive under way in Europe has the potential to introduce and enshrine this exception for all EU member states.
The global research community generates over 1.5 million new scholarly articles per annum.
The sea of data is predicted to increase at a rate of 40% per annum.
How do we utilise text mining capability in a really practical sense, for different stakeholders?
New medical treatments?
Data analysing, business intelligence….
Semantometrics: look at the actual text to see if it is robust or good…
How do we free up/utilise that capability…
Train the text miners.
CORE is part of a wider OA services portfolio…
How do we free up/utilise that capability… We need to be clear about the use cases, and certainly Jisc can provide the capability at the network level to improve service provision…
Enhance efficiency and performance of services
Data analysing, business intelligence….
Semantometrics: look at the actual text to see if it is robust or good…
Train the text miners.
It is possible that institutions may be unaware of the existence of research outputs of their researchers, even if other UK institutions to which their co-authors are affiliated have captured them on their institutional repositories. These outputs may therefore already be REF-compliant, but the institutions that may wish subsequently to submit them to the next REF may not know they are eligible. This would be solved if there were a mechanism for copying the output from one IR to the other IR(s) of the co-authors.
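As a rough illustration of the mechanism described above, the sketch below takes affiliation strings that have already been text-mined from a full-text output and works out which co-authors' repositories do not yet hold it. It is a minimal sketch under stated assumptions, not how CORE or Publications Router actually work: the institution names, repository identifiers and function names are all hypothetical, and real affiliation matching would need far more robust disambiguation.

```python
import re

# Hypothetical lookup table: institution name -> repository identifier.
# Both the names and the identifiers are invented for illustration.
KNOWN_REPOSITORIES = {
    "northtown university": "northtown-ir",
    "southvale university": "southvale-ir",
}


def normalise(text: str) -> str:
    """Lower-case the text and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def target_repositories(affiliations, holding_repository):
    """Return the co-author repositories that should receive a copy.

    affiliations: affiliation strings mined from the full text.
    holding_repository: identifier of the repository that already holds the output.
    """
    targets = set()
    for affiliation in affiliations:
        # Pad with spaces so that only whole-word matches count.
        cleaned = f" {normalise(affiliation)} "
        for institution, repo_id in KNOWN_REPOSITORIES.items():
            if f" {institution} " in cleaned and repo_id != holding_repository:
                targets.add(repo_id)
    return sorted(targets)


mined = [
    "Dept. of Computing, Northtown University, UK",
    "School of Physics, Southvale University",
]
print(target_repositories(mined, "northtown-ir"))
```

The exact-name lookup is the weakest part of the sketch; standard organisation identifiers would make this step far more reliable than string matching.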
The combined data sets offer a large body of material. Applying text mining techniques to them could open up new research opportunities and lines of enquiry, as well as opportunities for improved research management, and in so doing would dramatically enhance the current service offer.
In particular, the project will assess the feasibility and value of developing useful user-facing text mining services over the combined corpora (e.g. named entity recognition, disciplinary analysis, etc.). It will gather evidence with regard to technical and legal feasibility, gauge user interest, assess the scale of the opportunity and market size, identify a delivery and business model, as well as produce some coding prototypes.
The final outputs will be:
a report that will provide the necessary evidence and information needed to build a full business case, if evidence suggests that there is value in pursuing these enhancements
technical prototypes
It is perhaps obvious to say, but providing structure to information will result in more robust text mining results…
Providing consistency and standardisation….
RIOXX: 54 repositories are using RIOXX fields (which include an ORCID field). More work is needed on compliance.