3. What is data and text mining?
» Text mining, also referred to as text data mining, roughly equivalent to text analytics, refers to the process of
deriving high-quality information from text.
» High-quality information is typically derived through the devising of patterns and trends through means such as
statistical pattern learning.
» Text mining usually involves the process of structuring the input text … deriving patterns within the structured data,
and finally evaluation and interpretation of the output.
» 'High quality' in text mining usually refers to some combination of relevance, novelty, and interestingness.
» Data mining is an imprecise term, but can mean anything from:
› Large-scale data analysis within science – outputs of the Hubble telescope, CERN Large Hadron Collider
› Analysing census data for socio-economic trends (medium scale – finite amount of data)
› The opportunities of mining connected small objects/collections of research data to find new insight. e.g.
bringing together various versions of the Mona Lisa and using Data Mining to analyse their underlying structure.
Ref : https://en.wikipedia.org/wiki/Text_mining
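The three steps named above (structuring the input text, deriving patterns, evaluating the output) can be sketched in a few lines. This is only a minimal illustration, not a method taken from the talk: the toy corpus, the stopword list, the choice of adjacent word pairs as the "pattern", and the function names are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical toy corpus; a real project would load documents instead.
DOCS = [
    "Text mining derives information from text.",
    "Data mining finds patterns in large data sets.",
    "Text mining finds patterns and trends in text.",
]

# Assumed, deliberately tiny stopword list.
STOPWORDS = {"in", "from", "and", "the", "a", "of"}


def structure(text):
    """Step 1: structure the raw text as a list of lowercase tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def derive_patterns(docs):
    """Step 2: count adjacent word pairs (bigrams) across all documents."""
    counts = Counter()
    for doc in docs:
        tokens = structure(doc)
        counts.update(zip(tokens, tokens[1:]))
    return counts


def evaluate(counts, min_count=2):
    """Step 3: keep only the patterns that recur at least min_count times."""
    return {pair: n for pair, n in counts.items() if n >= min_count}


patterns = evaluate(derive_patterns(DOCS))
for pair, n in sorted(patterns.items()):
    print(" ".join(pair), n)
```

Real text mining replaces each step with something richer (linguistic parsing, statistical pattern learning, human interpretation), but the shape of the pipeline stays the same.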
4. 07/03/2016 Digifest 2016 - Sustainable and efficient solutions for shared research data management – Business case and costing 4
5. What is its value for research and education?
» 2012 – Jisc published a key report,
“Value and benefits of text mining”
» https://www.jisc.ac.uk/reports/value-and-benefits-of-text-mining
» Took a case study approach and also
undertook an economic analysis of
the benefits (…biomedicine)
» Wider, at-scale benefits were harder
to come by, owing to legal and
technical limitations inhibiting
systematic use
» Since then new benefits have
emerged
6. What types of benefits?
» Finding research insights that were not
possible through other techniques
» Bringing together texts/data across
different disciplines and finding new
insights
» “Text mining offers a way of helping
researchers to make sense of and
leverage value from the vast sea of
electronic resources, which is continually
expanding.”
» “…potential to increase the research base
available to business and society and to
enable business and others to use the
research base more effectively”
Health benefits of outdoor education
https://en.wikipedia.org/wiki/Outdoor_education
7. Innovative Research in Humanities & Social Sciences
» Digging into Data Challenge
» http://diggingintodata.org
» International Initiative now in its 4th
funding round e.g.:
» Trees and Tweets –
https://sites.google.com/site/jackgrieveaston/treesandtweets
» DiLiPaD – http://dilipad.history.ac.uk/
10. Mining Repositories : Core
» CORE is an aggregation of Open Access repositories and offers itself as a
platform for TDM (25 million articles)
› Can use an API (of interest if you want to build value-added services on top)
› Or - download the whole aggregation as an open dataset here:
https://core.ac.uk/intro/data_dump
› Jisc and the Open University are running CORE in partnership, with the back-end aggregation hosted
by the OU and the front-end services hosted by Jisc. (Further services by Jisc could be developed
on top of this.)
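As a rough idea of what mining a downloaded aggregation could look like, the sketch below scans records for a keyword. It makes assumptions throughout: that the dump is read as JSON Lines (one JSON object per line) and that records carry `title` and `abstract` fields. Neither is taken from the CORE documentation, the function names are hypothetical, and the sample data is invented for the example.

```python
import io
import json


def iter_records(lines):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def titles_mentioning(lines, keyword):
    """Return titles of records whose title or abstract contains keyword."""
    keyword = keyword.lower()
    hits = []
    for record in iter_records(lines):
        # "title" and "abstract" are assumed field names, not a documented schema.
        title = record.get("title") or ""
        abstract = record.get("abstract") or ""
        if keyword in f"{title} {abstract}".lower():
            hits.append(title)
    return hits


# Invented sample standing in for a downloaded file; with a real file,
# pass an open file handle instead of the StringIO object.
SAMPLE = io.StringIO(
    '{"title": "Mining open repositories", "abstract": "A study of text mining."}\n'
    '{"title": "Census trends", "abstract": "Socio-economic analysis."}\n'
)
hits = titles_mentioning(SAMPLE, "text mining")
print(hits)
```

Reading line by line keeps memory use flat, which matters when the whole aggregation is processed as a single dataset.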
11. Universities and Industry
» NCUB (National Centre for Universities and Business) is developing a tool called
an “Intelligent Broker”
› To assist with making better links between University and Industry
› Could potentially harvest and mine data from key sources like the Research
Councils’ Gateway to Research, equipment.data (national equipment portal)
and potentially other services, like CORE.
› This would give SMEs more intelligence about research-intensive activity in
particular areas, for example
12. Content Mine
Grew out of a Jisc project initially
13. And Finally…
» Open Citation Experiment (using text mining techniques – see Digifest session
and demo on this!)
» Jisc are commissioning a study to examine the text mining landscape and future
contributions to this space to review:
› The current landscape - primarily in UK HE but also looking internationally, and within other
relevant sectors to provide a broad view.
› The market – what are the value chains and where might Jisc contribute?
› The legal position and other inhibitors
› Researcher practice, the issues they encounter, their current and future needs, considering
subjects that use and those that don’t
› Existing platforms, services and tools, and potential for use by Jisc or its customers
› Recommendations on possible future areas of work or services for Jisc to explore
14. jisc.ac.uk
For more information
Catherine Grout
Head of Change - Research
catherine.grout@jisc.ac.uk