Text and data mining (TDM) techniques can be applied to a wide range of materials, from published research papers, books and theses, to cultural heritage materials, digitised collections, administrative and management reports and documentation, etc. Use cases include academic research, resource discovery and business intelligence.
This workshop will show the value and benefits of TDM techniques and demonstrate how ContentMine aims to liberate 100,000,000 facts from the scientific literature. ContentMine will also provide a hands-on demo on a topical and accessible scientific/medical subject.
3. What is data and text mining?
» Text mining, also referred to as text data mining, roughly equivalent to text analytics,
refers to the process of deriving high-quality information from text
» High-quality information is typically derived through the devising of patterns and trends
through means such as statistical pattern learning
» Text mining usually involves the process of structuring the input text, deriving patterns
within the structured data, and finally evaluation and interpretation of the output
» 'High quality' in text mining usually refers to some combination of relevance, novelty,
and interestingness
Ref : http://bit.ly/_jisc_textmining
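The three stages named above (structuring the input text, deriving patterns, evaluating the output) can be illustrated with a minimal sketch. It is not taken from the Jisc report: TF-IDF term weighting is used here only as one simple, well-known stand-in for "deriving patterns", and the function names (tokenize, tf_idf, top_terms) and the sample sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Stage 1, structuring: turn raw text into a list of lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Stage 2, deriving patterns: score each term in each document.

    A term scores highly when it is frequent in one document but rare
    across the collection. Returns one {term: score} dict per document.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Number of documents in which each term appears at least once.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty documents
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(documents, k=3):
    """Stage 3, evaluation: keep the k highest-scoring terms per document."""
    ranked = []
    for doc_scores in tf_idf(documents):
        # Sort by descending score, then alphabetically to break ties.
        ordered = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
        ranked.append([term for term, _ in ordered[:k]])
    return ranked


if __name__ == "__main__":
    # Hypothetical sample texts, for illustration only.
    docs = [
        "the cat sat on the mat",
        "the dog sat on the log",
        "the cat chased the dog",
    ]
    for doc, terms in zip(docs, top_terms(docs, k=2)):
        print(doc, "->", terms)
```

Words that occur in every document (such as "the") receive a score of zero, which is a crude way of reflecting the "relevance" and "novelty" criteria mentioned above. Real text mining pipelines typically add steps such as stop-word handling, stemming and entity recognition.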
Text mining
2/03/2016 Introduction to Data and Text mining
4. What is data and text mining?
» Data mining – is an imprecise term but means anything from
› Large scale data analysis within science - outputs of the Hubble telescope,
CERN Large Hadron Collider
› Analysing census data for socio-economic trends (medium scale – finite amount
of data)
› The opportunities of mining connected small objects/collections of research data
to find new insight, e.g. bringing together various versions of the Mona Lisa and
using data mining to analyse their underlying structure
Ref : http://bit.ly/_jisc_textmining
Data mining
6. What is its value for research and education?
» 2012 – Jisc published a key report "Value
and benefits of text mining"
http://bit.ly/Jisc_textmining
» Took a case study approach and also
undertook an economic analysis of the benefits
(…biomedicine)
» Wider at-scale benefits were harder to
come by owing to legal and technical
limitations inhibiting systematic use
» Since then new benefits have emerged
7. What types of benefits?
» Finding research insights that were not
possible through other techniques
» Bringing together texts/data across different
disciplines and finding new insights
» "Text mining offers a way of helping
researchers to make sense of and leverage
value from the vast sea of electronic
resources, which is continually expanding."
» "…potential to increase the research base
available to business and society and to
enable business and others to use the
research base more effectively"
Health benefits of outdoor education
https://en.wikipedia.org/wiki/Outdoor_education
8. Innovative research in Humanities and Social sciences
» Digging into Data challenge
http://diggingintodata.org
» International initiative now in its 4th
funding round, e.g.:
› Trees and Tweets -
http://bit.ly/treesandtweets
› DiLiPaD -
http://dilipad.history.ac.uk/
11. Mining repositories: CORE
» CORE is an aggregation of Open Access repositories and offers itself as a platform for
TDM (25 million articles)
› Can use an API (of interest if you want to build value-added services on top)
› Or download the whole aggregation as an open dataset here:
https://core.ac.uk/intro/data_dump
› Jisc and the Open University are running CORE in partnership, with the back-end
aggregation hosted by the OU and the front-end services hosted by Jisc.
(Further services by Jisc could be developed on top of this.)
12. Universities and industry
» NCUB (National Council for Universities and Business) is developing a tool called an
"Intelligent Broker"
› To assist with making better links between universities and industry
› Could potentially harvest and mine data from key sources like the Research Councils'
Gateway to Research, equipment.data (national equipment portal) and potentially other
services, like CORE
› This would, for example, give SMEs more intelligence about research-intensive activity
in particular areas
14. And finally…
» Open Citation Experiment (using text mining techniques – see Digifest session and demo
on this!)
» Jisc are commissioning a study to examine the text mining landscape and future
contributions to this space. It will review:
› The current landscape – primarily in UK HE but also looking internationally, and within
other relevant sectors to provide a broad view
› The market – what are the value chains and where might Jisc contribute?
› The legal position and other inhibitors
› Researcher practice, the issues they encounter, their current and future needs,
considering subjects that use text mining and those that don't
› Existing platforms, services and tools, and potential for use by Jisc or its customers
› Recommendations on possible future areas of work or services for Jisc to explore