Extract – Analyse – Search – Visualise
Text mining and machine learning for Research Data Management
Dr Tom Parsons and Mitchell Murphy
28/06/2017
Co-founder, RDM, Knowledge Management
DR. TOM PARSONS
React.js panel and Node.js
WILL EVANS
Python/R data scientists, machine learning and computer vision
DR. STUART BOWE & MITCH MURPHY
Co-founder, Software delivery
TIM VENISON
Python, architecture, processing pipeline
BARNABY KEENE
About Spotlight Data
Rapid development of innovative products
OUR AGILE CROSS-FUNCTIONAL TEAM
Developers, architects and researchers
POOL OF ASSOCIATES AND PLACEMENTS
Gathering and making sense of unstructured data captured from a variety of sources
We use charting, network graphs, maps and other techniques for data investigation
Mining data from archives, websites, social media and API sources
Analysis Tools
From simple interfaces and powerful searches to end-to-end, large-scale processing systems
We utilise machine learning techniques to extract and investigate data.
What we do
Data science
Dark Data
Data Mining
Data Visualisation
Artificial Intelligence
Spotlight Data
Projects
• Large project with the UK Government and Durham University:
• Applying text mining and machine learning to large data sets and document corpora
• Twitter and social media mining for ESRC Climate Change project
• Sensor data analysis and machine learning
The Nanowire system
Cloud or on-premises
Microservice containerised architecture
Ingest Discover Process
Workers
User panel User panel
Data Processing – Natural Language Processing, text mining, classifiers, pattern recognition
MQ
Pre-process
Storage
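The ingest, queue, worker and storage boxes on this slide can be pictured with a small in-process sketch. It is only an illustration under stated assumptions: the function names (`preprocess`, `process`, `run_pipeline`), the word-count "analysis" and the in-memory dictionary standing in for storage are all hypothetical, and a standard-library queue stands in for the message queue (MQ). It says nothing about how Nanowire itself is implemented.

```python
import queue
import threading


def preprocess(text):
    """Pre-process step: collapse whitespace and lower-case the text."""
    return " ".join(text.split()).lower()


def process(text):
    """Placeholder processing step (hypothetical): count the words."""
    return {"word_count": len(text.split())}


def worker(jobs, storage):
    """Take jobs off the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        doc_id, text = job
        storage[doc_id] = process(preprocess(text))


def run_pipeline(documents, n_workers=2):
    """Ingest documents onto a queue and let workers fill the storage dict."""
    jobs = queue.Queue()   # stands in for the message queue (MQ)
    storage = {}           # stands in for the storage layer
    threads = [threading.Thread(target=worker, args=(jobs, storage))
               for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for item in documents.items():   # ingest
        jobs.put(item)
    for _ in threads:                # one stop sentinel per worker
        jobs.put(None)
    for thread in threads:
        thread.join()
    return storage


if __name__ == "__main__":
    print(run_pipeline({"a.txt": "Thin  layer of\nphytoliths",
                        "b.txt": "Deposit by oven"}))
```

Because each worker only reads from the queue and writes its own result, more workers can be added without changing the ingest side, which is the property the microservice layout is aiming for.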
Nanowire goals
Development targets
DATA PROCESSING CAPABILITY: Ability to process structured and unstructured data
ADAPTABILITY: Built on a microservice architecture to adapt to use cases that constantly evolve
USER EXPERIENCE: Designed for all levels of user, with continual improvement
SCALING: Cloud and infrastructure agnostic, with the ability to scale from hundreds to millions of files
FAST DEPLOYMENT: The ability to change releases quickly on a fast and robust deployment system
TESTED: All components to be tested prior to release in a continuous integration and deployment cycle
OPEN SOURCE: Utilising open source libraries with a permissive licence
CONTAINERISED: All services to be provided as Docker containers by default, with no external dependencies
Introduction
Text mining
Text mining
What to do with this information:
• Mine information for research?
• Develop new products and drive innovation
• Allow reuse of research data?
“The discovery by computer of new, previously unknown information, by automatically
extracting information from different written resources. A key element is the linking ... of
the extracted information ... to form new facts or new hypotheses to be explored
further” (Hearst, 2003)
“An estimated 2.4 million scientific articles published every year” Research Consulting TDM report
Text mining
Extracting information
Choose sources → Extract text → Clean text → Analysis → Clustering → Results
DATABASES, FILES, FOLDERS, OFFICE 365
NATURAL LANGUAGE PROCESSING – ENTITIES, CONCEPTS, TOPICS, KEYWORDS, SENTIMENT
STOP WORD REMOVAL, TOKENISATION
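The "clean text" step named above (tokenisation and stop word removal) can be sketched in a few lines. This is a minimal illustration rather than the pipeline used in the talk: the stop word list is a short, made-up sample, and the function names are hypothetical.

```python
import re
from collections import Counter

# Illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "by", "for", "from", "in", "of", "the", "to"}


def tokenise(text):
    """Lower-case the text and split it into word tokens.

    Hyphenated words such as "silty-loam" are kept as one token.
    """
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def clean(text):
    """Tokenise the text and drop stop words."""
    return [token for token in tokenise(text) if token not in STOP_WORDS]


def keywords(text, top_n=3):
    """Return the top_n most frequent cleaned tokens with their counts."""
    return Counter(clean(text)).most_common(top_n)


if __name__ == "__main__":
    print(clean("Thin layer of phytoliths from the courtyard deposit"))
```

The cleaned tokens are what later stages (keywords, topics, clustering) would typically consume.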
Results
Visualising data
Clusters
Graph databases
Enhanced data storage
JSON Linked Data format
{
  "@context": "http://schema.org",
  "@type": "DigitalDocument",
  "mentions": [
    {
      "@type": "Person",
      "email": "tom.parsons@nottingham.ac.uk"
    },
    {
      "@type": "Thing",
      "url": "http://admire.jiscinvolve.org/wp/"
    }
  ],
  "spatialCoverage": [
    {
      "@type": "Place",
      "name": "Manchester"
    },
    {
      "@type": "Place",
      "name": "British Library"
    },
    {
      "@type": "Place",
      "name": "Nottingham"
    }
  ],
  "keywords": "rdm,project,nottingham,support,research data",
  "inLanguage": {
    "@type": "Language",
    "name": "English"
  },
  "typicalAgeRange": ">=18"
}
ANALYSIS RESULTS VALIDATED JSON-LD
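A document like the one above could be assembled from analysis results along the following lines. This is a sketch under assumptions: the function name `build_jsonld` and its parameters are hypothetical, the email address is a placeholder, and only the schema.org property names that appear on the slide are used.

```python
import json


def build_jsonld(emails, urls, places, keywords, language):
    """Assemble a schema.org DigitalDocument description as a dict."""
    mentions = [{"@type": "Person", "email": email} for email in emails]
    mentions += [{"@type": "Thing", "url": url} for url in urls]
    return {
        "@context": "http://schema.org",
        "@type": "DigitalDocument",
        "mentions": mentions,
        "spatialCoverage": [{"@type": "Place", "name": place}
                            for place in places],
        "keywords": ",".join(keywords),
        "inLanguage": {"@type": "Language", "name": language},
    }


if __name__ == "__main__":
    document = build_jsonld(
        emails=["someone@example.org"],          # placeholder address
        urls=["http://admire.jiscinvolve.org/wp/"],
        places=["Manchester", "British Library", "Nottingham"],
        keywords=["rdm", "project", "research data"],
        language="English",
    )
    print(json.dumps(document, indent=2))
```

Keeping the output as plain JSON means it can be stored alongside the source file and checked with any JSON-LD validator.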
Linking text to data
Relationships between data, articles and people
RESEARCH OUTPUTS
AUTHORS, ACADEMICS, PI/CO-I
UNIVERSITIES, LOCATIONS
Linking text to data
Typical metadata
Linking text to data
Data tables
Data set: https://www.repository.cam.ac.uk/handle/1810/32806
Linking text to data
Automated relationships between data, articles and people
RESEARCH OUTPUTS
AUTHORS, ACADEMICS, PI/CO-I
UNIVERSITIES, LOCATIONS
COMPACT SILTY-LOAM SOIL 2
COURTYARD DEPOSIT BY 2
DEPOSIT BY OVEN 2
DEPOSIT WHITE THIN 2
FI9710 ASHY COURTYARD 2
IIID 5705 FI9710 2
LAYER OF PHYTOLITHS 9
RESIDUE FROM POT 2
RM 4 RESIDUE 2
RM 97 BURNT 2
THIN LAYER OF 2
WHITE LAYER OF 7
WHITE THIN LAYER 2
Citation: Madella, M. (2004). Kilise Tepe Monograph Section F2 Phytolith Data Table 1
Madella, M.
URL: https://www.repository.cam.ac.uk/handle/1810/32806
Places: Europe, Turkey
Organisations: University of Cambridge
Densham, M.
URL: https://www.repository.cam.ac.uk/handle/1810/33130
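The phrase list on this slide appears to be three-word sequences (trigrams) with the number of times each occurs in the data table. Counts of that kind can be produced with a sketch like the one below. The input rows are invented for illustration and are not taken from the Kilise Tepe data set, and the function names are hypothetical.

```python
from collections import Counter


def trigrams(tokens):
    """Return every run of three consecutive tokens as one string."""
    return [" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def count_trigrams(rows):
    """Count trigrams across the text cells of a table, row by row."""
    counts = Counter()
    for row in rows:
        counts.update(trigrams(row.upper().split()))
    return counts


if __name__ == "__main__":
    # Made-up description cells standing in for rows of a data table.
    rows = ["white thin layer of phytoliths", "thin layer of phytoliths"]
    for phrase, count in count_trigrams(rows).most_common():
        print(phrase, count)
```

Trigrams are counted within a row only, so phrases never span two unrelated table cells.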
Search and discovery
Graph databases
RESEARCH OUTPUTS RELATED TO PHYTOLITHS
AUTHORS CONNECTED TO MULTIPLE KILISE TEPE TOPICS
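The two searches captioned above (outputs related to a topic, and authors connected to several topics) are graph traversals. The sketch below mimics them over a plain list of edges instead of a graph database. Everything in it is hypothetical: the edge labels `AUTHORED` and `MENTIONS`, the placeholder author and output names, and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical (subject, relationship, object) edges.
EDGES = [
    ("Author A", "AUTHORED", "output-1"),
    ("Author A", "AUTHORED", "output-2"),
    ("Author B", "AUTHORED", "output-3"),
    ("output-1", "MENTIONS", "phytoliths"),
    ("output-2", "MENTIONS", "pottery"),
    ("output-3", "MENTIONS", "phytoliths"),
]


def outputs_about(edges, topic):
    """Research outputs that mention the given topic."""
    return sorted(subject for subject, relation, obj in edges
                  if relation == "MENTIONS" and obj == topic)


def authors_with_multiple_topics(edges):
    """Authors whose outputs together mention more than one topic."""
    topics_of_output = defaultdict(set)
    for subject, relation, obj in edges:
        if relation == "MENTIONS":
            topics_of_output[subject].add(obj)
    topics_of_author = defaultdict(set)
    for subject, relation, obj in edges:
        if relation == "AUTHORED":
            topics_of_author[subject] |= topics_of_output[obj]
    return sorted(author for author, topics in topics_of_author.items()
                  if len(topics) > 1)


if __name__ == "__main__":
    print(outputs_about(EDGES, "phytoliths"))
    print(authors_with_multiple_topics(EDGES))
```

A graph database would express the same two questions as pattern-matching queries and avoid scanning every edge.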
Results
Visualising data
Discussion
Text mining
• Discuss in groups for 10 minutes:
• Sources of text and data (files, images, video etc.)
• How could text mining be used for RDM?
• What do you struggle with?
• What are the top three priorities?
Introduction
Machine learning and text
Overview
• What is it?
• Why is it needed?
• Why is it useful for research data management?
• How does it work?
• Demo
Machine Learning
What Is It?
Machine Learning
• How does an athlete learn to become good at their sport?
• How does a machine learn how to predict outcomes?
• So what is a machine learning algorithm?
Why Is It Needed?
Machine Learning
Why Is It Useful For RDM?
Machine Learning
FORMS
How Does It Work?
Machine Learning
• Finding the topic of a file using linear regression
20/06/17
Words (x) Topics (y)
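One way to read "words (x), topics (y)" is as a linear model that maps word counts to a score per topic. The sketch below fits such a model by least squares, using simple stochastic gradient descent, and predicts the topic with the highest score. It is a toy under stated assumptions: the training sentences, topic names and function names are invented, and nothing here is claimed to be the method shown in the demo.

```python
def fit(samples, epochs=200, learning_rate=0.1):
    """Fit one linear regression per topic on word counts.

    Each topic is a 0/1 target; weights are learned per (word, topic) pair
    by gradient steps on the squared error.
    """
    vocabulary = sorted({word for text, _ in samples for word in text.split()})
    topics = sorted({topic for _, topic in samples})
    weights = {word: {topic: 0.0 for topic in topics} for word in vocabulary}
    for _ in range(epochs):
        for text, label in samples:
            words = text.split()
            for topic in topics:
                target = 1.0 if topic == label else 0.0
                predicted = sum(weights[word][topic] for word in words)
                error = predicted - target
                for word in words:
                    weights[word][topic] -= learning_rate * error
    return weights, topics


def predict(model, text):
    """Return the topic with the highest linear score for the text."""
    weights, topics = model
    scores = {topic: sum(weights[word][topic]
                         for word in text.split() if word in weights)
              for topic in topics}
    return max(topics, key=lambda topic: scores[topic])


if __name__ == "__main__":
    # Made-up training data: (words, topic).
    training = [
        ("soil deposit layer phytoliths", "archaeology"),
        ("oven deposit residue pot", "archaeology"),
        ("twitter climate sentiment tweets", "social media"),
        ("social media climate change", "social media"),
    ]
    model = fit(training)
    print(predict(model, "thin layer of phytoliths"))
```

Words the model has never seen are ignored at prediction time, so a file is classified only on the vocabulary it shares with the training data.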
Demo
Machine Learning
Introduction
Machine learning and images
Facial recognition
Machine learning across document content
Original image → Convert to grayscale → Extract face → Find possible matches
Evaluation of algorithms: LBPH, Eigenfaces, Fisherfaces
TRAINING THE DATA
Allow a user to search for faces within a document corpus or train the system to recognise individuals
FUTURE
MATCHING FACES IN THE TRAINED MODEL
TRAINING THE MODEL THEN TESTING
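Two of the steps named on this slide can be shown without any imaging library: converting an image to grayscale, and the basic local binary pattern (LBP) operator that LBPH (local binary patterns histograms) builds on. The sketch below is only that: it is not a face detector or recogniser, the luma weights are the commonly used 0.299/0.587/0.114 values (an assumption, since the slide gives none), and the function names are hypothetical.

```python
from collections import Counter


def to_grayscale(rgb_rows):
    """Convert rows of (r, g, b) pixels to rows of gray levels."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_rows]


# Offsets of the 8 neighbours, clockwise from the top-left pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(gray, row, col):
    """8-bit LBP code: one bit per neighbour, set if neighbour >= centre."""
    centre = gray[row][col]
    code = 0
    for d_row, d_col in NEIGHBOURS:
        bit = 1 if gray[row + d_row][col + d_col] >= centre else 0
        code = (code << 1) | bit
    return code


def lbp_histogram(gray):
    """Histogram of LBP codes over all pixels that have 8 neighbours."""
    return Counter(lbp_code(gray, row, col)
                   for row in range(1, len(gray) - 1)
                   for col in range(1, len(gray[0]) - 1))


if __name__ == "__main__":
    gray = [[5, 9, 1],
            [4, 6, 7],
            [2, 3, 8]]
    print(lbp_code(gray, 1, 1))
```

In an LBPH-style recogniser, histograms like this are computed per image region and compared between a new face and the trained ones.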
Facial recognition
Sometimes makes mistakes…
Image classifiers
TensorFlow machine learning
["submarine, pigboat, sub, U-boat", "0.989818"],
["indri, indris, Indri indri, Indri brevicaudatus", "0.00165158"],
["killer whale, killer, orca, grampus, sea wolf, Orcinus orca", "8.52245e-05"],
["steam locomotive", "8.31971e-05"]]},
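Output of this shape, a list of label and score pairs with the scores held as strings, needs a small amount of handling before it can be used as metadata. The sketch below picks the highest-scoring label and discards it if the score is too low. The function name and the 0.5 threshold are assumptions chosen for illustration; the values are the ones shown on the slide.

```python
# Label and score pairs as shown on the slide (scores are strings).
RESULTS = [
    ["submarine, pigboat, sub, U-boat", "0.989818"],
    ["indri, indris, Indri indri, Indri brevicaudatus", "0.00165158"],
    ["killer whale, killer, orca, grampus, sea wolf, Orcinus orca",
     "8.52245e-05"],
    ["steam locomotive", "8.31971e-05"],
]


def top_prediction(results, threshold=0.5):
    """Return (label, score) for the best result, or None if below threshold."""
    scored = [(label, float(score)) for label, score in results]
    label, score = max(scored, key=lambda pair: pair[1])
    return (label, score) if score >= threshold else None


if __name__ == "__main__":
    print(top_prediction(RESULTS))
```

Returning nothing for low-confidence images is one way to keep weak guesses out of a repository's metadata.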
Review
Machine Learning
• What is it?
• Why is it needed?
• Why is it useful for research data management?
• How does it work?
Machine learning exercise
Discussion
Discuss in groups (10 mins):
• How could machine learning be used for RDM?
• Improving RDM:
• What are the ‘painful’ manual tasks?
• What could be improved?
• What are the top three priorities?
Beyond an RDM repository
The future?
Spotlight Data
The future
• Deploy a text mining/machine learning system within the UK Government
• Develop the ‘next generation’ of data repository
• Mining data repositories and OA outputs
• Office 365 mining and optimisation
• Analysis of the data
EMAIL
mitch@spotlightdata.co.uk
tom@spotlightdata.co.uk
REGISTERED OFFICE
The Ingenuity Centre,
University of Nottingham Innovation Park,
Triumph Road, Nottingham,
NG7 2TU.
Strategic KM Ltd is a Company Registered in England and Wales,
Reg No. 06433359

Honasa Consumer Limited Impact Report 2024.pdfSocial Samosa
 
Peace-Conflict-and-National-Adaptation-Plan-NAP-Processes-.pdf
Peace-Conflict-and-National-Adaptation-Plan-NAP-Processes-.pdfPeace-Conflict-and-National-Adaptation-Plan-NAP-Processes-.pdf
Peace-Conflict-and-National-Adaptation-Plan-NAP-Processes-.pdfNAP Global Network
 
Call Girls Mehsana / 8250092165 Genuine Call girls with real Photos and Number
Call Girls Mehsana / 8250092165 Genuine Call girls with real Photos and NumberCall Girls Mehsana / 8250092165 Genuine Call girls with real Photos and Number
Call Girls Mehsana / 8250092165 Genuine Call girls with real Photos and NumberSareena Khatun
 
1935 CONSTITUTION REPORT IN RIPH FINALLS
1935 CONSTITUTION REPORT IN RIPH FINALLS1935 CONSTITUTION REPORT IN RIPH FINALLS
1935 CONSTITUTION REPORT IN RIPH FINALLSarandianics
 
Make a difference in a girl's life by donating to her education!
Make a difference in a girl's life by donating to her education!Make a difference in a girl's life by donating to her education!
Make a difference in a girl's life by donating to her education!SERUDS INDIA
 
Scaling up coastal adaptation in Maldives through the NAP process
Scaling up coastal adaptation in Maldives through the NAP processScaling up coastal adaptation in Maldives through the NAP process
Scaling up coastal adaptation in Maldives through the NAP processNAP Global Network
 
Dating Call Girls inBaloda Bazar Bhatapara 9332606886Call Girls Advance Cash...
Dating Call Girls inBaloda Bazar Bhatapara  9332606886Call Girls Advance Cash...Dating Call Girls inBaloda Bazar Bhatapara  9332606886Call Girls Advance Cash...
Dating Call Girls inBaloda Bazar Bhatapara 9332606886Call Girls Advance Cash...kumargunjan9515
 
31st World Press Freedom Day - A Press for the Planet: Journalism in the face...
31st World Press Freedom Day - A Press for the Planet: Journalism in the face...31st World Press Freedom Day - A Press for the Planet: Journalism in the face...
31st World Press Freedom Day - A Press for the Planet: Journalism in the face...Christina Parmionova
 
NGO working for orphan children’s education
NGO working for orphan children’s educationNGO working for orphan children’s education
NGO working for orphan children’s educationSERUDS INDIA
 
31st World Press Freedom Day Conference in Santiago.
31st World Press Freedom Day Conference in Santiago.31st World Press Freedom Day Conference in Santiago.
31st World Press Freedom Day Conference in Santiago.Christina Parmionova
 

Recently uploaded (20)

Contributi dei parlamentari del PD - Contributi L. 3/2019
Contributi dei parlamentari del PD - Contributi L. 3/2019Contributi dei parlamentari del PD - Contributi L. 3/2019
Contributi dei parlamentari del PD - Contributi L. 3/2019
 
sponsor for poor old age person food.pdf
sponsor for poor old age person food.pdfsponsor for poor old age person food.pdf
sponsor for poor old age person food.pdf
 
Unique Value Prop slide deck________.pdf
Unique Value Prop slide deck________.pdfUnique Value Prop slide deck________.pdf
Unique Value Prop slide deck________.pdf
 
Top profile Call Girls In Haldia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Haldia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Haldia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Haldia [ 7014168258 ] Call Me For Genuine Models We...
 
Vasai Call Girls In 07506202331, Nalasopara Call Girls In Mumbai
Vasai Call Girls In 07506202331, Nalasopara Call Girls In MumbaiVasai Call Girls In 07506202331, Nalasopara Call Girls In Mumbai
Vasai Call Girls In 07506202331, Nalasopara Call Girls In Mumbai
 
3 May, Journalism in the face of the Environmental Crisis.
3 May, Journalism in the face of the Environmental Crisis.3 May, Journalism in the face of the Environmental Crisis.
3 May, Journalism in the face of the Environmental Crisis.
 
Just Call VIP Call Girls In Bangalore Kr Puram ☎️ 6378878445 Independent Fem...
Just Call VIP Call Girls In  Bangalore Kr Puram ☎️ 6378878445 Independent Fem...Just Call VIP Call Girls In  Bangalore Kr Puram ☎️ 6378878445 Independent Fem...
Just Call VIP Call Girls In Bangalore Kr Puram ☎️ 6378878445 Independent Fem...
 
31st World Press Freedom Day Conference.
31st World Press Freedom Day Conference.31st World Press Freedom Day Conference.
31st World Press Freedom Day Conference.
 
2024 asthma jkdjkfjsdklfjsdlkfjskldfgdsgerg
2024 asthma jkdjkfjsdklfjsdlkfjskldfgdsgerg2024 asthma jkdjkfjsdklfjsdlkfjskldfgdsgerg
2024 asthma jkdjkfjsdklfjsdlkfjskldfgdsgerg
 
Tuvalu Coastal Adaptation Project (TCAP)
Tuvalu Coastal Adaptation Project (TCAP)Tuvalu Coastal Adaptation Project (TCAP)
Tuvalu Coastal Adaptation Project (TCAP)
 
Honasa Consumer Limited Impact Report 2024.pdf
Honasa Consumer Limited Impact Report 2024.pdfHonasa Consumer Limited Impact Report 2024.pdf
Honasa Consumer Limited Impact Report 2024.pdf
 
Peace-Conflict-and-National-Adaptation-Plan-NAP-Processes-.pdf
Peace-Conflict-and-National-Adaptation-Plan-NAP-Processes-.pdfPeace-Conflict-and-National-Adaptation-Plan-NAP-Processes-.pdf
Peace-Conflict-and-National-Adaptation-Plan-NAP-Processes-.pdf
 
Call Girls Mehsana / 8250092165 Genuine Call girls with real Photos and Number
Call Girls Mehsana / 8250092165 Genuine Call girls with real Photos and NumberCall Girls Mehsana / 8250092165 Genuine Call girls with real Photos and Number
Call Girls Mehsana / 8250092165 Genuine Call girls with real Photos and Number
 
1935 CONSTITUTION REPORT IN RIPH FINALLS
1935 CONSTITUTION REPORT IN RIPH FINALLS1935 CONSTITUTION REPORT IN RIPH FINALLS
1935 CONSTITUTION REPORT IN RIPH FINALLS
 
Make a difference in a girl's life by donating to her education!
Make a difference in a girl's life by donating to her education!Make a difference in a girl's life by donating to her education!
Make a difference in a girl's life by donating to her education!
 
Scaling up coastal adaptation in Maldives through the NAP process
Scaling up coastal adaptation in Maldives through the NAP processScaling up coastal adaptation in Maldives through the NAP process
Scaling up coastal adaptation in Maldives through the NAP process
 
Dating Call Girls inBaloda Bazar Bhatapara 9332606886Call Girls Advance Cash...
Dating Call Girls inBaloda Bazar Bhatapara  9332606886Call Girls Advance Cash...Dating Call Girls inBaloda Bazar Bhatapara  9332606886Call Girls Advance Cash...
Dating Call Girls inBaloda Bazar Bhatapara 9332606886Call Girls Advance Cash...
 
31st World Press Freedom Day - A Press for the Planet: Journalism in the face...
31st World Press Freedom Day - A Press for the Planet: Journalism in the face...31st World Press Freedom Day - A Press for the Planet: Journalism in the face...
31st World Press Freedom Day - A Press for the Planet: Journalism in the face...
 
NGO working for orphan children’s education
NGO working for orphan children’s educationNGO working for orphan children’s education
NGO working for orphan children’s education
 
31st World Press Freedom Day Conference in Santiago.
31st World Press Freedom Day Conference in Santiago.31st World Press Freedom Day Conference in Santiago.
31st World Press Freedom Day Conference in Santiago.
 

Text mining and machine learning

  • 1. Extract – Analyse – Search - Visualise Text mining and machine learning for Research Data Management Dr Tom Parsons and Mitchell Murphy 28/06/2017
  • 2. About Spotlight Data: rapid development of innovative products. Our agile cross-functional team: Dr. Tom Parsons (co-founder, RDM, knowledge management); Will Evans (React.js panel and Node.js); Dr. Stuart Bowe & Mitch Murphy (Python/R data scientist, machine learning and computer vision); Tim Venison (co-founder, software delivery); Barnaby Keene (Python, architecture, processing pipeline). Pool of associates and placements: developers, architects and researchers.
  • 3. What we do: data science (Dark Data, Data Mining, Data Visualisation, Artificial Intelligence, Analysis Tools). Gathering and making sense of unstructured data captured from a variety of sources. Mining data from archives, websites, social media and API sources. We use charting, network graphs, maps and other techniques for data investigation. We utilise machine learning techniques to extract and investigate data. From simple interfaces and powerful searches to end-to-end large-scale processing systems.
  • 4. Spotlight Data projects: • Large project with the UK Government and Durham University: applying text mining and machine learning to large data sets and document corpora • Twitter and social media mining for ESRC Climate Change project • Sensor data analysis and machine learning
  • 5. The Nanowire system: cloud or on premise; microservice containerised architecture. Diagram components: Ingest, Process, Discover, user panels, pre-process, MQ, workers, storage. Data processing: Natural Language Processing, text mining, classifiers, pattern recognition.
  • 6. Nanowire goals (development targets). Data processing capability: ability to process structured and unstructured data. Adaptability: built to adapt to use cases that constantly evolve through a microservice architecture. User experience: design for all levels of users with continual improvement. Scaling: cloud and infrastructure agnostic with the ability to scale from 100s to millions of files. Fast deployment: the ability to quickly change releases on a fast and robust deployment system. Tested: all components to be tested prior to release in a continuous integration and deployment cycle. Open source: utilising open source libraries with a permissive licence. Containerised: all services to be provided as Docker containers by default, with no external dependencies.
  • 8. Text mining: “The discovery by computer of new, previously unknown information, by automatically extracting information from different written resources. A key element is the linking ... of the extracted information ... to form new facts or new hypotheses to be explored further” (Hearst, 2003). “An estimated 2.4 million scientific articles published every year” (Research Consulting TDM report). What to do with this information: • Mine information for research? • Develop new products and drive innovation • Allow reuse of research data?
  • 9. Text mining: extracting information. Choose sources (databases, files, folders, Office 365) → Extract text → Clean text (stop word removal, tokenisation) → Analysis (natural language processing: entities, concepts, topics, keywords, sentiment) → Clustering → Results
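The "clean text" and "analysis" steps on slide 9 can be illustrated with a minimal sketch. It is not the Nanowire implementation: the stop word list, the function names and the use of simple word counts as "keywords" are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list; real pipelines use larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "on", "with"}


def tokenise(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def clean(tokens):
    """Remove stop words from a list of tokens."""
    return [token for token in tokens if token not in STOP_WORDS]


def top_keywords(text, n=3):
    """Return the n most frequent non-stop-word tokens as candidate keywords."""
    counts = Counter(clean(tokenise(text)))
    return [word for word, _ in counts.most_common(n)]
```

Calling `top_keywords("Research data and the management of research data", 2)` returns the two most frequent remaining words. Entity, concept and sentiment extraction would need a proper NLP library and are out of scope for this sketch.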
  • 12. Enhanced data storage: JSON Linked Data format. Analysis results as validated JSON-LD: { "@context": "http://schema.org", "@type": "DigitalDocument", "mentions": [ { "@type": "Person", "email": "tom.parsons@nottingham.ac.uk" }, { "@type": "Thing", "url": "http://admire.jiscinvolve.org/wp/" } ], "spatialCoverage": [ { "@type": "Place", "name": "Manchester" }, { "@type": "Place", "name": "British Library" }, { "@type": "Place", "name": "Nottingham" } ], "keywords": "rdm,project,nottingham,support,research data", "inLanguage": { "@type": "Language", "name": "English" }, "typicalAgeRange": ">=18" }
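As a rough sketch of how analysis results might be assembled into a document like the one on slide 12, the following uses only Python's standard `json` module. The function names are hypothetical, and `has_required_keys` is a simple structural check introduced here for illustration; it is not full JSON-LD validation and is not taken from the slides.

```python
import json


def build_document(keywords, places, language):
    """Assemble a schema.org DigitalDocument dictionary from analysis results."""
    return {
        "@context": "http://schema.org",
        "@type": "DigitalDocument",
        "spatialCoverage": [{"@type": "Place", "name": place} for place in places],
        "keywords": ",".join(keywords),
        "inLanguage": {"@type": "Language", "name": language},
    }


def has_required_keys(doc):
    """Illustrative check only: confirm the JSON-LD context and type are present."""
    return all(key in doc for key in ("@context", "@type"))


doc = build_document(["rdm", "project"], ["Manchester", "Nottingham"], "English")
serialised = json.dumps(doc, indent=2)  # text form suitable for storage
```

Storing the result as plain JSON text keeps it readable by any JSON parser, while the `@context` and `@type` keys let linked-data tools interpret the same record.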
  • 13. Linking text to data: relationships between data, articles and people. Research outputs; authors, academics, PI/Co-I; universities, locations.
  • 14. Linking text to data: typical metadata.
  • 15. Linking text to data: data tables. Data set: https://www.repository.cam.ac.uk/handle/1810/32806
  • 16. Linking text to data: automated relationships between data, articles and people. Research outputs; authors, academics, PI/Co-I; universities, locations. Phrases with counts: COMPACT SILTY-LOAM SOIL (2), COURTYARD DEPOSIT BY (2), DEPOSIT BY OVEN (2), DEPOSIT WHITE THIN (2), FI9710 ASHY COURTYARD (2), IIID 5705 FI9710 (2), LAYER OF PHYTOLITHS (9), RESIDUE FROM POT (2), RM 4 RESIDUE (2), RM 97 BURNT (2), THIN LAYER OF (2), WHITE LAYER OF (7), WHITE THIN LAYER (2). Citation: Madella, M. (2004). Kilise Tepe Monograph Section F2 Phytolith Data Table 1. Madella, M. URL: https://www.repository.cam.ac.uk/handle/1810/32806 Places: Europe, Turkey. Organisations: University of Cambridge. Densham, M. URL: https://www.repository.cam.ac.uk/handle/1810/33130
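The three-word phrases with counts on slide 16 look like word trigram frequencies taken from the text cells of a data table, although the slides do not say how they were produced. Under that assumption, a minimal sketch of the counting step follows; the sample rows are made up for illustration and are not from the Kilise Tepe data set.

```python
from collections import Counter


def ngram_counts(rows, n=3):
    """Count word n-grams across text rows, without crossing row boundaries."""
    counts = Counter()
    for row in rows:
        words = row.upper().split()
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


# Made-up sample rows standing in for the description column of a data table.
rows = [
    "white thin layer of phytoliths",
    "thin layer of ash",
    "layer of phytoliths",
]
counts = ngram_counts(rows)
```

Phrases that recur across rows, such as `counts["THIN LAYER OF"]`, could then be used as candidate terms for linking a data table to related articles and authors.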
  • 17. Search and discovery: graph databases. Research outputs related to phytoliths; authors connected to multiple Kilise Tepe topics.
  • 19. Discussion: text mining. Discuss in groups for 10 minutes: • Sources of text and data (files, images, video etc.) • How could text mining be used for RDM? • What do you struggle with? • What are the top three priorities?
  • 21. Machine Learning: overview • What is it? • Why is it needed? • Why is it useful for research data management? • How does it work? • Demo
  • 22. Machine Learning: what is it? • How does an athlete learn to become good at their sport? • How does a machine learn how to predict outcomes? • So what is a machine learning algorithm?
  • 23. Machine Learning: why is it needed?
  • 24. Machine Learning: why is it useful for RDM? Forms.
  • 25. Machine Learning: how does it work? • Finding the topic of a file using linear regression, with words as the input (x) and topics as the output (y). 20/06/17
  • 28. Facial recognition: machine learning across document content. Original image → Convert to grayscale → Extract face → Find possible matches. Training the data: evaluation of algorithms (LBPH, Eigenfaces, Fisherfaces). Training the model then testing; matching faces in the trained model. Future: allow a user to search for faces within a document corpus or train the system to recognise individuals.
  • 29. Facial recognition: sometimes makes mistakes…
  • 30. Image classifiers: TensorFlow machine learning. Example output: ["submarine, pigboat, sub, U-boat", "0.989818"], ["indri, indris, Indri indri, Indri brevicaudatus", "0.00165158"], ["killer whale, killer, orca, grampus, sea wolf, Orcinus orca", "8.52245e-05"], ["steam locomotive", "8.31971e-05"]]},
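The classifier output on slide 30 pairs each label with a score stored as a string. A small sketch of how such output might be consumed is shown below: it picks the highest-scoring label and discards it if the score falls under a cut-off. The function name and the 0.5 threshold are assumptions for illustration; the label and score pairs are copied from the slide.

```python
def top_prediction(predictions, threshold=0.5):
    """Return the (label, score) pair with the highest score, or None if below threshold."""
    label, score = max(
        ((label, float(score)) for label, score in predictions),
        key=lambda pair: pair[1],
    )
    return (label, score) if score >= threshold else None


# Label and score pairs as shown on the slide (scores are strings in the output).
predictions = [
    ["submarine, pigboat, sub, U-boat", "0.989818"],
    ["indri, indris, Indri indri, Indri brevicaudatus", "0.00165158"],
    ["killer whale, killer, orca, grampus, sea wolf, Orcinus orca", "8.52245e-05"],
    ["steam locomotive", "8.31971e-05"],
]
best = top_prediction(predictions)
```

A threshold of this kind is one simple way to avoid tagging documents with low-confidence labels; the right value would depend on the corpus and how costly a wrong tag is.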
  • 31. Machine Learning: review • What is it? • Why is it needed? • Why is it useful for research data management? • How does it work? 20/06/17
  • 32. Machine learning exercise: discussion. Discuss in groups (10 mins): • How could machine learning be used for RDM? • Improving RDM: • What are the 'painful' manual tasks? • What could be improved? • What are the top three priorities?
  • 33. Beyond an RDM repository: the future?
  • 34. Spotlight Data: the future • Deploy text mining/machine learning system within the UK Government • Develop the 'next-generation' of data repository • Mining data repositories and OA outputs • Office365 mining and optimisation • Analysis of the data
  • 35. Email: mitch@spotlightdata.co.uk, tom@spotlightdata.co.uk. Registered office: The Ingenuity Centre, University of Nottingham Innovation Park, Triumph Road, Nottingham, NG7 2TU. Strategic KM Ltd is a Company Registered in England and Wales, Reg No. 06433359