SubSift web services and workflows for profiling
and comparing scientists and their published works
Simon Price, Peter Flach, Sebastian Spiegler,
Christopher Bailey and Nikki Rogers
Outline of this paper
1. SubSift – submission sifting
2. Background Theory: Vector Space Model
3. SubSift REST API
4. Demonstration Workflows
5. Conclusions
1. SubSift – submission sifting
SubSift
SubSift is a prototype
application to support
academic peer review.
SubSift matches submitted
conference/journal papers to
potential peer reviewers
based on similarity to
published works.
Website:
http://subsift.ilrt.bris.ac.uk
SubSift has been used for...
15
Contribution of this work
SubSift RESTful web services:
• Open Source software (on Google Code)
• Hosted open web service at University of Bristol
Re-usable workflows for profiling and comparing scientists
and their published works.
Tool for constructing, manipulating and publishing
document-centric datasets.
Related Work
• SubSift uses techniques more normally associated with
Information Retrieval
• Full text search tools support text matching on large-scale
document collections
e.g. Apache Lucene, PostgreSQL, Oracle UltraSearch
Designed for 1:M matching, but can also do Cartesian-product M:M matching.
• How SubSift differs:
• Exposes detailed metadata throughout.
• Partly a research tool: need to plug in + instrument new algorithms.
• Fewer licensing restrictions and dependencies for open source.
2. Background Theory: Vector Space Model
Vector Space Model (from Information Retrieval)
Vector Space Model consists of:
• bag-of-words representation
• cosine similarity
• tf-idf weighting
For a query (q), rank the documents (dj) in collection (D) by
descending similarity to the query.
Vector Space Model: bag-of-words representation
no. terms in each abstract
no. terms in DBLP author page of each PC member
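A minimal sketch of a bag-of-words representation: each text becomes a mapping from term to count, and word order is discarded. The tokenizer (lowercasing and splitting on non-alphanumeric characters) and the sample text are assumptions made for illustration, not SubSift's actual preprocessing.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return a term -> count mapping for the text, ignoring word order."""
    # Assumed tokenizer: lowercase, then keep runs of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


# Illustrative text standing in for an abstract or an author page.
profile = bag_of_words("Peer review matches papers to peer reviewers")
```

Here `profile["peer"]` is 2 and every other term has count 1; the number of distinct terms per abstract or per author page is what the two labels above refer to.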
Vector Space Model: cosine similarity
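In its standard form, cosine similarity is the dot product of two term-weight vectors divided by the product of their lengths: cos(q, dj) = (q · dj) / (|q| |dj|). The sketch below assumes sparse vectors stored as dictionaries; the function name, the sample weights and the choice to return 0 for an empty vector are illustrative assumptions rather than SubSift's implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term-weight vectors (dicts)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention for empty vectors
    return dot / (norm_u * norm_v)


# Rank a toy collection by descending similarity to the query.
query = {"learning": 1.0, "ranking": 1.0}
docs = {
    "d1": {"learning": 2.0, "kernels": 1.0},
    "d2": {"ranking": 1.0, "learning": 1.0},
}
ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)
```

Because the measure depends only on the angle between vectors, a long document and a short one with the same term proportions score identically against a query.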
Vector Space Model: tf-idf weighting
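A minimal sketch of tf-idf weighting, assuming one common variant: raw term frequency multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the term. Which variant SubSift uses is not stated here, so the formula and the function name are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        weighted.append({term: tf[term] * math.log(n / df[term]) for term in tf})
    return weighted


weights = tf_idf([["learning", "ranking"], ["learning", "kernels"]])
```

In this toy collection "learning" occurs in every document and so receives weight 0, while "ranking" and "kernels" each occur in only one document and receive a positive weight.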
Representational State Transfer (REST)
“RESTful” web services:
• URIs to represent resources
• HTTP POST/GET/PUT/DELETE correspond to usual
Create/Read/Update/Delete (CRUD) operations
• Response formats typically include: XML, JSON, CSV
REST is a design pattern for web services based on HTTP using its
familiar URIs, requests, responses, authentication, etc.
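The CRUD-to-HTTP correspondence above can be sketched with Python's standard library. The example only builds request objects and never sends them; the URI is a hypothetical placeholder, not a SubSift endpoint, and `build_request` is an illustrative helper.

```python
from urllib.request import Request

# CRUD operation -> HTTP method, following the correspondence listed above.
CRUD_TO_HTTP = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def build_request(operation, uri, body=None):
    """Build (but do not send) an HTTP request for a CRUD operation on a URI."""
    data = body.encode("utf-8") if body is not None else None
    return Request(uri, data=data, method=CRUD_TO_HTTP[operation])


# Hypothetical resource URI used purely for illustration.
req = build_request("update", "http://example.org/documents/42", body="new text")
```

The same URI identifies the resource in every case; only the HTTP method changes with the operation.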
3. SubSift REST API
SubSift System Architecture
[Diagram: SubSift REST API (formats: XML, CSV, Terms, JSON, YAML, RDF), SubSift Harvester, Web, Filestore, XSLT, Client]
SubSift REST API
Profiles
Matches
SubSift – canonical workflow
4. Demonstration Workflows
Workflow 1 – Submission Sifting
Workflow 1 – Web 2.0 Client Implementation
Workflow 1 – Papers is just a list of URLs (e.g. Yahoo! Pipes)
Workflow 2 – Finding an Expert
Finding an expert
Workflow 3 – Visualising Similarity
Clustering staff based on homepage similarity
Dendrogram produced in Matlab from SubSift generated similarity matrix
Precision-recall at different thresholds
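A minimal sketch of how precision and recall can be computed at a given similarity threshold. It assumes a set of pairs judged to be truly related is available as ground truth; the scores, the pairs and the function name are invented for illustration and are not results from SubSift.

```python
def precision_recall(scores, relevant, threshold):
    """scores: {pair: similarity}; relevant: set of pairs judged truly related."""
    predicted = {pair for pair, score in scores.items() if score >= threshold}
    true_positives = len(predicted & relevant)
    # Assumed convention: an empty prediction or relevance set scores 1.0.
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(relevant) if relevant else 1.0
    return precision, recall


# Illustrative similarity scores and ground-truth pairs.
scores = {("a", "b"): 0.9, ("a", "c"): 0.4, ("b", "c"): 0.1}
relevant = {("a", "b"), ("b", "c")}
curve = [precision_recall(scores, relevant, t) for t in (0.5, 0.3, 0.0)]
```

Lowering the threshold admits more pairs, which can only keep or raise recall but typically lowers precision; sweeping the threshold traces out the trade-off.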
Similarity networks
Diagram created by Graphviz from SubSift generated dot file
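A minimal sketch of turning a similarity matrix into a Graphviz dot file, drawing an edge for each pair whose similarity reaches a threshold. The names, matrix values, threshold and edge labels are assumptions for illustration; the dot files SubSift generates may be structured differently.

```python
def similarity_to_dot(names, matrix, threshold):
    """Return DOT text with an undirected edge for each pair at or above threshold."""
    lines = ["graph similarity {"]
    for name in names:
        lines.append(f'  "{name}";')
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if matrix[i][j] >= threshold:
                lines.append(
                    f'  "{names[i]}" -- "{names[j]}" [label="{matrix[i][j]:.2f}"];'
                )
    lines.append("}")
    return "\n".join(lines)


# Illustrative symmetric similarity matrix for three people.
names = ["A", "B", "C"]
matrix = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
dot_text = similarity_to_dot(names, matrix, threshold=0.5)
```

Saving `dot_text` to a file and rendering it with a Graphviz layout program produces a network in which only sufficiently similar pairs are connected.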
Connectivity
Diagram created by Graphviz from SubSift generated dot file
Workflow 4 – Profiling Reading Lists
Profiling a research group by its publications
Diagram produced in Wordle using SubSift profile data
Workflow 5 – Ranking News Stories
And finally...
Future Work
• Scaling-up
• Currently a small-scale web application running on modest hardware.
• Plans to migrate to a larger-scale HPC application at Bristol.
• ExaMiner project
• Mining and mapping the University of Bristol’s research landscape.
• Crawling the University’s web pages to profile and visualise research interests
of and similarities between faculty, departments, research groups and
researchers.
• Plans to apply to websites of other Universities.
5. Conclusions
Conclusion
• SubSift Services useful outside of peer review domain
• Workflows for profiling/comparing scientists
• Promising e-Science and e-Research use cases for profiling and comparing scientists and their published works.
• Tool for constructing, manipulating and publishing document-centric datasets
• E.g. information retrieval, data mining, pattern analysis research.
• Publication of datasets in this way supports reproducibility of science.
• Connects data through Linked Data and the Semantic Web.

Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 

SubSift web services and workflows for profiling and comparing scientists and their published works

  • 1. SubSift web services and workflows for profiling and comparing scientists and their published works. Simon Price, Peter Flach, Sebastian Spiegler, Christopher Bailey and Nikki Rogers
  • 2. Outline of this paper: 1. SubSift – submission sifting; 2. Background Theory: Vector Space Model; 3. SubSift REST API; 4. Demonstration Workflows; 5. Conclusions
  • 3. Section 1: SubSift – submission sifting
  • 4. SubSift: a prototype application to support academic peer review. SubSift matches submitted conference/journal papers to potential peer reviewers based on similarity to published works. Website: http://subsift.ilrt.bris.ac.uk
  • 5. SubSift has been used for... 15
  • 6. Contribution of this work. SubSift RESTful web services: • Open Source software (on Google Code) • Hosted open web service at University of Bristol. Re-usable workflows for profiling and comparing scientists and their published works. Tool for constructing, manipulating and publishing document-centric datasets.
  • 7. Related Work • SubSift uses techniques more normally associated with Information Retrieval. • Full-text search tools support text matching on large-scale document collections, e.g. Apache Lucene, PostgreSQL, Oracle UltraSearch. These are designed for 1:M matching but can also do Cartesian product M:M matching. • How SubSift differs: • Exposes detailed metadata throughout. • Partly a research tool: need to plug in and instrument new algorithms. • Fewer licensing restrictions and dependencies for open source.
  • 8. Section 2: Background Theory: Vector Space Model
  • 9. Vector Space Model (from Information Retrieval). The Vector Space Model consists of: • bag-of-words representation • cosine similarity • tf-idf weighting. For a query (q), rank the documents (dj) in collection (D) by descending similarity to the query.
  • 10. Vector Space Model: bag-of-words representation (figure labels: no. of terms in each abstract; no. of terms in the DBLP author page of each PC member)
  • 11. Vector Space Model: cosine similarity
  • 12. Vector Space Model: tf-idf weighting
  • 13. Representational State Transfer (REST). REST is a design pattern for web services based on HTTP using its familiar URIs, requests, responses, authentication, etc. "RESTful" web services: • URIs to represent resources • HTTP POST/GET/PUT/DELETE correspond to the usual Create/Read/Update/Delete (CRUD) operations • Response formats typically include: XML, JSON, CSV
  • 14. Section 3: SubSift REST API
  • 15. SubSift System Architecture (diagram labels: SubSift REST API; XML, CSV, Terms, JSON, YAML, RDF; Web; Filestore; SubSift Harvester; XSLT; Client)
  • 20. Section 4: Demonstration Workflows
  • 21. Workflow 1 – Submission Sifting
  • 22. Workflow 1 – Web 2.0 Client Implementation
  • 23. Workflow 1 – Papers is just a list of URLs (e.g. Yahoo! Pipes)
  • 24. Workflow 2 – Finding an Expert
  • 27. Clustering staff based on homepage similarity. Dendrogram produced in Matlab from a SubSift-generated similarity matrix.
  • 29. Similarity networks. Diagram created by Graphviz from a SubSift-generated dot file.
  • 30. Connectivity. Diagram created by Graphviz from a SubSift-generated dot file.
  • 31. Workflow 4 – Profiling Reading Lists
  • 32. Profiling a research group by its publications. Diagram produced in Wordle using SubSift profile data.
  • 33. Workflow 5 – Ranking News Stories
  • 35. Future Work • Scaling-up • Currently a small-scale web application running on modest hardware. • Plans to migrate to a larger-scale HPC application at Bristol. • ExaMiner project • Mining and mapping the University of Bristol’s research landscape. • Crawling the University’s web pages to profile and visualise research interests of and similarities between faculty, departments, research groups and researchers. • Plans to apply to websites of other Universities.
  • 36. Section 5: Conclusions
  • 37. Conclusion • SubSift services are useful outside of the peer review domain. • Workflows for profiling/comparing scientists: promising e-Science and e-Research use cases for profiling and comparing scientists and their published works. • Tool for constructing, manipulating and publishing document-centric datasets, e.g. for information retrieval, data mining, pattern analysis research. Publication of datasets in this way supports reproducibility of science. Connects data through Linked Data and the Semantic Web.
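
The Vector Space Model named in slides 9 to 12 (bag-of-words representation, tf-idf weighting, cosine similarity, ranking documents by descending similarity to a query) can be illustrated with a minimal Python sketch. This is not SubSift's implementation: the function names, the whitespace tokenizer, the unsmoothed `log(N/df)` idf variant, and the choice to include the query in the collection when computing document frequencies are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenizer, no stemming or stop words.
    return text.lower().split()


def tf_idf_vectors(texts):
    """Turn each text into a sparse {term: tf-idf weight} dictionary."""
    bags = [Counter(tokenize(t)) for t in texts]           # bag-of-words per text
    n = len(bags)
    df = Counter(term for bag in bags for term in bag)     # document frequency
    vectors = []
    for bag in bags:
        total = sum(bag.values())
        vectors.append({
            term: (count / total) * math.log(n / df[term])  # tf * idf
            for term, count in bag.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, documents):
    """Return (document index, score) pairs sorted by descending similarity."""
    # Assumption: the query is weighted against the same collection as the documents.
    vectors = tf_idf_vectors([query] + documents)
    q, docs = vectors[0], vectors[1:]
    scores = [(i, cosine(q, d)) for i, d in enumerate(docs)]
    return sorted(scores, key=lambda pair: -pair[1])


if __name__ == "__main__":
    papers = [
        "machine learning for text mining",
        "protein folding simulation",
        "text mining retrieval",
    ]
    for index, score in rank("text mining", papers):
        print(index, round(score, 3))
```

In the peer-review setting described in the slides, the "query" would be something like a submitted abstract and the "documents" the reviewers' publication profiles; comparing every abstract against every profile yields the kind of similarity matrix mentioned in the clustering workflow.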