Charting the Digital Library Evaluation
Domain with a Semantically Enhanced
Mining Methodology
Eleni Afiontzi,1 Giannis Kazadeis,1 Leonidas Papachristopoulos,2
Michalis Sfakakis,2 Giannis Tsakonas,2 Christos Papatheodorou2
13th ACM/IEEE Joint Conference on Digital Libraries, July 22-26, Indianapolis, IN, USA
1. Department of Informatics, Athens University of Economics & Business
2. Database & Information Systems Group, Department of Archives & Library Science, Ionian University
aim & scope of research
• To propose a methodology for discovering patterns in the
scientific literature.
• Our case study is performed in the digital library evaluation
domain and its conference literature.
• We ask:
- how to select relevant studies,
- how to annotate them,
- how to discover these patterns
in an effective, machine-operated way that yields reusable
and interpretable data.
why
• Abundance of scientific information
• Limitations of existing tools, e.g. in terms of reusability
• Lack of contextualized analytic tools
• Supervised automated processes
panorama
1. Document classification to identify relevant papers
- We use a corpus of 1,824 papers from the JCDL and ECDL
(now TPDL) conferences, 2001-2011.
2. Semantic annotation processes to mark up important concepts
- We use a schema for semantic annotation, the Digital Library
Evaluation Ontology, and a semantic annotation tool,
GoNTogle.
3. Clustering to form coherent groups (K=11)
4. Interpretation with the assistance of the ontology schema
• Throughout this process we run benchmarking tests to assess which
components can most effectively automate the exploration of
the literature and the discovery of research patterns.
part 1
how we identify relevant studies
training phase
• The aim was to train a classifier to identify relevant papers.
• Categorization
- two researchers categorized the papers; a third one supervised
- descriptors: title, abstract & author keywords
- raters’ percentage agreement: 82.96% for JCDL, 78% for ECDL
- inter-rater agreement: moderate levels of Cohen’s Kappa
- 12% positive vs. 88% negative
• Skewness of the data was addressed via resampling:
- under-sampling (Tomek links)
- over-sampling (random over-sampling)
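A minimal sketch of the two resampling steps, assuming the papers are already encoded as numeric feature vectors, that there are exactly two classes (relevant / not relevant), that Tomek-link under-sampling runs before random over-sampling, and that over-sampling stops once the classes are balanced. The slides state none of the order, the distance measure or the target ratio; Euclidean distance and all function names here are illustrative choices.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor(i, X):
    """Index of the point closest to X[i], excluding X[i] itself (brute force)."""
    best_j, best_d = None, float("inf")
    for j, x in enumerate(X):
        if j != i:
            d = euclidean(X[i], x)
            if d < best_d:
                best_j, best_d = j, d
    return best_j

def tomek_undersample(X, y, majority):
    """Drop the majority-class member of every Tomek link (assumes >= 2 points)."""
    nn = [nearest_neighbor(i, X) for i in range(len(X))]
    drop = set()
    for i, j in enumerate(nn):
        # (i, j) is a Tomek link: mutual nearest neighbours with different labels
        if nn[j] == i and y[i] != y[j]:
            drop.add(i if y[i] == majority else j)
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]

def random_oversample(X, y, minority, seed=0):
    """Duplicate random minority samples until both classes have the same size."""
    rng = random.Random(seed)
    minority_idx = [k for k, label in enumerate(y) if label == minority]
    # binary case assumed: every other sample belongs to the majority class
    deficit = (len(y) - len(minority_idx)) - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(max(deficit, 0))]
    return X + [X[k] for k in extra], y + [y[k] for k in extra]
```

In a toy run with `X = [[0.0], [0.1], [1.0], [1.1], [5.0]]` and `y = [0, 1, 0, 0, 0]`, only `[0.0]` and `[0.1]` form a Tomek link, so the negative point `[0.0]` is dropped and the single positive is then duplicated until the classes are 3 vs. 3.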
corpus definition
• Classification algorithm: Naïve Bayes
• Two sub-sets: a development set (75%) and a test set (25%)
• Ten-fold cross-validation: the development set was randomly divided
into 10 equal parts; in each fold, 9/10 served as the training set and 1/10 as the test set.
[Figure: ROC curves (tp rate vs. fp rate, both 0 to 1) for the Development and Test sets]
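The evaluation protocol can be sketched as below. This is a minimal sketch assuming a simple random 75/25 split of the 1,824 papers (the slides do not say whether the split was stratified) and folds whose sizes differ by at most one; the Naïve Bayes classifier itself is omitted. `roc_point` shows how a single (fp rate, tp rate) point of an ROC curve is obtained from classifier scores at one threshold. All function names are hypothetical.

```python
import random

def split_dev_test(n_docs, dev_fraction=0.75, seed=0):
    """Shuffle document indices and cut them into a development and a test set."""
    idx = list(range(n_docs))
    random.Random(seed).shuffle(idx)
    cut = round(n_docs * dev_fraction)
    return idx[:cut], idx[cut:]

def k_folds(dev_idx, k=10, seed=0):
    """Yield (train, validation) index lists: each fold validates on 1/k, trains on the rest."""
    idx = list(dev_idx)
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # k parts whose sizes differ by at most one
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]

def roc_point(y_true, scores, threshold):
    """(fp rate, tp rate) when papers scoring >= threshold are labelled relevant (1)."""
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
    fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return fp / neg, tp / pos
```

Sweeping the threshold over the sorted classifier scores and collecting the resulting points traces the full curve.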
part 2
how we annotate
the schema - DiLEO
• DiLEO aims to conceptualize the DL evaluation domain by
exploring its key entities, their attributes and their relationships.
• A two-layered ontology:
- Strategic level: a set of classes related to the
scope and aim of an evaluation.
- Procedural level: classes dealing with practical
issues.
the instrument - GoNTogle
• We used GoNTogle to generate an RDFS knowledge base.
• GoNTogle uses the weighted k-NN algorithm to support either
manual or automated ontology-based annotation.
• http://bit.ly/12nlryh
the process - 1/3
• GoNTogle estimates a score for each class/subclass based on
its presence among the k nearest neighbors.
• We set a score threshold above which a class is assigned to a new
instance (optimal threshold: 0.18).
• The user is presented with a ranked list of the suggested classes/
subclasses and their scores, which range from 0 to 1.
• 2,672 annotations were manually generated.
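The slides say only that GoNTogle scores each class by its presence among the k nearest neighbours and assigns classes whose score exceeds 0.18. The sketch below is a minimal reconstruction under loud assumptions: documents are bag-of-words `Counter`s, neighbours are found and weighted by cosine similarity, and a class's score is the similarity mass of the neighbours annotated with it divided by the total similarity of all k neighbours, which keeps scores between 0 and 1. It is not GoNTogle's actual code, and `suggest_classes` is a hypothetical name.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency Counters."""
    dot = sum(w * b[t] for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def suggest_classes(query, annotated, k=5, threshold=0.18):
    """Rank ontology classes for `query` by similarity-weighted presence
    among its k nearest annotated neighbours (assumed weighting scheme).

    annotated: list of (Counter of terms, set of assigned classes).
    Returns (class, score) pairs with score > threshold, best first.
    """
    neighbours = sorted(
        ((cosine(query, vec), classes) for vec, classes in annotated),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbours)
    if not total:
        return []
    scores = Counter()
    for sim, classes in neighbours:
        for c in classes:
            scores[c] += sim  # each neighbour votes with its similarity
    return [(c, s / total) for c, s in scores.most_common() if s / total > threshold]
```

Normalising by the total neighbour similarity is one design choice among several; an unweighted variant would simply count how many of the k neighbours carry each class.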
the process - 2/3
• RDFS statements were processed to construct a new data set
(removal of stopwords, symbols, lowercasing, etc.)
• We experimented with both un-stemmed (4,880 features) and
stemmed (3,257 features) words.
• Multi-label classification via the ML framework Meka.
• Four methods:
- binary representation
- Label Powerset
- RAkEL
- ML-kNN
• Four algorithms:
- Naïve Bayes
- Multinomial Naïve Bayes
- k-Nearest Neighbors
- Support Vector Machines
• Four metrics:
- Hamming Loss
- Accuracy
- One-error
- F1 macro
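For reference, the four metrics can be computed as below. This sketch uses common textbook definitions for multi-label evaluation (example-based accuracy as the Jaccard index of the true and predicted label sets, one-error over a per-label score ranking); Meka's implementation may differ in details, for instance in how F1 macro treats labels that never occur, which this sketch scores as 0. Label sets are Python sets, and the function names are hypothetical.

```python
def hamming_loss(Y, Z, labels):
    """Fraction of (instance, label) pairs predicted wrongly."""
    return sum(len(y ^ z) for y, z in zip(Y, Z)) / (len(Y) * len(labels))

def accuracy(Y, Z):
    """Example-based accuracy: mean Jaccard index of true and predicted label sets."""
    # an instance with no true and no predicted labels counts as correct (assumed convention)
    return sum(len(y & z) / len(y | z) if y | z else 1.0 for y, z in zip(Y, Z)) / len(Y)

def one_error(Y, scores):
    """Fraction of instances whose top-scored label is not among the true labels.

    scores: one dict {label: score} per instance.
    """
    return sum(max(s, key=s.get) not in y for y, s in zip(Y, scores)) / len(Y)

def f1_macro(Y, Z, labels):
    """Unweighted mean over labels of F1 = 2TP / (2TP + FP + FN)."""
    total = 0.0
    for label in labels:
        tp = sum(label in y and label in z for y, z in zip(Y, Z))
        fp = sum(label not in y and label in z for y, z in zip(Y, Z))
        fn = sum(label in y and label not in z for y, z in zip(Y, Z))
        # a label never present nor predicted scores 0 here (assumed convention)
        total += 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return total / len(labels)
```

Lower is better for Hamming Loss and One-error; higher is better for Accuracy and F1 macro.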
the process - 3/3
• Performance tests were repeated using GoNTogle.
• GoNTogle’s algorithm performs well compared with the
multi-label classification algorithms tested in Meka.
[Figure: GoNTogle vs. Meka on Hamming Loss, Accuracy, One-Error and F1 macro (scale 0 to 1); plotted values 0.44, 0.27, 0.63, 0.02 and 0.39, 0.29, 0.49, 0.02]
part 3
how we discover
clustering - 1/3
• The final data set consists of 224 vectors of 53 features
- each vector represents the DiLEO subclasses assigned as
annotations to a document of the corpus.
• We represent the annotated documents with 2 vector models:
- binary: f_i has the value 1 if the subclass corresponding to f_i is
assigned to document m, otherwise 0.
- tf-idf: the feature frequency ff_i of f_i equals 1 when the
corresponding subclass is assigned to document m; idf_i is the
inverse document frequency of feature i over the M documents.
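The two vector models can be sketched as follows, assuming each document is given as the set of DiLEO subclasses annotated to it. The slides do not give the idf formula, so the common form idf_i = log(M / df_i) is an assumption here; because ff_i is always 1 for an assigned subclass, the tf-idf weight reduces to idf_i.

```python
import math

def binary_vectors(doc_annotations, features):
    """f_i = 1 if the DiLEO subclass behind feature i is assigned to document m, else 0."""
    return [[1 if f in assigned else 0 for f in features] for assigned in doc_annotations]

def tfidf_vectors(doc_annotations, features):
    """Weight = ff_i * idf_i, with ff_i = 1 for an assigned subclass.

    idf_i = log(M / df_i) is an assumed form; the slides do not give it.
    """
    M = len(doc_annotations)
    df = {f: sum(f in assigned for assigned in doc_annotations) for f in features}
    idf = {f: math.log(M / df[f]) if df[f] else 0.0 for f in features}
    return [[idf[f] if f in assigned else 0.0 for f in features] for assigned in doc_annotations]
```

Under this form, a subclass assigned to every document gets weight 0, so the tf-idf model down-weights annotations that do not discriminate between papers.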
clustering - 2/3
• We cluster the vector representations of the annotations by
applying 2 clustering algorithms:
- K-Means: partitions the M data points into K clusters. When the
objective function (cost or error) was plotted for various values of K,
its rate of decrease peaked near K = 11.
- Agglomerative Hierarchical Clustering: builds a hierarchy of
clusters ‘bottom up’.
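A minimal K-Means sketch (Lloyd's algorithm, with initial centroids drawn at random from the data) that also returns the objective, i.e. the within-cluster sum of squared Euclidean distances, so that it can be plotted against K as described above. The initialisation, distance measure and iteration cap are assumptions, since the slides do not specify them; the agglomerative hierarchical run is not shown.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100, seed=0):
    """Return (cluster index per point, objective) after Lloyd iterations."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid
        new_assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    cost = sum(sq_dist(p, centroids[a]) for p, a in zip(points, assign))
    return assign, cost

def objective_curve(points, ks, seed=0):
    """Objective for each K in ks, for plotting cost against K."""
    return {k: kmeans(points, k, seed=seed)[1] for k in ks}
```

Because Lloyd's algorithm only finds a local optimum, a real run would typically repeat each K with several seeds and keep the lowest cost before reading the curve.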
clustering - 3/3
• We assess each feature of each cluster using the frequency
increase metric.
- it measures how much the frequency cf_{i,k} of a feature f_i in
cluster k increases compared with its document frequency df_i in the
entire data set.
• We select the threshold a that maximizes the F1-measure, the
harmonic mean of Coverage and Dissimilarity mean.
- Coverage: the proportion of features participating in the
clusters to the total number of features.
- Dissimilarity mean: the average distinctiveness of the
clusters, defined in terms of the dissimilarity d_{i,j} between all
possible pairs of clusters.
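The slides give no formulas for the frequency increase or for d_{i,j}, so the sketch below fills them with labelled assumptions: the frequency increase of f_i in cluster k is taken as its relative frequency in the cluster minus its relative document frequency in the whole data set, a cluster's profile is the set of features whose increase exceeds a, and d_{i,j} is the Jaccard distance between two profiles. Only the last step, F1 as the harmonic mean 2·C·D / (C + D) of Coverage C and Dissimilarity mean D, follows directly from the definitions above. Function names are hypothetical.

```python
from itertools import combinations

def frequency_increase(docs, assign, features):
    """FI[k][f] = share of cluster-k documents with f minus share of all documents with f (assumed form)."""
    M = len(docs)
    df = {f: sum(f in d for d in docs) / M for f in features}
    fi = {}
    for k in set(assign):
        members = [d for d, a in zip(docs, assign) if a == k]
        fi[k] = {f: sum(f in d for d in members) / len(members) - df[f] for f in features}
    return fi

def jaccard_distance(p, q):
    """Assumed d_{i,j}: 1 - |p & q| / |p | q| (0 when both profiles are empty)."""
    return 1 - len(p & q) / len(p | q) if p | q else 0.0

def profile_f1(fi, features, a):
    """Harmonic mean of Coverage and Dissimilarity mean for threshold a."""
    profiles = {k: {f for f, v in scores.items() if v > a} for k, scores in fi.items()}
    coverage = len(set().union(*profiles.values())) / len(features)
    pairs = list(combinations(profiles.values(), 2))
    dissim = sum(jaccard_distance(p, q) for p, q in pairs) / len(pairs) if pairs else 0.0
    return 2 * coverage * dissim / (coverage + dissim) if coverage + dissim else 0.0

def best_threshold(fi, features, grid):
    """Threshold a from the grid that maximizes the F1-measure."""
    return max(grid, key=lambda a: profile_f1(fi, features, a))
```

Raising a shrinks each profile, which tends to lower Coverage while making clusters more distinct, so the F1-measure balances the two.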
metrics - F1-measure
[Figure: F1-measure (0 to 1) of K-Means tf-idf, K-Means binary and Hierarchical tf-idf, plotted over values from 0 to 1]
part 4
how (and what) we interpret
patterns
[Figure: pattern from K-Means tf-idf, drawn over the DiLEO strategic and procedural layers; subclasses shown: Dimensions: technical excellence, effectiveness; Dimension Type: summative; Subjects: human agents; Goals: design; Characteristics: count, discipline; Activity: report; Means: survey studies, laboratory studies; Instruments: software]
patterns
[Figure: pattern from Hierarchical clustering, drawn over the DiLEO strategic and procedural layers; subclasses shown: Level: interface; Dimensions: effectiveness; Subjects: human agents; Goal: describe; Activity: record, compare; Means: survey studies, laboratory studies; Means Type: quantitative]
part 5
conclusions
conclusions
• The patterns reflect and, up to a point, confirm the
anecdotally evident research practices of DL researchers.
• Patterns have properties similar to those of a map.
- They can show the main and the alternative routes one can
follow to reach a destination, taking into account several
practical parameters that one might not know.
• By exploring previous profiles, one can weigh all the available
options.
• This approach can extend other coding methodologies in terms
of transparency, standardization and reusability.
Thank you for your attention.
questions?

More Related Content

What's hot

التطبيق العملي لمهارات الكتابة العلمية
التطبيق العملي لمهارات الكتابة العلميةالتطبيق العملي لمهارات الكتابة العلمية
التطبيق العملي لمهارات الكتابة العلمية
مركز البحوث الأقسام العلمية
 
Doing your systematic review: managing data and reporting
Doing your systematic review: managing data and reportingDoing your systematic review: managing data and reporting
Doing your systematic review: managing data and reporting
University of Liverpool Library
 
CDISC-CDASH
CDISC-CDASHCDISC-CDASH
CDISC-CDASH
Gowthami6789
 
Doing a systematic review: top tips for progressing your review
Doing a systematic review: top tips for progressing your reviewDoing a systematic review: top tips for progressing your review
Doing a systematic review: top tips for progressing your review
University of Liverpool Library
 
Research Data Management and Reproducibility
Research Data Management and ReproducibilityResearch Data Management and Reproducibility
Research Data Management and Reproducibility
University of Liverpool Library
 
2010 E-Library Resources GR June 29 2010
2010 E-Library Resources GR June 29 20102010 E-Library Resources GR June 29 2010
2010 E-Library Resources GR June 29 2010
Mary Jo Devereaux, MLS
 

What's hot (6)

التطبيق العملي لمهارات الكتابة العلمية
التطبيق العملي لمهارات الكتابة العلميةالتطبيق العملي لمهارات الكتابة العلمية
التطبيق العملي لمهارات الكتابة العلمية
 
Doing your systematic review: managing data and reporting
Doing your systematic review: managing data and reportingDoing your systematic review: managing data and reporting
Doing your systematic review: managing data and reporting
 
CDISC-CDASH
CDISC-CDASHCDISC-CDASH
CDISC-CDASH
 
Doing a systematic review: top tips for progressing your review
Doing a systematic review: top tips for progressing your reviewDoing a systematic review: top tips for progressing your review
Doing a systematic review: top tips for progressing your review
 
Research Data Management and Reproducibility
Research Data Management and ReproducibilityResearch Data Management and Reproducibility
Research Data Management and Reproducibility
 
2010 E-Library Resources GR June 29 2010
2010 E-Library Resources GR June 29 20102010 E-Library Resources GR June 29 2010
2010 E-Library Resources GR June 29 2010
 

Viewers also liked

The Open Library, Public Domain Wiki, and other Realized Myths of Creative Co...
The Open Library, Public Domain Wiki, and other Realized Myths of Creative Co...The Open Library, Public Domain Wiki, and other Realized Myths of Creative Co...
The Open Library, Public Domain Wiki, and other Realized Myths of Creative Co...
Jon Phillips
 
Library Management System
Library Management SystemLibrary Management System
Library Management System
Bijay Chaurasiya
 
Library management system
Library management systemLibrary management system
Library management system
Raaghav Bhatia
 
Library management system presentation
Library management system presentation Library management system presentation
Library management system presentation
Smit Patel
 
Library Management System
Library Management SystemLibrary Management System
Library Management System
Aditya Shah
 
AUTOMATED LIBRARY MANAGEMENT SYSTEM
AUTOMATED LIBRARY MANAGEMENT SYSTEMAUTOMATED LIBRARY MANAGEMENT SYSTEM
AUTOMATED LIBRARY MANAGEMENT SYSTEM
Abhishek Kumar
 
Library mangement system project srs documentation.doc
Library mangement system project srs documentation.docLibrary mangement system project srs documentation.doc
Library mangement system project srs documentation.doc
jimmykhan
 

Viewers also liked (7)

The Open Library, Public Domain Wiki, and other Realized Myths of Creative Co...
The Open Library, Public Domain Wiki, and other Realized Myths of Creative Co...The Open Library, Public Domain Wiki, and other Realized Myths of Creative Co...
The Open Library, Public Domain Wiki, and other Realized Myths of Creative Co...
 
Library Management System
Library Management SystemLibrary Management System
Library Management System
 
Library management system
Library management systemLibrary management system
Library management system
 
Library management system presentation
Library management system presentation Library management system presentation
Library management system presentation
 
Library Management System
Library Management SystemLibrary Management System
Library Management System
 
AUTOMATED LIBRARY MANAGEMENT SYSTEM
AUTOMATED LIBRARY MANAGEMENT SYSTEMAUTOMATED LIBRARY MANAGEMENT SYSTEM
AUTOMATED LIBRARY MANAGEMENT SYSTEM
 
Library mangement system project srs documentation.doc
Library mangement system project srs documentation.docLibrary mangement system project srs documentation.doc
Library mangement system project srs documentation.doc
 

Similar to Charting the Digital Library Evaluation Domain with a Semantically Enhanced Mining Methodology

Canon of classification
Canon of classificationCanon of classification
Canon of classification
avid
 
Knowledge Representation on the Web
Knowledge Representation on the WebKnowledge Representation on the Web
Knowledge Representation on the Web
Rinke Hoekstra
 
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
Angelo Salatino
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research Objects
Carole Goble
 
Getting Ready for the Next Generation Science Standards: Reviewing the Draft ...
Getting Ready for the Next Generation Science Standards: Reviewing the Draft ...Getting Ready for the Next Generation Science Standards: Reviewing the Draft ...
Getting Ready for the Next Generation Science Standards: Reviewing the Draft ...
The Ohio State University, College of Education and Human Ecology
 
Research Methodology Part I
Research Methodology Part IResearch Methodology Part I
Research Methodology Part I
Anwar Siddiqui
 
researchmethodologyi-140707092303-phpapp02.pdf
researchmethodologyi-140707092303-phpapp02.pdfresearchmethodologyi-140707092303-phpapp02.pdf
researchmethodologyi-140707092303-phpapp02.pdf
Mdali657802
 
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm...
Khirulnizam Abd Rahman
 
Systematic literature review technique.pptx
Systematic literature review technique.pptxSystematic literature review technique.pptx
Systematic literature review technique.pptx
TANMAY DAS GUPTA
 
An OWA-Based Multi-Criteria System For Assigning Reviewers
An OWA-Based Multi-Criteria System For Assigning ReviewersAn OWA-Based Multi-Criteria System For Assigning Reviewers
An OWA-Based Multi-Criteria System For Assigning Reviewers
Dereck Downing
 
Part 1 Research workshop
Part 1 Research workshopPart 1 Research workshop
Part 1 Research workshop
Researchworkshop
 
Application of a Novel Subject Classification Scheme for a Bibliographic Data...
Application of a Novel Subject Classification Scheme for a Bibliographic Data...Application of a Novel Subject Classification Scheme for a Bibliographic Data...
Application of a Novel Subject Classification Scheme for a Bibliographic Data...
National Institute of Informatics
 
Discovery Tools for Open Access Repositories: A Literature Mapping
Discovery Tools for Open Access Repositories: A Literature MappingDiscovery Tools for Open Access Repositories: A Literature Mapping
Discovery Tools for Open Access Repositories: A Literature Mapping
Grial - University of Salamanca
 
Systematic Literature Reviews : Concise Overview
Systematic Literature Reviews : Concise OverviewSystematic Literature Reviews : Concise Overview
Systematic Literature Reviews : Concise Overview
youkayaslam
 
Business research methods 1
Business research methods 1Business research methods 1
Business research methods 1
Free Talk 2 Other
 
Cardiff bibliometrics repository ref_july 2014
Cardiff bibliometrics repository  ref_july 2014Cardiff bibliometrics repository  ref_july 2014
Cardiff bibliometrics repository ref_july 2014
rachaelwhitfield
 
Improving Semantic Search Using Query Log Analysis
Improving Semantic Search Using Query Log AnalysisImproving Semantic Search Using Query Log Analysis
Improving Semantic Search Using Query Log Analysis
Stuart Wrigley
 
Use of Wikipedia categories on information retrieval: a brief research
Use of Wikipedia categories on information retrieval: a brief researchUse of Wikipedia categories on information retrieval: a brief research
Use of Wikipedia categories on information retrieval: a brief research
Jesús Tramullas
 
Using a keyword extraction pipeline to understand concepts in future work sec...
Using a keyword extraction pipeline to understand concepts in future work sec...Using a keyword extraction pipeline to understand concepts in future work sec...
Using a keyword extraction pipeline to understand concepts in future work sec...
Kai Li
 


More from Giannis Tsakonas

Archival Metadata: Standards and Management on the World Wide Web
Giannis Tsakonas

The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...
Giannis Tsakonas

Increasing traceability of physical library items through Koha: the case of S...
Giannis Tsakonas

Academic Libraries in Patras: Advanced Services, Networking & Prospects
Giannis Tsakonas

We were group no 2: notes for the MLAS2015 workshop
Giannis Tsakonas

Libraries & Culture: The Obvious and the Self-Evident
Giannis Tsakonas

{Tech}changes: the technological state of Greek Libraries.
Giannis Tsakonas

Affective relationships between users & libraries in times of economic stress
Giannis Tsakonas

Library Data in the Future Digital Environment - Cyprus Seminar
Giannis Tsakonas

FRBR and Linked Data - Athens Seminar
Giannis Tsakonas

Automatic Medical Document Generation via Spatial SNOMED Elements in Hysteros...
Giannis Tsakonas

Policies for geospatial collections: a research in US and Canadian academic l...
Giannis Tsakonas

Developing a Metadata Model for Historic Buildings: Describing and Linking Ar...
Giannis Tsakonas

Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...
Giannis Tsakonas

Path-based MXML Storage and Querying
Giannis Tsakonas

Library Data in the Future Digital Environment - FRBR and Linked Data
Giannis Tsakonas

Open Bibliographic Data and E-LIS
Giannis Tsakonas

Dileo Presentation (in English)
Giannis Tsakonas

Evaluation Insights to Key Processes of Digital Repositories
Giannis Tsakonas

E-LIS, the electronic archive for Librarianship and the Science of Infor...
Giannis Tsakonas
 

 

 

Charting the Digital Library Evaluation Domain with a Semantically Enhanced Mining Methodology

  • 1. Charting the Digital Library Evaluation Domain with a Semantically Enhanced Mining Methodology
    Eleni Afiontzi,1 Giannis Kazadeis,1 Leonidas Papachristopoulos,2 Michalis Sfakakis,2 Giannis Tsakonas,2 Christos Papatheodorou2
    13th ACM/IEEE Joint Conference on Digital Libraries, July 22-26, Indianapolis, IN, USA
    1. Department of Informatics, Athens University of Economics & Business
    2. Database & Information Systems Group, Department of Archives & Library Science, Ionian University
  • 10. aim & scope of research
    • To propose a methodology for discovering patterns in the scientific literature.
    • Our case study is the digital library evaluation domain and its conference literature.
    • We ask how to:
      - select relevant studies,
      - annotate them, and
      - discover these patterns
      in an effective, machine-operated way, so as to obtain reusable and interpretable data.
  • 16. why
    • Abundance of scientific information
    • Limitations of existing tools, such as poor reusability
    • Lack of contextualized analytic tools
    • Need for supervised automated processes
  • 26. panorama
    1. Document classification to identify relevant papers
       - We use a corpus of 1,824 papers from the JCDL and ECDL (now TPDL) conferences, published 2001-2011.
    2. Semantic annotation processes to mark up important concepts
       - We use a schema for semantic annotation, the Digital Library Evaluation Ontology (DiLEO), and a semantic annotation tool, GoNTogle.
    3. Clustering to form coherent groups (K=11)
    4. Interpretation with the assistance of the ontology schema
    • Throughout the process we run benchmarking tests to qualify specific components, so that the exploration of the literature and the discovery of research patterns can be effectively automated.
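Step 3 does not name the clustering algorithm. As a minimal sketch, assuming k-means (Lloyd's algorithm) over binary vectors that record which ontology concepts were annotated in each paper, the grouping step could look like the code below. The concept names, the helpers `to_vectors` and `kmeans`, the squared Euclidean distance and the toy data are all hypothetical; only K=11 comes from the slide, and the toy run uses K=2 because it has just four papers.

```python
import random
from typing import Dict, List, Sequence, Set


def to_vectors(annotations: Dict[str, Set[str]],
               concepts: Sequence[str]) -> Dict[str, List[float]]:
    """Encode each paper as a 0/1 vector: 1.0 if the concept was annotated in it."""
    return {pid: [1.0 if c in tags else 0.0 for c in concepts]
            for pid, tags in annotations.items()}


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int,
           max_iter: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    # Initialise the centroids with k input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Binary concept vectors keep the clusters interpretable: a centroid coordinate is simply the share of the cluster's papers annotated with that concept, which is what step 4 would read back against the ontology schema.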
  • 28. part 1: how we identify relevant studies
  • 40. training phase
    • The aim was to train a classifier to identify relevant papers.
    • Categorization
      - two researchers categorized the papers; a third supervised
      - descriptors: title, abstract & author keywords
      - raters’ agreement: 82.96% for JCDL, 78% for ECDL
      - inter-rater agreement: moderate levels of Cohen’s Kappa
      - 12% positive vs. 88% negative
    • Data skewness addressed via resampling:
      - under-sampling (Tomek links)
      - over-sampling (random over-sampling)
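The agreement figures and the resampling step can be illustrated with a short, standard-library-only sketch. It shows why high raw agreement can coexist with a moderate Cohen's kappa on a heavily skewed split (chance agreement is already high), how Tomek links (pairs of mutual nearest neighbours with different labels) are found and their majority-class member dropped, and how random over-sampling duplicates minority examples until the two classes balance. The function names, the Euclidean distance, the choice to drop only the majority member of each link, and the toy data are assumptions, not details taken from the slides.

```python
import random
from typing import List, Sequence, Set, Tuple

Vector = List[float]


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    a, b = list(a), list(b)
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the two raters' marginal proportions, per label.
    p_chance = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    if p_chance == 1.0:
        return 1.0  # degenerate case: both raters used one identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


def _sq_dist(u: Vector, v: Vector) -> float:
    return sum((p - q) ** 2 for p, q in zip(u, v))


def tomek_undersample(X: List[Vector], y: List[int],
                      majority: int) -> Tuple[List[Vector], List[int]]:
    """Drop the majority-class member of every Tomek link."""
    def nearest(i: int) -> int:
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: _sq_dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    drop: Set[int] = set()
    for i, j in enumerate(nn):
        # Mutual nearest neighbours with different labels form a Tomek link.
        if nn[j] == i and y[i] != y[j]:
            drop.update(m for m in (i, j) if y[m] == majority)
    keep = [m for m in range(len(X)) if m not in drop]
    return [X[m] for m in keep], [y[m] for m in keep]


def random_oversample(X: List[Vector], y: List[int], minority: int,
                      seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Duplicate random minority examples until both classes are equally large.

    Assumes binary labels, so every non-minority example is majority class.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, lab in enumerate(y) if lab == minority]
    deficit = (len(y) - len(minority_idx)) - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(max(deficit, 0))]
    return X + [X[i] for i in extra], y + [y[i] for i in extra]
```

For instance, two raters who agree on 8 of 10 items, each marking 2 of the 10 as positive, reach only kappa = 0.375, because on a 20/80 split they would already agree 68% of the time by chance.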
  • 50. corpus definition • Classification algorithm: Naïve Bayes • Two subsets: a development set (75%) and a test set (25%) • Ten-fold cross-validation: the development set was randomly divided into 10 equal parts, with 9/10 used as the training set and 1/10 as the test set. [Figure: ROC curves (tp rate against fp rate, both from 0 to 1) for the Test and Development sets]
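The corpus-definition protocol (Naïve Bayes, a 75/25 development/test split, ten-fold cross-validation on the development set, and tp/fp rates as plotted on the slide) could look roughly like this. It is a sketch under assumptions: a hand-rolled multinomial Naïve Bayes with add-one smoothing over token lists stands in for whichever Naïve Bayes implementation was actually used, labels are 1 (relevant) and 0 (not relevant), and the helper names and seed are hypothetical.

```python
import math
import random
from collections import Counter


def train_nb(docs, labels):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    classes = sorted(set(labels))
    vocab = {w for d in docs for w in d}
    prior, cond = {}, {}
    for c in classes:
        c_docs = [d for d, l in zip(docs, labels) if l == c]
        prior[c] = math.log(len(c_docs) / len(docs))
        counts = Counter(w for d in c_docs for w in d)
        total = sum(counts.values()) + len(vocab)
        cond[c] = {w: math.log((counts[w] + 1) / total) for w in vocab}
    return classes, prior, cond


def predict_nb(model, doc):
    classes, prior, cond = model
    # Words unseen during training contribute nothing to any class.
    return max(classes, key=lambda c: prior[c] + sum(cond[c].get(w, 0.0) for w in doc))


def tp_fp_rates(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    pos = sum(t == positive for t in y_true)
    neg = len(y_true) - pos
    return (tp / pos if pos else 0.0, fp / neg if neg else 0.0)


def split_and_validate(docs, labels, k=10, seed=0):
    rng = random.Random(seed)
    idx = list(range(len(docs)))
    rng.shuffle(idx)
    cut = int(0.75 * len(idx))
    dev, test = idx[:cut], idx[cut:]  # 75% development, 25% test
    folds = [dev[i::k] for i in range(k)]
    fold_rates = []
    for i, held_out in enumerate(folds):
        # Train on 9/10 of the development set, evaluate on the held-out 1/10.
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        model = train_nb([docs[j] for j in train], [labels[j] for j in train])
        preds = [predict_nb(model, docs[j]) for j in held_out]
        fold_rates.append(tp_fp_rates([labels[j] for j in held_out], preds))
    # Final model on the whole development set, evaluated once on the test set.
    model = train_nb([docs[j] for j in dev], [labels[j] for j in dev])
    test_preds = [predict_nb(model, docs[j]) for j in test]
    return fold_rates, tp_fp_rates([labels[j] for j in test], test_preds)
```

Each (tp rate, fp rate) pair is a single point in ROC space; tracing full curves like those on the slide would additionally require ranking documents by their posterior score.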
  • 52. part 2 how we annotate
  • 57. the schema - DiLEO • DiLEO aims to conceptualize the DL evaluation domain by exploring its key entities, their attributes and their relationships. • A two-layered ontology: - Strategic level: a set of classes related to the scope and aim of an evaluation. - Procedural level: classes dealing with practical issues.
  • 64. the instrument - GoNTogle • We used GoNTogle to generate an RDFS knowledge base. • GoNTogle uses the weighted k-NN algorithm to support either manual or automated ontology-based annotation. • http://bit.ly/12nlryh
  • 70. the process - 1/3 • GoNTogle estimates a score for each class/subclass based on its presence among the k nearest neighbors. • We set a score threshold above which a class is assigned to a new instance (optimal threshold: 0.18). • The user is presented with a ranked list of the suggested classes/subclasses and their scores, which range from 0 to 1. • 2,672 annotations were manually generated.
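A rough sketch of threshold-based class suggestion with a weighted k-NN. GoNTogle's actual weighting scheme is not described on the slide, so this assumes cosine similarity over bag-of-words counts, votes weighted by similarity and normalised by the total similarity of the k neighbours (which keeps every score between 0 and 1), and a hypothetical default of k = 5; only the 0.18 threshold comes from the slide. Function and class names are placeholders.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_classes(new_doc, annotated, k=5):
    """Score each class by its similarity-weighted presence among the
    k nearest annotated documents; returns (class, score) pairs, best first."""
    query = Counter(new_doc)
    neighbours = sorted(((cosine(query, Counter(doc)), classes)
                         for doc, classes in annotated),
                        key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in neighbours)
    scores = Counter()
    for sim, classes in neighbours:
        for c in classes:
            scores[c] += sim / total if total else 0.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def suggest(new_doc, annotated, threshold=0.18, k=5):
    # Keep only classes whose score exceeds the threshold.
    return [(c, s) for c, s in score_classes(new_doc, annotated, k) if s > threshold]
```

The ranked output of `score_classes` corresponds to the list shown to the user, while `suggest` applies the cut-off used for automatic assignment.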
  • 76. the process - 2/3 • The RDFS statements were processed to construct a new data set (removal of stopwords and symbols, lowercasing, etc.). • Experiments were run with both unstemmed (4,880 features) and stemmed (3,257 features) words. • Multi-label classification was performed with the multi-label framework Meka. • Four methods - binary representation - label powersets - RAkEL - ML-kNN • Four algorithms - Naïve Bayes - Multinomial Naïve Bayes - k-Nearest Neighbors - Support Vector Machines • Four metrics - Hamming Loss - Accuracy - One-error - F1 macro
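The four evaluation metrics can be computed from the sets of true and predicted labels for each document. The sketch below uses common example-based definitions (Hamming loss, Jaccard-style accuracy, one-error over a score ranking) and a per-label, macro-averaged F1; Meka's own implementations may differ in details such as the handling of empty label sets, so treat it as illustrative. Function names are hypothetical.

```python
def hamming_loss(true_sets, pred_sets, labels):
    """Fraction of (document, label) pairs that are mispredicted."""
    errors = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return errors / (len(true_sets) * len(labels))


def accuracy(true_sets, pred_sets):
    """Example-based accuracy: mean Jaccard overlap of true and predicted sets."""
    scores = [len(t & p) / len(t | p) if t | p else 1.0
              for t, p in zip(true_sets, pred_sets)]
    return sum(scores) / len(scores)


def one_error(true_sets, rankings):
    """Share of documents whose top-scored label is not a true label.
    Each ranking is a dict mapping label -> score."""
    misses = sum(max(r, key=r.get) not in t for t, r in zip(true_sets, rankings))
    return misses / len(true_sets)


def f1_macro(true_sets, pred_sets, labels):
    """Unweighted mean of the per-label F1 scores."""
    pairs = list(zip(true_sets, pred_sets))
    total = 0.0
    for label in labels:
        tp = sum(label in t and label in p for t, p in pairs)
        fp = sum(label not in t and label in p for t, p in pairs)
        fn = sum(label in t and label not in p for t, p in pairs)
        denom = 2 * tp + fp + fn
        total += 2 * tp / denom if denom else 0.0
    return total / len(labels)
```

Lower is better for Hamming loss and one-error; higher is better for accuracy and F1 macro.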
  • 82. the process - 3/3 • Performance tests were repeated using GoNTogle. • GoNTogle's algorithm performs well compared with the tested multi-label classification algorithms. [Figure: bar chart comparing GoNTogle and Meka on Hamming Loss, Accuracy, One-Error and F1 macro (scale 0 to 1)]
  • 84. part 3 how we discover
  • 91. clustering - 1/3 • The final data set consists of 224 vectors of 53 features, representing the annotations assigned from the DiLEO vocabulary to the document corpus. • We represent the annotated documents with 2 vector models: - binary: f_i is 1 if the subclass corresponding to f_i is assigned to document m, and 0 otherwise. - tf-idf: the feature frequency ff_i of f_i equals 1 when the corresponding subclass is annotated to document m; idf_i is the inverse document frequency of feature i across the M documents.
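The two vector models can be built from the set of subclasses assigned to each document. Because ff_i is always 1 for an assigned subclass, the tf-idf weight reduces to idf_i. The slide does not give the idf formula, so this sketch assumes the common idf_i = log(M / df_i) with the natural logarithm; the function names are hypothetical.

```python
import math


def binary_vectors(annotations, features):
    """1 if the subclass for feature f_i is assigned to the document, else 0."""
    return [[1 if f in doc else 0 for f in features] for doc in annotations]


def tfidf_vectors(annotations, features):
    """ff_i is 1 for every assigned subclass, so each weight is just idf_i."""
    M = len(annotations)
    df = {f: sum(f in doc for doc in annotations) for f in features}
    # Assumed idf variant: natural log of M / df_i (0 for unused features).
    idf = {f: math.log(M / df[f]) if df[f] else 0.0 for f in features}
    return [[idf[f] if f in doc else 0.0 for f in features] for doc in annotations]
```

In the study, `annotations` would hold 224 documents and `features` the 53 DiLEO subclasses. Note that under this idf variant a subclass assigned to every document receives weight 0 in the tf-idf model.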
  • 96. clustering - 2/3 • We cluster the vector representations of the annotations by applying 2 clustering algorithms: - K-Means: partitions M data points into K clusters. When the objective function (cost or error) was plotted for various values of K, the rate of decrease peaked near K = 11. - Agglomerative Hierarchical Clustering: builds a hierarchy of clusters bottom-up.
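The K-selection step can be sketched as Lloyd's K-Means run over a range of K, recording the objective (sum of squared distances to the nearest centre) for each value. This is a stdlib-only illustration with assumed parameters (random initial centres drawn from the data, five restarts, at most 100 iterations), not the configuration used in the study; the resulting costs are what one would plot to see where the rate of decrease peaks.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm; returns (assignment, objective)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: move each centre to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's centre
        if new_centers == centers:
            break
        centers = new_centers
    # Final assignment and objective (sum of squared distances).
    assign = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
    cost = sum(sq_dist(p, centers[a]) for p, a in zip(points, assign))
    return assign, cost


def cost_curve(points, k_max, restarts=5):
    """Best objective over several random restarts, for K = 1..k_max."""
    return {k: min(kmeans(points, k, seed=s)[1] for s in range(restarts))
            for k in range(1, k_max + 1)}
```

Restarts matter because Lloyd's algorithm only finds a local optimum; taking the minimum over seeds makes the cost curve less noisy.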
  • 103. clustering - 3/3 • We assess each feature of each cluster using the frequency increase metric: - it measures the increase in the frequency of a feature f_i in cluster k (cf_{i,k}) compared with its document frequency df_i in the entire data set. • We select the threshold a that maximizes the F1-measure, the harmonic mean of Coverage and Dissimilarity mean: - Coverage: the proportion of features participating in the clusters relative to the total number of features. - Dissimilarity mean: the average distinctiveness of the clusters, defined in terms of the dissimilarity d_{i,j} between all possible pairs of clusters.
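The threshold selection can be sketched as follows, with several assumptions stated up front because the slide names the quantities but not their formulas: documents are binary vectors; cf_{i,k} and df_i are taken as the proportions of documents (in cluster k and overall) carrying feature f_i; the frequency increase is their difference; a feature characterises a cluster when its increase exceeds a; and the pairwise dissimilarity d_{i,j} is assumed to be the Jaccard distance between two clusters' feature sets. Coverage, Dissimilarity mean and their harmonic mean then follow the slide's definitions. All function names are hypothetical.

```python
from itertools import combinations


def cluster_profiles(vectors, assign, a):
    """Per cluster, the feature indices whose frequency increase exceeds a."""
    n_feat, M = len(vectors[0]), len(vectors)
    df = [sum(v[i] for v in vectors) / M for i in range(n_feat)]
    profiles = []
    for k in sorted(set(assign)):
        members = [v for v, c in zip(vectors, assign) if c == k]
        cf = [sum(v[i] for v in members) / len(members) for i in range(n_feat)]
        # Assumed frequency increase: cf_{i,k} - df_i.
        profiles.append({i for i in range(n_feat) if cf[i] - df[i] > a})
    return profiles


def coverage(profiles, n_feat):
    """Share of all features that characterise at least one cluster."""
    return len(set().union(*profiles)) / n_feat


def dissimilarity_mean(profiles):
    """Mean pairwise dissimilarity, assumed here to be Jaccard distance."""
    def d(p, q):
        union = p | q
        return 1 - len(p & q) / len(union) if union else 0.0
    pairs = list(combinations(profiles, 2))
    return sum(d(p, q) for p, q in pairs) / len(pairs) if pairs else 0.0


def best_threshold(vectors, assign, grid):
    """Return (a, F1) maximising the harmonic mean of Coverage and Dissimilarity."""
    n_feat = len(vectors[0])
    best = None
    for a in grid:
        profiles = cluster_profiles(vectors, assign, a)
        c, d = coverage(profiles, n_feat), dissimilarity_mean(profiles)
        f1 = 2 * c * d / (c + d) if c + d else 0.0
        if best is None or f1 > best[1]:
            best = (a, f1)
    return best
```

A low threshold tends to raise Coverage but make clusters overlap, while a high one keeps clusters distinct but leaves features unused; the F1-measure balances the two.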
  • 107. metrics - F1-measure [Figure: F1-measure curves for K-Means tf-idf, K-Means binary and Hierarchical tf-idf; both axes range from 0 to 1]
  • 109. part 4 how (and what) we interpret
  • 111. patterns [Figure: pattern labelled Hierarchical, drawn across the DiLEO strategic and procedural layers, linking classes such as Research Questions, Findings, Criteria, Metrics, Factors, Criteria Categories, Instruments, Dimensions, Dimensions Types, Means, Means Type, Characteristics, Goal, Activity, Level, Objects and Subjects through properties such as hasPerformed/isPerformedIn, hasConstituent/isConstituting, isParticipatingIn, hasMeansType, isAimingAt and isAffecting/isAffectedBy]
  • 120. conclusions • The patterns reflect and, up to a point, confirm the anecdotally evident research practices of DL researchers. • Patterns have properties similar to those of a map. - They can show the main and the alternative routes one can follow to reach a destination, taking into account several practical parameters one might not know. • By exploring previous profiles, one can weigh all the available options. • This approach can extend other coding methodologies in terms of transparency, standardization and reusability.
  • 121. Thank you for your attention. Questions?