These slides were prepared by my students from the knowAAN project group at the University of Paderborn. In the project group, we worked on the (semi-)automatic analysis of scientific papers in order to enhance researchers' awareness.
Final presentation of the project group Knowledge Awareness in Artefact-Actor-Networks (knowAAN)
1. Project group knowAAN
Final presentation
Computer Science Education Group
University of Paderborn
October 20th 2011
2. Overview
Overview
Introduction
System components & Workflow
Demonstration
Development process
Summary & Outlook
Time for further detailed questions
PG knowAAN 2
3. Overview
Overview: First part
Goals
Extraction & Storage (of data)
Exploration (of data)
System components & Workflow
Analysis & Visualization (of data)
4. Goals
Goals
Explore research networks
Based on: Artifacts (scientific publications) and metadata
Combination and analysis of data
Computation of similarities of full texts
Support for conference management system Ginkgo
Data visualization
Recommendations
(Source: PG knowAAN project description)
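One of the goals above is the computation of similarities between full texts. The slides do not say which measure the project used, so the following is only a minimal sketch under the assumption of cosine similarity over plain term-frequency vectors. The function names `term_frequencies` and `cosine_similarity` and the naive tokenizer are hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Lowercase the text and count alphabetic tokens (a naive tokenizer)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two term-frequency vectors."""
    tf_a = term_frequencies(text_a)
    tf_b = term_frequencies(text_b)
    # Only terms that occur in both texts contribute to the dot product.
    dot = sum(tf_a[term] * tf_b[term] for term in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(count * count for count in tf_a.values()))
    norm_b = math.sqrt(sum(count * count for count in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty text is similar to nothing
    return dot / (norm_a * norm_b)
```

Two texts with identical term distributions score close to 1, texts without shared terms score 0. A real system would probably weight terms (for example with TF-IDF) and work on lemmatized text rather than raw tokens.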
5. Goals
Imagine you are interested in a conference.
You have downloaded the papers from 2 or 3 years of the conference.
Now you have nearly 100 publications.
How do you explore them?
100 publications. Do you know any tools for this?
14. Sequence diagram (adding a PDF)
Participants: DocumentBrowser, RoundTrip, RoundTripExecutor, PDFToText, Parscit, Languagedetection, Lemmatizer, NounExtraction, Solr, DB
Calls and their return values (call -> return value):
a / 1) .addPDF
a / 2) .writeToFS -> Path
a / 3) .createThread, .submitThread
b / 1) .run
b / 2) .getText -> Text
b / 3) .ParseFullText -> ParscitXML
b / 4) .extractBodyAndAstract -> BodyAndAbstract
b / 5) .getLanguage -> LanguageString
b / 6) .lemmatize -> LemmatizedText
b / 7) .extractNouns -> NounsList
b / 8) .lemmatizeNounslist -> LemmatizedNouns
b / 9) .ReduceToTopNouns -> TopNouns
b / 10) .writeToFiles -> Paths
b / 11) .addTexts -> Solrid
b / 12) .addPublication
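The call sequence above can be sketched as a small pipeline that is handed to a worker thread. This is only an illustration. The stage functions are passed in as hypothetical stand-ins, the data flow between stages and the ranking of top nouns by frequency are assumptions, and steps b/10 to b/12 (writing files, indexing in Solr, storing in the DB) are left out.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def reduce_to_top_nouns(nouns, limit=10):
    """b/9: keep the `limit` most frequent nouns (frequency ranking is an assumption)."""
    return [noun for noun, _ in Counter(nouns).most_common(limit)]


def run_round_trip(pdf_path, stages):
    """b/2 to b/9: run the stages in order; `stages` maps step names to callables."""
    text = stages["get_text"](pdf_path)                     # b/2
    parsed = stages["parse_full_text"](text)                # b/3
    body = stages["extract_body_and_abstract"](parsed)      # b/4
    language = stages["get_language"](body)                 # b/5
    lemmatized_text = stages["lemmatize"](body)             # b/6
    nouns = stages["extract_nouns"](body)                   # b/7 (input is an assumption)
    lemmatized_nouns = [stages["lemmatize"](noun) for noun in nouns]  # b/8
    return {
        "language": language,
        "text": lemmatized_text,
        "top_nouns": reduce_to_top_nouns(lemmatized_nouns),
    }


def add_pdf(executor, pdf_path, stages):
    """a/1 to a/3: submit the processing of a stored PDF to a worker thread."""
    return executor.submit(run_round_trip, pdf_path, stages)


# Toy stand-ins so the sketch runs without any PDF or NLP tooling.
TOY_STAGES = {
    "get_text": lambda path: "Networks connect authors. Authors write papers.",
    "parse_full_text": lambda text: {"body": text},
    "extract_body_and_abstract": lambda parsed: parsed["body"],
    "get_language": lambda body: "en",
    "lemmatize": lambda text: text.lower().rstrip("s"),
    "extract_nouns": lambda body: ["Networks", "authors", "Authors", "papers"],
}

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as executor:
        print(add_pdf(executor, "paper.pdf", TOY_STAGES).result())
```

Passing the stages in as callables keeps the sketch independent of any concrete PDF extractor, parser, or lemmatizer. In the real system, each entry would wrap the corresponding component from the diagram.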
22. Development process
Methods of agile software development
Weekly meetings
Sit together (as much as possible)
Automated building system
Continuous integration
Issue tracking
23. Summary and Outlook
Summary and future work
Summary
Integrated processing of scientific papers
Aggregated visualization of authors, publications and events
Computation of various analyses over the data
Cleaning functionality for automatically processed data
Future work
Parallelized Clustering
Additional graphical visualization
Improve extraction of metadata from PDF files