Final presentation of the project group Knowledge Awareness in Artefact-Actor-Networks (knowAAN)

These slides were prepared by my students in the knowAAN project group at the University of Paderborn. In the project group we worked on the (semi-)automatic analysis of scientific papers in order to enhance researchers' awareness.

Published in: Technology, Business

  1. Project group knowAAN: Final presentation. Computer Science Education Group, University of Paderborn, October 20th 2011
  2. Overview: Introduction; System components & Work flow; Demonstration; Development process; Summary & Outlook; Time for detailed questions
  3. Overview, first part: Goals; Extraction & Storage (of data); Exploration (of data); System components & Work flow; Analysis & Visualization (of data)
  4. Goals: Explore research networks; Based on: artifacts (scientific publications) and metadata; Combination and analysis of data; Computation of similarities of full texts; Support for the conference management system Ginkgo; Data visualization; Recommendations (Source: PG knowAAN project description)
  5. Goals: Imagine you are interested in a conference. You downloaded the papers from 2 or 3 years. Now you have nearly 100 publications. How do you explore 100 publications? Do you know any tools?
  6. Extraction & Storage: First step: Extract data and store it.
  7. Extraction & Storage
  8. Exploration: Second step: Explore data.
  9. Exploration: Exploring a conference
  10. Exploration: Which extracted data is available for a publication? → Database schema
  11. Database schema (diagram). Tables and columns:
      Entity tables
      - publication: id GUID, lucuid VARCHAR(512), title VARCHAR(512), booktitle VARCHAR(512), normtitle VARCHAR(512), date VARCHAR(512), editor VARCHAR(512), journal VARCHAR(512), note VARCHAR(512), pages VARCHAR(512), publisher VARCHAR(512), tech VARCHAR(512), volume VARCHAR(512), number VARCHAR(512), rawstring VARCHAR(4096), xmlfile VARCHAR(512), pdffile VARCHAR(512), topicfile VARCHAR(512), created BIGINT, modified BIGINT
      - author: id GUID, text VARCHAR(512), normtext VARCHAR(512), firstname VARCHAR(512), lastname VARCHAR(512), created BIGINT, modified BIGINT
      - affiliation: id GUID, text VARCHAR(512), location_id GUID
      - address: id GUID, text VARCHAR(512), location_id GUID
      - location: id GUID, latitude DOUBLE, longitude DOUBLE, text VARCHAR(512)
      - discipline: id GUID, text VARCHAR(512), parent_id GUID
      - keyword: id GUID, text VARCHAR(512)
      - concept: id GUID, text VARCHAR(512)
      - category: id GUID, text VARCHAR(512)
      - event: id GUID, text VARCHAR(512), filepath VARCHAR(512)
      - eventseries: id GUID, text VARCHAR(512), filepath VARCHAR(512)
      Link tables
      - pub_aut: publication_id GUID, author_id GUID
      - pub_aff: publication_id GUID, affiliation_id GUID
      - pub_add: publication_id GUID, address_id GUID
      - pub_dis: publication_id GUID, discipline_id GUID
      - pub_key: publication_id GUID, keyword_id GUID, score DOUBLE, source VARCHAR(512)
      - pub_con: publication_id GUID, concept_id GUID, score DOUBLE, source VARCHAR(512)
      - pub_cat: publication_id GUID, category_id GUID, score DOUBLE, source VARCHAR(512)
      - pub_evt: publication_id GUID, event_id GUID
      - aut_aff: author_id GUID, affiliation_id GUID
      - aut_add: author_id GUID, address_id GUID
      - evt_evs: event_id GUID, eventseries_id GUID
      - citation: publication1_id GUID, publication2_id GUID
      - (table name not recoverable from the transcript): predecessor_id GUID, successor_id GUID
      Names shown without columns: category_count, discipline_count, concept_count, keyword_count, evt_pub_aut_count, bib_coupling, co_author, co_citation
  12. System components & Work flow: How is our system structured? → Some examples.
  13. System components & Work flow: Components (component diagram). Elements shown: Model, Backend, Frontend, ParscitTrainer, Parscit, Clustering, ReferenceExtraction, TrendDetection, DocBrowser, Roundtrip, TF-Component, PDFToText, TopicExtraction, Recommendation, xmlBuilder, FileStorage, DB, DataBase, WebServices, JDBC, Solr, FileSystem
  14. Sequence diagram. Participants: DocumentBrowser, RoundTrip, RoundTripExecutor, PDFToText, Parscit, Languagedetection, Lemmatizer, NounExtraction, Solr, DB. Messages and return values:
      - a/1) .addPDF
      - a/2) .writeToFS → Path
      - a/3) .createThread, .submitThread
      - b/1) .run
      - b/2) .getText → Text
      - b/3) .ParseFullText → ParscitXML
      - b/4) .extractBodyAndAstract → BodyAndAbstract
      - b/5) .getLanguage → LanguageString
      - b/6) .lemmatize → LemmatizedText
      - b/7) .extractNouns → NounsList
      - b/8) .lemmatizeNounslist → LemmatizedNouns
      - b/9) .ReduceToTopNouns → TopNouns
      - b/10) .writeToFiles → Paths
      - b/11) .addTexts → Solrid
      - b/12) .addPublication
  15. System components & Work flow: Work flow
  16. Analysis & Visualization: Third step: Analyze and visualize data.
  17. Analysis & Visualization: Analysis of authors
  18. Analysis & Visualization: Analysis of scientific publications
  19. Demonstration: Now: Demo. Image: http://www.flickr.com/photos/plaisanter/5525977163/
  20. Development process: Technologies: Jersey
  21. Development process: Methods of agile software development: FDD, XP, Scrum
  22. Development process: Methods of agile software development: Weekly meetings; Sit together (as much as possible); Automated build system; Continuous integration; Issue tracking
  23. Summary and Outlook: Summary and future work. Summary: Integrated processing of scientific papers; Aggregated visualization of authors, publications and events; Computation of various analyses over the data; Cleaning functionality for automatically processed data. Future work: Parallelized clustering; Additional graphical visualization; Improved extraction of metadata from PDF files
  24. Summary and Outlook: Thank you for your attention. Questions?
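
The sequence on slide 14 (a PDF is added and written to the file system, then a background thread runs text extraction, parsing, language detection, lemmatization, noun extraction and storage) can be pictured as a chain of stages handed to an executor. The following is a minimal sketch of that idea only. The class `RoundTripSketch`, its methods and the stand-in stages are hypothetical names chosen for illustration; they do not reproduce knowAAN's actual classes or the APIs of PDFToText, ParsCit or Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.UnaryOperator;

/** Hypothetical sketch: a round trip is an ordered list of text-transforming stages. */
final class RoundTripSketch {
    private final List<UnaryOperator<String>> stages = new ArrayList<>();
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Appends a stage; stages run in the order they were added. */
    RoundTripSketch addStage(UnaryOperator<String> stage) {
        stages.add(stage);
        return this;
    }

    /** Runs all stages in order on the calling thread (compare step b/1 ".run"). */
    String run(String input) {
        String current = input;
        for (UnaryOperator<String> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }

    /** Runs the stages in the background (compare a/3 ".createThread / .submitThread"). */
    Future<String> submit(String input) {
        Callable<String> task = () -> run(input);
        return executor.submit(task);
    }

    void shutdown() {
        executor.shutdown();
    }

    public static void main(String[] args) throws Exception {
        RoundTripSketch roundTrip = new RoundTripSketch()
                .addStage(String::trim)          // stand-in for a text extraction step
                .addStage(String::toLowerCase);  // stand-in for a normalization step
        Future<String> result = roundTrip.submit("  Artefact-Actor-Networks  ");
        System.out.println(result.get());
        roundTrip.shutdown();
    }
}
```

Modeling each step as a function from text to text keeps the order of steps explicit and lets a single step be replaced or tested in isolation. A real pipeline would pass richer intermediate results (paths, XML, noun lists) between stages rather than plain strings.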

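The schema on slide 11 lists a `citation` table with `publication1_id` and `publication2_id`, together with the names `bib_coupling` and `co_citation`. The slides do not say how those two are computed. The sketch below uses the conventional definitions: the bibliographic coupling of two publications is the number of references they share, and their co-citation count is the number of publications that cite both. It assumes that `publication1_id` is the citing publication and `publication2_id` the cited one, which the slides do not state. The class `CitationMeasures` and its in-memory representation are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of two standard citation measures over citation pairs. */
final class CitationMeasures {

    /** One citation row; ASSUMPTION: 'citing' refers to 'cited'. */
    static final class Citation {
        final String citing;
        final String cited;

        Citation(String citing, String cited) {
            this.citing = citing;
            this.cited = cited;
        }
    }

    /** Bibliographic coupling: number of references that a and b have in common. */
    static int bibCoupling(List<Citation> citations, String a, String b) {
        Map<String, Set<String>> references = new HashMap<>();
        for (Citation c : citations) {
            references.computeIfAbsent(c.citing, k -> new HashSet<>()).add(c.cited);
        }
        return intersectionSize(references.get(a), references.get(b));
    }

    /** Co-citation: number of publications that cite both a and b. */
    static int coCitation(List<Citation> citations, String a, String b) {
        Map<String, Set<String>> citedBy = new HashMap<>();
        for (Citation c : citations) {
            citedBy.computeIfAbsent(c.cited, k -> new HashSet<>()).add(c.citing);
        }
        return intersectionSize(citedBy.get(a), citedBy.get(b));
    }

    /** Size of the intersection; a missing set counts as empty. */
    private static int intersectionSize(Set<String> x, Set<String> y) {
        if (x == null || y == null) {
            return 0;
        }
        Set<String> common = new HashSet<>(x);
        common.retainAll(y);
        return common.size();
    }

    public static void main(String[] args) {
        List<Citation> citations = Arrays.asList(
                new Citation("A", "X"), new Citation("A", "Y"),
                new Citation("B", "X"), new Citation("B", "Y"),
                new Citation("B", "Z"));
        System.out.println(bibCoupling(citations, "A", "B")); // A and B share X and Y
        System.out.println(coCitation(citations, "X", "Z"));  // only B cites both X and Z
    }
}
```

With data held in a relational database, the same counts could also be obtained with a self-join on the citation table. The in-memory version is used here only to keep the example self-contained.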