Project group knowAAN
   Final presentation


 Computer Science Education Group
     University of Paderborn


     October 20th 2011
Overview






    Introduction
    System components & Work flow
    Demonstration
    Development process
    Summary & Outlook
    Time for further detailed questions




                   PG knowAAN                    2
Overview



Overview: First part



    Goals
    Extraction & Storage (of data)
    Exploration (of data)
    System components & Work flow
    Analysis & Visualization (of data)




Goals




    Explore research networks
    Based on: Artifacts (scientific publications) and metadata
    Combination and analysis of data
    Computation of similarities of full texts
    Support for conference management system Ginkgo
    Data visualization
    Recommendations

              (Source: PG knowAAN project description)
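The goal "Computation of similarities of full texts" can be illustrated with TF-IDF weighting and cosine similarity, a common choice for comparing documents. The project description does not say which measure knowAAN used, so the technique and the function names `tfidf_vectors` and `cosine` below are assumptions: a minimal sketch over already tokenized texts, not the project's implementation.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each token by term frequency times inverse document frequency."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy "full texts", already tokenized.
docs = [
    "awareness research network awareness".split(),
    "research network publication analysis".split(),
    "mobile learning tablet".split(),
]
vecs = tfidf_vectors(docs)
print(round(cosine(vecs[0], vecs[1]), 3))
```

Terms that occur in every document get weight zero, so only distinguishing vocabulary contributes to the similarity; texts with no shared terms score 0.0.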



Goals


Imagine you are interested in a conference.
You have downloaded two or three years' worth of its papers.
  Now you have nearly 100 publications.
       How do you explore them?




   100 publications. Do you know of any tools for this?
Extraction & Storage







           First step: Extract data and store it.




Extraction & Storage




Exploration







               Second step: Explore data.




Exploration



Exploring a conference




Exploration







      Which extracted data is available for a publication?
                     → Database schema




Database schema

  Entity tables
    publication: id GUID, lucuid VARCHAR(512), title VARCHAR(512), booktitle VARCHAR(512), normtitle VARCHAR(512), date VARCHAR(512), editor VARCHAR(512), journal VARCHAR(512), note VARCHAR(512), pages VARCHAR(512), publisher VARCHAR(512), tech VARCHAR(512), volume VARCHAR(512), number VARCHAR(512), rawstring VARCHAR(4096), xmlfile VARCHAR(512), pdffile VARCHAR(512), topicfile VARCHAR(512), created BIGINT, modified BIGINT
    author: id GUID, text VARCHAR(512), normtext VARCHAR(512), firstname VARCHAR(512), lastname VARCHAR(512), created BIGINT, modified BIGINT
    affiliation: id GUID, text VARCHAR(512), location_id GUID
    address: id GUID, text VARCHAR(512), location_id GUID
    location: id GUID, latitude DOUBLE, longitude DOUBLE, text VARCHAR(512)
    discipline: id GUID, text VARCHAR(512), parent_id GUID
    keyword: id GUID, text VARCHAR(512)
    concept: id GUID, text VARCHAR(512)
    category: id GUID, text VARCHAR(512)
    event: id GUID, text VARCHAR(512), filepath VARCHAR(512), predecessor_id GUID, successor_id GUID
    eventseries: id GUID, text VARCHAR(512), filepath VARCHAR(512)

  Link tables
    pub_aut: publication_id GUID, author_id GUID
    pub_aff: publication_id GUID, affiliation_id GUID
    aut_aff: author_id GUID, affiliation_id GUID
    pub_add: publication_id GUID, address_id GUID
    aut_add: author_id GUID, address_id GUID
    pub_dis: publication_id GUID, discipline_id GUID
    pub_key: publication_id GUID, keyword_id GUID, score DOUBLE, source VARCHAR(512)
    pub_con: publication_id GUID, concept_id GUID, score DOUBLE, source VARCHAR(512)
    pub_cat: publication_id GUID, category_id GUID, score DOUBLE, source VARCHAR(512)
    pub_evt: publication_id GUID, event_id GUID
    evt_evs: event_id GUID, eventseries_id GUID
    citation: publication1_id GUID, publication2_id GUID

  Further tables (columns not shown): category_count, discipline_count, concept_count, keyword_count, evt_pub_aut_count, co_author, co_citation, bib_coupling
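To show how the link tables in the schema above connect entities, here is a small sketch that creates a subset of it (publication, author, pub_aut) in an in-memory SQLite database and resolves a publication's authors. The slides do not name the DBMS; storing GUID columns as TEXT, the helper `authors_of`, and the sample rows are assumptions for illustration only.

```python
import sqlite3
import uuid

# Subset of the knowAAN schema. GUID columns are stored as TEXT here
# (assumption: SQLite has no native GUID type; the project's DBMS is not stated).
DDL = """
CREATE TABLE publication (
    id    TEXT PRIMARY KEY,
    title VARCHAR(512),
    date  VARCHAR(512)
);
CREATE TABLE author (
    id        TEXT PRIMARY KEY,
    firstname VARCHAR(512),
    lastname  VARCHAR(512)
);
CREATE TABLE pub_aut (
    publication_id TEXT REFERENCES publication(id),
    author_id      TEXT REFERENCES author(id),
    PRIMARY KEY (publication_id, author_id)
);
"""


def authors_of(conn: sqlite3.Connection, publication_id: str) -> list[str]:
    """Hypothetical helper: resolve a publication's authors via pub_aut."""
    rows = conn.execute(
        """SELECT a.firstname || ' ' || a.lastname
           FROM author a JOIN pub_aut pa ON pa.author_id = a.id
           WHERE pa.publication_id = ?
           ORDER BY a.lastname""",
        (publication_id,),
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
pub_id, a1, a2 = (str(uuid.uuid4()) for _ in range(3))
conn.execute("INSERT INTO publication VALUES (?, ?, ?)", (pub_id, "Example paper", "2011"))
conn.executemany("INSERT INTO author VALUES (?, ?, ?)",
                 [(a1, "Jane", "Doe"), (a2, "John", "Smith")])
conn.executemany("INSERT INTO pub_aut VALUES (?, ?)", [(pub_id, a1), (pub_id, a2)])
print(authors_of(conn, pub_id))  # ['Jane Doe', 'John Smith']
```

The same join pattern applies to the other link tables, e.g. pub_key or pub_con, which additionally carry a score and a source column.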
System components & Work flow







           How is our system structured?
                  → Some examples.




System components & Work flow



Components
    Backend << component >>, containing the components:
        Parscit, Clustering, DB, TrendDetection, Roundtrip, TF-Component,
        PDFToText, TopicExtraction, Recommendation, xmlBuilder

    External components and their connections:
        ParscitTrainer (Model)
        FrontendReferenceExtraction (WebServices)
        DocBrowser (WebServices)
        DataBase (JDBC)
        Solr (WebServices)
        FileStorage (FileSystem)
Sequence diagram
Participants: DocumentBrowser, RoundTrip, RoundTripExecutor, PDFToText, Parscit, Languagedetection, Lemmatizer, NounExtraction, Solr, DB

  a) Upload
    a / 1) .addPDF
    a / 2) .writeToFS → Path
    a / 3) .createThread, .submitThread

  b) Processing
    b / 1) .run
    b / 2) .getText → Text
    b / 3) .ParseFullText → ParscitXML
    b / 4) .extractBodyAndAstract → BodyAndAbstract
    b / 5) .getLanguage → LanguageString
    b / 6) .lemmatize → LemmatizedText
    b / 7) .extractNouns → NounsList
    b / 8) .lemmatizeNounslist → LemmatizedNouns
    b / 9) .ReduceToTopNouns → TopNouns
    b / 10) .writeToFiles → Paths
    b / 11) .addTexts → Solrid
    b / 12) .addPublication
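The call sequence above can be sketched in Python to show its two phases: the upload (a) hands the stored PDF to a worker thread, and the processing (b) reduces lemmatized nouns to the most frequent ones. The slides give only the call names, not the code or its language, so this is a hypothetical sketch: steps b/2 to b/8 are replaced by a stub, and `reduce_to_top_nouns`, `process_pdf`, and the choice of "most frequent" for b/9 are assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def reduce_to_top_nouns(lemmatized_nouns: list[str], k: int = 5) -> list[str]:
    """Assumed reading of step b/9: keep the k most frequent noun lemmas.

    Counter.most_common orders equal counts by first occurrence.
    """
    return [noun for noun, _count in Counter(lemmatized_nouns).most_common(k)]


def fake_extract_lemmatized_nouns(pdf_path: str) -> list[str]:
    """Hypothetical stub for steps b/2 to b/8 (text extraction, Parscit,
    language detection, lemmatization, noun extraction)."""
    return ["network", "awareness", "network", "research", "awareness", "network"]


def process_pdf(pdf_path: str, k: int = 2) -> list[str]:
    """Hypothetical b-phase: extract nouns, then reduce them to the top k."""
    return reduce_to_top_nouns(fake_extract_lemmatized_nouns(pdf_path), k)


# a-phase analogue of a / 3): submit the stored PDF to a worker thread, so the
# caller does not block on the slow extraction steps.
with ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(process_pdf, "papers/example.pdf")
    print(future.result())  # ['network', 'awareness']
```

Running the extraction in a separate thread matches the createThread/submitThread step in the diagram: the upload call can return while the long-running PDF processing continues.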
System components & Work flow



Work flow




Analysis & Visualization







           Third step: Analyze and visualize data.




Analysis & Visualization



Analysis of authors
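One typical author analysis is counting co-authorships. The database schema lists a co_author table but not its columns, so what follows is only an assumed interpretation: for every unordered pair of authors, count the publications they share, computed from (publication_id, author_id) rows as stored in the pub_aut link table. The function name `co_author_counts` and the sample rows are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_author_counts(pub_aut: list[tuple[str, str]]) -> Counter:
    """Count, for each unordered author pair, the publications they share.

    Input rows are (publication_id, author_id), like the pub_aut link table.
    """
    authors_by_pub: dict[str, set[str]] = {}
    for pub_id, author_id in pub_aut:
        authors_by_pub.setdefault(pub_id, set()).add(author_id)
    pairs: Counter = Counter()
    for authors in authors_by_pub.values():
        # Sorting makes (a, b) and (b, a) map to the same key.
        for a, b in combinations(sorted(authors), 2):
            pairs[(a, b)] += 1
    return pairs


rows = [("p1", "alice"), ("p1", "bob"), ("p2", "alice"), ("p2", "bob"),
        ("p2", "carol"), ("p3", "carol")]
print(co_author_counts(rows).most_common(1))  # [(('alice', 'bob'), 2)]
```

The resulting pair counts can serve as edge weights of a co-author graph for visualization.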




Analysis & Visualization



Analysis of scientific publications
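The schema lists bib_coupling and co_citation tables, which suggests citation-based similarity measures between publications. As a sketch: two papers are bibliographically coupled once for every reference they share, and two papers are co-cited once for every paper that cites both. The slides do not define these tables, so reading each citation row as "publication1_id cites publication2_id", as well as the helper names below, are assumptions.

```python
from collections import Counter
from itertools import combinations


def pair_counts(groups: dict[str, set[str]]) -> Counter:
    """For every group, count each unordered pair of its members once."""
    counts: Counter = Counter()
    for members in groups.values():
        # Sorting makes (a, b) and (b, a) the same key.
        for a, b in combinations(sorted(members), 2):
            counts[(a, b)] += 1
    return counts


def bib_coupling(citations: list[tuple[str, str]]) -> Counter:
    """Two citing papers are coupled once per reference they share."""
    citing_by_cited: dict[str, set[str]] = {}
    for citing, cited in citations:
        citing_by_cited.setdefault(cited, set()).add(citing)
    return pair_counts(citing_by_cited)


def co_citation(citations: list[tuple[str, str]]) -> Counter:
    """Two cited papers are co-cited once per paper that cites both."""
    cited_by_citing: dict[str, set[str]] = {}
    for citing, cited in citations:
        cited_by_citing.setdefault(citing, set()).add(cited)
    return pair_counts(cited_by_citing)


# Rows read as (publication1_id cites publication2_id); the direction is assumed.
citations = [("A", "X"), ("A", "Y"), ("B", "X"), ("B", "Y"), ("C", "X")]
print(bib_coupling(citations).most_common())
print(co_citation(citations).most_common())
```

Both measures use the same pair counting and differ only in which side of the citation relation is grouped.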




Demonstration







                            Now: Demo.
           Image: http://www.flickr.com/photos/plaisanter/5525977163/


Development process



Technologies




                            Jersey



Development process



Methods of agile software development



     FDD          XP          Scrum




Development process



Methods of agile software development




    Weekly meetings
    Sit together (as much as possible)
    Automated building system
    Continuous integration
    Issue tracking


Summary and Outlook



Summary and future work

 Summary
     Integrated processing of scientific papers
     Aggregated visualization of authors, publications and events
     Computation of various analyses over the data
     Cleaning functionality for automatically processed data

 Future work
     Parallelized Clustering
     Additional graphical visualization
     Improve extraction of metadata from PDF files
Summary and Outlook



Thank you for your attention




                           Questions?

How to release an Open Source Dataweave Library
 
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - TechWebinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
 
Oracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxOracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptx
 
Scenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosScenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenarios
 
Technical SEO for Improved Accessibility WTS FEST
Technical SEO for Improved Accessibility  WTS FESTTechnical SEO for Improved Accessibility  WTS FEST
Technical SEO for Improved Accessibility WTS FEST
 
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through TokenizationStobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
Stobox 4: Revolutionizing Investment in Real-World Assets Through Tokenization
 
Introduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its applicationIntroduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its application
 
AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024
 
March Patch Tuesday
March Patch TuesdayMarch Patch Tuesday
March Patch Tuesday
 

Final presentation of the project group Knowledge Awareness in Artefact-Actor-Networks (knowAAN)

  • 1. Project group knowAAN: Final presentation. Computer Science Education Group, University of Paderborn, October 20th 2011
  • 2. Overview: Introduction; System components & Work flow; Demonstration; Development process; Summary & Outlook; Time for further questions of detail
  • 3. Overview: First part: Goals; Extraction & Storage (of data); Exploration (of data); System components & Work flow; Analysis & Visualization (of data)
  • 4. Goals: Explore research networks, based on artifacts (scientific publications) and metadata; combination and analysis of data; computation of similarities of full texts; support for the conference management system Ginkgo; data visualization; recommendations (Source: PG knowAAN project description)
  • 5. Goals: Imagine you are interested in a conference. You have downloaded its papers from two or three years, so you now have nearly 100 publications. How do you explore them? Do you know any tools for that?
  • 6. Extraction & Storage: First step: extract data and store it.
  • 8. Exploration: Second step: explore data.
  • 10. Exploration: Which extracted data is available for a publication? → Database schema
  • 11. Database schema (diagram). Unless noted, id and *_id columns are GUID and text columns are VARCHAR(512); each table also has an Indexes section.
      Entity tables:
      - publication: id, lucuid, title, booktitle, normtitle, date, editor, journal, note, pages, publisher, volume, number, rawstring VARCHAR(4096), xmlfile, pdffile, topicfile, created BIGINT, modified BIGINT
      - author: id, text, normtext, firstname, lastname, created BIGINT, modified BIGINT
      - affiliation: id, text, location_id
      - address: id, text, location_id
      - location: id, latitude DOUBLE, longitude DOUBLE, text
      - discipline: id, text, parent_id
      - keyword: id, text, source
      - concept: id, text, source, tech
      - category: id, text, source
      - event: id, text, filepath
      - eventseries: id, text, filepath
      Link tables:
      - pub_aut (publication_id, author_id), pub_aff (publication_id, affiliation_id), aut_aff (author_id, affiliation_id), pub_add (publication_id, address_id), aut_add (author_id, address_id), pub_dis (publication_id, discipline_id), pub_evt (publication_id, event_id), evt_evs (event_id, eventseries_id)
      - pub_key (publication_id, keyword_id, score DOUBLE), pub_con (publication_id, concept_id, score DOUBLE), pub_cat (publication_id, category_id, score DOUBLE)
      - citation (publication1_id, publication2_id)
      - a table with predecessor_id and successor_id (name not legible)
      Analysis tables (no columns shown): category_count, discipline_count, concept_count, keyword_count, evt_pub_aut_count, co_author, co_citation, bib_coupling
  • 12. System components & Work flow: How is our system structured? → Some examples.
  • 13. System components & Work flow: Components (component diagram). Components shown: Model, Backend, Frontend, DocBrowser, Roundtrip, PDFToText, Parscit, ParscitTrainer, ReferenceExtraction, TF-Component, TopicExtraction, Clustering, TrendDetection, Recommendation, xmlBuilder, DB, DataBase, Solr, FileStorage, FileSystem. Further labels: WebServices, JDBC.
  • 14. Sequence diagram of the roundtrip. Participants: DocumentBrowser, RoundTrip, RoundTripExecutor, PDFToText, Parscit, Languagedetection, Lemmatizer, NounExtraction, Solr, DB.
      (a) Upload: a/1 addPDF; a/2 writeToFS → Path; a/3 createThread, submitThread.
      (b) Processing: b/1 run; b/2 getText → Text; b/3 ParseFullText → ParscitXML; b/4 extractBodyAndAstract → BodyAndAbstract; b/5 getLanguage → LanguageString; b/6 lemmatize → LemmatizedText; b/7 extractNouns → NounsList; b/8 lemmatizeNounslist → LemmatizedNouns; b/9 ReduceToTopNouns → TopNouns; b/10 writeToFiles → Paths; b/11 addTexts → Solrid; b/12 addPublication.
  • 15. System components & Work flow: Work flow
  • 16. Analysis & Visualization: Third step: analyze and visualize data.
  • 17. Analysis & Visualization: Analysis of authors
  • 18. Analysis & Visualization: Analysis of scientific publications
  • 19. Demonstration: Now: demo. (Image: http://www.flickr.com/photos/plaisanter/5525977163/)
  • 20. Development process: Technologies: Jersey
  • 21. Development process: Methods of agile software development: FDD, XP, Scrum
  • 22. Development process: Methods of agile software development: weekly meetings; sitting together (as much as possible); automated build system; continuous integration; issue tracking
  • 23. Summary and Outlook: Summary and future work. Summary: integrated processing of scientific papers; aggregated visualization of authors, publications and events; computation of various analyses over the data; cleaning functionality for automatically processed data. Future work: parallelized clustering; additional graphical visualizations; improved extraction of metadata from PDF files.
  • 24. Summary and Outlook: Thank you for your attention. Questions?
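Slide 4 names the computation of similarities of full texts as a goal, and slide 13 lists a TF-Component. The slides do not say how similarity was computed. As a minimal sketch, assuming TF stands for term frequency, two texts can be compared by the cosine of their term-frequency vectors. The tokenizer and both function names are illustrative assumptions, not the project's code.

```python
import math
import re
from collections import Counter


def term_frequencies(text: str) -> Counter:
    """Count lower-cased ASCII word tokens.

    Simplification (assumption): no lemmatization, stop-word removal or
    noun filtering is applied here.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(tf_a: Counter, tf_b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors, in [0, 1]."""
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * tf_b[term] for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

For example, "Graph mining" and "graph clustering" share one of their two terms and score 0.5. Raw counts favour long documents with frequent common words; weighting schemes such as TF-IDF are a common refinement.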
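The schema on slide 11 links authors to publications through the table pub_aut and lists a co_author table among its analysis tables, which matches the author analysis on slide 17. The columns of co_author are not shown. As a minimal sketch, assuming co-authorship means that two authors share a publication, co-author pairs can be derived with a self-join on pub_aut. The sketch uses SQLite from Python's standard library, stores the GUID columns as TEXT, and keeps only the columns it needs; this DDL is a simplified assumption, not the project's schema.

```python
import sqlite3

# Simplified subset of the slide 11 schema (assumption: GUIDs stored as TEXT).
SCHEMA = """
CREATE TABLE publication (id TEXT PRIMARY KEY, title VARCHAR(512));
CREATE TABLE author (id TEXT PRIMARY KEY, text VARCHAR(512));
CREATE TABLE pub_aut (
    publication_id TEXT NOT NULL REFERENCES publication(id),
    author_id      TEXT NOT NULL REFERENCES author(id),
    PRIMARY KEY (publication_id, author_id)
);
"""

# Each unordered author pair that shares a publication, with the number of
# shared publications; a1.author_id < a2.author_id drops self-pairs and
# mirrored duplicates such as (b, a).
CO_AUTHOR_SQL = """
SELECT a1.author_id, a2.author_id, COUNT(*) AS shared
FROM pub_aut AS a1
JOIN pub_aut AS a2
  ON a1.publication_id = a2.publication_id
 AND a1.author_id < a2.author_id
GROUP BY a1.author_id, a2.author_id
ORDER BY shared DESC, a1.author_id, a2.author_id
"""


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)


def co_authors(conn: sqlite3.Connection) -> list:
    """Return (author_id_1, author_id_2, shared_publications) tuples."""
    return conn.execute(CO_AUTHOR_SQL).fetchall()
```

With publication p1 written by authors a, b and c, and p2 by a and b, co_authors returns ("a", "b", 2), ("a", "c", 1) and ("b", "c", 1). Whether the project materialized such results in co_author or computed them on demand is not stated on the slides.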
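The roundtrip on slide 14 separates a quick upload path (a) from a processing path (b) that runs on its own thread. The Python sketch below mirrors that structure with concurrent.futures.ThreadPoolExecutor. Every stage is injected as a plain function because the slides show only the message names; the Stages and RoundTrip names, their fields and signatures are hypothetical stand-ins for the project's components (PDFToText, Parscit, Languagedetection, Lemmatizer, NounExtraction, Solr, DB), not their real interfaces. Feeding the body text, rather than the lemmatized text, into noun extraction is also an assumption.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class Stages:
    """Hypothetical stand-ins for the participants on slide 14."""
    pdf_to_text: Callable[[Path], str]                     # b/2 getText
    parse_full_text: Callable[[str], str]                  # b/3 ParseFullText
    extract_body_and_abstract: Callable[[str], str]        # b/4
    get_language: Callable[[str], str]                     # b/5 getLanguage
    lemmatize: Callable[[str], str]                        # b/6 lemmatize
    extract_nouns: Callable[[str], List[str]]              # b/7 extractNouns
    lemmatize_nouns: Callable[[List[str]], List[str]]      # b/8
    reduce_to_top_nouns: Callable[[List[str]], List[str]]  # b/9
    write_to_files: Callable[[Record], List[Path]]         # b/10 writeToFiles
    add_texts: Callable[[Record], str]                     # b/11 Solr, returns an id
    add_publication: Callable[[Record], None]              # b/12 DB


class RoundTrip:
    def __init__(self, storage_dir: Path, stages: Stages, workers: int = 2) -> None:
        self.storage_dir = storage_dir
        self.stages = stages
        self.executor = ThreadPoolExecutor(max_workers=workers)

    def add_pdf(self, name: str, data: bytes) -> "Future[Record]":
        """a/1-a/3: store the upload, then queue it for background processing."""
        path = self.storage_dir / name
        path.write_bytes(data)                        # a/2 writeToFS -> Path
        return self.executor.submit(self._run, path)  # a/3 hand off to a worker

    def _run(self, path: Path) -> Record:
        """b/1-b/12: the processing chain, executed on a worker thread."""
        s = self.stages
        xml = s.parse_full_text(s.pdf_to_text(path))
        body = s.extract_body_and_abstract(xml)
        record: Record = {
            "pdffile": str(path),
            "language": s.get_language(body),
            "lemmatized_text": s.lemmatize(body),
        }
        # Assumption: nouns are extracted from the body text, then lemmatized.
        nouns = s.lemmatize_nouns(s.extract_nouns(body))
        record["top_nouns"] = s.reduce_to_top_nouns(nouns)
        record["files"] = s.write_to_files(record)
        record["solr_id"] = s.add_texts(record)
        s.add_publication(record)
        return record

    def close(self) -> None:
        """Wait for queued documents to finish, then stop the workers."""
        self.executor.shutdown(wait=True)
```

Because add_pdf returns a Future, the caller (the DocumentBrowser in the diagram) gets control back as soon as the file is stored, while the slow text processing runs in the background.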
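The schema on slide 11 contains a citation table with the columns publication1_id and publication2_id and lists bib_coupling and co_citation among its analysis tables, which fits the analysis of scientific publications on slide 18. The slides do not define these columns. Assuming that publication1_id cites publication2_id, a minimal sketch counts bibliographic coupling (two papers citing the same reference) and co-citation (two papers cited by the same paper) from the citation rows. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

# (publication1_id, publication2_id); assumption: publication1 cites publication2.
Citation = Tuple[str, str]


def _pair_counts(groups: Dict[str, Set[str]]) -> Counter:
    """Count how often each unordered pair of members shares a group."""
    counts: Counter = Counter()
    for members in groups.values():
        for pair in combinations(sorted(members), 2):
            counts[pair] += 1
    return counts


def bibliographic_coupling(citations: Iterable[Citation]) -> Counter:
    """(citing_a, citing_b) -> number of references both papers cite."""
    citing_by_reference: Dict[str, Set[str]] = {}
    for citing, cited in citations:
        citing_by_reference.setdefault(cited, set()).add(citing)
    return _pair_counts(citing_by_reference)


def co_citation(citations: Iterable[Citation]) -> Counter:
    """(cited_a, cited_b) -> number of papers that cite both."""
    references_by_citing: Dict[str, Set[str]] = {}
    for citing, cited in citations:
        references_by_citing.setdefault(citing, set()).add(cited)
    return _pair_counts(references_by_citing)
```

For the citations A→X, A→Y, B→X, B→Y and C→X, papers A and B are coupled twice (through X and Y), A and C as well as B and C once, and X and Y are co-cited twice (by A and B). Using sets means a duplicated citation row is counted only once.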