SKIMMR: Making Knowledge Discovery Easier
   Vít Nováček (vit.novacek@deri.org)
           February 8th, 2013 @ DERI meeting
Introduction   SKIMMR           Demo            Evaluation   Conclusions


Outline


   Introduction
   SKIMMR
       KB Computation
       KB Utilisation
   Demo
   Evaluation
       Evaluated Features
       Evaluation Methodology
   Conclusions






Machine-Aided Skim Reading
    Traditional (Skim) Reading
           full reading – deep insights (slow)
           skim reading – superficial overview (quicker)

    How Can Automation Help?
           going deep is hard
           large-scale shallow processing is more feasible
           Image source: http://a-pieceofpaper.blogspot.com

    What Kind of Automation?
           extraction (text and data mining)
           augmentation (computing more complex content)
           indexing and querying
           presentation of the results

    Related Work
           processing: text mining, graph analysis, distributional semantics, fuzzy IR
           presentation: GoPubMed, Textpresso, IVEA, CORAAL, Exhibit, . . .





Input/Extraction Pipelines

    Text Extraction
           preprocessing (tokenization, tagging, shallow parsing)
           named entity (NE) recognition
           relation extraction
           co-occurrence analysis + statistics (PMI, TF/IDF, . . . )
           Image source: http://atyoursurveys.blogspot.com


    Digesting Linked Data
           graph decomposition
           cluster analysis
           co-occurrence analysis + statistics (PMI, TF/IDF, . . . )

    Extraction Results
           (s, p, o, r, w) statements
           subject, predicate, object, provenance, weight
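The text branch of the extraction pipeline ends in weighted (s, p, o, r, w) statements. Below is a minimal sketch of just the co-occurrence step, not SKIMMR's actual extractor. It assumes sentence-level co-occurrence, a naive lowercase/whitespace tokenizer standing in for real preprocessing and NE recognition, a hypothetical generic predicate `related_to`, the source document ID as the provenance r, and normalised PMI rescaled to [0, 1] as the weight w.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations
from typing import NamedTuple


class Statement(NamedTuple):
    s: str    # subject term
    p: str    # predicate (hypothetical generic "related_to")
    o: str    # object term
    r: str    # provenance: ID of the source document
    w: float  # weight in [0, 1]


def cooccurrence_statements(docs: dict[str, list[str]]) -> list[Statement]:
    """Turn {doc_id: [sentence, ...]} into NPMI-weighted co-occurrence statements."""
    term_counts = Counter()  # number of sentences containing each term
    pair_counts = Counter()  # number of sentences containing each (sorted) term pair
    pair_sources = defaultdict(set)
    n_sentences = 0
    for doc_id, sentences in docs.items():
        for sentence in sentences:
            n_sentences += 1
            terms = sorted(set(sentence.lower().split()))  # naive tokenizer
            term_counts.update(terms)
            for a, b in combinations(terms, 2):
                pair_counts[(a, b)] += 1
                pair_sources[(a, b)].add(doc_id)

    statements = []
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n_sentences
        p_a = term_counts[a] / n_sentences
        p_b = term_counts[b] / n_sentences
        pmi = math.log(p_ab / (p_a * p_b))
        # Normalised PMI lies in [-1, 1]; a pair present in every sentence gets 1
        npmi = pmi / -math.log(p_ab) if p_ab < 1 else 1.0
        # Rescale to [0, 1], clamping to guard against floating-point rounding
        weight = min(1.0, max(0.0, (npmi + 1) / 2))
        # One statement per source document, so each keeps its provenance
        for doc_id in sorted(pair_sources[(a, b)]):
            statements.append(Statement(a, "related_to", b, doc_id, weight))
    return statements
```

Pairs that only ever appear together get weights near 1, while pairs of frequent terms that meet rarely drop below 0.5. A real pipeline would replace the tokenizer with NE recognition and add extracted relations alongside the co-occurrence statements.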




Computing the Knowledge Base

    Distributional Representation
               aggregated co-occurrence/relation statements
               statements → tensor representation
               every element still linked to its provenance
               matrix perspectives of the tensor
               Image source: www.bystonline.org



    Augmentation
               perspectives give rise to emergent patterns like:
                   semantic similarity
                   concept clusters and taxonomies
                   IF-THEN rules
                   concept ordering and relative relevance
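One possible reading of "statements → tensor representation" and "matrix perspectives" is sketched below, under stated assumptions rather than as SKIMMR's own code. It assumes a sparse 3-mode tensor keyed by (subject, predicate, object) that sums the weights of aggregated statements and keeps their provenances. It also assumes a hypothetical term-by-context perspective in which each term is described by the relations it takes part in. Cosine similarity over the rows of that perspective then yields one emergent pattern, semantic similarity. All function names are hypothetical.

```python
import math
from collections import defaultdict


def build_tensor(statements):
    """Aggregate (s, p, o, r, w) tuples into a sparse 3-mode tensor.

    Each cell (s, p, o) holds the summed weight plus the set of provenances r
    that contributed to it, so every element stays linked to its sources.
    """
    weights = defaultdict(float)
    provenance = defaultdict(set)
    for s, p, o, r, w in statements:
        weights[(s, p, o)] += w
        provenance[(s, p, o)].add(r)
    return dict(weights), dict(provenance)


def term_context_perspective(weights):
    """One matrix perspective: term -> {context: weight}.

    A term's contexts are the relations it takes part in, tagged with its
    role ("->" as subject, "<-" as object) and the term at the other end.
    """
    rows = defaultdict(lambda: defaultdict(float))
    for (s, p, o), w in weights.items():
        rows[s][(p, "->", o)] += w
        rows[o][(p, "<-", s)] += w
    return rows


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Two drugs that inhibit the same enzyme and treat the same symptom share most of their contexts, so their rows score close to 1. A term and one of its relation partners typically share no contexts and score 0.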






Indexing the Knowledge Base

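  Term Index (rows and columns: terms T1, ..., Tn)
      $$\begin{pmatrix} \bar{w}_{1,1} & \bar{w}_{1,2} & \cdots & \bar{w}_{1,n} \\ \bar{w}_{2,1} & \bar{w}_{2,2} & \cdots & \bar{w}_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ \bar{w}_{n,1} & \bar{w}_{n,2} & \cdots & \bar{w}_{n,n} \end{pmatrix}, \qquad \bar{w}_{i,j} \in [0, 1]$$

  Provenance Index (rows: statements S1, ..., Sm; columns: provenances P1, ..., Pq)
      $$\begin{pmatrix} w_{1,1} & w_{1,2} & \cdots & w_{1,q} \\ w_{2,1} & w_{2,2} & \cdots & w_{2,q} \\ \vdots & \vdots & \ddots & \vdots \\ w_{m,1} & w_{m,2} & \cdots & w_{m,q} \end{pmatrix}, \qquad w_{i,j} \in [0, 1]$$

  Statement Index (rows: terms T1, ..., Tn; columns: statements S1, ..., Sm)
      $$\begin{pmatrix} c_{1,1} & c_{1,2} & \cdots & c_{1,m} \\ c_{2,1} & c_{2,2} & \cdots & c_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n,1} & c_{n,2} & \cdots & c_{n,m} \end{pmatrix}, \qquad c_{i,j} \in \{0, 1\}$$

  Auxiliary Fulltext Index
         user’s entry point
         increasing robustness
         “keys”: queries
         values: term identifiers
         fairly standard IR: Okapi BM25F

  Image source: http://teptdataservices.blogspot.com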
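The auxiliary fulltext index maps free-text queries to term identifiers. A minimal sketch follows, assuming each term identifier is indexed by a short textual description (for instance its label and synonyms). It uses plain single-field Okapi BM25 with the common defaults k1 = 1.2 and b = 0.75. BM25F, named on the slide, additionally weights several fields separately; that part is omitted. The class and its inputs are hypothetical.

```python
import math
from collections import Counter


class FulltextIndex:
    """Toy single-field BM25 index: query words in, ranked term identifiers out."""

    def __init__(self, docs: dict[str, str], k1: float = 1.2, b: float = 0.75):
        # docs maps a term identifier to its (assumed) textual description;
        # it must be non-empty, otherwise the average length is undefined
        self.k1, self.b = k1, b
        self.tfs = {tid: Counter(text.lower().split()) for tid, text in docs.items()}
        self.lengths = {tid: sum(tf.values()) for tid, tf in self.tfs.items()}
        self.avg_len = sum(self.lengths.values()) / len(self.lengths)
        self.df = Counter(word for tf in self.tfs.values() for word in tf)
        self.n_docs = len(self.tfs)

    def idf(self, word: str) -> float:
        # Lucene-style IDF, which stays positive even for very common words
        n = self.df[word]
        return math.log((self.n_docs - n + 0.5) / (n + 0.5) + 1.0)

    def search(self, query: str, top_k: int = 10) -> list[tuple[str, float]]:
        scores = Counter()
        for word in query.lower().split():
            if word not in self.df:
                continue  # unknown words contribute nothing
            idf = self.idf(word)
            for tid, tf in self.tfs.items():
                f = tf[word]  # Counter returns 0 for absent words
                if f == 0:
                    continue
                # Length normalisation: longer descriptions are penalised
                norm = self.k1 * (1 - self.b + self.b * self.lengths[tid] / self.avg_len)
                scores[tid] += idf * f * (self.k1 + 1) / (f + norm)
        return scores.most_common(top_k)
```

The ranked identifiers returned by `search` would then serve as the entry terms for the term-index look-ups on the next slide.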




Querying the Knowledge Base
    Initial Result Term Set
           example query: $? \leftrightarrow T_x \text{ AND } (? \leftrightarrow T_y \text{ OR } ? \leftrightarrow T_z)$
           term index look-up:
                $F_x = \{(T_1, \bar{w}_{x,1}), (T_2, \bar{w}_{x,2}), \dots, (T_n, \bar{w}_{x,n})\}$
                $F_y = \{(T_1, \bar{w}_{y,1}), (T_2, \bar{w}_{y,2}), \dots, (T_n, \bar{w}_{y,n})\}$
                $F_z = \{(T_1, \bar{w}_{z,1}), (T_2, \bar{w}_{z,2}), \dots, (T_n, \bar{w}_{z,n})\}$
           combining atomic results: $F_x \cap (F_y \cup F_z)$
           Image source: http://nuget.org

    Complete Results
           terms: $R_T = \{(T_1, w^T_1), (T_2, w^T_2), \dots, (T_n, w^T_n)\}$, where the $w^T_i$ are the weights resulting from the combination
           statements: $R_S = \{(S_1, w^S_1), (S_2, w^S_2), \dots, (S_m, w^S_m)\}$, where $w^S_i = f_\nu\left(\sum_{j=1}^{n} w^T_j \, c_{j,i}\right)$
           provenances: $R_P = \{(P_1, w^P_1), (P_2, w^P_2), \dots, (P_q, w^P_q)\}$, where $w^P_i = f_\nu\left(\sum_{j=1}^{m} w^S_j \, w_{j,i}\right)$
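The query evaluation on this slide can be sketched as follows. The slide defines neither the combination operators nor f_ν, so this sketch assumes the standard fuzzy-set operators (minimum for ∩, maximum for ∪). It also assumes that f_ν divides each raw sum by a shared constant ν, taken here as the largest raw sum, so that results stay in [0, 1]. The nested-dict index layouts and all function names are hypothetical.

```python
def lookup(term_index, term):
    """Atomic query '? <-> term': the term's row in the term index (non-zero weights only)."""
    return {t: w for t, w in term_index.get(term, {}).items() if w > 0}


def fuzzy_and(a, b):
    """Assumed fuzzy intersection: terms present in both sets, weighted by the minimum."""
    return {t: min(w, b[t]) for t, w in a.items() if t in b}


def fuzzy_or(a, b):
    """Assumed fuzzy union: terms present in either set, weighted by the maximum."""
    out = dict(a)
    for t, w in b.items():
        out[t] = max(w, out.get(t, 0.0))
    return out


def f_nu(raw):
    """Assumed f_nu: divide every raw sum by nu, the largest raw sum."""
    nu = max(raw.values(), default=0.0)
    return {k: v / nu for k, v in raw.items()} if nu > 0 else dict(raw)


def complete_results(r_t, statement_index, provenance_index):
    """Compute R_S and R_P from the term results R_T.

    statement_index:  term -> set of statements with c_{j,i} = 1
    provenance_index: statement -> {provenance: w_{j,i}}
    """
    raw_s = {}
    for term, w_t in r_t.items():
        for stmt in statement_index.get(term, ()):
            raw_s[stmt] = raw_s.get(stmt, 0.0) + w_t  # sum_j w^T_j * c_{j,i}
    r_s = f_nu(raw_s)
    raw_p = {}
    for stmt, w_s in r_s.items():
        for prov, w in provenance_index.get(stmt, {}).items():
            raw_p[prov] = raw_p.get(prov, 0.0) + w_s * w  # sum_j w^S_j * w_{j,i}
    return r_s, f_nu(raw_p)
```

For the example query, `fuzzy_and(lookup(idx, "Tx"), fuzzy_or(lookup(idx, "Ty"), lookup(idx, "Tz")))` yields R_T, which `complete_results` then propagates to statements and provenances. Min/max keeps every intermediate weight in [0, 1]; other t-norms (e.g. product) would be an equally plausible choice.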






Let’s Learn About Some Grim Stuff!






What to Evaluate?


   Quality of the Extracted/Computed Content
               “noise-to-signal” ratio
               relevance of results w.r.t. queries
               information value (obvious vs. enlightening)

   User Experience
               usability of SKIMMR
                   general
                   domain-specific
               performance benefits (over a baseline)
               Image source: http://voguepay.com


How to Evaluate?


   Quality of the Extracted/Computed Content
               identification (or creation) of a gold standard
               generalised IR measures
               committee-based annotation of the results
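Against a gold standard, the simplest IR measures are precision, recall and F1. A minimal sketch of one possible generalisation follows: results and gold standard are treated as fuzzy sets of identifiers with weights in [0, 1], and overlap is measured as the fuzzy intersection size Σ min(r, g). With all weights equal to 1 this reduces to the usual set-based definitions. The function name and this particular generalisation are assumptions, not a method stated on the slide.

```python
def generalised_prf(result: dict[str, float], gold: dict[str, float]) -> tuple[float, float, float]:
    """Precision, recall and F1 over fuzzy sets (identifier -> weight in [0, 1])."""
    # Fuzzy intersection size: an item counts only as much as both sets agree on
    overlap = sum(min(w, gold.get(k, 0.0)) for k, w in result.items())
    total_result = sum(result.values())
    total_gold = sum(gold.values())
    precision = overlap / total_result if total_result else 0.0
    recall = overlap / total_gold if total_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, four returned items of which two are among three gold items give precision 0.5, recall 2/3 and F1 4/7.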

   User Experience
               SUS (System Usability Scale) survey
               domain-specific survey
               user performance analysis (SKIMMR vs. baseline)
               Image source: http://www.123rf.com
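SUS scoring is fixed by the questionnaire itself: ten statements answered on a 1 to 5 scale. Odd-numbered (positively worded) items contribute the answer minus 1, and even-numbered (negatively worded) items contribute 5 minus the answer. The sum is multiplied by 2.5 to give a score from 0 to 100. A minimal sketch (the function name is hypothetical):

```python
def sus_score(responses: list[int]) -> float:
    """System Usability Scale score (0-100) from ten 1-5 Likert answers."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs exactly ten answers on a 1-5 scale")
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5
```

Comparing SKIMMR with a baseline would then mean averaging per-participant scores for each tool.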



Conclusions and Future Work
    Current Status
           machine-aided skim reading notion coined
           basic theoretical background proposed
           a prototype implemented (general and biomedical versions)
               http://pypi.python.org/pypi/skimmr_gt/0.1-a1
               http://pypi.python.org/pypi/skimmr_bm/0.1-a1
           Image source: http://support.pacifichost.com




    Next Steps
           evaluation (with a gold standard and sample users)
           dissemination and follow-ups (write-up, proposals)
           back-end extensions:
               more (complex) types of relations
               proper APIs (development, web service, . . . )
               database and/or cloud storage
           front-end extensions:
               smoother transition between the graphs
               complex querying
               additional visualisations (trends, focused provenances, . . . )


