ESSIR LivingKnowledge DiversityEngine tutorial
Mike's testbed tutorial

Published in: Technology

Transcript

  • 1. SYMPOSIUM ON BIAS AND DIVERSITY IN IR
    A TESTBED FOR DIVERSIFICATION IN SEARCH
    Koblenz, August 31, 2011
    Michael Matthews, Barcelona Media/Yahoo! Research
  • 2. OVERVIEW
    Introduction to the LivingKnowledge Testbed: the Diversity Engine
    Getting started: our first application!
    Adding text analysis
    Adding multimedia analysis
    Evaluation
    Indexing and search
    Developing applications
    Future work
  • 3. DIVERSITY ENGINE
    Provides collections, annotation tools and an evaluation framework to allow for collaborative and comparable research
    Supports indexing and searching on a wide variety of document annotations, including entities, bias, trust, polarity, and multimedia features
    Supports development of bias- and diversity-aware applications
  • 4. ARCHITECTURE
    [Architecture diagram. Components: Document Collections (NYT, Yahoo! News, ARC Crawls); Analysis Pipeline; Index/Search; Application Development; Evaluation Framework]
  • 5. DESIGN DECISIONS
    Use open source tools when available
    Programming language: Java 1.6
    Data format: LK XML
    Operating system for analysis tools: Linux (tools may be written in any language)
    Indexing/search: Solr
    GUI: JSP, HTML, JavaScript, CSS
  • 6. LK-XML format.
  • 7. DOCUMENT COLLECTIONS
    Supported formats: ARC (Internet Memory crawls), Text, HTML, Kyoto, BBN, NYT
    Collections
        Testing examples included with the Diversity Engine
        Large ARCs available from Internet Memory
        Converters provided for other collections (MPQA, BBN, NYT) that have licensing restrictions
  • 8. ANALYSIS MODULES
  • 9. INDEXING/SEARCH
    Solr
        Enterprise search platform built on top of Lucene
        XML input and output allow for easy integration with the Diversity Engine
        Plug-in framework allows customization
        Built-in facet capabilities support indexing and searching on annotations
    Integration
        Converter from LK XML to Solr XML
        Plug-in for facet ranking and speed improvements
  • 10. APPLICATION DEVELOPMENT
    Basis for LivingKnowledge applications
        Future Predictor
        Media Content Analysis
    Supports development: coding required!
    Real-world problems
        HTML extraction
        Scaling to large collections
        Provenance
    Some pluggable GUI components
    Examples to ease the learning curve
  • 20. APPLICATION DEVELOPMENT
  • 21. APPLICATION DEVELOPMENT
  • 22. EVALUATION FRAMEWORK
    Framework for the evaluation of analysis tools
    Evaluates any possible annotation pipeline
    Measures correctness and quality
    Outputs precision and recall
    Compares the annotation output of a pipeline with ground-truth data
  • 27. OUR FIRST APPLICATION
    Download the Diversity Engine release from SourceForge
      tar xzvf [release file]
      cd testbed
      ant build
      apps/testbed conf/testbed/tutorial-application.xml
    What happened?
        197 text files and 127 image files were converted from ARC format to LK XML and stored in devapps/example/data/lkxml
        2 annotators were run over the collection
            OpenNLP for tokenization, sentence splitting and POS tags
            SST named entity recognizer
            Results stored in devapps/example/data/lkxml
        Files were converted to Solr XML format and indexed using Solr
            Solr XML stored in devapps/example/data/solr
        HTML visualization files stored in devapps/example/data/html
      ant deploy-testbed
    Solr running at http://localhost:8983/solr/
    Example app running at http://localhost:8983/testbed/
  • 28. EXAMPLE SOLR OUTPUT
    http://localhost:8983/solr/select/?q=putin
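
The query URL on this slide can also be assembled programmatically. The following is a minimal sketch in Python, assuming Solr is reachable at the tutorial address; the helper name `solr_select_url` is hypothetical, while `facet` and `facet.field` are standard Solr request parameters, and `per` and `loc` are the facet fields named in the tutorial configuration. It only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # address used in the tutorial


def solr_select_url(query, facet_fields=()):
    """Build a Solr select URL (illustrative helper, not part of the testbed)."""
    params = [("q", query)]
    if facet_fields:
        # Ask Solr to return facet counts for each requested field.
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    return SOLR_BASE + "/select/?" + urlencode(params)


print(solr_select_url("putin"))
print(solr_select_url("putin", ["per", "loc"]))
```

The first call reproduces the URL shown on the slide; the second adds facet counts for persons and locations to the same query.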
  • 29. EXAMPLE APPLICATION
    http://localhost:8983/testbed/results.jsp?query=putin
  • 30. EXAMPLE DOCUMENT
  • 31. CONFIGURATION FILE
      <lk-application logDir="log" appDir="devapps/example">
        <corpus dir="corpora/examples/smallarc" format="arc"/>
        <image-pipeline>
          <annotators>
          </annotators>
        </image-pipeline>
        <pipeline>
          <annotators>
            <annotator exec="./opennlp"/>
            <annotator exec="./sst"/>
          </annotators>
        </pipeline>
        <visualize/>
        <indexer solrHomeDir="solr/solr"
                 solrDataDir="solr/solr/data"
                 converter="conf/testbed/tutorial-lk2solr.xml"/>
        <searcher appTitle="LivingKnowledge - Example Application"
                  appShortTitle="Example Application"
                  appUrl="http://localhost:8983/solr/">
          <facets>
            <facet field="per" description="Person"/>
            <facet field="loc" description="Location"/>
          </facets>
        </searcher>
      </lk-application>
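
To make the structure of the configuration file concrete, here is a small Python sketch that reads a trimmed copy of it and lists the corpus, the annotators in pipeline order, and the search facets. This is not how the testbed itself loads the file; `summarize_config` is a hypothetical helper, and the embedded XML is an abridged version of the slide.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the tutorial configuration shown on the slide.
CONFIG = """\
<lk-application logDir="log" appDir="devapps/example">
  <corpus dir="corpora/examples/smallarc" format="arc"/>
  <pipeline>
    <annotators>
      <annotator exec="./opennlp"/>
      <annotator exec="./sst"/>
    </annotators>
  </pipeline>
  <searcher appTitle="LivingKnowledge - Example Application">
    <facets>
      <facet field="per" description="Person"/>
      <facet field="loc" description="Location"/>
    </facets>
  </searcher>
</lk-application>
"""


def summarize_config(xml_text):
    """Extract the main settings from an lk-application document."""
    root = ET.fromstring(xml_text)
    corpus = root.find("corpus")
    return {
        "app_dir": root.get("appDir"),
        "corpus": (corpus.get("dir"), corpus.get("format")),
        # Document order is the order in which annotators are listed.
        "annotators": [a.get("exec") for a in root.findall("pipeline/annotators/annotator")],
        "facets": {f.get("field"): f.get("description") for f in root.iter("facet")},
    }


summary = summarize_config(CONFIG)
print(summary)
```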
  • 32. TEXT ANALYSIS
      <pipeline>
        <annotators>
          <annotator exec="./opennlp"/>
          <annotator exec="./sst"/>
        </annotators>
      </pipeline>
      <pipeline>
        <annotators>
          <annotator exec="./opennlp"/>
          <annotator exec="./sst"/>
          <annotator exec="./facts"/>
          <annotator exec="./unitn_tagger"/>
          <annotator exec="./unitn_subjexpr"/>
        </annotators>
      </pipeline>
      apps/testbed –run pipeline conf/testbed/tutorial-application.xml
      apps/testbed –run visualization conf/testbed/tutorial-application.xml
  • 33. TEXT ANALYSIS - FACTS
    devapps/example/data/lkxml/EA-EUElections2009-euobserver-0729-20090729085530-00000.arc.15521713.facts.xml
  • 34. TEXT ANALYSIS - FACTS
    devapps/example/data/html/EA-EUElections2009-euobserver-0729-20090729085530-00000.arc.15521713.html
  • 35. IMAGE ANALYSIS
      <image-pipeline>
        <annotators>
          <annotator exec="./soton_haarfacedetector"/>
        </annotators>
      </image-pipeline>
      <pipeline>
        <annotators>
          <annotator exec="./opennlp"/>
          <annotator exec="./sst"/>
          <annotator exec="./facts"/>
          <annotator exec="./unitn_tagger"/>
          <annotator exec="./unitn_subjexpr"/>
          <annotator exec="./imageannots"/>
        </annotators>
      </pipeline>
      apps/testbed –run pipeline,image-pipeline –pipeline imageannots conf/testbed/tutorial-application.xml
      ls devapps/example/data/lkxml/img/*
  • 36. ANALYSIS API
    Documents are in LK XML format
    Annotators are passed a single document directory; they should add annotations for each document in the directory
    Files will have a consistent naming convention
        LkText file = id + ".lktext.xml"
        LkMedia file = id + ".lkmedia.xml"
        LkAnnotation file = id + "." + annotatorId + ".xml"
    Annotators will be processed sequentially in the order listed in the XML file
    Annotators can be written in any language but must run on Linux. Helper classes will exist for Java, but there is no obligation to use them.
    Add an application calling your new annotator to the apps directory
    Add your application to the configuration file as before
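
Since annotators may be written in any language, the directory contract and naming convention above can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, the file contents are placeholders (the LK XML schema is not shown in these slides), and a real annotator would parse the LkText file and write a valid LK annotation.

```python
import os
import tempfile

LKTEXT_SUFFIX = ".lktext.xml"  # LkText file = id + ".lktext.xml"


def annotation_path(directory, doc_id, annotator_id):
    """LkAnnotation file = id + "." + annotatorId + ".xml"."""
    return os.path.join(directory, doc_id + "." + annotator_id + ".xml")


def document_ids(directory):
    """Derive document ids from the LkText files present in the directory."""
    return sorted(
        name[: -len(LKTEXT_SUFFIX)]
        for name in os.listdir(directory)
        if name.endswith(LKTEXT_SUFFIX)
    )


def process_directory(directory, annotator_id, annotate):
    """Write one annotation file per document; `annotate` maps text to output."""
    written = []
    for doc_id in document_ids(directory):
        with open(os.path.join(directory, doc_id + LKTEXT_SUFFIX), encoding="utf-8") as f:
            text = f.read()
        out_path = annotation_path(directory, doc_id, annotator_id)
        with open(out_path, "w", encoding="utf-8") as f:
            f.write(annotate(text))
        written.append(os.path.basename(out_path))
    return written


# Demo on a temporary directory with two placeholder documents.
with tempfile.TemporaryDirectory() as tmp:
    for doc_id in ("doc1", "doc2"):
        with open(os.path.join(tmp, doc_id + LKTEXT_SUFFIX), "w", encoding="utf-8") as f:
            f.write("<placeholder/>")
    demo_written = process_directory(tmp, "myannotator", lambda text: "<placeholder-annotation/>")
    print(demo_written)
```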
  • 37. ANALYSIS API – JAVA
    Extend class org.diversityengine.annotator.AbstractAnnotator
    Implement methods
        getName()
        getType() - TEXT or IMAGE
    For image analysis, implement
        LkAnnotation getLkAnnotation(ImageDocument document)
    For text analysis, implement
        LkAnnotation getLkAnnotation(TextDocument document)
    In main, instantiate and call the annotator
        NewAnnotator annotator = new NewAnnotator();
        annotator.processDirectory(args[0]);
    Add an application calling your new annotator to the apps directory
    Add your application to the configuration file as before
  • 38. EVALUATION
    Evaluation works with the same configuration file. Simply add an evaluation element.
      <lk-application logDir="log" appDir="devapps/evaluation">
        <corpus dir="corpora/evaluation/sst/text/" format="bbn"/>
        <pipeline>
          <annotators>
            <annotator exec="./sst"/>
          </annotators>
        </pipeline>
        <evaluation evalDir="evaluation/sst/">
          <evaluator provides="ENTITIES"
                     goldDir="corpora/evaluation/sst/gold/"
                     goldAnnotator="sstgold"
                     annotator="sst" />
        </evaluation>
      </lk-application>
      apps/testbed conf/evaluation/sst.xml
  • 39. EVALUATION RESULTS
      <evaluation goldDir="/home/mikemat/code/livingknowledge/WP6/testbed/corpora/evaluation/sst/gold/"
                  lkDir="/home/mikemat/code/livingknowledge/WP6/testbed/devapps/evaluation/data/lkxml"
                  annotation="sst" goldAnnotation="sstgold" provides="ENTITIES">
        <docs>
          <doc id="WSJ0375" N="19" tp="18" fp="1" fn="1" />
          <doc id="WSJ0380" N="19" tp="15" fp="4" fn="1" />
          <doc id="WSJ0376" N="72" tp="61" fp="11" fn="7" />
          <doc id="WSJ0377" N="26" tp="17" fp="9" fn="6" />
          <doc id="WSJ0378" N="10" tp="10" fp="0" fn="0" />
          <doc id="WSJ0379" N="24" tp="19" fp="5" fn="2" />
        </docs>
        <totals N="170" tp="140" fp="30" fn="17" p="0.8235294117647058" r="0.89171974522293" f="0.8562691131498471" />
      </evaluation>
      cat evaluation/sst/sst.ENTITIES.xml
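
The totals row can be checked by hand from the per-document counts. The sketch below uses the standard definitions of precision, recall and F1, which agree with the p, r and f values in the output above; the function name is illustrative and the code is not taken from the evaluation framework.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall and F1 from true/false positive/negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# (tp, fp, fn) per document, copied from the evaluation output.
DOC_COUNTS = {
    "WSJ0375": (18, 1, 1),
    "WSJ0380": (15, 4, 1),
    "WSJ0376": (61, 11, 7),
    "WSJ0377": (17, 9, 6),
    "WSJ0378": (10, 0, 0),
    "WSJ0379": (19, 5, 2),
}

# Sum the counts column-wise to obtain the collection totals.
total_tp, total_fp, total_fn = (sum(column) for column in zip(*DOC_COUNTS.values()))
p, r, f = precision_recall_f1(total_tp, total_fp, total_fn)
print(total_tp, total_fp, total_fn)
print(p, r, f)
```

Note that the totals are micro-averaged: the counts are summed over documents first, and the scores are computed once from the sums.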
  • 40. INDEXING AND SEARCH
    Search engines: traditional
        Bag-of-words representation
        Inverted index (words -> documents) for efficiency
        10 docs ranked according to tf-idf similarity with the query
    Search engines: today
        Much metadata associated with documents
        Ranking based on 100s of features (date, location, PageRank, click data, personalization, etc.)
        Richer display
            Facets for exploratory search
            Answers when appropriate
            etc.
    Many open source options; Lucene/Solr is the most widely used
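
The traditional model on this slide (bag of words, inverted index, tf-idf ranking) fits in a few lines. The following Python sketch uses a made-up three-document collection and a deliberately simplified score, the sum of term frequency times log inverse document frequency over the query terms; it is not Lucene's scoring formula.

```python
import math
from collections import Counter, defaultdict

# Toy collection, invented for illustration.
DOCS = {
    "d1": "putin visits berlin",
    "d2": "elections in berlin",
    "d3": "putin comments on elections and putin replies",
}


def build_index(docs):
    """Inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query, k=10):
    """Rank documents by summed tf * idf over the query terms; return the top k."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


index = build_index(DOCS)
putin_ranking = [doc_id for doc_id, _ in search(index, len(DOCS), "putin")]
print(putin_ranking)
```

Only the postings of the query terms are visited, which is the efficiency argument for the inverted index: documents that contain none of the terms are never touched.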
  • 41. APACHE LUCENE/SOLR
    Lucene/Solr
  • 42. FACETED SEARCH
    Diagram by Yonik Seeley
  • 43. FACETED SEARCH
    Summarize query results by aggregating properties of the returned pages
        Price ranges for a product query
        Related people or locations for a news query
    Exploratory search
        Show documents that match the query term and a selected facet
        Make inferences that are not clear from a simple document list
    LivingKnowledge analysis is modeled very well by facets
        Topics as determined by entity and fact extraction
        Location and time diversity dimensions
        Opinions as determined by opinion extraction
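
The two ideas above, aggregating a property over the result set and then drilling down on a selected facet value, can be sketched as follows. The documents and their `per` and `loc` values are invented for illustration, and the in-memory matching stands in for what Solr does inside the index.

```python
from collections import Counter

# Toy annotated documents, invented for illustration.
DOCS = [
    {"id": "d1", "text": "putin visits berlin", "per": ["Putin"], "loc": ["Berlin"]},
    {"id": "d2", "text": "putin meets merkel in berlin", "per": ["Putin", "Merkel"], "loc": ["Berlin"]},
    {"id": "d3", "text": "putin speaks in moscow", "per": ["Putin"], "loc": ["Moscow"]},
    {"id": "d4", "text": "elections in berlin", "per": [], "loc": ["Berlin"]},
]


def search(docs, term, selected=None):
    """Return docs containing the term and every selected facet value."""
    selected = selected or {}
    hits = []
    for doc in docs:
        if term not in doc["text"].split():
            continue
        if all(value in doc[field] for field, value in selected.items()):
            hits.append(doc)
    return hits


def facet_counts(hits, field):
    """Aggregate one annotation field over the result set."""
    return Counter(value for doc in hits for value in doc[field])


hits = search(DOCS, "putin")
loc_counts = facet_counts(hits, "loc")
per_counts = facet_counts(hits, "per")
drill_down = [doc["id"] for doc in search(DOCS, "putin", {"loc": "Berlin"})]
print(loc_counts, per_counts, drill_down)
```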
  • 53. LK XML TO SOLR
    Solr has a well-defined XML input format for adding new documents
    Diversity Engine provides a simple language to map LK XML to Solr XML
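
For readers who have not seen Solr's XML update format, the sketch below produces a minimal `<add><doc>` message in Python. It is not the Diversity Engine converter: `to_solr_add_xml` is a hypothetical helper, the field values are invented, and the unique key field name `id` is an assumption that depends on the Solr schema in use.

```python
import xml.etree.ElementTree as ET


def to_solr_add_xml(doc_id, fields):
    """Serialize one document as a Solr XML add message.

    `fields` maps a Solr field name to a list of values; multi-valued
    fields are written as repeated <field> elements.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id  # assumed unique key
    for name, values in fields.items():
        for value in values:
            ET.SubElement(doc, "field", name=name).text = value
    return ET.tostring(add, encoding="unicode")


xml_text = to_solr_add_xml("doc1", {"per": ["Putin"], "loc": ["Moscow", "Berlin"]})
print(xml_text)
```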
  • 55. LK2SOLR CONVERSION
      <indexer solrHomeDir="solr/solr"
               solrDataDir="solr/solr/data"
               converter="conf/testbed/tutorial-lk2solr.xml"/>
      <lktosolr>
        <field solr="per" annotation="ENTITIES_CLEAN" value="$text"
               filter="org.diversityengine.solr.converter.filters.PerValueFilter"/>
        <field solr="loc" annotation="ENTITIES_CLEAN" value="$text"
               filter="org.diversityengine.solr.converter.filters.LocValueFilter"/>
        <field solr="keywords" annotation="TOP_ENTITIES" value="$text" />
        <field solr="pubdate" annotation="metainfo:lktext" value="date"
               type="date"/>
      </lktosolr>
    solr: name of the field in Solr
    annotation: name of the LK XML annotation
    value: value of the annotation
    filter: allows post-processing of the annotation
    type: only date is currently supported
  • 56. ADDING FACTS TO INDEX
      <lktosolr>
        <field solr="per" annotation="ENTITIES_CLEAN" value="$text"
               filter="org.diversityengine.solr.converter.filters.PerValueFilter"/>
        <field solr="loc" annotation="ENTITIES_CLEAN" value="$text"
               filter="org.diversityengine.solr.converter.filters.LocValueFilter"/>
        <field solr="keywords" annotation="TOP_ENTITIES" value="$text" />
        <field solr="pubdate" annotation="metainfo:lktext" value="date"
               type="date"/>
        <field solr="yago" annotation="yago-entities" value="$text" />
        <field solr="yago-country" annotation="facts"
               value="xpath:/entity-information[facts/type/text()='wordnet_country_108544813']/id/text()" />
      </lktosolr>
      apps/testbed –run convert-solr conf/testbed/tutorial-application.xml
      ls devapps/example/data/solr/*
      apps/testbed –run index conf/testbed/tutorial-application.xml
  • 57. FACTS TO SOLR
      <field solr="yago" annotation="yago-entities" value="$text" />
  • 58. FACTS TO SOLR
      <field solr="yago-country" annotation="facts"
             value="xpath:/entity-information[facts/type/text()='wordnet_country_108544813']/id/text()" />
  • 59. ADDING IMAGES TO INDEX
      <lktosolr>
        <field solr="per" annotation="ENTITIES_CLEAN" value="$text"
               filter="org.diversityengine.solr.converter.filters.PerValueFilter"/>
        <field solr="loc" annotation="ENTITIES_CLEAN" value="$text"
               filter="org.diversityengine.solr.converter.filters.LocValueFilter"/>
        <field solr="keywords" annotation="TOP_ENTITIES" value="$text" />
        <field solr="yago" annotation="yago-entities" value="$text" />
        <field solr="yago-country" annotation="facts"
               value="xpath:/entity-information[facts/type/text()='wordnet_country_108544813']/id/text()" />
        <field solr="pubdate" annotation="metainfo:lktext" value="date"
               type="date"/>
        <field solr="image" annotation="IMAGE_ANNOTS" value="$text" />
        <field solr="bestimage" annotation="BEST_IMAGES" value="$text" />
      </lktosolr>
      apps/testbed –run convert-solr conf/testbed/tutorial-application.xml
      ls devapps/example/data/solr/*
      apps/testbed –run index conf/testbed/tutorial-application.xml
  • 60. APPLICATION DEVELOPMENT
    Examples
    HTML extraction
    Scaling to large collections
    Provenance
    Some pluggable GUI components
  • 61. FACT/IMAGE APPLICATION
      <searcher appTitle="LivingKnowledge - Example Application"
                appShortTitle="Example Application"
                appUrl="http://localhost:8983/solr/">
        <facets>
          <facet field="yago" description="Yago"/>
          <facet field="yago-country" description="Country"/>
          <facet field="per" description="Person"/>
          <facet field="loc" description="Location"/>
          <facet field="image" description="Images"/>
        </facets>
      </searcher>
      ant deploy-testbed
  • 62. FACT/IMAGE APPLICATION
    http://localhost:8983/testbed/results.jsp?query=putin
  • 63. OPINION APPLICATION
    Opinions are at sentence level, not document level: same analysis, but different indexing
      cat conf/testbed/tutorial-lk2solr-sentence.xml
      <lktosolr solrDoc="SENTENCES" contextSize="1">
        <field solr="per" annotation="ENTITIES_CLEAN" value="$text"
               filter="org.diversityengine.solr.converter.filters.PerValueFilter"
               source="solrdoc" />
        <field solr="loc" annotation="ENTITIES_CLEAN" value="$text"
               filter="org.diversityengine.solr.converter.filters.LocValueFilter"
               source="solrdoc" />
        <field solr="keywords" annotation="TOP_ENTITIES" value="$text" />
        <field solr="yago" annotation="yago-entities" value="$text"
               source="solrdoc" />
        <field solr="image" annotation="IMAGE_ANNOTS" value="$text" />
        <field solr="bestimage" annotation="BEST_IMAGES" value="$text" />
        <field solr="pubdate" annotation="metainfo:lktext" value="date"
               type="date"/>
        <field solr="polarity"
               annotation="MPQA-expressive-subjectivity,MPQA-direct-subjective"
               value="xpath:/node()[@pol]/@pol" source="solrdoc"
               filter="org.diversityengine.solr.converter.filters.PolarityValueFilter"/>
        <field solr="pol-int"
               annotation="MPQA-expressive-subjectivity,MPQA-direct-subjective"
               value="xpath:concat(/node()[@pol and @int]/@pol,/node()[@int and @pol]/@int)"
               source="solrdoc"/>
      </lktosolr>
      apps/testbed –run convert-solr,index conf/testbed/tutorial-application-sentence.xml
      ls devapps/example/data/solr/*
  • 64. SOLR XML – SENTENCE
  • 65. OPINION APPLICATION
    Modify webapp/WEB-INF/web.xml
      <web-app xmlns="http://java.sun.com/xml/ns/javaee"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd"
               version="2.5">
        <description>
          LivingKnowledge Testbed Example Application
        </description>
        <display-name>Testbed Examples</display-name>
        <context-param>
          <param-name>applicationDef</param-name>
          <param-value>conf/testbed/tutorial-application-sentence.xml</param-value>
          <description>The Living Knowledge application description XML file</description>
        </context-param>
      </web-app>
      ant deploy-testbed
  • 66. OPINION APPLICATION
    http://localhost:8983/testbed/results.jsp?query=putin
  • 67. HTML EXTRACTION
  • 68. HTML EXTRACTION
    Boilerplate can lead to false-positive results and inaccurate facet aggregation
    Real example: before extraction was developed, the most common person for most queries was one named in a top-story title that appeared on every page on the day of the crawl!
    Titles, authors and dates are important for bias- and diversity-aware search
  • 69. PROVENANCE
    How an annotation is derived is often as important as the annotation itself
        Users want to verify results
        Developers need to validate results
    Open Provenance provides an open source solution
    Testbed annotations can be extended with Open Provenance chains
  • 70. Provenance Diagram
  • 71. SCALING TO LARGE COLLECTIONS
    In the real world, even "small" datasets have millions of documents
    NLP and image processing are expensive: at 1 doc/sec, 1 million docs take over 11 days!
    Hadoop Mapper allows for scaling: throughput scales linearly with the number of machines
    ZipCollection writer allows partitioning data into subsets for processing
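
The back-of-the-envelope figure on this slide is easy to reproduce. The Python sketch below assumes the idealized linear scaling stated above (no coordination or I/O overhead); the function name and the machine counts in the loop are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def processing_days(n_docs, docs_per_second_per_machine=1.0, machines=1):
    """Days needed to process n_docs, assuming perfectly linear scaling."""
    seconds = n_docs / (docs_per_second_per_machine * machines)
    return seconds / SECONDS_PER_DAY


for machines in (1, 10, 100):
    print(machines, round(processing_days(1_000_000, machines=machines), 2))
```

One million documents at one document per second is 1,000,000 / 86,400 seconds per day, or roughly 11.6 days on a single machine, and about a tenth of that on ten machines under the same assumption.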
  • 72. COMPONENTS - OPINIONS
  • 73. COMPONENTS - TIME
  • 74. COMPONENTS - GEO
  • 75. FUTURE WORK
    More components
    Maven to manage dependencies
    Better integration of Timeline and Geo visualization components
    Integration of ranking algorithms
    Better documentation
  • 76. Thanks!
    LivingKnowledge partners!
    You for coming!!
    Questions?
