SlideShare a Scribd company logo
Lemur Toolkit Tutorial
Introductions ,[object Object],[object Object]
Installation ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Overview ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Overview (part 2) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Overview ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Language Modeling for IR Estimate a multinomial  probability distribution  from the text Smooth the distribution with one estimated from  the entire collection P(w|  D ) = (1-  ) P(w|D)+    P(w|C)
[object Object],Query Likelihood ? P(Q|  D ) =    P(q|  D )
Kullback-Leibler Divergence ,[object Object],= ? KL(  Q |  D ) =    P(w|  Q ) log(P(w|  Q ) / P(w|  D ))
Inference Networks ,[object Object],q1 q2 qn q3 I d1 d2 d3 di
Summary of Ranking ,[object Object],[object Object],[object Object],[object Object],[object Object]
Other Techniques ,[object Object],[object Object],[object Object],[object Object],[object Object]
Overview ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Indexing ,[object Object],[object Object],[object Object]
Two Index Formats ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Indexing – Document Preparation ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],(*) Note: Microsoft Word and Microsoft PowerPoint can only be indexed on a Windows-based machine, and Office must be installed.
Indexing – Document Preparation ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Indexing - Parameters ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Indexing – Parameters ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Indexing – Parameters ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Indexing - Parameters ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Indexing - Parameters ,[object Object],[object Object],[object Object],[object Object],[object Object]
Indexing - Parameters ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Indexing – Parameters ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Indexing anchor text ,[object Object],[object Object]
Retrieval ,[object Object],[object Object],[object Object]
Retrieval - Parameters ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Retrieval - Parameters ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Retrieval - Parameters ,[object Object],[object Object],[object Object],[object Object]
Retrieval - Parameters ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Retrieval - Parameters ,[object Object],[object Object],[object Object]
Retrieval – Query Formatting ,[object Object],[object Object],[object Object],[object Object]
Retrieval - Parameters ,[object Object],[object Object],[object Object],[object Object],[object Object]
Retrieval - Parameters ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Retrieval – Parameters ,[object Object],[object Object],[object Object],[object Object]
Retrieval - Parameters ,[object Object],[object Object]
Retrieval – Parameters ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Outputting INEX Result Format ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Retrieval – Interpreting Results ,[object Object],[object Object],[object Object],[object Object],[object Object]
Retrieval – Interpreting Results ,[object Object],[object Object],[object Object],[object Object]
Retrieval - Evaluation ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Smoothing ,[object Object],[object Object],[object Object]
Use RetEval for TF.IDF ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Use RetEval for TF.IDF ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Overview ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Indri Query Language ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Term Operations name example behavior term dog occurrences of  dog   (Indri will stem and stop) “ term” “ dog” occurrences of  dog   (Indri will not stem or stop) ordered window #od n (blue car) blue   n  words or less before  car unordered window #ud n (blue car) blue  within  n  words of  car synonym list #syn(car automobile) occurrences of  car  or  automobile weighted synonym #wsyn(1.0 car 0.5 automobile) like synonym, but only counts occurrences of  automobile  as 0.5 of an occurrence any operator #any:person all occurrences of  the  person  field
Field Restriction/Evaluation name example behavior restriction dog.title counts only occurrences of  dog  in  title  field dog.title,header counts occurrences of  dog   in  title  or  header evaluation dog.(title) builds belief  b (dog)  using  title  language model dog.(title,header) b (dog)  estimated using language model from concatenation of  all  title  and  header  fields #od1(trevor strohman).person(title) builds a model from all  title  text for b (#od1(trevor strohman).person) - only counts “ trevor strohman ” occurrences in  person  fields
Numeric Operators name example behavior less #less(year 2000) occurrences of  year  field  < 2000 greater #greater(year 2000) year  field  > 2000 between #between(year 1990 2000) 1990 < year  field  < 2000 equals #equals(year 2000) year  field  = 2000
Belief Operations name example behavior combine #combine(dog train) 0.5 log(  b (dog)  ) +  0.5 log(  b (train)  ) weight, wand #weight(1.0 dog 0.5 train) 0.67 log(  b (dog)  ) + 0.33 log(  b (train)  ) wsum #wsum(1.0 dog  0.5 dog.(title)) log( 0.67  b (dog)  + 0.33  b (dog.(title))  ) not #not(dog) log( 1 -  b (dog)  ) max #max(dog train) returns maximum of  b (dog)  and  b (train) or #or(dog cat) log(1 - (1 -  b (dog) ) *  (1 -  b (cat) ))
Field/Passage Retrieval name example behavior field retrieval #combine[title]( query ) return only title fields ranked according to  #combine(query) - beliefs are estimated on each  title ’s language model  -may use any belief node passage retrieval #combine[passage200:100]( query ) dynamically created passages of length  200  created every  100  words are ranked by  #combine(query)
More Field/Passage Retrieval ,[object Object],[object Object],example behavior #combine[section]( bootstrap  #combine[./title]( methodology )) Rank sections matching  bootstrap  where the section’s  title  also matches  methodology
Filter Operations name example behavior filter require #filreq(elvis #combine(blue shoes)) rank documents that  contain  elvis  by  #combine(blue shoes) filter reject #filrej(shopping #combine(blue shoes)) rank documents that do not contain  shopping  by #combine(blue shoes)
Document Priors ,[object Object],name example behavior prior #combine(#prior(RECENT) global warming) ,[object Object],[object Object]
Ad Hoc Retrieval ,[object Object],[object Object],[object Object]
Query Expansion ,[object Object],[object Object]
Known Entity Search ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Overview ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Overview (part 2) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Indexing Your Data ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Indexing Text Corpora ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Indexing: Offset Annotation ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],docno type id name start length value parent
Indexing Text Corpora ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Indexing Text Corpora ,[object Object],[object Object],[object Object],[object Object],[object Object]
Overview (part 2) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
ParsedDocument struct ParsedDocument { const char* text; size_t textLength; indri::utility::greedy_vector<char*> terms; indri::utility::greedy_vector<indri::parse::TagExtent*> tags; indri::utility::greedy_vector<indri::parse::TermExtent> positions; indri::utility::greedy_vector<indri::parse::MetadataPair> metadata; };
ParsedDocument: Text ,[object Object],[object Object],[object Object],[object Object]
ParsedDocument: Content ,[object Object],[object Object],[object Object],[object Object],[object Object]
ParsedDocument: Terms ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
ParsedDocument: Terms ,[object Object],[object Object],[object Object],[object Object]
ParsedDocument: Tags ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
ParsedDocument: Tags ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
ParsedDocument: Tags ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
ParsedDocument: Tags ,[object Object],[object Object],<doc> <par> <sent> My dog still has fleas. </sent>   <sent> My cat does not have fleas. </sent> </par> </doc>
ParsedDocument: Tags ,[object Object],[object Object],[object Object],[object Object]
ParsedDocument: Tags ,[object Object],[object Object],[object Object],[object Object]
ParsedDocument: Metadata ,[object Object],[object Object],[object Object],[object Object],greedy_vector<indri::parse::MetadataPair> metadata
Overview (part 2) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Tag Conflation ,[object Object],[object Object],[object Object],[object Object]
Indexing Fields ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Indexing Fields ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Overview (part 2) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
dumpindex ,[object Object],[object Object],[object Object],[object Object]
dumpindex ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],word  term_count doc_count
dumpindex ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],document, count, positions term, stem, count, total_count
dumpindex ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
dumpindex ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],document, weight, begin, end
Overview (part 2) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Introducing the API ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Indri: IndexEnvironment ,[object Object],[object Object],[object Object],[object Object],[object Object]
Indri: IndexEnvironment ,[object Object],[object Object],[object Object],[object Object],[object Object]
Indri: QueryEnvironment ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
QueryEnvrionment: Opening ,[object Object],[object Object],[object Object],[object Object],[object Object]
QueryEnvironment: Running ,[object Object],[object Object],[object Object]
QueryEnvironment: Retrieving ,[object Object],[object Object],[object Object],[object Object],[object Object]
Lemur “Classic” API ,[object Object],[object Object],[object Object],[object Object]
Lemur Index Browsing ,[object Object],[object Object],[object Object],[object Object],[object Object]
Lemur Index Browsing ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Lemur Index Browsing ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Lemur Index Browsing ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Lemur Index Browsing ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Lemur: DocInfoList ,[object Object],[object Object],[object Object],[object Object],[object Object]
Lemur: TermInfoList ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Lemur Retrieval Class Name Description TFIDFRetMethod BM25 SimpleKLRetMethod KL-Divergence InQueryRetMethod Simplified InQuery CosSimRetMethod Cosine CORIRetMethod CORI OkapiRetMethod Okapi IndriRetMethod Indri (wraps QueryEnvironment)
Lemur Retrieval ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Lemur: Other tasks ,[object Object],[object Object],[object Object]
Getting Help ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Concluding: In Review ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Concluding: In Review ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Questions Ask us questions! What is the best way to do  x ? How do I get started with my particular task? Does the toolkit have the  x  feature? How can I modify the toolkit to do  x ? When do we get coffee?

More Related Content

What's hot

Search explained T3DD15
Search explained T3DD15Search explained T3DD15
Search explained T3DD15
Hans Höchtl
 
Extracting article text from the web with maximum subsequence segmentation
Extracting article text from the web with maximum subsequence segmentationExtracting article text from the web with maximum subsequence segmentation
Extracting article text from the web with maximum subsequence segmentationJhih-Ming Chen
 
Python cheat-sheet
Python cheat-sheetPython cheat-sheet
Python cheat-sheet
srinivasanr281952
 
11 Unit 1 Chapter 02 Python Fundamentals
11  Unit 1 Chapter 02 Python Fundamentals11  Unit 1 Chapter 02 Python Fundamentals
11 Unit 1 Chapter 02 Python Fundamentals
Praveen M Jigajinni
 
Ryan hamahashi 10x10 presentation
Ryan hamahashi 10x10 presentationRyan hamahashi 10x10 presentation
Ryan hamahashi 10x10 presentationronin124
 
Python interview questions
Python interview questionsPython interview questions
Python interview questions
Pragati Singh
 
An approach to source code plagiarism
An approach to source code plagiarismAn approach to source code plagiarism
An approach to source code plagiarism
varsha_bhat
 
NetBase API Presentation
NetBase API PresentationNetBase API Presentation
NetBase API Presentation
Netbase Solutions Inc.
 
Open nlp presentationss
Open nlp presentationssOpen nlp presentationss
Open nlp presentationss
Chandan Deb
 
NLP and LSA getting started
NLP and LSA getting startedNLP and LSA getting started
NLP and LSA getting started
Innovation Engineering
 
Java platform
Java platformJava platform
Java platform
Visithan
 
Python interview questions and answers
Python interview questions and answersPython interview questions and answers
Python interview questions and answers
RojaPriya
 
Fundamentals of Python Programming
Fundamentals of Python ProgrammingFundamentals of Python Programming
Fundamentals of Python Programming
Kamal Acharya
 
ImplementingChangeTrackingAndFlagging
ImplementingChangeTrackingAndFlaggingImplementingChangeTrackingAndFlagging
ImplementingChangeTrackingAndFlaggingSuite Solutions
 
What is Python?
What is Python?What is Python?
What is Python?
wesley chun
 
Presentation of OpenNLP
Presentation of OpenNLPPresentation of OpenNLP
Presentation of OpenNLPRobert Viseur
 
Introduction to programming with python
Introduction to programming with pythonIntroduction to programming with python
Introduction to programming with python
Porimol Chandro
 

What's hot (20)

Search explained T3DD15
Search explained T3DD15Search explained T3DD15
Search explained T3DD15
 
Extracting article text from the web with maximum subsequence segmentation
Extracting article text from the web with maximum subsequence segmentationExtracting article text from the web with maximum subsequence segmentation
Extracting article text from the web with maximum subsequence segmentation
 
Python cheat-sheet
Python cheat-sheetPython cheat-sheet
Python cheat-sheet
 
11 Unit 1 Chapter 02 Python Fundamentals
11  Unit 1 Chapter 02 Python Fundamentals11  Unit 1 Chapter 02 Python Fundamentals
11 Unit 1 Chapter 02 Python Fundamentals
 
Ryan hamahashi 10x10 presentation
Ryan hamahashi 10x10 presentationRyan hamahashi 10x10 presentation
Ryan hamahashi 10x10 presentation
 
Python interview questions
Python interview questionsPython interview questions
Python interview questions
 
An approach to source code plagiarism
An approach to source code plagiarismAn approach to source code plagiarism
An approach to source code plagiarism
 
NetBase API Presentation
NetBase API PresentationNetBase API Presentation
NetBase API Presentation
 
Open nlp presentationss
Open nlp presentationssOpen nlp presentationss
Open nlp presentationss
 
NLP and LSA getting started
NLP and LSA getting startedNLP and LSA getting started
NLP and LSA getting started
 
AdvancedXPath
AdvancedXPathAdvancedXPath
AdvancedXPath
 
Java platform
Java platformJava platform
Java platform
 
Python interview questions and answers
Python interview questions and answersPython interview questions and answers
Python interview questions and answers
 
Python Tutorial Part 1
Python Tutorial Part 1Python Tutorial Part 1
Python Tutorial Part 1
 
Fundamentals of Python Programming
Fundamentals of Python ProgrammingFundamentals of Python Programming
Fundamentals of Python Programming
 
ImplementingChangeTrackingAndFlagging
ImplementingChangeTrackingAndFlaggingImplementingChangeTrackingAndFlagging
ImplementingChangeTrackingAndFlagging
 
What is Python?
What is Python?What is Python?
What is Python?
 
Web search engines
Web search enginesWeb search engines
Web search engines
 
Presentation of OpenNLP
Presentation of OpenNLPPresentation of OpenNLP
Presentation of OpenNLP
 
Introduction to programming with python
Introduction to programming with pythonIntroduction to programming with python
Introduction to programming with python
 

Viewers also liked

Info 2402 irt-chapter_2
Info 2402 irt-chapter_2Info 2402 irt-chapter_2
Info 2402 irt-chapter_2
Shahriar Rafee
 
Web search engines ( Mr.Mirza )
Web search engines ( Mr.Mirza )Web search engines ( Mr.Mirza )
Web search engines ( Mr.Mirza )
Ali Saif Mirza
 
120521 agile contracts 2.1
120521 agile contracts 2.1120521 agile contracts 2.1
120521 agile contracts 2.1
Proyectalis / Improvement21
 
Harvard Business School Presentation "Outsourcing Selling"
Harvard Business School Presentation "Outsourcing Selling"Harvard Business School Presentation "Outsourcing Selling"
Harvard Business School Presentation "Outsourcing Selling"
Charles Cohon
 
Agile Contracts by Drew Jemilo (Agile2015)
Agile Contracts by Drew Jemilo (Agile2015)Agile Contracts by Drew Jemilo (Agile2015)
Agile Contracts by Drew Jemilo (Agile2015)
Drew Jemilo
 
Outsourcing ppt
Outsourcing pptOutsourcing ppt
Outsourcing ppt
Aparna Ramesh
 
Open Source Search Tools for www2010 conferencesourcesearchtoolswww20100426dA...
Open Source Search Tools for www2010 conferencesourcesearchtoolswww20100426dA...Open Source Search Tools for www2010 conferencesourcesearchtoolswww20100426dA...
Open Source Search Tools for www2010 conferencesourcesearchtoolswww20100426dA...
Ted Drake
 
IBM Watson Technical Deep Dive Swiss Group for Artificial Intelligence and Co...
IBM Watson Technical Deep Dive Swiss Group for Artificial Intelligence and Co...IBM Watson Technical Deep Dive Swiss Group for Artificial Intelligence and Co...
IBM Watson Technical Deep Dive Swiss Group for Artificial Intelligence and Co...
Romeo Kienzler
 
Design Thinking Method Cards (Beta 1.0)
Design Thinking Method Cards (Beta 1.0)Design Thinking Method Cards (Beta 1.0)
Design Thinking Method Cards (Beta 1.0)
Boris Friedrich Milkowski
 
Design Thinking Method Sticker 2014
Design Thinking Method Sticker 2014Design Thinking Method Sticker 2014
Design Thinking Method Sticker 2014
Design Thinking HSG
 
Scope of electronics and communication engineering.ppt
Scope of electronics and communication engineering.pptScope of electronics and communication engineering.ppt
Scope of electronics and communication engineering.ppt
Rajesh Kumar
 
Electronics past,present and future
Electronics past,present and futureElectronics past,present and future
Electronics past,present and futureRajat Dhiman
 
How To Study Effectively
How To Study EffectivelyHow To Study Effectively
How To Study Effectively
Simmons Marcus
 
Design Thinking: The one thing that will transform the way you think
Design Thinking: The one thing that will transform the way you thinkDesign Thinking: The one thing that will transform the way you think
Design Thinking: The one thing that will transform the way you think
Digital Surgeons
 
Kickstarting Design Thinking
Kickstarting Design ThinkingKickstarting Design Thinking
Kickstarting Design Thinking
Erin 'Folletto' Casali
 
IDEO's design thinking.
IDEO's design thinking. IDEO's design thinking.
IDEO's design thinking.
BeeCanvas
 

Viewers also liked (20)

Info 2402 irt-chapter_2
Info 2402 irt-chapter_2Info 2402 irt-chapter_2
Info 2402 irt-chapter_2
 
Chap1
Chap1Chap1
Chap1
 
Reddit
RedditReddit
Reddit
 
Web search engines ( Mr.Mirza )
Web search engines ( Mr.Mirza )Web search engines ( Mr.Mirza )
Web search engines ( Mr.Mirza )
 
120521 agile contracts 2.1
120521 agile contracts 2.1120521 agile contracts 2.1
120521 agile contracts 2.1
 
Harvard Business School Presentation "Outsourcing Selling"
Harvard Business School Presentation "Outsourcing Selling"Harvard Business School Presentation "Outsourcing Selling"
Harvard Business School Presentation "Outsourcing Selling"
 
Agile Contracts by Drew Jemilo (Agile2015)
Agile Contracts by Drew Jemilo (Agile2015)Agile Contracts by Drew Jemilo (Agile2015)
Agile Contracts by Drew Jemilo (Agile2015)
 
Outsourcing ppt
Outsourcing pptOutsourcing ppt
Outsourcing ppt
 
Open Source Search Tools for www2010 conferencesourcesearchtoolswww20100426dA...
Open Source Search Tools for www2010 conferencesourcesearchtoolswww20100426dA...Open Source Search Tools for www2010 conferencesourcesearchtoolswww20100426dA...
Open Source Search Tools for www2010 conferencesourcesearchtoolswww20100426dA...
 
IBM Watson Technical Deep Dive Swiss Group for Artificial Intelligence and Co...
IBM Watson Technical Deep Dive Swiss Group for Artificial Intelligence and Co...IBM Watson Technical Deep Dive Swiss Group for Artificial Intelligence and Co...
IBM Watson Technical Deep Dive Swiss Group for Artificial Intelligence and Co...
 
Outsourcing
OutsourcingOutsourcing
Outsourcing
 
Design Thinking Method Cards (Beta 1.0)
Design Thinking Method Cards (Beta 1.0)Design Thinking Method Cards (Beta 1.0)
Design Thinking Method Cards (Beta 1.0)
 
Design Thinking Method Sticker 2014
Design Thinking Method Sticker 2014Design Thinking Method Sticker 2014
Design Thinking Method Sticker 2014
 
Scope of electronics and communication engineering.ppt
Scope of electronics and communication engineering.pptScope of electronics and communication engineering.ppt
Scope of electronics and communication engineering.ppt
 
Electronics past,present and future
Electronics past,present and futureElectronics past,present and future
Electronics past,present and future
 
How To Study Effectively
How To Study EffectivelyHow To Study Effectively
How To Study Effectively
 
Design Thinking Method Cards
Design Thinking Method CardsDesign Thinking Method Cards
Design Thinking Method Cards
 
Design Thinking: The one thing that will transform the way you think
Design Thinking: The one thing that will transform the way you thinkDesign Thinking: The one thing that will transform the way you think
Design Thinking: The one thing that will transform the way you think
 
Kickstarting Design Thinking
Kickstarting Design ThinkingKickstarting Design Thinking
Kickstarting Design Thinking
 
IDEO's design thinking.
IDEO's design thinking. IDEO's design thinking.
IDEO's design thinking.
 

Similar to Lemur Tutorial at SIGIR 2006

Presentation log4 j
Presentation log4 jPresentation log4 j
Presentation log4 j
Sylvain Bouchard
 
Presentation log4 j
Presentation log4 jPresentation log4 j
Presentation log4 j
Sylvain Bouchard
 
Dan Holevoet, Google
Dan Holevoet, GoogleDan Holevoet, Google
Dan Holevoet, Google
500 Startups
 
Processing XML with Java
Processing XML with JavaProcessing XML with Java
Processing XML with Java
BG Java EE Course
 
Pmm05 16
Pmm05 16Pmm05 16
Pmm05 16
Rohit Luthra
 
Developing web apps using Erlang-Web
Developing web apps using Erlang-WebDeveloping web apps using Erlang-Web
Developing web apps using Erlang-Web
fanqstefan
 
Phing - A PHP Build Tool (An Introduction)
Phing - A PHP Build Tool (An Introduction)Phing - A PHP Build Tool (An Introduction)
Phing - A PHP Build Tool (An Introduction)
Michiel Rook
 
Html
HtmlHtml
Introduction to PHP
Introduction to PHPIntroduction to PHP
Introduction to PHP
Kengatharaiyer Sarveswaran
 
Struts2
Struts2Struts2
C:\Users\User\Desktop\Eclipse Infocenter
C:\Users\User\Desktop\Eclipse InfocenterC:\Users\User\Desktop\Eclipse Infocenter
C:\Users\User\Desktop\Eclipse InfocenterSuite Solutions
 
Language-agnostic data analysis workflows and reproducible research
Language-agnostic data analysis workflows and reproducible researchLanguage-agnostic data analysis workflows and reproducible research
Language-agnostic data analysis workflows and reproducible research
Andrew Lowe
 
Methods and Best Practices for High Performance eCommerce
Methods and Best Practices for High Performance eCommerceMethods and Best Practices for High Performance eCommerce
Methods and Best Practices for High Performance eCommerce
dmitriysoroka
 
Code Generation using T4
Code Generation using T4Code Generation using T4
Code Generation using T4
Joubin Najmaie
 
Sumo Logic "How to" Webinar: Advanced Analytics
Sumo Logic "How to" Webinar: Advanced AnalyticsSumo Logic "How to" Webinar: Advanced Analytics
Sumo Logic "How to" Webinar: Advanced Analytics
Sumo Logic
 
Back-2-Basics: Exception & Event Instrumentation in .NET
Back-2-Basics: Exception & Event Instrumentation in .NETBack-2-Basics: Exception & Event Instrumentation in .NET
Back-2-Basics: Exception & Event Instrumentation in .NET
David McCarter
 
Back-2-Basics: Exception & Event Instrumentation in .NET
Back-2-Basics: Exception & Event Instrumentation in .NETBack-2-Basics: Exception & Event Instrumentation in .NET
Back-2-Basics: Exception & Event Instrumentation in .NET
David McCarter
 

Similar to Lemur Tutorial at SIGIR 2006 (20)

PDF Localization
PDF  LocalizationPDF  Localization
PDF Localization
 
Presentation log4 j
Presentation log4 jPresentation log4 j
Presentation log4 j
 
Presentation log4 j
Presentation log4 jPresentation log4 j
Presentation log4 j
 
Dan Holevoet, Google
Dan Holevoet, GoogleDan Holevoet, Google
Dan Holevoet, Google
 
Tags in html
Tags in htmlTags in html
Tags in html
 
Processing XML with Java
Processing XML with JavaProcessing XML with Java
Processing XML with Java
 
Pmm05 16
Pmm05 16Pmm05 16
Pmm05 16
 
Developing web apps using Erlang-Web
Developing web apps using Erlang-WebDeveloping web apps using Erlang-Web
Developing web apps using Erlang-Web
 
Phing - A PHP Build Tool (An Introduction)
Phing - A PHP Build Tool (An Introduction)Phing - A PHP Build Tool (An Introduction)
Phing - A PHP Build Tool (An Introduction)
 
Html
HtmlHtml
Html
 
Introduction to PHP
Introduction to PHPIntroduction to PHP
Introduction to PHP
 
Struts2
Struts2Struts2
Struts2
 
C:\Users\User\Desktop\Eclipse Infocenter
C:\Users\User\Desktop\Eclipse InfocenterC:\Users\User\Desktop\Eclipse Infocenter
C:\Users\User\Desktop\Eclipse Infocenter
 
Language-agnostic data analysis workflows and reproducible research
Language-agnostic data analysis workflows and reproducible researchLanguage-agnostic data analysis workflows and reproducible research
Language-agnostic data analysis workflows and reproducible research
 
Methods and Best Practices for High Performance eCommerce
Methods and Best Practices for High Performance eCommerceMethods and Best Practices for High Performance eCommerce
Methods and Best Practices for High Performance eCommerce
 
Code Generation using T4
Code Generation using T4Code Generation using T4
Code Generation using T4
 
Unit 5
Unit 5Unit 5
Unit 5
 
Sumo Logic "How to" Webinar: Advanced Analytics
Sumo Logic "How to" Webinar: Advanced AnalyticsSumo Logic "How to" Webinar: Advanced Analytics
Sumo Logic "How to" Webinar: Advanced Analytics
 
Back-2-Basics: Exception & Event Instrumentation in .NET
Back-2-Basics: Exception & Event Instrumentation in .NETBack-2-Basics: Exception & Event Instrumentation in .NET
Back-2-Basics: Exception & Event Instrumentation in .NET
 
Back-2-Basics: Exception & Event Instrumentation in .NET
Back-2-Basics: Exception & Event Instrumentation in .NETBack-2-Basics: Exception & Event Instrumentation in .NET
Back-2-Basics: Exception & Event Instrumentation in .NET
 

Recently uploaded

From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 

Recently uploaded (20)

From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 

Lemur Tutorial at SIGIR 2006

  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7. Language Modeling for IR Estimate a multinomial probability distribution from the text Smooth the distribution with one estimated from the entire collection P(w|  D ) = (1-  ) P(w|D)+  P(w|C)
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
  • 43.
  • 44.
  • 45.
  • 46.
  • 47. Term Operations name example behavior term dog occurrences of dog (Indri will stem and stop) “ term” “ dog” occurrences of dog (Indri will not stem or stop) ordered window #od n (blue car) blue n words or less before car unordered window #ud n (blue car) blue within n words of car synonym list #syn(car automobile) occurrences of car or automobile weighted synonym #wsyn(1.0 car 0.5 automobile) like synonym, but only counts occurrences of automobile as 0.5 of an occurrence any operator #any:person all occurrences of the person field
  • 48. Field Restriction/Evaluation name example behavior restriction dog.title counts only occurrences of dog in title field dog.title,header counts occurrences of dog in title or header evaluation dog.(title) builds belief b (dog) using title language model dog.(title,header) b (dog) estimated using language model from concatenation of all title and header fields #od1(trevor strohman).person(title) builds a model from all title text for b (#od1(trevor strohman).person) - only counts “ trevor strohman ” occurrences in person fields
  • 49. Numeric Operators name example behavior less #less(year 2000) occurrences of year field < 2000 greater #greater(year 2000) year field > 2000 between #between(year 1990 2000) 1990 < year field < 2000 equals #equals(year 2000) year field = 2000
  • 50. Belief Operations name example behavior combine #combine(dog train) 0.5 log( b (dog) ) + 0.5 log( b (train) ) weight, wand #weight(1.0 dog 0.5 train) 0.67 log( b (dog) ) + 0.33 log( b (train) ) wsum #wsum(1.0 dog 0.5 dog.(title)) log( 0.67 b (dog) + 0.33 b (dog.(title)) ) not #not(dog) log( 1 - b (dog) ) max #max(dog train) returns maximum of b (dog) and b (train) or #or(dog cat) log(1 - (1 - b (dog) ) * (1 - b (cat) ))
  • 51. Field/Passage Retrieval name example behavior field retrieval #combine[title]( query ) return only title fields ranked according to #combine(query) - beliefs are estimated on each title ’s language model -may use any belief node passage retrieval #combine[passage200:100]( query ) dynamically created passages of length 200 created every 100 words are ranked by #combine(query)
  • 52.
  • 53. Filter Operations name example behavior filter require #filreq(elvis #combine(blue shoes)) rank documents that contain elvis by #combine(blue shoes) filter reject #filrej(shopping #combine(blue shoes)) rank documents that do not contain shopping by #combine(blue shoes)
  • 54.
  • 55.
  • 56.
  • 57.
  • 58.
  • 59.
  • 60.
  • 61.
  • 62.
  • 63.
  • 64.
  • 65.
  • 66. ParsedDocument struct ParsedDocument { const char* text; size_t textLength; indri::utility::greedy_vector<char*> terms; indri::utility::greedy_vector<indri::parse::TagExtent*> tags; indri::utility::greedy_vector<indri::parse::TermExtent> positions; indri::utility::greedy_vector<indri::parse::MetadataPair> metadata; };
  • 67.
  • 68.
  • 69.
  • 70.
  • 71.
  • 72.
  • 73.
  • 74.
  • 75.
  • 76.
  • 77.
  • 78.
  • 79.
  • 80.
  • 81.
  • 82.
  • 83.
  • 84.
  • 85.
  • 86.
  • 87.
  • 88.
  • 89.
  • 90.
  • 91.
  • 92.
  • 93.
  • 94.
  • 95.
  • 96.
  • 97.
  • 98.
  • 99.
  • 100.
  • 101.
  • 102.
  • 103.
  • 104. Lemur Retrieval Class Name Description TFIDFRetMethod BM25 SimpleKLRetMethod KL-Divergence InQueryRetMethod Simplified InQuery CosSimRetMethod Cosine CORIRetMethod CORI OkapiRetMethod Okapi IndriRetMethod Indri (wraps QueryEnvironment)
  • 105.
  • 106.
  • 107.
  • 108.
  • 109.
  • 110. Questions Ask us questions! What is the best way to do x ? How do I get started with my particular task? Does the toolkit have the x feature? How can I modify the toolkit to do x ? When do we get coffee?

Editor's Notes

  1. More information can be found at: http://www.lemurproject.org/tutorials/begin_indexing-1.html
  2. Blue text indicates settings specific to Indri applications. Refer to online documentation for other Lemur application parameters.
  3. Could also work with plain text documents
  4. Put link into web page for document classes
  5. Run a simple query now from the command line
  6. Command line options –runID=runName
  7. Run canned queries now! Evaluate using trec_eval
  8. &lt;parameters&gt; &lt;docFormat&gt;web&lt;/docFormat&gt; &lt;outputFile&gt;queries.reteval&lt;/outputFile&gt; &lt;stemmer&gt;krovetz&lt;/stemmer&gt; &lt;/parameters&gt;
  9. &lt;parameters&gt; &lt;index&gt;index&lt;/index&gt; &lt;retModel&gt;0&lt;/retModel&gt; &lt;textQuery&gt;queries.reteval&lt;/textQuery&gt; &lt;resultCount&gt;1000&lt;/resultCount&gt; &lt;resultFile&gt;tfidf.res&lt;/resultFile&gt; &lt;/parameters&gt;
  10. Indri normalizes weights to sum to 1.
  11. Better example and mention filed hierarchy at index time