By
SATHISHKUMAR G
(sathishsak111@gmail.com)
 Homework assignments and programming
exercises: ~40%
 Mid-term exam: ~25%
 Term project: ~35%
 Including proposal, presentation, and final report
 About 3 programming exercises
 Team-based (at most 2 persons per team)
 You can either write your own code or reuse existing
open source code
 The term project
 Either team-based system development (the same as
programming exercises)
 Or academic paper presentation
 Only one person per team allowed
 A proposal is *required* before midterm (Apr. 11,
2014)
 The score you get depends on the functions,
difficulty and quality of your project
 For system development:
 System functions and correctness
 For academic paper presentation
 Quality of the paper and of your presentation
 Major methods/experimental results *must* be presented
 Papers from top conferences are strongly suggested
 E.g. SIGIR, WWW, CIKM, WSDM, JCDL, ICMR, …
 Proposals are *required* for each team, and will be counted
in the score
 Submission instructions
 Programs, project proposals, and project reports in
electronic files must be submitted to the TA online at:
 Submissions website: (TBD)
 Before submission:
 User name: Your student ID
 Please change your default password at your first login
 This course will NOT tell you
 The tips and tricks of using search engines,
although power users might have better ideas on
how to improve them
 There are plenty of books and websites on that…
 How to find books in libraries,
although it’s somewhat related to the basic IR
concepts
 How to make money on the Web,
although the largest search engine today has done so
 Things that you have been doing all day!
 Searching for something interesting: Web, news,
e-mail, image, video, …
 Asking for advice
 …
 User interests are changing all the time…
 2011: New Zealand Earthquake
 2012: Jeremy Lin
 2013: Meteor Russia
 2014: ? (next slide)
 Blast
 Explosion
 Chelyabinsk
 Asteroid 2012 DA14
 …
 流星 (meteor)
 彗星 (comet)
 隕石 (meteorite)
 俄羅斯 (Russia)
 地球 (Earth)
 …
 And other languages…
 And other search engines…
 And social websites…
 “Information retrieval is a field concerned with the
structure, analysis, organization, storage, searching,
and retrieval of information.” (Salton, 1968)
 Information retrieval (IR): a research field that
aims to search information in text and multimedia
documents effectively and efficiently
 In this course, we will introduce the basic text
and query models in IR, retrieval evaluation,
indexing and searching, and applications for IR
[Figure: IR system architecture. Components: User Interface, Text Operations, Query Expansion, Indexing, Retrieval, Ranking, Inverted Index, Document Collection. Labeled flows: user need, text, query, user feedback, retrieved docs, ranked docs, doc representation (logical view), inverted file.]
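The components named in the architecture figure above (text operations, indexing, inverted index, retrieval, ranking) can be sketched in a few lines. This is only a minimal illustration under simplifying assumptions: the function names (tokenize, build_inverted_index, search), the sample documents, and the ranking rule (summed term frequency) are all hypothetical choices made for this example, not the system used in the course.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Text operations: lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Indexing: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index

def search(index, query):
    """Retrieval: keep docs containing every query term (Boolean AND).
    Ranking: sort by summed term frequency (an assumed, very simple score)."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(t, {}) for t in terms]
    retrieved = set(postings[0])
    for p in postings[1:]:
        retrieved &= set(p)
    scores = {d: sum(p[d] for p in postings) for d in retrieved}
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical document collection for illustration only.
docs = {
    1: "Meteor explodes over Chelyabinsk, Russia",
    2: "Asteroid 2012 DA14 passes close to Earth",
    3: "Russia meteor blast: meteor fragments recovered",
}
index = build_inverted_index(docs)
print(search(index, "meteor russia"))  # prints [3, 1]
```

Document 3 ranks first because "meteor" occurs twice in it. A real system would add stemming, stop-word handling, query expansion, and a better ranking function.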
 Text IR
 Indexing and searching
 Query languages and operations
 Retrieval evaluation
 Modeling
 Boolean model
 Vector space model
 Probabilistic model
 Applications for IR
 Multimedia IR
 Web search
 Digital libraries
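As a rough illustration of the vector space model listed above, documents and the query can be represented as term-weight vectors and compared by cosine similarity. The sketch below assumes one common weighting variant (raw term frequency times ln(N/df)); the helper names and the three toy documents are hypothetical, and the course may use a different weighting scheme.

```python
import math
from collections import Counter

def tfidf_vectors(tokenized_docs):
    """Return one sparse tf-idf vector per document, plus the idf table."""
    n = len(tokenized_docs)
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(doc).items()}
        for doc in tokenized_docs
    ]
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy collection, already tokenized.
docs = [
    "meteor blast russia".split(),
    "asteroid earth flyby".split(),
    "meteor explosion chelyabinsk".split(),
]
vectors, idf = tfidf_vectors(docs)

# Weight the query with the same idf table; unseen terms get weight 0.
query = Counter("meteor russia".split())
q_vec = {t: tf * idf.get(t, 0.0) for t, tf in query.items()}

ranking = sorted(range(len(docs)), key=lambda i: -cosine(q_vec, vectors[i]))
print(ranking)  # prints [0, 2, 1]
```

Unlike the Boolean model, this gives a graded ranking: document 2 shares only "meteor" with the query and still receives a small nonzero score, while document 1 shares no terms and scores zero.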
 Basics in IR (focus)
 Inverted indexes for boolean queries (Ch.1-5)
 Term weighting and vector space model (Ch. 6-7)
 Evaluation in IR (Ch. 8)
 Advanced Topics
 Relevance feedback (Ch. 9)
 XML retrieval (Ch. 10)
 Probabilistic IR (Ch. 11)
 Language models (Ch. 12)
 Machine learning in IR (useful)
 Text classification (Ch. 13-15)
 Document clustering (Ch. 16-18)
 Web Search
 Web crawling and indexes (Ch. 19-20)
 Link analysis (Ch. 21)
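Of the topics above, retrieval evaluation is the easiest to illustrate with a small calculation. The sketch below computes set-based precision, recall, and F1 for a single query; the function name and the document ids are made up for this example, and ranked-list measures are not covered here.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based evaluation of one query's result list."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical judgments: the system returned docs 1, 3, 5, 7;
# docs 1, 2, 3 are the relevant ones.
p, r, f1 = precision_recall_f1(retrieved=[1, 3, 5, 7], relevant=[1, 2, 3])
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Here two of the four retrieved documents are relevant (precision 0.5) and two of the three relevant documents were found (recall about 0.67).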
 Text mining
 Machine Learning
 Natural Language Processing
 Social Network Analysis
 …
 Cross-language IR
 Image, video, and multimedia IR
 Speech retrieval
 Music retrieval
 User interfaces
 Parallel, distributed, and P2P IR
 Digital libraries
 Information science perspective
 Logic-based approaches to IR
 Natural language processing techniques
 …
 Before midterm
 Boolean retrieval (1 wk)
 Indexing (2 wks)
 Vector space model and evaluation (2 wks)
 Relevance feedback (1 wk)
 Probabilistic IR (2 wks)
 After midterm
 Text classification (1-2 wks)
 Document clustering (1-2 wks)
 Web search (2 wks)
 Advanced topics: CLIR, IE, … (2 wks)
 Term Project Presentation (3 wks)
 Wikipedia page on Information Retrieval:
http://en.wikipedia.org/wiki/Information_retrieval
 Information Retrieval Resources:
http://www-csli.stanford.edu/~hinrich/information-retrieval.html

 Journals
 ACM TOIS: Transactions on Information Systems
 JASIST: Journal of the American Society for Information Science and Technology
 IP&M: Information Processing and Management
 IEEE TKDE: Transactions on Knowledge and Data Engineering
 Conferences
 ACM SIGIR: International ACM SIGIR Conference on Research and Development in Information Retrieval
 WWW: World Wide Web Conference
 ACM CIKM: Conference on Information and Knowledge
Management
 JCDL: ACM/IEEE Joint Conference on Digital Libraries
 ACM WSDM: International Conference on Web Search and
Data Mining
 TREC: Text Retrieval Conference
 Slides and lectures will be offered mainly in
English
 To help domestic students follow along,
important concepts will be briefly summarized
in Chinese
 An Introduction to Information Retrieval and Applications
