
00 intro


  1. Course Overview: An Introduction to Information Retrieval and Applications
     J. H. Wang, Feb. 22, 2012
  2. Instructor & TA
     • Instructor
       – J. H. Wang ( 王正豪 )
       – Assistant Professor, CSIE, NTUT
       – Office: R1534, Technology Building
       – E-mail: jhwang@csie.ntut.edu.tw
       – Tel: ext. 4238
       – Office hours: 9:00 am-12:00 noon, every Tuesday and Wednesday
     • TA
       – Mr. Liu ( 劉瀚之 )
       – R1424, Technology Building
     (IR, Spring 2012, NTUT CSIE)
  3. Course Description
     • Course Web Page
       – http://www.ntut.edu.tw/~jhwang/IR/
     • Time: 9:10 am-12:00 noon, Thu.
     • Classroom: R1322, Technology Building
     • Textbook:
       – Christopher D. Manning, Prabhakar Raghavan and Hinrich Schuetze, Introduction to Information Retrieval, Cambridge University Press, 2008.
         • Available online
         • International Student Edition, imported by Kai-Fa ( 開發 ) Publishing
     • Prerequisites:
       – Basic knowledge of data structures and algorithms, linear algebra, and probability theory
       – Programming experience is *required* for homework and projects
  4. Additional References
     • References:
       – Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval: The Concepts and Technology behind Search, Addison-Wesley, 2011.
         • This is the second edition of their 1999 book Modern Information Retrieval. ( 華通 )
       – Stefan Buettcher, Charles L.A. Clarke, and Gordon V. Cormack, Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010.
       – Bruce Croft, Donald Metzler, and Trevor Strohman, Search Engines: Information Retrieval in Practice, Addison-Wesley, 2010. ( 全華 )
  5. More Books on IR
     • Gerard Salton, Automatic Information Organization and Retrieval, McGraw-Hill, 1968.
     • Gerard Salton and M.J. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, 1983.
       – Two classics, but out of print.
     • C. J. van Rijsbergen, Information Retrieval, Butterworths, 1979.
       – The classic. More than 30 years old, but still worth reading.
     • K. Sparck Jones and P. Willett, Readings in Information Retrieval, Morgan Kaufmann, 1997.
       – A collection of classic IR papers. (out of print)
     • I.H. Witten, A. Moffat, and T.C. Bell, Managing Gigabytes, 2nd edition, Morgan Kaufmann, 1999.
       – The authority on index construction and compression.
  6. Grading Policy
     • Homework assignments and programming exercises: 40%
     • Mid-term exam: 25%
     • Term project: 35%
       – Including the proposal and final report
  7. Programming Exercises and Term Project
     • About 3 programming exercises
       – Team-based (at most 2 people per team)
       – You can either write your own code or reuse existing open source code
     • The term project
       – Either team-based system development (same team rules as the programming exercises)
       – Or academic paper presentation
         • Only one person per team allowed
       – A proposal is required before the midterm (Apr. 12, 2012)
  8. About the Term Project
     • The score you get depends on the difficulty and quality of your project
       – For system development:
         • System functions and correctness
       – For academic paper presentation:
         • Quality of the paper and of your presentation
         • Major methods/experimental results *must* be presented
         • Papers from top conferences are strongly suggested
           – E.g. SIGIR, WWW, CIKM, WSDM, JCDL, ICMR, …
         • Proposals are *required* for each team, and will be counted toward the score
  9. Online Submission
     • Submission instructions
       – Programs, project proposals, and project reports in electronic files must be submitted to the TA online at:
         • http://140.124.183.39/ir/
       – Before submission:
         • User name: Your student ID
         • Please change your default password at your first login
  10. What this Course is NOT about
     • This course will NOT tell you
       – The tips and tricks of using search engines, although power users might have better ideas on how to improve them
         • There are plenty of books and websites on that…
       – How to find books in libraries, although it is somewhat related to basic IR concepts
       – How to make money on the Web, although the largest search engine today has done so
  11. What’s Information Retrieval?
  12. On Wikipedia
  13. On Google Images
  14. On Google Video Search
  15. On Google News (TW)
  16. On Google News (US)
  17. On Blogs
  18. On Google Translate…
  19. Or More Related Keywords
     • NBA
     • New York Knicks
     • Linsanity
     • …
  20. What if We Search in Chinese?
  21. And More…
     • 紐約尼克 (New York Knicks)
     • 哈佛 (Harvard)
     • 台裔球員 (player of Taiwanese descent)
     • …
     • And other languages…
     • And other search engines…
     • And social websites…
  22. In Google Trends
  23. And More…
  24. And Other Keywords…
  25. And Other Keywords…
  26. Palanteer – TW Election
  29. What Is Information Retrieval?
     • “Information retrieval is a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information.” (Salton, 1968)
  30. Goal
     • Information retrieval (IR): a research field that aims to search for information in text and multimedia documents effectively and efficiently
     • In this course, we will introduce the basic text and query models in IR, retrieval evaluation, indexing and searching, and applications of IR
  31. A Big Picture
  32. [Figure: the retrieval process. Components: User Interface, Text Operations, Query Expansion, Indexing, Retrieval, Ranking, Inverted Index, Document Collection. Flows between them: user need, text, logical view, doc representation, query, user feedback, inverted file, retrieved docs, ranked docs.]
  33. Topics
     • Text IR
       – Indexing and searching
       – Query languages and operations
     • Retrieval evaluation
     • Modeling
       – Boolean model
       – Vector space model
       – Probabilistic model
     • Applications for IR
       – Multimedia IR
       – Web search
       – Digital libraries
  34. Organization of the Textbook
     • Basics in IR (focus)
       – Inverted indexes for Boolean queries (Ch. 1-5)
       – Term weighting and vector space model (Ch. 6-7)
       – Evaluation in IR (Ch. 8)
     • Advanced Topics
       – Relevance feedback (Ch. 9)
       – XML retrieval (Ch. 10)
       – Probabilistic IR (Ch. 11)
       – Language models (Ch. 12)
     • Machine learning in IR (useful)
       – Text classification (Ch. 13-15)
       – Document clustering (Ch. 16-18)
     • Web Search
       – Web crawling and indexes (Ch. 19-20)
       – Link analysis (Ch. 21)
  35. Pointers to Other Topics
     • Cross-language IR
     • Image, video, and multimedia IR
     • Speech retrieval
     • Music retrieval
     • User interfaces
     • Parallel, distributed, and P2P IR
     • Digital libraries
     • Information science perspective
     • Logic-based approaches to IR
     • Natural language processing techniques
  36. Tentative Schedule
     • Before midterm
       – Boolean retrieval (1 wk)
       – Indexing (2 wks)
       – Vector space model and evaluation (2 wks)
       – Relevance feedback (1 wk)
       – Probabilistic IR (2 wks)
     • After midterm
       – Text classification (1-2 wks)
       – Document clustering (1-2 wks)
       – Web search (2 wks)
       – Advanced topics: CLIR, IE, … (2 wks)
       – Term Project Presentation (3 wks)
  37. Generic Resources
     • Wikipedia page on Information Retrieval: http://en.wikipedia.org/wiki/Information_re
     • Information Retrieval Resources: http://www-csli.stanford.edu/~hinrich/information-retrieval.html
  38. Academic Resources
     • Journals
       – ACM TOIS: Transactions on Information Systems
       – JASIST: Journal of the American Society for Information Science and Technology
       – IP&M: Information Processing and Management
       – IEEE TKDE: Transactions on Knowledge and Data Engineering
     • Conferences
       – ACM SIGIR: International Conference on Research and Development in Information Retrieval
       – WWW: World Wide Web Conference
       – ACM CIKM: Conference on Information and Knowledge Management
       – JCDL: ACM/IEEE Joint Conference on Digital Libraries
       – ACM WSDM: International Conference on Web Search and Data Mining
       – TREC: Text Retrieval Conference
  39. Thanks for Your Attention!
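
As a supplement to the retrieval process pictured in slide 32 (text operations, indexing, retrieval, ranking), the following is a minimal Python sketch of how those stages might fit together. It is not taken from the course material: the function names `tokenize`, `build_index`, and `search`, the three sample documents, and the simple term-frequency scoring are all assumptions chosen for illustration. Real systems typically use more elaborate weighting, such as the vector space model covered later in the course.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    """Text operations: lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Indexing: build an inverted index mapping term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, query):
    """Retrieval and ranking: score each document by the summed frequency
    of the query terms it contains (an illustrative scoring choice)."""
    scores = Counter()
    for term in tokenize(query):
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    # Highest score first; ties are broken by doc_id for a deterministic order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical document collection, made up for this example.
docs = {
    "d1": "Jeremy Lin leads the New York Knicks",
    "d2": "Knicks win again as Linsanity continues",
    "d3": "Harvard graduate joins the NBA",
}
index = build_index(docs)
results = search(index, "Knicks NBA")
print(results)
```

The inverted index lets `search` touch only the documents that contain a query term instead of scanning the whole collection, which is the main reason the index sits between the document collection and the retrieval step in the diagram.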
