LP: CS6007
Department of Computer Science and Engineering
Rev. No: 01
B.E/B.Tech/M.E/M.Tech : B.E. Regulation: 2013
Date: 27/06/2017
PG Specialisation :_
Sub. Code / Sub. Name : CS6007 – INFORMATION RETRIEVAL
Unit : I - INTRODUCTION
Unit Syllabus:
Introduction - History of IR - Components of IR - Issues – Open source Search engine
Frameworks - The impact of the web on IR - The role of artificial intelligence (AI) in IR – IR Versus
Web Search - Components of a Search engine - Characterizing the web
Objective:
To learn the role of information retrieval in various real-time applications
Session No * | Topics to be covered | Ref | Teaching Aids
1 | Introduction to Information Retrieval, History of IR and Components of IR | T1 (Ch 1 : 1 – 5); T2 (Ch 1 : 1 – 3); T3 (Ch 1 : 1 – 9) | LCD
2 | Issues in IR | T2 (Ch 1 : 3 – 5) | LCD
3 | Open source Search engine Frameworks | R1 (Ch 1 : 27 – 30) | LCD
4 | The impact of the web on IR | T2 (Ch 1 : 8 – 12) | LCD
5 | The role of artificial intelligence (AI) in IR | R4 | LCD
6 | IR Versus Web Search | R1 (Ch 1 : 5 – 8) | LCD
7 | Brief history of search engines | T4 (Ch 1 : 1 – 6) | LCD
8 | Components of a Search engine | T3 (Ch 1 : 13 – 28); T2 (Ch 13 : 373 – 383) | LCD
9 | Characterizing the web, Comparing web search to traditional information retrieval | T2 (Ch 13 : 367 – 371); T4 (Ch 2 : 29 – 32) | LCD
• Content beyond syllabus covered (if any): Library and Information Science - Concerned with
effective categorization of human knowledge, citation analysis and bibliometrics (structure of
information).
Unit Syllabus:
Boolean and vector-space retrieval models - Term weighting - TF-IDF weighting - cosine
similarity – Preprocessing - Inverted indices - efficient processing with sparse vectors – Language
Model based IR - Probabilistic IR – Latent Semantic Indexing - Relevance feedback and query
expansion.
Objective:
To learn and apply information retrieval models
Session No * | Topics to be covered | Ref | Teaching Aids
10 | Basic IR models & Retrieval strategies – Vector-space model, Probabilistic IR, Language models, Inference networks | T1 (Ch 1 : 1 – 15); R2 (Ch 2 : 9 – 57); T3 (Ch 7 : 233 – 250) | LCD
11 | Retrieval strategies : Extended Boolean retrieval, Latent Semantic Indexing, Neural network, Genetic algorithms | R2 (Ch 2 : 57 – 84) | LCD
12 | Term weighting, TF-IDF weighting and cosine similarity in Vector-space model | R2 (Ch 2 : 11 – 21) | LCD
13 | Term weighting, TF-IDF weighting and cosine similarity in Probabilistic retrieval strategies | R2 (Ch 2 : 21 – 45) | LCD
14 | Inverted indices – Documents, Counts, Positions, Fields and Extents, Scores and Ordering | T3 (Ch 5 : 129 – 140); R1 (Ch 4 : 104 – 131); R2 (Ch 5 : 181 – 182) | LCD
15 | Language Model based IR | T1 (Ch 12 : 218 – 231); R1 (Ch 9 : 286 – 304) | LCD
16 | Probabilistic IR | T1 (Ch 11 : 201 – 216); R1 (Ch 8 : 259 – 281) | LCD
17 | Latent Semantic Indexing | T1 (Ch 18 : 412 – 417) | LCD
18 | Relevance feedback and query expansion | T1 (Ch 9 : 162 – 177) | LCD
Content beyond syllabus covered (if any): Comparison of Google/Yahoo ranking
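Illustrative sketch for sessions on TF-IDF weighting and cosine similarity: the Python below is a minimal example, not taken from the prescribed texts. It assumes raw term frequency multiplied by log(N / df), which is only one of several common weighting variants, and stores each document as a sparse dictionary. The function names and the three sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse {term: weight} dict per tokenised document."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed variant: raw term frequency * log(N / df).
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sample collection, already tokenised.
docs = [
    "web search engine ranking".split(),
    "web crawler and web index".split(),
    "probabilistic ranking model".split(),
]
vecs = tf_idf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # positive: the documents share the term "web"
print(cosine(vecs[1], vecs[2]))  # zero: the documents share no terms
```

Only terms present in both dictionaries contribute to the dot product, which is the usual reason sparse vectors are processed this way.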
Unit Syllabus:
Web search overview, web structure, the user, paid placement, search engine optimization/ spam.
Web size measurement - search engine optimization/spam – Web Search Architectures - crawling -
meta-crawlers- Focused Crawling - web indexes –- Near-duplicate detection - Index Compression –
XML retrieval
Objective:
To design Web Search Engine
Session No * | Topics to be covered | Ref | Teaching Aids
19 | Web search basics – Background and history, Web characteristics, Search user experience, Index size and estimation | T1 (Ch 19 : 385 – 400) | LCD
20 | Web search – The structure of the web, Queries and users, Static ranking, Dynamic ranking, Evaluating web search | R1 (Ch 15 : 507 – 540); T4 (Ch 2 : 20 – 23) | LCD
21 | Web structure - The user, paid placement, Search engine optimization / spam | T4 (Ch 7 : 228 – 230) | LCD
22 | Web size measurement - search engine optimization/spam | T4 (Ch 5 : 91 – 98); T4 (Ch 7 : 225 – 230) | LCD
23 | Web Search Architectures – Crawling, Meta-crawlers and Focused Crawling | T4 (Ch 4 : 78 – 85); T3 (Ch 2 : 13 – 28) | LCD
24 | Web Crawlers : Crawling the web, Document feeds, Storing documents and detecting duplicates | T3 (Ch 3 : 31 – 63); R1 (Ch 15 : 541 – 549) | LCD
Unit Syllabus :
Link Analysis – hubs and authorities – Page Rank and HITS algorithms - Searching and Ranking –
Relevance Scoring and ranking for Web – Similarity - Hadoop & Map Reduce - Evaluation -
Personalized search - Collaborative filtering and content-based recommendation of documents and
products – handling “invisible” Web - Snippet generation, Summarization, Question Answering,
Cross-Lingual Retrieval
Objective:
To be exposed to Link Analysis
Understand Hadoop and MapReduce
Session No * | Topics to be covered | Ref | Teaching Aids
28 | Link Analysis – hubs and authorities, Page Rank and HITS algorithms | T1 (Ch 21 : 421 – 439); T3 (Ch 4 : 104 – 113) | LCD
Content beyond syllabus covered (if any): Application : Social Network Analysis
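Illustrative sketch for the Page Rank session: the Python below is a minimal power-iteration example, not taken from the prescribed texts. It assumes a damping factor of 0.85, a fixed number of iterations, and that the rank of pages without out-links is spread evenly over all pages. The function name and the four-page graph are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Assumption: rank held by pages with no out-links is shared by all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p, targets in links.items():
            for q in targets:
                # Each page passes its rank on in equal shares to its targets.
                new_rank[q] += damping * rank[p] / len(targets)
        rank = new_rank
    return rank


# Hypothetical graph: A links to B and C, B and D link to C, C links back to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this graph page C receives links from three pages and ends up with the highest score, while D, which no page links to, keeps only the baseline share.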
Unit Syllabus:
Information filtering; organization and relevance feedback – Text Mining - Text classification
and clustering - Categorization algorithms: naive Bayes; decision trees; and nearest neighbor -
Clustering algorithms: agglomerative clustering; k-means; expectation maximization (EM).
Objective:
Session No * | Topics to be covered | Ref | Teaching Aids
42 | Text classification – Feature selection and Evaluation | T1 (Ch 13 : 251 – 264) | LCD
TEXT BOOKS:
1. C. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge
University Press, 2008.
2. Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval: The Concepts and
Technology behind Search, 2nd Edition, ACM Press Books, 2011.
3. Bruce Croft, Donald Metzler and Trevor Strohman, Search Engines: Information Retrieval in Practice,
1st Edition, Addison Wesley, 2009.
4. Mark Levene, An Introduction to Search Engines and Web Navigation, 2nd Edition, Wiley, 2010.
REFERENCES:
1. Stefan Buettcher, Charles L. A. Clarke, Gordon V. Cormack, Information Retrieval: Implementing and
Evaluating Search Engines, The MIT Press, 2010.
2. Ophir Frieder, “Information Retrieval: Algorithms and Heuristics: The Information Retrieval Series”,
2nd Edition, Springer, 2004.
3. Manu Konchady, “Building Search Applications: Lucene, LingPipe, and Gate”, First Edition, Mustru
Publishing, 2008.
4. www.nptel.ac.in
Prepared by Approved by
Signature
Remarks *:
* If the same lesson plan is followed in the subsequent semester/year it should be mentioned and
signed by the Faculty and the HOD