
FT/GN/68/01/23.01.16

SRI VENKATESWARA COLLEGE OF ENGINEERING

COURSE DELIVERY PLAN - THEORY Page 1 of 6

LP: CS6007
Department of Computer Science and Engineering
Rev. No: 01
B.E/B.Tech/M.E/M.Tech : B.E. Regulation: 2013
Date: 27/06/2017
PG Specialisation :_
Sub. Code / Sub. Name : CS6007 – INFORMATION RETRIEVAL
Unit : I - INTRODUCTION

Unit Syllabus:
Introduction - History of IR - Components of IR - Issues – Open source Search engine
Frameworks - The impact of the web on IR - The role of artificial intelligence (AI) in IR – IR Versus
Web Search - Components of a Search engine - Characterizing the web

Objective:
• To learn the role of information retrieval in various real-time applications

Session No * | Topics to be covered | Ref | Teaching Aids
1 | Introduction to Information Retrieval, History of IR and Components of IR | T1 (Ch 1 : 1 – 5), T2 (Ch 1 : 1 – 3), T3 (Ch 1 : 1 – 9) | LCD
2 | Issues in IR | T2 (Ch 1 : 3 – 5) | LCD
3 | Open source Search engine Frameworks | R1 (Ch 1 : 27 – 30) | LCD
4 | The impact of the web on IR | T2 (Ch 1 : 8 – 12) | LCD
5 | The role of artificial intelligence (AI) in IR | R4 | LCD
6 | IR Versus Web Search | R1 (Ch 1 : 5 – 8) | LCD
7 | Brief history of search engines | T4 (Ch 1 : 1 – 6) | LCD
8 | Components of a Search engine | T3 (Ch 1 : 13 – 28), T2 (Ch 13 : 373 – 383) | LCD
9 | Characterizing the web, Comparing web search to traditional information retrieval | T2 (Ch 13 : 367 – 371), T4 (Ch 2 : 29 – 32) | LCD
• Content beyond syllabus covered (if any): Library and Information Science - Concerned with
effective categorization of human knowledge, citation analysis and bibliometrics (structure of
information).

* Session duration: 50 minutes



Unit : II - INFORMATION RETRIEVAL

Unit Syllabus:
Boolean and vector-space retrieval models - Term weighting - TF-IDF weighting - cosine
similarity – Preprocessing - Inverted indices - efficient processing with sparse vectors – Language
Model based IR - Probabilistic IR – Latent Semantic Indexing - Relevance feedback and query
expansion.

Objective:
• To learn and apply information retrieval models

Session No * | Topics to be covered | Ref | Teaching Aids
10 | Basic IR models & Retrieval strategies – Vector-space model, Probabilistic IR, Language models, Inference networks | T1 (Ch 1 : 1 – 15), R2 (Ch 2 : 9 – 57), T3 (Ch 7 : 233 – 250) | LCD
11 | Retrieval strategies : Extended Boolean retrieval, Latent Semantic Indexing, Neural network, Genetic algorithms | R2 (Ch 2 : 57 – 84) | LCD
12 | Term weighting, TF-IDF weighting and cosine similarity in Vector-space model | R2 (Ch 2 : 11 – 21) | LCD
13 | Term weighting, TF-IDF weighting and cosine similarity in Probabilistic retrieval strategies | R2 (Ch 2 : 21 – 45) | LCD
14 | Inverted indices – Documents, Counts, Positions, Fields and Extents, Scores and Ordering | T3 (Ch 5 : 129 – 140), R1 (Ch 4 : 104 – 131), R2 (Ch 5 : 181 – 182) | LCD
15 | Language Model based IR | T1 (Ch 12 : 218 – 231), R1 (Ch 9 : 286 – 304) | LCD
16 | Probabilistic IR | T1 (Ch 11 : 201 – 216), R1 (Ch 8 : 259 – 281) | LCD
17 | Latent Semantic Indexing | T1 (Ch 18 : 412 – 417) | LCD
18 | Relevance feedback and query expansion | T1 (Ch 9 : 162 – 177) | LCD
Content beyond syllabus covered (if any): Comparison of Google/Yahoo ranking

* Session duration: 50 mins



Unit : III - WEB SEARCH ENGINE – INTRODUCTION AND CRAWLING

Unit Syllabus:
Web search overview, web structure, the user, paid placement, search engine optimization/ spam.
Web size measurement - search engine optimization/spam – Web Search Architectures - crawling -
meta-crawlers - Focused Crawling - web indexes - Near-duplicate detection - Index Compression –
XML retrieval

Objective:
• To design a web search engine

Session No * | Topics to be covered | Ref | Teaching Aids
19 | Web search basics – Background and history, Web characteristics, Search user experience, Index size and estimation | T1 (Ch 19 : 385 – 400) | LCD
20 | Web search – The structure of the web, Queries and users, Static ranking, Dynamic ranking, Evaluating web search | R1 (Ch 15 : 507 – 540), T4 (Ch 2 : 20 – 23) | LCD
21 | Web structure - The user, paid placement, Search engine optimization / spam | T4 (Ch 7 : 228 – 230) | LCD
22 | Web size measurement - search engine optimization/spam | T4 (Ch 5 : 91 – 98), T4 (Ch 7 : 225 – 230) | LCD
23 | Web Search Architectures – Crawling, Meta-crawlers and Focused Crawling | T4 (Ch 4 : 78 – 85), T3 (Ch 2 : 13 – 28) | LCD
24 | Web Crawlers : Crawling the web, Document feeds, Storing documents and detecting duplicates | T3 (Ch 3 : 31 – 63), R1 (Ch 15 : 541 – 549) | LCD
25 | Index Compression - Statistical properties of terms in IR, Dictionary compression, Postings File compression | T1 (Ch 5 : 78 – 96), R3 (Ch 8 : 313 – 319) | LCD
26 | XML retrieval – Basic XML concepts, Challenges in XML retrieval, A vector space model for XML retrieval | T1 (Ch 10 : 178 – 192) | LCD
27 | XML retrieval | T1 (Ch 10 : 194 – 198), R1 (Ch 16 : 564 – 584) | LCD
Content beyond syllabus covered (if any): IR techniques for the web, including crawling, link-based
algorithms, and metadata usage

* Session duration: 50 mins



Unit : IV - WEB SEARCH – LINK ANALYSIS AND SPECIALIZED SEARCH

Unit Syllabus :
Link Analysis – hubs and authorities – Page Rank and HITS algorithms - Searching and Ranking –
Relevance Scoring and ranking for Web – Similarity - Hadoop & Map Reduce - Evaluation -
Personalized search - Collaborative filtering and content-based recommendation of documents and
products – handling “invisible” Web - Snippet generation, Summarization, Question Answering,
Cross-Lingual Retrieval

Objective:
• To be exposed to Link Analysis
• To understand Hadoop and MapReduce

Session No * | Topics to be covered | Ref | Teaching Aids
28 | Link Analysis – hubs and authorities, Page Rank and HITS algorithms | T1 (Ch 21 : 421 – 439), T3 (Ch 4 : 104 – 113) | LCD
29 | Searching and Ranking | R4 | LCD
30 | Relevance Scoring and ranking for Web | R4 | LCD
31 | Hadoop & Map Reduce, Evaluation | R4 | LCD
32 | Personalized search | R4 | LCD
33 | Collaborative filtering and content-based recommendation of documents and products | T4 (Ch 9 : 333 – 346), T3 (Ch 10 : 432 – 437) | LCD
34 | Handling “invisible” Web, Snippet generation | R4 | LCD
35 | Summarization, Question Answering | R4 | LCD
36 | Cross-Lingual Retrieval | R4 | LCD

Content beyond syllabus covered (if any): Application : Social Network Analysis

* Session duration: 50 mins



Unit : V - DOCUMENT TEXT MINING

Unit Syllabus:
Information filtering; organization and relevance feedback – Text Mining - Text classification
and clustering - Categorization algorithms: naive Bayes; decision trees; and nearest neighbor -
Clustering algorithms: agglomerative clustering; k-means; expectation maximization (EM).

Objective:

• To learn document text mining techniques

Session No * | Topics to be covered | Ref | Teaching Aids
37 | Information filtering | R4 | LCD
38 | Organization and relevance feedback | R4 | LCD
39 | Text Mining | T4 (Ch 7 : 230 – 237) | LCD
40 | Text classification and clustering : Categorization algorithms and Clustering | T3 (Ch 9 : 339 – 364) | LCD
41 | Text classification – The text classification problem, Naive Bayes text classification, The Bernoulli model, Properties of Naive Bayes | T1 (Ch 13 : 234 – 251) | LCD
42 | Text classification – Feature selection and Evaluation | T1 (Ch 13 : 251 – 264) | LCD
43 | Categorization algorithms: Naive Bayes, Decision trees and K-Nearest Neighbor | R3 (Ch 7 : 281 – 294) | LCD
44 | Agglomerative clustering and K-Means algorithm | T3 (Ch 9 : 373 – 389) | LCD
45 | Expectation Maximization (EM) algorithm | T1 (Ch 16 : 368 – 372) | LCD


Content beyond syllabus covered (if any): Porter Stemming algorithm

* Session duration: 50 mins



TEXT BOOKS:
1. C. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
2. Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval: The Concepts and Technology behind Search, 2nd Edition, ACM Press Books, 2011.
3. Bruce Croft, Donald Metzler and Trevor Strohman, Search Engines: Information Retrieval in Practice, 1st Edition, Addison Wesley, 2009.
4. Mark Levene, An Introduction to Search Engines and Web Navigation, 2nd Edition, Wiley, 2010.

REFERENCES:
1. Stefan Buettcher, Charles L. A. Clarke and Gordon V. Cormack, Information Retrieval: Implementing and Evaluating Search Engines, The MIT Press, 2010.
2. Ophir Frieder, Information Retrieval: Algorithms and Heuristics, The Information Retrieval Series, 2nd Edition, Springer, 2004.
3. Manu Konchady, Building Search Applications: Lucene, LingPipe, and Gate, 1st Edition, Mustru Publishing, 2008.
4. www.nptel.ac.in

 | Prepared by | Approved by
Signature | |
Name | Dr. R. Jayabhaduri | Dr. R. Anitha
Designation | Associate Professor/CS | Professor & HOD/CS
Date | 29/06/2017 | 29/06/2017
Remarks *:

Remarks *:

* If the same lesson plan is followed in a subsequent semester/year, this should be mentioned here and
signed by the Faculty and the HOD.
