
Course: Advanced Analytics

Faculty: Prof. V. Nagadevara, nagadev@iimb.ernet.in


Term: Term 5 (2017-19); Pre-requisites: Core courses.
No. of sessions: 17 sessions of 90 minutes each.
Textbook: Bing Liu, “Web Data Mining: Exploring Hyperlinks, Contents, and Usage
Data”, 2nd ed., Springer, 2011.

In addition, a number of journal articles will be supplied for additional reading.

Course Objective
This course is planned to provide students with:

• an understanding of basic concepts and methods in text mining, such as document
  representation, information extraction, text classification and clustering
• the ability to use benchmark corpora and commercial and open-source text analysis and
  visualization tools to explore interesting patterns in textual data
• an understanding of various techniques and algorithms (such as support vector machines and
  naïve Bayes) for advanced text mining, text classification and clustering, and opinion mining,
  and their applications in real-world problems
• knowledge of the various components of web mining, such as web structure mining, web
  content mining and web usage mining
• familiarity with the basic concepts involved in image mining, including image processing and
  classification

Given the large amounts of unstructured data flooding the Internet, mining high-quality information
from text and the web is becoming increasingly critical. The actionable knowledge extracted from text
data facilitates effective decision making in a broad spectrum of areas, including business
intelligence, information acquisition, social behaviour analysis and strategy formulation. This course
will cover important topics in text mining, web mining and image mining, leading to text and web
analytics. Students will also be exposed to the use of software for text and web mining. The course
places emphasis on the use of techniques for different aspects of text and web mining and on
formulating strategy based on the mining results. In general, exposure to each topic will be
supported by a real-life case study. To make the course more relevant and practice-oriented,
participants will use the popular analytics package WEKA and apply the techniques to different
corpora (text databases).

Each topic will include a real-life application.

Course Outline

S No. | Topic | Reading material | Application
1 | Introduction to mining unstructured data | Chapter 6 |
2 and 3 | Natural language processing and document representation | Article on NLP, Chapter 6 |
4 | Classification Techniques for Textual Documents | Chapters 3.2 and 3.3 | Automated patent classification
5 | Support Vector Machines for Classification | Chapter 3.8 | Sentiment analysis based on sports forum
6 | Naïve Bayes method for classification | Chapters 3.6 and 3.7 | Extraction of Product Attributes
7 | Clustering techniques for mining text data | Chapters 4.1 to 4.4 | Automatic labelling of hierarchical clusters
8 | Latent Semantic Analysis | Chapter 6.7 | Documentation-to-Source-Code Traceability
9 | Sentiment analysis | Chapter 11.1 | Sentiment analysis of Movie reviews
10 | Use of WEKA for Text Mining | |
11 | Introduction to web mining | Hand-out | Analyzing e-Commerce website - Flipkart
12 | Web structure mining | Chapter 8 | Effectiveness of web search engines
13 | Web content mining | Chapters 9 and 10 | Case – Analyzing Customer Reviews (audio)
14 | Web usage mining | Chapter 12 | Making automated recommendations and personalization
15 | Recommendation Systems | Hand-out |
16 | Attribution Models | Hand-out |
17 | Project Presentations | |

Learning Outcomes
At the end of this course, a participant should be able to:
• Mine and analyse text data and web data
• Identify the appropriate techniques for analysing data drawn from text and web sources
• Extract sentiment from social media and strategize based on sentiment analysis
• Use an open-source mining software package
• Strategize based on the analysis of unstructured data

Evaluation:

1. Term Paper (Individual) 30%
2. End term (Take home individual) 30%
3. Project (Group) 40%
