Business Intelligence
Systems
(9th Ed., Prentice Hall)
Chapter 7:
Text and Web Mining
Learning Objectives
7-2
Learning Objectives
7-3
Opening Vignette:
Mining Text for Security and Counterterrorism
What is MITRE?
Problem description
Proposed solution
Results
Answer and discuss the case questions
7-4
Opening Vignette:
Mining Text For Security
7-5
7-6
7-7
7-8
Spam filtering
Email prioritization and categorization
Automatic response generation
7-9
Information extraction
Topic tracking
Summarization
Categorization
Clustering
Concept linking
Question answering
7-10
Term dictionary
Word frequency
Part-of-speech tagging
Morphology
Term-by-document matrix
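The terminology above (term dictionary, word frequency, term-by-document matrix) can be illustrated with a minimal sketch. The tokenizer, the stop-word set, and the two sample documents are assumptions made for illustration only; they do not come from the slides.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def term_document_matrix(docs, stop_words=frozenset()):
    """Return (terms, matrix) where matrix[i][j] counts term i in document j."""
    # Word frequency per document, ignoring stop words.
    counts = [Counter(t for t in tokenize(d) if t not in stop_words) for d in docs]
    # The term dictionary: every distinct term seen in the collection.
    terms = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in terms]
    return terms, matrix

# Hypothetical two-document corpus.
docs = [
    "Text mining extracts knowledge from text",
    "Web mining extracts knowledge from the Web",
]
terms, matrix = term_document_matrix(docs, stop_words={"from", "the"})
for term, row in zip(terms, matrix):
    print(f"{term:10s} {row}")
```

Rows are terms and columns are documents here; the transposed layout (documents as rows) is equally common.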
7-11
Occurrence matrix
Latent semantic indexing
What is a patent?
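For reference, latent semantic indexing (listed above) is usually described as a truncated singular value decomposition of the occurrence matrix. The symbols below are assumptions chosen for illustration and are not taken from the slides: A is the m-by-n term-by-document matrix and k is the number of latent dimensions kept.

```latex
A \approx A_k = U_k \Sigma_k V_k^{\top},
\qquad U_k \in \mathbb{R}^{m \times k},\;
\Sigma_k \in \mathbb{R}^{k \times k},\;
V_k \in \mathbb{R}^{n \times k}
```

Keeping only the k largest singular values reduces the dimensionality of the matrix while preserving most of its structure.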
7-12
NLP is
7-13
What is Understanding?
7-14
Challenges in NLP
Dream of AI community
7-15
Part-of-speech tagging
Text segmentation
Word sense disambiguation
Syntax ambiguity
Imperfect or irregular input
Speech acts
WordNet
Sentiment Analysis
7-16
7-17
Information retrieval
Information extraction
Named-entity recognition
Question answering
Automatic summarization
Natural language generation and understanding
Machine translation
Foreign language reading and writing
Speech recognition
Text proofing
Optical character recognition
Marketing applications
Security applications
Academic applications
7-18
ECHELON, OASIS
Deception detection
The study
7-19
A difficult problem
If detection is limited to text only, the problem is even more difficult
analyzed text-based testimonies of persons of interest at military bases
used only text-based features (cues)
7-20
7-21
7-22
Logistic regression 67.28
Decision trees 71.60
Neural networks 73.46
7-23
Software/hardware limitations
Privacy issues
Linguistic limitations
Extract knowledge from available data sources (A0)
Context-specific knowledge
Domain expertise
Tools and techniques
7-24
7-26
7-27
7-28
7-29
7-30
Association
Trend Analysis
7-31
7-32
Journal | Year | Author(s) | Title | Vol/No | Pages | Keywords | Abstract
MISQ | 2005 | A. Malhotra, S. Gosain and O. A. El Sawy | Absorptive capacity configurations in supply chains: Gearing for partner-enabled market knowledge creation | 29/1 | | |
ISR | 1999 | D. Robey and M. C. Boudreau | Accounting for the contradictory organizational consequences of information technology: Theoretical directions and methodological implications | | | |
JMIS | 2001 | R. Aron and E. K. Clemons | | | | |
[Figure: number of articles per year, 1994-2005, in nine panels (Cluster 1 through Cluster 9); x-axis: Year, y-axis: No of Articles]
7-33
[Figure: number of articles per journal (ISR, JMIS, MISQ), in nine panels (Cluster 1 through Cluster 9); x-axis: Journal, y-axis: No of Articles]
7-34
7-35
7-36
Web Mining
7-37
7-38
Authoritative pages
Hubs
Hyperlink-induced topic search (HITS) algorithm
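The relationship between hubs and authoritative pages in HITS can be sketched as an iterative score update. This is a minimal illustration under stated assumptions, not the textbook's implementation: the function name, the fixed iteration count, the Euclidean normalization, and the small link graph are all hypothetical.

```python
def hits(links, iterations=50):
    """Compute hub and authority scores for a directed graph.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages linking to p.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub update: sum the authority scores of the pages p links to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        # Normalize both score vectors so they stay bounded.
        a_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        h_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Hypothetical link graph: two portal pages point at the same content pages.
links = {
    "portal1": ["pageA", "pageB"],
    "portal2": ["pageA", "pageB", "pageC"],
    "pageA": [],
}
hub, auth = hits(links)
print("best hub:", max(hub, key=hub.get))
print("authorities:", sorted(auth, key=auth.get, reverse=True))
```

A page earns a high authority score when good hubs link to it, and a high hub score when it links to good authorities, which is why the two updates feed each other.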
7-39
Clickstream data
Clickstream analysis
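One common first step in clickstream analysis is grouping raw click records into visitor sessions. The sketch below assumes a hypothetical log of (user, timestamp, URL) records and a 30-minute inactivity timeout; neither the record layout nor the threshold comes from the slides.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that end a session (assumed)

def sessionize(clicks, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp, url) records into per-user sessions.

    Returns a dict mapping each user to a list of sessions, where each
    session is the time-ordered list of URLs visited.
    """
    by_user = defaultdict(list)
    for user, ts, url in clicks:
        by_user[user].append((ts, url))
    sessions = {}
    for user, events in by_user.items():
        events.sort()  # order each user's clicks by timestamp
        user_sessions, last_ts = [], None
        for ts, url in events:
            if last_ts is None or ts - last_ts > timeout:
                user_sessions.append([])  # gap too long: start a new session
            user_sessions[-1].append(url)
            last_ts = ts
        sessions[user] = user_sessions
    return sessions

# Hypothetical log: (user id, seconds since midnight, requested URL).
clicks = [
    ("u1", 0, "/home"),
    ("u1", 60, "/products"),
    ("u1", 4000, "/home"),
    ("u2", 10, "/home"),
    ("u2", 20, "/cart"),
]
sessions = sessionize(clicks)
print(sessions)
```

Once clicks are grouped this way, per-session paths can feed later steps such as finding frequent navigation sequences.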
7-40
(clickstream analysis)
7-41
7-42
7-43
Product Name | URL
 | angoss.com
ClickTracks | clicktracks.com
 | deepmetrix.com
Megaputer WebAnalyst | megaputer.com
 | microstrategy.com
 | sas.com
 | spss.com
WebTrends | webtrends.com
XML Miner | scientio.com
7-44
Questions / comments
7-45