Classifying Farsi texts using SVM algorithms and examining feature reduction methods
In text classification, words are usually treated as the features of a text, so a text
classifier has to deal with a very large number of features. Different approaches have been
proposed to reduce this number. In this paper we compare the methods used in text
classification and identify the best one. These methods include Naïve Bayes, Rocchio, KNN,
regression, decision trees, neural networks, SVM, rule-based methods and evolutionary
methods. SVM, a supervised learning method, is one of the best methods used in text
classification. It maps the data from the original space to another vector space with a
different (usually higher) number of dimensions, in which linear learning algorithms can be
applied. The method is computationally complex; its advantage is that it does not depend on
the number of samples in the experimental set, so it can work well with few samples and a
large number of features.
Keywords: Feature selection, Text classification, SVM method, Feature extraction, Vector
space.
charkari@modares.ac.ir
zaman_ma@modares.ac.ir
Feature reduction measures: CHI, MI, IG, DF
Classifiers: KNN, Rocchio, SVM
Document formats: XML, HTML
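Of the measures listed above, DF (document frequency) is the easiest to illustrate. The following is a minimal sketch, not the paper's implementation: it assumes documents are already tokenized into word lists, and the function name `select_by_df` and the threshold `min_df` are hypothetical names chosen for this example.

```python
from collections import Counter

def select_by_df(docs, min_df=2):
    """Keep only the words that occur in at least min_df documents."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each word once per document
    return sorted(word for word, n in df.items() if n >= min_df)

# Toy corpus (made up for illustration): three tokenized documents.
docs = [["svm", "text", "text"], ["svm", "kernel"], ["svm", "text", "farsi"]]
kept = select_by_df(docs, min_df=2)
print(kept)
```

Words that appear in only one document are dropped, which shrinks the feature set before any classifier is trained.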
Term-document matrix: A = (a_jk), of size m x n, over documents d_1, ..., d_n and features f_1, ..., f_m
Term frequency weighting: a_jk = f_jk
tf*idf weighting: a_jk = f_jk * log(N / n_j)
Related weightings: tfc, ltc
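The tf*idf weighting can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: f_jk is taken to be the count of word j in document k, N the number of documents, and n_j the number of documents that contain word j (the usual definition, assumed here because the source definition did not survive). The function name `tfidf_matrix` is hypothetical.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Return (vocab, A) with A[j][k] = f_jk * log(N / n_j)."""
    N = len(docs)
    counts = [Counter(doc) for doc in docs]  # f_jk for each document k
    vocab = sorted({word for doc in docs for word in doc})
    # n_j: number of documents containing word j (assumed definition)
    n = {word: sum(1 for c in counts if word in c) for word in vocab}
    A = [[c[word] * math.log(N / n[word]) for c in counts] for word in vocab]
    return vocab, A

# Toy corpus (made up for illustration).
docs = [["svm", "text", "text"], ["svm", "kernel"], ["svm", "text", "farsi"]]
vocab, A = tfidf_matrix(docs)
for word, row in zip(vocab, A):
    print(word, [round(a, 3) for a in row])
```

A word that occurs in every document gets weight zero, since log(N / N) = 0, which is why such words contribute nothing to the classifier.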
Linguistic features: lemma, POS tag
Classification: defined over d x c (document d, category c)
Rocchio (alongside SVM and KNN): document d, category c_j in C
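A Rocchio classifier builds one prototype vector per category and assigns a document to the category whose prototype is most similar. The sketch below uses the common textbook form (beta times the mean of the category's documents minus gamma times the mean of the others, compared by cosine similarity); the parameter defaults, function names and toy data are assumptions for illustration and are not taken from the paper.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rocchio_prototypes(X, y, beta=1.0, gamma=0.0):
    """One prototype per category c_j: beta*mean(in c_j) - gamma*mean(not in c_j)."""
    protos = {}
    for c in set(y):
        pos = [x for x, label in zip(X, y) if label == c]
        neg = [x for x, label in zip(X, y) if label != c]
        p = centroid(pos)
        q = centroid(neg) if neg else [0.0] * len(p)
        protos[c] = [beta * a - gamma * b for a, b in zip(p, q)]
    return protos

def rocchio_predict(protos, d):
    """Assign document d to the category with the most similar prototype."""
    return max(protos, key=lambda c: cosine(protos[c], d))

# Toy weighted document vectors and hypothetical category names.
X = [[2, 0, 1], [3, 0, 0], [0, 2, 1], [0, 3, 0]]
y = ["sport", "sport", "politics", "politics"]
protos = rocchio_prototypes(X, y)
print(rocchio_predict(protos, [1, 0, 0]), rocchio_predict(protos, [0, 1, 1]))
```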
KNN: the K nearest neighbours of document d_j
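KNN classifies a document by a vote among its K most similar training documents. The following sketch assumes cosine similarity and a simple majority vote; both choices, along with the function names and toy data, are assumptions made for this example rather than details recovered from the paper.

```python
import math
from collections import Counter

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_predict(X, y, d, k=3):
    """Label d by majority vote among its k most similar training vectors."""
    ranked = sorted(range(len(X)), key=lambda i: cosine(X[i], d), reverse=True)
    votes = Counter(y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy weighted document vectors and hypothetical category names.
X = [[2, 0, 1], [3, 0, 0], [0, 2, 1], [0, 3, 0]]
y = ["sport", "sport", "politics", "politics"]
label = knn_predict(X, y, [1, 0, 1], k=3)
print(label)
```

An odd K with two categories avoids tied votes; with more categories a tie-breaking rule would be needed.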
Rule-based (DNF): category in C for document d
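SVM
Decision boundary: w.x = b for a vector x, with parameters w and b
Minimum distance to the boundary: dmin
Training: a QP (quadratic programming) problem
Mapping phi into another vector space [3]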
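The roles of the decision boundary w.x = b, the distance dmin and the mapping phi can be illustrated with a small sketch. It does not solve the QP problem: w and b are hand-picked, phi is a hypothetical map x -> (x, x^2) chosen only for this toy data, and dmin is assumed to mean the smallest distance from any sample to the boundary.

```python
import math

def decision(w, b, x):
    """Return +1 or -1 according to the side of the hyperplane w.x = b."""
    score = sum(wi * xi for wi, xi in zip(w, x)) - b
    return 1 if score >= 0 else -1

def distance(w, b, x):
    """Euclidean distance from x to the hyperplane w.x = b."""
    score = sum(wi * xi for wi, xi in zip(w, x)) - b
    return abs(score) / math.sqrt(sum(wi * wi for wi in w))

def d_min(w, b, X):
    """Smallest distance from any sample to the boundary."""
    return min(distance(w, b, x) for x in X)

def phi(x):
    """Hypothetical feature map from 1-D to 2-D: x -> (x, x^2)."""
    return (x[0], x[0] ** 2)

# 1-D toy data: the outer points are one class, the inner points the other,
# so no single threshold on x separates them.
points = [(-2.0,), (2.0,), (-0.5,), (0.5,)]
mapped = [phi(p) for p in points]

# In the mapped space the horizontal line x2 = 2 separates the classes.
w, b = (0.0, 1.0), 2.0
labels = [decision(w, b, z) for z in mapped]
print(labels, d_min(w, b, mapped))
```

A trained SVM would choose w and b to make this minimum distance as large as possible; here they are fixed by hand to keep the example short.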