CLASSIFICATION
Abstract - Text classification is one of the most widely used natural language processing
technologies. Common applications include spam identification, news classification,
information retrieval, sentiment analysis, and intent detection. Traditional text classifiers
based on machine learning suffer from data sparsity, dimension explosion, and poor
generalization, whereas classifiers based on deep learning networks, such as the convolutional
neural network (CNN), largely overcome these defects: they avoid a cumbersome feature
extraction process and offer strong learning ability and higher prediction accuracy. This paper
introduces the process of text classification and focuses on the deep learning models used in
text classification.
Introduction - In recent years, artificial intelligence has developed rapidly and is gradually
changing our lives. Natural language processing is an artificial intelligence technology that is
both appealing and challenging. It includes syntactic and semantic analysis, information
extraction, text classification, machine translation, information retrieval, dialogue systems, and
so on. Text classification is one of its most widespread applications, used for spam
identification, news text categorization, information retrieval, sentiment analysis, and intent
detection. The process of text classification includes text preprocessing, text feature extraction,
and classification model construction. First, because of the particular structure of text, the text
to be classified must be preprocessed, which generally requires removing stop words; some
languages also need special processing, such as word segmentation for Chinese, which is
written without spaces between words. Second, feature engineering is applied to the
preprocessed text to extract the key features that characterize it, establishing a mapping
between features and categories. Finally, the classification model is built. This paper focuses
on building this model with deep learning methods.
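The preprocessing step can be illustrated with a minimal sketch. Forward maximum matching, the Chinese segmentation approach named in the title of [2], greedily takes the longest dictionary word starting at each position; stop words are then filtered out. The dictionary, the stop-word list, and the function names `forward_max_match` and `remove_stop_words` are hypothetical, and this is the plain forward variant, not the improved algorithm of [2].

```python
# Minimal preprocessing sketch (hypothetical toy dictionary and stop words):
# forward maximum matching (FMM) segmentation, then stop-word removal.


def forward_max_match(text, vocab, max_len=4):
    """Greedily take the longest dictionary word starting at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + size]
            if size == 1 or word in vocab:
                tokens.append(word)
                i += size
                break
    return tokens


def remove_stop_words(tokens, stop_words):
    """Drop tokens that carry little meaning for classification."""
    return [t for t in tokens if t not in stop_words]


if __name__ == "__main__":
    vocab = {"文本", "分类", "自然", "语言", "处理", "自然语言"}  # hypothetical dictionary
    stop_words = {"的", "是"}  # hypothetical stop-word list
    tokens = forward_max_match("自然语言处理的文本分类", vocab)
    print(tokens)  # ['自然语言', '处理', '的', '文本', '分类']
    print(remove_stop_words(tokens, stop_words))  # ['自然语言', '处理', '文本', '分类']
```

Because the longest match wins, "自然语言" is kept as one token rather than being split into "自然" and "语言".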
Text classification
The purpose of text classification is to determine the category of a given document; the task
may be binary classification (two categories) or multi-class classification (more than two).
The basic steps of text classification include text preprocessing, text feature extraction and
classification model construction.
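The three basic steps can be sketched end to end as follows. This is a minimal illustration under stated assumptions: the input is English text split on whitespace, the stop-word list and function names are hypothetical, and a multinomial naive Bayes classifier (a traditional machine-learning baseline, not one of the deep models discussed below) stands in for the classification model.

```python
import math
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "is", "to", "and"}  # hypothetical stop-word list


def preprocess(text):
    """Step 1: lowercase, split on whitespace, drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def extract_features(tokens):
    """Step 2: bag-of-words counts."""
    return Counter(tokens)


class NaiveBayes:
    """Step 3 (stand-in model): multinomial naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(extract_features(preprocess(doc)))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        features = extract_features(preprocess(doc))
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word, freq in features.items():
                if word in self.vocab:  # ignore words never seen in training
                    score += freq * math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    train = ["win a free prize now", "free money claim now",
             "meeting moved to monday", "see the project report"]
    labels = ["spam", "spam", "ham", "ham"]
    clf = NaiveBayes().fit(train, labels)
    print(clf.predict("claim your free prize"))  # spam
```

The same code handles more than two labels unchanged; binary classification is simply the two-label case.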
2.3.1. TextCNN
TextCNN uses a convolutional neural network (CNN) to extract key information from sentences,
similar to n-grams. It was proposed by Yoon Kim in 2014 [1]. The model contains the following parts:
Part1: Input Layer. The preprocessed text data is input into the model.
Part2: Embedding Layer. Each word is mapped to a dense vector, which serves as its text feature.
Part3: Convolution Layer. Filters of several different sizes are slid over the embedded
sentence, producing multiple feature maps.
Part4: Max-Pooling Layer. Taking the maximum of each feature map reduces the dimension of
the convolution layer's output.
Part5: SoftMax Layer. Outputs the probability of each category in a multi-class task.
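The five parts of TextCNN can be traced in a forward-pass sketch. It is written in pure Python with random, untrained weights, omits bias terms and training, and uses illustrative sizes (embedding dimension 4, filter widths 2 and 3, two filters per width, three classes) that are assumptions rather than values from [1]; the vocabulary and function names are hypothetical.

```python
import math
import random

random.seed(0)
VOCAB = {"<unk>": 0, "free": 1, "prize": 2, "win": 3, "now": 4, "meeting": 5}  # hypothetical
EMB_DIM, NUM_CLASSES = 4, 3
FILTER_SIZES, FILTERS_PER_SIZE = (2, 3), 2


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Embedding table: one EMB_DIM vector per vocabulary entry.
embedding = rand_matrix(len(VOCAB), EMB_DIM)
# Each filter spans `size` consecutive words, flattened to size * EMB_DIM weights.
filters = {s: rand_matrix(FILTERS_PER_SIZE, s * EMB_DIM) for s in FILTER_SIZES}
NUM_FEATURES = len(FILTER_SIZES) * FILTERS_PER_SIZE
out_weights = rand_matrix(NUM_CLASSES, NUM_FEATURES)


def text_cnn_forward(tokens):
    # Part 1 (input layer): map tokens to ids; unknown words become <unk>.
    ids = [VOCAB.get(t, VOCAB["<unk>"]) for t in tokens]
    # Part 2 (embedding layer): look up a dense vector per word.
    vectors = [embedding[i] for i in ids]
    pooled = []
    for size in FILTER_SIZES:
        for w in filters[size]:
            # Part 3 (convolution): slide the filter over every window of
            # `size` words and apply ReLU, giving one feature map per filter.
            feature_map = []
            for start in range(len(vectors) - size + 1):
                window = [x for vec in vectors[start:start + size] for x in vec]
                feature_map.append(max(0.0, sum(a * b for a, b in zip(w, window))))
            # Part 4 (max-pooling): keep the strongest activation of each map.
            pooled.append(max(feature_map) if feature_map else 0.0)
    # Part 5 (softmax): linear layer, then normalise to class probabilities.
    logits = [sum(a * b for a, b in zip(row, pooled)) for row in out_weights]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    probs = text_cnn_forward(["win", "a", "free", "prize", "now"])
    print([round(p, 3) for p in probs])
```

Because max-pooling keeps one value per filter, the pooled vector always has `NUM_FEATURES` entries whatever the sentence length; here a filter wider than the sentence simply contributes 0, whereas practical implementations usually pad the input.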
2.3.2. TextRNN
One of the biggest problems of TextCNN is that the filter size is fixed. On the one hand, it
cannot model longer sequence information; on the other hand, tuning the filter size as a
hyperparameter is tedious.
TextRNN, which uses a bidirectional RNN (typically a bidirectional LSTM), can instead capture
bidirectional "n-gram" information of variable length.
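The bidirectional idea can be sketched as follows: one RNN reads the sentence left to right, another reads it right to left, and their final hidden states are concatenated into a fixed-size feature vector for a softmax layer, whatever the sentence length. This is a simplified sketch with random, untrained weights; for brevity it substitutes a vanilla tanh RNN cell for the LSTM cell, and all sizes and names are assumptions.

```python
import math
import random

random.seed(1)
EMB_DIM, HIDDEN, NUM_CLASSES = 4, 3, 2  # illustrative sizes


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def make_cell():
    # W_x maps the current input, W_h the previous hidden state (no bias, for brevity).
    return {"W_x": rand_matrix(HIDDEN, EMB_DIM), "W_h": rand_matrix(HIDDEN, HIDDEN)}


def run_rnn(cell, inputs):
    """Return the final hidden state after reading `inputs` in order."""
    h = [0.0] * HIDDEN
    for x in inputs:
        h = [math.tanh(a + b)
             for a, b in zip(matvec(cell["W_x"], x), matvec(cell["W_h"], h))]
    return h


forward_cell, backward_cell = make_cell(), make_cell()
out_weights = rand_matrix(NUM_CLASSES, 2 * HIDDEN)


def text_rnn_forward(embedded):
    """`embedded` is a list of word vectors of any length."""
    h_fwd = run_rnn(forward_cell, embedded)                   # left-to-right context
    h_bwd = run_rnn(backward_cell, list(reversed(embedded)))  # right-to-left context
    features = h_fwd + h_bwd  # concatenation: always 2 * HIDDEN values
    logits = matvec(out_weights, features)
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    short_sent = rand_matrix(3, EMB_DIM)   # stand-in embeddings for a 3-word sentence
    long_sent = rand_matrix(12, EMB_DIM)   # stand-in embeddings for a 12-word sentence
    print(text_rnn_forward(short_sent), text_rnn_forward(long_sent))
```

Unlike the fixed filter widths of TextCNN, the recurrence here has no built-in window size: a 3-word and a 12-word sentence both produce a feature vector of the same length.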
REFERENCES
[1] Yoon Kim, “Convolutional Neural Networks for Sentence Classification”, EMNLP 2014,
pp. 1746-1751, Oct. 2014.
[2] Li Hui, Chen Ping Hua, “Improved backtracking forward algorithm for maximum matching
Chinese